November 05, 2005
Semantics in the wild
Regular readers will, I hope, by now have guessed that I am less a fan of "handwaving" and more a fan of research when it comes to reasoning about and understanding what developers actually do.
For a long time now, I've been very interested in how developers use semantics in their pages - not just the semantics provided by HTML, like headings, but also the semantics developers create through the use of class and id values.
After all, for a long time developers have been using classes and ids to create informal semantics (as well as to provide "hooks" for CSS).
So, how are they actually doing this? In particular, I am interested to know whether common semantics emerge from what developers are doing in their code. Is there a set of frequently used semantic class and id values we can "standardize" on (by convention)? After all, there are many common patterns (multi-column layouts, headers and footers, first-, second- and higher-order navigation systems, and many others) that sites feature over and over. You might think that perhaps by osmosis, conscious or unconscious emulation, or even morphic resonance, developers might have started using the same or similar class and id values for the same patterns.
So in my newfound phase of experimentation, I wrote a very simple search engine. It takes a URL, grabs all the class and id values for that page, then gets all the external links in that page, visits these, and recursively does the same thing. I also tried to make sure that it did not visit any given domain (even where the server was different) more than once.
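The crawl just described can be sketched roughly as follows. This is not the spider used for the survey (its source isn't published); it is a minimal Python sketch that assumes a breadth-first queue in place of literal recursion, a `fetch` function passed in so the example runs without network access, and a crude "last two host labels" rule for deciding when two servers belong to the same domain. All names (`ClassIdCollector`, `domain_key`, `crawl`) and the example URLs are hypothetical.

```python
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ClassIdCollector(HTMLParser):
    """Collects class and id values, plus absolute link targets, from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.classes, self.ids, self.links = set(), set(), []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "class":
                self.classes.update(value.split())  # class="a b" holds two values
            elif name == "id" and value.strip():
                self.ids.add(value.strip())
            elif name == "href" and tag == "a":
                self.links.append(urljoin(self.base_url, value))


def domain_key(url):
    # Crude assumption: the last two host labels identify "the domain", so
    # www.example.com and blog.example.com count as one (wrong for .co.uk etc.).
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def crawl(start_url, fetch, max_sites=1000):
    """Breadth-first crawl that visits each domain at most once.

    fetch(url) returns the page's HTML, or None on failure; it is passed in
    so the sketch can run without touching the network.
    """
    class_counts, id_counts = Counter(), Counter()
    seen = set()
    queue = deque([start_url])
    while queue and len(seen) < max_sites:
        url = queue.popleft()
        key = domain_key(url)
        if not key or key in seen:
            continue
        seen.add(key)
        html = fetch(url)
        if html is None:
            continue
        page = ClassIdCollector(url)
        page.feed(html)
        page.close()
        class_counts.update(page.classes)  # sets, so one count per site
        id_counts.update(page.ids)
        for link in page.links:
            if urlparse(link).scheme in ("http", "https") and domain_key(link) != key:
                queue.append(link)  # follow external links only
    return seen, class_counts, id_counts


# Usage with canned pages standing in for the web (hypothetical URLs).
pages = {
    "http://a.example/": '<div id="header" class="nav main">'
                         '<a href="http://b.test/">b</a><a href="/local">x</a></div>',
    "http://b.test/": '<p class="nav"><a href="http://www.a.example/again">a</a>'
                      '<a href="http://c.org/">c</a></p>',
    "http://c.org/": '<div id="footer"></div>',
}
sites, classes, ids = crawl("http://a.example/", pages.get)
```

Because each page's values are gathered into sets, a value counts once per site however often the page repeats it; whether the original survey counted that way isn't stated.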
Thus equipped with this magnificent vessel of exploration (the equal at least of the Niña, Pinta or Santa Maria), I set sail for places unknown from my port of embarkation, stopdesign.com. I chose that domain because Douglas Bowman and I had discussed this whole issue in great detail last year while in New Zealand, and because I felt that, at least among sites close to stopdesign, I'd find ones developed by people with similar sensibilities.
So on this maiden voyage, what did I discover?
The crawl visited 1315 sites. In these sites it found 5374 different class values and 4789 different id values. So, what exactly emerged from all these values?
First, the noise-to-signal ratio is high. Tables 1 and 2 below show how many id and class values appear only a single time (88% of all id values, and 80% of all class values), as opposed to 5 or more times (under 3% of id values, and about 5% of class values), and demonstrate that there is very little agreement on specific class and id values for semantics.
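The "appears once" and "5 or more times" figures are a frequency-of-frequencies calculation: tally each value, then count how many distinct values share each tally. A minimal Python sketch, assuming the tally already holds one count per site (the post doesn't specify); `frequency_of_frequencies` and `share` are hypothetical helper names, and the toy values are made up.

```python
from collections import Counter


def frequency_of_frequencies(counts):
    """Turn {value: times seen} into {times seen: number of distinct values}."""
    return Counter(counts.values())


def share(counts, at_least=1, at_most=None):
    """Fraction of distinct values seen between at_least and at_most times."""
    if not counts:
        return 0.0
    hits = sum(1 for n in counts.values()
               if n >= at_least and (at_most is None or n <= at_most))
    return hits / len(counts)


# Toy tally (hypothetical values, not the survey's data).
tally = Counter({"nav": 3, "footer": 3, "x1": 1, "y1": 1, "main": 2})
dist = frequency_of_frequencies(tally)   # two values seen once, one twice, two three times
once = share(tally, 1, 1)                # 2 of 5 values
common = share(tally, at_least=3)        # 2 of 5 values
```

With the real figures, Table 1's 4203 single-use ids out of the 4789 distinct ids found gives 4203 / 4789 ≈ 0.88, the 88% quoted above.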
Secondly, the class and id values reflect a very presentational way of viewing the contents of a page. Values include the words top, left, right and bottom, color names, sizes like small, and words like spacer and style, all indicating that for most developers their informal "semantics" is highly presentational.
A very unscientific survey of values that appear 5 or more times finds that presentational class and id values are at least as common as non-presentational ones, let alone strictly semantic ones. People also frequently appear to be creating class and id values that are identical to, or clearly very close to, existing HTML elements, like menu, submit, heading, and even body. A good many of the values appear by and large to be meaningless, probably created by applications. The number one id value, after all, is "D27CDB6E-AE6D-11cf-96B8-444553540000".
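A survey like this could be made a little less unscientific with a crude keyword check that splits each value into words and flags any that name a position, colour, size or styling device. The word list is a hypothetical sample drawn from the examples in this post, not an agreed vocabulary, and the helper names are made up for illustration.

```python
import re

# Hypothetical word list, assembled from the kinds of values discussed above;
# an illustration only, not a complete or agreed vocabulary.
PRESENTATIONAL_WORDS = {
    "top", "bottom", "left", "right", "center",
    "red", "green", "blue", "yellow", "orange", "black", "white", "gray", "grey",
    "small", "smaller", "tiny", "bold", "spacer", "style", "border",
}


def split_words(value):
    """Split 'topMenu', 'f-gray' or 'style1' into lowercase words."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", value)  # break camelCase
    return [w.lower() for w in re.split(r"[^A-Za-z]+", spaced) if w]


def looks_presentational(value):
    return any(word in PRESENTATIONAL_WORDS for word in split_words(value))


def presentational_share(values):
    """Fraction of values that mention a presentational word."""
    values = list(values)
    return sum(map(looks_presentational, values)) / len(values) if values else 0.0
```

Run-together names such as smalltext slip through, since they are never split into separate words.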
But why don't I throw this over to you now? Much like Dan Cederholm's SimpleQuiz: based on the data here, and also your own experiences, what class and id values are (or at least should be) commonly used? In essence, what's missing from HTML/XHTML in terms of semantics, not in theory, but in the practical sense of the kinds of constructs developers use all the time?
Some pieces that come straight to mind, based on this survey, include:
- Navigation - this appears under many guises, as both id and class values, including navbar, nav, navigation (the #1 class value) and others. There are a number of different kinds of navigation, including site level navigation, page level navigation and doubtless others as well. Where these physically appear on a page should not be important (top, left and so on are presentational, not semantic)
- breadcrumb trails
- page header (is h1 enough for this, to revisit a SimpleQuiz?) In the sites surveyed, this appears as all kinds of things, like masthead, title, pagehead, header and logo
- page footer (is this really semantic, or just a presentational chunk? Is there a more semantic name, like "fine print"? Yes, I know that the term fine print is ironic.) "footer" appears as a class value 73 times, the fourth highest, and is the second highest id value, with 89 occurrences, making it the overall winner when class and id values are combined.
- Main content for a page. This appears quite frequently, in one guise or another (content, main, maincol and others).
- Author - would the HTML address element be better for this?
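Combining class and id counts, and grouping variants of the same idea, is simple arithmetic over the tables. A minimal Python sketch, assuming a hand-made, hypothetical mapping from variants to concepts (the groupings are illustrative, not a proposal); the sample rows are copied from Tables 3 and 4.

```python
from collections import Counter

# Hypothetical grouping of observed variants into candidate concepts; the
# groupings are an assumption for illustration, not an agreed vocabulary.
CONCEPTS = {
    "navigation": {"navigation", "nav", "navbar", "globalnav", "mainnav", "topnav"},
    "footer": {"footer"},
    "header": {"header", "masthead", "pagehead"},
    "main content": {"content", "main", "maincol"},
}
VARIANT_TO_CONCEPT = {v: c for c, variants in CONCEPTS.items() for v in variants}


def combine(*tallies):
    """Sum counts per concept across id and class tallies, ignoring case."""
    totals = Counter()
    for tally in tallies:
        for value, n in tally.items():
            concept = VARIANT_TO_CONCEPT.get(value.lower())
            if concept is not None:
                totals[concept] += n
    return totals


# A handful of rows copied from Table 3 (id values) and Table 4 (class values).
id_rows = {"footer": 89, "header": 63, "content": 63,
           "nav": 23, "navBar": 9, "navigation": 7}
class_rows = {"navigation": 131, "footer": 73, "content": 71,
              "nav": 43, "navbar": 9}
totals = combine(id_rows, class_rows)
```

For the literal value footer this reproduces the 89 + 73 = 162 combined count; once variants are grouped, navigation comes out well ahead (222 from just these few rows), which echoes how many guises navigation takes.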
I could go on for some time, I am sure, but I am much more interested in other people's take on this, based both on your experiences and on the data at hand. Fire away with your suggestions.
Table 1: how often id values appear
Times an id value appears | Number of distinct id values |
---|---|
1 | 4203 |
2 | 302 |
3 | 220 |
4 | 41 |
5 | 26 |
6 | 13 |
7 | 17 |
8 | 8 |
9 | 25 |
10 | 5 |
11-20 | 15 |
21-30 | 6 |
31-40 | 2 |
41-50 | 1 |
51-60 | 5 |
61-70 | 3 |
71-80 | 0 |
81-90 | 1 |
91-100 | 1 |
100+ | 0 |
Table 2: how often class values appear
Times a class value appears | Number of distinct class values |
---|---|
1 | 4325 |
2 | 507 |
3 | 174 |
4 | 108 |
5 | 55 |
6 | 29 |
7 | 16 |
8 | 22 |
9 | 34 |
10 | 12 |
11-20 | 43 |
21-30 | 11 |
31-40 | 4 |
41-50 | 9 |
51-60 | 5 |
61-70 | 4 |
71-80 | 4 |
81-90 | 0 |
91-100 | 0 |
100+ | 2 |
Table 3: id values appearing 5 or more times
Value | Count |
---|---|
clsidD27CDB6E-AE6D-11cf-96B8-444553540000 | 90 |
footer | 89 |
BTAMarker | 65 |
header | 63 |
content | 63 |
areaTitle | 57 |
layout | 52 |
noBulletContent | 50 |
sqBulletContent | 50 |
logo | 50 |
search | 42 |
main | 33 |
banner | 30 |
container | 29 |
Top | 26 |
sidebar | 26 |
nav | 23 |
wrapper | 23 |
+ | 23 |
left | 19 |
Table1 | 18 |
searchform | 17 |
right | 17 |
menu | 17 |
home | 16 |
Layer1 | 16 |
searchbox | 15 |
Copyright | 14 |
page | 13 |
topMenu | 12 |
AutoNumber1 | 12 |
Layer2 | 11 |
bottom | 11 |
submit | 11 |
Table2 | 11 |
Body | 10 |
Image1 | 10 |
news | 10 |
toptabs | 10 |
breadcrumbs | 10 |
ebGallery | 9 |
links | 9 |
menuitem | 9 |
topNav | 9 |
s | 9 |
Table3 | 9 |
id-10 | 9 |
id-11 | 9 |
all-m | 9 |
top-m | 9 |
id-8 | 9 |
visual | 9 |
l-m | 9 |
id-7 | 9 |
id-6 | 9 |
id-height | 9 |
id-b-1 | 9 |
id-b-2 | 9 |
id-b-3 | 9 |
id-b-4 | 9 |
id-b-5 | 9 |
id-9 | 9 |
Masthead | 9 |
q | 9 |
navBar | 9 |
globalNav | 9 |
Table_01 | 8 |
Image2 | 8 |
topmain | 8 |
topmenu-b | 8 |
gmo_foot | 8 |
gmo_logo | 8 |
main2 | 8 |
wrap | 8 |
orphus | 8 |
gmo_link | 7 |
Title | 7 |
Image4 | 7 |
password | 7 |
AutoNumber5 | 7 |
gmo_img | 7 |
gmo_ul | 7 |
tophban | 7 |
Table4 | 7 |
gmo_service | 7 |
current | 7 |
AutoNumber2 | 7 |
navigation | 7 |
podlogo | 7 |
binfo | 7 |
banbox | 7 |
7 | |
Form1 | 6 |
query | 6 |
Image5 | 6 |
searchsubmit | 6 |
about | 6 |
Image3 | 6 |
leftcol | 6 |
AutoNumber3 | 6 |
maincol | 6 |
jobs | 6 |
AutoNumber4 | 6 |
login | 6 |
gmo_copy | 6 |
pas-m | 6 |
contact | 5 |
_x0000_i1025 | 5 |
site-switch | 5 |
subhead | 5 |
navright | 5 |
intro | 5 |
inseguitore | 5 |
brdbott | 5 |
ROOF | 5 |
mainnav | 5 |
WTL_TAG | 5 |
udm | 5 |
Image6 | 5 |
name | 5 |
location | 5 |
headernav | 5 |
Company | 5 |
Products | 5 |
rightcol | 5 |
newsletter | 5 |
press | 5 |
navcontainer | 5 |
leftmenu | 5 |
CONTENTBODY | 5 |
calendar | 5 |
Table5 | 5 |
Table 4: class values appearing 5 or more times
Value | Count |
---|---|
navigation | 131 |
title | 113 |
help | 80 |
footer | 73 |
content | 71 |
pipe | 71 |
noMargin | 69 |
prefill | 68 |
ebay | 68 |
novisited | 68 |
button | 60 |
standard | 58 |
section | 58 |
Menu | 57 |
text | 55 |
searchFormSection | 50 |
rightAnchor | 50 |
catTable | 50 |
metaCat | 50 |
seeAllLink | 50 |
seeAllBullet | 50 |
adSpacer | 50 |
boxSection | 50 |
nav | 43 |
ocDCP | 42 |
white | 39 |
Date | 39 |
body | 37 |
module | 31 |
CIPpromo | 30 |
small | 29 |
main | 28 |
header | 28 |
copyright | 26 |
tiny | 26 |
hdr | 24 |
clear | 23 |
headline | 23 |
link | 22 |
search | 22 |
links | 22 |
v | 20 |
style1 | 20 |
v1 | 19 |
topMenu | 19 |
left | 19 |
more | 19 |
smalltext | 18 |
prnav | 18 |
prred | 18 |
copy | 18 |
box | 18 |
hide | 18 |
trans | 17 |
news | 17 |
logo | 17 |
top | 16 |
right | 16 |
center | 16 |
item | 16 |
entry | 15 |
gray | 15 |
style2 | 15 |
spacer | 15 |
description | 15 |
bodytext | 14 |
head | 14 |
blue | 14 |
MsoNormal | 14 |
searchbox | 13 |
post | 13 |
posted | 12 |
block | 12 |
style3 | 12 |
yellow | 12 |
gallery | 12 |
home | 11 |
subtitle | 11 |
submit | 11 |
leftnav | 11 |
inputbox | 11 |
topnav | 11 |
author | 11 |
back | 10 |
option | 10 |
caption | 10 |
black | 10 |
searchinput | 10 |
border | 10 |
side | 10 |
selected | 10 |
tbTitle | 10 |
s2 | 10 |
bold | 10 |
icons | 10 |
helpblk | 10 |
ebcPic | 9 |
ebPicture | 9 |
visual | 9 |
topmenu-spacer | 9 |
c-right | 9 |
block1 | 9 |
s1 | 9 |
read | 9 |
block2 | 9 |
ebDetails | 9 |
ebPrc | 9 |
projectsTable | 9 |
bottomLine1 | 9 |
bottomRed | 9 |
bottomAboutText | 9 |
bottomAbout | 9 |
bottomLine2 | 9 |
block-text | 9 |
bottom-menu | 9 |
style4 | 9 |
green | 9 |
normal | 9 |
style5 | 9 |
submenu | 9 |
input | 9 |
navbar | 9 |
calendar | 9 |
heading | 9 |
foot | 9 |
red | 9 |
category | 9 |
blogbody | 9 |
divider | 9 |
cnnLarge | 9 |
cnnLeft | 9 |
bottomSearchTable | 8 |
bottomInput100 | 8 |
bottomSelect | 8 |
BottomWhite | 8 |
bottomAdv | 8 |
summary | 8 |
smaller | 8 |
cnnEndCell | 8 |
bottomDot | 8 |
bottomServices | 8 |
formbut | 8 |
related | 8 |
breadcrumb | 8 |
style7 | 8 |
orange | 8 |
navlinks | 8 |
nwslink | 8 |
leftmenu | 8 |
maintext | 8 |
txt | 8 |
secondary | 8 |
rub1 | 7 |
whitetext | 7 |
container | 7 |
cbox | 7 |
ta-c | 7 |
formtext | 7 |
blurb | 7 |
highlight | 7 |
mainmenu | 7 |
cal | 7 |
searchtext | 7 |
sidebar | 7 |
s | 7 |
bottom | 7 |
blog | 7 |
style | 7 |
powered | 7 |
footertext | 6 |
mini | 6 |
pagetitle | 6 |
imagealign | 6 |
grey | 6 |
row1 | 6 |
catlist | 6 |
ckCol | 6 |
binImg | 6 |
rubr | 6 |
none | 6 |
tm | 6 |
LastItem | 6 |
tagline | 6 |
searchform | 6 |
separator | 6 |
btn | 6 |
menu2 | 6 |
titre | 6 |
foot_alt | 6 |
module-content | 6 |
bannerAd | 6 |
smalllist | 6 |
subject | 6 |
sidetitle | 6 |
subhead | 6 |
alignleft | 6 |
alignright | 6 |
cnnData1 | 6 |
tabs | 5 |
icomtb | 5 |
last | 5 |
ContentBorder | 5 |
timestamp | 5 |
Byline | 5 |
TextAd | 5 |
Label | 5 |
default | 5 |
banner | 5 |
navtext | 5 |
udm | 5 |
hr | 5 |
pagenav | 5 |
style6 | 5 |
info | 5 |
bottomnav | 5 |
B | 5 |
alt | 5 |
entry-header | 5 |
entry-body | 5 |
entry-footer | 5 |
permalink | 5 |
TextBox | 5 |
nav3 | 5 |
text2 | 5 |
m | 5 |
rights | 5 |
current | 5 |
bot | 5 |
narrowcolumn | 5 |
postmetadata | 5 |
cnnHeader | 5 |
cnnNoBold | 5 |
dark | 5 |
clickPath | 5 |
formbutt | 5 |
pt-10 | 5 |
lnav | 5 |
footmsg | 5 |
navcolor | 5 |
f-11 | 5 |
navMainSections | 5 |
portletContent | 5 |
stitle | 5 |
mmsmall | 5 |
mastertable | 5 |
sidebarad | 5 |
cattitle | 5 |
f-gray | 5 |
ens | 5 |
toptab | 5 |
fivevert | 5 |
disclaimer | 5 |
disclaimerlink | 5 |
tenvert | 5 |
Listed below are links to weblogs that reference Semantics in the wild:
» IceWeb 2006 notes - David Shea, "CSS Project Management" from Már Örlygsson
Uses his own site Mezzoblue.com as a hands-on example. Hey, Dave uses BBEdit to write CSS and HTML! I remember using BBEdit back in my Macintosh days (in the Mac OS 7/8 era). Recommends commenting your CSS code....
Tracked on Apr 28, 2006 9:26:25 PM
Comments
John, are you proposing the adoption of a standardised set of naming conventions for CSS ids and classes to ensure semanticity, for lack of a better word (BTW that is a derivative of specificity which I picked up at the Canberra CSS workshop) in page layout and stylesheets?
Within reason, if everyone were using the same naming conventions, you could run some interesting experiments by applying other people's stylesheets to your site, knowing that they would restyle your information fairly accurately.
Whilst I like the idea, is it going to be achievable, in a practical sense?
damo
Posted by: damo | Nov 7, 2005 9:49:33 AM
Damo,
got it in one.
Is it doable? I think that is in the realm of the imponderable. I certainly don't think that you could say it is impossible. It's really a social issue, in the sense that, what would motivate people to do it?
I think it would require:
1. knowledge - people would need to know about the possibility, and the options
2. motivation - people would need to see a benefit in doing it
I think a "top down" approach, of getting the major blog apps to adopt a common set of semantics would be a great start - but would that fly?
I know one or two smart people looking at this aspect, hopefully there will be some word of their efforts soon.
Thanks for the note,
j
Posted by: John Allsopp | Nov 7, 2005 9:55:54 AM
john,
I agree that if it can be and does get adopted it will be fantastic. Unfortunately web design/development is still a fledgling industry and it is only geeks like ourselves that keep up to speed on standards and find new/better ways of implementing code. Unfortunately, a lot of "designers/developers" are unaware of better practice.
The biggest problem facing "us" is ensuring adoption and compliance. As you mention, there will be a need to filter down the whys, hows and wheres of semantic CSS and XHTML. It is in the best interests of everyone who uses the Net, from casual surfers to coders, if only we can agree on a standard and implement it. I agree that top down is probably going to be the only way to implement it successfully; however, we are still learning ourselves. I think we are off to a good start.
damo
Posted by: damo | Nov 7, 2005 12:19:31 PM
John, this is really interesting. Here's an idea from the "weird and wacky" shelf:
I wonder if one could take your search engine results and create a del.icio.us-style app where one could browse all the class or id values the same as del.icio.us tags. When doing so, you would be presented with a representation of the tagged code. Might be an interesting way to see *what* developers are applying these tags to.
Probably useless, but as you mentioned that this is a "social" problem, why not experiment with social solutions?
Posted by: Steve Ivy | Nov 8, 2005 6:29:55 AM
It may be worth noting that clsidD27CDB6E-AE6D-11cf-96B8-444553540000 generally refers to the SWF classid attribute. Most likely a parsing error with your spider.
Posted by: Craig Bovis | Nov 8, 2005 7:23:05 AM
This has all along been a concept that I have tried to work with... One standardized way to mark up all my websites... That way I can swap style sheets on different sites all I want. I actually started my base naming conventions from CSSZenGarden. It seemed they had more than enough blocks to create a universal layout framework.
Posted by: Rod Kesselring | Nov 8, 2005 7:27:23 AM
This sounds similar to microformats http://microformats.org/, tho defining an entire page structure is a bit more ambitious. As you say, there would need to be some benefit for it to take hold.
Posted by: Brian | Nov 8, 2005 8:25:17 AM
Just because quite a number of (English-speaking) developers use the same class or id for something doesn't make it semantic.
Also, why would you? What would it help if all headers were marked 'header' ? (It'd also push people into a certain design mindset, resulting in sites that all look the same.)
Posted by: AkaXakA | Nov 8, 2005 8:36:03 AM
This is really great work - I've been thinking about this subject for ages too. It always seemed strange that XHTML and CSS are 'standards' but that they get applied in anything but a standard way when it comes to the links between them - the id and class names. I'd love to stop reinventing the wheel all the time with my reused names like 'nav' and 'footer' and 'container' etc. The benefits of standardising these things might be more than just cos you could swap stylesheets, and that things would be more efficient. It might be handy for search engines to only trawl the 'content' div, for example, rather than trawling through menus and adverts and other stuff.
My small comment (for what it's worth) is that the very broad, automated approach to your sampling may be less interesting than doing this sampling in a more directed way. By allowing the spider to go everywhere, you are looking at the web as a whole and the amount of noise in the data is inevitably going to drown out the revealing bits. Of course, it's interesting to know about the web as a whole, but for the purposes of finding out what the people who are actually *thinking* about it are doing, which I think would be more useful, your spider is too wide-ranging.
If, say, we knew that the 'title' 'nav' 'main' 'content' and 'footer' ids are found in the vast majority of sites where those id attributes were set by hand, then that might encourage more people to consider using those.
I don't know how an automated system might go about narrowing down to the more useful sites... maybe the spider should check to see if the page was done with BBEdit or similar editor before looking further? Not really possible in the vast majority of cases. Or maybe it could check for validity (which might be a good correlate with having been written by someone more deliberately?)... but that might not be very easy, and might lead to a small dataset.
All I'm getting at is that the sites that have automated id names (like 'Layer2' etc.) are just not that useful for what you're trying to find out - which is what people who are interested in semantics are using - and instead it tells us about the state of the web on average, semantic sites lumped in with the completely unsemantic.
What do you reckon?
Posted by: rich | Nov 8, 2005 8:45:27 AM
I would quite gladly adopt a standardized set of class and id naming rules.
Can it be achieved? Yes and No.
Like so many things we deal with, penetration will only ever be limited. But like web standards and microformats and so many other things we love, it can be done if we spread the word.
Posted by: Keri Henare | Nov 8, 2005 9:15:59 AM
Would you consider opening the source of your spider?
Slightly off topic, but I've been looking for a utility that would allow me to crawl a site and index all ids and classes. The output I seek is a list of all URLs where a given id or class occurs.
I need this because I don't always know which pages will be affected by a CSS edit.
I need an id/class search engine for a single (large) site. Your code is a step in that direction. Or can you suggest some other tool?
Posted by: Jon | Nov 8, 2005 9:33:12 AM
A standard, and semantically rich, format for naming CSS is a good idea, although I imagine the debate when trying to agree names will be lively.
I can't help thinking that although CSS is the only place to do this at present, it's fundamentally the wrong place - semantics should come from XHTML (or whichever markup language), not CSS. A page's header (if that's what you call it) should be its <header>, not its <div class="header">. Shouldn't CSS be about applying styles to a page, and utilising the semantics of the page to do it? I'm not trying to devalue the exercise, just venting my frustration at HTML's lack of sophistication after all these years.
However, until XHTML modularisation and/or XAML comes along it definitely makes sense to agree this naming convention and then carry it across to the tags, otherwise we'll each end up modularising different markup code for exactly the same function - something we should really try to avoid, otherwise we'll have standards, and lots of them!
Posted by: Wholesome | Nov 8, 2005 11:18:30 AM
Sorry to reply to all the comments in one - is this bad?
Steve,
>John, this is really interesting. Here's an idea from the "weird and wacky" shelf:
>I wonder if one could take your search engine results and create a del.icio.us-style app where one could browse all the class or id values the same as del.icio.us tags. When doing so, you would be presented with a representation of the tagged code. Might be an interesting way to see *what* developers are applying these tags to.
>
>Probably useless, but as you mentioned that this is a "social" problem, why not experiment with social solutions?
I've actually been playing around with the idea of "semantic search" for a while now. Indeed, this project was an initial step in that direction. At present it would seem to be of no great value, as the noise-to-signal ratio is so high. If, however, there were standardized semantics we could rely on, then it would be a more useful possibility.
j
>It may be worth noting that clsidD27CDB6E-AE6D-11cf-96B8-444553540000 generally refers to the SWF classid attribute. Most likely a parsing error with your spider.
Craig,
thanks for that, I'll adapt the spider to ignore these (I knew it was something, just couldn't work out what)
>This has all along been a concept that I have tried to work with... One standardized way to mark up all my websites... That way I can swap style sheets on different sites all I want. I actually started my base naming conventions from CSSZenGarden. It seemed they had more than enough blocks to create a universal layout framework.
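One plausible explanation for the clsid value, offered as a guess rather than a diagnosis of the actual spider: a pattern that looks for `id="..."` as a substring also matches the tail of `classid="..."` on a Flash `<object>` element. The Python sketch below contrasts such a naive regular expression with the standard library's `html.parser`, which reports `classid` as an attribute in its own right; `NAIVE_ID` and `IdCollector` are hypothetical names.

```python
import re
from html.parser import HTMLParser

# A Flash embed of the kind Craig describes, plus an ordinary id.
SNIPPET = ('<object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000">'
           '<div id="footer"></div></object>')

# Naive: matches the text id="..." anywhere, including the tail of classid="...".
NAIVE_ID = re.compile(r'id="([^"]*)"')


class IdCollector(HTMLParser):
    """Collects values of attributes named exactly "id"."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


naive_ids = NAIVE_ID.findall(SNIPPET)   # picks up the classid value too
parser = IdCollector()
parser.feed(SNIPPET)
parser.close()
parsed_ids = parser.ids                 # only the real id
```

The stray value in Table 3 is the classid value minus its colon, which is consistent with (though not proof of) this kind of substring match.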
Rod,
the problem is that Dave's original class and id values were much more structural/presentational than semantic. I also don't think they really reflect the most common patterns people are using (though I don't yet really know what those are; look out for a post on that soon).
Someone we all know is looking at this issue to do something like the Zen Garden but for semantics, I've spoken with him at length about this for some time. Hopefully this research and subsequent conversations will be useful for that project.
thanks,
john
>This sounds similar to microformats http://microformats.org/, tho defining an entire page structure is a bit more ambitious. As you say, there would need to be some benefit for it to take hold.
Brian,
Tantek and I have spoken at length about some of these issues, and I spoke on Microformats in Japan, at the WWW2005 developer day. I think that Microformats focus on a particular level of granularity, and also on existing published, or essentially formal, semantics (like vCard and iCalendar). My interest is in helping to formalize, by convention, existing conventional uses: a kind of bottom-up semantics. Of course this simplifies µF considerably, and is far from accurate. I certainly think you are right that there is at the very least a lot of inspiration to be drawn from what Tantek and others with µF have achieved.
>Just because quite a number of (English-speaking) developers use the same class or id for something doesn't make it semantic.
AkaXakA,
I appreciate the point that semantics is not about English. This actually came up with Tantek at SxSW last year when he was talking about microformats and other ways of adding semantics to HTML. Douglas Bowman in particular raised the issue of internationalization. I believe that an approach like this is amenable to internationalization. But that would require a whole 'nother conversation (which I am happy to have).
I do, however, think that, leaving aside the issue of language, "quite a number of developers using the same class or id for something" does indeed make it semantic. It seems to me almost the definition of language: using the same signs to signify the same things gives meaning, and hence semantics.
>Also, why would you? What would it help if all headers were marked 'header' ? (It'd also push people into a certain design mindset, resulting in sites that all look the same.)
There is a fair bit here to unpack. Why would you? People already use the same patterns over and over - both personally, in organizations, and indeed across the web. So this is merely a suggestion that perhaps there is value in formalizing current practice (what Adam Bosworth calls "paving the cowpaths").
It is not prescriptive ("your site must use these values"), but rather descriptive of what people actually do.
And unlike HTML, which is closed in terms of the elements it contains, this is open-ended: it allows new patterns to be formalized while the existing HTML language and semantics stay unchanged.
Thanks for the thoughts,
john
Rich,
>This is really great work
many thanks
>I've been thinking about this subject for ages too. It always seemed strange that XHTML and CSS are 'standards' but that they get applied in anything but a standard way when it comes to the links between them - the id and class names. I'd love to stop reinventing the wheel all the time with my reused names like 'nav' and 'footer' and 'container' etc. The benefits of standardising these things might be more than just cos you could swap stylesheets, and that things would be more efficient. It might be handy for search engines to only trawl the 'content' div, for example, rather than trawling through menus and adverts and other stuff.
yes to all :-)
>My small comment (for what it's worth) is that the very broad, automated approach to your sampling may be less interesting than doing this sampling in a more directed way.
Ah, stage two :-) Yes, that is something I am currently working on. What I wanted to do with the automated approach was to see just how much signal there was out there. This would show me the cowpaths, which Tantek Celik and Adam Bosworth (Tantek quotes Adam in this regard) talk about paving - look to where people are walking, and make them the path, rather than arbitrarily laying down paths, and "keep off the grass" signs, and hoping they stick to your plan, rather than theirs.
Now, I have found that there aren't too many paths in the sense of id and class values, but there are patterns people are using over and over. What I want to do next (with your help, dear readers) is start cataloguing these patterns (I know others have done this, most notably Martijn van Welie [http://www.welie.com/patterns]; I'll talk more about why I think it warrants another go in a post shortly). Then we can start looking for some consensus regarding these patterns.
>By allowing the spider to go everywhere, you are looking at the web as a whole and the amount of noise in the data is inevitably going to drown out the revealing bits. Of course, it's interesting to know about the web as a whole, but for the purposes of finding out what the people who are actually *thinking* about it are doing, which I think would be more useful, your spider is too wide-ranging.
As mentioned above, I don't disagree at all (only my goal was perhaps a little different for this particular phase). One reason I tried to cluster the spidering around stopdesign was the hope that this would at least give us a reasonable chance of sites that knew about this stuff.
>If, say, we knew that the 'title' 'nav' 'main' 'content' and 'footer' ids are found in the vast majority of sites where those id attributes were set by hand, then that might encourage more people to consider using those.
>
Precisely. But as my results show, while people are doing this, there is no consensus (for example, there are probably well over a dozen ways people specify navigation in id and class values).
>I don't know how an automated system might go about narrowing down to the more useful sites... maybe the spider should check to see if the page was done with BBEdit or similar editor before looking further? Not really possible in the vast majority of cases. Or maybe it could check for validity (which might be a good correlate with having been written by someone more deliberately?)... but that might not be very easy, and might lead to a small dataset.
Good thoughts, I'll have a post soon about the next phases, we can return to this issue then!
>All I'm getting at is that the sites that have automated id names (like 'Layer2' etc.) are just not that useful for what you're trying to find out - which is what people who are interested in semantics are using - and instead it tells us about the state of the web on average, semantic sites lumped in with the completely unsemantic.
Sure, but it is interesting to know what the id and class values are (my guess is that "layer" values come from Dreamweaver, for instance). But as mentioned, the next phase should start focussing on meaningful semantics more specifically.
thanks for thoughts, hope to hear more in subsequent posts
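Filtering out values that look tool-generated before tallying is one way to get closer to hand-chosen names. A rough Python sketch: the patterns are assumptions based on values visible in Tables 3 and 4 (Layer1, Image2, Table_01, AutoNumber1, style1, _x0000_i1025, MsoNormal), and attributing them to particular tools is guesswork.

```python
import re

# Hypothetical patterns for names that look tool-generated rather than chosen
# by a person, based on values in Tables 3 and 4. Which tools emit which names
# is a guess (John only suspects Dreamweaver for the "Layer" values).
GENERATED = [
    re.compile(r"^(layer|image|table|autonumber|style|form)_?\d+$", re.IGNORECASE),
    re.compile(r"^_x0000_"),               # e.g. _x0000_i1025
    re.compile(r"^Mso[A-Z]"),              # e.g. MsoNormal
    re.compile(r"^clsid", re.IGNORECASE),  # the SWF classid value
]


def looks_generated(value):
    return any(pattern.search(value) for pattern in GENERATED)


def hand_written(tally):
    """Keep only values that do not match a generated-name pattern."""
    return {value: n for value, n in tally.items() if not looks_generated(value)}
```

Names like block1 or main2 are deliberately left alone, since a person may well have chosen them.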
Keri,
>I would quite gladly adopt a standardized set of class and id naming rules.
>
>Can it be achieved? Yes and No.
>Like so many things we deal with, penetration will only ever be limited. But like web standards and microformats and so many other things we love, it can be done if we spread the word.
Indeed. I think if there is value, people will do it. Is there value in it? The consensus of the people I speak to about it is yes, but it does remain to be seen whether in reality that is the case.
thanks
john
Jon,
>Would you consider opening the source of your spider?
I'd love to, but it is a platform-specific executable at present. I really should do it in Rails or something to get some practice in that :-)
>Slightly off topic, but I've been looking for a utility that would allow me to crawl a site and index all ids and classes. The output I seek is a list of all URLs where a given id or class occurs.
What I did is easily adaptable to that. I was also thinking of which elements get which values (for example, are people using divs for footers, or paragraphs?).
>
>I need this because I don't always know which pages will be affected by a CSS edit.
makes sense
>
>I need an id/class search engine for a single (large) site. Your code is a step in that direction. Or can you suggest some other tool?
Not off the top of my head, sadly. But it really isn't all that hard to do. After all, I managed it :-)
j
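For Jon's single-site case, the same kind of parser can be turned around into an index from each class or id value to the pages, and the elements, that use it, which also covers the question of which elements get which values. A minimal Python sketch, assuming the site's pages have already been fetched into a dictionary of URL to HTML (fetching is left out); `UsageCollector` and `build_index` are hypothetical names.

```python
from collections import defaultdict
from html.parser import HTMLParser


class UsageCollector(HTMLParser):
    """Records (kind, value, element) for every class and id on one page."""

    def __init__(self):
        super().__init__()
        self.uses = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "class":
                self.uses.extend(("class", v, tag) for v in value.split())
            elif name == "id" and value.strip():
                self.uses.append(("id", value.strip(), tag))


def build_index(pages):
    """pages: {url: html} -> {(kind, value): {url: set of element names}}."""
    index = defaultdict(lambda: defaultdict(set))
    for url, html in pages.items():
        collector = UsageCollector()
        collector.feed(html)
        collector.close()
        for kind, value, tag in collector.uses:
            index[(kind, value)][url].add(tag)
    return index


# Usage with two made-up pages: which pages, and which elements, use class "note"?
site = {
    "/index.html": '<div id="footer"><p class="note">x</p></div>',
    "/about.html": '<p class="note">y</p><span class="note">z</span>',
}
index = build_index(site)
pages_using_note = sorted(index[("class", "note")])
```

Before a CSS edit to a rule like `.note`, looking up `("class", "note")` lists the pages to check.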
Wholesome,
>A standard, and semantically rich, format for naming CSS is a good idea, although I imagine the debate when trying to agree names will be lively.
indeed!
>I can't help thinking that although CSS is the only place to do this at present, it's fundamentally the wrong place - semantics should come from XHTML (or whichever markup language), not CSS. A page's header (if that's what you call it) should be its <header>, not its <div class="header">. Shouldn't CSS be about applying styles to a page, and utilising the semantics of the page to do it? I'm not trying to devalue the exercise, just venting my frustration at HTML's lack of sophistication after all these years.
It's not really in the CSS; it's actually in the values of the attributes of your HTML. CSS just potentially uses these as styling hooks.
>
>However, until XHTML modularisation and/or XAML comes along it definitely makes sense to agree this naming convention and then carry it across to the tags, otherwise we'll each end up modularising different markup code for exactly the same function - something we should really try to avoid, otherwise we'll have standards, and lots of them!
I think in reality it will be a very, very long time before we have XML on the web in any way, shape or form, beyond the simplest mechanism of XHTML 1.x.
The question is, do you really need semantics embedded in the elements, when there is a perfectly serviceable mechanism that works right now?
john
Posted by: john Allsopp | Nov 8, 2005 12:27:12 PM
All I know is... I'm worried about anyone who names a class "blue", "white", "gray", "orange" etc. Think a lot of people must still be missing the point of CSS...
Posted by: Alex | Nov 8, 2005 12:45:18 PM
In my current redevelopment of my site I'm taking this kind of information quite seriously. I found What's In a Name (http://www.stuffandnonsense.co.uk/archives/whats_in_a_name_pt2.html) by Andy Clarke quite good, and the names were similar to the ones I was already using. I think this set of names is pretty standard.
Maybe someone should publish some best practices for these class names and ids.
Posted by: trovster | Nov 8, 2005 9:04:03 PM
If everyone uses the same tags, the same attributes and the same attribute values, guess what you'll end up with? Either:
a) An entire Internet's worth of the CSS Zen Garden?
b) A boring, bland Internet where design can no longer get around the medium's limitations?
Which do you think is more likely? Are there not enough bland default-styled blogs out there already?
While interesting simply for nerd quotient value ("who can come up with the nerdiest meme this week?"), I see no benefit whatsoever in suggesting conformity in the use of id attributes, which are intended to be *unique* markers.
CSS is for visual style and presentation, not semantics.
Posted by: hostyle | Nov 8, 2005 10:37:30 PM
The conversation you had with Douglas Bowman must have been the result of you both drinking New Zealand water. I'm pretty sure it's where all the ingenuity comes from.
Posted by: Keri Henare | Nov 8, 2005 10:47:19 PM
Actually, hostyle, there is a c) alluded to by damo and Rod.
c) An Internet where you can see all the pages in your own favourite style.
Posted by: Mike WS | Nov 9, 2005 5:16:43 AM
Semantic naming conventions could make things easier, including reusing classes and ids in team collaboration. You then have to give your style elements meanings that everybody understands immediately.
I think we should take a look at the DTP industry to begin. They have been naming their layout-elements in a traditional manner for a long time.
http://www.tameri.com/dtp/elements.html
For example:
h1.hammer {...}
p.sig {...}
p.byline img {...}
a.kicker {...}
Posted by: wisbin | Nov 9, 2005 6:02:56 AM
Has anyone started a naming convention yet?
How about this?
div#wrapper
div#container
div#intro
h1#title
ul#nav_page
ul#nav_site
div#content
div#content_main
div#content_supp
div#supp
ul#nav_global
p#info_credits_photo
address#info_street_address
address#info_phone_number
address#info_email
address#info_copyright
Posted by: BrainO | Nov 9, 2005 7:04:41 AM
I agree with wisbin!
Posted by: Adrian D. | Nov 9, 2005 11:24:33 PM
So many of the DTP elements are identified based on their place within a visual design. Elements like header, footer and sidebar don't make sense because they don't express their relationship to the whole.
Posted by: BrainO | Nov 10, 2005 1:48:38 AM
Well, I am not sure what div#content stands for, but in my opinion you should be able to name the classes and ids in their context, the DOM tree. The place of the layout elements is what makes them semantic, in my opinion.
For example, when you lay out your template, you could give meaning to the tree by using meaningful relations in your code.
» Like a logo and text within a header at the top of your template:
div#header
p.nameplate
img.logo
-----------
» Or a heading with some classifications (crumbs), and a subheader with a byline and a short summary, to start an article:
div#article
a.kicker
h1.hammer
h2.subhead
p.byline
p.deck
-----------
I think you should not use #sidebar for a column, but instead .sidebar for containers you want to place beside the main text:
div#bodycopy
div.sidebar
p.pullquote
a.teaser
...
Posted by: wisbin | Nov 10, 2005 9:11:30 AM
Very interesting, thank you for this analysis.
By the way, scanning the comments: well, there are certain CSS semantics with respect to class and ID naming (compare "error" and "redtext"). Regarding naming conventions, no, there is no real agreed-on set of names for common page elements (qed). And finally, use class and ID names that describe the function of the element, not its visualization (colors, position, etc.), or at least use "generic" names (for example, "alt" or "aux", some of my favorites).
Posted by: Jens Meiert | Nov 11, 2005 9:44:11 PM