
November 24, 2005

please welcome Zoe Kate Allsopp Lander

Born 8.40pm 22.11.2005 (that’s her dad’s 39th birthday - and it’s my present to her - I’ll stay 39 from now on I’ve decided - very big of me.)

Mum and Baby doing very well - settled, happy, beautiful (thank goodness she takes after her mum).

Regular transmission to be resumed soon

john

November 24, 2005 | Permalink | Comments (16) | TrackBack

November 18, 2005

WebPatterns and WebSemantics

WebSemantics 1.0

Over the last few years, we've seen a growing awareness of the importance of semantics in HTML. Perhaps Dan Cederholm's rightly lauded SimpleQuiz marked the coming of age of this idea, which, though present at the very beginning of HTML, had long been overlooked by the great majority of web developers. We could consider this "first generation semantics" or, in current parlance, "WebSemantics 1.0".

But it's clear that the semantic expressiveness of HTML, even with HTML compounds, is fairly limited. And developers are voting with their feet, using class and id to create their own ad hoc semantics for HTML.

Even XHTML 2, while somewhat more semantically rich than the current generation of XHTML/HTML, will fall far short of the semantic complexity which developers are currently cooking up, or which is found in even the first generation of microformats (and who knows what we might see from microformats.org in the next year or two).

So it seems that developers want, and need, the semantic richness that is only available through the use of class and id values in HTML/XHTML. My recent semi-random survey of around 1300 sites found well over 5000 distinct class values, and almost 5000 distinct id values, which surely demonstrates that this is something developers are doing in significant numbers.
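The post doesn't say how the survey was carried out. As a minimal sketch, assuming the pages have already been fetched and are available as HTML strings, distinct class and id values (and how many pages share each one, a rough measure of consistency) could be tallied with Python's standard `html.parser`. The `ClassIdCollector` and `survey` names and the sample pages are hypothetical, not the data or tooling behind the figures above.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassIdCollector(HTMLParser):
    """Records the distinct class and id values used in one HTML page."""

    def __init__(self):
        super().__init__()
        self.classes = set()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if not value or not value.strip():
                continue
            if name == "class":
                # a class attribute may hold several space-separated values
                self.classes.update(value.split())
            elif name == "id":
                self.ids.add(value.strip())


def survey(pages):
    """For each class and id value, count how many pages use it at least once."""
    class_counts = Counter()
    id_counts = Counter()
    for html in pages:
        collector = ClassIdCollector()
        collector.feed(html)
        collector.close()
        class_counts.update(collector.classes)
        id_counts.update(collector.ids)
    return class_counts, id_counts


# Hypothetical sample pages, not data from the survey described above
pages = [
    '<div id="header"></div><div id="content" class="post entry"></div>',
    '<div id="masthead"></div><div id="main" class="post"></div>',
    '<div id="header"></div><ul class="nav"></ul>',
]
classes, ids = survey(pages)
print(len(classes), len(ids))  # prints: 3 4
print(ids.most_common(1))      # prints: [('header', 2)]
```

Counting pages rather than raw occurrences means a value repeated many times on one page still counts once, which is closer to the question of whether different sites share a convention ("header" versus "masthead", "content" versus "main").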

It's also something that has been talked about for some time by the likes of Tantek Çelik and Andy Clarke (and of course many others).

But my recent survey also showed that there is very little consistency in the use of class and id values - no conventions, no accepted set of values, and so none of the potential benefits which might flow from that.

In the comments which followed that post, the overwhelming consensus was that some kind of standardization would be a good thing. This echoes my conversations with developers, some very well known, many in the trenches, in organizations big and small, on this issue over the last year or so.

The temptation (as evidenced by a number of comments) is that, having identified the need for some kind of standardized class and id values, we should roll up our sleeves, jump in, and start a convention, any convention, before it's too late.

But standardized (or "conventionalized", it's not necessary that this be some kind of formal standard or W3C recommendation) class and id values will only be adopted if they clearly meet the needs of developers. It's also worth noting that we don't need a "big bang" approach - we don't need to agree on every possible semantic value for class or id at once. But given how long it takes to develop standards, like XHTML2 and CSS3, how on earth could this be done in a reasonable amount of time, in an unstructured way?

Let's start by trying to understand the problem well, before rushing to an implementation (in my experience, that's the cause of a lot of bad decisions and bad solutions).

What's the problem?

As I am fond of asking, "what is the problem?" When developers use class and id values, what problem are they trying to solve?

One of a couple of things.

Often, developers are creating hooks for styling with CSS. "Classitis", like its close cousin "divitis", is a common malady on the web, as we all know. But when developers are using class and id in a more structural or semantic way, what are they trying to achieve? Commonly, developers use id and class to identify structural, or architectural, components of a page. For example, the most common id and class values in my survey were things like header, footer, content and navigation.

I used the term "architectural" deliberately. The idea of patterns, and pattern languages, originated in the field of architecture (as in buildings), and has taken strong hold in many areas of computer science (object oriented analysis and design, Human Computer Interaction, and more).

What I think web developers are doing instinctively is using patterns in web page and web site architecture. A small number of people, most notably Martijn van Welie, have done work on patterns and web design, but it's a surprisingly under-researched area, given how powerful and valuable a patterns approach to web development could be.

So what are patterns?

But hang on, what are "patterns"?

On the web you'll find a considerable amount of information about patterns and pattern languages, in architecture, computer science, and other fields. I've got some links to resources at the end of this post, but here are just a few introductory notes.

In 1977, Christopher Alexander, Sara Ishikawa and Murray Silverstein published "A Pattern Language", which formalized their pattern based approach to architecture.

In a nutshell, "a pattern describes a problem which occurs over and over again ... and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice" (my emphasis).

The essence of a pattern is that it articulates a problem, then considers how that problem might be solved. It does not dictate a single solution, rather, a strategy for solving the problem.

The idea has been taken up in a number of computer related fields, most famously in object oriented analysis and design, the seminal work there being Design Patterns: Elements of Reusable Object-Oriented Software, by "the Gang of Four" - Gamma, Helm, Johnson and Vlissides. In this context, Brad Appleton observed, "Fundamental to any science or engineering discipline is a common vocabulary for expressing its concepts, and a language for relating them together."

This really struck a chord with me. Can web design and development today rightly be called a discipline? Or is it, as I suspect, a practice in the process of becoming a discipline? If we consider Appleton's observation, do we have "a common vocabulary for expressing [our] concepts, and a language for relating them together"?

We certainly have technologies like HTML and CSS (though we still struggle even to use the terms "tag", "attribute" and "element" correctly - I still hear "alt tag" frequently in my travels). But what about higher order, more complex structures and strategies? What do we call the parts of a page we use over and over again? What names do we have for particular navigation strategies (such as hierarchical trees, linear progressions through sections, and so on)? My survey, as well as a more detailed, though much more narrowly focussed, one by François Briatte, suggests that while there are clearly many structures we use over and over again, we lack that common vocabulary to talk about these structures, and to relate them to one another.

In short, we lack a pattern language for the web.

But is our discipline mature enough to develop this vocabulary and language? Only trying to do just that will tell.

If we return to the issue which began this post, id and class values, how does that relate to the issue of patterns? Earlier I asked: when developers use class and id values, what problem are they trying to solve? My answer is that we are struggling to find a vocabulary to talk about these common constructs and concepts. But before we rush out and invent a vocabulary, let's stop for a moment and think about what these patterns are. Not all of them right away, just the most common ones. I think this could help us determine what id and class values, what kinds of HTML element, and so on, to use when translating these patterns into HTML.

But how to go about this?

A couple of years back, Dan Cederholm published his very popular and helpful "SimpleQuiz". This sought to understand best semantic coding practices by asking developers what they thought the most semantic way of marking up various commonly used constructs was. In fact, what Dan was doing was trying to understand patterns, particularly as they are implemented using semantic HTML: constructs like headings, lists, linking, and paragraphs. SimpleQuiz even touched on some patterns which go beyond straight HTML, such as how to mark up a page header. But (and this is not a criticism) SimpleQuiz took the patterns for granted and looked for solutions, while I'm interested in discovering and cataloguing the patterns developers currently use. How these patterns are implemented is an issue for subsequent investigation and discussion.

Over the coming weeks, I'll be running (with Dan Cederholm's blessing) "PatternQuiz". The goal is to begin collectively understanding, naming and cataloguing the patterns which have already emerged in web development, and which we all use commonly. In Appleton's terms, to develop a "common vocabulary for expressing its concepts, and a language for relating them together". Or, in Alexander's terms, a pattern language for the web. Once the conversation is flowing, I hope it can be formalized in a wiki, which I am setting up at WebPatterns.org, a site I've established to promote the use of patterns in web development, similar in some ways to Microformats.org.

What makes a good pattern?

Others have done a lot of work in the area of patterns in many different fields over a long period of time. It's important to stand on their shoulders, rather than having to relearn from scratch the lessons they have learned.

James Coplien (often referred to as "Cope") has this observation about what makes a good pattern, which might help us think about webpatterns. A good pattern, says Cope,

  • solves a problem: Patterns capture solutions, not just abstract principles or strategies.
  • is a proven concept: Patterns capture solutions with a track record, not theories or speculation.
  • describes a relationship: Patterns don't just describe modules, but describe deeper system structures and mechanisms.
  • The pattern has a significant human component .... All software serves human comfort or quality of life; the best patterns explicitly appeal to aesthetics and utility.

Who are patterns for?

Cope's final point is one particularly worth emphasizing. The purpose of identifying patterns is to use them in our work as designers, information architects, and so on. But why should we go to all that effort? Who benefits if we do so?

Usually, the focus is on the benefits of patterns for the developers who use them. Using patterns certainly has its benefits for developers - it helps solve complex design problems in better ways, more efficiently, while avoiding potential pitfalls, based on our own and others' previous work. Patterns also help larger development teams (which will become more common as the practice of designing and developing for the web continues to mature) by providing a common vocabulary and language to aid communication. They help site maintenance, since page and site logic and structure are much clearer when captured explicitly as patterns. Compare reading the HTML for a WordPress (or other) blog with an identical looking table based design (such as a Geeklog page). While the patterns used in each are very similar, in a standards based, semantically marked up page they are explicit, making the code much more readable and maintainable.

But, there is another significant, often overlooked group who would benefit from a pattern based approach to web development - users. Users are often overlooked when it comes to patterns, perhaps because the most common and well developed work in patterns is in the area of object oriented analysis and design. Using design patterns in this field produces good outcomes for users, but users don't see or interact directly with the patterns which OO designers develop. These patterns lie behind the scenes, in the code.

But in web development, users benefit from the consistency, the learnability, the familiarity which comes from using patterns. Take blogs. When you visit a blog you can generally, without much conscious thought, work out how to get to the main page, how to contact the writer, how to leave a comment, all because the patterns in blogging are so well established. This is true regardless of whether the site is developed using Movable Type, Blogger, WordPress or by hand. It is not nearly so true for many other common types of site. In part that's due to the greater complexity of many kinds of site - a newspaper site has possibly dozens of sections, and many, many different kinds of content, compared with a blog's much more limited set of patterns. But in part it's because blogging tool developers and bloggers share very strong patterns in common use.

This doesn't mean users will be consciously aware of the patterns which comprise a site they are visiting, or even that all of those patterns are directly accessible to the user. But patterns in web development benefit not only developers, but also site visitors, maybe even more so.

There's a final entity for whom a pattern language, and the standardization of semantics, could bring significant benefits - machines. Today, software discovery of information on the web is largely limited to plain text indexing, very simple metadata like tags (as in Technorati, not bold), and some other microformats like XFN, hCard and hCalendar. Imagine the kinds of data mining possible if much more of the content of the web was marked up in a semantically rich, standardized way. Understanding and cataloguing the current patterns in web development is an important first step in the development of robust, rich standardized semantics for today's web, using today's technology, without waiting for The Semantic Web.

What can you do right now?

Think about patterns

Start thinking about patterns in your work, and in the sites you visit. In particular, start thinking about types of pattern. Immediately the following come to mind:

  • different types of site - e.g. blogs, newspaper sites, search engines
  • different navigation strategies, like hierarchical trees, linear paths, or indexes
  • different kinds of pages within sites - front page, contact page, search page
  • different navigational page sub parts (like, site level navigation, site subsection navigation, page level navigation)
  • different page content sub parts - main text, pullout quotes, "side bars" and so on

and of course many many others. The fun part is finding them.

Notice I haven't talked about things in terms of appearance. I think a basic rule is to avoid any kind of presentational name to begin with (although header and footer are so commonly used that it would perhaps be counterproductive not to continue using those terms).

Overall, looking at the list above, there seem to be a number of different categories of pattern. A number of the above patterns relate to page architecture. But that's not the case for sites, or for navigation strategies. As this work continues, I imagine several categories of web pattern will emerge and become clearer.

It's important to note that patterns aren't just restricted to pages and parts of pages. A site, for instance, is an open collection of pages, while a navigation strategy (or mode, or whatever we might call that kind of pattern) is even more ephemeral - it's a concept which the designer creates in the mind of the user by the use of other architectural patterns, like breadcrumb trails, tables of contents, next and previous links and so on. This is an example of the powerful concept of generative patterns, patterns which work together to build more complex patterns. It's this which makes a collection of patterns a language, rather than simply a catalogue.
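To make generative patterns a little more concrete, here is a minimal sketch, assuming a hypothetical catalogue in which each pattern simply lists the smaller patterns it is composed of. Walking down from a navigation strategy yields its parts; walking the other way answers "what patterns does this one play a part in?". The `Pattern` class, the `expand` and `used_in` helpers, and the category labels are illustrative assumptions, not part of any existing WebPatterns catalogue; the pattern names come from the examples in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Pattern:
    name: str
    category: str
    # smaller patterns this pattern is built from
    composed_of: list = field(default_factory=list)


def expand(pattern, depth=0):
    """Walk down from a larger pattern to the patterns that make it up."""
    lines = ["  " * depth + f"{pattern.name} ({pattern.category})"]
    for part in pattern.composed_of:
        lines.extend(expand(part, depth + 1))
    return lines


def used_in(target, catalogue):
    """Name every pattern in the catalogue that directly includes target."""
    return [p.name for p in catalogue if target in p.composed_of]


# Hypothetical catalogue entries built from the examples in the text
breadcrumbs = Pattern("breadcrumb trail", "page part")
contents = Pattern("table of contents", "page part")
next_prev = Pattern("next and previous links", "page part")
tree = Pattern("hierarchical tree", "navigation strategy", [breadcrumbs, contents])
linear = Pattern("linear path", "navigation strategy", [contents, next_prev])

print("\n".join(expand(linear)))
print(used_in(contents, [tree, linear]))  # prints: ['hierarchical tree', 'linear path']
```

Because the same "table of contents" entry appears inside both navigation strategies, the catalogue records relationships between patterns rather than a flat list, which is the property that distinguishes a language from a catalogue.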

When thinking about patterns, here are some things to consider:

  • What should we call the pattern? Do good names already exist for the pattern (perhaps from outside the web)?
  • What is the problem which the pattern captures?
  • What solutions exist which attempt to address the problem? Remember Cope's observation that a pattern is a proven concept, not speculation.
  • How might you implement the solution (what concrete examples of the solution can you think of)?
  • What other patterns work with this pattern? What patterns does it play a part in? What other patterns form a part of it?
  • What category of pattern does it belong to?

What's next?

As mentioned, I'll very soon start my PatternQuiz, so subscribe to the RSS feed for this site, and get ready for that quiz. Soon, too, the WebPatterns wiki should be up at webpatterns.org. In time I hope that can become a collaborative catalogue of patterns for the web.

You might also be interested in reading more about patterns. There are a lot of great resources online; here are just a few to get you started.

Footnote: Martijn van Welie's Web Design Patterns

It's very important for me to acknowledge the work that Martijn van Welie has done in this area, which in my experience is the most significant in the field. Martijn's site catalogues dozens of patterns, with several categories of pattern, including site types, navigation, page parts, page types and more. He also includes visual design patterns, such as liquid layout and alternating row colours, and interaction patterns, like stepping and wizard.

So, if Martijn has done all this work, why am I proposing to essentially start it all over again? While Martijn's work is significant, it is not without some important shortcomings.

  • It is the work of an individual. As such it will always be to an extent idiosyncratic, and limited by the amount of work an individual can do. Even a moment's reflection will show that the number of patterns which have already emerged in web development is simply enormous. No individual can catalogue all these. The WebPatterns project, which I will outline in more detail shortly, aims to harness the collective knowledge and experience of many, many web developers (in effect, anyone who cares to be involved).
  • It lacks a systematic structure. Each category of pattern, and in turn each individual pattern, is isolated. For example there is a category, "Page Elements", and yet other categories, such as "Navigation", contain page elements as well. My sense is that web patterns probably have both a hierarchical nature (so Navigation elements would be a subset of Page elements), and a compositional nature - patterns both comprise other patterns, and help make up larger patterns. This aspect of the relationship between patterns is strongly emphasized in most major pattern work, particularly that of Alexander et al. and the Gang of Four.
  • Closely related to this is the idea that a pattern language should be generative, that is, taking a particular pattern within the language as a starting point, related patterns are explicitly referred to by the language itself, helping to identify more complex solutions more efficiently. Because van Welie's work lacks a strong structure, and explicitly articulated relationships, it is much less generative than other pattern languages.

Martijn himself identifies the importance of the second and third of these aspects in Generative Pattern Languages for Interaction Design [PDF], where he writes:

When Alexander wrote his book on architecture design patterns, it did not just contain patterns; the patterns formed a language. His language was hierarchical and started out on the level of cities, then neighborhoods, houses until the level of windows or seats was reached. In Alexander’s idea, the language actually “generated” the design by traversing from high level patterns to the lowest level of patterns. From the design of cities down to the design of window seats, a hierarchy of scale.

I hope Martijn's detailed understanding of web patterns specifically, and expertise in pattern languages more generally will play the important role they should in the further development of our understanding of webpatterns.

Technorati Tags

I've taken the liberty of coining two as yet unused Technorati tags for this post - and . While the tags Semantics and Pattern are both reasonably well used (with a little over 100 and 1000 results on Technorati respectively), they are each applied to many completely unrelated concepts, and so I suggest these more specialized tags are warranted to help people find posts specifically about these issues.

November 18, 2005 | Permalink | Comments (67) | TrackBack

November 15, 2005

How not to protest in the 21st Century

This one is particularly for my Australian readers.

Some background

Recently, in Australia, the federal government was returned with an unusual majority in both the Senate (upper house) and House of Representatives (like the Commons or Congress, the lower house).

All of a sudden, the Government found itself in the position to do pretty much whatever it wanted. Not being known for its moderation (or frankly competence, or human decency either - oops, that might get me gaoled soon, given some of their new "anti terror" laws (really anti civil liberties laws, but anyhoo)), they have thrust on the Australian populace a deeply unpopular radical reworking of our entire industrial relations system, with very little debate in parliament, to the benefit of no one other than employer groups (and given I am a real capitalist, that is, someone who risks their capital to build companies, you can't just dismiss my position on this as being self interested - in many ways it is the opposite of that).

But, all this is kind of beside the point.

Right now, many thousands (probably several hundreds of thousands) are marching in the streets to protest. And you know what this will achieve? $%&# All. What on earth could such an outmoded form of protest hope to achieve other than to make the opponents of the bill look antediluvian, obstructionist, and easily parodied? If you want to achieve an outcome, first you have to work out the forces involved. Who can you co-opt or force into siding with you? Nearly a decade ago, a massive outpouring of public support for Aboriginal Australians, culminating in marches of hundreds of thousands, achieved precisely nothing. You have to learn from the past, and not repeat its mistakes.

So who can you get on your side? Well, forget the Federal Opposition - outnumbered, in no position to do anything for you whatsoever, and frankly pathetic. The only people whose actions matter to you are members of the government. And what chance do you have of getting them to change their minds? Plenty. Marginal members, the ones who hold their seats by margins which could easily be lost at the next election, are always terrified of issues like this. But you have to put the pressure on them. Here is a little program which would cost next to nothing in money or effort to organize.

  1. Identify the most marginal government seats, and their margins
  2. Get people who live in these seats to write to their members, with their names, pointing out that they live in their electorate, and that there is no chance of them voting for the member should the bill get passed
  3. Watch these members get very worried indeed, and put a lot of party room pressure on the Prime Minister

But instead we get '60s-style marches, with all and sundry chiming in, chanting stupid slogans, with Trotskyists and Maoists carrying banners, making it ever so easy for everyone involved to be tarred with their brush.

How bloody stupid can you get?

November 15, 2005 in from the "if you always do what you always did department" | Permalink | Comments (6) | TrackBack

November 12, 2005

Music podCast

After WE05, I got the idea of editing up some of the moments from the conference which stuck in my mind, and putting them to some music. So, not having done anything musical for over 15 years, I broke out GarageBand and started playing around a bit. A few weeks ago I podCast two tracks from that effort, an upbeat and a deeper version. Since then I've been playing around some more with GarageBand, in those many hours I have with nothing else to do (right).

I've got a few tracks in various states, but I've decided to try and make one available at least once a month. So, I've just uploaded "you may know my poetry", which is basically a sketch for something I don't have time to do a lot more with at the moment, but which might make for a nice couple of minutes' listening.

I've also added a new podcast link to the right hand side of the page, where you can subscribe if you'd like to be kept up to date with tracks as I upload them.

Enjoy, "you may know my poetry".

john

November 12, 2005 | Permalink | Comments (2) | TrackBack

November 11, 2005

Stop whining about DRM, *IAA, and so forth - do something about it yourself right now

The rumbling about "Digital Rights Management" (DRM) increasingly built into digital media (music and video from the iTunes store, DVD Region encoding (deemed an illegal restraint of trade in Australia) and much more), long found at the "geek end" of the web user spectrum, is about to tip, I suspect, into the wider community. It was only a matter of time before the stupid greed of the film, and particularly the music, industries went too far (as if the Digital Millennium Copyright Act of 1998 wasn't too far, in part criminalizing what has traditionally been considered "fair use" of copyright material - by the way, Australians, don't think you are safe, as our now year-old "free trade agreement" (which has seen trade with the U.S. fall by 3% since it was implemented) requires copyright laws to be "harmonized" between the two countries).

But SONY's recent arrogant, boneheaded attempt to stop you using music you have purchased a license for in ways which may well be legal (in Australia, unlike many other places, we have few use rights when we purchase music; even ripping a CD to play on your computer or iPod is not permitted. Not sure what happens with portable CD players which cache your music to RAM to provide an anti-skip buffer (you know, from companies like, um, SONY) - probably an illegal device in Australia) is hopefully the thing that will bring just how greedy, arrogant, contemptuous and stupid the mainstream recording industry is into widespread public cognizance.

A number of SONY music CDs, when played on your computer, will install a rootkit, purportedly as a form of DRM. Rootkits are really bad things - processes which are intended to conceal running processes and files or system data, which helps an intruder maintain access to a system for malicious purposes.

Already EFF (the Electronic Frontier Foundation) is looking to bring legal action against SONY, and the state of California appears to be bringing an action against them, alleging

Sony's software violates at least three California statutes, including the "Consumer Legal Remedies Act," which governs unfair and/or deceptive trade acts; and the "Consumer Protection against Computer Spyware Act," which prohibits -- among other things -- software that takes control over the user's computer or misrepresents the user's ability or right to uninstall the program. The suit also alleges that Sony's actions violate the California Unfair Competition law, which allows public prosecutors and private citizens to file lawsuits to protect businesses and consumers from unfair business practices

On one level, SONY's action has so far overstepped the mark that it doubtless will rebound on them, and they will wish they had never even thought about doing this. But it underscores just what the music industry thinks of you. Very very little indeed. Not content with suing children for downloading music, they now want to install malicious software on your computer, without your knowledge, from CDs which you have purchased from them.

But what to do? I think the legal actions are fine - if you break the law, you should suffer the consequences. We are fine and dandy with that for 12 year old kids who download music, so we should be double plus good with that for the industry that wants to bring those kinds of actions. I hope (but doubt) they are fined into the stone age for this.

But you know, in essence there is a single action that you as an individual can take that will have an effect on the music industry. It's not whining. It's not suing, it's not writing to your congressperson/representative/local member.

Don't buy their music. Don't buy music from any mainstream music publishing company. Don't buy music from anyone associated with the RIAA (or similar organization in your country).

Sure, you will miss out on the fine offerings of popular culture, and doubtless will feel less culturally enriched than if you could get access to Celine Dion, Ricky Martin and the rest of today's equivalents to Monteverdi, Bach and the like. But geez, it's not like that is the only music available. Go looking for independents. Start with someone like emusic, broaden your horizons, "try something new Homer".

But above all, just stop whining about how bad the music industry is and then running out to buy more of their DRM laden crap. If you are going to drink the Kool-Aid, go ahead, but don't complain about how bad it tastes.

The only thing that is going to change the behavior of this lot is a swift kick right where it hurts. Their bottom line.

So start kicking.

An update - must be something in the water

All of a sudden, articles about how to live without the music industry are springing up all over the place. Here is one you might like to check out.

More updates

So SONY has pulled the CDs carrying their rootkit. Well, that ought to cost them a few bucks, but hopefully they'll bear a higher burden than that. Hot on the heels of this announcement, we find that the software SONY released to get rid of the rootkit leaves your system highly vulnerable.

And, in the ultimate show of contempt (actually a demonstration that this is not even remotely about copyright or intellectual property at all, but rather a strategy for maximising revenue from their paying customers by limiting in software the fair use rights you may have under your country's copyright law, except in Australia, where you have few if any), their rootkit contains pieces of code that are identical to LAME, an open source MP3 encoder, and thereby breaches its license.

And now with the US Attorney General proposing harsh criminalization and jail time for even trivial copyright infringement, I expect we'll be seeing SONY execs in orange jumpsuits and manacles some time soon? Not.

If this whole episode leaves you with any doubt about what SONY thinks of their customers, and intellectual property law (little and even less) then by all means keep giving them money.

November 11, 2005 in Rants | Permalink | Comments (19) | TrackBack

November 09, 2005

Web Blandards

In response to my recent post on semantics, a couple of comments touched on the wider issue (and one I have never really understood) that somehow by standardizing, you stifle creativity. It's an argument I've heard for many years in criticism of web standards, and one in which I fail to see any logic. But I've never really seen it discussed in detail, and as it simply won't go away, I want to address it here.

In a comment on Semantics in the Wild, hostyle wrote

If everyone uses the same tags, the same attributes and the same attribute values, guess what you'll end up with? Either:

a) An entire internets worth of the CSS Zen Garden?

b) A boring bland internet where design can on longer get around the mediums limitations?

Which do you think is more likely? Are there not enough bland default-styled blogs out there already?


I think this continues to be one of the great furphies (a kind of red herring; apparently the term is unknown in North America?) regarding standards, formalization, and the web.

The argument is that standards are bad, because standardization will somehow stifle innovation and creativity. Frankly, I can't see how that logic works.

All communication media (other than the web) are rigorously standardized. Radio, television, film and sound recording all have strict technical standards, and this brings enormous economic benefits. While much of the output of these media is, I would be the first to agree, repetitive, uninspired and uncreative, that can hardly be blamed on the fact that a CD you buy will work in your CD player. Could someone explain to me the logic by which, if you had to buy specific CDs for each specific type of CD player you owned, there would somehow be more creativity in music?

But that is argument by analogy, which is rhetoric, and so, not very useful.

Let's take the proposal that we standardize, by convention, the class and id values used for common information patterns: patterns developers are already marking up, and have been for some time. hostyle and others argue that this will somehow lead to a "boring bland internet where design can no longer get around the mediums limitations".

But what I propose is built on top of a considerably more impoverished semantic framework, namely HTML, which has a very limited set of elements to work with. Surely, by the same argument, we should get rid of HTML and simply let people make up their own web languages to encourage creativity?

Constraints are a fact of any medium of expression: language, music, cinema, whatever. In fact, you might argue that they define a medium. Is the music of J. S. Bach, constrained as it was by both conventional and self-imposed rules, any less creative for that? The constraints of the fugue, for instance, define much of Bach's music, which is among the most sublime, revolutionary, and indeed popular in Western music.

Or am I still missing some vital piece that eludes me?

John

November 9, 2005

November 05, 2005

Semantics in the wild

Regular readers will, I hope, have guessed by now that when it comes to reasoning about and understanding what developers actually do, I am less a fan of "handwaving" and more a fan of research.

For a long time now, I've been very interested in how developers use semantics in their pages - not just the semantics provided by HTML, like headings, but also the semantics developers create through the use of class and id values.

After all, developers have long been using classes and ids to create informal semantics (as well as to provide "hooks" for CSS).

So, how are they actually doing this? In particular, I want to know whether common semantics emerge from what developers are doing in their code. Is there a set of frequently used semantic class and id values we could "standardize" on (by convention)? After all, there are many common patterns (multi-column layouts, headers and footers, first-, second- and higher-order navigation systems, and many others) which sites feature over and over. You might think that by osmosis, conscious or unconscious emulation, or even morphic resonance, developers might have started using the same or similar class and id values for the same patterns.

So, in my newfound phase of experimentation, I wrote a very simple search engine. It takes a URL, grabs all the class and id values on that page, then gets all the external links on the page, visits them, and recursively does the same thing for each. I also tried to make sure that it never visited any given domain more than once (even where the server was different).
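The crawler itself isn't published here, so what follows is only a minimal Python sketch of the same idea, under assumptions of my own: it uses the standard library's `html.parser`, counts each class or id value at most once per site, follows only links that lead off the current site, and approximates "the same domain" by the last two labels of the host name (which wrongly lumps together, for example, every .com.au site). It works through sites breadth-first from a queue rather than by literal recursion, and the names `crawl`, `site_key` and `ClassIdCollector` are hypothetical.

```python
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class ClassIdCollector(HTMLParser):
    """Collect class values, id values and absolute link targets from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.classes = set()
        self.ids = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "class":
                # One class attribute can hold several space-separated values.
                self.classes.update(value.split())
            elif name == "id" and value.strip():
                self.ids.add(value.strip())
            elif name == "href" and tag == "a":
                self.links.append(urljoin(self.base_url, value))


def site_key(url):
    """Crude 'same site' test: keep only the last two labels of the host name.

    Assumption: www.example.com and blog.example.com count as one site, but
    this also wrongly lumps every *.com.au or *.co.uk site together.
    """
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def default_fetch(url):
    """Fetch a page over HTTP, returning None on any error."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except Exception:
        return None


def crawl(start_url, fetch=default_fetch, max_sites=1500):
    """Breadth-first crawl that attempts each site at most once."""
    class_counts, id_counts = Counter(), Counter()
    visited = set()
    queue = deque([start_url])
    while queue and len(visited) < max_sites:
        url = queue.popleft()
        key = site_key(url)
        if not key or key in visited:
            continue
        visited.add(key)
        html = fetch(url)
        if html is None:
            continue
        parser = ClassIdCollector(url)
        parser.feed(html)
        parser.close()
        # Each distinct value counts at most once per site.
        class_counts.update(parser.classes)
        id_counts.update(parser.ids)
        for link in parser.links:
            # Follow only http(s) links that lead off the current site.
            if urlparse(link).scheme in ("http", "https") and site_key(link) != key:
                queue.append(link)
    return visited, class_counts, id_counts
```

With the default fetcher, `crawl("http://stopdesign.com/")` would start from the same port of embarkation as the survey; passing a dictionary's `get` method as `fetch` lets you try the logic offline against a few canned pages.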

Thus equipped with this magnificent vessel of exploration (the equal at least of the Nina, Pinta or Santa Maria), I set sail for places unknown from my port of embarkation, stopdesign.com. I chose that domain because Douglas Bowman and I had discussed this whole issue in great detail last year while in New Zealand, and because I felt that, at least among sites close to stopdesign, I'd find ones developed by people with similar sensibilities.

So on this maiden voyage, what did I discover?

The crawl visited 1315 sites. In these sites it found 5374 different class values and 4789 different id values. So, what exactly emerged from all these values?

First, the noise-to-signal ratio is high. Tables 1 and 2 below show how many id and class values appear only a single time (88% of all id values, and 80% of all class values) as opposed to five or more times (ids under 3%, classes about 5%), which demonstrates that there is very little agreement on specific class and id values for semantics.
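These percentages can be rechecked from the tables. A minimal Python sketch, assuming the denominators are the totals reported above (4789 distinct id values, 5374 distinct class values) and taking the numerators from the "1" row and from the rows "5" through "100+" of Tables 1 and 2:

```python
# Totals reported in the text.
ID_TOTAL, CLASS_TOTAL = 4789, 5374

# The "1" row of each table, and the sum of the rows from "5" through "100+".
ID_ONCE, CLASS_ONCE = 4203, 4325
ID_FIVE_PLUS = sum([26, 13, 17, 8, 25, 5, 15, 6, 2, 1, 5, 3, 0, 1, 1, 0])
CLASS_FIVE_PLUS = sum([55, 29, 16, 22, 34, 12, 43, 11, 4, 9, 5, 4, 4, 0, 0, 2])


def share(part, total):
    """Percentage of `total` represented by `part`, to one decimal place."""
    return round(100 * part / total, 1)


print(share(ID_ONCE, ID_TOTAL), share(CLASS_ONCE, CLASS_TOTAL))          # 87.8 80.5
print(share(ID_FIVE_PLUS, ID_TOTAL), share(CLASS_FIVE_PLUS, CLASS_TOTAL))  # 2.7 4.7
```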

Second, the class and id values reflect a very presentational way of viewing the contents of a page. Values include the words top, left, right and bottom, color names, sizes like small, and words like spacer and style, all indicating that for most developers, their informal "semantics" is highly presentational.

A very unscientific survey of the values which appear five or more times finds that presentational class and id values are at least as common as non-presentational ones, let alone strictly semantic ones. People also frequently appear to be creating class and id values which are identical to, or very close to, existing HTML elements, like menu, submit, heading, and even body. A good many of the values appear by and large to be meaningless, and were probably created by applications. The number one id value, after all, is "D27CDB6E-AE6D-11cf-96B8-444553540000".
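One rough way to repeat this unscientific survey is to sort values into buckets automatically. The sketch below is hypothetical throughout: the word list comes from the examples in the two paragraphs above (plus "center", which appears in Table 4), the "generated" patterns are guesses based on names in Tables 3 and 4 below (Table1, Layer1, AutoNumber1, _x0000_i1025, MsoNormal and the GUID-style id), and names are split into words at hyphens, underscores, digits and camelCase boundaries.

```python
import re

# Hypothetical word list, drawn from the presentational examples in the text.
PRESENTATIONAL_WORDS = {
    "top", "left", "right", "bottom", "center",
    "small", "smaller", "tiny", "spacer", "style",
    "white", "black", "red", "blue", "green", "yellow", "orange", "gray", "grey",
}

# Hypothetical patterns for names that editing tools tend to generate.
GENERATED_PATTERNS = [
    re.compile(r"^(Table|Image|Layer|AutoNumber|Form)_?\d+$"),          # Table1, Layer2
    re.compile(r"^_x0000_i\d+$"),                                      # _x0000_i1025
    re.compile(r"^Mso[A-Z]"),                                          # MsoNormal
    re.compile(r"[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}"),  # GUID-like
]


def tokens(value):
    """Split camelCase, hyphenated and underscored names into lower-case words."""
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", value)]


def classify(value):
    """Return 'generated', 'presentational' or 'other' for one class or id value."""
    if any(p.search(value) for p in GENERATED_PATTERNS):
        return "generated"
    if any(word in PRESENTATIONAL_WORDS for word in tokens(value)):
        return "presentational"
    return "other"
```

Splitting into words keeps "copyright" from being flagged as "right", but run-together names such as smalltext or leftcol are not split either, so they fall into "other"; the buckets would still need a human eye.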

But why don't I throw this over to you now? Much like Dan Cederholm's SimpleQuiz: based on the data here, and on your own experiences, which class and id values are (or at least should be) commonly used? In essence, what's missing from HTML/XHTML in terms of semantics, not in theory, but in the practical sense of the constructs developers use all the time?

Some pieces that come straight to mind, based on this survey, include:

  • Navigation - this appears under many guises, as both id and class values, including navbar, nav, navigation (the #1 class value) and others. There are a number of different kinds of navigation, including site level navigation, page level navigation and doubtless others as well. Where these physically appear on a page should not be important (top, left and so on are presentational, not semantic)
  • breadcrumb trails
  • page header (is h1 enough for this, to revisit a SimpleQuiz?) In the sites surveyed this appears as all kinds of things like masthead, title, pagehead, header, logo
  • page footer (is this really semantic, or just a presentational chunk? Is there a more semantic name, like "fine print"? Yes, I know the term fine print is ironic.) "footer" appears as a class value 73 times, the fourth highest, and is the second highest id value, with 89 occurrences, making it the overall winner when class and id values are combined.
  • Main content for a page. This appears quite frequently, in one guise or another.
  • Author - would the HTML address element be better for this?
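The claim that "footer" comes out on top once class and id counts are combined can be checked with Python's `collections.Counter`, whose `+` operator adds counts key by key. A minimal sketch using only a handful of the largest entries from Tables 3 and 4 below; values are matched exactly, so "Title" and "title" count as different values.

```python
from collections import Counter

# A few of the largest entries from Table 4 (class values) and Table 3 (id values).
class_counts = Counter({"navigation": 131, "title": 113, "help": 80,
                        "footer": 73, "content": 71, "header": 28})
id_counts = Counter({"footer": 89, "BTAMarker": 65, "header": 63,
                     "content": 63, "navigation": 7})

# Adding two Counters sums the counts of values used both as class and as id.
combined = class_counts + id_counts
print(combined.most_common(3))
# [('footer', 162), ('navigation', 138), ('content', 134)]
```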


I could go on for some time I am sure, but I am much more interested in other people's take on this, based both on your experiences, and the data at hand. Fire away with your suggestions.

Table 1

Frequency of id values
Times an id value appears | Number of id values
1 | 4203
2 | 302
3 | 220
4 | 41
5 | 26
6 | 13
7 | 17
8 | 8
9 | 25
10 | 5
11-20 | 15
21-30 | 6
31-40 | 2
41-50 | 1
51-60 | 5
61-70 | 3
71-80 | 0
81-90 | 1
91-100 | 1
100+ | 0

Table 2

Frequency of class values
Times a class value appears | Number of class values
1 | 4325
2 | 507
3 | 174
4 | 108
5 | 55
6 | 29
7 | 16
8 | 22
9 | 34
10 | 12
11-20 | 43
21-30 | 11
31-40 | 4
41-50 | 9
51-60 | 5
61-70 | 4
71-80 | 4
81-90 | 0
91-100 | 0
100+ | 2
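Tables 1 and 2 are "frequencies of frequencies": each row counts how many distinct values were seen a given number of times. A minimal sketch of how such a table can be built, assuming the raw data is a `collections.Counter` mapping each class or id value to the number of times it was seen; the bucket boundaries are read off the row labels, and whether the original analysis put edge values (exactly 10, 20 or 100) in these same rows is an assumption. Unlike the tables, a Counter simply omits empty rows rather than listing them with 0.

```python
from collections import Counter


def bucket(times):
    """Map an occurrence count to a row label like those in Tables 1 and 2."""
    if times <= 10:
        return str(times)
    if times <= 100:
        low = (times - 1) // 10 * 10 + 1  # 11 -> "11-20", 89 -> "81-90"
        return f"{low}-{low + 9}"
    return "100+"


def frequency_table(value_counts):
    """Count how many distinct values fall into each row."""
    return Counter(bucket(n) for n in value_counts.values())


sample = Counter({"footer": 89, "nav": 23, "a": 1, "b": 1, "navigation": 131})
print(frequency_table(sample))
```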

Table 3

id values which appeared 5 or more times (a couple of syntax errors have been removed)
Value | Count
clsidD27CDB6E-AE6D-11cf-96B8-444553540000 | 90
footer | 89
BTAMarker | 65
header | 63
content | 63
areaTitle | 57
layout | 52
noBulletContent | 50
sqBulletContent | 50
logo | 50
search | 42
main | 33
banner | 30
container | 29
Top | 26
sidebar | 26
nav | 23
wrapper | 23
+ | 23
left | 19
Table1 | 18
searchform | 17
right | 17
menu | 17
home | 16
Layer1 | 16
searchbox | 15
Copyright | 14
page | 13
topMenu | 12
AutoNumber1 | 12
Layer2 | 11
bottom | 11
submit | 11
Table2 | 11
Body | 10
Image1 | 10
news | 10
toptabs | 10
breadcrumbs | 10
ebGallery | 9
links | 9
menuitem | 9
topNav | 9
s | 9
Table3 | 9
id-10 | 9
id-11 | 9
all-m | 9
top-m | 9
id-8 | 9
visual | 9
l-m | 9
id-7 | 9
id-6 | 9
id-height | 9
id-b-1 | 9
id-b-2 | 9
id-b-3 | 9
id-b-4 | 9
id-b-5 | 9
id-9 | 9
Masthead | 9
q | 9
navBar | 9
globalNav | 9
Table_01 | 8
Image2 | 8
topmain | 8
topmenu-b | 8
gmo_foot | 8
gmo_logo | 8
main2 | 8
wrap | 8
orphus | 8
gmo_link | 7
Title | 7
Image4 | 7
password | 7
AutoNumber5 | 7
gmo_img | 7
gmo_ul | 7
tophban | 7
Table4 | 7
gmo_service | 7
current | 7
AutoNumber2 | 7
navigation | 7
podlogo | 7
binfo | 7
banbox | 7
email | 7
Form1 | 6
query | 6
Image5 | 6
searchsubmit | 6
about | 6
Image3 | 6
leftcol | 6
AutoNumber3 | 6
maincol | 6
jobs | 6
AutoNumber4 | 6
login | 6
gmo_copy | 6
pas-m | 6
contact | 5
_x0000_i1025 | 5
site-switch | 5
subhead | 5
navright | 5
intro | 5
inseguitore | 5
brdbott | 5
ROOF | 5
mainnav | 5
WTL_TAG | 5
udm | 5
Image6 | 5
name | 5
location | 5
headernav | 5
Company | 5
Products | 5
rightcol | 5
newsletter | 5
press | 5
navcontainer | 5
leftmenu | 5
CONTENTBODY | 5
calendar | 5
Table5 | 5

Table 4

Class values which appeared 5 or more times (a couple of syntax errors have been removed)
Value | Count
navigation | 131
title | 113
help | 80
footer | 73
content | 71
pipe | 71
noMargin | 69
prefill | 68
ebay | 68
novisited | 68
button | 60
standard | 58
section | 58
Menu | 57
text | 55
searchFormSection | 50
rightAnchor | 50
catTable | 50
metaCat | 50
seeAllLink | 50
seeAllBullet | 50
adSpacer | 50
boxSection | 50
nav | 43
ocDCP | 42
white | 39
Date | 39
body | 37
module | 31
CIPpromo | 30
small | 29
main | 28
header | 28
copyright | 26
tiny | 26
hdr | 24
clear | 23
headline | 23
link | 22
search | 22
links | 22
v | 20
style1 | 20
v1 | 19
topMenu | 19
left | 19
more | 19
smalltext | 18
prnav | 18
prred | 18
copy | 18
box | 18
hide | 18
trans | 17
news | 17
logo | 17
top | 16
right | 16
center | 16
item | 16
entry | 15
gray | 15
style2 | 15
spacer | 15
description | 15
bodytext | 14
head | 14
blue | 14
MsoNormal | 14
searchbox | 13
post | 13
posted | 12
block | 12
style3 | 12
yellow | 12
gallery | 12
home | 11
subtitle | 11
submit | 11
leftnav | 11
inputbox | 11
topnav | 11
author | 11
back | 10
option | 10
caption | 10
black | 10
searchinput | 10
border | 10
side | 10
selected | 10
tbTitle | 10
s2 | 10
bold | 10
icons | 10
helpblk | 10
ebcPic | 9
ebPicture | 9
visual | 9
topmenu-spacer | 9
c-right | 9
block1 | 9
s1 | 9
read | 9
block2 | 9
ebDetails | 9
ebPrc | 9
projectsTable | 9
bottomLine1 | 9
bottomRed | 9
bottomAboutText | 9
bottomAbout | 9
bottomLine2 | 9
block-text | 9
bottom-menu | 9
style4 | 9
green | 9
normal | 9
style5 | 9
submenu | 9
input | 9
navbar | 9
calendar | 9
heading | 9
foot | 9
red | 9
category | 9
blogbody | 9
divider | 9
cnnLarge | 9
cnnLeft | 9
bottomSearchTable | 8
bottomInput100 | 8
bottomSelect | 8
BottomWhite | 8
bottomAdv | 8
summary | 8
smaller | 8
cnnEndCell | 8
bottomDot | 8
bottomServices | 8
formbut | 8
related | 8
breadcrumb | 8
style7 | 8
orange | 8
navlinks | 8
nwslink | 8
leftmenu | 8
maintext | 8
txt | 8
secondary | 8
rub1 | 7
whitetext | 7
container | 7
cbox | 7
ta-c | 7
formtext | 7
blurb | 7
highlight | 7
mainmenu | 7
cal | 7
searchtext | 7
sidebar | 7
s | 7
bottom | 7
blog | 7
style | 7
powered | 7
footertext | 6
mini | 6
pagetitle | 6
imagealign | 6
grey | 6
row1 | 6
catlist | 6
ckCol | 6
binImg | 6
rubr | 6
none | 6
tm | 6
LastItem | 6
tagline | 6
searchform | 6
separator | 6
btn | 6
menu2 | 6
titre | 6
foot_alt | 6
module-content | 6
bannerAd | 6
smalllist | 6
subject | 6
sidetitle | 6
subhead | 6
alignleft | 6
alignright | 6
cnnData1 | 6
tabs | 5
icomtb | 5
last | 5
ContentBorder | 5
timestamp | 5
Byline | 5
TextAd | 5
Label | 5
default | 5
banner | 5
navtext | 5
udm | 5
hr | 5
pagenav | 5
style6 | 5
info | 5
bottomnav | 5
B | 5
alt | 5
entry-header | 5
entry-body | 5
entry-footer | 5
permalink | 5
TextBox | 5
nav3 | 5
text2 | 5
m | 5
rights | 5
current | 5
bot | 5
narrowcolumn | 5
postmetadata | 5
cnnHeader | 5
cnnNoBold | 5
dark | 5
clickPath | 5
formbutt | 5
pt-10 | 5
lnav | 5
footmsg | 5
navcolor | 5
f-11 | 5
navMainSections | 5
portletContent | 5
stitle | 5
mmsmall | 5
mastertable | 5
sidebarad | 5
cattitle | 5
f-gray | 5
ens | 5
toptab | 5
fivevert | 5
disclaimer | 5
disclaimerlink | 5
tenvert | 5

November 5, 2005