
January 26, 2006

Google web authoring statistics

A couple of months back I published Real World Semantics, in which I used a small custom crawler to investigate how class and id values are used in the real world, to get a sense of just how semantic (or otherwise) the HTML out there is. I slanted the crawl toward sites that were "close" to (within a few link steps of) major standards-based sites such as zeldman.com and stopdesign.com.

Now Google have published the results of a major analysis of web authoring across a billion pages (and I'm honored they mentioned my piece in their preamble). It goes far beyond the use of class and id, covering headers, scripting, elements, and even editors.

Fascinating stuff!






Great to see you being acknowledged (by Google no less). Very good view of web authoring to date.

Posted by: Benson | Jan 27, 2006 10:53:46 AM

thanks Benson, I was very chuffed.


Posted by: john Allsopp | Jan 27, 2006 11:00:40 AM

This really was fascinating, especially comparing the graph of most-used elements with my own findings.

But I was a bit surprised that they used SVG for the figures; I had to get the latest Firefox to view them. I actually tried downloading the SVG files and viewing them in a separate program, but none of the programs I had could display them (!), which is strange, because those figures weren't what you would call "hardcore vector graphics". But well... I'm by no means an SVG expert...

BTW, Google asked: "If someone can explain why so many pages would use a {table} tag and then not put any cells in it, please let us know."

I recently saw a site that used only a table element, like this:

{table align="center" width="800"}

I think they're just using the table to center the content, and it seems to work (even without tr and td). Admittedly, it's a lot simpler than using CSS.

Posted by: Rene Saarsoo | Jan 28, 2006 12:25:09 AM


not sure how that is simpler than

table {text-align: center}

especially as the latter will then work on any page that is linked to the style sheet. I suspect it's largely ignorance (in the nicest possible sense) that keeps people using presentational HTML.


Posted by: john Allsopp | Jan 30, 2006 10:18:24 AM

It would have been nice if Google had provided the actual stats for the editors used, rather than talking about custom markup tags. They probably didn't because of a certain market leader they don't really want to praise publicly (they won't be caught inadvertently saying: hey, my chief competitor's tools create 85% of the world's web pages...).

Posted by: Luis Alberto Barandiaran | Feb 7, 2006 3:19:30 AM


table {text-align:center} is not the same as a table with the attribute align="center".

The first aligns the text, but the second centers the container.

Posted by: Rene Saarsoo | Feb 16, 2006 1:05:04 AM
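For reference: in CSS, the usual counterpart of align="center" on a table is to give the table automatic left and right margins, which centers the table box itself rather than the text inside its cells. A minimal sketch, assuming a standards-mode page; the bare table selector and the 800px width (carried over from the align="center" width="800" example in the thread) are illustrative only:

```css
/* Center the table box itself, like align="center" does */
table {
  margin-left: auto;
  margin-right: auto;
  width: 800px; /* counterpart of the width="800" attribute */
}
```

By contrast, text-align: center on the table only centers inline content (the text) within its cells.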