October 28, 2005
Best Practices in Web Development - results published
Those of you who came to WE05 or who listened to the podCasts might have seen/heard my presentation on a recent survey I did on how well major Australian sites are adhering to best practices in web development (valid HTML/XHTML, CSS, Semantic and Structural use of HTML, Accessibility).
I've just published the whole thing as an article, with all the results (what errors people are making, results for each site surveyed, results by sector).
It's available here, along with links to the original slides in PDF and the podCast.
Hope people might find it interesting/useful.
john
TrackBack
Listed below are links to weblogs that reference Best Practices in Web Development - results published:
» Best practices in Australian web development from 456 Berea Street
How well do major companies and government departments in Australia use web standards and other best practices on their websites? [Read More]
Tracked on Nov 20, 2005 12:59:40 AM
Comments
Excellent article there John. It was really interesting going through the stats!
Posted by: Amit Karmakar | Oct 28, 2005 3:41:42 PM
PHEW! You didn't include our site in your survey :)
Posted by: Nathanael | Oct 29, 2005 12:02:51 AM
Very interesting and useful material indeed.
I have been gathering material about this kind of survey for some time. Interestingly, there have been quite a number of them this year. I guess a list might be interesting.
* In February, I conducted a survey on 19,655 Estonian web pages - 436 (2.22%) were valid - http://www.triin.net/2005/04/27/Web_Standards_in_Estonia
* In April, Pete from standards-schmandards tested 1020 Swedish websites - 63 (6.6%) were valid - http://www.standards-schmandards.com/index.php?2005/04/10/18-massvalidate
* In May, Andrew Rickmann validated 99 of the top 105 companies in the UK - 11% were valid - http://www.rickmann-design.co.uk/perspective/2005/05/natural-uptake-of-new-technology.html
* In August, I repeated my survey on Estonian pages, this time 20,194 were tested - 440 (2.17%) were valid - http://www.triin.net/2005/09/18/Web_Standards_in_Estonia_vol_2
* On 8th of September, Molly E. Holzschlag published her results of validating 8 major search engines - only MSN validated - http://webstandards.org/buzz/archive/2005_09.html#a000564
* On 25th of September, Pete from standards-schmandards tested 546 USA government websites - 13 (2.4%) were valid - http://standards-schmandards.com/index.php?2005/10/03/27-government-web-standards-usage-usa
* On 28th of September, 1015 Swedish public sector websites were tested - 60 (5.9%) were valid - http://www.e-namnden.se/validering/050928/
* On 30th of September, you gave a presentation at WE05 about your tests on 83 Australian web sites - 9 (10.8%) were valid - http://westciv.com/style_master/house/good_oil/best_practices/
* On 26th of October, Miles Burke tested 30 Western Australian web development company websites - 8 (27%) were valid - http://www.port80.asn.au/forums/viewtopic.php?t=2233
(These are of course only the ones I have managed to find.)
These are all very interesting surveys, but the results vary a lot: from as little as 2.17% to as much as 27%. The methodology of all these investigations is practically the same (they all used the W3C validator), but the selections of web pages differ quite a lot.
If we are after a real number of valid webpages in the world, then all the small surveys are too error-prone. If there are an estimated 11 billion pages ( http://www.cs.uiowa.edu/~asignori/web-size/ ) on the internet, then looking at about 100 sites only answers the question "Are there more valid or invalid pages?" (Do not take this personally :-) )
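To put rough numbers on the sampling-error point, here is a minimal Python sketch that computes a 95% Wilson score interval for a few of the surveys listed above. It assumes each survey is a simple random sample of the pages it wants to describe, which is exactly the assumption questioned here, so the interval only reflects sample size and says nothing about how the selection was made. The function name and the choice of the Wilson interval are illustrative, not something the surveys themselves used.

```python
import math

def wilson_interval(valid, total, z=1.96):
    """Return the (low, high) Wilson score interval for the share of valid pages."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = valid / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

# (valid pages, pages tested) as reported in the list above
surveys = {
    "Estonia, August": (440, 20194),
    "Australia, WE05": (9, 83),
    "Western Australia": (8, 30),
}

for name, (valid, total) in surveys.items():
    low, high = wilson_interval(valid, total)
    print(f"{name}: {valid / total:.2%} valid, 95% interval {low:.2%} to {high:.2%}")
```

The interval for a survey of a few dozen sites comes out many times wider than the one for twenty thousand pages, which is the point being made about small samples.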
Actually, I consider even my own research (with thousands of pages) too weak and clumsy. The main question is - How do you get a really representative selection?
Many have chosen to use some sort of selection of "The Top Web Sites". In a way, this is the right decision, but there always remains one question - Where do you draw the line? When is a web site big, and when is it small? How many trees make a forest? On the other hand, the big ones are very often also the last ones, especially when it comes to adopting new kinds of technologies.
The complete opposite way is to go for some machine-generated selection of pages. This seems perfectly reasonable at first look, but let me tell you what happened with my research. I used a complete listing of all Estonian web servers (not to be confused with a listing of machines; a better term would be a listing of websites in the .ee domain running on a machine located in Estonia). The selection set seemed perfect, but when I started to look at the pages themselves - the pages that used valid (X)HTML according to my results - I started to change my mind. I discovered that many of these pages were very empty. Many of them had a title like "Under construction" or "Page not found" or "This web site is closed". In short, a lot of those pages (maybe even most of them) weren't valid because the designer was really into web standards. They were valid because the HTML on those pages was so short (e.g. mostly no tables were used) that they were valid unintentionally, by pure luck.
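One way to keep such pages out of a machine-generated selection is to flag them before validating. The sketch below is only an illustration of that idea: the phrase list is taken from the titles mentioned above, while the `min_text_chars` threshold and all names are hypothetical and would need tuning against real data.

```python
from html.parser import HTMLParser

# Title phrases mentioned above; a real list would need to be longer and multilingual.
PLACEHOLDER_TITLES = ("under construction", "page not found", "this web site is closed")

class _TitleAndText(HTMLParser):
    """Collects the page title and counts visible text characters."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.skip = 0          # depth inside <script>/<style>
        self.title = ""
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        elif not self.skip:
            self.text_chars += len(data.strip())

def looks_like_placeholder(html, min_text_chars=200):
    """Heuristic: placeholder title, or almost no visible text (threshold is an assumption)."""
    parser = _TitleAndText()
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title.split()).lower()
    if any(phrase in title for phrase in PLACEHOLDER_TITLES):
        return True
    return parser.text_chars < min_text_chars
```

Reporting validity separately for flagged and unflagged pages would show how much of the "valid" share comes from near-empty pages.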
Some time ago I accidentally spotted the master's thesis of Dagfinn Parnas from 2001 - "How to cope with incorrect HTML" http://elsewhat.com/thesis/
As a part of his thesis, he conducted a survey of 2.4 million pages and found that only 0.71% of them used valid HTML. What is especially interesting is where he got his huge selection from. He took the whole list of pages gathered by the volunteers in the Open Directory Project. This sort of methodology for choosing a selection clearly eliminates the problems I had, because the list is composed by humans and so does not include pages under construction and other similar oddities. Also, as the selection is so HUGE, it greatly reduces the chance of error. Plus, the list in ODP contains sites from all over the world, in all kinds of languages and from all kinds of places, so it overcomes the limitations of surveys conducted on one single country.
Of course, there is one "but" with the results of Dagfinn's research - it was done in 2001, and things have changed quite a bit since then. On the other hand, we are really lucky to have such wonderful material from the past in our hands - now it's just a matter of will to repeat the survey as similarly as possible, and by comparing the new results to the past ones, it becomes possible to answer "How much have things changed?" (and do it with numbers).
And this is exactly what I am planning to do in the near future.
But I'm not going to limit myself to the simple W3C validator attack. I also want to provide answers to many other questions, including:
* Which DTDs are used?
* Which character encodings are used?
* Which content-types are used?
* How many pages use CSS? JavaScript? Flash? ...
* What kind of HTML is used? (Most popular elements. Most popular attributes? How many images with empty ALT-text? How much inline CSS?...)
* What kind of CSS is used? (What kind of selectors? What kind of properties? What kind of values? What kind of units are used in CSS values? Which media types? How many external CSS files? Alternate stylesheets?)
* What kind of JavaScript is used? (How many inline event calls? How many external files?)
* What is the content-size/code-size relationship?
* How do sites measure on John Allsopp's scale? :)
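Several of the HTML-related questions in this list can be answered in a single pass over each page. The following is a rough sketch using Python's standard `html.parser`; the class and attribute names are made up for illustration, and treating every attribute that starts with `on` as an inline event handler is a simplifying assumption. It does not cover the CSS questions or the content-size/code-size ratio.

```python
from collections import Counter
from html.parser import HTMLParser

class PageStats(HTMLParser):
    """Gathers simple per-page counts: doctype, elements, attributes, alt text, inline code."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.elements = Counter()
        self.attributes = Counter()
        self.images_empty_alt = 0
        self.images_missing_alt = 0
        self.inline_style_attrs = 0
        self.inline_event_attrs = 0
        self.external_scripts = 0
        self.external_stylesheets = 0

    def handle_decl(self, decl):
        # decl is the text inside <! ... >, e.g. 'DOCTYPE html PUBLIC "..."'
        if decl.lower().startswith("doctype"):
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        self.elements[tag] += 1
        attr_map = dict(attrs)
        for name in attr_map:
            self.attributes[name] += 1
            if name == "style":
                self.inline_style_attrs += 1
            elif name.startswith("on"):  # assumption: onclick, onload, ...
                self.inline_event_attrs += 1
        if tag == "img":
            if "alt" not in attr_map:
                self.images_missing_alt += 1
            elif not (attr_map["alt"] or "").strip():
                self.images_empty_alt += 1
        elif tag == "script" and attr_map.get("src"):
            self.external_scripts += 1
        elif tag == "link" and "stylesheet" in (attr_map.get("rel") or "").lower().split():
            self.external_stylesheets += 1

def collect_stats(html):
    stats = PageStats()
    stats.feed(html)
    stats.close()
    return stats
```

Summing the `elements` and `attributes` counters across all pages would give the "most popular elements" and "most popular attributes" tables.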
Well... actually I wasn't planning to announce this on your website, it just happened this way.
Feel free to send me any comments regarding all this and beyond...
Posted by: Rene Saarsoo | Nov 1, 2005 2:06:52 AM
> Very interesting and useful material indeed.
Thanks very much, Rene.
> These are all very interesting surveys, but the results vary a lot: from as little as 2.17% to as much as 27%. The methodology of all these investigations is practically the same (they all used the W3C validator), but the selections of web pages differ quite a lot.
That's why I wanted to broaden the research from just "is the HTML valid" to other areas of best practice. I also wanted to do it by hand because it would allow me to
* see what errors people actually are making
* refine the original assumptions I made about what best practices are, and how to measure their use
> If we are after a real number of valid webpages in the world, then all the small surveys are too error-prone. If there are an estimated 11 billion pages ( http://www.cs.uiowa.edu/~asignori/web-size/ ) on the internet, then looking at about 100 sites only answers the question "Are there more valid or invalid pages?" (Do not take this personally :-) )
I agree. My goals were twofold, and not hugely ambitious:
* to see just how well major Australian sites were doing
* to develop a reusable, largely objective methodology to compare sites in terms of their use of best practices
> Many have chosen to use some sort of selection of "The Top Web Sites". In a way, this is the right decision, but there always remains one question - Where do you draw the line? When is a web site big, and when is it small? How many trees make a forest? On the other hand, the big ones are very often also the last ones, especially when it comes to adopting new kinds of technologies.
My criteria are outlined in the original article. If doing this by hand, it's extremely time consuming - probably around 15 minutes per site at least on average to do a basic survey. So 100 sites is 25 hours' work. That's a lot of time :-)
In future, what I did could largely be automated. The hard part is probably deciding whether tables are being used for layout. That might be hard to do, but the rest is pretty straightforward.
> But I'm not going to limit myself to the simple W3C validator attack. I also want to provide answers to many other questions, including:
I think that is the right approach. "Is it valid?" is no longer in itself a particularly interesting question. "Where is it invalid?" is interesting, as my results show. The same dozen or so errors are made over and over.
> * Which DTDs are used?
> * Which character encodings are used?
> * Which content-types are used?
> * How many pages use CSS? JavaScript? Flash? ...
> * What kind of HTML is used? (Most popular elements. Most popular attributes? How many images with empty ALT-text? How much inline CSS?...)
> * What kind of CSS is used? (What kind of selectors? What kind of properties? What kind of values? What kind of units are used in CSS values? Which media types? How many external CSS files? Alternate stylesheets?)
> * What kind of JavaScript is used? (How many inline event calls? How many external files?)
> * What is the content-size/code-size relationship?
> * How do sites measure on John Allsopp's scale? :)
These look very interesting.
I'd add:
* what HTML errors are being made - and a frequency count of these
* what CSS errors are being made
* what accessibility errors are being made
* how semantic the HTML is
but of course I would say that, as these are the things I looked for in my survey.
> Well... actually I wasn't planning to announce this on your website, it just happened this way.
Excellent, many thanks for that. Please keep us posted,
john
Posted by: John Allsopp | Nov 1, 2005 11:58:19 AM
> The hard part is probably deciding whether tables are being used for layout. That might be hard to do, but the rest is pretty straightforward.
Actually, I think it's not particularly hard to machine-test. If we look at the majority of pages, then we might just say that when there is a table on the front page of a site, then 99% of the time it is there for layout. Very few sites really present tabular data at their very first page.
But of course there are sites that do, e.g. a TV channel website with today's programme. But when we look at the source code of a typical table-based site, we see something like this:
table border="0" cellpadding="0" cellspacing="0" width="728"
This code definitely identifies a table-based layout. Any site with a CSS layout would be using CSS to style a table with tabular data.
Another very common thing for table-based sites is using nested tables.
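The two signals described here, presentational attributes on the table tag and tables nested inside other tables, are easy to count mechanically. The sketch below is one possible way to do it with Python's standard `html.parser`; the names are illustrative, and which attributes count as "presentational" is an assumption drawn from the example above. A data table can carry the same attributes, so these counts are hints rather than proof.

```python
from html.parser import HTMLParser

# Attributes from the example above; treating them as a layout signal is a heuristic.
PRESENTATIONAL_ATTRS = {"border", "cellpadding", "cellspacing", "width"}

class LayoutTableDetector(HTMLParser):
    """Counts tables, nested tables, and tables with presentational attributes."""

    def __init__(self):
        super().__init__()
        self.depth = 0                  # how many <table> elements are currently open
        self.tables = 0
        self.nested_tables = 0
        self.presentational_tables = 0

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        self.tables += 1
        if self.depth > 0:
            self.nested_tables += 1
        if {name for name, _ in attrs} & PRESENTATIONAL_ATTRS:
            self.presentational_tables += 1
        self.depth += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1

def layout_table_signals(html):
    detector = LayoutTableDetector()
    detector.feed(html)
    detector.close()
    return {
        "tables": detector.tables,
        "nested_tables": detector.nested_tables,
        "presentational_tables": detector.presentational_tables,
    }
```

A page could then be classed as "probably table-based layout" when either of the last two counts is non-zero, with a hand check on a sample to see how often that guess is wrong.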
The biggest problem is how to measure accessibility. A program can only tell you when a web site fails a certain accessibility-oriented test. But it can never tell whether a page is accessible. For example, we can check how many images have alt-text, but we can't tell if the alt-text is meaningful. And so we can't even say that a site with alt-text is more accessible than a site without.
img src="a_company_logo.jpg" alt="image"
This is the reason why I didn't include accessibility testing in my todo-list. (Well, maybe I will do some accessibility testing after all, but I don't consider it a high priority at the moment.)
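The limit of automated alt-text checking can be made concrete with a small sketch. It sorts images into four buckets; the last bucket, alt text that is present and not an obvious placeholder, is where a program has to stop and a human has to judge. The placeholder word list is hypothetical (only "image" comes from the example above), as are all the names.

```python
from html.parser import HTMLParser

# Hypothetical list of alt values that carry no information.
PLACEHOLDER_ALTS = {"image", "img", "picture", "photo", "graphic"}

class AltTextAudit(HTMLParser):
    """Classifies <img> elements by what can be said mechanically about their alt text."""

    def __init__(self):
        super().__init__()
        self.missing = 0       # no alt attribute at all
        self.empty = 0         # alt="" (may be correct for decorative images)
        self.placeholder = 0   # alt is a known filler word
        self.unverified = 0    # alt present; meaningfulness needs a human

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        if "alt" not in attr_map:
            self.missing += 1
            return
        alt = (attr_map["alt"] or "").strip().lower()
        if not alt:
            self.empty += 1
        elif alt in PLACEHOLDER_ALTS:
            self.placeholder += 1
        else:
            self.unverified += 1

def audit_alt_text(html):
    audit = AltTextAudit()
    audit.feed(html)
    audit.close()
    return audit
```

Only the `missing` and `placeholder` counts are clear failures; everything in `unverified` would still need the kind of human review discussed here.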
I'll probably make a post to WSG, when I have made some more preparations and compiled a more extensive todo-list... stay tuned :)
But for anyone who would like to do some massive accessibility testing, I have an idea to offer:
1) It's clear that no automated testing can really prove whether a site is accessible.
2) It's also clear that no human being can survive seeing the "Cynthia Says" report more than 83 times in a row :)
But what if you take 100 accessibility-aware volunteers from the WSG list and each of them measures the accessibility of about 10 sites?
With careful planning and organization, this could be something that would make even Jakob Nielsen fall off his chair...
Posted by: Rene Saarsoo | Nov 1, 2005 11:01:00 PM
Rene,
> Very few sites really present tabular data at their very first page
Actually of the 83 I survived, I think none did. I think your suggestion is good, but not perfect. I would suggest that nested tables almost certainly indicate tables for layout - as well as colspans and rowspans. I am sure it would be doable.
> table border="0" cellpadding="0" cellspacing="0" width="728"
Actually, something close to that is in the article for the tables of results :-)
As I mention in the original presentation, with accessibility, I was not concerned to test whether a site is accessible, but rather whether it is following the basic priority 1 and 2 WCA Guidelines. It's a basic benchmarking thing. As soon as you get people involved, unless the methodology is objective, you get inconsistencies; that's why I have tried to be as objective as possible.
j
Posted by: john Allsopp | Nov 3, 2005 10:17:56 AM
Rene,
> img src="a_company_logo.jpg" alt="image"
The last time I used Cynthia Says, I seem to remember that it checked whether alts contained only the word "image" and failed the page if so. Perhaps you chose the wrong example in this case.
Posted by: Mike WS | Nov 9, 2005 8:13:27 AM
Excellent blog. Everyone can get lots of information on any topic from this blog. Nice work, keep it up.
Posted by: Custom dissertation | Aug 7, 2009 9:36:25 PM
yes, there have been many aspects written for web developer, they can get it & I'm impressed with this article
Posted by: web 2.0 development company | Nov 8, 2009 12:09:11 AM
Hi,
I personally like your post; you have shared good insights and experiences. It is such an interesting post.
Posted by: Professional Business Plan | Nov 19, 2009 9:57:31 PM
Well, it's so good to see this information in your post. I was looking for the same thing, but there was not any proper resource. Thanks, now I have what I was looking for for my research.
Posted by: Dissertation Proposal | Dec 29, 2009 11:04:23 PM
I've been searching about Web Development and reading your blog, and I found your post very helpful :) . I thought I would leave my first comment. I don't know what to say except that I have enjoyed reading. Nice blog!
Posted by: Affordable Website | Jan 28, 2010 9:22:13 PM
Many articles read, your article is very useful, occasionally there are so few pictures looks very interesting ~ is also very cute, so I learned a lot. Thanks!
Posted by: mbt shoes sale | Mar 22, 2010 2:49:49 PM
The source under the title "Best Practices in Web Development - results published" is a great source for those who are looking into the above subject. In my opinion, the source is valuable and informative.
Posted by: Custom Essay | Jun 27, 2010 4:22:07 AM
This is a very good idea! Just want to say thank you for the information, you have to share. Just continue to write such a position. I will be your faithful reader. Thank you again.
Posted by: fivefingers | Sep 9, 2010 11:22:58 AM
What web development tools are there that allow you to develop web content with flashy graphics and animation?
Posted by: orange county web development | Nov 25, 2010 9:35:12 PM
This is my first visit to your blog. We are starting a new initiative in the same category of this blog. Your blog gave us important information to act. You did a great job.
Posted by: bbq sauce club | Nov 29, 2010 7:20:08 PM
We are starting a new initiative in the same category of this blog.
Posted by: Fox Hats | Mar 18, 2011 2:49:38 PM
Hello,
You have done very good work. There are many people looking for this, and now they will find enough sources thanks to your advice.
Looking forward to more advice about this.
Posted by: Careprost | Mar 25, 2011 5:29:09 PM
Hey, Great post **
Posted by: ファロム | Apr 30, 2011 9:26:28 PM
This is really interesting, You're a very skilled blogger. I've joined your feed and look forward to seeking more of your wonderful post. Also, I have shared your site in my social networks!
Posted by: Mobile Computing | May 20, 2011 11:16:57 PM
Wow, Great post,Nice work, I would like to read your blog every day Thanks
Posted by: Seo Services India | Jun 25, 2011 10:09:39 PM
I think you are right when you say this. Hats off man, what a superlative knowledge you have on this subject…hope to see more work of yours.
Posted by: Generic Viagra | Jun 27, 2011 9:44:00 PM
Thank you for sharing.
Posted by: escort bayan | Jul 29, 2011 10:11:39 AM