I was going to save this for another day, but it’s been on my mind all afternoon. One of the guys at Sapient had the bright idea of feeding a huge number of government URLs into Google and seeing how many pages are spidered for each. Then he put it all in a spreadsheet and with a bit of tinkering from Dan, this picture came out.
What it shows (with the blue line) is the size of individual government websites (at least as far as Google is concerned, which may mean that some pages aren’t spidered or can’t be reached) … the biggest comes in at 113,000 pages (I so, so want to name names here, but I’m not going to – it’s not a site you would ever think would be that big) and there are 3 sites with just two pages (one of which is my own http://www.gateway.gov.uk – the rest of that site is hidden behind passwords and such). The drop-off in size is stunning – by the time you get to 50 sites, you’re down below 10,000 pages; at 100 sites, you’re around 5,000; at 200, 2,500; and at 500, you’re at fewer than 500 pages (467 to be exact). There are nearly 800 sites in the list, with only .mod, .nhs and the like excluded.
The purple line (at least it’s purple on my screen) shows the cumulative percentage of the total. The first 50 sites account for – guess how much? – 58.3% of it. The first 100 make up 71.1%. If Pareto was ever in doubt, he could have looked at this data for extra reassurance. There’s something like 2.6 million pages of content represented here.
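The cumulative line is simple to reproduce: sort the per-site page counts largest-first, keep a running total, and divide by the grand total at each step. Here’s a minimal sketch, assuming you already have the page counts pulled out of the spreadsheet – the example figures are illustrative, not the actual dataset behind the chart.

```python
def cumulative_share(page_counts):
    """Entry i is the fraction of all pages held by the i+1 largest sites."""
    counts = sorted(page_counts, reverse=True)
    total = sum(counts)
    shares = []
    running = 0
    for c in counts:
        running += c
        shares.append(running / total)
    return shares

# Hypothetical example: four sites of wildly uneven size
shares = cumulative_share([113000, 10000, 5000, 2000])
print(round(shares[0], 3))  # share of the total held by the single biggest site
```

Plot `shares` against rank and you get the purple curve; the steeper its start, the more lopsided the distribution.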
I did go and visit the biggest site – it’s nicely put together, with all kinds of good features. But two things knocked me out: to get to search you have to click through to another page, and when you do use it, it’s a very simple form without much scope to narrow your query. With a site that big, I’m surprised to see search relegated off the home page. That said, the navigation is nice – but I know from our own research and evidence that people use search more instinctively than navigation unless they’re familiar with your site. My take is that if we can make navigation ever more consistent across government, there’s a chance people will find things more easily because they won’t have to relearn as they go from site to site.
I’d always assumed government sites would be a tadpole – big head, long thin body – but this really stunned me. The next trick would be to map usage and cost data against each of the blue dots, or even the number of content authors, or uptime, or any of a range of data. There could be a whole PhD research project in this one slide alone. I don’t have much of that data and I doubt it even exists for such a wide range of sites, but it’s out there somewhere for the Top 50. I’m going to go hunting.