Napsterising Government Data

I spoke with an organisation the other day that holds 2 petabytes of digital image data. They expect that to grow to 20 petabytes over the next 5+ years.

When you’re dealing with that volume of data, any current delivery model is likely to lock you in for the long term: you’ll never move out of the data centre you’re in, and you’ll be forced to add ever greater bandwidth and infrastructure capability to deliver the information. You will, of course, also need an aggressive strategy for data whose access pattern falls away sharply: it’s read frequently shortly after creation, occasionally in the weeks that follow, and then almost certainly never again.
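To make that access pattern concrete, here is a minimal sketch of an age-based tiering rule. The post doesn’t give any cut-offs, so the windows, tier names and function below are assumptions made up purely for illustration:

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: the post only says "frequently shortly after
# creation, occasionally some weeks after, then almost never again".
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=180)

def storage_tier(created_at, now=None):
    """Pick a storage tier for an image based purely on its age."""
    now = now or datetime.utcnow()
    age = now - created_at
    if age <= HOT_WINDOW:
        return "hot"    # fast, expensive storage close to the users
    if age <= WARM_WINDOW:
        return "warm"   # cheaper disk, slower retrieval acceptable
    return "cold"       # archive-grade storage, almost never read again

print(storage_tier(datetime(2009, 1, 1)))  # -> 'cold'
```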

I propose that the data be split up and stored (in multiple copies) at every possible [government] network node, using a BitTorrent/Napster-like algorithm that breaks the data into small chunks, stores each chunk on several of those nodes, and reassembles it from those sources when it’s needed. It’s possible that the storage equipment in each node would need the algorithm coded directly into it.
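A minimal sketch of that idea, not any real BitTorrent or Napster implementation: split a blob into fixed-size chunks, place each chunk on several nodes, then pull the chunks back and reassemble. The node names, chunk size and replication factor are all assumptions for illustration:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024   # assumed 4 MB chunks
REPLICAS = 3                   # assumed copies of each chunk

def split(blob):
    return [blob[i:i + CHUNK_SIZE] for i in range(0, len(blob), CHUNK_SIZE)]

def place(index, chunk, nodes):
    """Choose REPLICAS distinct nodes for one chunk by hashing it."""
    digest = int(hashlib.sha256(str(index).encode() + chunk).hexdigest(), 16)
    start = digest % len(nodes)
    return [nodes[(start + r) % len(nodes)] for r in range(REPLICAS)]

def store(blob, nodes, storage):
    """storage maps node -> {chunk index: chunk}; returns the chunk count."""
    chunks = split(blob)
    for i, chunk in enumerate(chunks):
        for node in place(i, chunk, nodes):
            storage.setdefault(node, {})[i] = chunk
    return len(chunks)

def reassemble(chunk_count, nodes, storage):
    """Fetch each chunk from the first node that still holds it."""
    out = []
    for i in range(chunk_count):
        out.append(next(storage[n][i] for n in nodes
                        if n in storage and i in storage[n]))
    return b"".join(out)

# Store a stand-in image across five network nodes, lose one node
# entirely, and reassemble the image from the surviving copies.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
storage = {}
image = b"\x00" * (10 * 1024 * 1024)   # stand-in for an X-ray image
count = store(image, nodes, storage)
storage.pop("node-c", None)            # simulate losing a whole node
assert reassemble(count, nodes, storage) == image
```

The point of the sketch is the shape of the scheme: no single node holds the whole image, and losing any one node still leaves enough chunk copies elsewhere to rebuild it.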

Resilience is inherent, and performance improves because the load is spread. Sure, you need more raw storage than you originally had (but you needed 2x your storage to provide for DR in the first place), yet you can use a lower grade of storage to achieve the same performance and throughput.
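As a rough back-of-the-envelope illustration of that trade-off (the per-terabyte prices and the three-replica factor below are invented assumptions, not figures from the organisation):

```python
# Hypothetical comparison: 2 PB mirrored once for DR on high-grade storage,
# versus 2 PB chunked into three replicas on lower-grade commodity storage.
DATA_PB = 2
HIGH_GRADE_PER_TB = 1000   # assumed cost of high-grade storage, per TB
COMMODITY_PER_TB = 300     # assumed cost of commodity storage, per TB

mirrored_tb = DATA_PB * 1024 * 2      # primary copy + full DR copy
replicated_tb = DATA_PB * 1024 * 3    # three distributed replicas

print("Mirrored, high-grade :", mirrored_tb * HIGH_GRADE_PER_TB)
print("Replicated, commodity:", replicated_tb * COMMODITY_PER_TB)
```

Even though the replicated approach stores more bytes, the cheaper grade of storage can still make it the less expensive option overall.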

This should result in a lower overall cost for storing and managing the data and better performance for those recalling it. In this case, by the by, the data is X-ray images.
