Loss is not where you find it
Blue Screen of Death: the result of data loss? (CC image from Flickr user Justin Marty)
There is a plot: system complexity conspiring to make data inaccessible. It was no coincidence, I am sure, that my first complete disc failure in 17 years came within two months of the conversion of my laptop's hard drive to full encryption. Lost laptops and compromised personal details are a national problem. The contents of my own laptop would bore anyone else silly, but I'm sure there are all sorts of laptops carrying private and confidential details that deserve full protection.
I just hope that the new encryption systems that now sit on millions of UK hard drives do indeed give protection, because compromise of data is only one risk. Few of us have data whose exposure would compromise national security or even embarrass our employers, but all of us have data that we would hate to lose - as I found out at Denver airport when I was about to start two weeks' work in the USA, and had nothing to work with.
I also hope that the UK's major (and not so major) IT departments are collecting statistics on computer failures in which encryption is implicated. I know that in my case they did not: my efforts to diagnose the problem while 8000 miles from base led to various changes, so my dead laptop is logged as 'system rebuild required', not as 'death by encryption'. Only through statistics can we understand the incidence and types of failure, and thereby the real risks posed by digital technology. Without knowledge of the risks, we can only speculate about where to place our collective efforts - and budgets - in the general area of 'digital preservation'.
Of course, my dead laptop is but one data point (as, for that matter, am I). But I have more: several times recently I've come across major examples of 'loss of data' - and, as with my hard drive, it wasn't the data itself that was lost; it was the complexity above the data that got its knickers twisted and so ceased to function.
- The has a panel, , that is looking at asset management systems. A senior engineer from one of the first such systems to specialise in broadcasting came to Geneva in July to talk about how a broadcaster 'going tapeless' should go about moving into digital asset management. He mentioned entire collections of online content disappearing owing to corruption of the database - because an asset management system is 'just' a lot of files, plus a database with information about those files. He'd personally experienced that situation twice, and in each case there were backups to revert to, to rebuild the collection and get back into business - after up to two weeks of travail.
- Our BBC collaborative project held a workshop at the beginning of October, where we asked for examples of loss (because I'm trying to collect evidence in order to establish risk - that's what I do). Again, a fully competent IT company working with a major Spanish broadcaster described corruption of an asset management system's database; in that case 80% of the material was recovered from backups (taking several days), and 20% had to be re-ingested, which took several weeks (part time).
- All of which should remind us of the BBC's online picture store Elvis, which crashed some years ago, again essentially through database corruption, compounded by backups that largely failed to work. There was one very effective backup - Lisa, daughter of Elvis - but it held the video-quality scans, not the full-quality scans needed by Radio Times and all the other BBC print publications that Elvis also supports. Something like 100,000 high-resolution images were lost.
There is a common thread so far: the bits still exist, unaltered, on storage media, but the complexity sitting between the user and the bits has 'ceased to be' in some fashion, and so the whole thing is a dead parrot (and gets called a storage failure, though it is anything but).
This thread leads to an even greater problem: systems that haven't crashed, but still won't find things because they are in some way inadequate. We're all now aware of metadata and its purposes, but - just as with data itself - there has to be effective technology using the metadata, or again the result is a 'digital dead parrot'.
You may not know that I'm a prize-winning poet. I was somewhat surprised to learn this myself, but my entry into a competition in Ariel, the BBC's in-house paper, won second prize, and they were so pleased that they asked me to make a podcast. Like many print publications, Ariel also has an online version, where it puts extras such as my podcast. The problem is that the online version has no search engine, or indeed ANY other search technology. There's a list of recent or popular pages, but once content falls off that list, it becomes as inaccessible as the data on my dead hard drive: still there, but unreachable. The PDFs of the print version are indexed by a BBC search engine, but the online pages are not. The result is an inaccessible poem: even the person who posted my podcast can no longer find it!
An internal BBC publication is a tiny issue compared to bbc.co.uk itself, the BBC's world-class media website. BBC policy is to hold the text from bbc.co.uk in a sort-of archive, but reasons of space/budget/complexity mean that the audio and video content on bbc.co.uk is not archived. The justification is that all that audio and video goes out on radio and TV, and so gets archived separately. Has the validity of that statement been checked? How much audiovisual content is NOT also broadcast? I wish I knew! The business case to build a real archive (something with comprehensive capture and access) was chopped and chopped until it was reduced entirely to a system meeting the 90-day legal requirement, with just a couple of access points. Meanwhile, anybody who wants to see BBC content that has been taken down from bbc.co.uk has to go to , where they do monthly (or thereabouts) scans of the entire internet, and make it available to all through their .
So there are half a dozen examples, ranging from my laptop to bbc.co.uk, where data can no longer be found because, essentially, of failure or inadequacy of the system sitting between the user and the data. The robust solution to such failure is to simplify that technology layer - and unfortunately IT systems are moving in the opposite direction. I fully expect an epidemic of data loss as a direct consequence of the mass installation of encryption on company hard drives. I hope I'm wrong.
Comment number 1.
At 3rd Dec 2009, dennisjunior1 wrote: Richard Wright:
That is sad news, that your computer failed you... At least... your loss is not in vain...
~Dennis Junior~
Comment number 2.
At 3rd Dec 2009, Mo McRoberts wrote: “How much audiovisual content is NOT also broadcast? I wish I knew!”
Interesting question. It’s a figure that’s only going to increase in time as “multiplatform” efforts increase!
It doesn’t help, of course, that unlike with “lost episodes” of TV series, for which people may have VHS tapes (or similar) lying around, the BBC has a policy† of not permitting downloads of any of the content served via EMP, so if the BBC doesn’t archive it, nobody else can either.
† Yes, I know why. Doesn’t mean it’s sensible in real terms.
Comment number 3.
At 5th Dec 2009, Jerry Kramskoy wrote: Do you have a feel for whether the issues lie in hardware failure (e.g. bit drop-out) in the persistently stored structures associated with the intermediate layers that lead to the actual content, or whether the problem lies in the implementation of the intermediate layers, leading to corrupted structures being stored in the first place?
Comment number 4.
At 7th Feb 2010, Richard Wright wrote: Jerry - I think none of the cases I mention are hardware; they're either software getting too complex for its own good, or bad human decisions (policies). That really was the point: we tend to focus on storage and storage-management reliability, but all the cases I've recently encountered were at a so-called higher level:
- failure of data encryption, locking up the PC; probably failure at the level of interaction between the encryption and the OS; nothing to do with storage, but it made the disc unusable!
- 3 sorts of database corruption; I guess this counts as 'intermediate layers', though I think of it as just big applications that are imperfect
- storing content where search doesn't go, and not saving the URL
- not saving material in the first place: updating websites so that previous versions just get lost.