On the problems of information archiving

Linux Journal is running an article entitled Everybody’s Guide to OpenDocument. In explaining why it is important to store data in Open Standards, it refers to the story of scientists who couldn’t read the magnetic tapes from the 1976 Viking landings on Mars because the data was stored in an unknown format. And that data is only 30 years old.

While scholars can still read the 1086 tome, the digital version needs customized software and hardware that are breaking down from old age, meaning records from just 17 years ago are rapidly vanishing.

That quote is about the format in which data is stored, but the problem goes even further when you consider the media on which it is stored. Ask my customers how confident they are in their backups. One of them already has trouble reading his tapes from 5 years ago. We sold him a new server with a newer tape drive 3 years ago, and the old drive isn’t supported any more. Same story last month, with that same customer. He needed a new server.

Data is lost. Zero. Nada. /dev/null.

Face it, what can you do to be sure you still have a working tape drive to go with your old tapes? Maybe big enterprises can bear the cost of keeping spares, but when it comes down to it, they will also face the fact that the manufacturer dropped support. Small companies? Private individuals? If they’re lucky, they might find something on eBay. Right. That’s the way we will preserve our history.

My mother cleaned and tidied up my aunt’s house last year. It was fun to look through all the old photographs of my grandparents and relatives she found. She even made a dedicated ancestors’ album with all those pictures. Which will still be around in the coming decades. And at no significant cost. I don’t want to think about what’s going to be left of my albums in a couple of years. And at least I’m doing backups. And paying for hosting.

Archive information? Are you going to trust your stuff to CDs or DVDs? I’m not. They might last a couple of years, but that’s going to be it. The only way I’m confident of keeping my data right now is by keeping it on-line. On my file server. If something breaks, I’m going to know, and I (hopefully) will have a recent backup. How the hell am I supposed to know if my archived data is still safe? Testing each CD every 6 months? Recopying them every year? Keeping 2 copies? Do you want to trust your data to Google Photo Album?
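
For what it’s worth, the “testing each CD every 6 months” chore can at least be scripted. Below is a minimal sketch in Python, not a real backup tool: it records a SHA-256 checksum for every file in an archive directory, and later reports which files have gone missing or no longer match. The function names (`sha256_of`, `build_manifest`, `verify_manifest`) are made up for this example, and it assumes the archive is mounted as an ordinary directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return (missing, corrupted) lists of relative paths."""
    root = Path(root)
    missing, corrupted = [], []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            missing.append(rel_path)      # file is gone or unreadable as a file
        elif sha256_of(target) != expected:
            corrupted.append(rel_path)    # contents changed since the manifest
    return missing, corrupted
```

You would build the manifest once when the archive is written, keep a copy of it somewhere other than the disc itself (dumping the dictionary with `json.dump` would do), and rerun `verify_manifest` whenever you get nervous. It only tells you that something rotted, of course. It doesn’t bring the data back, so the second copy is still on you.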

As I said, I’m a geek thinking about this stuff. It’s my job. But every decent normal person I know sucks at backups. And actually, I can’t blame them. They never had to do that with plain old paper photo albums. Why should they now?

Because that’s progress?

Back to basics?