How do you future proof your digitally archived documents?

by Lee Koo (ADMIN) CNET staff/forum admin / April 24, 2009 6:43 AM PDT

Hello, over the years I have accumulated a lot of data in
various forms. I have written documents, my own and those
sent to me by others; I have photos in many different file
formats; I have videos in several formats and I have
converted tapes to digital; finally, I have a large
collection of audio recordings. You might also say that I
have a large software collection--it begins with DOS 1.0 for
the OS family and MS Word 1.0 for the word processing family.
My question relates to archiving all of this material. What
format(s) should I use for each to future proof my
collection? I may need to access documents 10 or 15 years
from now and who knows if the future word processors will be
able to open today's .doc files. Will PDF always be
universal? Photos, video, and audio all have efficient
compressed formats, but which ones will survive 50 years from
now? What do we need to do today so that future generations
are not left with a useless alphabet soup of file formats?

--Submitted by C. Philip C., M.D., J.D.

Here are some featured member answers to get you started, but
please read all the advice and suggestions that our
members have contributed to this question.

Stick to standards --Submitted by rbsjrx

The Future of your Data --Submitted by waytron

Several Solutions --Submitted by Flatworm

File formats.. Here today gone tomorrow. --Submitted by hawk318

Damn Good Question --Submitted by Artist3d

Simple Solution --Submitted by friarchuck

Future proof? Men plan, God laughs... --Submitted by Watzman

Hard to be certain, but some formats safer than others. --Submitted by BigGuns149

If you have any additional recommendation or advice for Philip, please click on the reply link and post away. Please be as detailed as possible in your answer to him. Thanks!
First off . . .
by Coryphaeus / April 24, 2009 10:46 AM PDT

No one can tell us what file formats will be available in any future. All we can do is save what we have. For longevity, CDs, DVDs, and Blu-ray are the current media of choice. I say this because even magnetic data such as tape and hard drives have a shelf life. And tape can break, hard drives can crash, and the sun may rise in the west.

What storage device is in the future? One little item is the four quantum states of a special isotope of I believe cesium (don't quote me on the material). It exhibits four states at the molecular level. Storage of unheard of capacity because each molecule can store four bits.

Save what you have to optical media. In any future, if a file format changes, create new files in the new format. If it's available.

I know you're concerned, but you've asked a question no one can answer with any reliability.

The question was about data formats, not media
by rbsjrx / April 24, 2009 12:59 PM PDT
In reply to: First off . . .

All of the points about storage media are well taken, but they don't address the original question. I have document files that, over the years, have migrated from Apple II 143k floppies to Zip disks to Windows and Linux hard drives. Keeping them is simply a matter of transferring them as the older format becomes obsolescent or the media start to evince the effects of oxidation, moisture, etc. However, some of these old files are pretty much useless because the software to make sense of them is no longer available. This is what the original questioner asked about.

Agree on Proprietary vs. Standards
by Hforman / May 1, 2009 3:27 PM PDT
In reply to: First off . . .

There are issues with both proprietary solutions and standards. The fact is, BOTH change over time. Microsoft is itching to do away with anything written by Word 97, which my former worksite did all of their documentation in. Open Document format will probably change as well, especially since many types of data storage (formats) will need to be tweaked as newer concepts and technology come about.

What can you do? Maintain your data in whatever format pleases you but you need to periodically review your data to make sure the format is current. Do a conversion now and then to a newer format. Automate the process if you can. I know we are not supposed to be talking about media here in this discussion as we went over that in great detail prior to this. We did mention that nothing lasts forever. Try to read an 8" floppy disk today! So, if you are going to move your data from one medium to another, it might be a good time to upgrade the format as well, just to be safe.

everything has a shelf life - even discs
by gbswales1 / May 1, 2009 9:18 PM PDT
In reply to: First off . . .

It is well known that discs of all kinds also have shelf lives, just as paper photographs, films, videos and the like do. I have had compact flash cards fail after a time, so I guess that nothing is 100% future proof. I think that the storage medium is a far bigger consideration than the file format. There will in my view always be some way to recover files created in a mainstream product even if the company has gone under; e.g., most word processors can read legacy versions and even files created in other software.

There simply is no such thing as future proof - the only answers are multiple copies stored in multiple locations on multiple media formats. It would also be a good idea to maybe archive everything to at least one brand new hard drive which is then carefully stored without further use in an air tight temperature controlled environment.
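Keeping multiple copies only helps if you can tell when one of them has silently gone bad. A rough sketch of one way to check, assuming Python 3.9 or later and that each copy is an ordinary folder tree; the function names here are made up for illustration:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(original: Path, copy: Path) -> list[str]:
    """List relative paths that are missing from one copy or differ in content."""
    a, b = build_manifest(original), build_manifest(copy)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Running compare_copies on two copies of the same archive should return an empty list; anything it does return is a file worth restoring from whichever copy is still good.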

To make doubly sure, store legacy hardware as well - computer and monitor with the data so that future generations can blow the dust off and get the old "toy" out. With netbooks and the like becoming really cheap this is not so preposterous as it sounds - don't forget to leave the instruction manual in case the concept of keyboards and 2 monitors have long since died! Mind you, better explain about electricity, power plugs etc and pray that hasn't been completely replaced with some kind of energy in the air!

Future Proof? little thing called MTBF
by mikefxlee / May 1, 2009 11:23 PM PDT

Uh, mean time between failures. Much lower when one drinks mountain dew and is clumsy with liquids.

There is one method that has survived the test of time, but we would need to relearn much.

It's called oral tradition. Before you go off on the name calling, please look at this

Give it some time to sink in. I mean be the box and think outside yourself for a change.

Then go ahead and call me anything you want (except Roger).
You can make your DVDs last a thousand years, but in that time who remembers how to build a player?

a half bubble short of level
by cesareDH / May 2, 2009 11:38 AM PDT

you need to get you some help

Did you read the link?
by mikefxlee / May 2, 2009 1:40 PM PDT

It demonstrates a method of data compression, decompression, and integrity checking. And that is not my quote. When you apply information theory, and calculate the amount of data transmitted, factoring in that it has already been transformed into a second order context aware structure - as an inherent characteristic. I'm not going to do the math again. No digital standard includes context awareness. You may see a picture of a chair but have no idea what the object is or was used for.

What the context awareness means is you could carve lines into a rock and with the context it might be 500GB of data. So, by handing down the context via oral tradition, there is no concern for formats or standards. The context inherently has that decoding information. Like a Clave rhythm is an internal time synch, this is an internal second order transformation topographical space matrix.

Such is objectively measurable with an oscilloscope and dual FFT audio measurement instruments from Tektronix. When the context factor is included, the quantity of data is very high. Do the math yourself.

As I do know what a golden ratio base number system implies when AND'd to a ternary and binary system, as well as the multiplication and addition relationships involved with the natural logarithm base e. Which should explain it mathematically.

Almost, AND is not the correct term. it's a much more obscure mathematical concept used when mixing different base number systems.

There are published algorithms to calculate such values.

worry about the storage media, not so much the file format
by luminova / May 3, 2009 4:52 PM PDT

I will have to agree with what gbswales1 said.

i would say, for most digital formats, there will always be some sort of backwards compatibility, or at least a special conversion software for that purpose, but if you really worry, you can always progressively update them anyways to be safe. time is on your side for file format compatibility; you can open early video/audio/document files just as easy as the newest formats (perhaps even easier, since they are more universal).

the best way to preserve something is to diversify file formats and storage media. print a copy, burn it to CD, copy it to flash drive, etc. but i think the best is to simply transfer it to contemporary storage media as they become available, to keep them accessible.

i strongly advise AGAINST using media such as CDs, flash drives, etc. these are good for securing your data against disaster, but NOT against time. those things will be obsolete and unsupported within a couple decades, if not sooner. how many of you have a record player? exactly. computers don't even COME with floppy drives anymore. it is best advised to keep transferring them to the newest forms of storage media.

Just so you're aware...
by rmazzeo / May 2, 2009 10:01 PM PDT
In reply to: First off . . .

...optical media has a shelf life also, the longevity of which is in dispute. Optical media can also get scratched up unless stored carefully. External hard drives are much more reliable in my estimation.

Future proof media
by thatJC / April 24, 2009 12:33 PM PDT

All the possible file formats are not the only problem. What media will survive for more than a few years? CDs and DVDs are not nearly as long-lasting as many people believe.

Mass-produced music CDs are stamped from an original with physical pits to diffuse the reading laser beam away from the normal mirror surface - creating the digital ones and zeros. Pretty robust.

Blank consumer CDs/DVDs use a low power laser to heat up a dye on the disk surface. These dyes can fade in even a couple of years, under some conditions. Before choosing an archive format do your homework!

Stick to standards
by rbsjrx / April 24, 2009 12:52 PM PDT

The answer is different for different types of documents. In general, though, use the simplest and most popular standards-based format that you can.

For text, use ASCII if possible. If you need formatted text, use PDF which can be readily converted to PostScript. If you use special fonts, be sure and archive them as well as your document. Once you get it to PostScript, it can be manipulated and printed with any number of standard tools, e.g. LaTeX, which have been around for decades already and are likely to still be around in decades to come.
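If you want to confirm that a text file really is plain ASCII before archiving it, a small check like the following may help. This is only a sketch, assuming Python 3.9 or later; the function name and the limit parameter are hypothetical:

```python
from pathlib import Path


def non_ascii_positions(path: Path, limit: int = 10) -> list[int]:
    """Return byte offsets of the first few non-ASCII bytes in a file.

    An empty list means every byte is in the 7-bit ASCII range.
    """
    data = path.read_bytes()
    return [i for i, byte in enumerate(data) if byte > 0x7F][:limit]
```

A non-empty result usually points at curly quotes, accented letters, or other characters that depend on knowing the file's encoding to read correctly later.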

For images, stick to RAW or TIFF if you can afford the storage space (RAW is better since TIFF is, technically, a proprietary Adobe format). If not, stick to free, non-proprietary compressed standards such as JPEG or PNG. Avoid GIF unless you need animation - it's technically a proprietary format, and limited to 256 colors. BMP is probably OK, but its close association with Microsoft suggests there may be better alternatives.

For video, use MPEG, which is another free, non-proprietary standard.

The same rationale applies to music, where the safest alternatives are either WAV, if you can afford the storage space, or MP3 (part of the MPEG standard) if you need compression.

This brings me to the last point - compressed storage. Data compression is a wonderful thing for archival storage, but you need to avoid niche or proprietary tools. I use RAR all the time on my Windows machines, but whenever I need something to be archived forever, I stick with ZIP which is free, standardized, and universally available. On my Linux machines, I'll use either zip or gzip.
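As a rough illustration of the ZIP route, here is a sketch using the zipfile module from Python's standard library. It assumes Python 3.9 or later and that the ZIP file is written somewhere outside the folder being archived; the function names are invented for this example:

```python
import zipfile
from pathlib import Path


def archive_folder(folder: Path, zip_path: Path) -> int:
    """Pack every file under folder into a deflate-compressed ZIP.

    Returns the number of files written. zip_path should live outside
    folder, otherwise the archive would try to include itself.
    """
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder so the archive is portable.
                zf.write(path, arcname=path.relative_to(folder).as_posix())
                count += 1
    return count


def verify_archive(zip_path: Path) -> bool:
    """Return True if every member of the ZIP passes its CRC check."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.testzip() is None
```

Calling verify_archive right after creating the archive, and again whenever it is copied to new media, gives some assurance that the compressed data still reads back cleanly.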

BTW...
by rbsjrx / April 24, 2009 1:13 PM PDT
In reply to: Stick to standards

A subsequent post compels me to add an explicit caveat. Regardless of what type of computer or what software you use, when I say to avoid proprietary formats, that especially includes _any_ Microsoft formats. Even if you use MS Office, there are options to save documents in formats other than Microsoft's native formats. Some are already on the "Save as" or "Export" menus. Others may require the use of 3rd-party software (e.g. Acrobat or one of the several free PDF writers).

Why? Microsoft has a terrible history of releasing new versions of software which have various levels of incompatibility with previous versions. Right now, I can use a Microsoft-supplied filter to provide interoperability between Word 2003 and Word 2007 documents, but will such filters be available for Word 2020 when I have a Word 97 document? Using PDF/PostScript, I can be reasonably confident I won't be stuck.

Devil's advocate
by BigGuns149 / May 1, 2009 5:50 PM PDT
In reply to: BTW...

While historically I would agree with you, since Microsoft has a published standard for OOXML, in theory there should be viewers for it as long as there is demand for them and someone willing to write one.

That being said my gut instincts tell me that ODF has a better long term outlook, but Microsoft has a lot of money to try to ensure that their format doesn't get replaced.

RTF (Rich Text Format)
by Owyn / May 8, 2009 12:35 AM PDT
In reply to: Devil's advocate

by PsychGen / May 1, 2009 10:42 AM PDT
In reply to: Stick to standards

Where we are going, there are no standards...

by FourWheelVibe / May 1, 2009 11:21 AM PDT
In reply to: Stick to standards

PDF/A is the best approach for long term archival. It is an ISO standard and the format used by the US National Archives and Records Administration (NARA). The advantage to PDF/A is that it will retain the formatting and presentation of the existing documents in a format that is meant for long term archival - 100+ years. If you save the text in ASCII you lose all formatting.

The only minor disadvantage to PDF/A is that you must embed all fonts, which can make the files large. There are some licensing issues with certain fonts that you also need to be aware of (some font producers prohibit embedding). Aside from being able to archive textual information in PDF/A, you can also use it to store graphic and image formats. There are a number of tools on the market that enable creation of PDF/A-compliant documents.

For more info on PDF/A, look it up on Wikipedia.

Raw is not a standard
by smcilree / May 1, 2009 12:14 PM PDT
In reply to: Stick to standards

I am in complete agreement with Rbsjrx when it comes to sticking with standard data formats for archival purposes. However, you need to be aware that RAW formats for digital images are proprietary to each camera manufacturer and widely different. The closest to a standard RAW format is the Digital Negative (DNG). Although this format was developed by Adobe, it has been made freely available to all software and camera makers. In addition, Adobe offers software for no charge which can be used to convert the various camera RAW formats to DNG. Today, not only Adobe image editing software can handle DNG, but that of most other software makers as well. Finally, since the introduction of the DNG specification, many cameras have been introduced using it as their RAW format.

As a standard which RAW format do you suggest.
by GENEMETZ / May 1, 2009 1:02 PM PDT
In reply to: Stick to standards

Don't different cameras give you different RAW formats?

Tiff?
by inkindotpro / May 1, 2009 9:22 PM PDT
In reply to: Stick to standards

I don't think of tiff as standard file format for this century...jpeg coma and gho togather?...images...oh iMagiffs...

by suprememouser / May 2, 2009 7:18 AM PDT
In reply to: Tiff?

Libraries use TIFF

Hire a Part-time Archivist
by tonyny77 / April 24, 2009 12:57 PM PDT

The good doctor clearly has a vast array of material and the set-up for his question emphasizes his intention to keep this information for a long, long time.

You didn't express much concern regarding media survivability/data integrity. But, naturally, these should be great concerns as well. Suffice it to say that no storage medium is perfect, with each having its own strengths and frailties. Each medium type you use (magnetic tape, photographs on paper, hard disks, optical media, etc.) has different survivability probabilities and care requirements. Clearly, you must become familiar with each and prepare accordingly.

As for formats becoming obsolete, this is also a continuing problem that appears to have no end in sight. To deal with this, I'd strongly suggest you take a careful inventory of all your data types and list every storage format you use: Word document, Excel spreadsheet, PNG/TIF/JPG graphic format scans, magnetic-tape-based audio/video recordings, etc. In some cases, knowing the type might not be enough; you may also need to note the format versions involved. Then carefully determine the current state/popularity of that format and assess its future prospects.
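A first pass at that inventory can be automated. The sketch below, assuming Python 3.9 or later, simply counts files by extension under a folder; the function names are hypothetical, and an extension alone will not tell you the format version, so treat the output as a starting list rather than a full assessment:

```python
from collections import Counter
from pathlib import Path


def inventory_formats(root: Path) -> Counter:
    """Count files under root by lowercased extension ('' means no extension)."""
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


def report(root: Path) -> list[str]:
    """Return 'extension: count' lines, most common first, ties sorted by name."""
    counts = inventory_formats(root)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{ext or '(none)'}: {n}" for ext, n in ordered]
```

Printing the result of report for the top folder of a collection gives a quick view of which formats dominate and which rare ones may need attention first.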

For example, if you had a collection of reel-to-reel audio tapes, and if reel-to-reel decks were losing popularity and getting increasingly hard to find, clearly that would indicate it's time to transcribe your tape collection to another viable format.

Now, if you have as much material as I've imagined, and if the variety of your collection is itself a challenge, then hiring a professional may indeed be your best option. As my subject suggested hiring "a part-time archivist," it may have seemed like a joke, but if your collection is THAT important, if you have as MUCH data as I think, and if you indeed need your data to survive for DECADES and beyond, then saying you need professional help with this would certainly be no joke.

Good luck!

Storing Documents
by saadhusain / April 24, 2009 1:36 PM PDT

1. Reduce. Do you really need DOS 1.0 programs? If so for what?

2. Categorize. What's the point of having oodles of information if you can't find it quickly. Make sure with every format that you save, there is a program that is also saved that can read it.

3. Offload. No matter what technology you have, it is prone to failure. Hard drives go bad. CD/DVD/BRD disks go bad, get scratched, lost etc. Best is to save in the cloud and let someone else worry about backups. Here are some sources to get you started:

Simple answer, not so simple question
by rbsjrx / April 24, 2009 2:57 PM PDT
In reply to: Storing Documents

"1. Reduce. Do you really need DOS 1.0 programs? If so for what?"

I don't presume to know why someone might want an old document. Of course, it's good to think of such things, but the question is valid even if the motivation behind it isn't.

"2. Categorize. What's the point of having oodles of information if you can't find it quickly. Make sure with every format that you save, there is a program that is also saved that can read it."

Good advice, but again, given the scope and nature of the question it's a non sequitur. Most importantly, what if the program that saved or can read it can't be run? I have some old files on 9-track tapes which were generated on hardware you can only find in museums by software no one today has heard of.

Never assume that proprietary applications or OS's will survive indefinitely. The best you can do is to ensure that the data survives in a format that's likely to have some support far into the future. That gets us back to standards and away from proprietary solutions.

"3. Offload. No matter what technology you have, it is prone to failure. Hard drives do bad. CD/DVD/BRD disks go bad, get scratched, lost etc. Best is to save in the cloud and let someone else worry about backups..."

Again, this is good advice, but it still skirts the issue. The question wasn't about media, but file formats.

The most survivable media according to US Defense Department
by saadhusain / April 24, 2009 2:58 PM PDT
In reply to: Storing Documents

Back when I was working at Hughes Aircraft, I came upon a metallic punched tape machine. It writes and reads metallic punched tape, which is not subject to fire or electromagnetic discharge or nuclear radiation. That's probably the ultimate but impractical for most of us.

Impractical, yes...
by rbsjrx / April 24, 2009 3:07 PM PDT

...and it still doesn't answer the original question!

Upon death . . .
by JacqJ2u / April 24, 2009 1:57 PM PDT

I wonder what happens to your archive when you die? (I don't have this service)

by ajrozsa / April 24, 2009 2:35 PM PDT
In reply to: Upon death . . .


Ozymandias
by Percy Bysshe Shelley

I met a traveler from an antique land
Who said: Two vast and trunkless legs of stone
Stand in the desert. Near them, on the sand,
Half sunk, a shattered visage lies, whose frown,
And wrinkled lip, and sneer of cold command,
Tell that its sculptor well those passions read
Which yet survive, stamped on these lifeless things,
The hand that mocked them, and the heart that fed;
And on the pedestal these words appear:
"My name is Ozymandias, king of kings:
Look upon my works, ye Mighty, and despair!"
Nothing beside remains. Round the decay
Of that colossal wreck, boundless and bare
The lone and level sands stretch far away.

Upon death...
by rbsjrx / April 24, 2009 2:58 PM PDT
In reply to: Upon death . . .

...all becomes someone else's problem!

rackem up
by DADSGETNDOWN / May 1, 2009 11:34 AM PDT
In reply to: Upon death...

useless post, rackem up.
Delete it, just like this one.

by Nargg / May 1, 2009 10:31 PM PDT
In reply to: rackem up

Why was it useless? Seems to be very appropriate.

