Research Data Management: File Formats

File Formats for the Long Haul

The file formats in which you record, store, and transmit your data are a primary factor in anyone's ability to use that data in the future.

Since technology continually changes, researchers should plan for both hardware and software obsolescence. How will your data be read if the software used to produce it becomes unavailable?

Formats more likely to be accessible in the future are:

  • Non-proprietary
  • Based on an open, documented standard
  • In common use by the research community
  • Encoded in a standard representation (ASCII, Unicode)
  • Unencrypted
  • Uncompressed

Consider migrating your data into a format with the above characteristics, in addition to keeping a copy in the original software format.
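As a rough illustration of migrating data while keeping the original, the following Python sketch copies a source file unchanged into an archive folder and writes its tabular content next to it as UTF-8 CSV. The function name migrate_to_csv, the archive folder layout, and the assumption that the rows have already been extracted from the original file are all hypothetical choices for this example, not a prescribed workflow.

```python
import csv
import shutil
from pathlib import Path


def migrate_to_csv(original, header, rows, archive_dir):
    """Keep a copy of `original` and write `rows` beside it as UTF-8 CSV.

    Assumption: `header` and `rows` were already read out of the original
    file by whatever software produced it; parsing a proprietary format
    is outside the scope of this sketch.
    """
    original = Path(original)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Keep the original software format alongside the migrated copy.
    kept = archive_dir / original.name
    shutil.copy2(original, kept)

    # Plain text, uncompressed, unencrypted, and openly documented.
    migrated = archive_dir / (original.stem + ".csv")
    with migrated.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return kept, migrated
```

Note that if the original file is itself named with a .csv extension, the two output paths would collide; a real workflow would need a naming rule for that case.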

Examples of preferred format choices:

  • PDF/A, not Word
  • ASCII, not Excel
  • MPEG-4, not QuickTime
  • TIFF or JPEG2000, not GIF or JPG
  • XML or RDF, not RDBMS
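To make the last item in the list above concrete, here is a minimal Python sketch that dumps one database table to XML using only the standard library. SQLite stands in for the RDBMS, and the element layout (table, row, field) is an assumption made up for this example rather than any standard schema; the function name table_to_xml is likewise hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Return an ElementTree holding every row of `table`.

    Assumed layout: <table name=...><row><field name=...>value</field></row></table>
    """
    # The table name is interpolated into the query, so pass trusted names only.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    root = ET.Element("table", name=table)
    for record in cur:
        row = ET.SubElement(root, "row")
        for col, value in zip(columns, record):
            cell = ET.SubElement(row, "field", name=col)
            # NULL and empty string both become empty text in this sketch.
            cell.text = "" if value is None else str(value)
    return ET.ElementTree(root)


# Usage with a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER, label TEXT)")
conn.execute("INSERT INTO samples VALUES (1, 'control')")
xml_text = ET.tostring(table_to_xml(conn, "samples").getroot(), encoding="unicode")
```

An export like this keeps the values readable without the database software, but it drops column types, keys, and constraints, so the schema should be documented separately.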

For examples of how data archives treat different file formats, see the UK Data Archive page on data formats and software. Note that not all repositories are able to migrate data files to newer file formats for preservation.

Head of Research Services

Gail Clement
Contact:
Sherman Fairchild Library, 3rd floor
626-395-1203

Librarian

Kathy Johnson
Contact:
Sherman Fairchild Library 1-43
Rm. 324
(626) 395-6065

Chemistry & Biology Librarian