File Version Control
Keeping track of versions of documents and datasets is critical. Strategies include:
- Directory Structure Naming Conventions
- File Naming Conventions
Always record every change to a file no matter how small. Discard obsolete versions after making backups.
Directory Structure Naming Conventions
When organizing files, the top-level directory name should include the project title, a unique identifier, and the date (year).
The substructure should have a clear, documented naming convention; for example,
each run of an experiment, each version of a dataset, and/or each person in
File Naming Conventions
- Reserve the 3-letter file extension for application-specific codes, for example, formats like .wrl, .mov, and .tif.
- Identify the activity or project in the file name
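As a rough illustration of the directory and file naming conventions above, the following Python sketch builds a top-level directory name from a project title, identifier, and year, and a file name that identifies the project and activity while keeping the extension for the format. The helper names (`slug`, `top_level_dir`, `data_file_name`), the underscore and hyphen separators, the `vNN` version suffix, and the sample project are all assumptions made for this example, not a prescribed standard.

```python
import re
from pathlib import PurePosixPath


def slug(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def top_level_dir(title: str, identifier: str, year: int) -> str:
    """Combine project title, unique identifier, and year into one folder name."""
    return f"{slug(title)}_{slug(identifier)}_{year}"


def data_file_name(project: str, activity: str, version: int, extension: str) -> str:
    """Name a file after its project and activity; the extension only marks the format."""
    ext = extension.lstrip(".").lower()
    return f"{slug(project)}_{slug(activity)}_v{version:02d}.{ext}"


# Hypothetical project used only for illustration.
root = top_level_dir("Coral Reef Survey", "CRS-001", 2015)
path = PurePosixPath(root) / "run-01" / data_file_name(
    "Coral Reef Survey", "Transect Photos", 3, ".tif"
)
print(path)
# coral-reef-survey_crs-001_2015/run-01/coral-reef-survey_transect-photos_v03.tif
```

Whatever separators and ordering you choose, the important part is that the convention is written down and applied consistently across the project.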
Use free tools to help you:
File Naming Conventions for Specific Disciplines
Many disciplines have recommendations, for example:
Data Identifiers for Sharing Your Data
The information at the beginning of this page will help you organize your
datasets for your own use. If you want to share or cite your data, consider
using a more sophisticated naming scheme. You'll want to put your datasets
where other people can access them, and give your datasets identifiers that
can be referenced easily.
Data identifiers must be globally unique and persistent. That is to
say, they must not be repeated elsewhere and they must not change over time.
There are many different schemes:
- PURL -- A PURL is a Persistent
Uniform Resource Locator. Functionally, a PURL is a URL. However,
instead of pointing directly to the location of an Internet resource, a
PURL points to an intermediate resolution service. The PURL resolution
service associates the PURL with the actual URL and returns that URL to
the client. Caltech CODA provides Persistent URLs.
- DOI -- A DOI (Digital Object
Identifier) is a name (not a location) for an entity on digital
networks. It provides a system for persistent and actionable
identification and interoperable exchange of managed information on digital networks.
- ACCESSION -- Accession numbers used by the National Center for Biotechnology Information (NCBI) are unique and citable.
- InChI -- The IUPAC
International Chemical Identifier (InChI) is a non-proprietary
identifier for chemical substances that can be used in printed and
electronic data sources thus enabling easier linking of diverse data compilations.
- URI -- A Uniform Resource Identifier (URI) consists of a string of characters
used to identify or name a resource on the Internet. Such identification
enables interaction with representations of the resource over a
network, typically the World Wide Web, using specific protocols.
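To make a few of these schemes more concrete, here is a minimal Python sketch. It splits a URI into its parts with the standard library, turns a DOI name into a URL through the public doi.org resolver, and mimics the lookup a PURL resolution service performs. The `PURL_TABLE` mapping, the `example.org` addresses, and the function names are hypothetical and exist only for this illustration; a real PURL service answers with an HTTP redirect instead of a dictionary lookup, and the DOI shown is just a sample value.

```python
from urllib.parse import quote, urlsplit


def doi_to_url(doi: str) -> str:
    """Build a resolvable URL for a DOI name using the doi.org resolver."""
    return "https://doi.org/" + quote(doi, safe="/")


# Hypothetical resolver table: persistent path -> current location.
# When the data moves, only this table changes; the PURL itself stays the same.
PURL_TABLE = {
    "/example/dataset-1": "https://data.example.org/archive/2015/dataset-1.csv",
}


def resolve_purl(purl: str) -> str:
    """Return the current URL registered for a PURL (simulated lookup)."""
    return PURL_TABLE[urlsplit(purl).path]


purl = "https://purl.example.org/example/dataset-1"
parts = urlsplit(purl)
print(parts.scheme, parts.netloc, parts.path)
print(resolve_purl(purl))
print(doi_to_url("10.1000/182"))
```

The design point is the indirection: citations keep pointing at the persistent identifier, while the resolver is the only place that needs updating when the data moves.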