What type of data will be produced?
Gather a clear picture of what your data will look like. Is it, for
example, numerical data, image data, text sequences or modeling data? The kind of data you have
will inform many decisions you need to make about storage, backups and
more. Image data requires a lot of storage space, so you'll want to
decide which of your images, if not all, you need to retain, and where
such large datasets can be stored and shared. Your research center or
university may be able to help with backing up your data. If you are
storing images, however, you may quickly exceed your institution's
backup limit for individual laboratories or groups.
Is there existing data relevant to the project?
Does data already exist that could be integrated into the project? Is it available through a disciplinary repository? What other information (software, analysis, procedures) needs to be shared with the data to make it usable?
How much of it, and at what growth rate?
Once you know what kind of data you're producing, you'll be able to
assess the growth rate. For example, are you gathering data by hand or
using sophisticated instrumentation that is able to capture a lot of
data at once? Will there be more data as time goes on? If so, you will
need to plan for the growth. What amounts to enough storage this year
may not be sufficient for next year.
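As a rough illustration of planning for growth, the sketch below projects storage needs under compound annual growth and reports the first year in which a quota would be exceeded. The function names, the starting size, the growth rate and the quota are all hypothetical values chosen for the example; substitute figures from your own project and institution.

```python
def project_storage(current_tb, annual_growth_rate, years):
    """Return the projected storage need (in TB) at the end of each year.

    Assumes simple compound growth, which is only an approximation;
    real datasets may grow in bursts (e.g. when a new instrument arrives).
    """
    projections = []
    size = current_tb
    for _ in range(years):
        size *= 1 + annual_growth_rate
        projections.append(round(size, 2))
    return projections


def first_year_exceeding(current_tb, annual_growth_rate, quota_tb, max_years=50):
    """Return the first year (1-based) in which the projection exceeds the quota.

    Returns None if the quota is not exceeded within max_years.
    """
    size = current_tb
    for year in range(1, max_years + 1):
        size *= 1 + annual_growth_rate
        if size > quota_tb:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical example: 2 TB today, growing 50% per year, 5 TB quota.
    print(project_storage(2.0, 0.5, 3))
    print(first_year_exceeding(2.0, 0.5, 5.0))
```

Even a crude estimate like this makes it easier to discuss storage and backup limits with your institution before you reach them.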
Will it change frequently?
The answer to this question affects how you organize the data as well as the level of versioning
you will need to undertake. Keeping track of rapidly changing datasets
can be a challenge, so it's imperative you begin with a plan that will
carry you through the data management process.
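One lightweight way to keep track of a changing dataset is to record a checksum for every file and compare snapshots over time. The sketch below is one possible approach, not a prescribed method: it uses Python's standard hashlib and pathlib modules, and the function names build_manifest and diff_manifests are hypothetical. A dedicated version control or data versioning tool may suit larger projects better.

```python
import hashlib
from pathlib import Path


def build_manifest(directory):
    """Map each file's path (relative to directory) to its SHA-256 checksum.

    Reads each file fully into memory, which is fine for small files;
    for very large files, hashing in chunks would be more appropriate.
    """
    root = Path(directory)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(old, new):
    """Compare two manifests and list added, removed and modified files."""
    old_paths, new_paths = set(old), set(new)
    return {
        "added": sorted(new_paths - old_paths),
        "removed": sorted(old_paths - new_paths),
        "modified": sorted(
            p for p in old_paths & new_paths if old[p] != new[p]
        ),
    }
```

Saving a manifest alongside each snapshot of the data, and comparing the latest one with the previous one, shows at a glance which files changed between versions.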
Who is it for?
Who is your audience for the data? How will they use the data? The
answer to this question will tell you how to structure the data and
where to distribute it, among other things.
Who controls it (PI, student, lab, Caltech, funder)?
Before you spend a lot of time figuring out how you're going to store
the data, name it, etc., you need to know if you have the authority to
How long should it be retained? (e.g. 3-5 years, 10-20 years, permanently)
Not all data needs to be retained indefinitely. Figure out what's
important to keep and make sure your plan for those datasets is solid.