Data Planning Checklist
Data Planning Questions:
- What type of data will be produced? Will it be reproducible? What would happen if it got lost or became unusable later?
- How much data will there be? How quickly will it grow? How often will it change? Once archived/stored, what kind of access will be needed to use it?
- Who will use the data now, and in the future?
- Who controls the data (PI, student, lab, Caltech, funding agency)? What intellectual property considerations might apply?
- How long should the data be retained? How long would you expect it to be useful, e.g., through the end of the grant or experiment, 3-5 years, 10-20 years, permanently?
- Is there good project and data documentation?
- What directory and file naming conventions will be used?
- What project and data identifiers will be assigned?
- What file formats are used? Are they standards-based or proprietary?
- Are there tools or software needed to create/process/visualize the data? Are the tools or software proprietary?
- Is there an ontology or other community standard for data sharing/integration?
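As one way to make the naming-convention and identifier questions above concrete, here is a minimal sketch in Python. The pattern (project identifier, ISO date, short description, zero-padded version) and the function name build_filename are assumptions chosen for illustration, not a required or recommended standard; adapt them to whatever convention your group agrees on.

```python
import re
from datetime import date


def build_filename(project_id: str, description: str, version: int,
                   extension: str, on: date) -> str:
    """Build a file name from a hypothetical pattern:
    <project>_<YYYY-MM-DD>_<description>_v<NN>.<ext>
    """
    # Lowercase the description and collapse anything that is not a
    # letter or digit into a single hyphen, so names stay portable.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    # ISO 8601 dates sort chronologically when sorted as text.
    return f"{project_id}_{on.isoformat()}_{slug}_v{version:02d}.{extension}"


print(build_filename("proj42", "Raw Spectra", 2, "csv", date(2024, 3, 15)))
# prints: proj42_2024-03-15_raw-spectra_v02.csv
```

Generating names from one function, rather than typing them by hand, keeps the convention consistent across everyone who contributes files.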
Access, Sharing, and Re-use
- Any special privacy or security requirements? e.g., personal data, high-security data
- Any sharing requirements? e.g., funder data sharing policy
- Any other funder requirements? e.g., data management plan in grant proposals
- What is your storage and backup strategy?
- When will it be shared and where? How broadly will it be shared? Are there I/O throughput issues with respect to the size of the datasets?
- Who in the research group will be responsible for data management?
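For the I/O throughput question above, a rough back-of-the-envelope estimate is often enough to tell whether dataset size will be a problem for sharing. The sketch below is illustrative only: the function name transfer_hours and the efficiency parameter are assumptions, it uses decimal units (1 GB = 8000 megabits), and real transfers are usually slower than the nominal link speed.

```python
def transfer_hours(size_gb: float, link_mbit_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Estimate the hours needed to move size_gb gigabytes over a link.

    efficiency is the assumed fraction of the nominal link speed
    actually achieved (1.0 means the full nominal speed).
    """
    if link_mbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    megabits = size_gb * 8000  # 1 GB = 1000 MB = 8000 megabits
    seconds = megabits / (link_mbit_per_s * efficiency)
    return seconds / 3600


# 1 TB (1000 GB) over a 100 Mbit/s connection at full nominal speed
print(round(transfer_hours(1000, 100), 1))  # prints 22.2
```

If the estimate runs to days rather than hours, that is a signal to consider hosting the data where users can process it in place, or shipping physical media, instead of relying on downloads.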
The following guides cover general principles for managing your data, plus select information related to particular formats or disciplines.
- Australian National Data Service: Data Management for Researchers
- National University: Data Management: Information from courses and a manual on data management.
- CIESIN: Geospatial Electronic Records: Resources on managing and preserving geospatial data and related electronic records.
- ICPSR Guide to Social Science Data Preparation and Archiving (pdf): Outlines best practices throughout the research process, including applying for a research grant, collecting data, and preparing data for deposit in a public archive.
- Oak Ridge National Laboratory: Best Practices for Preparing Environmental Data Sets to Share and Archive: Describes the practices to make data sets ready to share with others.
- UK Data Archive: Create & Manage Data: Provides best practice strategies and methods for creating, preparing and storing shareable datasets. See also Managing and Sharing Data: a Best Practice Guide for Researchers (pdf).
Sherman Fairchild Library, 1st floor
george AT library.caltech.edu