Research Data Support

The library helps campus research labs and centers manage and publish their research data. We've also worked with Information Management Systems & Services (IMSS) to put together a centralized Research Data FAQ.

CaltechDATA and DOIs

Caltech Library offers a free managed data storage and sharing service, the CaltechDATA repository, at data.caltech.edu.

The CaltechDATA repository offers standard data preservation and DOI (Digital Object Identifier, a permanent identifier) services. We also offer services, at additional cost, for preserving large volumes of data (more than 500 GB). To discuss these options, contact us at data@caltech.edu or read the CaltechDATA FAQ.
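Because a DOI resolves through the doi.org proxy, a citation or data availability statement can always link to https://doi.org/ followed by the DOI. The short Python sketch below normalizes the common ways a DOI gets written (bare, with a "doi:" prefix, or as a link) into that single form. The function name doi_to_url and the example DOI 10.1234/example are hypothetical, and the pattern check is a simplified version of DOI syntax rather than a full validator.

```python
import re

# Simplified DOI syntax: "10." + numeric registrant code, then "/" + a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written in papers and metadata (hypothetical list).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ link for a DOI, whichever form it was given in."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a valid DOI: {doi!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.1234/example"))  # https://doi.org/10.1234/example
```

Linking through doi.org rather than a repository-specific URL is what keeps the citation stable if the landing page ever moves.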

Caltech Library also manages custom DOIs for groups on campus. For more information, see our DOI page.

Data Management Plans

Funding agencies, such as the National Science Foundation (NSF) and the National Institutes of Health (NIH), have specific requirements and templates for Data Management Plans (DMPs). We have many resources to help with the new NIH Data Management and Sharing Policy.

The Library has created a number of Caltech-specific DMP resources, available on this webpage, including standard language, DMP checklists, and example DMPs.

Data Consultations and Training

Library staff can help you or your research group work through challenges associated with research data. We can provide:

  • Data management plan (DMP) assistance
  • Data management guidance
  • Capacity planning
  • Storage technology recommendations
  • Data processing suggestions
  • Interactive visualization hosting
  • Long-term archival and data sharing

Contact us at data@caltech.edu to schedule a consultation appointment.

Library staff offer workshops and training in the following data-related areas:

  • Data management
  • Data visualization
  • Software and Data Carpentry

Contact us at library@caltech.edu to schedule a session for your research group, class, or organization.

Research Data FAQ

I’m going to be collecting research data. Where should I put it?

Caltech IMSS provides Box.com cloud storage, with 200 GB for each community member and 1 TB per group; additional storage is available at extra cost. Box manages the storage, but you manage access to your files. However, Box may not be fast or efficient enough for large amounts of data, and it has a maximum single-file size of 50 GB. If your data management needs are too large for Box, you may want to purchase local storage hardware, such as a Network Attached Storage (NAS) device or a storage array. IMSS and the library can help you decide which option best fits your needs: visit help.caltech.edu and select IMSS/Data Storage & Backup, or email data@caltech.edu.
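Before moving a project into Box, it can help to list any files that exceed the 50 GB single-file limit. This Python sketch walks a directory and reports the oversized files. The function name is hypothetical, and counting 50 GB as 50 × 10^9 bytes is an assumption; Box may define the limit slightly differently.

```python
import os

# Box's maximum single-file size, as stated above. Counting 50 GB as
# 50 * 10**9 bytes is an assumption; Box may use a different definition.
BOX_MAX_FILE_BYTES = 50 * 10**9


def files_too_large_for_box(root: str, limit: int = BOX_MAX_FILE_BYTES) -> list[tuple[str, int]]:
    """Return (path, size_in_bytes) for every regular file under root larger than limit."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks; they may point outside root or be broken
            size = os.path.getsize(path)
            if size > limit:
                oversized.append((path, size))
    return sorted(oversized)
```

For example, calling files_too_large_for_box on a hypothetical project folder returns an empty list if everything fits; any files it does return are candidates for splitting, compressing differently, or storing on local hardware instead.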

I need to analyze research data or run simulations.

The Resnick High Performance Computing Center (HPC) cluster is an excellent option: your calculations run on a state-of-the-art resource at Caltech with local support. Your research group leader must set up an account (hpc.caltech.edu/documentation/getting-started), and there is a charge based on how much computing time you use. Groups get up to 30 TB of free data storage; however, this storage is not backed up, so groups must keep their primary data elsewhere. National (off-campus) computing resources such as ACCESS (https://access-ci.org/) are also available by application and can provide additional computing resources at no charge.
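Because HPC charges depend on how much computing time you use, a rough usage estimate can help when budgeting a project. The sketch below assumes usage is measured in core-hours (jobs × cores × wall-clock hours); the actual billing unit, any separate GPU accounting, and the rates are set by the Resnick HPC Center and are not included here. The JobBatch class and estimate_core_hours function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JobBatch:
    """A batch of identical jobs (illustrative planning inputs, not HPC settings)."""
    n_jobs: int
    cores_per_job: int
    wall_hours_per_job: float


def estimate_core_hours(batches: list[JobBatch]) -> float:
    """Sum jobs x cores x wall-clock hours over all batches.

    Assumes usage is metered in core-hours; check the HPC center's
    documentation for the actual billing unit and rates.
    """
    return sum(b.n_jobs * b.cores_per_job * b.wall_hours_per_job for b in batches)


plan = [
    JobBatch(n_jobs=100, cores_per_job=16, wall_hours_per_job=2.5),
    JobBatch(n_jobs=4, cores_per_job=64, wall_hours_per_job=12),
]
print(estimate_core_hours(plan))  # 7072.0
```

Here 100 jobs on 16 cores for 2.5 hours each come to 4,000 core-hours, and 4 jobs on 64 cores for 12 hours each add 3,072, for 7,072 in total; multiplying by the center's published rate gives a ballpark cost.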

I want to ensure that my data remains available for a long time (like a publication).

You can deposit your files in CaltechDATA (data.caltech.edu), the library-run repository. CaltechDATA accepts files of any type and size, although you should email data@caltech.edu if you plan to upload more than 500 GB of data. Caltech Library is responsible for maintaining access to the files, and every data record is assigned a Digital Object Identifier (DOI) that provides a permanent link and simplifies citation. You can make your files public immediately or after an embargo period.
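To check whether a planned deposit crosses the 500 GB mark that calls for emailing data@caltech.edu first, you can total the size of the dataset directory. This Python sketch is a minimal example; the function names are hypothetical, and treating 500 GB as 500 × 10^9 bytes is an assumption.

```python
import os

# CaltechDATA asks depositors to email data@caltech.edu before uploading
# more than 500 GB. Treating 500 GB as 500 * 10**9 bytes is an assumption.
CONTACT_THRESHOLD_BYTES = 500 * 10**9


def total_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def should_email_first(root: str, threshold: int = CONTACT_THRESHOLD_BYTES) -> bool:
    """True if the dataset under root is larger than the contact threshold."""
    return total_size_bytes(root) > threshold
```

Running should_email_first on the folder you intend to deposit gives a quick yes/no; the total from total_size_bytes is also a useful number to include when you write to the library about a large deposit.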

I’m developing software and want to make sure it remains available for a long time.

The CaltechDATA repository (data.caltech.edu) accepts software and offers a GitHub integration that automatically preserves software releases. You can set up the integration by following the instructions at https://caltechlibrary.github.io/iga/, or email a link to your GitHub repository to data@caltech.edu for a guided setup.

I want to share data with collaborators or reviewers.

To share research data files, you can use the file-sharing options in Box.com, which also let you set a custom password on shared files. Because Box.com is a complete cloud file service, you can also add collaborators who access the files with their own Box.com credentials. Unlike services such as Dropbox, collaborators can store files in a shared folder using your institutional Box storage allocation.

I’m collecting data on human subjects.

Talk to the Institutional Review Board (IRB) about all data collection and storage plans for your project (irb.caltech.edu/). Box.com, SharePoint, and OneDrive are certified by IMSS for personal data covered by HIPAA or FERPA regulations.