Data Management Planning for NMSU Researchers

Finding Data Repositories

Data should be deposited into a data repository or archive for long-term preservation. 
Use the directories below to find a repository for your data, or browse the general and discipline-specific repositories listed further down.

Repository Finder - Search this registry of repositories to find an appropriate place to deposit your research data.

OpenDOAR - "OpenDOAR is the quality-assured, global Directory of Open Access Repositories. You can search and browse through thousands of registered repositories based on a range of features, such as location, software or type of material held." 

Open Access Directory's list of disciplinary repositories

NIH-supported data repositories. List maintained by the Trans-NIH BioMedical Informatics Coordinating Committee (BMIC).

PLOS recommended repositories - PLOS journal recommendations for general and discipline-specific data repositories.

Scientific Data recommended repositories - Recommendations for general and discipline-specific data repositories. Nature Publishing Group.

Google Dataset Search enables users to find datasets stored across thousands of repositories on the Web, making these datasets universally accessible and useful.

General Data Repositories

The following data repositories accept submissions from any discipline. To find discipline-specific repository options, see the Disciplinary Repositories list below. 

  • Dryad Digital Repository
    • A curated resource that makes research data discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of data types. NMSU is a Dryad member. See Dryad page for more details.
  • FigShare
    • "A repository where users can make all of their research outputs available in a citable, shareable and discoverable manner. figshare features aim to help you organize your research and get as much impact for it as possible, without adding time or effort to your day."
  • Harvard Dataverse
    • "Harvard Dataverse is a free data repository open to all researchers from any discipline, both inside and outside of the Harvard community, where you can share, archive, cite, access, and explore research data. You can open your data to the general public, or restrict access and define customizable terms of use."
  • Mendeley Data
    • "Mendeley Data is an open research data repository, where researchers can upload and share their research data. Datasets can be shared privately amongst individuals, as well as published to share with the world. Search 26+ million datasets from domain-specific and cross-domain repositories."
  • Open Science Framework (OSF)
    • "OSF is a free and open source project management tool that supports researchers throughout their entire project lifecycle. As a collaboration tool, OSF helps research teams work on projects privately or make the entire project publicly accessible for broad dissemination. As a workflow system, OSF enables connections to the many products researchers already use, streamlining their process and increasing efficiency."
  • Zenodo
    • "An open repository for all scholarship, enabling researchers from all disciplines to share and preserve their research outputs, regardless of size or format. Free to upload and free to access, Zenodo makes scientific outputs of all kinds citable, shareable and discoverable for the long term"


  • SIMBAD - The SIMBAD astronomical database provides basic data, cross-identifications, bibliography, and measurements for astronomical objects outside the solar system. Search SIMBAD by object name, coordinates, and various criteria. Lists of objects and scripts can be submitted.
  • HEASARC (NASA) - The High Energy Astrophysics Science Archive Research Center (HEASARC) is the primary archive for NASA missions dealing with electromagnetic radiation from extremely energetic phenomena, from black holes to the Big Bang. Since its merger with the Legacy Archive for Microwave Background Data Analysis (LAMBDA) in 2008, it includes data obtained by NASA's high-energy astronomy missions from the extreme ultraviolet through gamma-ray bands, along with data from missions that study the relic cosmic microwave background.
  • Infrared Science Archive (US) - IRSA is chartered to curate the calibrated science products from NASA's infrared and sub-millimeter missions, including five major large-area/all-sky surveys. IRSA data sets are cited in about 10% of astronomical refereed papers.
  • NASA/IPAC Extragalactic Database (NED) - NASA's archive of data for over 3 million extragalactic objects. It is built around a master list of extragalactic objects for which cross-identifications of names have been established, accurate positions and redshifts entered to the extent possible, and some basic data collected. Bibliographic references relevant to individual objects have been compiled, and abstracts of extragalactic interest are kept online. Detailed and referenced photometry, position, and redshift data have been taken from large compilations and from the literature. NED also includes images from 2MASS, from the literature, and from the Digitized Sky Survey. NED's data and references are continually updated, with revised versions put online every 4-6 months.
  • National Virtual Observatory (US) - Astronomical data from ground and space-based telescopes. Includes data analysis tools. Discover, retrieve, and analyze astronomical data from archives and data centers around the world.
  • National Space Science Data Center (US/NASA) - The National Space Science Data Center is NASA's permanent archive for space science mission data.
  • Sloan Digital Sky Survey - Download optical images of the sky. See also SkyServer, an educational portal to the data. The survey has obtained deep, multi-color images covering more than a quarter of the sky and created three-dimensional maps containing more than 930,000 galaxies and more than 120,000 quasars.
  • Astrophysics Data System (US/Smithsonian) - The ADS maintains three bibliographic databases containing more than 9.5 million records: Astronomy and Astrophysics, Physics, and arXiv e-prints.
  • Blue Obelisk Data Repository (US/MIT) - The Blue Obelisk Group has been in existence as an extended community since 2002. Its guiding concepts are Open Source, Open Standards, and Open Data (but not necessarily Open Access).
  • The Canadian Astronomy Data Centre (Canada) 
  • Strasbourg Astronomical Data Center (France) - Centre de Données astronomiques de Strasbourg.


Atmospheric Sciences & Climatology

Biology & Life Sciences

  • Arabidopsis Information Resource - The Arabidopsis Information Resource (TAIR) collects information and maintains a database of genetic and molecular biology data for Arabidopsis thaliana, a widely used model plant.
  • Databases at EBI (UK) - The European Bioinformatics Institute (EBI) is an academic research institute located on the Wellcome Trust Genome Campus in Hinxton near Cambridge (UK), part of the European Molecular Biology Laboratory (EMBL).
  • DataBasin (US) - Centers are focal topics or geographies of special interest to Data Basin. Centers provide experts, datasets, maps, galleries, and working groups that facilitate more effective collaboration.
  • Ensembl - The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.
  • GenBank (US/NCBI & NIH) - GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. The GenBank database is designed to provide and encourage access within the scientific community to the most up-to-date and comprehensive DNA sequence information. Therefore, NCBI places no restrictions on the use or distribution of the GenBank data.
  • Gene Expression Omnibus (US/NCBI & NIH) - Gene Expression Omnibus: a public functional genomics data repository supporting MIAME-compliant data submissions. Array- and sequence-based data are accepted. Tools are provided to help users query and download experiments and curated gene expression profiles.
  • Global Biodiversity Information Facility - The Global Biodiversity Information Facility (GBIF) was established by governments in 2001 to encourage free and open access to biodiversity data, via the Internet. Through a global network of countries and organizations, GBIF promotes and facilitates the mobilization, access, discovery and use of information about the occurrence of organisms over time and across the planet.
  • Knowledge Network for Biocomplexity - A national network intended to facilitate ecological and environmental research on biocomplexity. For scientists, the KNB is an efficient way to discover, access, interpret, integrate and analyze complex ecological data from a highly-distributed set of field stations, laboratories, research sites, and individual researchers.
  • Maize Genetics and Genomics Database - MaizeGDB is a community-oriented, long-term, federally funded informatics service to researchers focused on the crop plant and model organism Zea mays.
  • Molecular Biology Databases - This work is being developed under the auspices of the Science Commons Data project and builds upon the Science Commons Open Access Data Protocol.
  • NeuroMorpho (US/George Mason) - NeuroMorpho.Org is a centrally curated inventory of digitally reconstructed neurons associated with peer-reviewed publications. It contains contributions from over 60 laboratories worldwide and is continuously updated as new morphological reconstructions are collected, published, and shared.
  • PaleoBiology Database - The Paleobiology Database seeks to provide researchers and the public with information about the entire fossil record. It has expanded continuously since 2000 thanks to the efforts of 296 paleontologists from around the world.
  • Protein Databank - The Protein Data Bank (PDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids.
  • Universal Protein Resource (UniProt) - The mission of UniProt is to provide the scientific community with a comprehensive, high-quality, and freely accessible resource of protein sequence and functional information.
  • Veterinary Medical Database - The VMDB database started in 1964 as an initiative of the National Cancer Institute for the purpose of studying cancer in animals. Since then, 26 universities have submitted more than 7 million records to this database.


Computer Science

Earth & Planetary Sciences


Geosciences & Geospatial


Health Sciences



Social Sciences