LibGuides: Data Management Planning for NMSU Researchers: Data Repositories & Directories

Use these directories to find a repository for your data or check out either the general data repositories or the discipline-specific ones listed in the accompanying tabs.

Repository Finder - Helps you find an appropriate repository to deposit your research data

re3data.org - Search this registry of repositories to find a good fit for your data

OpenDOAR - "OpenDOAR is the quality-assured, global Directory of Open Access Repositories. You can search and browse through thousands of registered repositories based on a range of features, such as location, software or type of material held."

Open Access Directory's list of disciplinary repositories

NIH-supported data repositories. List maintained by the Trans-NIH BioMedical Informatics Coordinating Committee (BMIC).

PLOS recommended repositories - PLOS journal recommendations for general and discipline-specific data repositories.

Scientific Data recommended repositories - Recommendations for general and discipline-specific data repositories. Nature Publishing Group.

Google Dataset Search enables users to find datasets stored across thousands of repositories on the Web, making these datasets universally accessible and useful.

General Data Repositories

The following data repositories accept submissions from any discipline. To find discipline-specific repository options, see the Disciplinary Repositories list below.

Dryad Digital Repository
- A curated resource that makes research data discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of data types. NMSU is a Dryad member. See Dryad page for more details.
FigShare
- "A repository where users can make all of their research outputs available in a citable, shareable and discoverable manner. figshare features aim to help you organize your research and get as much impact for it as possible, without adding time or effort to your day."
Harvard Dataverse
- “Harvard Dataverse is a free data repository open to all researchers from any discipline, both inside and outside of the Harvard community, where you can share, archive, cite, access, and explore research data. You can open your data to the general public, or restrict access and define customizable terms of use.”
Mendeley Data
- "Mendeley Data is an open research data repository, where researchers can upload and share their research data. Datasets can be shared privately amongst individuals, as well as published to share with the world. Search 26+ million datasets from domain-specific and cross-domain repositories."
Open Science Framework (OSF)
- "OSF is a free and open source project management tool that supports researchers throughout their entire project lifecycle. As a collaboration tool, OSF helps research teams work on projects privately or make the entire project publicly accessible for broad dissemination. As a workflow system, OSF enables connections to the many products researchers already use, streamlining their process and increasing efficiency."
Zenodo
- "An open repository for all scholarship, enabling researchers from all disciplines to share and preserve their research outputs, regardless of size or format. Free to upload and free to access, Zenodo makes scientific outputs of all kinds citable, shareable and discoverable for the long term"

Disciplinary Repositories

Click on a subject area to find listings by discipline

Astronomy/Astrophysics	Atmospheric Sciences/Climatology	Biology/Life Sciences
Chemistry	Computer Science	Earth/Planetary Sciences
Engineering	Geoscience/Geospatial Data	Health Sciences
Humanities	Physics	Social Sciences

Astronomy/Astrophysics

SIMBAD - The SIMBAD astronomical database provides basic data, cross-identifications, bibliography, and measurements for astronomical objects outside the solar system. Search SIMBAD by object name, coordinates, and various criteria. Lists of objects and scripts can be submitted.
HEASARC (NASA) - The High Energy Astrophysics Science Archive Research Center (HEASARC) is the primary archive for NASA missions dealing with electromagnetic radiation from extremely energetic phenomena, from black holes to the Big Bang. Since its merger with the Legacy Archive for Microwave Background Data Analysis (LAMBDA) in 2008, it includes data obtained by NASA's high-energy astronomy missions from the extreme ultraviolet through gamma-ray bands, along with data from missions that study the relic cosmic microwave background.
Infrared Science Archive (US) - IRSA is chartered to curate the calibrated science products from NASA's infrared and sub-millimeter missions, including five major large-area/all-sky surveys. IRSA data sets are cited in about 10% of astronomical refereed papers.
Extragalactic Database (NED) - NASA's archive of data for over 3 million extragalactic objects. It is built around a master list of extragalactic objects for which cross-identifications of names have been established, accurate positions and redshifts entered to the extent possible, and some basic data collected. Bibliographic references relevant to individual objects have been compiled, and abstracts of extragalactic interest are kept online. Detailed and referenced photometry, position, and redshift data have been taken from large compilations and from the literature. NED also includes images from 2MASS, from the literature, and from the Digitized Sky Survey. NED's data and references are continually updated, with revised versions put online every 4-6 months.
National Virtual Observatory (US) - Astronomical data from ground and space-based telescopes. Includes data analysis tools. Discover, retrieve, and analyze astronomical data from archives and data centers around the world.
National Space Science Data Center (US/NASA) - The National Space Science Data Center is NASA's permanent archive for space science mission data.
Sloan Digital Sky Survey - Download optical images of the sky. See also SkyServer for an educational portal to the data. The database contains deep, multi-color images covering more than a quarter of the sky, along with 3-dimensional maps containing more than 930,000 galaxies and more than 120,000 quasars.
Astrophysics Data System (US/Smithsonian) - The ADS maintains three bibliographic databases containing more than 9.5 million records: Astronomy and Astrophysics, Physics, and arXiv e-prints.
Blue Obelisk Data Repository (US/MIT) - The Blue Obelisk Group has been in existence as an extended community since 2002. Its guiding concepts are Open Source, Open Standards and Open Data (but not necessarily Open Access).
The Canadian Astronomy Data Centre (Canada)
Strasbourg Astronomical Data Center (France) - Centre de Données astronomiques de Strasbourg.

Atmospheric Sciences & Climatology

Antarctic Environmental Data Centre (UK)
The Polar Data Centre (PDC) coordinates the management of data collected by UK funded scientists in the polar regions, and replaces the Antarctic Environmental Data Centre (AEDC) from 1st April 2009.
Atmospheric Radiation Measurement Climate Research Facility Data Archive (US)
Data collected through the routine operations and scientific field experiments of the ARM Climate Research Facility are stored at and distributed through the Archive. These data are available free of charge to the public.
Biological and Chemical Oceanography Data Management Office (US)
The Biological and Chemical Oceanography Data Management Office (BCO-DMO) was created in late 2006 to serve PIs funded by the NSF Geosciences Directorate (GEO) Division of Ocean Sciences (OCE) Biological and Chemical Oceanography Programs and Office of Polar Programs (OPP) Antarctic Sciences (ANT) Organisms & Ecosystems Program.
British Atmospheric Data Centre (UK)
The British Atmospheric Data Centre (BADC) is the Natural Environment Research Council's (NERC) Designated Data Centre for the Atmospheric Sciences. The role of the BADC is to assist UK atmospheric researchers to locate, access, and interpret atmospheric data and to ensure the long-term integrity of atmospheric data produced by NERC projects. Those submitting to the centre need to get an account and login. Anyone can get an account but entries will be moderated.
Centre for Ecology & Hydrology (UK)
CEH hosts a wealth of environmental information gathered over decades of environmental monitoring and research. This information covers a wide range of nationally and often globally unique long-term datasets.
Climate Change Knowledge Portal (WorldBank)
The Climate Change Knowledge Portal (CCKP) Beta is a central hub of information, data and reports about climate change around the world. Here you can query, map, compare, chart and summarize key climate and climate-related information.
IRI/LDEO Climate Data Library (US)
The IRI Data Library is a powerful and freely accessible online data repository and analysis tool that allows a user to view, manipulate, and download over 400 climate-related data sets through a standard web browser. The Data Library contains a wide variety of publicly available data sets, including station and gridded atmospheric and oceanic observations and analyses, model-based analyses and forecasts, and land surface and vegetation data sets, from a range of sources.
National Climatic Data Center (NCDC) (US)
NCDC develops both national and global data sets that have been used by both government and the private sector to maximize the resource provided by our climate and minimize the risks of climate variability and weather extremes. The Center has a statutory mission to describe the climate of the United States, and NCDC acts as the Nation's Scorekeeper regarding the trends and anomalies of weather and climate.
National Center for Atmospheric Research (US)
CISL Research Data Archive. Agriculture, Atmosphere, Biosphere, Climate Indicators, Cryosphere, Hydrosphere, Land Surface, Oceans, Paleoclimate, Solid Earth, Spectral/Engineering, Sun-earth Interactions

Biology & Life Sciences

Arabidopsis Information Resource - The Arabidopsis Information Resource (TAIR) collects information and maintains a database of genetic and molecular biology data for Arabidopsis thaliana, a widely used model plant.
Databases at EBI (UK) - The European Bioinformatics Institute (EBI) is an academic research institute located on the Wellcome Trust Genome Campus in Hinxton near Cambridge (UK), part of the European Molecular Biology Laboratory (EMBL).
DataBasin (US) - Centers are focal topics or geographies of special interest to Data Basin. Centers provide experts, datasets, maps, galleries, working groups that facilitate more effective collaboration.
Ensembl - The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.
GenBank (US/NCBI & NIH) - GenBank ® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. The GenBank database is designed to provide and encourage access within the scientific community to the most up to date and comprehensive DNA sequence information. Therefore, NCBI places no restrictions on the use or distribution of the GenBank data.
Gene Expression Omnibus (US/NCBI & NIH) - Gene Expression Omnibus: a public functional genomics data repository supporting MIAME-compliant data submissions. Array- and sequence-based data are accepted. Tools are provided to help users query and download experiments and curated gene expression profiles.
Global Biodiversity Information Facility - The Global Biodiversity Information Facility (GBIF) was established by governments in 2001 to encourage free and open access to biodiversity data, via the Internet. Through a global network of countries and organizations, GBIF promotes and facilitates the mobilization, access, discovery and use of information about the occurrence of organisms over time and across the planet.
Knowledge Network for Biocomplexity - A national network intended to facilitate ecological and environmental research on biocomplexity. For scientists, the KNB is an efficient way to discover, access, interpret, integrate and analyze complex ecological data from a highly-distributed set of field stations, laboratories, research sites, and individual researchers.
Maize Genetics and Genomics Database - MaizeGDB is a community-oriented, long-term, federally funded informatics service to researchers focused on the crop plant and model organism Zea mays.
Molecular Biology Databases - This work is being developed under the auspices of the Science Commons Data project and builds upon the Science Commons Open Access Data Protocol
NeuroMorpho (US/George Mason) - NeuroMorpho.Org is a centrally curated inventory of digitally reconstructed neurons associated with peer-reviewed publications. It contains contributions from over 60 laboratories worldwide and is continuously updated as new morphological reconstructions are collected, published, and shared.
PaleoBiology Database - The Paleobiology Database seeks to provide researchers and the public with information about the entire fossil record. It has expanded continuously since 2000 thanks to the efforts of 296 paleontologists from around the world.
Protein Databank - The Protein Data Bank (PDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids.
Universal Protein Resource Uniprot - The mission of UniProt is to provide the scientific community with a comprehensive, high-quality, and freely accessible resource of protein sequence and functional information.
Veterinary Medical Database - The VMDB database started in 1964 as an initiative of the National Cancer Institute for the purpose of studying cancer in animals. Since then, 26 universities have submitted more than 7 million records to this database.

Chemistry

eCrystals - Southampton is the archive for Crystal Structures generated by the Southampton Chemical Crystallography Group and the EPSRC UK National Crystallography Service.
ChemSpider - An online resource for chemists to search, aggregate and data mine over 10 million publicly available chemical data, including the PubChem collection and data provided by a number of other collaborators.
ChemSynthesis - ChemSynthesis is a freely accessible database of chemicals. This website contains substances with their synthesis references and physical properties such as melting point, boiling point and density. There are currently more than 40,000 compounds and more than 45,000 synthesis references in the database.
Crystallography Open Database - Open-access collection of crystal structures of organic, inorganic, metal-organic compounds and minerals, excluding biopolymers
NMRShiftDB - Germany. nmrshiftdb2 is an NMR database (web database) for organic structures and their nuclear magnetic resonance (NMR) spectra. It allows for spectrum prediction (13C, 1H and other nuclei) as well as for searching spectra, structures and other properties. Last but not least, it features peer-reviewed submission of datasets by its users. The nmrshiftdb2 software is open source, and the data is published under an open content license.
PubChem - PubChem is a component of NIH's Molecular Libraries Roadmap Initiative. It provides information on the biological activities of small molecules.
Reciprocal Net - The Reciprocal Net is a distributed database used by research crystallographers to store information about molecular structures; much of the data is available to the general public. The project is funded by the National Science Foundation (NSF) and part of the National Science Digital Library.
RRUFF Project - RRUFF Project website containing an integrated database of Raman spectra, X-ray diffraction and chemistry data for minerals.
WorldWideMolecularMatrix (UK/Cambridge) - The WorldWideMolecularMatrix, an Open collection of information on small molecules
ZINC - (US/UC San Francisco) a free database of commercially-available compounds for virtual screening. ZINC contains over 21 million purchasable compounds in ready-to-dock, 3D formats. ZINC is provided by the Shoichet Laboratory in the Department of Pharmaceutical Chemistry at the University of California, San Francisco (UCSF).

Computer Science

GitHub - Free public repositories, collaborator management, issue tracking, wikis, downloads, code review, graphs and much more. Hosts developer libraries such as Ruby on Rails, IronRuby, jQuery, Perl
Google Code Project hosting - Google Developers is now the place to find all Google developer documentation, resources, events, and products. Open APIs and Google projects like Google Gears, Android, Chromium.
Launchpad - Launchpad is a software collaboration platform that provides bug tracking, code hosting using Bazaar, code reviews, Ubuntu package building and hosting, translations, and mailing lists.
SourceForge - Open source code hosting facility.
Cooperative Association for Internet Data Analysis (CAIDA) - Collection and sharing of data for scientific analysis of Internet traffic, topology, routing, performance, and security-related events is one of CAIDA's core objectives.

Earth & Planetary Sciences

Goddard Earth Sciences Data and Information Services Center - The GES DISC is the home (archive) of NASA Precipitation and Hydrology, as well as Atmospheric Composition and Dynamics, data and information. The GES DISC also houses the Modern Era Retrospective-Analysis for Research and Applications (MERRA) data assimilation datasets (generated by GSFC’s Global Modeling and Assimilation Office), and the North American Land Data Assimilation System (NLDAS) and Global Land Data Assimilation System (GLDAS) data products (both generated by GSFC's Hydrological Sciences Branch).
IRI/LDEO Climate Data Library (US) - The IRI Data Library is a powerful and freely accessible online data repository and analysis tool that allows a user to view, manipulate, and download over 400 climate-related data sets through a standard web browser. The Data Library contains a wide variety of publicly available data sets, including station and gridded atmospheric and oceanic observations and analyses, model-based analyses and forecasts, and land surface and vegetation data sets, from a range of sources.
Marine Geoscience Data System (MGDS) (US/Columbia) - The Marine Geoscience Data System (MGDS) provides free public access to data collected throughout the global oceans. Our data portals serve different communities of NSF-funded researchers providing direct access to data, program-relevant information, and tools for helping them satisfy their data sharing obligations.
National Climatic Data Center (NCDC) (US) - NCDC develops both national and global data sets that have been used by both government and the private sector to maximize the resource provided by our climate and minimize the risks of climate variability and weather extremes. The Center has a statutory mission to describe the climate of the United States, and NCDC acts as the Nation's Scorekeeper regarding the trends and anomalies of weather and climate.
National Oceanographic Data Center (NODC) - Provide access to the world's most comprehensive sources of marine environmental data and information. These data include physical, biological and chemical measurements derived from in situ oceanographic observations, satellite remote sensing of the oceans, and ocean model simulations.
National Center for Atmospheric Research (US) - CISL Research Data Archive. Agriculture, Atmosphere, Biosphere, Climate Indicators, Cryosphere, Hydrosphere, Land Surface, Oceans, Paleoclimate, Solid Earth, Spectral/Engineering, Sun-earth Interactions
USGS National Satellite Land Remote Sensing Data Archive - Earth Resources Observation and Science (EROS) Center is a remotely sensed data management, systems development, and research field center for the U.S. Geological Survey's (USGS) Climate and Land Use Change Mission Area. Note that some data access is fee-based.
National Snow and Ice Data Center (NSIDC) - NSIDC manages and distributes scientific data, creates tools for data access, supports data users, performs scientific research, and educates the public about the cryosphere. NSIDC distributes more than 500 cryospheric data sets for researchers, from both satellite and ground observations.
National Space Science Data Center (US/NASA) - The National Space Science Data Center is NASA's permanent archive for space science mission data.
Planetary Data System - The PDS archives and distributes scientific data from NASA planetary missions, astronomical observations, and laboratory measurements. The PDS is sponsored by NASA's Science Mission Directorate. Its purpose is to ensure the long-term usability of NASA data and to stimulate advanced research.
Core Research Center (Denver, Colorado) - The Core Research Center (CRC) preserves valuable rock cores for scientists and educators from government, industry, and academia. The CRC is currently one of the largest and most heavily used public core repositories in the United States.
Woods Hole Science Center’s Sample Archive and Core Lab - The Woods Hole Science Center Sample Archive curates data for the USGS Coastal and Marine Geology Program.
National Ice Core Laboratory (NSF & USGS) - The U.S. National Ice Core Laboratory (NICL) is a facility for storing, curating, and studying ice cores recovered from the polar regions of the world.

Engineering

Alternative Fuels and Advanced Vehicles Data Center (AFDC) - Data stored in the Alternative Fuels and Advanced Vehicles Data Center (AFDC) can provide insight to policymakers, entrepreneurs, fuel users, and other parties interested in reducing petroleum consumption.
Engineering Strong Motion Data Center (CESMD) - The Center for Engineering Strong Motion Data (CESMD) is a cooperative center established by the US Geological Survey (USGS) and the California Geological Survey (CGS) to integrate earthquake strong-motion data from the CGS California Strong Motion Instrumentation Program, the USGS National Strong Motion Project, and the Advanced National Seismic System (ANSS). The CESMD provides raw and processed strong-motion data for earthquake engineering applications.
Space Science and Engineering Data Center - The Data Center at the University of Wisconsin-Madison Space Science and Engineering Center (SSEC), is responsible for the access, maintenance and distribution of real-time and archive weather satellite data.
Suite Sparse Matrix Collection - (Texas A&M University) The Suite Sparse Matrix Collection is a large, widely available, and actively growing set of sparse matrices that arise in real applications.
Energy Information Administration - Providing statistics and analysis, the U.S. Energy Information Administration focuses on collecting, analyzing, and disseminating impartial and independent information regarding energy in an effort to promote sound policymaking, efficient markets, and the public understanding of energy and how it interacts with the economy and the environment.

Geosciences & Geospatial

California Water CyberInfrastructure (US/Berkeley) - The BWC is currently building partnerships with several water representatives, such as the USGS, Sonoma County Water Agency, the Monterey County Water Resource Agency, and the NOAA National Marine Fisheries Service.
Federal Geographic Data Committee - Provides access to the National Spatial Data Infrastructure (NSDI) Clearing House Network and the geodata.gov portal. Geospatial One-Stop Initiative to provide “one-stop” access to all registered geographic information and related online access services within the United States. Geographic data, imagery, applications, documents, websites, and other resources have been cataloged for discovery in this portal.
GeoCommons.com - GIS file repository and finding tool
Geodata.gov (US) - One-stop for federal, state and local geographic data
Geodata Repository - A full list of suggestions for public domain data sets that are nice-to-haves is maintained at Geodata Discovery Working Group.
GeoGratis (Canada) - GeoGratis is a portal provided by the Earth Sciences Sector (ESS) of Natural Resources Canada (NRCan) which provides geospatial data at no cost and without restrictions via your Web browser.
GeoNames - The GeoNames geographical database is available for download free of charge under a creative commons attribution license. It contains over 10 million geographical names and consists of over 8 million unique features whereof 2.8 million populated places and 5.5 million alternate names.
Global Change Master Directory (US/NASA) - Metasearch across NASA subject databases: Agriculture, Atmosphere, Biological Classification, Biosphere, Climate Indicators, Cryosphere, Human Dimensions, Land Surface, Oceans, Paleoclimate, Solid Earth, Spectral/Engineering, Sun-Earth Interactions, Terrestrial Hydrosphere, Data Centers - Locations - Instruments/Sensors - Platforms/Sources - Projects
Goddard Earth Sciences Data and Information Services Center - The GES DISC is the home (archive) of NASA Precipitation and Hydrology, as well as Atmospheric Composition and Dynamics, data and information. The GES DISC also houses the Modern Era Retrospective-Analysis for Research and Applications (MERRA) data assimilation datasets (generated by GSFC’s Global Modeling and Assimilation Office), and the North American Land Data Assimilation System (NLDAS) and Global Land Data Assimilation System (GLDAS) data products (both generated by GSFC's Hydrological Sciences Branch).
IRI/LDEO Climate Data Library (US) - The IRI Data Library is a powerful and freely accessible online data repository and analysis tool that allows a user to view, manipulate, and download over 400 climate-related data sets through a standard web browser. The Data Library contains a wide variety of publicly available data sets, including station and gridded atmospheric and oceanic observations and analyses, model-based analyses and forecasts, and land surface and vegetation data sets, from a range of sources.
Knowledge Network for Biocomplexity - A national network intended to facilitate ecological and environmental research on biocomplexity. For scientists, the KNB is an efficient way to discover, access, interpret, integrate and analyze complex ecological data from a highly-distributed set of field stations, laboratories, research sites, and individual researchers.
Long Term Ecological Research Network - The Network promotes synthesis and comparative research across sites and ecosystems and among other related national and international research programs.
National Climatic Data Center (NCDC) (US) - NCDC develops both national and global data sets that have been used by both government and the private sector to maximize the resource provided by our climate and minimize the risks of climate variability and weather extremes. The Center has a statutory mission to describe the climate of the United States, and NCDC acts as the Nation's Scorekeeper regarding the trends and anomalies of weather and climate.
National Ecological Observatory Network (US) - The National Ecological Observatory Network (NEON) is a continental-scale observatory designed to gather and provide 30 years of ecological data on the impacts of climate change, land use change and invasive species on natural resources and biodiversity.
National Centers for Environmental Information - NOAA's National Centers for Environmental Information (NCEI) hosts and provides public access to one of the most significant archives for environmental data on Earth. Provides over 37 petabytes of comprehensive atmospheric, coastal, oceanic, and geophysical data.
NERC Earth Observation Data Centre (UK) - The role of the NEODC is to ensure the responsible stewardship and distribution of Earth Observation (EO) data acquired and generated by NERC. It also gives guidance on the availability and use of EO data and coordinates the acquisition of new data resources.
Oak Ridge National Laboratory Distributed Active Archive Center (US/ORNL) - The DAAC provides data and information relevant to biogeochemical dynamics, ecological data, and environmental processes, critical for understanding the dynamics relating to the biological, geological, and chemical components of Earth's environment.
Publishing Network for Geoscientific & Environmental Data (Germany) - The information system PANGAEA is operated as an Open Access library aimed at archiving, publishing and distributing georeferenced data from earth system research. The system guarantees long-term availability of its content through a commitment of the operating institutions.
SeaDataNet (France) - The data centres manage large sets of marine and ocean data, originating from their own institutes and from other parties in their country, in a variety of data management systems and configurations. A major objective and challenge in SeaDataNet is to provide an integrated and harmonised overview and access to these data resources, using a distributed network approach.
UNAVCO GPS/GNSS Data - The UNAVCO Facility in Boulder, Colorado is the primary operational activity of UNAVCO and exists to support university and other research investigators in their use of geophysical sensor technology for Earth sciences research.
USGS National Satellite Land Remote Sensing Data Archive - Earth Resources Observation and Science (EROS) Center is a remotely sensed data management, systems development, and research field center for the U.S. Geological Survey's (USGS) Climate and Land Use Change Mission Area. Note that some data access is fee-based.

Health Sciences

Agency for Healthcare Research and Quality (AHRQ) - The Agency for Healthcare Research and Quality (AHRQ) offers robust data sources that may interest you if you're a researcher, clinician, purchaser, policymaker, or consumer.
Centers for Disease Control and Prevention Data & Statistics [CDC] - CDC data and statistics includes access to FASTSTATS A-Z, an alphabetical listing of statistics on topics of public health importance, Health E-Stats, Vital Records, etc.
Comprehensive Epidemiologic Data Resource (CEDR) [DOE] - The Comprehensive Epidemiologic Data Resource (CEDR) is the Department of Energy's (DOE) electronic database comprised of health studies of DOE contract workers and environmental studies of areas surrounding DOE facilities.
Data Bank - The Data Bank, consisting of the National Practitioner Data Bank (NPDB) and the Healthcare Integrity and Protection Data Bank (HIPDB), is a confidential information clearinghouse created by Congress to improve health care quality, protect the public, and reduce health care fraud and abuse in the United States.
Health and Medical Care Archive - The HMCA archives the data from the largest health and health care philanthropy in the United States, the Robert Wood Johnson Foundation, and has a primary goal of understanding health and health care in the US.
Immuno Polymorphism Database - Established by the HLA Informatics Group of the Anthony Nolan Research Institute, IPD provides a centralized system for studying the immune system's polymorphism in genes. It provides access to information related to the sequences of human Killer-cell Immunoglobulin-like Receptors (KIR), sequences of the major histocompatibility complex in a number of species, human platelet antigens (HPA), and tumour cell lines.
International Classification of Diseases - ICD serves as the international standard for diagnostic classification for all general epidemiological, many health management purposes and clinical use.
National Center for Health Statistics - The NCHS collects data from birth and death records, medical records, interview surveys, direct physical exams, and laboratory testing to document the health status of the population; identify disparities in health status and health care use by various population characteristics; monitor trends in health status; identify health problems; support biomedical and health services research; and evaluate the impact of health policies and programs.
Orthopedic Foundation for Animals Records - This repository allows users to search specific dogs and their breeds for certain hereditary canine health concerns.
Scientific Registry of Transplant Recipients - SRTR focuses on large-scale prospective studies, original data collection, and analysis of existing databases to improve patient lives through research in end-stage organ failure and transplantation.
Surveillance Epidemiology and End Results - A premier source for United States cancer statistics, SEER gathers information on incidence, prevalence, and survival from specific geographic areas that represent 28 percent of the population, and compiles related reports as well as reports on national cancer mortality rates.
Veterinary Medical Database - The VMDB started in 1964 as an initiative of the National Cancer Institute for the purpose of studying cancer in animals. Since then, 26 universities have submitted more than 7 million records to this database.
Visible Human Project - The Visible Human Project is the creation of complete, anatomically detailed, three-dimensional representations of the normal male and female human bodies. Acquisition of transverse CT, MR and cryosection images of representative male and female cadavers has been completed.
WHO Global Health Observatory Data Repository - The GHO data repository provides access to over 50 datasets on priority health topics including mortality and burden of diseases, the Millennium Development Goals (child nutrition, child health, maternal and reproductive health, immunization, HIV/AIDS, tuberculosis, malaria, neglected diseases, water, and sanitation), non-communicable diseases and risk factors, epidemic-prone diseases, health systems, environmental health, violence and injuries, and equity, among others.

Humanities

UK Data Archive - "We are open to offers of any data collection which may be of use to social scientists and historians, whether large scale or small scale, in most formats."
Association of Religion Data Archives (ARDA) - The ARDA provides free access to the most authoritative religion data and religion statistics. It is a collection of surveys, polls, and other data submitted by researchers and made available online by the ARDA.
UCL Centre for Digital Humanities - Based at University College London, the Centre for Digital Humanities brings together and hosts research projects from a wide variety of fields of study.
The Institute for Advanced Technology in the Humanities - IATH is a research unit of the University of Virginia with the goal of exploring and developing information technology as a tool for scholarly humanities research.

Physics

HEP (High Energy Physics) Data - Repository for publication-related High-Energy Physics data
NIST Physical Measurement Laboratory - Physical reference data and property tables. The Physical Measurement Laboratory (PML) develops and disseminates the national standards of length, mass, force and shock, acceleration, time and frequency, electricity, temperature, humidity, pressure and vacuum, liquid and gas flow, and electromagnetic, optical, microwave, acoustic, ultrasonic, and ionizing radiation. Its activities range from fundamental measurement research through provision of measurement services, standards, and data.
National Nuclear Data Center - Includes nuclear structure, reaction and decay databases.
National Center for Atmospheric Research (US) - CISL Research Data Archive. Topics include Agriculture, Atmosphere, Biosphere, Climate Indicators, Cryosphere, Hydrosphere, Land Surface, Oceans, Paleoclimate, Solid Earth, Spectral/Engineering, and Sun-Earth Interactions.
National Snow and Ice Data Center (NSIDC) - NSIDC manages and distributes scientific data, creates tools for data access, supports data users, performs scientific research, and educates the public about the cryosphere. NSIDC distributes more than 500 cryospheric data sets for researchers, from both satellite and ground observations.

Social Sciences

openICPSR - Open access repository for data in the social and behavioral sciences, hosted by ICPSR at the University of Michigan.
IQSS Dataverse Network - From the Institute for Quantitative Social Sciences at Harvard University, an easily accessible source for depositing research data.
National Center for Education Statistics - The National Center for Education Statistics (NCES) is the primary federal entity for collecting and analyzing data related to education.
National Data Archive on Child Abuse and Neglect - "The Children’s Bureau and NDACAN share the belief that secondary analysis of archived data is a vital aspect of research. Archiving the data with NDACAN benefits the contributor by preserving the original data and increasing the potential number of author citations. Your contribution also benefits the scientific community as a whole, bringing greater understanding to the study of child well-being through replication and extension of your previous research."
RunMyCode - RunMyCode is a cloud-based platform that enables scientists to openly share the code and data that underlie their research publications. The service is based on the concept of a companion website associated with a scientific publication. The code is run on a cloud server and the results are immediately displayed to the user. It welcomes code and data from multiple research areas, including the social sciences.
Data.gov - An Official Web Site of the United States Government - The purpose of Data.gov is to increase public access to high-value, machine readable datasets generated by the Executive Branch of the Federal Government. As a priority Open Government Initiative for President Obama's administration, Data.gov increases the ability of the public to easily find, download, and use datasets that are generated and held by the Federal Government. Data.gov provides descriptions of the Federal datasets (metadata), information about how to access the datasets, and tools that leverage government datasets. The data catalogs will continue to grow as datasets are added. Federal, Executive Branch data are included in the first version of Data.gov.
UK Data Archive - The UK's largest collection of digital research data in the social sciences and humanities: "We are open to offers of any data collection that may be of use to social scientists and historians, whether large scale or small scale, in most formats."