Biomedical datasets

There’s a lot of publicly accessible data out there, so why not keep a list to have it all handy? The best sites have nice interfaces but also let you easily download and work with the data. Here are some resources, not yet in any logical order.


Downloadable data tables:

Source | Link | Description
gnomAD | http://gnomad.broadinstitute.org/ | Human polymorphisms observed in genome / exome data
GTEx | https://gtexportal.org/home/ | Tissue RNA-seq data. Download the large data files and search them with Bash
ClinVar | https://www.ncbi.nlm.nih.gov/clinvar/ | Interpretations of genetic variants
DepMap | https://depmap.org/portal/ | Mutations in cell lines
cBioPortal | http://www.cbioportal.org/ | Cancer genomics data
COSMIC | https://cancer.sanger.ac.uk/cosmic | Catalogue of Somatic Mutations in Cancer
Genomics of Drug Sensitivity in Cancer | https://www.cancerrxgene.org/ | Cell-line specific sensitivities to compounds
NIH RePORTER | https://projectreporter.nih.gov/reporter.cfm | Information on NIH-funded grants
Human Protein Atlas | https://www.proteinatlas.org/about/download | Different types of data about cells
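
The GTEx entry above suggests downloading the large data files and searching them with Bash. For anyone who would rather stay in Python, here is a minimal sketch of the same idea: stream a large (optionally gzipped) delimited file line by line and keep only the rows that match. The function name `find_rows` and its parameters are made up for illustration, and the sketch assumes a tab-delimited file with one record per line, which may not hold for every download listed here.

```python
import gzip
from typing import Iterator, List


def find_rows(path: str, query: str, column: int = 0,
              delimiter: str = "\t") -> Iterator[List[str]]:
    """Yield rows whose field at index `column` equals `query`.

    Hypothetical helper: assumes a delimited text file with one record
    per line. Files ending in .gz are decompressed on the fly.
    """
    # Pick the opener by extension; both accept text mode and an encoding.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        # Iterating over the handle reads one line at a time, so the
        # whole file is never held in memory.
        for line in handle:
            fields = line.rstrip("\r\n").split(delimiter)
            if len(fields) > column and fields[column] == query:
                yield fields
```

With a hypothetical file called `expression.tsv.gz` that holds gene symbols in its second column, `list(find_rows("expression.tsv.gz", "TP53", column=1))` would collect the matching rows. Because the function is a generator, it can also be looped over directly to stop at the first hit.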


Additional types of data:

Source | Link | Description
Protein Data Bank | http://www.rcsb.org/ | Molecular structures
The Human Protein Atlas | https://www.proteinatlas.org/ | Protein localization
Clinical Genome Resource | https://www.clinicalgenome.org/ | Clinical information for genes
Genetic Testing Registry | https://www.ncbi.nlm.nih.gov/gtr/ | Registry of genetic tests submitted by providers
EVcouplings | https://evcouplings.org/ | Evolutionary coupling data
IntAct | https://www.ebi.ac.uk/intact/ | Protein interactions
TimeTree | http://www.timetree.org/ | Timescales of organisms
PheWAS | https://phewascatalog.org/ | Phenome-wide association studies
FPbase | https://www.fpbase.org/ | Fluorescent proteins
PaxDb: Protein Abundance Database | https://pax-db.org/ | Protein abundance (Mass Spec)
PEP Tracker | https://peptracker.com/ | More Mass Spec
Gene Ontology Consortium | http://geneontology.org/ | Gene / Protein associations
denovo-db | http://denovo-db.gs.washington.edu/ | Tables of de novo variants seen in trios
DECIPHER | https://decipher.sanger.ac.uk/ | Clinical genome
DGIdb | http://dgidb.org | Druggable targets?
Mouse Genome Informatics | http://www.informatics.jax.org/ | Information on genes (in mice)
Human Phenotype Ontology | https://hpo.jax.org/app/ | Standardized vocabulary of phenotypic abnormalities in human disease


Virus data:

Source | Link | Description
HIV databases at Los Alamos National Laboratory | https://www.hiv.lanl.gov | HIV genetic sequences and immunological epitopes
Virus Pathogen Resource | https://www.viprbrc.org | Viral sequences (general)