PDB protein database URL, plus 30 protein database sites

A protein database is a database that stores protein-related information. Such databases collect, organize and store large amounts of protein data, including sequence, structure, function, interactions, expression patterns and disease associations. They provide retrieval, query and analysis functions for these data and are important resources for researchers, bioinformaticians and drug developers.

The content of a protein database usually comes from data measured in the laboratory, for example by protein sequencing, X-ray crystallography, nuclear magnetic resonance (NMR) and mass spectrometry. After verification and standardization, the data are integrated into the database so that researchers can easily access and use them in their work.

The following are the commonly used protein databases and URLs summarized by the author for your reference.

⓪BioXFinder: BioXFinder is presented as the first and only biological database in China. It contains more than 500,000 high-quality, non-redundant protein records that integrate data from multiple sources and are manually annotated, covering basic protein information, sequence, sequence features, function, names and lineage, subcellular localization, diseases and variants, post-translational modifications, expression, interactions, and more.

Protein structure library: contains more than 190,000 protein structures determined by X-ray single-crystal diffraction, nuclear magnetic resonance, electron diffraction and other experimental methods. Each record includes the protein 3D structure, basic information, experimental data, references, etc.

BioXFinder: https://bio.bcpmdata.com/


①UniProt: UniProt is a comprehensive protein database that provides sequence, structure, function, interaction and annotation information of a large number of proteins. It integrates data from multiple sources, including Swiss-Prot, TrEMBL and PIR databases.

UniProt: https://www.uniprot.org/
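As a rough illustration of programmatic access, the sketch below builds a FASTA download URL for a UniProtKB entry and parses the returned text. It assumes the `rest.uniprot.org/uniprotkb/<accession>.fasta` address pattern; the function names and the accession `P69905` mentioned afterwards are illustrative choices, not something taken from the list above.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base address of UniProt's REST service for UniProtKB entries.
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def uniprot_fasta_url(accession):
    """Build the FASTA download URL for one UniProtKB accession."""
    return f"{UNIPROT_REST}/{quote(accession)}.fasta"


def parse_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def fetch_uniprot_fasta(accession, timeout=30):
    """Download and parse one entry (requires network access)."""
    with urlopen(uniprot_fasta_url(accession), timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Calling `fetch_uniprot_fasta("P69905")` should return a one-entry dictionary for that accession if the service is reachable; the parsing step can also be used on FASTA files saved from the website.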

②Protein Data Bank (PDB): PDB is a database that stores the structures of proteins and other biological macromolecules. It provides three-dimensional coordinate data of experimentally determined protein structures, which can be used in fields such as structural biology research, drug design and molecular simulation.

Protein Data Bank (PDB): https://www.rcsb.org/
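To make the "three-dimensional coordinate data" concrete, here is a minimal sketch that reads atom coordinates from a legacy PDB-format file. It assumes the fixed-column layout of ATOM/HETATM records and the `files.rcsb.org/download/<ID>.pdb` download pattern; the `Atom` type and function names are hypothetical, and very large structures may only be distributed as mmCIF, which this sketch does not handle.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    residue: str   # three-letter residue name, e.g. "MET"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float       # orthogonal coordinates in angstroms
    y: float
    z: float


def pdb_download_url(pdb_id):
    """Build the assumed download URL for a legacy PDB-format file."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def parse_atoms(pdb_text):
    """Extract ATOM/HETATM records using the fixed column positions."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(
                Atom(
                    name=line[12:16].strip(),
                    residue=line[17:20].strip(),
                    chain=line[21],
                    res_seq=int(line[22:26]),
                    x=float(line[30:38]),
                    y=float(line[38:46]),
                    z=float(line[46:54]),
                )
            )
    return atoms
```

A file saved from `pdb_download_url("4HHB")` (an example ID) can be read with `open(...).read()` and passed to `parse_atoms`. For real work a dedicated parser such as the one in Biopython is likely a better choice than hand-written column slicing.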

③ NCBI Protein: NCBI Protein is a protein database provided by the National Center for Biotechnology Information (NCBI), which contains a large amount of protein sequence data, and can be used for basic protein information query and comparative analysis.

NCBI Protein: https://www.ncbi.nlm.nih.gov/protein/
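For batch queries, NCBI sequences can also be requested through the E-utilities `efetch` endpoint. The sketch below only builds the request URL; it assumes the standard `db`, `id`, `rettype` and `retmode` parameters, and the accession strings used in the usage note are placeholders.

```python
from urllib.parse import urlencode

# Assumed address of the E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_protein_url(accessions, rettype="fasta"):
    """Build an efetch URL that requests several protein records as plain text."""
    params = {
        "db": "protein",              # the NCBI Protein database
        "id": ",".join(accessions),   # comma-separated accessions or UIDs
        "rettype": rettype,           # e.g. "fasta" or "gp" (GenPept)
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"
```

For example, `efetch_protein_url(["NP_000508.1", "NP_000509.1"])` yields a URL that can be opened with `urllib.request.urlopen` or pasted into a browser. NCBI limits the request rate for clients without an API key, so scripts should pause between calls.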

④ Ensembl: Ensembl is a comprehensive genome annotation database that contains genome sequences, gene structures, transcripts, and protein information of multiple species. It provides a genome browser and analysis tools for researchers to conduct genome research and comparative genomics analysis.

Ensembl: https://www.ensembl.org/

⑤Swiss-Prot: Swiss-Prot is a manually annotated protein database (the reviewed section of UniProt) that provides high-quality protein sequences and annotation information. It contains detailed annotations of protein functions, domains, modifications, subcellular localization, etc., and provides rich references.

Swiss-Prot: https://www.uniprot.org/uniprot/?query=reviewed:yes

⑥RefSeq: RefSeq is a comprehensive protein and nucleic acid sequence database provided by NCBI, which contains reference sequences of multiple species. It provides high-quality gene and protein sequences, annotation information, and references for genomics, genetics, and bioinformatics research.

RefSeq: https://www.ncbi.nlm.nih.gov/refseq/

⑦STRING: STRING is a protein interaction database that integrates protein interaction information from multiple data sources, including experimentally verified interactions, computationally predicted interactions, and literature-reported interactions. It provides visualization and analysis tools for studying protein interaction networks and functional modules.

STRING: https://string-db.org/
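Interaction networks from STRING can be retrieved as tab-separated text and filtered by confidence score. The following sketch assumes the `/api/tsv/network` endpoint with `identifiers` and `species` parameters, and assumes the response has `preferredName_A`, `preferredName_B` and `score` columns; the function names and the 0.7 threshold are illustrative.

```python
import csv
import io
from urllib.parse import urlencode


def string_network_url(identifiers, species=9606):
    """Build a network request URL; identifiers are joined with carriage returns."""
    params = {"identifiers": "\r".join(identifiers), "species": species}
    return "https://string-db.org/api/tsv/network?" + urlencode(params)


def strong_edges(tsv_text, min_score=0.7):
    """Return (protein A, protein B, score) for edges at or above min_score."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    edges = []
    for row in reader:
        score = float(row["score"])
        if score >= min_score:
            edges.append((row["preferredName_A"], row["preferredName_B"], score))
    return edges
```

The species value 9606 is the NCBI taxonomy identifier for human. Text downloaded from `string_network_url(["TP53", "MDM2"])` can be passed directly to `strong_edges`; if the column names differ in the actual response, the dictionary keys need to be adjusted.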

⑧InterPro: InterPro is a protein family and domain annotation database that integrates information from multiple annotation resources. It provides functional and domain annotations of protein sequences, helping researchers understand protein function and structure.

InterPro: https://www.ebi.ac.uk/interpro/

⑨Pfam: Pfam is a protein family database that provides annotation information of protein families and domains in multiple species. It is based on multiple sequence alignments and hidden Markov models for protein function prediction and annotation.

Pfam: https://pfam.xfam.org/

⑩SMART: SMART is a protein structure and functional domain annotation database, which provides annotation information of protein domains in multiple species. It helps researchers understand the function of proteins and the evolutionary relationship of domains.

SMART: http://smart.embl-heidelberg.de/

⑪KEGG: KEGG is a bioinformatics resource that includes information on genomes, genes, proteins, metabolic pathways, and diseases. It provides information on protein sequences, functional annotations, metabolic pathways, and signaling pathways for the study of biological systems and drug development.

KEGG: https://www.genome.jp/kegg/

⑫NCBI GenBank: NCBI GenBank is a comprehensive nucleic acid sequence database that contains genome, mRNA and protein sequences from different species. It provides a large amount of nucleic acid sequence data and related annotation information, which can be used in genomics, genetics and bioinformatics research.

NCBI GenBank: https://www.ncbi.nlm.nih.gov/genbank/

⑬NCBI RefSeq: NCBI RefSeq is a comprehensive reference sequence database provided by NCBI, which contains genomes, transcripts and protein sequences of multiple species. It provides high-quality gene and protein sequences, annotation information, and references for genomics, genetics, and bioinformatics research.

NCBI RefSeq: https://www.ncbi.nlm.nih.gov/refseq/

⑭NCBI Conserved Domain Database (CDD): NCBI CDD is a protein conserved domain database, which is used to identify conserved domains and functional modules in protein sequences. It integrates information from multiple domain databases to provide domain annotation and function prediction of protein sequences.

NCBI Conserved Domain Database (CDD): https://www.ncbi.nlm.nih.gov/cdd/

⑮NCBI Protein Clusters: NCBI Protein Clusters is a protein clustering database that clusters similar protein sequences together to form protein families. It is based on sequence similarity and clustering algorithms for annotation and function prediction of protein families.

NCBI Protein Clusters: https://www.ncbi.nlm.nih.gov/proteinclusters/

⑯NCBI Structure: NCBI Structure is a protein structure database provided by NCBI, which contains experimentally determined protein three-dimensional structure data. It provides three-dimensional coordinates of protein structure, domain annotation and function prediction, which can be used in structural biology research and drug design.

NCBI Structure: https://www.ncbi.nlm.nih.gov/structure/

⑰NCBI COG (Clusters of Orthologous Groups): NCBI COG is a database of orthologous protein groups, used to identify orthologs across different species. It is based on protein sequence similarity and functional conservation among species, and is used to study the evolutionary relationships and functional annotation of proteins.

NCBI COG (Clusters of Orthologous Groups): https://www.ncbi.nlm.nih.gov/COG/

⑱NCBI GEO (Gene Expression Omnibus): NCBI GEO is a repository of gene expression data, including gene expression profile data from different experiments. It provides raw data and analysis results of gene expression profiles, which can be used to study expression patterns of gene regulation and biological processes.

NCBI GEO (Gene Expression Omnibus): https://www.ncbi.nlm.nih.gov/geo/

⑲NCBI SRA (Sequence Read Archive): NCBI SRA is a high-throughput sequencing data repository that contains sequencing data from different experiments. It provides raw sequencing data and related annotation information, which can be used in genomics, transcriptomics, and variation analysis.

NCBI SRA (Sequence Read Archive): https://www.ncbi.nlm.nih.gov/sra/

⑳NCBI dbSNP (Single Nucleotide Polymorphism Database): NCBI dbSNP is a single nucleotide polymorphism database that collects single nucleotide variation information in humans and other species. It provides annotation and frequency information of single nucleotide polymorphisms for the study of genetic variation and disease-associated genetic variation.

NCBI dbSNP (Single Nucleotide Polymorphism Database): https://www.ncbi.nlm.nih.gov/snp/

㉑NCBI ClinVar: NCBI ClinVar is a clinically relevant genetic variation database that collects genetic variation information related to human diseases. It provides the clinical significance of genetic variation, associated diseases, and relevant literature for research into the diagnosis and treatment of genetic diseases.

NCBI ClinVar: https://www.ncbi.nlm.nih.gov/clinvar/

㉒NCBI dbGaP (Database of Genotypes and Phenotypes): NCBI dbGaP is a genotype and phenotype database for storing and sharing data of human genetic research. It contains genotype, phenotype and clinical data, which can be used to study genetic variation and the genetic basis of complex diseases.

NCBI dbGaP (Database of Genotypes and Phenotypes): https://www.ncbi.nlm.nih.gov/gap/

㉓PANTHER (Protein ANalysis THrough Evolutionary Relationships): PANTHER is a protein family and functional annotation database, which predicts protein function based on the evolutionary relationship between species. It provides information on protein families, functional annotations, and evolutionary relationships for the study of protein function and evolution.

PANTHER (Protein ANalysis THrough Evolutionary Relationships): http://www.pantherdb.org/

㉔ SUPERFAMILY: SUPERFAMILY is a protein structure and functional domain database, which classifies and annotates proteins based on the structure and function of domains. It provides annotations and functional predictions of protein domains for studying protein structure and function.

SUPERFAMILY: http://supfam.org/

㉕ PROSITE: PROSITE is a protein domain and motif database for identifying domains and motifs in protein sequences. It performs annotation and function prediction of protein sequences based on sequence patterns and conserved motifs.

PROSITE: https://prosite.expasy.org/

㉖HPRD (Human Protein Reference Database): HPRD is a human protein reference database that provides information on the sequence, structure, function and interaction of human proteins. It integrates information from multiple data sources for the study of human protein functions and interaction networks.

HPRD (Human Protein Reference Database): http://www.hprd.org/

㉗ BioGRID: BioGRID (Biological General Repository for Interaction Datasets) is an interaction database that collects experimentally validated data on protein interactions. It provides data and analysis tools for protein interaction networks to study protein interactions and signaling pathways.

BioGRID: https://thebiogrid.org/

㉘IntAct: IntAct is a protein interaction database that integrates experimentally validated protein interaction data. It provides protein interaction annotation and network visualization tools for studying protein interaction networks and functional modules.

IntAct: https://www.ebi.ac.uk/intact/

㉙Reactome: Reactome is a metabolic pathway and signaling pathway database that provides information on biological processes and molecular interactions in multiple species. It provides detailed annotation and visualization tools for metabolic pathways and signaling pathways for studying biological processes and disease mechanisms.

Reactome: https://reactome.org/

㉚NCBI CDD (Conserved Domain Database): NCBI CDD is a conserved protein domain database that identifies conserved domains and functional modules in protein sequences and provides the corresponding annotations and predictions.

NCBI CDD (Conserved Domain Database): https://www.ncbi.nlm.nih.gov/cdd/

Protein databases play an important role in biological research, protein function prediction, protein structure prediction, drug development and other fields. By using protein databases, researchers can obtain basic information of proteins, interaction relationships, structural domain annotations, function predictions, etc., so as to gain an in-depth understanding of the biological functions and mechanisms of proteins.
