RNA-seq analysis-database

Disclaimer: this post is not original work. I reorganized it to make my own study easier; please refer to the original text.

NCBI-SRA database and EBI-ENA database

Most high-throughput sequencing data in the published literature is uploaded to a public database so that others can download, study and reanalyze it. Of these databases, NCBI's SRA is the most widely used, while EBI's ENA makes downloading data considerably more convenient, so it is worth understanding both before downloading any files.

NCBI and EBI, together with DDBJ, are members of the INSDC (International Nucleotide Sequence Database Collaboration), and data submitted to any of the three databases is shared among them. The three members are:

  • NCBI: National Center for Biotechnology Information
  • EBI: European Bioinformatics Institute
  • DDBJ: DNA Data Bank of Japan

SRA database: Sequence Read Archive

  • It is a database that stores high-throughput sequencing data, alignment information and metadata. Essentially all high-throughput sequencing data in published papers is uploaded to this database, which belongs to NCBI.

  • Metadata of the SRA database: data describing the sequencing experiments and their samples, such as the experimental purpose, experimental design, sequencing platform and sample data (species, strain, individual phenotype, etc.). In the SRA database, metadata is organized into the following levels:
    [1] Study: the accession number of a study begins with the prefix DRP, ERP or SRP.
    [2] Sample: the accession number of a sample begins with the prefix DRS, ERS or SRS. Sample information may include species, strain, family information, phenotype data, clinical data, tissue type, etc.
    [3] Experiment: the accession number of an experiment begins with the prefix DRX, ERX or SRX. The experiment is the most basic unit of the SRA database, just as each article is the basic unit of the PubMed database. An experiment is part of a study. One or more samples are sequenced, and the resulting sequencing data is stored in the SRA database in the form of runs.
    [4] Run: sequence data, including sequences and quality information, is stored in the SRA database with the run as the unit. The accession number of a run begins with the prefix DRR, ERR or SRR.
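
The prefix scheme above can be sketched in a few lines of Python. This is only an illustration, assuming the common pattern of a three-letter prefix followed by digits; the function name `classify_accession` and the lookup tables are hypothetical helpers, not part of any SRA or ENA tool. The first letter indicates the archive that issued the accession (D for DDBJ, E for EBI, S for NCBI) and the third letter indicates the metadata level.

```python
import re

# First letter of the prefix: which INSDC member issued the accession.
ARCHIVES = {"D": "DDBJ", "E": "EBI", "S": "NCBI"}
# Third letter of the prefix: which metadata level the accession refers to.
LEVELS = {"P": "study", "S": "sample", "X": "experiment", "R": "run"}

# Three-letter prefix such as SRR or ERX, followed by digits (assumed pattern).
ACCESSION_RE = re.compile(r"^([DES])R([PSXR])(\d+)$")


def classify_accession(accession):
    """Return (archive, level) for an accession, or None if it does not match."""
    match = ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        return None
    archive_letter, level_letter, _digits = match.groups()
    return ARCHIVES[archive_letter], LEVELS[level_letter]
```

For example, `classify_accession("SRR1553425")` returns `("NCBI", "run")`, while a string that does not follow the pattern returns `None`.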

ENA database: European Nucleotide Archive

  • ENA belongs to EBI and is similar in function to SRA, but its search interface is more user-friendly, and it makes it easier to download fastq files and to check the integrity of downloaded data, so it is strongly recommended as the first choice.

  • Advantages of the ENA database
    [1] You can obtain fastq files directly.
    [2] You can confirm the integrity of the downloaded data. Biological data sets are large, so downloads take a long time, and network fluctuations or interruptions during the download can leave the downloaded file incomplete. Such problems are generally hard to detect right after the data is obtained. The ENA database provides an MD5 checksum for each file so that its integrity can be checked.
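
The integrity check described above can be sketched as follows. This is a minimal Python sketch, assuming the expected MD5 value has already been copied from the ENA page; the function names `md5_of_file` and `is_download_intact` are hypothetical. The file is read in chunks so that a large fastq file does not have to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_download_intact(path, expected_md5):
    """Return True if the file's MD5 matches the expected checksum."""
    # Normalize the expected value: surrounding whitespace and case do not matter.
    return md5_of_file(path) == expected_md5.strip().lower()
```

If `is_download_intact` returns `False` for a downloaded file, the file is incomplete or corrupted and should be downloaded again.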

  • ENA database usage

First, enter the target SRA accession number in the search bar at the upper right corner of the database page, confirm, and wait a moment for the result page.

Next, click Experiment to see the information for all sequencing runs of that experiment.

Here we can see the two runs belonging to the experiment, and the FTP address for downloading each fastq file is available directly in the FASTQ files (FTP) column.

Copy the FTP address from the FASTQ files (FTP) column to download the fastq files directly.
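
Once the FTP address has been copied, the download itself can be scripted. The following is a rough Python sketch under two assumptions that are not stated in the text above: that the copied value may list several files separated by `;` (for example, paired-end reads), and that entries may lack the `ftp://` scheme. The function names `fastq_urls` and `download_all` are hypothetical.

```python
import os
import urllib.request


def fastq_urls(ftp_field, scheme="ftp"):
    """Split a copied 'FASTQ files (FTP)' value into full URLs.

    Assumption: entries are separated by ';' and may lack the scheme prefix.
    """
    urls = []
    for entry in ftp_field.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        if "://" not in entry:
            entry = f"{scheme}://{entry}"
        urls.append(entry)
    return urls


def download_all(ftp_field, dest_dir="."):
    """Download every file listed in the field and return the local paths."""
    paths = []
    for url in fastq_urls(ftp_field):
        local_path = os.path.join(dest_dir, os.path.basename(url))
        # urlretrieve handles ftp:// URLs as well as http(s):// ones.
        urllib.request.urlretrieve(url, local_path)
        paths.append(local_path)
    return paths
```

After downloading, each file can be checked against the MD5 value that ENA lists for it.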

Origin blog.csdn.net/qq_44520665/article/details/113743765