NCBI Reference Sequences (RefSeq)

For basic background on RefSeq, see the following articles:

http://liucheng.name/381/

http://www.biosino.org/pages/ncbi-10.htm

Official version: http://www.ncbi.nlm.nih.gov/RefSeq/RSfaq.html

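What I care about more right now, though, is the description of the RefSeq accession format. The failures I ran into at this stage reminded me that, when analyzing data, you have to understand exactly what each data field means.

The table below is kept here for easy reference.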

| Accession | Molecule | Method @ | Note | Summary |
|---|---|---|---|---|
| AC_123456 | Genomic | Mixed | Alternate complete genomic molecule. This prefix is used for records that are provided to reflect an alternate assembly or annotation. Primarily used for viral, prokaryotic records. | Genomic sequence, mainly viral and prokaryotic. |
| AP_123456 | Protein | Mixed | Protein products; alternate protein record. This prefix is used for records that are provided to reflect an alternate assembly or annotation. The AP_ prefix was originally designated for bacterial proteins but this usage was changed. | Protein sequence; AP_ was originally used only for bacterial proteins. |
| NC_123456 | Genomic | Mixed | Complete genomic molecules including genomes, chromosomes, organelles, plasmids. | Complete genomic sequence, including organelles, plasmids, etc. |
| NG_123456 | Genomic | Mixed | Incomplete genomic region; supplied to support the NCBI genome annotation pipeline. Represents either non-transcribed pseudogenes, or larger regions representing a gene cluster that is difficult to annotate via automatic methods. | Incomplete genomic sequence. |
| NM_123456, NM_123456789 | mRNA | Mixed | Transcript products; mature messenger RNA (mRNA) transcripts. | Mature mRNA. |
| NP_123456, NP_123456789 | Protein | Mixed | Protein products; primarily full-length precursor products but may include some partial proteins and mature peptide products. | Full-length protein sequence; may also include partial proteins or mature peptides. |
| NR_123456 | RNA | Mixed | Non-coding transcripts including structural RNAs, transcribed pseudogenes, and others. | Non-coding RNA, transcribed pseudogenes, or others. |
| NT_123456 | Genomic | Automated | Intermediate genomic assemblies of BAC and/or Whole Genome Shotgun sequence data. | Genomic sequence assembled from BAC or shotgun data. |
| NW_123456, NW_123456789 | Genomic | Automated | Intermediate genomic assemblies of BAC or Whole Genome Shotgun sequence data. | Genomic sequence assembled from BAC or shotgun data. |
| NZ_ABCD12345678 | Genomic | Automated | A collection of whole genome shotgun sequence data for a project. Accessions are not tracked between releases. The first four characters following the underscore (e.g. 'ABCD') identify a genome project. | 'ABCD' identifies the specific genome project. |
| XM_123456, XM_123456789 | mRNA | Automated | Transcript products; model mRNA provided by a genome annotation process; sequence corresponds to the genomic contig. | Transcript sequence. |
| XP_123456, XP_123456789 | Protein | Automated | Protein products; model proteins provided by a genome annotation process; sequence corresponds to the genomic contig. | Protein sequence. |
| XR_123456 | RNA | Automated | Transcript products; model non-coding transcripts provided by a genome annotation process; sequence corresponds to the genomic contig. | Non-coding transcript sequence. |
| YP_123456, YP_123456789 | Protein | Mixed | Protein products; no corresponding transcript record provided. Primarily used for bacterial, viral, and mitochondrial records. | Protein sequence with no corresponding transcript record; used for bacteria, viruses, and mitochondria. |
| ZP_12345678 | Protein | Automated | Protein products; annotated on NZ_ accessions (often via computational methods). | Protein sequence annotated on the corresponding NZ_ nucleotide record. |
| NS_123456 | Genomic | Automated | Genomic records that represent an assembly which does not reflect the structure of a real biological molecule. The assembly may represent an unordered assembly of unplaced scaffolds, or it may represent an assembly of DNA sequences generated from a biological sample that may not represent a single organism. | A more complicated case. |
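
Since the whole point of the table is to know what a record is before analyzing it, a small lookup can be handy. The following is only a minimal sketch: the names `MOLECULE_BY_PREFIX`, `ACCESSION_RE`, and `molecule_type` are made up for illustration, the mapping is copied by hand from the table above, and tolerating a trailing `.N` version suffix is an assumption that the table itself does not show.

```python
import re

# Prefix -> molecule type, copied by hand from the table above.
MOLECULE_BY_PREFIX = {
    "AC": "Genomic", "NC": "Genomic", "NG": "Genomic", "NT": "Genomic",
    "NW": "Genomic", "NZ": "Genomic", "NS": "Genomic",
    "NM": "mRNA", "XM": "mRNA",
    "NR": "RNA", "XR": "RNA",
    "AP": "Protein", "NP": "Protein", "XP": "Protein",
    "YP": "Protein", "ZP": "Protein",
}

# Two uppercase letters, an underscore, then the identifier body.
# The optional ".N" version suffix is an assumption, not part of the table.
ACCESSION_RE = re.compile(r"^([A-Z]{2})_([A-Z0-9]+)(?:\.\d+)?$")


def molecule_type(accession):
    """Return the molecule type for a RefSeq-style accession, or None if unrecognized."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    return MOLECULE_BY_PREFIX.get(match.group(1))


if __name__ == "__main__":
    for acc in ["NM_123456", "XP_123456789", "NZ_ABCD12345678", "not-an-accession"]:
        print(acc, "->", molecule_type(acc))
```

Returning `None` for anything unrecognized keeps unknown prefixes visible instead of silently guessing a type.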

@ Method:
Mixed: indicates the process flow includes both automated processing and expert review for some of the records; curation analysis may be provided either by NCBI staff or collaborators. In short, some of these records have been manually checked by experts.
Automated: indicates records that are not individually reviewed; updates are released in bulk for a genome. In short, these records are annotated automatically.
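
When working through a list of accessions, it can help to separate records that may have had expert review from purely automated ones. Here is a rough sketch under the assumption that the prefix alone decides the group, as the Method column of the table suggests; `MIXED_PREFIXES`, `AUTOMATED_PREFIXES`, and `split_by_method` are hypothetical names used only for this example.

```python
# Prefix sets taken from the Method column of the table.
MIXED_PREFIXES = {"AC", "AP", "NC", "NG", "NM", "NP", "NR", "YP"}
AUTOMATED_PREFIXES = {"NT", "NW", "NZ", "XM", "XP", "XR", "ZP", "NS"}


def split_by_method(accessions):
    """Group accessions into 'Mixed', 'Automated', or 'Unknown' by their prefix."""
    groups = {"Mixed": [], "Automated": [], "Unknown": []}
    for acc in accessions:
        # Everything before the first underscore is treated as the prefix.
        prefix = acc.split("_", 1)[0]
        if prefix in MIXED_PREFIXES:
            groups["Mixed"].append(acc)
        elif prefix in AUTOMATED_PREFIXES:
            groups["Automated"].append(acc)
        else:
            groups["Unknown"].append(acc)
    return groups


if __name__ == "__main__":
    sample = ["NM_123456", "XM_123456", "ZP_12345678", "YP_123456"]
    for method, members in split_by_method(sample).items():
        print(method, members)
```

Note that "Mixed" only means some records with that prefix were reviewed, so this grouping is a coarse filter rather than a guarantee of curation.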

For more: http://www.ncbi.nlm.nih.gov/RefSeq/key.html#accession

Reposted from bbsunchen.iteye.com/blog/657603