Biomedical Big Data - Proteogenomics: Mass Spectrometry-Based Annotation
Proteomics and the other omics fields validate one another. Proteogenomics (protein genomics) was originally used for genome annotation and was later extended to the relationship between proteins and the transcriptome, for example alternative splicing. At the same time, proteomics relies on genome annotation for verification. Many studies do not label themselves as proteogenomics, although they belong to that field.
Existing problems in proteogenomics:
Genome annotation methods: 1. De novo prediction. 2. Supporting evidence from the corresponding transcriptome. 3. Homology alignment against genome databases.
Problems in genome annotation:
For specific gene structures:
- The start site is difficult to determine from positioning signals
- Promoter and regulatory binding sites are determined with a high error rate
- Termination sites are not annotated
- Alternative splicing prediction requires more training data. De novo methods may predict all of the genes confirmed in a genome, but their intron and exon predictions are only about 60% correct.
For the genome in general:
- Second-generation sequencing reads are short, so assembly is difficult
- Earlier work dealt with genomes that were already characterized; current work deals with genomes of unknown species
- Similar data give different results under different processing methods
- The level of missing persons
A large share of proteomic mass spectrometry data has not been fully identified. Possible reasons:
- The variety of protein modifications
- Differences between search engines
- Other charge states were not considered
- Incomplete databases
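To see why modifications and charge states matter for identification, it helps to compute where a peptide's precursor signal is expected. The sketch below is illustrative only: it uses standard monoisotopic residue masses, and the function names and the `mod_shift` parameter are assumptions made for this example, not part of any search engine.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once per peptide (free N- and C-termini)
PROTON = 1.007276   # mass of each charge-carrying proton


def peptide_mass(sequence, mod_shift=0.0):
    """Neutral monoisotopic mass; mod_shift is the summed mass of any modifications."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + mod_shift


def mz(sequence, charge, mod_shift=0.0):
    """Expected m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(sequence, mod_shift) + charge * PROTON) / charge


def expected_mz(sequence, charges=(1, 2, 3), mod_shift=0.0):
    """m/z for each charge state a search would need to consider."""
    return {z: round(mz(sequence, z, mod_shift), 4) for z in charges}


if __name__ == "__main__":
    print(expected_mz("PEPTIDE"))
    # The same peptide with one phosphorylation (+79.96633 Da) lands elsewhere.
    print(expected_mz("PEPTIDE", mod_shift=79.96633))
```

A search that only tests one charge state, or that ignores the mass shift of a modification, looks for the precursor at the wrong m/z and leaves the spectrum unidentified.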
Other possible causes of spectra that are not fully identified:
- Noise
- Experimental contamination. Adding known contaminant proteins to the search confirms the contamination, and the matching spectra are discarded once identified. This is a quality-control problem that needs improvement.
- Mixed (chimeric) spectra, i.e. several peptides in the same spectrum, or fusion peptides
- Proteins not included in the library of known proteins, i.e. novel proteins
Roles of proteogenomics:
1. Correcting gene models, i.e. adding new annotations and new peptides. 2. In turn, adding new genes inferred from the new peptides.
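As a rough illustration of the second role, identified peptides can be split into those already explained by the reference proteome and those that are candidates for new annotations. This is a minimal sketch under simple assumptions: exact substring matching, leucine and isoleucine treated as equivalent because they have the same mass, and hypothetical function names.

```python
def normalize(seq):
    # Leucine and isoleucine have the same mass, so treat them as one letter.
    return seq.upper().replace("I", "L")


def split_known_and_novel(peptides, reference_proteins):
    """Return (known, novel): peptides found in a reference protein vs. not found."""
    refs = [normalize(p) for p in reference_proteins]
    known, novel = [], []
    for pep in peptides:
        target = known if any(normalize(pep) in r for r in refs) else novel
        target.append(pep)
    return known, novel


if __name__ == "__main__":
    known, novel = split_known_and_novel(["TAYLAK", "GGWPEP"], ["MKTAYIAKQR"])
    print("known:", known)
    print("novel:", novel)
```

Novel peptides found this way are only candidates; they would still need to be mapped back to the genome and validated before a gene model is changed.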
Ways in which proteogenomics needs to improve
Protein experiments:
- Improve protein separation (fractionation) techniques
- Add enrichment techniques to increase the amount of protein available
- Use higher-precision instruments
- Increase sample diversity by collecting samples from different locations and time points
Database and data processing:
A large part of the usual problem is that databases contain a lot of noise and many similar sequences, and the search space of a six-frame translation is almost as large as that of de novo sequencing. Limiting exon size and using target-decoy (forward and reverse) libraries can improve this. Quality also has to be assessed when multiple peptide segments are identified. Remaining difficulties are high false-positive rates, wide dynamic range, and alternative splicing, which is hard to identify because databases built from de novo splice prediction are large and re-assembly produces many false positives.
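The size of the six-frame search space and the target-decoy idea can be sketched in a few lines. This is an illustrative example, not a production pipeline: it assumes the standard genetic code, and `min_length`, the score lists, and the simple decoys/targets ratio are assumptions chosen for the example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    # Codons not in the table (e.g. containing N) become X; '*' marks a stop.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translate(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, reverse) for offset in range(3)]


def candidate_orfs(dna, min_length=7):
    """Stop-to-stop segments from all six frames, kept only if long enough."""
    return [seg for frame in six_frame_translate(dna)
            for seg in frame.split("*") if len(seg) >= min_length]


def decoy_fdr(scores_target, scores_decoy, threshold):
    """Simple FDR estimate: decoy hits divided by target hits above threshold."""
    targets = sum(s >= threshold for s in scores_target)
    decoys = sum(s >= threshold for s in scores_decoy)
    return decoys / targets if targets else 0.0


if __name__ == "__main__":
    print(six_frame_translate("ATGGCCTAA"))
    print(decoy_fdr([10, 9, 8, 3], [7, 2], threshold=5))
```

Every nucleotide sequence yields six protein strings, most of which are never expressed, which is why the search space grows so quickly. Raising `min_length` is one way to shrink the database at the cost of missing short products.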
Search speed:
Retrieval speed can be improved by filtering out low-quality spectra, clustering similar spectra, and running the search in parallel or on more powerful computers.
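The filtering and clustering steps can be sketched as follows. This is a minimal example under stated assumptions: a spectrum is a list of (m/z, intensity) pairs, "low quality" is approximated by a hypothetical minimum peak count, and clustering is a simple greedy pass using cosine similarity on binned peaks. Real tools use more refined rules.

```python
import math
from collections import defaultdict


def is_informative(peaks, min_peaks=5):
    # Hypothetical quality rule: keep spectra with at least min_peaks peaks.
    return len(peaks) >= min_peaks


def binned(peaks, bin_width=1.0):
    """Sum intensities into fixed-width m/z bins."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz / bin_width)] += intensity
    return vec


def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster_spectra(spectra, min_similarity=0.8):
    """Greedy clustering: a spectrum joins the first cluster whose representative it resembles."""
    clusters = []  # list of (representative_vector, [spectrum indices])
    for index, peaks in enumerate(spectra):
        if not is_informative(peaks):
            continue  # dropped before any search is run
        vec = binned(peaks)
        for rep, members in clusters:
            if cosine(vec, rep) >= min_similarity:
                members.append(index)
                break
        else:
            clusters.append((vec, [index]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    a = [(100.2, 10), (200.1, 20), (300.3, 5), (400.4, 8), (500.5, 12)]
    b = [(100.3, 11), (200.2, 19), (300.1, 5), (400.2, 9), (500.6, 12)]
    c = [(150.0, 10), (250.0, 20), (350.0, 5), (450.0, 8), (550.0, 12)]
    d = [(100.0, 1)]
    print(cluster_spectra([a, b, c, d]))
```

Only one representative per cluster then needs to be searched, and the clusters are independent of each other, so they can be distributed across workers.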
New methods:
- Add RNA-seq alignment for multi-evidence validation
- Use a variety of methods to identify subcellular fractions
- Verify by in vitro transcription and translation
- N-terminomics: isolating terminal peptides by diagonal chromatography
- Protein identification aided by DNA and RNA data
- Ribosome profiling