MATLAB for RNA-seq data: a guide on how to use MATLAB to preprocess RNA-seq data

This is how I use MATLAB to prepare RNA-Seq data for analysis with the standard bowtie program. I'm a novice coder, so please read this as "what I did", not "what you should do". I also welcome suggestions on better ways to handle these data tasks, better MATLAB coding, etc.

My goal in this project is to use RNA-Seq to pinpoint the transcription start site (exactly where in a gene RNA production begins).

To do this, I preserve and purify the unique origin (the 5' end) of the RNA and attach a defined RNA sequence (which I call the 5' RNA Adapter) to that origin. I then convert all the RNA to cDNA (complementary DNA) using a viral enzyme called reverse transcriptase. Next, I amplify the cDNA from only my gene of interest using primers specific for that gene. Finally, I sequence the amplified cDNA and analyze the sequence reads.

As I explained in the first post above, the sequence data is returned to me as FASTQ files. I need to make an embarrassing note here: the code I wrote in the first post works the way I intended. The problem is that I wasn't asking MATLAB to do the right thing: the output format I specified isn't one that the bowtie program (which I wrote about in the second post above) can use.

The code I'm discussing here processes the data in much the same way as the first post, but produces output that bowtie can accept.

Import the files and reverse-complement the second read...

My first task is simply to read the FASTQ data files. MATLAB makes this easy with a built-in function that imports a file and stores the parsed data in what MATLAB calls a structure. I have two files to import because the type of RNA-Seq I'm requesting is paired-end...
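
The import and reverse-complement step could look roughly like the sketch below. This is not the original code from the post: it assumes the built-in function meant here is `fastqread` from the Bioinformatics Toolbox, and it uses `seqrcomplement` for the reverse complement. The file names `read1.fastq` and `read2.fastq`, the variable names, and the two tiny example reads are all made up so the snippet can run on its own. Reversing the quality string alongside the sequence is my own addition, so that each quality score stays matched to its base.

```matlab
% Requires the Bioinformatics Toolbox (fastqread, seqrcomplement).
% The two tiny FASTQ files below are made-up stand-ins for the real
% paired-end files; read1.fastq / read2.fastq are hypothetical names.
workDir = tempname;
mkdir(workDir);
file1 = fullfile(workDir, 'read1.fastq');
file2 = fullfile(workDir, 'read2.fastq');

fid = fopen(file1, 'w');
fprintf(fid, '@pair1/1\nACGTTGCA\n+\nIIIIHHGG\n');
fclose(fid);

fid = fopen(file2, 'w');
fprintf(fid, '@pair1/2\nGGATCCAT\n+\nABCDEFGH\n');
fclose(fid);

% Import each file into a structure array with the fields
% Header, Sequence and Quality (one element per read).
reads1 = fastqread(file1);
reads2 = fastqread(file2);

% Reverse-complement every read in the second file. The quality
% string is reversed too (an assumption of this sketch) so that each
% score still lines up with its base.
reads2rc = reads2;
for k = 1:numel(reads2)
    reads2rc(k).Sequence = seqrcomplement(reads2(k).Sequence);
    reads2rc(k).Quality  = fliplr(reads2(k).Quality);
end

% For the example read, GGATCCAT becomes ATGGATCC.
disp(reads2rc(1).Sequence)
```

With real data, only the two `fastqread` calls and the loop would be needed, pointed at the actual files from the sequencing facility.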

Origin blog.csdn.net/code2day/article/details/131273251