Converting SRA data to FASTQ and renaming the files

With the SRA data moved into our working directory, we can start converting SRA to FASTQ.
Before running the full job, first confirm that the command works on a single test sample. This is critical: the conversion is very slow even when it succeeds, so a mistake in the command wastes a lot of time.

Take the first sample as a test:

fastq-dump --gzip --split-3 -O ./ SRR5315196.sra

fastq-dump, a tool in sra-tools, converts SRA files to FASTQ format. The --split-3 option handles paired-end data automatically: for a paired-end (PE) run it splits the reads into _1.fastq and _2.fastq files, and for a single-end (SE) run no splitting is done and a single file is written. The --gzip option compresses the FASTQ output to save space.
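To confirm which layout the test run produced, one can look at which files --split-3 left behind. The helper below is a minimal sketch: the name fastq_layout is made up for illustration, and it assumes the default output names with --gzip (ID_1.fastq.gz and ID_2.fastq.gz for paired-end, ID.fastq.gz for single-end).

```shell
# Report whether fastq-dump --split-3 left paired-end or single-end output.
# fastq_layout is a hypothetical helper name, not part of sra-tools.
# Usage: fastq_layout RUN_ID [DIR]
fastq_layout() {
    local id="$1" dir="${2:-.}"
    if [ -f "$dir/${id}_1.fastq.gz" ] && [ -f "$dir/${id}_2.fastq.gz" ]; then
        echo "paired"
    elif [ -f "$dir/${id}.fastq.gz" ]; then
        echo "single"
    else
        echo "missing"
    fi
}
```

For the test sample this would be called as fastq_layout SRR5315196, which prints paired, single, or missing.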
Once the single-sample test succeeds, we write a loop to convert the whole batch.

cat >sra.sh   # write the script: paste the next line, then press Ctrl-D
ls SRR* | while read id; do (fastq-dump --gzip --split-3 -O ./ ${id} 1>./${id}.sra.log 2>&1); done   # loop over all samples
nohup bash sra.sh &   # run in the background

Be sure to run this step in the background (nohup cmd &), because it is very slow. Otherwise, if your connection drops, the job dies with it and the work is lost.
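Because the batch runs unattended for a long time, it can help to preview the exact commands first. The sketch below is one way to do that, not the author's script: it wraps the same fastq-dump call in a function and reads the program name from a variable called DUMP (a name made up here), so setting DUMP=echo turns it into a dry run that writes the command lines into the log files instead of converting anything.

```shell
# Program to run for each sample; DUMP is a hypothetical variable for this sketch.
# Leave it unset for a real run, or set DUMP=echo for a dry run.
DUMP="${DUMP:-fastq-dump}"

# convert_all [DIR]
# Convert every SRR*.sra file in DIR (default: current folder), one log per sample.
convert_all() {
    local dir="${1:-.}" f id
    for f in "$dir"/SRR*.sra; do
        [ -e "$f" ] || continue          # no .sra files matched the pattern
        id=$(basename "$f" .sra)         # SRR5315196.sra -> SRR5315196
        "$DUMP" --gzip --split-3 -O "$dir" "$f" > "$dir/${id}.sra.log" 2>&1
    done
}
```

After a dry run, each SRRxxx.sra.log holds the command line that the real run would execute for that sample.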
Let's walk through the loop in this script:
ls SRR* lists all the SRR data downloaded from NCBI in the current folder.
fastq-dump --gzip --split-3 -O ./ ${id} tells the system to convert each SRR file to FASTQ format and compress it. --split-3 splits paired-end data automatically and leaves single-end data unaffected. -O ./ writes the output to the current folder with the file name prefix unchanged (for example, SRR5315196.fastq.gz).
1>./${id}.sra.log 2>&1 redirects the output (0 is standard input, 1 is standard output, 2 is standard error): standard output goes to a file in the current folder, such as SRR5315197.sra.sra.log, and 2>&1 sends standard error to the same file.
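The redirection can be tried on its own without fastq-dump. In this small sketch, noisy is a made-up stand-in for any tool that writes to both streams, and the log goes to a temporary file.

```shell
# A stand-in for a tool that writes to both streams (hypothetical, for illustration).
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

log=$(mktemp)
# 1> sends standard output to the file; 2>&1 then points standard error at the same place.
noisy 1>"$log" 2>&1
cat "$log"
```

Both lines end up in the log. The order of the two redirections matters: written the other way round (2>&1 1>file), standard error would still go to the terminal.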
The SRA files have now been converted to FASTQ successfully.

Making the config file
Why make a config file: the SRA data downloaded from NCBI is converted to fastq.gz format, and throughout this process the only thing that distinguishes the samples is the original file name (the SRR number). File naming matters in bioinformatics analysis. If nothing is changed, the conversion directly produces files named SRR****.fastq.gz, and from the SRR number alone we cannot tell which sample is which. We should be able to tell what the data is from the file name.
First, go back to the NCBI SRA page where the data was downloaded and download a txt file (SraRunTable.txt).
After downloading it to your computer, upload it to the server.
After the upload succeeds, run:
head -1 SraRunTable.txt | tr '\t' '\n' | cat -n
This takes the first row (the header) of the file, puts each column name on its own line, and adds line numbers.
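With the column names known, the config file can be built and then used for renaming. The following is only a sketch under stated assumptions, not the author's procedure: the function names make_config and rename_fastq are made up, the table is assumed to be tab-separated (as the tr command above implies), and the column names Run and Sample_Name are examples that should be checked against the header listing for your own table.

```shell
# make_config TABLE RUN_COL NAME_COL
# Print "run<TAB>name" for every data row, finding both columns by header text.
make_config() {
    awk -F '\t' -v run="$2" -v name="$3" '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == run)  r = i
                if ($i == name) n = i
            }
            if (!r || !n) { print "column not found" > "/dev/stderr"; exit 1 }
            next
        }
        { print $r "\t" $n }
    ' "$1"
}

# rename_fastq CONFIG [DIR]
# Rename RUN.fastq.gz, RUN_1.fastq.gz and RUN_2.fastq.gz to use the sample name.
rename_fastq() {
    local config="$1" dir="${2:-.}" run name f suffix tab
    tab=$(printf '\t')
    while IFS="$tab" read -r run name; do
        for f in "$dir/$run".fastq.gz "$dir/$run"_[12].fastq.gz; do
            [ -e "$f" ] || continue             # pattern did not match a file
            suffix="${f#"$dir/$run"}"           # e.g. _1.fastq.gz
            mv -n "$f" "$dir/$name$suffix"      # -n: never overwrite an existing file
        done
    done < "$config"
}

# Example (column names are assumptions; adjust to your header):
#   make_config SraRunTable.txt Run Sample_Name > config
#   rename_fastq config
```

Looking the columns up by name rather than by number keeps the sketch usable if the table has a different column order. It is worth inspecting the config file before renaming, since duplicate or empty sample names would cause files to be skipped or misnamed.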

Origin blog.csdn.net/zhihaoyi/article/details/96150755