Write your own Linux shell script to download sequence databases (2020-11-17)

Anyone working in biology or bioinformatics sooner or later has to download gene or protein sequences. For a single species, gene, protein, or RNA, clicking a download link is fine. But when the list of species, genes, or proteins is long, downloading them by hand one by one is tedious. This is where a programming mindset helps: let the machine carry out the mechanical task tirelessly for us!

Here is a real task: we need to download the FASTA sequence databases for the proteomes of 31 species. We solve it with Linux shell programming. The method and the idea generalize, so you can adapt the code to your own problem. My code is as follows:

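#!/usr/bin/env bash

# Read one URL per line; the "|| [ -n ... ]" part also handles a last line
# that has no trailing newline.
while IFS= read -r w || [ -n "$w" ]
do
    # Skip blank lines
    [ -z "$w" ] && continue
    echo "The website of the currently downloaded species sequence database is:"
    echo "$w"
    # Keep only what follows the last ':' in the URL
    website=${w##*:}
    # Cut everything from the first '&' onward, leaving the taxonomy ID
    taxid=${website%%&*}
    # -c continues a partially downloaded file; -O sets the output file name
    if wget -c -O "Uniprot_taxonomy_$taxid.fasta.gz" "$w"
    then
        echo "Species $taxid sequence database download finished!"
    else
        echo "Species $taxid sequence database download failed!" >&2
    fi
done < Species_fasta_websites.txt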
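The two parameter expansions that pull the taxonomy ID out of each URL can be tried on their own before running any download. The following is a minimal sketch: the function name `extract_taxid` and the example.org URL are made up for illustration, and it assumes that the ID sits between the last `:` and the first `&` after it, which is what the loop above relies on.

```shell
#!/usr/bin/env bash

# Print the taxonomy ID embedded in a URL, using the same two
# parameter expansions as the download loop.
extract_taxid() {
    local url="$1"
    # ${url##*:} removes the longest prefix that ends in ':'
    local tail="${url##*:}"
    # ${tail%%&*} removes the longest suffix that starts with '&'
    printf '%s\n' "${tail%%&*}"
}

# Hypothetical URL, for illustration only
extract_taxid 'https://example.org/uniprot/?query=taxonomy:9606&format=fasta&compress=yes'
```

For this URL the function prints `9606`. If your URLs place the ID somewhere else, the two patterns need to be adjusted accordingly.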

Save the code above in a file named downloader.sh. In the same folder, the file Species_fasta_websites.txt records all the URLs to be downloaded, and the script loops over them with standard Linux commands, downloading each one in turn. Of course, you could implement the same thing in R, Python, Perl, Java, Julia, C, or another language. Whichever language you use, the goal is the same!

The original post shows a screenshot of the contents of Species_fasta_websites.txt at this point; the image is not reproduced here.
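Since the actual contents of the file are not available as text, here is a rough sketch of how such a URL list could be generated from a list of taxonomy IDs. Everything in it is an assumption: the function name `make_urls` is invented, and the example.org URL pattern is a placeholder that only mimics the shape the script expects (the ID after the last `:` and before the next `&`). Replace it with the pattern your sequence source really uses.

```shell
#!/usr/bin/env bash

# Print one download URL per taxonomy ID passed as an argument.
# The URL pattern below is hypothetical, not a real service endpoint.
make_urls() {
    local taxid
    for taxid in "$@"; do
        printf 'https://example.org/uniprot/?query=taxonomy:%s&format=fasta&compress=yes\n' "$taxid"
    done
}

# Example NCBI taxonomy IDs: 9606 (human), 10090 (mouse), 7227 (fruit fly)
make_urls 9606 10090 7227
```

Redirecting the output into Species_fasta_websites.txt gives a file with one URL per line, which can then be processed by running `bash downloader.sh` in the same folder.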

 

 


Origin blog.csdn.net/qq_32391411/article/details/109746894