Anyone who works in bioinformatics has to download gene or protein sequences. If you only need one species, gene, protein or RNA, that's easy: just click and download. But when the list of species, genes or proteins is long, downloading them one by one is tedious and impractical. That is when we should think like programmers and let the machine tirelessly do the mechanical work for us!
Here is a real-world task: download the proteome FASTA sequence databases of 31 species. We'll solve it with Linux shell programming. The approach generalizes, so you can adapt the code to your own problem. My code is as follows:
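#!/usr/bin/env bash
# Download one proteome FASTA file for each URL listed (one per line) in Species_fasta_websites.txt
while IFS= read -r w || [ -n "$w" ]
do
    [ -z "$w" ] && continue    # skip blank lines
    echo "The website of the currently downloaded species sequence database is:"
    echo "$w"
    # Extract the taxonomy ID from URLs of the form ...taxonomy:<taxid>&...
    website="${w##*:}"         # drop everything up to the last ':'
    taxid="${website%%&*}"     # drop everything from the first '&'
    # -c resumes an interrupted download; -O names the file after the taxonomy ID
    wget -c -O "Uniprot_taxonomy_$taxid.fasta.gz" "$w"
    echo
    echo "Species $taxid sequence database download finished!"
done < Species_fasta_websites.txt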
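To see how the two parameter expansions pull the taxonomy ID out of a URL, here is a small standalone check. The URL is a hypothetical example (example.org, not a real UniProt address) that only mimics the `...taxonomy:<taxid>&...` shape the script relies on; your real URLs may look different.

```shell
#!/usr/bin/env bash
# Hypothetical URL: only the "taxonomy:<taxid>&" part matters for the extraction.
w="https://www.example.org/proteomes?query=taxonomy:9606&format=fasta&compressed=true"

# ${w##*:} deletes the longest prefix ending in ':', i.e. everything up to the LAST colon.
# "https:" also contains a colon, but the longest match runs past it.
website="${w##*:}"
echo "$website"    # 9606&format=fasta&compressed=true

# ${website%%&*} deletes the longest suffix starting with '&', i.e. everything from the FIRST '&'.
taxid="${website%%&*}"
echo "$taxid"      # 9606
```

If your URLs contain another ':' after the taxonomy ID, or encode the colon as `%3A`, this extraction returns the wrong string, so it is worth testing it on one URL before running the full loop.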
Save the code above in a file named downloader.sh. In the same folder, put a file named Species_fasta_websites.txt that lists all the URLs to download, one per line; the script then loops over this file and downloads each URL with standard Linux commands. Of course, you could implement the same thing in R, Python, Perl, Java, Julia, C or another language; whatever language you use, the goal is the same!
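Before launching 31 downloads, it can help to check the URL list first. The sketch below is a hypothetical dry-run helper (the function name `preview_downloads` is my own, not part of the original script). Assuming the same `...taxonomy:<taxid>&...` URL shape, it applies the same taxonomy ID extraction as downloader.sh, prints the file name each URL would produce, counts the URLs, and reports any duplicate taxonomy IDs, since those would make two downloads target the same file.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: show the taxonomy ID and output file for each URL, without downloading.
# Assumes the same "...taxonomy:<taxid>&..." URL shape that downloader.sh relies on.
preview_downloads() {
    local list="$1" w website taxid
    while IFS= read -r w || [ -n "$w" ]; do
        w="${w%$'\r'}"              # tolerate Windows (CRLF) line endings
        [ -z "$w" ] && continue     # skip blank lines
        website="${w##*:}"          # drop everything up to the last ':'
        taxid="${website%%&*}"      # drop everything from the first '&'
        printf '%s\t%s\n' "$taxid" "Uniprot_taxonomy_$taxid.fasta.gz"
    done < "$list"
}

list="${1:-Species_fasta_websites.txt}"
if [ -f "$list" ]; then
    preview_downloads "$list"
    # Number of non-empty lines: should match the number of species you expect
    echo "URLs: $(grep -c . "$list")"
    # Print any taxonomy ID that occurs more than once
    preview_downloads "$list" | cut -f1 | sort | uniq -d
fi
```

Run it as `bash preview.sh` (or `bash preview.sh my_urls.txt` for another list) and compare the printed IDs with your species list before starting downloader.sh.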
The contents of Species_fasta_websites.txt are shown in the screenshot below: