Subsampling reads in batches with seqtk

The number of reads can differ widely between samples: some have hundreds of thousands of reads, others only tens of thousands. In that case, seqtk is commonly used to subsample the reads.
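Before subsampling, it can help to see how many reads each sample actually contains. The following is a minimal sketch, assuming uncompressed FASTQ files with exactly four lines per record and a `.fq` extension; `count_reads` is a hypothetical helper name, not part of seqtk.

```shell
# Count records in an uncompressed FASTQ file.
# Assumption: every record occupies exactly 4 lines (no wrapped sequences).
count_reads() {
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Print "file<TAB>read count" for every .fq file in the current directory.
for f in *.fq; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    printf '%s\t%s\n' "$f" "$(count_reads "$f")"
done
```

For gzipped input, the same count could be obtained by piping `gzip -dc` into `wc -l` instead of reading the file directly.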

The commonly used extraction modes are:

Specify the number of records (10000) to extract:

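# Write to a new .gz file; redirecting to the input name would truncate it before seqtk reads it
seqtk sample -s 100 sample1.fq 10000 | gzip > sample1.sub.fq.gz

seqtk sample -s 100 sample2.fq 10000 | gzip > sample2.sub.fq.gz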

Extract a fraction of the reads (0.6):

# Write to a new .gz file so the input is not overwritten
seqtk sample -s 100 sample1.fq 0.6 | gzip > sample1.sub.fq.gz

seqtk sample -s 100 sample2.fq 0.6 | gzip > sample2.sub.fq.gz

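When multiple samples need to be processed, use a loop:

# Create the output directory first; match only .fq files so the temp directory itself is skipped
mkdir -p temp; for f in *.fq; do seqtk sample -s 100 "$f" 0.5 | gzip > "temp/$f.gz"; done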

One open question remains: when extracting by fraction, the number of reads returned is sometimes not exactly the original read count × 0.6. I don't understand why yet. If anyone knows, please leave a comment. Thank you!
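A possible explanation, offered as an assumption rather than a confirmed answer: when the argument is below 1, seqtk appears to keep each read independently with that probability instead of first computing an exact target count. Under that assumption the output size is only approximately reads × 0.6 and varies with the seed. The awk simulation below illustrates the effect; it does not call seqtk, and `simulate_fraction` is a hypothetical helper name.

```shell
# Simulate keeping each of n records independently with probability frac.
# This mimics per-read random sampling; it is not seqtk itself.
# Arguments: $1 = total records, $2 = fraction, $3 = random seed
simulate_fraction() {
    awk -v n="$1" -v frac="$2" -v seed="$3" 'BEGIN {
        srand(seed)
        kept = 0
        for (i = 0; i < n; i++) if (rand() < frac) kept++
        print kept
    }'
}

# Two seeds typically give two slightly different counts, both close to 60000.
simulate_fraction 100000 0.6 100
simulate_fraction 100000 0.6 101
```

If an exact number of reads is required, passing an integer count (as in the 10000 example) instead of a fraction is the safer choice.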

Origin blog.csdn.net/whiteof/article/details/130387271