0088-[Bioinformatics Software]-How to use idx and tbi indexes with GATK4

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/leadingsci/article/details/83622881

Downloading the GATK resource bundle

Download location: https://software.broadinstitute.org/gatk/download/bundle

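After downloading the bundle:
the hg19 VCFs are gzip-compressed (.gz suffix) and come with .idx index files;
the hg38 VCFs are gzip-compressed (.gz suffix) and come with .tbi index files.
An .idx file is a Tribble index for an uncompressed .vcf, so GATK cannot use it together with the .vcf.gz as downloaded. A .tbi (tabix) index, by contrast, works directly on a bgzip-compressed .vcf.gz.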
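A quick way to see which kind of .gz a bundle file is: bgzip (BGZF) output is still valid gzip, but it sets the FEXTRA header flag and carries a "BC" extra subfield, and tabix/.tbi indexing requires that format. The sketch below is a header-byte heuristic, not a full validator; gz_kind is a hypothetical helper name, and it assumes the BC subfield is the first extra subfield (which is how bgzip writes it). Whether a particular hg19 .vcf.gz is plain gzip is best checked this way rather than assumed.

```shell
# Hypothetical helper: classify a file as BGZF (bgzip), plain gzip, or not gzip.
# Assumes the BGZF "BC" extra subfield starts at byte 12, as bgzip writes it.
gz_kind() {
  local hdr
  # First 14 bytes as one continuous hex string, e.g. 1f8b0804...4243
  hdr=$(od -An -tx1 -N14 "$1" | tr -d ' \n')
  if [ "${hdr:0:4}" != "1f8b" ]; then
    echo "not-gzip"          # no gzip magic number
  elif [ "${hdr:0:8}" = "1f8b0804" ] && [ "${hdr:24:4}" = "4243" ]; then
    echo "bgzf"              # FEXTRA flag set and "BC" subfield present
  else
    echo "gzip"              # ordinary gzip stream
  fi
}
```

For example, gz_kind dbsnp_138.hg19.vcf.gz prints one of bgzf, gzip or not-gzip; a .tbi index only makes sense for files reported as bgzf.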

Running the command

Running the command directly on the VCF files as downloaded from the bundle failed with an error saying that the index could not be read.

/opt/conda/bin/gatk --java-options "-Xmx2G" BaseRecalibrator \
  -R /Bio/Database/UCSC/hg19/hg19.fa \
  -I /opt/script/pipeline/thalaflow/call/D180001/D180001.sorted.markdup.bam \
  --known-sites /Bio/Database/GATK/bundle/hg19/1000G_phase1.indels.hg19.sites.vcf.gz \
  --known-sites /Bio/Database/GATK/bundle/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz \
  --known-sites /Bio/Database/GATK/bundle/hg19/dbsnp_138.hg19.vcf.gz \
  -O /opt/script/pipeline/thalaflow/call/D180001/D180001.sorted.markdup.recal_data.table \
  >> /opt/script/pipeline/thalaflow/script/run_call_snp.log 2>&1
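The error means GATK found no usable index next to a --known-sites file. Before launching a long BaseRecalibrator run, a rough pre-flight check can list which resources lack an index. This sketch assumes the usual htsjdk naming convention (a .vcf.gz is paired with <file>.vcf.gz.tbi, a plain .vcf with <file>.vcf.idx); expected_index and check_known_sites are hypothetical helpers, not GATK commands.

```shell
# Hypothetical helper: print the index path GATK is assumed to look for.
expected_index() {
  case "$1" in
    *.vcf.gz) echo "$1.tbi" ;;   # tabix index for a bgzip-compressed VCF
    *.vcf)    echo "$1.idx" ;;   # Tribble index for an uncompressed VCF
    *)        return 1 ;;        # not a VCF name this sketch handles
  esac
}

# Hypothetical helper: report known-sites files whose expected index is missing.
# Returns non-zero if any file is missing an index or has an unsupported name.
check_known_sites() {
  local f idx missing=0
  for f in "$@"; do
    idx=$(expected_index "$f") || { echo "unsupported: $f"; missing=1; continue; }
    if [ ! -e "$idx" ]; then
      echo "missing index: $idx"
      missing=1
    fi
  done
  return "$missing"
}
```

Run against the freshly downloaded hg19 files (check_known_sites /Bio/Database/GATK/bundle/hg19/*.vcf.gz), it would report a missing .tbi for each one, since the bundle only ships .vcf.idx.gz files there.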

Solution

1. Decompress the VCF

My first attempt was tar -zxf, which did not work: a .vcf.gz is a single gzip-compressed file, not a tar archive.
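Whether a .gz actually wraps a tar archive can be checked directly: POSIX and GNU tar headers store the magic string ustar at byte offset 257, whereas a .vcf.gz decompresses to plain VCF text. A minimal heuristic sketch; is_tar_gz is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit 0) if a .gz file wraps a tar archive.
# Tar headers store the magic "ustar" at byte offset 257 (bytes 258-262).
is_tar_gz() {
  local magic
  magic=$(gunzip -c "$1" 2>/dev/null | head -c 262 | tail -c 5)
  [ "$magic" = "ustar" ]
}
```

For instance, is_tar_gz 1000G_phase1.indels.hg19.sites.vcf.gz || echo "not a tar archive, use gunzip" tells you which tool applies before trying tar.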

Once it is decompressed correctly, the file can be viewed directly with cat; it is plain text rather than binary.

gunzip 1000G_phase1.indels.hg19.sites.vcf.gz
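By default gunzip replaces the .gz file with the decompressed one. If the original download should be kept, a minimal alternative is to decompress into a new file with gunzip -c. The function name decompress_vcf_keep is hypothetical, and the final check assumes a standard VCF whose first line starts with ##fileformat=VCF, which is a quick way to confirm the "plain text, viewable with cat" result.

```shell
# Hypothetical helper: decompress FILE.vcf.gz to FILE.vcf while keeping the .gz.
decompress_vcf_keep() {
  local src="$1" dst="${1%.gz}"
  [ "$src" != "$dst" ] || { echo "not a .gz name: $src" >&2; return 1; }
  gunzip -c "$src" > "$dst" || return 1
  # A correctly decompressed VCF starts with a plain-text "##fileformat=VCF" line.
  head -n 1 "$dst" | grep -q '^##fileformat=VCF'
}
```

Usage: decompress_vcf_keep 1000G_phase1.indels.hg19.sites.vcf.gz && head -n 3 1000G_phase1.indels.hg19.sites.vcf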

2. Decompress the index file

gunzip 1000G_phase1.indels.hg19.sites.vcf.idx.gz
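Because the .vcf and its .vcf.idx are only useful together, it can help to decompress them as a pair so neither is forgotten. This sketch assumes the hg19 bundle naming used in steps 1 and 2 (<name>.vcf.gz alongside <name>.vcf.idx.gz); decompress_vcf_and_idx is a hypothetical helper name.

```shell
# Hypothetical helper: gunzip NAME.vcf.gz and, if present, NAME.vcf.idx.gz,
# leaving NAME.vcf and NAME.vcf.idx side by side (hg19 bundle naming assumed).
decompress_vcf_and_idx() {
  local vcfgz="$1" idxgz
  case "$vcfgz" in
    *.vcf.gz) ;;
    *) echo "expected a .vcf.gz file: $vcfgz" >&2; return 1 ;;
  esac
  idxgz="${vcfgz%.gz}.idx.gz"            # x.vcf.gz -> x.vcf.idx.gz
  gunzip "$vcfgz" || return 1
  if [ -e "$idxgz" ]; then
    gunzip "$idxgz" || return 1
  else
    echo "warning: $idxgz not found; the index must be created separately" >&2
  fi
}
```

Usage: decompress_vcf_and_idx 1000G_phase1.indels.hg19.sites.vcf.gz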

3. Test

/opt/conda/bin/gatk --java-options "-Xmx2G" BaseRecalibrator \
  -R /Bio/Database/UCSC/hg19/hg19.fa \
  -I /opt/script/pipeline/thalaflow/call/D180001/D180001.sorted.markdup.bam \
  --known-sites /Bio/Database/GATK/bundle/hg19/temp/1000G_phase1.indels.hg19.sites.vcf \
  -O /opt/script/pipeline/thalaflow/call/D180001/D180001.sorted.markdup.recal_data.table
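With several resources, the repeated --known-sites options can be assembled from the decompressed files instead of typed by hand. A minimal bash sketch, assuming every *.vcf in the directory should be used as a known-sites resource; the known_sites_args helper, the sample.bam name and the output file name in the commented example are illustrative only.

```shell
# Hypothetical helper: print "--known-sites <file>" tokens, one per line,
# for every uncompressed VCF in the given directory.
known_sites_args() {
  local f
  for f in "$1"/*.vcf; do
    [ -e "$f" ] || continue          # glob matched nothing
    printf '%s\n' --known-sites "$f"
  done
}

# Example (not run here; needs bash 4+ for mapfile):
# mapfile -t ks < <(known_sites_args /Bio/Database/GATK/bundle/hg19/temp)
# /opt/conda/bin/gatk --java-options "-Xmx2G" BaseRecalibrator \
#   -R /Bio/Database/UCSC/hg19/hg19.fa -I sample.bam "${ks[@]}" -O recal_data.table
```

Collecting the tokens into an array and expanding it as "${ks[@]}" keeps each path a single argument.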

The test ran and wrote the output successfully.

4. Batch decompression

gunzip *.gz
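Decompressing every .gz file is fine in the hg19 directory, but running the same command in the hg38 directory would also strip the bgzip compression that the .tbi indexes rely on. A more cautious sketch, assuming any X.vcf.gz that already has an X.vcf.gz.tbi next to it should be left compressed; batch_gunzip_safe is a hypothetical helper name.

```shell
# Hypothetical helper: gunzip every *.gz in a directory, but skip any
# X.vcf.gz that already has X.vcf.gz.tbi (tabix needs the bgzip file as-is).
batch_gunzip_safe() {
  local f
  for f in "$1"/*.gz; do
    [ -e "$f" ] || continue                 # no .gz files at all
    if [ -e "$f.tbi" ]; then
      echo "skip (tabix-indexed): $f"
      continue
    fi
    gunzip "$f" || return 1                 # also handles *.vcf.idx.gz
  done
}
```

Usage: batch_gunzip_safe /Bio/Database/GATK/bundle/hg19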


Reposted from blog.csdn.net/leadingsci/article/details/83622881