Setup:
# Installing lumpy-sv with conda failed with this error:
# CondaHTTPError: HTTP 000 CONNECTION FAILED for url <http://mirrors.ustc.edu.cn/anaconda/pkgs/free/noarch/repodata.json>
# Elapsed: -
# An HTTP error occurred when trying to retrieve this URL.
# HTTP errors are often intermittent, and a simple retry will get you on your way.
# Cause of the error:
# The default channel is too slow to reach, so requests time out and updates and downloads fail. As a result conda cannot create virtual environments or install third-party packages.
# Solution:
# Replace the default channel with the Tsinghua mirror and delete the default channel. This is done by editing the .condarc file.
# The Tsinghua mirror was found to have stopped service in 2019; the USTC mirror can be used instead.
# Installation:
conda install -c bioconda lumpy-sv
# View the rc file location:
conda config --show-sources
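The channel swap described above can be sketched as a small script that writes a minimal .condarc. This is only an illustration: `write_condarc` is a hypothetical helper, the `https` channel URL is assumed from the host and path in the error message, and the file is written to a scratch path rather than to the real `~/.condarc`.

```shell
# write_condarc PATH: write a minimal conda config that lists only a mirror channel.
# Hypothetical helper; the channel URL below is an assumption based on the
# mirror path shown in the error message and may need adjusting.
write_condarc() {
  target=$1
  cat > "$target" <<'EOF'
channels:
  - https://mirrors.ustc.edu.cn/anaconda/pkgs/free/
show_channel_urls: true
EOF
}

# Write to a scratch file first; copy it to ~/.condarc only after reviewing it.
write_condarc ./condarc.example
cat ./condarc.example
```

Because the default channel is simply not listed, conda would only query the mirror for packages that are not requested from an explicit channel such as `-c bioconda`.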
Create a Python 2.7 virtual environment:
# Create the Python 2.7 virtual environment used by lumpy-sv
conda create -n python2.7 python=2.7
source activate python2.7
# If activation fails, run source activate with the full path of the virtual environment
conda install -c bioconda lumpy-sv
conda install -c bioconda samblaster
Data preprocessing before lumpy-sv analysis:
# Data preprocessing before lumpy analysis:
bwa mem -R "@RG\tID:${sample_id}\tSM:${SAMPLE_T}\tLB:lib" ${REF} ${data_dir}/S008_dnahezi-A_HX20-${sample_id}-cfDNA_AHYHJHDSXX_S1_L001_R1_001.R1.clean.fastq.gz ${data_dir}/S008_dnahezi-A_HX20-${sample_id}-cfDNA_AHYHJHDSXX_S1_L001_R1_001.R2.clean.fastq.gz | samblaster --excludeDups --addMateTags --maxSplitCount 2 --minNonOverlap 20 | samtools view -S -b - > ${result_dir}/${SAMPLE_T}.bam
# bwa command:
# "@RG\tID:$sample\tSM:$sample\tLB:WES\tPL:Illumina" is the read group information used to distinguish different samples, where
# ID is the unique ID of each read group,
# SM is the sample name,
# LB is the library name,
# PL is the name of the sequencing platform, for example Illumina or PacBio.
# --maxSplitCount INT: maximum number of split alignments for a read to be included in the splitter file.
# --minNonOverlap INT: minimum non-overlapping base pairs between two alignments for a read to be included in the splitter file.
# samblaster command:
# After alignment finishes, the output is processed with samblaster,
# mainly to mark the abnormal alignments in the BAM file for the later steps.
samtools view -b -F 1294 sample.bam | samtools sort - > sample.discordants.sorted.bam
# Extract the discordant alignments.
# -F 1294: samtools flag value 1294
# 1294 decodes to "PROPER_PAIR, UNMAP, MUNMAP, SECONDARY, DUP".
# -F means that records carrying any of these flags are excluded from the output,
# so the records kept must meet the following requirements:
# not PROPER_PAIR: a proper pair is one the aligner considers correctly aligned to the genome, on the same chromosome and in the expected orientation; the common flag values are 83/147 and 99/163
# not UNMAP and not MUNMAP, i.e. both the read and its mate must align to the reference genome
# not SECONDARY, i.e. the record must be a primary alignment
# not DUP, i.e. the record must not be a PCR or optical duplicate
samtools view -h sample.bam | scripts/extractSplitReads_BwaMem -i stdin | samtools view -Sb - | samtools sort - > sample.splitters.sorted.bam
# Use the extractSplitReads_BwaMem script that ships with lumpy to extract the split reads. If the conda installation of lumpy does not contain the script, download the source package from GitHub.
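To check what a flag value such as 1294 excludes, the bits can be decoded with plain shell arithmetic. The sketch below is illustrative only: `decode_flag` is a hypothetical helper, and the bit names follow the SAM specification in the spelling that samtools uses.

```shell
# decode_flag N: print the comma-separated names of the SAM FLAG bits set in N.
# Hypothetical helper; the names are listed in bit order, starting at 0x1.
decode_flag() {
  flag=$1
  out=""
  bit=1
  for name in PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY; do
    if [ $(( flag & bit )) -ne 0 ]; then
      out="${out:+$out,}$name"
    fi
    bit=$(( bit * 2 ))
  done
  echo "$out"
}

decode_flag 1294   # PROPER_PAIR,UNMAP,MUNMAP,SECONDARY,DUP
decode_flag 99     # PAIRED,PROPER_PAIR,MREVERSE,READ1
```

If samtools is installed, `samtools flags 1294` reports the same decomposition.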
lumpy analysis:
lumpyexpress -B tumor.bam,normal.bam -S tumor.splitters.bam,normal.splitters.bam -D tumor.discordants.bam,normal.discordants.bam -o tumor_normal.vcf
# lumpyexpress performs the variant calling. It can run in tumor-only mode or in tumor-normal paired mode. The command above is a template: adapt the file names to your own data before running it.
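The two modes differ only in how many samples are passed in the comma-separated lists. A minimal sketch, assuming each sample has files named PREFIX.bam, PREFIX.splitters.bam and PREFIX.discordants.bam: `build_lumpyexpress_cmd` is a hypothetical helper that only prints the command, so it can be reviewed before it is run.

```shell
# build_lumpyexpress_cmd OUT.vcf PREFIX [PREFIX...]
# Hypothetical helper: prints (does not run) a lumpyexpress command for one or
# more samples, assuming the PREFIX.bam / PREFIX.splitters.bam /
# PREFIX.discordants.bam naming used in this example.
build_lumpyexpress_cmd() {
  out=$1
  shift
  bams=""
  splitters=""
  discordants=""
  for prefix in "$@"; do
    bams="${bams:+$bams,}${prefix}.bam"
    splitters="${splitters:+$splitters,}${prefix}.splitters.bam"
    discordants="${discordants:+$discordants,}${prefix}.discordants.bam"
  done
  echo "lumpyexpress -B $bams -S $splitters -D $discordants -o $out"
}

# Tumor-normal paired mode
build_lumpyexpress_cmd tumor_normal.vcf tumor normal
# Tumor-only mode
build_lumpyexpress_cmd tumor_only.vcf tumor
```

Printing the command first keeps the sketch side-effect free; the printed line can be pasted into the shell once the file names are confirmed.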