Processing BAM files with pysam

Pysam can be used to read and process BAM files from Python.

Installation:

You can install pysam with pip or conda, e.g. pip install pysam or conda install -c bioconda pysam.

Usage:

Pysam offers many classes; the main ones for reading files are:

  • AlignmentFile: reads BAM/CRAM/SAM alignment files;

  • VariantFile: reads variant call files (VCF or BCF);

  • TabixFile: reads tabix-indexed files through their index;

  • FastaFile: reads FASTA sequence files;

  • FastqFile: reads FASTQ sequencing-read files.

The first two are the most commonly used.

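Example:

import pysam

bf = pysam.AlignmentFile("in.bam", "rb")

Here "r" means read and "b" means binary, i.e. BAM. Region queries such as count() and fetch() additionally need the BAM index file (in.bam.bai), which can be created with samtools index.

bf is iterable: you can read one record at a time with next(bf), or loop over all records with for: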

for i in bf:

    print(i.reference_name, i.pos, i.mapq, i.isize)

Output:

ctg000331_np121 144935 27 -284
ctg000331_np121 144940 48 291
ctg000331_np121 144941 48 309
ctg000331_np121 144944 48 255
ctg000331_np121 144946 27 -370
ctg000331_np121 144947 27 -346

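  • i.reference_name is the name (ID) of the reference sequence, e.g. the chromosome or contig, that the read is aligned to;

  • i.pos is the 0-based leftmost alignment position of the read (an older alias of reference_start);

  • i.mapq is the mapping quality of the read (alias of mapping_quality);

  • i.isize is, for paired-end reads, the insert size of the pair, also called the template or fragment length (alias of template_length); it is negative for the read that lies to the right of its mate.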

 

Some useful methods:

  • check_index()

          Returns True if an index is present for the opened file, False otherwise.

  • close()

          Closes the file; remember to call it when you are done, or open the file in a with block, which closes it automatically.

  • count(self, contig=None, start=None, stop=None, region=None, until_eof=False, read_callback='nofilter', reference=None, end=None)

          Counts the reads that overlap the target region. As elsewhere in pysam, start and stop are 0-based and half-open.

          bf.count(contig="ctg000331_np121", start=1, stop=6000)
          24

  • count_coverage(self, contig=None, start=None, stop=None, region=None, quality_threshold=15, read_callback='all', reference=None, end=None)

          Computes per-base coverage in the target region. Returns a tuple of four arrays, one each for A, C, G and T. Each array has stop - start entries (99 for the call below), and each entry is the number of reads carrying that base at that position; bases with a quality below quality_threshold are not counted.

          bf.count_coverage(contig="ctg000331_np121", start=1, stop=100)

  • fetch(self, contig=None, start=None, stop=None, region=None, tid=None, until_eof=False, multiple_iterators=False, reference=None, end=None)

          Fetches all reads aligned to the target region. Returns an iterator of AlignedSegment objects, which you can consume with a for loop or one at a time with next(); each read object provides the attributes and methods described above for further queries.

          allreads = bf.fetch(contig="ctg000331_np121", start=1, stop=10000)

  • get_index_statistics(self)

          Uses the BAM index to report, for each chromosome (contig), the number of mapped and unmapped reads, as a list of named tuples with the fields contig, mapped, unmapped and total.

          bf.get_index_statistics()

Source: www.cnblogs.com/zhanmaomao/p/11990448.html