[bioinfo] Fusion detection with FusionMap: analysis workflow and result report


Image via Wikipedia - FusionGene
Image source: https://en.wikipedia.org/wiki/Fusion_gene

Preface

This post covers gene-fusion analysis of RNA-seq data with FusionMap [ FusionMap reference ].

Which tools can detect fusions, and which one performs best? A Q&A on Biostars lists a number of them ( see here ). There are more than 30 fusion detection tools, such as STAR-Fusion, deFuse and FusionCatcher. About 20 of them were published in 2011-2013, including FusionMap (2011). Several benchmark papers also compare these tools and discuss the pros and cons of each, and later papers usually compare their new tool against the previously published ones.

In addition, FusionMap has probably not been updated for a long time. It is maintained as part of the Oshell toolkit.

Principle of FusionMap fusion detection

[Figure: fusion reads, i.e. seed reads and rescued reads]
[Figure: fusion direction (figure source)]

Comparison of FusionMap with other tools

[Table: comparison of FusionMap with other fusion detection tools]
Image source (literature): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4797269/

In the table above, red marks indicate tools whose results are worse than FusionMap's. According to this table from the paper, FusionMap performs well on the three constructed datasets but poorly on the breast cancer and melanoma sample data. Overall it ranks in the middle, but one of its biggest advantages is that it is written in C# and runs faster than the other tools.

Keep in mind that in benchmark papers, the specific data used, the parameters chosen for each tool, and the scoring criteria can all affect how well each tool appears to perform.

FusionMap analysis process

[Figure: FusionMap analysis pipeline (figure source)]
[Figure: fusion detection process, FusionMap analysis flow chart]
The sequence alignment step is based on a modified version of GSNAP (Genomic Short-read Nucleotide Alignment Program).

(1) Example oscript configuration file for the analysis pipeline:

http://www.arrayserver.com/wiki/index.php?title=OmicScript_example_for_RNA-Seq_data_analysis_pipeline

(2) Example command:

mono oshell.exe --runscript Base_Dir Script_path/buildIndex.oscript Temp_Dir Mono_Path
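When the pipeline is launched from a Python script, building the argument list explicitly avoids shell-quoting problems with paths. The sketch below only mirrors the argument order of the command above; the four paths are hypothetical placeholders, and it assumes `mono` is on PATH and `oshell.exe` is in the working directory.

```python
import shlex
import subprocess


def build_oshell_command(base_dir, script_path, temp_dir, mono_path,
                         oshell="oshell.exe"):
    """Return the argv list for: mono oshell.exe --runscript Base_Dir Script Temp_Dir Mono_Path.

    The order follows the example command; all paths are placeholders.
    """
    return ["mono", oshell, "--runscript",
            base_dir, script_path, temp_dir, mono_path]


def run_oshell(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths: replace them with your own installation layout.
    cmd = build_oshell_command("/data/fusionmap",
                               "/data/scripts/buildIndex.oscript",
                               "/tmp/fusionmap",
                               "/usr/bin/mono")
    print(shlex.join(cmd))  # inspect the command before running it
    # run_oshell(cmd)
```

Printing the joined command first is a cheap way to check the placeholders before starting a long index build.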

Description of the FusionMap result file

Result file report: http://www.arrayserver.com/wiki/index.php?title=Fusion_SE_report

Columns of the result file (column name: meaning):

FusionID: fusion identifier, in the format FUS_Start_END
Bam.UniqueCuttingPositionCount: number of unique reads, equivalent to seed reads plus rescued reads after deduplication
Bam.SeedCount: number of seed reads. Let α be the minimum soft-clip length at one end of a read (see the figure above); SeedCount is the number of reads whose soft clip is at least α bases long. (A clipped segment shorter than that may not align anywhere at all.) Seed reads are extended into longer fusion sequences, which are used to build a custom reference; the remaining candidate fusion reads are then aligned to this custom reference. For PE150 reads with α = 25, the fusion sequence can be extended to at most 125 + 125 = 250 bp.
Bam.RescuedCount: number of reads whose soft clip is shorter than α and which align to the custom reference built from the seed reads
Strand: strand direction
Chromosome1: chromosome of breakpoint 1
Position1: position of breakpoint 1
Chromosome2: chromosome of breakpoint 2
Position2: position of breakpoint 2
KnownGene1: gene at breakpoint 1
KnownTranscript1: transcript at breakpoint 1
KnownExonNumber1: exon number at breakpoint 1
KnownTranscriptStrand1: strand of the gene at breakpoint 1
KnownGene2: gene at breakpoint 2
KnownTranscript2: transcript at breakpoint 2
KnownExonNumber2: exon number at breakpoint 2
KnownTranscriptStrand2: strand of the gene at breakpoint 2
FusionJunctionSequence: sequence around the fusion breakpoint (30 bp upstream and downstream)
FusionGene: fusion gene
SplicePattern: splice pattern at the fusion junction (see note [1])
SplicePatternClass: class of the splice pattern (see note [1])
FrameShift: frameshift pattern (see note [2])
FrameShiftClass: frameshift class (see note [2])
Distance: distance between the two breakpoints (-1 if they are on different chromosomes)
OnExonBoundary: whether the breakpoints lie on exon boundaries. None: neither breakpoint; Both: both breakpoints; Single: one breakpoint.
Filter: filter flags, including InFamilyList (gene-family list) and InBlackList (blacklist)
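The seed/rescued distinction in the Bam.SeedCount and Bam.RescuedCount rows depends on how long the soft-clipped part of a read is. The sketch below illustrates that threshold on SAM CIGAR strings. It is a simplification for illustration, not FusionMap's implementation (the real rescue step also requires the read to align to the seed-built reference), and α = 25 is just the example value from the text.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def max_soft_clip(cigar: str) -> int:
    """Length of the longer soft clip (S) at either end of a CIGAR string."""
    # Hard clips (H) carry no sequence, so drop them before looking at the ends.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return 0
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if ops[-1][1] == "S" else 0
    return max(left, right)


def classify_read(cigar: str, alpha: int = 25) -> str:
    """Simplified seed/rescue split by soft-clip length (illustration only)."""
    clip = max_soft_clip(cigar)
    if clip >= alpha:
        return "seed"              # long enough to seed a fusion sequence
    if clip > 0:
        return "rescue_candidate"  # would still need to align to the seed-built reference
    return "unclipped"


def max_junction_length(read_len: int, alpha: int) -> int:
    """Longest fusion sequence: each side gets at most read_len - alpha bases,
    because the other part of the read needs at least alpha bases."""
    return 2 * (read_len - alpha)
```

For example, `classify_read("30S120M")` returns `"seed"`, `classify_read("10S140M")` returns `"rescue_candidate"`, and `max_junction_length(150, 25)` gives the 250 bp mentioned for PE150 data.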

Notes:

[1] SplicePatternClass values:

  • CanonicalPattern[Major]: GT-AG splice pattern
  • CanonicalPattern[Minor]: GC-AG and AT-AC splice patterns
  • NonCanonicalPattern: all other detected dinucleotides
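The mapping in note [1] from splice dinucleotides to SplicePatternClass amounts to a small lookup. This sketch assumes the pattern is written as a donor-acceptor string such as "GT-AG"; the exact string format of the SplicePattern column in the FusionMap report is an assumption here.

```python
# Donor-acceptor dinucleotides as listed in note [1].
MAJOR_PATTERNS = {"GT-AG"}
MINOR_PATTERNS = {"GC-AG", "AT-AC"}


def splice_pattern_class(pattern: str) -> str:
    """Map a donor-acceptor string such as 'GT-AG' to its SplicePatternClass."""
    p = pattern.strip().upper()
    if p in MAJOR_PATTERNS:
        return "CanonicalPattern[Major]"
    if p in MINOR_PATTERNS:
        return "CanonicalPattern[Minor]"
    # Every other dinucleotide combination is non-canonical.
    return "NonCanonicalPattern"
```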

[2] FrameShiftClass values:

  • FrameShift: a frameshift occurs at the fusion junction.
  • InFrame: the reading frame is preserved across the fusion junction (the codon phase of the downstream gene continues that of the upstream gene).

The FrameShift value has the format 0{0,1}1{0,1}2{0,1}->0{0,1}1{0,1}2{0,1} (in Python regular-expression syntax, 0{0,1} means the character 0 appears zero or one time).

  • (1) If the value is ->, ->0{0,1}1{0,1}2{0,1} or 0{0,1}1{0,1}2{0,1}-> (for example ->0, ->01, ->012), one or both breakpoints of the fusion alignment may lie outside the coding region. [Class not confirmed; possibly InFrame?]
  • (2) If the value is 0->1, 1->2 or 2->0, no frameshift occurs after the two genes are fused [InFrame].
  • (3) If the value is 0->2 / 0->0, 1->0 / 1->1 or 2->1 / 2->2, a frameshift occurs after the two genes are fused [FrameShift].
  • (4) If either side of -> contains several phases (for example 02->2 or 012->1), every left/right combination is considered. If all combinations fall under case (3), the class is [FrameShift], e.g. 0->02, 01->0, 02->2, 1->01, 12->1, 2->12. If at least one combination falls under case (2), the class is [InFrame], e.g. 0->012, 0->01, 01->01.
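Cases (2) to (4) can be condensed into one rule: the fusion is in frame when some left phase a and right phase b satisfy (a + 1) mod 3 = b, which covers exactly 0->1, 1->2 and 2->0 from case (2). The sketch below re-derives FrameShiftClass from a FrameShift value under that reading of the rules. Returning "Unknown" for case (1) is a choice made here, since the class for that case is not confirmed.

```python
import re
from itertools import product

# 0{0,1}1{0,1}2{0,1}->0{0,1}1{0,1}2{0,1}, written with the equivalent "?" quantifier.
FRAME_RE = re.compile(r"^(0?1?2?)->(0?1?2?)$")


def frame_shift_class(value: str) -> str:
    """Derive a FrameShiftClass-style label from a FrameShift value such as '02->2'."""
    m = FRAME_RE.match(value.strip())
    if m is None:
        raise ValueError(f"unexpected FrameShift value: {value!r}")
    left, right = m.groups()
    if not left or not right:
        # Case (1): a breakpoint probably lies outside the coding region.
        return "Unknown"
    # Cases (2) and (4): in frame if any left/right phase pair continues the codon.
    if any((int(a) + 1) % 3 == int(b) for a, b in product(left, right)):
        return "InFrame"
    # Cases (3) and (4): every combination shifts the frame.
    return "FrameShift"
```

Running it over the examples in cases (2) to (4) reproduces the classes listed there, which is a quick sanity check of the rule.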

The figure source also describes how FusionMap detects fusions:
[Figure: FusionMap fusion detection]

Suggested filters (figure source):

SeedCount >= 3; SplicePatternClass = CanonicalPattern[Major] or CanonicalPattern[Minor]; Filter = empty
Stricter conditions: FrameShiftClass = InFrame; OnExonBoundary = Both
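The suggested filters can be applied to the report with Python's standard csv module. This sketch assumes the report is a tab-separated file whose header uses the column names from the table above (Bam.SeedCount, SplicePatternClass, Filter, FrameShiftClass, OnExonBoundary) and that an unflagged fusion has an empty Filter field; check both assumptions against your own output.

```python
import csv
from typing import Dict, Iterable, List

CANONICAL = {"CanonicalPattern[Major]", "CanonicalPattern[Minor]"}


def passes(row: Dict[str, str], strict: bool = False, min_seed: int = 3) -> bool:
    """Apply the suggested filters to one report row."""
    ok = (int(row["Bam.SeedCount"]) >= min_seed
          and row["SplicePatternClass"] in CANONICAL
          and (row.get("Filter") or "").strip() == "")
    if strict:
        # Stricter conditions: in frame and both breakpoints on exon boundaries.
        ok = (ok and row["FrameShiftClass"] == "InFrame"
              and row["OnExonBoundary"] == "Both")
    return ok


def filter_report(lines: Iterable[str], strict: bool = False) -> List[Dict[str, str]]:
    """Read a tab-separated report (header row first) and keep the passing rows."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if passes(row, strict)]
```

Any iterable of lines works, so an open file handle can be passed directly, e.g. `with open(report_path, newline="") as fh: rows = filter_report(fh, strict=True)`, where `report_path` is a placeholder for your report file.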

Gene expression results for single-end/paired-end fusion analysis (figure source; the expression-analysis step must be set in the oscript configuration):
[Figure: gene expression results of fusion analysis]


FusionMap: Mono CPU settings

FusionMap can use a lot of CPU. How can this be limited? [Not confirmed yet]

Mono's documentation lists some related options:

Option or variable: description
--aot: ahead-of-time compilation option of the mono runtime
Environment variables:
MONO_CPU_ARCH: overrides the automatic CPU detection mechanism. Currently used only on ARM, e.g. MONO_CPU_ARCH="armv4 thumb" mono ...
MONO_THREADS_PER_CPU: the maximum number of threads in the general thread pool is 20 + (MONO_THREADS_PER_CPU * number of CPUs). The default value of this variable is 10.
MONO_TLS_SESSION_CACHE_TIMEOUT: the time, in seconds, that the SSL/TLS session cache keeps its entries to avoid new negotiations between client and server. Negotiation is very CPU intensive, so application-specific custom values may prove useful for small embedded systems. The default is 180 seconds.
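The MONO_THREADS_PER_CPU formula gives an upper bound on the general thread pool size: with the default value of 10 on an 8-CPU machine it is 20 + 10 × 8 = 100 threads. The sketch below computes that bound and prepares an environment with a lower value for launching mono. Whether this actually lowers FusionMap's CPU usage is untested, since FusionMap may create threads outside the thread pool.

```python
import os
from typing import Dict, Optional


def max_threadpool_threads(threads_per_cpu: int = 10,
                           n_cpus: Optional[int] = None) -> int:
    """Upper bound of Mono's general thread pool: 20 + threads_per_cpu * CPUs."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    return 20 + threads_per_cpu * n_cpus


def mono_env(threads_per_cpu: int) -> Dict[str, str]:
    """Copy of the current environment with MONO_THREADS_PER_CPU overridden."""
    env = dict(os.environ)
    env["MONO_THREADS_PER_CPU"] = str(threads_per_cpu)
    return env


if __name__ == "__main__":
    print(max_threadpool_threads(10, 8))  # 100 (default value, 8 CPUs)
    print(max_threadpool_threads(2, 8))   # 36 (reduced value, 8 CPUs)
    # e.g. subprocess.run(["mono", "oshell.exe", "--runscript", ...],
    #                     env=mono_env(2), check=True)
```

Passing a modified copy of the environment keeps the change local to the mono process instead of the whole shell session.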


Origin blog.csdn.net/sinat_32872729/article/details/102607711