Image source: https://en.wikipedia.org/wiki/Fusion_gene
Preface
This post covers fusion-gene detection from RNA-seq data, using the FusionMap software [FusionMap reference].
Which tools exist for fusion detection, and which performs best? A Q&A thread on Biostars lists many of them (see here). There are more than 30 fusion detection tools, such as STAR-Fusion, deFuse and FusionCatcher; the papers for about 20 of them were published in 2011-2013, and the FusionMap paper also appeared in 2011. Several benchmarking papers compare these tools and discuss the strengths and weaknesses of each, and later papers usually compare the new tool against previously published ones.
FusionMap itself appears not to have been updated for a long time; it is now maintained as part of the OShell toolkit.
How FusionMap detects fusions
Fusion reads: seed reads and rescued reads
FusionMap fusion detection principle
Fusion Reads: Seed reads
and Rescued reads
Fusion Direction: Figure Source
Comparison of FusionMap with other tools
Image source literature: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4797269/
Entries marked in red in the table above are results worse than FusionMap's. According to this table from the paper, FusionMap performs well on the three groups of constructed data sets but poorly on the breast cancer and melanoma sample data. Overall it ranks in the middle; one of its biggest advantages, however, is speed: it is written in C# and runs faster than the other tools.
Note that in benchmarking papers, the specific data used, the parameters chosen for each tool, and the scoring criteria can all influence how each tool ranks.
FusionMap analysis workflow
Analysis pipeline: figure source
Fusion detection process:
The sequence alignment step is a modified version of the GSNAP aligner (see: Introducing GSNAP).
(1) Example oscript file for configuring the analysis pipeline:
http://www.arrayserver.com/wiki/index.php?title=OmicScript_example_for_RNA-Seq_data_analysis_pipeline
(2) Example of running the software:
mono oshell.exe --runscript Base_Dir Script_path/buildIndex.oscript Temp_Dir Mono_Path
FusionMap result file description
Report file documentation: http://www.arrayserver.com/wiki/index.php?title=Fusion_SE_report
Column | Meaning |
---|---|
FusionID | Fusion ID, in the format FUS_Start_End [Note 1] |
Bam.UniqueCuttingPositionCount | Number of unique reads, i.e. seed reads + rescued reads after deduplication |
Bam.SeedCount | As in the figure above, let α be the minimum soft-clip length at one end of a read; SeedCount is the number of reads whose soft-clipped part is at least α bp long (if the clipped part is too short, it may not align at all). These reads serve as seeds that are extended into longer fusion sequences; the extended fusion sequences are then used as a custom reference, and reads whose fusion junction lies near the read edge are aligned against it. For PE150 data with α = 25, a seed can be extended into a fusion sequence of at most 125 + 125 = 250 bp. |
Bam.RescuedCount | Number of reads whose soft-clipped part is shorter than α; these reads are aligned against the custom reference built from the seed reads |
Strand | Strand orientation |
Chromosome1 | Breakpoint 1 chromosome |
Position1 | Breakpoint 1 position |
Chromosome2 | Breakpoint 2 chromosome |
Position2 | Breakpoint 2 position |
KnownGene1 | Breakpoint 1 gene |
KnownTranscript1 | Breakpoint 1 transcript |
KnownExonNumber1 | Breakpoint 1 exon number |
KnownTranscriptStrand1 | Breakpoint 1 gene strand |
KnownGene2 | Breakpoint 2 gene |
KnownTranscript2 | Breakpoint 2 transcript |
KnownExonNumber2 | Breakpoint 2 exon number |
KnownTranscriptStrand2 | Breakpoint 2 gene strand |
FusionJunctionSequence | Sequence flanking the fusion breakpoint (30 bp upstream and 30 bp downstream) |
FusionGene | Fusion gene |
SplicePattern | Splice pattern at the fusion junction [1] |
SplicePatternClass | Class of the splice pattern [1] |
FrameShift | Reading-frame pattern at the fusion [2] |
FrameShiftClass | Frameshift class [2] |
Distance | Distance between the fusion breakpoints (-1 if they are on different chromosomes) |
OnExonBoundary | Whether the breakpoints lie on exon boundaries. None: neither breakpoint does; Both: both breakpoints do; Single: one breakpoint does. |
Filter | Filter flags, including InFamilyList (in the gene family list) and InBlackList (in the blacklist) |
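The seed/rescued split described for Bam.SeedCount and Bam.RescuedCount can be illustrated with a small Python sketch. It is only an illustration under assumptions: FusionMap uses its own GSNAP-derived aligner internally, so reading soft-clip lengths from a SAM-style CIGAR string is a stand-in for what the tool does, and the helper names `end_softclip`, `read_type` and `max_fusion_span` are hypothetical. The bound `2 * (read_len - alpha)` assumes both segments of a seed read must be at least α bp long, which is how the PE150 / α = 25 example reaches 125 + 125 = 250 bp.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def end_softclip(cigar: str) -> int:
    """Longest soft clip (S) at either end of a read; hard clips (H) are ignored."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return 0
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if ops[-1][1] == "S" else 0
    return max(left, right)


def read_type(cigar: str, alpha: int) -> str:
    """Hypothetical rule from the table: clip >= alpha -> seed, else rescue candidate."""
    return "seed" if end_softclip(cigar) >= alpha else "rescue candidate"


def max_fusion_span(read_len: int, alpha: int) -> int:
    """Upper bound on a seed-extended fusion sequence: (read_len - alpha) per side."""
    return 2 * (read_len - alpha)


print(read_type("30S120M", 25))   # clipped part 30 >= 25
print(max_fusion_span(150, 25))   # 125 + 125
```

Keeping the α threshold as a parameter makes it easy to see how a stricter α shrinks the seed set while leaving more reads to the rescue step.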
Notes:
[1] SplicePatternClass values:
- CanonicalPattern[Major]: SplicePattern is GT-AG
- CanonicalPattern[Minor]: SplicePattern is GC-AG or AT-AC
- NonCanonicalPattern: all other detected dinucleotide pairs
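A minimal Python sketch of the mapping in note [1]. It assumes the SplicePattern value is written as two dinucleotides joined by a hyphen (e.g. `GT-AG`), which may not match FusionMap's exact output format, and the function name `splice_pattern_class` is hypothetical.

```python
# Dinucleotide pairs per note [1]; the "XX-YY" string format is an assumption.
MAJOR = {"GT-AG"}
MINOR = {"GC-AG", "AT-AC"}


def splice_pattern_class(pattern: str) -> str:
    """Map a splice pattern such as 'GT-AG' to its SplicePatternClass label."""
    p = pattern.strip().upper()
    if p in MAJOR:
        return "CanonicalPattern[Major]"
    if p in MINOR:
        return "CanonicalPattern[Minor]"
    return "NonCanonicalPattern"  # any other detected dinucleotide pair
```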
[2] FrameShiftClass values:
- FrameShift: a frameshift occurs at the fusion.
- InFrame: the fusion keeps the reading frame (the bases of the gene at the breakpoint add up to a multiple of 3).
The corresponding FrameShift value has the format 0{0,1}1{0,1}2{0,1}->0{0,1}1{0,1}2{0,1} (in Python regular-expression syntax, 0{0,1} means the character 0 appears zero or one time):
- (1) If the value is ->, ->0{0,1}1{0,1}2{0,1} or 0{0,1}1{0,1}2{0,1}-> (for example ->0, ->01, ->012), one or both breakpoints of the fusion alignment may lie outside a coding region (not confirmed). [InFrame]
- (2) If the value is 0->1, 1->2 or 2->0, no frameshift occurs after the two genes are fused. [InFrame]
- (3) If the value is 0->2/0->0, 1->0/1->1 or 2->1/2->2, a frameshift occurs after the two genes are fused. [FrameShift]
- (4) If either side of -> contains several positions, for example 02->2 or 012->1: when every combination falls under case (3), the class is [FrameShift], e.g. 0->02, 01->0, 02->2, 1->01, 12->1, 2->12; when at least one combination falls under case (2), the class is [InFrame], e.g. 0->012, 0->01, 01->01.
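The rules in note [2] can be written as a short Python function (the name `frameshift_class` is hypothetical). It assumes each digit is the codon position of a base next to the breakpoint, so a left->right pair is in frame when the position simply continues (0->1, 1->2, 2->0); this reproduces the pairs listed in cases (2) and (3). Case (1) values are returned as `None` rather than classified, because the note only tentatively explains them.

```python
import re

# Format from note [2]: optional codon positions 0/1/2 on each side of "->".
FRAME_RE = re.compile(r"(0?1?2?)->(0?1?2?)")


def frameshift_class(value: str):
    """Classify a FrameShift value following cases (1)-(4) of note [2].

    Assumption: a left->right pair is in frame when the codon position
    continues, i.e. (left + 1) % 3 == right.
    """
    m = FRAME_RE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unexpected FrameShift value: {value!r}")
    left, right = m.groups()
    if not left or not right:
        return None  # case (1): a side has no codon position; not classified here
    # Cases (2)-(4): InFrame if at least one left->right pair continues the frame.
    if any((int(a) + 1) % 3 == int(b) for a in left for b in right):
        return "InFrame"
    return "FrameShift"  # every pair is a case (3) pattern


for v in ["0->1", "2->0", "1->1", "02->2", "012->1", "->01"]:
    print(v, frameshift_class(v))
```

Expanding multi-digit sides into all pairs is what makes case (4) fall out of cases (2) and (3) without a separate rule.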
The source of this figure also describes how FusionMap detects fusions:
Suggested filters (figure source):
SeedCount >= 3; SplicePatternClass = CanonicalPattern[Major] or CanonicalPattern[Minor]; Filter = empty
Stricter conditions: FrameShiftClass = InFrame; OnExonBoundary = Both
Gene expression results for single-end/paired-end fusion analysis: figure source (requires enabling the expression analysis step in the oscript configuration)
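The suggested filters can be applied to a report with a short Python sketch. Assumptions: the report is tab-delimited with a header row using the column names from the table above, the seed-count column is literally Bam.SeedCount (in real output the prefix may be the sample name, so it is a parameter), and "Filter = empty" means an empty string. The names `passes_filter` and `filter_report` are hypothetical.

```python
import csv
import io

# Allowed SplicePatternClass values from the suggested filter.
CANONICAL = {"CanonicalPattern[Major]", "CanonicalPattern[Minor]"}


def passes_filter(row, seed_col="Bam.SeedCount", strict=False):
    """Apply the suggested filters to one report row (dict of column -> text)."""
    if int(row[seed_col]) < 3:                      # SeedCount >= 3
        return False
    if row["SplicePatternClass"] not in CANONICAL:  # Major or Minor canonical
        return False
    if row["Filter"].strip():                       # Filter must be empty
        return False
    if strict:                                      # more stringent conditions
        return row["FrameShiftClass"] == "InFrame" and row["OnExonBoundary"] == "Both"
    return True


def filter_report(text, **kwargs):
    """Parse a tab-delimited report (assumed format) and keep the passing rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if passes_filter(row, **kwargs)]
```

For a file on disk, the same function can be fed with `open(path, newline="").read()`; pass `strict=True` to add the InFrame / Both conditions.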
FusionMap Mono CPU settings
FusionMap's CPU usage is high; how can it be limited? [Not resolved yet]
- FusionMap usage documentation
Some FusionMap installation instructions mention Mono parameter settings in an example configuration file (though it is unclear where this file should be placed).
Mono's help documentation lists the related options:
Option | Description |
---|---|
--aot | Precompiles the assembly to native code (ahead-of-time compilation) |
Environment variables | |
MONO_CPU_ARCH | Overrides the automatic CPU detection mechanism. Currently only used on ARM, e.g. MONO_CPU_ARCH="armv4 thumb" mono ... |
MONO_THREADS_PER_CPU | The maximum number of threads in the thread pool is generally 20 + (MONO_THREADS_PER_CPU * number of CPUs). The default value of this variable is 10. |
MONO_TLS_SESSION_CACHE_TIMEOUT | The amount of time, in seconds, that the SSL/TLS session cache will keep its entries to avoid new negotiations between the client and server. Negotiation is very CPU intensive, so application-specific custom values may prove useful for small embedded systems. The default is 180 seconds. |
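Using the formula quoted for MONO_THREADS_PER_CPU, the thread-pool ceiling can be worked out as below. The helper names are hypothetical, the formula is taken from the Mono documentation quoted in the table and may differ across Mono versions, and whether lowering this value actually reduces FusionMap's CPU usage has not been verified.

```python
import os
from typing import Optional


def mono_threadpool_max(threads_per_cpu: int = 10, n_cpus: Optional[int] = None) -> int:
    """Thread-pool ceiling per the quoted formula: 20 + threads_per_cpu * CPUs."""
    cpus = n_cpus if n_cpus is not None else (os.cpu_count() or 1)
    return 20 + threads_per_cpu * cpus


def mono_env(threads_per_cpu: int) -> dict:
    """Copy of the current environment with MONO_THREADS_PER_CPU set."""
    env = dict(os.environ)
    env["MONO_THREADS_PER_CPU"] = str(threads_per_cpu)
    return env


print(mono_threadpool_max(10, 8))   # default value on 8 CPUs: 20 + 10 * 8
print(mono_threadpool_max(100, 8))  # MONO_THREADS_PER_CPU=100 on 8 CPUs
```

The dictionary from `mono_env` can be passed as `env=` to `subprocess.run` together with the `mono oshell.exe --runscript ...` arguments shown earlier.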
Other references:
- Mono-related issue: MONO_THREADS_PER_CPU=100 (reference on GitHub)
- Special care is needed when running C# programs with Mono on Linux: the while (true) problem (using sleep) (reference)
- Configuring supervisor to manage Mono programs (reference)
- https://www.mono-project.com/docs/
- Fusion mechanism and detection method