maftools | tumor mutation TCGA summary data, analysis and visualization

 

This article first appeared in the public number "life letter depot", https://mp.weixin.qq.com/s/WG4JHs9RSm5IEJiiGEzDkg

 

Before the introduction of the use of maftools | scratch draw level published oncoplot (waterfall) oncoplot R-maftools package drawn genomics mutations result (MAF) or called "waterfall", as well as some of the details of the changes and comments.

The paper presents the maftools for other applications MAF file is easier to understand and reproduce, use this TCGA download LIHC data.

A data portion

setwd ( "C: \\ the Users \\ Maojie \\ Desktop \\ maftools-V2 \\ ") 
Library ( maftools)
laml.maf = read.csv to ( "TCGA.LIHC.mutect.maf.csv", header = TRUE ) # the drawings show only some statistical maf, read-only data into a set of learning, without the addition of clinical data laml = read.maf ( maf = laml.maf) bASIC # view data laml An Object of class   the MAF ID Summary   Mean Median 1 :             NCBI_Build      






                       
1     NA     NA
2:                 Center       1     NA     NA
3:                Samples     364     NA     NA
4:                 nGenes   12704     NA     NA
5:        Frame_Shift_Del    1413  3.893      3
6:        Frame_Shift_Ins     551  1.518      1
7:           In_Frame_Del     277  0.763      0
8:           In_Frame_Ins     112  0.309      0
9:      Missense_Mutation   28304 77.972     63
10:      Nonsense_Mutation    1883  5.187      4
11:       Nonstop_Mutation      45  0.124      0
12:            Splice_Site    1051  2.895      2
13: Translation_Start_Site      65  0.179      0
14:                  total   33701 92.840     75

 

Information can be summary of gene MAF files, sample output file to laml prefix summary

write.mafSummary(maf = laml, basename = 'laml')

laml_geneSummary.txt

img

laml_sampleSummary.txt

img

 

Two drawing part

1, first draw the overall results of Fig MAF file

plotmafSummary(maf = laml, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE, titvRaw = FALSE)

img

 

2, oncoplot FIG.

#oncoplot for top ten mutated genes.
oncoplot(maf = laml, top = 20)

img

Add SCNA information, add P value information, add comments clinical information, can change the color and other reference links. .

3 Oncostrip

You can use oncostripthe function to show mutation of a specific gene in a sample, here for three liver genes of interest in more 'TP53', 'CTNNB1', 'ARID1A', as follows:

oncostrip(maf = laml, genes = c('TP53','CTNNB1', 'ARID1A'))

img

4 Transition , Transversions

titv函数将SNP分类为Transitions_vs_Transversions,并以各种方式返回汇总表的列表。汇总数据也可以显示为一个箱线图,显示六种不同转换的总体分布,并作为堆积条形图显示每个样本中的转换比例。

laml.titv = titv(maf = laml, plot = FALSE, useSyn = TRUE)
#plot titv summary
plotTiTv(res = laml.titv)

img

5 Rainfall plots

使用rainfallPlot参数绘制rainfall plots,展示超突变的基因组区域。detectChangePoints设置为TRUE,rainfall plots可以突出显示潜在变化的区域.

brca <- system.file("extdata", "brca.maf.gz", package = "maftools")
brca = read.maf(maf = brca, verbose = FALSE)
rainfallPlot(maf = laml, detectChangePoints = TRUE, pointSize = 0.6)

img

6 Compare mutation load against TCGA cohorts

通过tcgaComapre函数实现laml(自有群体)与TCGA中已有的33个癌种队列的突变负载情况的比较。

#cohortName 给输入的队列命名
laml.mutload = tcgaCompare(maf = laml, cohortName = 'LIHC-2')

img

7 Genecloud

使用 geneCloud参数绘制基因云,每个基因的大小与它突变的样本总数成正比。

geneCloud(input = laml, minMut = 15)

img

8 Somatic Interactions

癌症中的许多引起疾病的基因共同发生或在其突变模式中显示出强烈的排他性。可以使用somaticInteractions函数使用配对Fisher 's精确检验来分析突变基因之间的的co-occurring 或者exclusiveness。

#exclusive/co-occurance event analysis on top 10 mutated genes. 
Interact <- somaticInteractions(maf = laml, top = 25, pvalue = c(0.05, 0.1))
#提取P值结果
Interact$gene_sets
                gene_set       pvalue
1:   CTNNB1, AXIN1, TP53 0.0001486912
2:  CTNNB1, TP53, ARID1A 0.0018338597
3:     AXIN1, TP53, APOB 0.0087076043
4:     CSMD3, AXIN1, ALB 0.0130219628
5:      AXIN1, TP53, ALB 0.0173199619
6: CTNNB1, AXIN1, ARID1A 0.0363739468

img

可以看到TP53和CTNNB1之间有较强的exclusiveness,也与文献中的结论一致。

 

9 Comparing two cohorts (MAFs)

由于癌症的突变模式各不相同,因此可是 mafComapre参数比较两个不同队列的差异突变基因,检验方式为fisher检验。

#输入另一个 MAF 文件
Our_maf <- read.csv("Our_maf.csv",header=TRUE)
our_maf = read.maf(maf = Our_maf)

#Considering only genes which are mutated in at-least in 5 samples in one of the cohort to avoid bias due to genes mutated in single sample.
pt.vs.rt <- mafCompare(m1 = laml, m2 = our_maf, m1Name = 'LIHC', m2Name = 'OUR', minMut = 5)
print(pt.vs.rt)

img

  • result部分会有每个基因分别在两个队列中的个数以及P值和置信区间等信息。

  • SampleSummary 会有两个队列的样本数。

 

1) Forest plots

比较结果绘制森林图

forestPlot(mafCompareRes = pt.vs.rt, pVal = 0.01, color = c('royalblue', 'maroon'), geneFontSize = 0.8)

img

 

10 Oncogenic 信号通路

`OncogenicPathways 功能查看显著富集通路

OncogenicPathways(maf = laml)
#会输出统计结果
Pathway alteration fractions
      Pathway  N n_affected_genes fraction_affected
1:    RTK-RAS 85               68         0.8000000
2:        WNT 68               55         0.8088235
3:      NOTCH 71               52         0.7323944
4:      Hippo 38               30         0.7894737
5:       PI3K 29               24         0.8275862

img

Enriched above can be selected to complete passage of a mutation of interest show:

PlotOncogenicPathways(maf = laml, pathways = "PI3K")

img

 

Well, that is the use of maftools package summary of omics data format of the MAF, analysis, visualization.

 

Background reply "maf file" to get files and code examples of maf

img

[Feel good, enjoy the lower right corner click on a "look" forward is appreciated, thank you! ]

 

 

Guess you like

Origin www.cnblogs.com/Mao1518202/p/11530946.html