Reproducing a Nature Communications data analysis: OTU differential abundance analysis with the R package DESeq2, with the results shown as a volcano plot...

Paper

Microbiome differential abundance methods produce different results across 38 datasets

Data link

https://figshare.com/articles/dataset/16S_rRNA_Microbiome_Datasets/14531724

Code link

https://github.com/nearinj/Comparison_of_DA_microbiome_methods

The author's GitHub page also has data and code for other papers.

This repository also has a lot of code on differential abundance analysis:

https://github.com/jnmacdonald/differential-abundance-analysis

Over the past two days I have been looking into differential abundance analysis of metagenomic data using OTU abundance tables. I found this paper and read the abstract; it appears to compare how the results of different differential abundance methods agree and differ. Here I reproduce its code for differential abundance analysis with DESeq2.

The datasets I use here are

  • Abundance data: ArcticFireSoils_genus_table.tsv

  • Grouping data: ArcticFireSoils_meta.tsv

One open question: the paper provides two abundance tables for this dataset, and the second one has a rare suffix in its file name. For now I do not know what the difference between the two is.

First, read in the data

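# Abundance table: taxa in rows, samples in columns
ASV_table <- read.table("metagenomics/dat01/ArcticFireSoils_genus_table.tsv",
                        sep = "\t",
                        skip = 1,             # skip the first line of the file
                        header = TRUE,
                        row.names = 1,        # first column holds the taxon names
                        comment.char = "",    # do not treat "#" as a comment
                        quote = "",
                        check.names = FALSE)  # keep sample names unchanged
# Grouping table: samples in rows
groupings <- read.table("metagenomics/dat01/ArcticFireSoils_meta.tsv",
                        sep = "\t",
                        row.names = 1,
                        header = TRUE,
                        comment.char = "",
                        quote = "",
                        check.names = FALSE)
dim(ASV_table)
dim(groupings)
# Store the grouping variable as a factor
groupings$Fire <- factor(groupings$Fire)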

The grouping variable has to be converted to a factor here; otherwise DESeq2 prints a warning message in a later step.
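As a small illustration of the factor conversion, here is a sketch on a made-up metadata table (the sample names s1 to s4 are hypothetical and not taken from the ArcticFireSoils file). It also shows how the reference level can be set explicitly, which is optional and not part of the original code.

```r
# Toy metadata with made-up sample names, not the real ArcticFireSoils file
meta <- data.frame(Fire = c("Fire", "Control", "Fire", "Control"),
                   row.names = c("s1", "s2", "s3", "s4"),
                   stringsAsFactors = FALSE)

class(meta$Fire)                 # a character column before conversion
meta$Fire <- factor(meta$Fire)   # convert the grouping column to a factor
levels(meta$Fire)                # levels are sorted: "Control" "Fire"

# Optional: state the reference level explicitly instead of relying on sorting
meta$Fire <- relevel(meta$Fire, ref = "Control")
```

With only two groups the first level acts as the baseline, so setting it explicitly mainly makes the intent easier to read.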

Check whether the sample names in the abundance table and in the grouping table are in the same order

identical(colnames(ASV_table), rownames(groupings))

This returns FALSE, so the two are not consistent.

Take the intersection of the two sets of sample names

rows_to_keep <- intersect(colnames(ASV_table), rownames(groupings))

Reselect samples based on the result of taking the intersection

groupings <- groupings[rows_to_keep, , drop = FALSE]
ASV_table <- ASV_table[, rows_to_keep]

Question: what does the drop argument inside the brackets do here?
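On the drop question: when a subset of a data frame ends up with a single column, R by default simplifies the result to a plain vector, and the row names are lost. Setting drop to FALSE keeps the result as a data frame. A minimal sketch with a made-up one-column table (the names are hypothetical):

```r
# One-column data frame with made-up sample names
df <- data.frame(Groupings = c("Fire", "Control", "Fire"),
                 row.names = c("s1", "s2", "s3"))
keep <- c("s3", "s1")

dropped <- df[keep, ]                # default: collapses to a plain vector
kept    <- df[keep, , drop = FALSE]  # stays a data frame with row names

is.data.frame(dropped)  # FALSE
is.data.frame(kept)     # TRUE
rownames(kept)          # "s3" "s1"
```

This probably matters here because the grouping table may have only one column, and the later steps need it to remain a data frame whose row names are the sample names.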

Check the order of the sample names in the two datasets again

identical(colnames(ASV_table), rownames(groupings))

This time it returns TRUE.
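The alignment steps above can be sketched on toy data. The tables below are made up (hypothetical taxa g1 to g3 and samples s1 to s5); the point is only that indexing both tables with the same vector of shared names puts them in the same order and drops samples that appear in just one table.

```r
# Made-up count table: 3 taxa by 4 samples
counts <- as.data.frame(matrix(1:12, nrow = 3,
                               dimnames = list(c("g1", "g2", "g3"),
                                               c("s1", "s2", "s3", "s4"))))
# Made-up grouping table: different order, and s5 has no counts
meta <- data.frame(Groupings = c("Fire", "Control", "Fire"),
                   row.names = c("s3", "s1", "s5"))

identical(colnames(counts), rownames(meta))  # FALSE before alignment

# Shared names come back in the order of the first argument
shared <- intersect(colnames(counts), rownames(meta))
meta   <- meta[shared, , drop = FALSE]
counts <- counts[, shared]

identical(colnames(counts), rownames(meta))  # TRUE after alignment
```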

Modify the column names of the grouping file

colnames(groupings)[1] <- "Groupings"

Differential abundance analysis

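library(DESeq2)
# Build the DESeq2 object from the count table and the grouping table
dds <- DESeq2::DESeqDataSetFromMatrix(countData = ASV_table,
                                      colData = groupings,
                                      design = ~ Groupings)
# "poscounts" estimates size factors in a way that tolerates taxa with zero counts in some samples
dds_res <- DESeq2::DESeq(dds, sfType = "poscounts")

# tidy = TRUE returns a data.frame with the taxon names in a first column named "row"
res <- results(dds_res,
               tidy = TRUE,
               format = "DataFrame",
               contrast = c("Groupings", "Fire", "Control"))
head(res)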
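A note on reading the log2FoldChange column: in a contrast written as c("Groupings", "Fire", "Control"), the second element is the numerator level and the third is the denominator, so positive values mean higher abundance in Fire. The sketch below uses made-up counts for a single taxon and a naive ratio of group means. DESeq2 fits a negative binomial model on normalized counts, so its estimates will generally differ from this simple calculation; only the sign convention is being illustrated.

```r
# Made-up counts for one taxon in each group (not from the real dataset)
fire    <- c(40, 36, 44)
control <- c(10, 9, 11)

# Naive log2 fold change, Fire over Control
naive_lfc <- log2(mean(fire) / mean(control))
naive_lfc   # 2, i.e. four times higher in Fire than in Control
```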

Volcano plot code

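DEG <- res
logFC_cutoff <- 2
# Drop rows with NA values first so they are not counted in the title below
DEG <- na.omit(DEG)
# Label each taxon UP, DOWN, or NOT from the raw p-value and the log2 fold change
DEG$change <- factor(ifelse(DEG$pvalue < 0.05 & abs(DEG$log2FoldChange) > logFC_cutoff,
                            ifelse(DEG$log2FoldChange > logFC_cutoff, "UP", "DOWN"),
                            "NOT"),
                     levels = c("DOWN", "NOT", "UP"))
this_title <- paste0('Cutoff for logFC is ', round(logFC_cutoff, 3),
                     '\nThe number of UP taxa is ', sum(DEG$change == 'UP'),
                     '\nThe number of DOWN taxa is ', sum(DEG$change == 'DOWN'))
library(ggplot2)
p1 <- ggplot(data = DEG, aes(x = log2FoldChange,
                             y = -log10(pvalue),
                             color = change)) +
  geom_point(alpha = 0.8, size = 3) +
  labs(x = "log2 fold change", y = "-log10 pvalue") +
  ggtitle(this_title) + theme_bw(base_size = 20) +
  theme(plot.title = element_text(size = 15, hjust = 0.5)) +
  # Named colours stay attached to the right label even if a level is absent
  scale_color_manual(values = c(DOWN = '#a121f0', NOT = '#bebebe', UP = '#ffad21'))
# Same plot with capped axes; points outside the limits are removed with a warning
p2 <- p1 + xlim(NA, 10) + ylim(NA, 30)

library(patchwork)
p1 + p2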
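The volcano plot code thresholds on the raw pvalue column. The DESeq2 result table also has a padj column, which by default is a Benjamini-Hochberg adjustment (combined with DESeq2's independent filtering). The sketch below is not part of the original analysis: it uses a made-up table with the same column names as the tidy result and base R's p.adjust, just to show how the UP/DOWN labels can change once the p-values are adjusted.

```r
# Made-up table with the column names of a tidy DESeq2 result
toy <- data.frame(row = paste0("taxon", 1:6),
                  log2FoldChange = c(3.1, -2.6, 0.4, 2.2, -0.1, -3.0),
                  pvalue = c(0.001, 0.004, 0.03, 0.04, 0.5, 0.8))

# Benjamini-Hochberg adjustment with base R
toy$padj_bh <- p.adjust(toy$pvalue, method = "BH")

cutoff <- 2
# Same labelling idea as in the volcano code, but on adjusted p-values
toy$change <- ifelse(toy$padj_bh < 0.05 & abs(toy$log2FoldChange) > cutoff,
                     ifelse(toy$log2FoldChange > 0, "UP", "DOWN"),
                     "NOT")
table(toy$change)
```

In this toy table, taxon4 (raw p-value 0.04, log2 fold change 2.2) would be labelled UP on the raw p-value but becomes NOT after adjustment.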

You are welcome to follow my public account

Xiao Ming's data analysis notebook

The Xiao Ming's data analysis notebook public account mainly shares: 1. Simple examples of data analysis and data visualization in R and Python; 2. Reading notes on the transcriptomics, genomics, and population genetics literature for horticultural plants; 3. Introductory bioinformatics study materials and my own study notes.

Origin blog.csdn.net/weixin_45822007/article/details/124223966