[R Language] Drawing a heat map grouped by clustering results 3 (step-by-step tutorial)

The previous issue, "[R Language] - Clustering Heatmap, Row and Column Grouping Information Annotation Heatmap 2", showed how to draw heat maps annotated with grouping information using the R pheatmap package. This issue introduces another way to annotate groups: inspect the clustering result of the data, choose a preset number of clusters, and draw a clustered heat map whose rows are annotated by those clusters.

1 Data preparation

Data input format (csv format):
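The screenshot of the input file is not reproduced here. As a minimal sketch, assuming genes in rows, gene IDs in the first column, and six sample columns (three control, three test, matching the annotation used in section 3.2), a demo file of the same shape could be generated as follows. The gene names, sample names, values, and the file name demo_pheatmap.csv are all hypothetical.

```r
# Hypothetical demo data: 40 genes x 6 samples (names and values are made up)
set.seed(1)
n_genes <- 40
demo <- matrix(round(rexp(n_genes * 6, rate = 0.1), 2), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               c(paste0("control", 1:3), paste0("test", 1:3))))

# Write with gene IDs as the first column, so row.names = 1 works on import
demo_file <- file.path(tempdir(), "demo_pheatmap.csv")
write.csv(data.frame(gene = rownames(demo), demo), demo_file, row.names = FALSE)

# Read it back the same way the tutorial reads its own file
demo_in <- read.table(demo_file, header = TRUE, row.names = 1, sep = ",")
dim(demo_in)
head(demo_in, 3)
```

The demo uses more than 30 rows because section 3.1 tests up to 30 clusters, and kmeans needs more rows than clusters.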

2 R package loading and data import

# Install the packages

install.packages("pheatmap")

install.packages("RColorBrewer")

# Load the packages

library("pheatmap")

library("RColorBrewer")

# Load the plotting data

data<-read.table(file='C:/Rdata/jc/pheatmap.csv',header=TRUE,row.names=1,sep=',')

head(data) # inspect the data

#data=log2(data[,1:6]+1) # optional: log2-transform the gene expression values

#data <- as.matrix(data) # optional: convert to a matrix

#head(data)

3 Heatmap grouped by clustering results

3.1 View the number of clusters in the data

Before drawing a heat map grouped by clustering results, estimate how many clusters the data contain, so that the number of clusters chosen later has a basis. A common choice is the "elbow" of the curve: the number of clusters at which the within-cluster sum of squares stops dropping sharply and begins to level off:

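# Check the number of clusters in the data

# Row-scale into a separate object so that `data` keeps its column names for the heat map

data_scaled <- t(scale(t(data)))

tested_cluster <- 30  # largest number of clusters to test

wss <- (nrow(data_scaled)-1) * sum(apply(data_scaled, 2, var))  # k = 1: total sum of squares

for (i in 2:tested_cluster) {

  wss[i] <- kmeans(data_scaled, centers=i, iter.max=100, nstart=25)$tot.withinss

}

plot(1:tested_cluster, wss, type="b", xlab="Number of Clusters", ylab="Within groups sum of squares")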

Figure 1 Within-cluster sum of squares versus number of clusters
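Reading the elbow off the plot is subjective. As a rough aid, one could also tabulate how much each additional cluster reduces the within-cluster sum of squares. The sketch below uses synthetic data with three well-separated groups; the helper name wss_curve and the data are assumptions for illustration, not part of the original analysis.

```r
# Synthetic data: 3 groups of 20 rows each, 6 columns (made up for illustration)
set.seed(42)
centers <- c(-5, 0, 5)
x <- do.call(rbind, lapply(centers, function(m) matrix(rnorm(20 * 6, mean = m), ncol = 6)))

# Hypothetical helper: total within-cluster sum of squares for k = 1..max_k
wss_curve <- function(mat, max_k, ...) {
  sapply(seq_len(max_k), function(k) kmeans(mat, centers = k, ...)$tot.withinss)
}

wss_demo <- wss_curve(x, max_k = 8, iter.max = 100, nstart = 25)

# Fraction of the remaining WSS removed by going from k - 1 to k clusters
drop <- -diff(wss_demo) / head(wss_demo, -1)
data.frame(k = 2:8, wss = round(wss_demo[-1], 1), relative_drop = round(drop, 3))
```

A large relative drop followed by consistently small ones suggests the elbow. This is only a heuristic, and the plot should still be inspected.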

3.2 Generate initial heat map

# Column annotation

ann_col = data.frame(Sample=c(rep("control",3),rep("test",3))) # create the sample grouping column

row.names(ann_col) = colnames(data) # required; without it pheatmap fails with: Error in check.length("fill") :  'gpar' element 'fill' must not be length 0

ann_color = list(Sample = c(control="#0089CF", test="#E889BD")) # define the group colors

# Draw the heat map

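p<-pheatmap(data, scale = "row", # direction of normalization: "row", "column" or "none"

         cluster_rows = T,cluster_cols = T, # whether to cluster the rows / the columns (TRUE or FALSE)

         cutree_rows = NA, cutree_cols = NA, # if rows/columns are clustered, split the heat map into this many row/column clusters (cutree_rows = num, cutree_cols = num)

         treeheight_row = 30, treeheight_col = 30, # height of the row and column dendrograms

         border_color = "grey60", # border color of each cell, default "grey60"

         cellwidth = 60, cellheight = 7.5,  # width / height of each cell, default NA

         display_numbers = F, # whether to show the values in the cells (or custom marks based on a condition)

         fontsize_number = 6, # font size of the numbers shown in the cells

         number_format = "%.2f", # format of the numbers in the cells: "%.2f" is two decimal places, "%.1e" is scientific notation

         number_color = "grey30", # font color of the numbers shown in the cells

         fontsize =10, fontsize_row = 6, fontsize_col = 10, # base font size, and font sizes of the row and column names

         show_rownames = T, show_colnames = T, # whether to show the row and column names

         main = "Gene title", # title of the heat map

         color = colorRampPalette(c("navy","white","firebrick3"))(100), # heat map colors; (100) means 100 color levels

         angle_col = "45", # angle of the column labels

         gaps_row = NULL,  # only used when rows are not clustered: positions of the gaps between rows

         gaps_col = c(1,2,3,4,5,6),  # only used when columns are not clustered: positions of the gaps between columns

         annotation_row = NA, annotation_col = ann_col, # row and column annotation data frames, default NA

         annotation = NA, annotation_colors = ann_color,  # annotation is a deprecated parameter; annotation_colors sets the row and column annotation colors, default NA

         annotation_legend = TRUE, # whether to show the annotation legend

         annotation_names_row = TRUE, annotation_names_col = TRUE) # whether to show the names of the row and column annotations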

summary(p)
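The comment in the annotation code notes that the row names of the annotation data frame must be set to the column names of the data. A quick check before plotting can catch a mismatch early. This is a sketch on stand-in objects; mat_demo and ann_demo are hypothetical and only mimic the shape of the tutorial's data and ann_col.

```r
# Stand-in matrix: 2 genes x 6 samples (values are arbitrary)
mat_demo <- matrix(1:12, nrow = 2,
                   dimnames = list(c("g1", "g2"),
                                   c(paste0("control", 1:3), paste0("test", 1:3))))

# Annotation built the same way as ann_col in the tutorial
ann_demo <- data.frame(Sample = c(rep("control", 3), rep("test", 3)))
rownames(ann_demo) <- colnames(mat_demo)

# Stop early if annotation rows and matrix columns do not line up
stopifnot(!is.null(colnames(mat_demo)),
          identical(rownames(ann_demo), colnames(mat_demo)))

table(ann_demo$Sample)  # number of samples per group
```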

3.3 Generate clustering tree

Use "tree_row" and "tree_col" to extract the row and column clustering trees from the heat map object (their "order" element gives the displayed row or column order). Then cut the row tree into the preset number of clusters to build a cluster data set, which provides the grouping annotation for the next heat map:

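# Extract the row (gene) clustering tree from the heat map

clu<- p$tree_row

#clu<- p$tree_row$order # displayed row order

#clu<- p$tree_col$order # displayed column order

# Cut the clustering tree into clusters

cluster<- factor(cutree(clu,20)) # the number is the preset number of clusters

cluster

# Convert to a data frame

cut.df <- data.frame(cluster)

# Plot the clustering tree

plot(clu,hang = -1,cex=0.6,axes=FALSE,ann=FALSE)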

 

Figure 2 Gene clustering tree
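pheatmap stores the row dendrogram as an hclust object, so the same cutree() step can be tried on any hclust tree. The following is a minimal sketch on synthetic data, assuming Euclidean distance and complete linkage (pheatmap's documented defaults). The matrix m and the choice of 4 clusters are illustrative only.

```r
# Synthetic matrix: 12 "genes" x 6 samples (made up for illustration)
set.seed(7)
m <- matrix(rnorm(12 * 6), nrow = 12,
            dimnames = list(paste0("gene", 1:12), paste0("s", 1:6)))

# Same kind of object as p$tree_row: an hclust tree over the rows
tree <- hclust(dist(m, method = "euclidean"), method = "complete")

# Cut the tree into k clusters; cutree() returns a vector named by row
k <- 4
cluster_demo <- factor(cutree(tree, k = k))
cut_df <- data.frame(cluster = cluster_demo)

table(cut_df$cluster)                     # cluster sizes
rownames(cut_df)[tree$order]              # genes in dendrogram (plotting) order
split(rownames(cut_df), cut_df$cluster)   # genes belonging to each cluster
```

Because the factor keeps the gene names, the resulting data frame has gene IDs as row names, which is what annotation_row needs in order to match the heat map rows.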

3.4 Draw heat map according to clustering results

Pass the cluster data frame to "annotation_row" so that every row is annotated with its cluster, then redraw the clustered heat map:

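# Draw the heat map

p<-pheatmap(data, scale = "row", # direction of normalization: "row", "column" or "none"

            cluster_rows = T,cluster_cols = T, # whether to cluster the rows / the columns (TRUE or FALSE)

            cutree_rows = NA, cutree_cols = NA, # if rows/columns are clustered, split the heat map into this many row/column clusters (cutree_rows = num, cutree_cols = num)

            treeheight_row = 30, treeheight_col = 30, # height of the row and column dendrograms

            border_color = "grey60", # border color of each cell, default "grey60"

            cellwidth = 60, cellheight = 7.5,  # width / height of each cell, default NA

            display_numbers = F, # whether to show the values in the cells (or custom marks based on a condition)

            fontsize_number = 6, # font size of the numbers shown in the cells

            number_format = "%.2f", # format of the numbers in the cells: "%.2f" is two decimal places, "%.1e" is scientific notation

            number_color = "grey30", # font color of the numbers shown in the cells

            fontsize =10, fontsize_row = 6, fontsize_col = 10, # base font size, and font sizes of the row and column names

            show_rownames = T, show_colnames = T, # whether to show the row and column names

            main = "Gene title", # title of the heat map

            color = colorRampPalette(c("navy","white","firebrick3"))(100), # heat map colors; (100) means 100 color levels

            angle_col = "45", # angle of the column labels

            gaps_row = NULL,  # only used when rows are not clustered: positions of the gaps between rows

            gaps_col = c(1,2,3,4,5,6),  # only used when columns are not clustered: positions of the gaps between columns

            annotation_row = cut.df, annotation_col = ann_col, # row annotation (clusters) and column annotation (sample groups)

            annotation = NA, annotation_colors = ann_color,  # annotation is a deprecated parameter; annotation_colors sets the row and column annotation colors, default NA

            annotation_legend = TRUE, # whether to show the annotation legend

            annotation_names_row = TRUE, annotation_names_col = TRUE) # whether to show the names of the row and column annotations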

Figure 3 Clustering and grouping heat map
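In the call above, annotation_colors only defines colors for Sample, so the cluster annotation falls back to automatically chosen colors. One possible way to control them is to add a named color vector whose names match the factor levels of the cluster column. The sketch below builds such a list with base R only. The palette and the stand-in cluster factor are assumptions, and the final pheatmap() call is left commented out because it needs the tutorial's data.

```r
# Stand-in for the tutorial's cut.df: a factor with 20 cluster levels
cut_demo <- data.frame(cluster = factor(rep(1:20, each = 2)))

# One color per cluster level; the names must match the factor levels
lv <- levels(cut_demo$cluster)
palette_fun <- colorRampPalette(c("#1B9E77", "#D95F02", "#7570B3", "#E7298A"))
cluster_cols <- setNames(palette_fun(length(lv)), lv)

# Extend the Sample colors from section 3.2 with the cluster colors
ann_color_demo <- list(
  Sample  = c(control = "#0089CF", test = "#E889BD"),
  cluster = cluster_cols
)
str(ann_color_demo)

# With the tutorial's objects this would be used as (not run here):
# pheatmap(data, scale = "row", annotation_row = cut.df,
#          annotation_col = ann_col, annotation_colors = ann_color_demo)
```

The list element must be named after the annotation column ("cluster" here), otherwise pheatmap cannot associate the colors with that annotation.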

That is all for this issue. The next issue will cover grouping heat maps based on clustering conditions.

 


Origin blog.csdn.net/weixin_54004950/article/details/128224856