Prepare data
Two inputs are needed: a gene expression profile and a gene annotation table. The annotation here is KO annotation, but any other annotation works the same way.
Gene expression profile
gene | sample1 | sample2 | sample3 | ... |
---|---|---|---|---|
gene1 | 1.0 | 2.0 | 2.0 | ... |
gene2 | 3.0 | 3.0 | 4.0 | ... |
gene3 | 5.0 | 5.0 | 5.0 | ... |
gene4 | 6.0 | 7.0 | 9.0 | ... |
... | ... | ... | ... | ... |
Pathway annotation
gene | KO | pathway |
---|---|---|
gene1 | KO1 | pathway1 |
gene2 | KO2 | pathway1 |
gene3 | KO2 | pathway2 |
... | ... | ... |
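For orientation, here is a sketch of the two inputs as R data frames, using toy values taken from the tables above. The object names `expr` and `annot` are illustrative and do not appear in the original code. The expression profile is wide, with one row per gene. The annotation is long, with one row per gene-KO-pathway assignment, so a KO can appear in several pathways:

```r
# Expression profile: one row per gene, one column per sample (toy values)
expr <- data.frame(
  gene    = c("gene1", "gene2", "gene3", "gene4"),
  sample1 = c(1, 3, 5, 6),
  sample2 = c(2, 3, 5, 7),
  sample3 = c(2, 4, 5, 9)
)

# Annotation: one row per gene-KO-pathway assignment (long format)
annot <- data.frame(
  gene    = c("gene1", "gene2", "gene3"),
  KO      = c("KO1", "KO2", "KO2"),
  pathway = c("pathway1", "pathway1", "pathway2")
)

# Every annotated gene should have a row in the expression profile
missing_genes <- setdiff(annot$gene, expr$gene)
stopifnot(length(missing_genes) == 0)

# KO-pathway pairs are many-to-many: KO2 belongs to both pathways here
print(table(annot$KO, annot$pathway))
```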
Simulated data
library(tidyverse)
library(magrittr)
library(circlize)
# Simulated data
## Data1: expression matrix, 500 genes x 9 samples (three groups of three replicates)
fpkm <- cbind(matrix(rnorm(500*3, mean = 1), nrow = 500),
              matrix(rnorm(500*3, mean = 2), nrow = 500),
              matrix(rnorm(500*3, mean = 3), nrow = 500))
fpkm <- fpkm[sample(500, 500), ] # randomly permute rows
rownames(fpkm) <- paste0("gene", seq(500))
colnames(fpkm) <- c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3")
fpkm %<>% as.data.frame() %>% mutate(gene = row.names(.))
## Data2: annotation table with KO, pathway, and gene columns
pathways <- rep(paste0("pathway", seq(6)), sample(12:20, size = 6)) %>% sample(70)
KOs <- rep(paste0("KO", seq(20)), sample(5:20, size = 20, replace = TRUE)) %>% sample(70)
KOannotation <- data.frame(KO = KOs, pathway = pathways)
KOannotation <- KOannotation[sample(70, 200, TRUE), ]
KOannotation$gene <- sample(paste0("gene", seq(500)), 200)
# Suppose these are the enriched pathways you want to visualize
maps <- c("pathway1", "pathway2", "pathway3")
samples <- c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3")
Think about the data each plot needs
First, a brief introduction to the circos layout. Just as an ordinary plot has an x-axis and a y-axis, a circos plot has several axes (sectors); how many depends on your data. Each axis has its own positions along it (1, 2, 3, 4 in the figure below).
So to draw a link, you need data like this:
from_axis | from_position | to_axis | to_position |
---|---|---|---|
A | 1 | B | 4 |
A | 2 | C | 5 |
A | 3 | D | 6 |
To also control the width of each link, give a start and an end position at both ends of the link:
from_axis | from_position_start | from_position_end | to_axis | to_position_start | to_position_end |
---|---|---|---|---|---|
A | 0.5 | 1.5 | B | 3.5 | 4.5 |
A | 1.5 | 2.5 | C | 4.5 | 5.5 |
A | 2.5 | 3.5 | D | 5.5 | 6.5 |
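A minimal circlize sketch of both link types, assuming four sectors A to D whose axes each run from 0 to 7 (the sector names and `xlim` are illustrative). Passing a single number to `circos.link()` for each end draws a line; passing a length-2 range draws a ribbon whose width at that end is the range:

```r
library(circlize)

circos.clear()
# Four sectors ("axes"), each with its own x-axis from 0 to 7
circos.initialize(c("A", "B", "C", "D"), xlim = c(0, 7))
circos.track(ylim = c(0, 1), track.height = 0.1)

# Point-to-point links (first table): one position per end draws a line
circos.link("A", 1, "B", 4)
circos.link("A", 2, "C", 5)
circos.link("A", 3, "D", 6)

# Ribbon links (second table): a start/end range per end sets the width
circos.link("A", c(0.5, 1.5), "B", c(3.5, 4.5), col = "#fb9b9a80", border = NA)
circos.link("A", c(1.5, 2.5), "C", c(4.5, 5.5), col = "#fb9b9a80", border = NA)
circos.link("A", c(2.5, 3.5), "D", c(5.5, 6.5), col = "#fb9b9a80", border = NA)
```

Both link types are drawn on the same plot here only for comparison; in practice you would pick one.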
With the links worked out, turn to the heat map, which is drawn with circos.heatmap() (as shown above). Its data are relatively simple, so draw the heat map first and then work out how to draw the chord diagram in the middle.
heat map
Data cleaning
# Heat map color scale
col_fun1 = colorRamp2(c(-2, 0, 2), c("#247ab5", "white", "#fda1a0"))
# Genes to plot: genes annotated to the selected pathways
plot_gene <- KOannotation %>%
  filter(pathway %in% maps) %>%
  pull(gene) %>%
  {filter(fpkm, row.names(fpkm) %in% .)}
# KOs to plot: expression summed over each KO's genes
plot_KO <- plot_gene %>%
  left_join(KOannotation, by = "gene") %>%
  filter(pathway %in% maps) %>% # keep only the enriched pathways in `maps`
  group_by(KO) %>%
  summarise(across(all_of(samples), sum))
# Pathways to plot: expression summed over each pathway's genes
plot_map <- plot_gene %>%
  left_join(KOannotation, by = "gene") %>%
  filter(pathway %in% maps) %>% # keep only the enriched pathways in `maps`
  group_by(pathway) %>%
  summarise(across(all_of(samples), sum))
plot_data1 <- bind_rows(plot_gene %>% rename(id = gene),
                        plot_KO %>% rename(id = KO),
                        plot_map %>% rename(id = pathway)) %>%
  `row.names<-`(.$id) %>%
  select(-id)
# Row-wise z-score
plot_data1 <- t(scale(t(plot_data1))) %>% as.data.frame()
# Split the heat map into gene, KO, and pathway sectors
lev_split = row.names(plot_data1) %>% str_match("[a-zA-Z]+") %>% factor()
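The two less obvious lines above are the row-wise z-score and the sector labels. Here is a base-R sketch of the same two steps on a small hypothetical matrix `m`. `t(scale(t(m)))` standardizes each row, because `scale()` works on columns. The leading run of letters in each row name becomes the sector label, which is what `str_match("[a-zA-Z]+")` extracts:

```r
# Toy matrix with gene, KO, and pathway rows (values are illustrative)
m <- rbind(gene1    = c(1, 2, 3),
           gene2    = c(3, 3, 6),
           KO1      = c(4, 5, 6),
           pathway1 = c(10, 20, 30))

# scale() standardizes columns, so transpose, scale, and transpose back:
# each row ends up with mean 0 and standard deviation 1
z <- t(scale(t(m)))

# The leading run of letters in each row name is the sector label
labels <- regmatches(rownames(m), regexpr("[a-zA-Z]+", rownames(m)))
lev_split <- factor(labels)

print(round(z, 3))
print(lev_split)
```

`factor()` sorts its levels using the current locale, so the sector order can differ between systems (in the C locale uppercase sorts first, giving `KO` before `gene`). Passing `levels = c("gene", "KO", "pathway")` fixes the order explicitly.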
Draw the heat map
circos.clear()
circos.par(gap.degree = 10, track.height = 0.1)
# Draw the expression data as one track per group for a layered look
circos.heatmap(as.matrix(plot_data1[samples[1:3]]), split = lev_split, col = col_fun1, rownames.side = "outside", cluster = TRUE)
circos.heatmap(as.matrix(plot_data1[samples[7:9]]), split = lev_split, col = col_fun1)
circos.heatmap(as.matrix(plot_data1[samples[4:6]]), split = lev_split, col = col_fun1)
Following lev_split, this heat map is divided into three axes (sectors): gene, KO, and pathway. Note that every gene, KO, and pathway occupies a slot of width 1 on its axis: the first element drawn on an axis spans 0-1, the second 1-2, and so on. For example, if gene67 is drawn first on the gene axis, it spans 0-1; if pathway2 is drawn first on the pathway axis, it also spans 0-1. So, following the earlier discussion, if three KOs need to be connected to pathway2, we need data like the following:
from_axis | from_position_start | from_position_end | to_axis | to_position_start | to_position_end |
---|---|---|---|---|---|
KO | 0.5 | 1.5 | pathway | 0 | 0.333 |
KO | 1.5 | 2.5 | pathway | 0.333 | 0.667 |
KO | 2.5 | 3.5 | pathway | 0.667 | 1 |
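The split in the `to_position` columns is plain arithmetic: when n links share one unit-wide slot whose left edge is at p, the j-th link occupies [p + (j - 1)/n, p + j/n]. A small base-R helper makes this concrete (the name `split_slot` is hypothetical):

```r
# Split the unit-wide slot [left, left + 1] into n equal, non-overlapping pieces
split_slot <- function(left, n) {
  j <- seq_len(n)
  data.frame(start = left + (j - 1) / n,
             end   = left + j / n)
}

# Three KOs linking to a pathway whose slot starts at 0
s <- split_slot(0, 3)
print(s)
```

Each piece ends exactly where the next begins, so the ribbons touch but do not overlap.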
In the table above, several KOs connect to the same pathway, so the pathway's slot has to be split into separate start and end positions to keep the links from overlapping. A side benefit is that the links get different widths, which looks much better.
Also note that the circos heat map arranges the rows by the clustering result, so the drawn order differs from the row order in the data frame. After drawing the heat map, we therefore first need the coordinates of each gene, KO, and pathway on its axis, which circlize provides through get.cell.meta.data.
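As the code below uses it, `get.cell.meta.data("row_order", sector.index = ...)` gives the drawing order of one sector's rows, as indices into that sector's own rows. The k-th drawn row then occupies [k - 1, k]. The base-R sketch below uses a hard-coded, hypothetical `row_order` list in place of the circlize call, to show how the drawing order becomes a position table of slot left edges:

```r
# Hypothetical per-sector row names and drawing orders, standing in for
# get.cell.meta.data("row_order", sector.index = ...)
row_names <- list(gene    = c("gene1", "gene2", "gene3"),
                  KO      = c("KO1", "KO2"),
                  pathway = c("pathway1", "pathway2"))
row_order <- list(gene = c(3, 1, 2), KO = c(2, 1), pathway = c(1, 2))

pos_list <- lapply(names(row_names), function(sec) {
  drawn <- row_names[[sec]][row_order[[sec]]]   # names in drawing order
  data.frame(id       = drawn,
             sector   = sec,
             position = seq_along(drawn) - 1)   # left edge of each unit slot
})
positions <- do.call(rbind, pos_list)
print(positions)
```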
Chord diagram
Data cleaning
Based on the discussion above, the data cleaning has two goals:
1. Get the position of each gene, KO, and pathway on its axis after the heat map is drawn, organized as follows (Table A):
id | sector | position |
---|---|---|
gene1 | gene | 23 |
KO1 | KO | 45 |
pathway1 | pathway | 56 |
2. Work out the one-to-many gene-KO and KO-pathway relationships to obtain the relative positions (Table B):
from_axis | from_position_start | from_position_end | to_axis | to_position_start | to_position_end |
---|---|---|---|---|---|
gene | 0.5 | 1.5 | KO | 5 | 5.33 |
gene | 1.5 | 2.5 | KO | 5.33 | 5.66 |
gene | 2.5 | 3.5 | KO | 5.66 | 6 |
In Table B, all three genes connect to the same KO, so each one connects to a different part of that KO's slot.
To build this table, split the work into the following steps:
2.1 Generate a table of connected pairs:
from | to |
---|---|
gene1 | KO2 |
gene2 | KO1 |
KO1 | pathway1 |
2.2 Split each slot according to how many times its element appears in the table (Table C):
from | to | from_start | from_end | to_start | to_end |
---|---|---|---|---|---|
gene1 | KO2 | 0 | 1 | 0 | 0.333 |
gene2 | KO1 | 0 | 1 | 0.333 | 0.667 |
KO1 | pathway1 | 0 | 0.5 | 0.667 | 1 |
KO1 | pathway2 | 0.5 | 1 | 0 | 0.333 |
2.3 Combine Table A and Table C to compute Table B.
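Step 2.2 only needs each link's rank within its group and the group size. Here is a base-R sketch with `ave()` on a toy link table; the rows are illustrative. The dplyr code below reaches the same numbers with `group_by()`, `n()`, and `cumsum()`:

```r
# Toy link table: gene -> KO and KO -> pathway pairs
links <- data.frame(
  from = c("gene1", "gene2", "gene3", "KO1",      "KO1"),
  to   = c("KO2",   "KO1",   "KO2",   "pathway1", "pathway2")
)

# Rank of each row within its group (1, 2, ...) and the group size
rank_in <- function(g) ave(seq_along(g), g, FUN = seq_along)
size_of <- function(g) ave(seq_along(g), g, FUN = length)

# The j-th of n links leaving (or entering) a slot gets [(j - 1) / n, j / n]
links$from_start <- (rank_in(links$from) - 1) / size_of(links$from)
links$from_end   <-  rank_in(links$from)      / size_of(links$from)
links$to_start   <- (rank_in(links$to) - 1)   / size_of(links$to)
links$to_end     <-  rank_in(links$to)        / size_of(links$to)
print(links)
```

The two ends are split independently. On a KO slot, the incoming gene ribbons and the outgoing pathway ribbons therefore each cover the full [0, 1] width and can overlap, as they do for KO1 in Table C.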
With the approach settled, here is the code:
# 1. Get the position of each gene, KO, and pathway on its axis
plot_data3 = data.frame()
for (lev in levels(lev_split)) {
  # row names of this sector, in the order they were drawn
  a <- rownames(plot_data1)[lev_split == lev][get.cell.meta.data("row_order", sector.index = lev)]
  a <- seq_along(a) %>% `names<-`(a) %>% enframe("id", "position")
  a$sector = lev
  plot_data3 <- rbind(plot_data3, a)
}
plot_data3$position <- plot_data3$position - 1 # each element spans a unit interval; subtract 1 to get its left edge, to which the offsets are added later
# 2.1 Build the point-to-point link table
plot_data2 <- KOannotation %>%
  filter(gene %in% row.names(plot_data1), KO %in% row.names(plot_data1)) %>%
  select(gene, KO) %>%
  rename(from = gene, to = KO)
tmp_obj1 <- KOannotation %>%
  filter(KO %in% row.names(plot_data1), pathway %in% row.names(plot_data1)) %>%
  select(KO, pathway) %>%
  rename(from = KO, to = pathway)
plot_data2 %<>% bind_rows(tmp_obj1) %>% distinct(from, to)
# 2.2 Compute the offsets: the j-th of n links sharing a slot gets [(j - 1) / n, j / n]
tmp_obj1 <- plot_data2 %>%
  group_by(from) %>%
  mutate(V1 = 1/n(),
         from_end = cumsum(V1),
         from_start = from_end - V1) %>%
  select(from, to, from_start, from_end)
tmp_obj2 <- plot_data2 %>%
  group_by(to) %>%
  mutate(V1 = 1/n(),
         to_end = cumsum(V1),
         to_start = to_end - V1) %>%
  select(from, to, to_start, to_end)
plot_data2 %<>% left_join(tmp_obj1, by = c("from", "to")) %>% left_join(tmp_obj2, by = c("from", "to"))
# 2.3 Merge the positions (Table A) with the offsets (Table C)
plot_data2 %<>%
  left_join(plot_data3, by = c('from' = 'id')) %>%
  rename('from_position' = position, 'from_sector' = sector) %>%
  left_join(plot_data3, by = c('to' = 'id')) %>%
  rename('to_position' = position, 'to_sector' = sector)
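Step 2.3, together with the later `circos.link()` call, combines the two tables by simple addition: absolute position = slot left edge + relative offset. Here is a base-R sketch with `merge()` on toy tables. The values and the output column names `from_lo`, `from_hi`, `to_lo`, and `to_hi` are hypothetical:

```r
# Table A (toy): left edge of each element's unit slot on its sector
positions <- data.frame(id       = c("gene1", "gene2", "KO1"),
                        sector   = c("gene",  "gene",  "KO"),
                        position = c(0, 1, 5))

# Toy link table with offsets inside each slot, as produced by step 2.2
links <- data.frame(from       = c("gene1", "gene2"),
                    to         = c("KO1",   "KO1"),
                    from_start = c(0, 0),   from_end = c(1, 1),
                    to_start   = c(0, 0.5), to_end   = c(0.5, 1))

# Attach sector and slot position to both ends (merge() is an inner join)
from_pos <- setNames(positions, c("from", "from_sector", "from_position"))
to_pos   <- setNames(positions, c("to",   "to_sector",   "to_position"))
links <- merge(links, from_pos, by = "from")
links <- merge(links, to_pos,   by = "to")
links <- links[order(links$from), ]

# Absolute ranges: slot left edge + relative offset
links$from_lo <- links$from_position + links$from_start
links$from_hi <- links$from_position + links$from_end
links$to_lo   <- links$to_position   + links$to_start
links$to_hi   <- links$to_position   + links$to_end
print(links[, c("from", "to", "from_lo", "from_hi", "to_lo", "to_hi")])
```

Unlike `left_join()`, `merge()` defaults to an inner join. A link whose end is missing from the position table is therefore dropped instead of being kept with `NA` positions.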
plot_data2 now contains the links between all elements, which is usually more than you want to show, so select the links to display. Finding a good selection may take several rounds of plotting. The highlight names below come from one random run of the simulated data; with freshly simulated data, pick names that actually occur in plot_data2.
# 3. Further filter the links to display
# Highlight the pathways of interest
highlight_pathway <- c("pathway1", "pathway2")
highlight_KO <- c("KO5", "KO6", "KO20")
highlight_gene <- c("gene292", "gene256", "gene67", "gene146", "gene52", "gene391", "gene139", "gene327", "gene218", "gene142", "gene375", "gene194")
plot_link <- plot_data2 %>% filter(to %in% c(highlight_pathway, highlight_KO)) %>% mutate(col = "#dcdcdc80") # grey for ordinary links
highlight_link <- plot_link %>% filter(from %in% c(highlight_gene, highlight_KO)) %>% mutate(col = "#fb9b9a80") # pink for highlighted links
plot_link <- plot_link %>% filter(!from %in% c(highlight_gene, highlight_KO))
plot_data2 <- rbind(plot_link, highlight_link) # highlighted links last, so they are drawn on top
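The same filter-and-color step can be written in one pass in base R. This sketch assumes a toy link table and highlight vectors; the values are illustrative. Ordering by the highlight flag puts the grey links first, so the highlighted links are drawn last and sit on top, which is what `rbind(plot_link, highlight_link)` achieves above:

```r
# Toy link table and highlight selections (illustrative values)
links <- data.frame(from = c("gene1", "gene2", "KO5",      "KO7",      "gene3"),
                    to   = c("KO5",   "KO6",   "pathway1", "pathway2", "KO9"))
highlight_pathway <- c("pathway1", "pathway2")
highlight_KO      <- c("KO5", "KO6")
highlight_gene    <- c("gene2")

# Keep only links that end at a highlighted pathway or KO
plot_links <- links[links$to %in% c(highlight_pathway, highlight_KO), ]

# Color highlighted links pink and the rest grey
is_hl <- plot_links$from %in% c(highlight_gene, highlight_KO)
plot_links$col <- ifelse(is_hl, "#fb9b9a80", "#dcdcdc80")

# order() is stable: grey links first, highlighted links last (drawn on top)
plot_links <- plot_links[order(is_hl), ]
print(plot_links)
```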
Draw the links with circos.link in a loop
for (idx in seq_len(nrow(plot_data2))) {
  tmp_obj <- plot_data2[idx, ]
  # absolute range = left edge of the slot + relative offset within the slot
  circos.link(tmp_obj[['from_sector']],
              c(tmp_obj[['from_position']] + tmp_obj[['from_start']],
                tmp_obj[['from_position']] + tmp_obj[['from_end']]),
              tmp_obj[['to_sector']],
              c(tmp_obj[['to_position']] + tmp_obj[['to_start']],
                tmp_obj[['to_position']] + tmp_obj[['to_end']]),
              col = tmp_obj[['col']],
              border = NA)
}
All code
library(tidyverse)
library(magrittr)
library(circlize)
# Simulated data
## Data1: expression matrix, 500 genes x 9 samples (three groups of three replicates)
fpkm <- cbind(matrix(rnorm(500*3, mean = 1), nrow = 500),
              matrix(rnorm(500*3, mean = 2), nrow = 500),
              matrix(rnorm(500*3, mean = 3), nrow = 500))
fpkm <- fpkm[sample(500, 500), ] # randomly permute rows
rownames(fpkm) <- paste0("gene", seq(500))
colnames(fpkm) <- c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3")
fpkm %<>% as.data.frame() %>% mutate(gene = row.names(.))
## Data2: annotation table with KO, pathway, and gene columns
pathways <- rep(paste0("pathway", seq(6)), sample(12:20, size = 6)) %>% sample(70)
KOs <- rep(paste0("KO", seq(20)), sample(5:20, size = 20, replace = TRUE)) %>% sample(70)
KOannotation <- data.frame(KO = KOs, pathway = pathways)
KOannotation <- KOannotation[sample(70, 200, TRUE), ]
KOannotation$gene <- sample(paste0("gene", seq(500)), 200)
# Suppose these are the enriched pathways you want to visualize
maps <- c("pathway1", "pathway2", "pathway3")
samples <- c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3")
# Heat map color scale
col_fun1 = colorRamp2(c(-2, 0, 2), c("#247ab5", "white", "#fda1a0"))
# Genes to plot: genes annotated to the selected pathways
plot_gene <- KOannotation %>%
  filter(pathway %in% maps) %>%
  pull(gene) %>%
  {filter(fpkm, row.names(fpkm) %in% .)}
# KOs to plot: expression summed over each KO's genes
plot_KO <- plot_gene %>%
  left_join(KOannotation, by = "gene") %>%
  filter(pathway %in% maps) %>% # keep only the enriched pathways in `maps`
  group_by(KO) %>%
  summarise(across(all_of(samples), sum))
# Pathways to plot: expression summed over each pathway's genes
plot_map <- plot_gene %>%
  left_join(KOannotation, by = "gene") %>%
  filter(pathway %in% maps) %>% # keep only the enriched pathways in `maps`
  group_by(pathway) %>%
  summarise(across(all_of(samples), sum))
plot_data1 <- bind_rows(plot_gene %>% rename(id = gene),
                        plot_KO %>% rename(id = KO),
                        plot_map %>% rename(id = pathway)) %>%
  `row.names<-`(.$id) %>%
  select(-id)
# Row-wise z-score
plot_data1 <- t(scale(t(plot_data1))) %>% as.data.frame()
# Split the heat map into gene, KO, and pathway sectors
lev_split = row.names(plot_data1) %>% str_match("[a-zA-Z]+") %>% factor()
circos.clear()
circos.par(gap.degree = 10, track.height = 0.1)
# Draw the expression data as one track per group for a layered look
circos.heatmap(as.matrix(plot_data1[samples[1:3]]), split = lev_split, col = col_fun1, rownames.side = "outside", cluster = TRUE)
circos.heatmap(as.matrix(plot_data1[samples[7:9]]), split = lev_split, col = col_fun1)
circos.heatmap(as.matrix(plot_data1[samples[4:6]]), split = lev_split, col = col_fun1)
# 1. Get the position of each gene, KO, and pathway on its axis
plot_data3 = data.frame()
for (lev in levels(lev_split)) {
  # row names of this sector, in the order they were drawn
  a <- rownames(plot_data1)[lev_split == lev][get.cell.meta.data("row_order", sector.index = lev)]
  a <- seq_along(a) %>% `names<-`(a) %>% enframe("id", "position")
  a$sector = lev
  plot_data3 <- rbind(plot_data3, a)
}
plot_data3$position <- plot_data3$position - 1 # each element spans a unit interval; subtract 1 to get its left edge, to which the offsets are added later
# 2.1 Build the point-to-point link table
plot_data2 <- KOannotation %>%
  filter(gene %in% row.names(plot_data1), KO %in% row.names(plot_data1)) %>%
  select(gene, KO) %>%
  rename(from = gene, to = KO)
tmp_obj1 <- KOannotation %>%
  filter(KO %in% row.names(plot_data1), pathway %in% row.names(plot_data1)) %>%
  select(KO, pathway) %>%
  rename(from = KO, to = pathway)
plot_data2 %<>% bind_rows(tmp_obj1) %>% distinct(from, to)
# 2.2 Compute the offsets: the j-th of n links sharing a slot gets [(j - 1) / n, j / n]
tmp_obj1 <- plot_data2 %>%
  group_by(from) %>%
  mutate(V1 = 1/n(),
         from_end = cumsum(V1),
         from_start = from_end - V1) %>%
  select(from, to, from_start, from_end)
tmp_obj2 <- plot_data2 %>%
  group_by(to) %>%
  mutate(V1 = 1/n(),
         to_end = cumsum(V1),
         to_start = to_end - V1) %>%
  select(from, to, to_start, to_end)
plot_data2 %<>% left_join(tmp_obj1, by = c("from", "to")) %>% left_join(tmp_obj2, by = c("from", "to"))
# 2.3 Merge the positions (Table A) with the offsets (Table C)
plot_data2 %<>%
  left_join(plot_data3, by = c('from' = 'id')) %>%
  rename('from_position' = position, 'from_sector' = sector) %>%
  left_join(plot_data3, by = c('to' = 'id')) %>%
  rename('to_position' = position, 'to_sector' = sector)
# 3. Further filter the links to display
highlight_pathway <- c("pathway1", "pathway2")
highlight_KO <- c("KO5", "KO6", "KO20")
highlight_gene <- c("gene292", "gene256", "gene67", "gene146", "gene52", "gene391", "gene139", "gene327", "gene218", "gene142", "gene375", "gene194")
plot_link <- plot_data2 %>% filter(to %in% c(highlight_pathway, highlight_KO)) %>% mutate(col = "#dcdcdc80") # grey for ordinary links
highlight_link <- plot_link %>% filter(from %in% c(highlight_gene, highlight_KO)) %>% mutate(col = "#fb9b9a80") # pink for highlighted links
plot_link <- plot_link %>% filter(!from %in% c(highlight_gene, highlight_KO))
plot_data2 <- rbind(plot_link, highlight_link) # highlighted links last, so they are drawn on top
for (idx in seq_len(nrow(plot_data2))) {
  tmp_obj <- plot_data2[idx, ]
  # absolute range = left edge of the slot + relative offset within the slot
  circos.link(tmp_obj[['from_sector']],
              c(tmp_obj[['from_position']] + tmp_obj[['from_start']],
                tmp_obj[['from_position']] + tmp_obj[['from_end']]),
              tmp_obj[['to_sector']],
              c(tmp_obj[['to_position']] + tmp_obj[['to_start']],
                tmp_obj[['to_position']] + tmp_obj[['to_end']]),
              col = tmp_obj[['col']],
              border = NA)
}
Finished product
Important points
- In a multi-track circos heat map, the tracks are aligned by row index, not by row name. Therefore, prepare the expression data for all groups in a single table.
- There is a lot of data cleaning and calculation between the heat map code and the chord diagram code. This is expected: the positions of the genes, KOs, and pathways are only available after the heat map has been drawn.
- Make it a habit to call circos.clear() before each plot to reset circlize's parameters and internal state.
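Later `circos.heatmap()` tracks follow the first track's layout by row index. A cheap safeguard is therefore to check that every matrix you are about to draw has the same row names in the same order. Subsetting the columns of one data frame, as the code above does, satisfies this automatically. Here is a minimal base-R sketch; the helper name `check_same_rows` is hypothetical:

```r
# Stop if any matrix differs from the first in row names or row order
check_same_rows <- function(...) {
  mats <- list(...)
  ref <- rownames(mats[[1]])
  ok <- vapply(mats, function(m) identical(rownames(m), ref), logical(1))
  if (!all(ok)) stop("matrices do not share the same row names in the same order")
  invisible(TRUE)
}

a <- matrix(1:6, nrow = 3, dimnames = list(c("gene1", "gene2", "KO1"), NULL))
b <- a[c(2, 1, 3), ]   # same rows, different order

check_same_rows(a, a)  # passes silently
res <- tryCatch(check_same_rows(a, b), error = function(e) "mismatch")
print(res)
```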