Circos diagram visualization in R

Prepare data

Two datasets need to be prepared: a gene expression profile and an annotation of the genes (KO annotation is used here, but any other annotation works too).

Gene expression profile

	sample1	sample2	sample3	...
gene1	1.0	2.0	2.0	...
gene2	3.0	3.0	4.0	...
gene3	5.0	5.0	5.0	...
gene4	6.0	7.0	9.0	...
...	...	...	...	...

Pathway annotation

gene	KO	pathway
gene1	KO1	pathway1
gene2	KO2	pathway1
gene3	KO2	pathway2
...	...	...
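
As a minimal sketch, assuming both tables are stored as tab-separated text, they could be loaded as shown here. The inline strings stand in for real files and the object names expr and anno are only illustrative; in practice you would pass your own file path to read.delim().

```r
# Inline text stands in for two tab-separated files (hypothetical content).
expr_txt <- "gene\tsample1\tsample2\tsample3
gene1\t1.0\t2.0\t2.0
gene2\t3.0\t3.0\t4.0
gene3\t5.0\t5.0\t5.0"
anno_txt <- "gene\tKO\tpathway
gene1\tKO1\tpathway1
gene2\tKO2\tpathway1
gene3\tKO2\tpathway2"

# Expression profile: genes in rows, samples in columns
expr <- read.delim(text = expr_txt, row.names = 1, check.names = FALSE)
# Annotation: one row per gene-KO-pathway assignment
anno <- read.delim(text = anno_txt, stringsAsFactors = FALSE)

# Every annotated gene should have an expression row
stopifnot(all(anno$gene %in% rownames(expr)))
```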

Simulated data

library(tidyverse)
library(magrittr)
library(circlize)
# Simulated data
## Data1
fpkm <- cbind(matrix(rnorm(500*3, mean = 1), nrow = 500), 
              matrix(rnorm(500*3, mean = 2), nrow = 500),
              matrix(rnorm(500*3, mean = 3), nrow = 500))
fpkm <- fpkm[sample(500, 500), ] # randomly permute rows
rownames(fpkm) <- paste0("gene", seq(500))
colnames(fpkm) <- c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3")
fpkm %<>% as.data.frame() %>% mutate(gene = row.names(.))
# Data2
pathways <- rep(paste0("pathway", seq(6)),  sample(12:20, size = 6)) %>% sample(70)
KOs <- rep(paste0("KO", seq(20)),  sample(5:20, size = 20, replace = TRUE)) %>% sample(70)
KOannotation <- data.frame(KO=KOs, pathway=pathways)
KOannotation <- KOannotation[sample(70, 200, TRUE),]
KOannotation$gene <- sample(paste0("gene", seq(500)),200)
             
# Suppose these are the enriched pathways you want to visualize
maps <- c("pathway1", "pathway2", "pathway3")
samples <- c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3")

Think about what data the plot needs

First, a brief introduction to the circos layout. Just as an ordinary plot has an x-axis and a y-axis, a circos plot can be thought of as having many axes (how many depends on your data), and each axis has positions along it (1, 2, 3, 4 in the figure below).

It follows that drawing a link requires data like this:

from_axis	from_position	to_axis	to_position
A	1	B	4
A	2	C	5
A	3	D	6

Furthermore, to control the width of a link, specify the start and end positions of each link on both axes.

from_axis	from_position_start	from_position_end	to_axis	to_position_start	to_position_end
A	0.5	1.5	B	3.5	4.5
A	1.5	2.5	C	4.5	5.5
A	2.5	3.5	D	5.5	6.5
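
The table above maps directly onto circos.link(). This is a minimal sketch, assuming four sectors A to D that all share an arbitrary range of 0 to 7; the data frame name links and its column names are illustrative only.

```r
library(circlize)

# Link table with explicit start/end positions (values from the table above)
links <- data.frame(
  from_axis  = c("A", "A", "A"),
  from_start = c(0.5, 1.5, 2.5),
  from_end   = c(1.5, 2.5, 3.5),
  to_axis    = c("B", "C", "D"),
  to_start   = c(3.5, 4.5, 5.5),
  to_end     = c(4.5, 5.5, 6.5)
)

pdf(NULL)  # draw to a null device; remove this line to see the plot
circos.clear()
# Four axes (sectors), each running from 0 to 7 (arbitrary range for this sketch)
circos.initialize(c("A", "B", "C", "D"), xlim = c(0, 7))
circos.track(ylim = c(0, 1), track.height = 0.1)

# A length-2 position vector makes circos.link() draw a ribbon instead of a line
for (i in seq_len(nrow(links))) {
  circos.link(links$from_axis[i], c(links$from_start[i], links$from_end[i]),
              links$to_axis[i],   c(links$to_start[i],   links$to_end[i]),
              col = "#dcdcdc80", border = NA)
}
circos.clear()
invisible(dev.off())
```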

After thinking about links, think about the heat map. First, confirm that the heat map is drawn with circos.heatmap() (as shown above). Its data is relatively simple, so draw the heat map first and then consider how to draw the chord diagram in the middle.

Heat map

Data cleaning

# Heat map color mapping
col_fun1 = colorRamp2(c(-2, 0, 2), c("#247ab5", "white", "#fda1a0"))
# Genes to plot
plot_gene <- KOannotation %>% 
  filter(pathway %in% maps) %>% 
  pull(gene) %>%
  {filter(fpkm, gene %in% .)}
# KOs to plot: sum the expression of the genes assigned to each KO
plot_KO <- plot_gene %>% 
  left_join(KOannotation, by = "gene") %>%
  filter(pathway %in% maps) %>% # keep only the enriched pathways
  group_by(KO) %>%
  summarise(across(all_of(samples), sum))
# Pathways to plot: sum the expression of the genes assigned to each pathway
plot_map <- plot_gene %>% 
    left_join(KOannotation, by = "gene") %>%
    filter(pathway %in% maps) %>% # keep only the enriched pathways
    group_by(pathway) %>%
    summarise(across(all_of(samples), sum))

plot_data1 <- bind_rows(plot_gene %>% rename(id=gene), 
                        plot_KO %>% rename(id=KO), 
                        plot_map %>% rename(id=pathway)) %>%
    `row.names<-`(.$id) %>%
    select(-id)
plot_data1 <- t(scale(t(plot_data1))) %>% as.data.frame()
# Split the heat map into gene, KO, and pathway sectors by the ID prefix
lev_split = row.names(plot_data1) %>% str_extract("[a-zA-Z]+") %>% factor()

Draw the plot

circos.clear()
circos.par(gap.degree=10, track.height=0.1)
# Draw the expression profile in several tracks for a more layered look
circos.heatmap(plot_data1[samples[1:3]], split = lev_split , col = col_fun1, rownames.side = "outside", cluster = TRUE)
circos.heatmap(plot_data1[samples[7:9]], split = lev_split , col = col_fun1)
circos.heatmap(plot_data1[samples[4:6]], split = lev_split , col = col_fun1)

According to lev_split, the heat map is divided into three axes: gene, KO, and pathway. Note that each gene, KO, and pathway occupies a width of 1 on its axis. For example, if gene67 is the first cell on the gene axis it spans 0-1, and likewise pathway2 spans 0-1 if it is the first cell on the pathway axis. So, following the earlier discussion, if three KOs need to be connected to pathway2, we need data similar to the following:

from_axis	from_position_start	from_position_end	to_axis	to_position_start	to_position_end
KO	0.5	1.5	pathway	0	0.333
KO	1.5	2.5	pathway	0.333	0.667
KO	2.5	3.5	pathway	0.667	1

Note in the table above that several KOs may connect to one pathway, so the pathway's start and end positions must be split sensibly to avoid overlap. A side benefit is that the links then vary in width, which looks much better.
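
The splitting arithmetic for a single target can be sketched in base R. This assumes the target cell has width 1 and that n_links (a hypothetical name) is the number of links arriving at it:

```r
# Split the unit-wide cell of one target into n equal, non-overlapping slices.
# `n_links` is a hypothetical count: three KOs connected to the same pathway.
n_links <- 3
breaks <- seq(0, 1, length.out = n_links + 1)
slices <- data.frame(
  to_position_start = head(breaks, -1),
  to_position_end   = tail(breaks, -1)
)
print(round(slices, 3))
```

Each slice ends exactly where the next one begins, so the ribbons touch without overlapping.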

On the other hand, the genes on the circos heat map are arranged according to the clustering result, so their order differs from the row order of the data frame. Therefore, after drawing the heat map, we first need to obtain the coordinates of each gene, KO, and pathway on its corresponding axis. This is done with get.cell.meta.data from circlize, which retrieves the corresponding information from the plot.
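
A minimal sketch of that lookup on a toy matrix follows. The matrix, the sector names "x" and "y", and the object names are all hypothetical, and the interval convention (the i-th drawn row spans i - 1 to i) is the one assumed throughout this article.

```r
library(circlize)

set.seed(1)
# Toy matrix: 8 rows in two sectors "x" and "y" (hypothetical names)
mat <- matrix(rnorm(24), nrow = 8,
              dimnames = list(paste0("g", 1:8), paste0("s", 1:3)))
sector_of_row <- factor(rep(c("x", "y"), each = 4))
col_fun <- colorRamp2(c(-2, 0, 2), c("blue", "white", "red"))

pdf(NULL)  # null device; remove this line to see the plot
circos.clear()
circos.heatmap(mat, split = sector_of_row, col = col_fun, cluster = TRUE)

# row_order indexes the rows within the sector, in the order they were drawn
ord_x <- get.cell.meta.data("row_order", sector.index = "x")
ids_x <- rownames(mat)[sector_of_row == "x"][ord_x]
# Assumed convention: the i-th drawn row occupies [i - 1, i] on its axis
pos_x <- data.frame(id = ids_x,
                    start = seq_along(ids_x) - 1,
                    end = seq_along(ids_x))
circos.clear()
invisible(dev.off())
```

The meta data must be read before circos.clear() is called, because clearing discards the layout.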

Chord diagram

Data cleaning

Following the discussion above, the data cleaning has two goals:

1. Obtain the position of each gene, KO, and pathway on its axis after the heat map has been drawn, organized in the following format:

id	sector	position
gene1	gene	23
KO1	KO	45
pathway1	pathway	56

Table A

2. Calculate the one-to-many gene-KO and KO-pathway relationships to obtain the relative positions:

from_axis	from_position_start	from_position_end	to_axis	to_position_start	to_position_end
gene	0.5	1.5	KO	5	5.33
gene	1.5	2.5	KO	5.33	5.66
gene	2.5	3.5	KO	5.66	6

Table B

Note that in the table above the three genes connect to the same KO, so each one is attached to a different part of that KO.

To obtain this table, the process can be split into the following steps:

2.1 Generate a connection object table

from	to
gene1	KO2
gene2	KO1
KO1	pathway1
KO1	pathway2

2.2 Calculate a displacement based on the number of times the connection object appears in the table

from	to	from_start	from_end	to_start	to_end
gene1	KO2	0	1	0	1
gene2	KO1	0	1	0	1
KO1	pathway1	0	0.5	0	1
KO1	pathway2	0.5	1	0	1

Table C

2.3 Combine Table A and Table C to calculate Table B
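
The arithmetic of step 2.3 can be illustrated in base R: each endpoint's absolute position is its cell offset from Table A plus its displacement from Table C. The values and the names table_a, table_c, and table_b are hypothetical, and positions are assumed to be 0-based cell offsets.

```r
# Table A: 0-based cell offset of each element on its axis (hypothetical values)
table_a <- data.frame(
  id       = c("gene1", "gene2", "KO1"),
  sector   = c("gene", "gene", "KO"),
  position = c(22, 7, 44)
)
# Table C: fractional displacement of each link inside the unit-wide cells
table_c <- data.frame(
  from = c("gene1", "gene2"), to = c("KO1", "KO1"),
  from_start = c(0, 0), from_end = c(1, 1),
  to_start = c(0, 0.5), to_end = c(0.5, 1)
)

# Look up each endpoint in Table A, then add offset + displacement
from_idx <- match(table_c$from, table_a$id)
to_idx   <- match(table_c$to, table_a$id)
table_b <- data.frame(
  from_axis           = table_a$sector[from_idx],
  from_position_start = table_a$position[from_idx] + table_c$from_start,
  from_position_end   = table_a$position[from_idx] + table_c$from_end,
  to_axis             = table_a$sector[to_idx],
  to_position_start   = table_a$position[to_idx] + table_c$to_start,
  to_position_end     = table_a$position[to_idx] + table_c$to_end
)
print(table_b)
```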

With the idea clear, here is the code:

# 1 Get the position of each gene, KO, and pathway on its axis
plot_data3 = data.frame()
for(lev in levels(lev_split)){
  a <- rownames(plot_data1)[lev_split==lev][get.cell.meta.data("row_order", sector.index = lev)]
  a <- seq(length(a)) %>% `names<-`(a) %>% enframe("id", "position")
  a$sector = lev
  plot_data3 <- rbind(plot_data3, a)
} 
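plot_data3$position <- plot_data3$position - 1 # each element spans a width of 1, so subtract 1 to get its start offset for the additions below
# 2.1 Build the point-to-point connection table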
plot_data2 <- KOannotation %>%
    filter(gene %in% row.names(plot_data1), KO %in% row.names(plot_data1)) %>%
    select(gene, KO) %>%
    rename(from=gene, to=KO)
tmp_obj1 <- KOannotation %>%
    filter(KO %in% row.names(plot_data1), pathway %in% row.names(plot_data1)) %>%
    select(KO, pathway) %>%
    rename(from=KO, to=pathway)
plot_data2 %<>% bind_rows(tmp_obj1) %>% distinct(from, to)
# 2.2 Compute displacements: split each cell evenly among its links
tmp_obj1 <- plot_data2 %>% 
    group_by(from) %>% 
    mutate(V1 = 1/n(), 
           from_end = cumsum(V1), 
           from_start = from_end - V1) %>%
    ungroup() %>%
    select(from, to, from_start, from_end)
tmp_obj2 <- plot_data2 %>% 
    group_by(to) %>% 
    mutate(V1 = 1/n(), 
           to_end = cumsum(V1), 
           to_start = to_end - V1) %>%
    ungroup() %>%
    select(from, to, to_start, to_end)
plot_data2 %<>% 
    left_join(tmp_obj1, by = c("from", "to")) %>% 
    left_join(tmp_obj2, by = c("from", "to"))
# 2.3 Join the position table onto the connection table
plot_data2 %<>% 
  left_join(plot_data3, by=c('from'='id')) %>% 
  rename('from_position'=position, 'from_sector'=sector) %>%
  left_join(plot_data3, by=c('to'='id')) %>%
  rename('to_position'=position, 'to_sector'=sector)

The plot_data2 table above contains the connections between all points. We may not need that many, so select the data you want to display. This step may require drawing the plot several times to decide which links to show.

# 3. Further filter the links to display
# highlight the pathway I wanted
highlight_pathway <- c("pathway1", "pathway2")
highlight_KO <- c("KO5", "KO6", "KO20")
highlight_gene <-  c("gene292", "gene256", "gene67", "gene146", "gene52", "gene391", "gene139", "gene327", "gene218", "gene142", "gene375", "gene194")

plot_link <- plot_data2 %>% filter(to %in% c(highlight_pathway, highlight_KO)) %>% mutate(col="#dcdcdc80")
highlight_link <- plot_link %>% filter (from %in% c(highlight_gene, highlight_KO)) %>% mutate(col="#fb9b9a80")
plot_link <- plot_link %>% filter (!from %in% c(highlight_gene, highlight_KO))
plot_data2<-rbind(plot_link, highlight_link)

Draw the links with circos.link in a loop

for ( idx in seq_len(nrow(plot_data2))){
  tmp_obj <- plot_data2[idx,]
  circos.link(tmp_obj[['from_sector']], 
              c(tmp_obj[['from_position']] + tmp_obj[['from_start']], 
                tmp_obj[['from_position']] + tmp_obj[['from_end']]), 
              tmp_obj[['to_sector']], 
              c(tmp_obj[['to_position']] + tmp_obj[['to_start']], 
                tmp_obj[['to_position']] + tmp_obj[['to_end']]),
              col = tmp_obj[['col']],
              border = NA
              )
}

All code

library(tidyverse)
library(magrittr)
library(circlize)
# Simulated data
## Data1
fpkm <- cbind(matrix(rnorm(500*3, mean = 1), nrow = 500), 
              matrix(rnorm(500*3, mean = 2), nrow = 500),
              matrix(rnorm(500*3, mean = 3), nrow = 500))
fpkm <- fpkm[sample(500, 500), ] # randomly permute rows
rownames(fpkm) <- paste0("gene", seq(500))
colnames(fpkm) <- c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3")
fpkm %<>% as.data.frame() %>% mutate(gene = row.names(.))
# Data2
pathways <- rep(paste0("pathway", seq(6)),  sample(12:20, size = 6)) %>% sample(70)
KOs <- rep(paste0("KO", seq(20)),  sample(5:20, size = 20, replace = TRUE)) %>% sample(70)
KOannotation <- data.frame(KO=KOs, pathway=pathways)
KOannotation <- KOannotation[sample(70, 200, TRUE),]
KOannotation$gene <- sample(paste0("gene", seq(500)),200)
             
# Suppose these are the enriched pathways you want to visualize
maps <- c("pathway1", "pathway2", "pathway3")
samples <- c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3")
# Heat map color mapping
col_fun1 = colorRamp2(c(-2, 0, 2), c("#247ab5", "white", "#fda1a0"))
# Genes to plot
plot_gene <- KOannotation %>% 
  filter(pathway %in% maps) %>% 
  pull(gene) %>%
  {filter(fpkm, gene %in% .)}
# KOs to plot: sum the expression of the genes assigned to each KO
plot_KO <- plot_gene %>% 
  left_join(KOannotation, by = "gene") %>%
  filter(pathway %in% maps) %>% # keep only the enriched pathways
  group_by(KO) %>%
  summarise(across(all_of(samples), sum))
# Pathways to plot: sum the expression of the genes assigned to each pathway
plot_map <- plot_gene %>% 
    left_join(KOannotation, by = "gene") %>%
    filter(pathway %in% maps) %>% # keep only the enriched pathways
    group_by(pathway) %>%
    summarise(across(all_of(samples), sum))

plot_data1 <- bind_rows(plot_gene %>% rename(id=gene), 
                        plot_KO %>% rename(id=KO), 
                        plot_map %>% rename(id=pathway)) %>%
    `row.names<-`(.$id) %>%
    select(-id)
plot_data1 <- t(scale(t(plot_data1))) %>% as.data.frame()
# Split the heat map into gene, KO, and pathway sectors by the ID prefix
lev_split = row.names(plot_data1) %>% str_extract("[a-zA-Z]+") %>% factor()
circos.clear()
circos.par(gap.degree=10, track.height=0.1)
# Draw the expression profile in several tracks for a more layered look
circos.heatmap(plot_data1[samples[1:3]], split = lev_split , col = col_fun1, rownames.side = "outside", cluster = TRUE)
circos.heatmap(plot_data1[samples[7:9]], split = lev_split , col = col_fun1)
circos.heatmap(plot_data1[samples[4:6]], split = lev_split , col = col_fun1)
plot_data3 = data.frame()
for(lev in levels(lev_split)){
  a <- rownames(plot_data1)[lev_split==lev][get.cell.meta.data("row_order", sector.index = lev)]
  a <- seq(length(a)) %>% `names<-`(a) %>% enframe("id", "position")
  a$sector = lev
  plot_data3 <- rbind(plot_data3, a)
} 
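plot_data3$position <- plot_data3$position - 1 # each element spans a width of 1, so subtract 1 to get its start offset for the additions below
# 2.1 Build the point-to-point connection table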
plot_data2 <- KOannotation %>%
    filter(gene %in% row.names(plot_data1), KO %in% row.names(plot_data1)) %>%
    select(gene, KO) %>%
    rename(from=gene, to=KO)
tmp_obj1 <- KOannotation %>%
    filter(KO %in% row.names(plot_data1), pathway %in% row.names(plot_data1)) %>%
    select(KO, pathway) %>%
    rename(from=KO, to=pathway)
plot_data2 %<>% bind_rows(tmp_obj1) %>% distinct(from, to)
# 2.2 Compute displacements: split each cell evenly among its links
tmp_obj1 <- plot_data2 %>% 
    group_by(from) %>% 
    mutate(V1 = 1/n(), 
           from_end = cumsum(V1), 
           from_start = from_end - V1) %>%
    ungroup() %>%
    select(from, to, from_start, from_end)
tmp_obj2 <- plot_data2 %>% 
    group_by(to) %>% 
    mutate(V1 = 1/n(), 
           to_end = cumsum(V1), 
           to_start = to_end - V1) %>%
    ungroup() %>%
    select(from, to, to_start, to_end)
plot_data2 %<>% 
    left_join(tmp_obj1, by = c("from", "to")) %>% 
    left_join(tmp_obj2, by = c("from", "to"))
# 2.3 Join the position table onto the connection table
plot_data2 %<>% 
  left_join(plot_data3, by=c('from'='id')) %>% 
  rename('from_position'=position, 'from_sector'=sector) %>%
  left_join(plot_data3, by=c('to'='id')) %>%
  rename('to_position'=position, 'to_sector'=sector)
highlight_pathway <- c("pathway1", "pathway2")
highlight_KO <- c("KO5", "KO6", "KO20")
highlight_gene <-  c("gene292", "gene256", "gene67", "gene146", "gene52", "gene391", "gene139", "gene327", "gene218", "gene142", "gene375", "gene194")

plot_link <- plot_data2 %>% filter(to %in% c(highlight_pathway, highlight_KO)) %>% mutate(col="#dcdcdc80")
highlight_link <- plot_link %>% filter (from %in% c(highlight_gene, highlight_KO)) %>% mutate(col="#fb9b9a80")
plot_link <- plot_link %>% filter (!from %in% c(highlight_gene, highlight_KO))
plot_data2<-rbind(plot_link, highlight_link)
for ( idx in seq_len(nrow(plot_data2))){
  tmp_obj <- plot_data2[idx,]
  circos.link(tmp_obj[['from_sector']], 
              c(tmp_obj[['from_position']] + tmp_obj[['from_start']], 
                tmp_obj[['from_position']] + tmp_obj[['from_end']]), 
              tmp_obj[['to_sector']], 
              c(tmp_obj[['to_position']] + tmp_obj[['to_start']], 
                tmp_obj[['to_position']] + tmp_obj[['to_end']]),
              col = tmp_obj[['col']],
              border = NA
              )
}

Finished product

Important points

If you draw a multi-track circular heat map, the tracks are aligned by row number, not by row name. Therefore, the expression data for all groups must be prepared together in one step.
There is a lot of data cleaning and calculation between the heat map code and the chord diagram code. That is expected, because the positions of the genes, KOs, and pathways can only be obtained after the heat map has been drawn.
Make it a habit to call circos.clear() to reset the plot before each drawing.
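
For the row-alignment point above, one way to guard against mismatched rows is to align the per-group tables by row name before plotting. This is a minimal base R sketch with two hypothetical matrices, group_a and group_b:

```r
# Two hypothetical per-group matrices whose rows are in different orders
group_a <- matrix(1:6, nrow = 3,
                  dimnames = list(c("gene1", "gene2", "gene3"), c("A1", "A2")))
group_b <- matrix(11:16, nrow = 3,
                  dimnames = list(c("gene3", "gene1", "gene2"), c("B1", "B2")))

# Reorder group_b by row name so that row i refers to the same gene in both
group_b_aligned <- group_b[match(rownames(group_a), rownames(group_b)), , drop = FALSE]
stopifnot(identical(rownames(group_a), rownames(group_b_aligned)))

# One combined matrix; each track can then be drawn from a column subset of it
combined <- cbind(group_a, group_b_aligned)
```

Drawing every track from column subsets of one combined table, as the code in this article does with plot_data1, keeps the row order identical across tracks.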