In previous articles, we gave a brief introduction to Mendelian randomization research. The forest plot produced by the TwoSampleMR package is not very attractive, so today we use the R package forestploter to draw a cleaner forest plot for a Mendelian randomization study.
The forest plot made with the TwoSampleMR package looks like this,
while the forest plots in many SCI articles look like this.
Today, we will make a forest plot like the one above, using the data from "Reproducing a 6-point Mendelian Randomization Article in R Language". The author of that article provided the data directly, so I used it as is. The author analyzed results for many mental illnesses and bone density; here I take the results for schizophrenia and bone density.
The approximate layout of our forest plot is as follows, so we need to build a data table like the one shown in the figure below.
This step can only be done by hand. There is no shortcut, but it takes only a few minutes. The sample size can be found on the website. The finished table will look like the figure below. To get this data, reply "Mendel's forest map" to the official account.
Let's import the data first and have a look.
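The table itself is not reproduced here, so the following is only a sketch of the layout that the later code appears to expect. The five column names are the ones the code refers to. Their order is inferred from the column indices used in the forest() call. The values, row labels, and the names csv_text and bc_demo are made-up placeholders, not the article's results.

```r
# Placeholder table: heading rows carry a sample size, sub-rows carry OR/CI/P.
csv_text <- "Outcome,sample size,OR,95%Cl,P-Value
Outcome A,10000,,,
Method 1,,1.012,0.987-1.038,0.350
Method 2,,0.995,0.950-1.042,0.830"

# check.names = FALSE keeps names such as `sample size` and `95%Cl` unchanged;
# na.strings turns empty cells into NA so is.na() works on them later.
bc_demo <- read.csv(text = csv_text,
                    check.names = FALSE,
                    na.strings = c("", "NA"),
                    stringsAsFactors = FALSE)

str(bc_demo)
```

With a real file, the same arguments can be passed to read.csv() with a file path instead of text =.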
This format is a little different from the one we need for plotting, so some adjustments are required. Another problem is that the lower and upper limits of the 95% CI are stored together in one string. To separate them we need a little simple string handling, for which we use the stringr package. Let's pull out the 95% CI column first.
library(stringr)
cl<-bc$`95%Cl`
The table made by the author of the article is very neat, so we can use the str_sub function to extract the values by position. The function is simple to use: give it the start and end positions of the text you want. The lower limit occupies positions 1 to 5. Note that the decimal point also counts as a position, and that this approach only works when every value has exactly five characters.
bc$low <- str_sub(cl, 1, 5)
Next we extract the upper limit, which occupies positions 7 to 11.
bc$hi <- str_sub(cl, 7, 11)
The lower and upper limits are now stored in separate columns.
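Fixed positions break as soon as a value has a different number of digits. As a hedged alternative, the sketch below splits the string with a regular expression instead. The example strings and the names ci_text, m, low and hi are illustrative only, and the pattern assumes two non-negative numbers separated by at least one non-digit character.

```r
library(stringr)

# Made-up CI strings with different widths and separators, plus a missing value
ci_text <- c("0.987-1.012", "1.05 to 1.3", NA)

# Group 1 = lower limit, group 2 = upper limit; \\D+ is the separator
m <- str_match(ci_text, "^\\s*([0-9.]+)\\D+([0-9.]+)\\s*$")

low <- as.numeric(m[, 2])
hi  <- as.numeric(m[, 3])

print(data.frame(ci_text, low, hi))
```

Rows that are NA or do not match the pattern come back as NA, so they can be blanked out later in the same way as the other missing cells.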
The next step is to load the forestploter package to draw the forest plot
library(grid)
library(forestploter)
Add a leading space to the first column on the rows that have no sample size, so that they look indented and the plot reads better later
bc$Outcome<- ifelse(!is.na(bc$`sample size`), bc$Outcome, paste0(" ", bc$Outcome))
The sample size, P-Value and similar columns will be drawn as text, so we turn their missing values into empty strings
bc$`sample size` <- ifelse(is.na(bc$`sample size`), "", bc$`sample size`)
bc$`P-Value` <- ifelse(is.na(bc$`P-Value`), "", bc$`P-Value`)
Generate a variable se (the standard error on the log scale, derived from the upper limit), which is used as the size of the square when drawing
bc$se <- (log(as.numeric(bc$hi)) - log(as.numeric(bc$OR)))/1.96
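To make the formula concrete, here is a small worked example with made-up numbers (or_demo, hi_demo and se_demo are illustrative names). It assumes, as the line above does, that the 95% CI is symmetric around the estimate on the log scale, so the distance from log(OR) to log(upper limit) equals 1.96 standard errors.

```r
# Made-up odds ratio and upper 95% limit
or_demo <- 1.50
hi_demo <- 2.00

# log(2.00) - log(1.50) = log(1.333...), about 0.2877; divided by 1.96
se_demo <- (log(hi_demo) - log(or_demo)) / 1.96

print(round(se_demo, 3))  # about 0.147
```

Because this value is only used to scale the squares, rows with wider intervals end up with larger squares.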
Convert hi and low to numeric, since they are needed as numbers when plotting
bc$hi<-as.numeric(bc$hi)
bc$low <-as.numeric(bc$low)
Generate the OR (95% CI) text column
bc$`OR (95% CI)` <- ifelse(is.na(bc$se), "",
                           sprintf("%.2f (%.2f to %.2f)",
                                   bc$OR, bc$low, bc$hi)) # sprintf() combines fixed text with formatted values
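The "%.2f" format rounds each value to two decimals, which can hide differences when odds ratios sit close to 1. The sketch below uses made-up numbers (or_demo, low_demo, hi_demo are illustrative names) to compare two and three decimals; which one to use is a presentation choice, not something the source prescribes.

```r
# Made-up estimate and limits close to 1
or_demo  <- 1.012
low_demo <- 0.987
hi_demo  <- 1.038

label2 <- sprintf("%.2f (%.2f to %.2f)", or_demo, low_demo, hi_demo)
label3 <- sprintf("%.3f (%.3f to %.3f)", or_demo, low_demo, hi_demo)

print(label2)  # "1.01 (0.99 to 1.04)"
print(label3)  # "1.012 (0.987 to 1.038)"
```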
Generate a blank column to reserve space for the plot
bc$` ` <- paste(rep(" ", 20), collapse = " ")
The final data format is as follows
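Now draw the plot. Note that ci_column = 4 does not refer to a column of the full table: it is the position of the blank column among the columns passed to forest(), that is, the fourth entry of c(1:2,9,10,5)
forest(bc[,c(1:2,9,10,5)],
       est = bc$OR, # effect estimate
       lower = bc$low, # lower limit of the confidence interval
       upper = bc$hi, # upper limit of the confidence interval
       sizes = bc$se,
       ci_column = 4, # column in which to draw the plot; choose the blank column
       ref_line = 1,
       arrow_lab = c("No Schizophrenia", "Schizophrenia"),
       xlim = c(0, 4),
       ticks_at = c(0.5, 1, 2, 3),
       footnote = "This is the demo data. Please feel free to change\nanything you want.")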
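Counting columns by hand is easy to get wrong, so here is a hedged sketch of how the value for ci_column could be looked up instead. The column order in cols is an assumption inferred from the indices in the call above, and demo, shown and ci_col are illustrative names.

```r
# Assumed column order after all the preparation steps (an inference, not confirmed)
cols <- c("Outcome", "sample size", "OR", "95%Cl", "P-Value",
          "low", "hi", "se", "OR (95% CI)", " ")
demo <- setNames(as.data.frame(matrix(NA, nrow = 1, ncol = length(cols))), cols)

# Columns passed to forest(), in display order
shown <- c(1:2, 9, 10, 5)
print(names(demo)[shown])

# Position of the blank column within the displayed columns
ci_col <- which(names(demo)[shown] == " ")
print(ci_col)  # 4
```

If the column selection changes, recomputing ci_col this way keeps the plot in the blank column.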
We can also define a theme for the forest plot first, and then reuse it later
tm <- forest_theme(base_size = 10, # base text size
                   # Confidence interval point shape, line type/color/width
                   ci_pch = 15, # shape of the point estimate
                   ci_col = "#762a83", # CI color
                   ci_fill = "blue", # CI fill color
                   ci_alpha = 0.8, # CI transparency
                   ci_lty = 1, # CI line type
                   ci_lwd = 1.5, # CI line width
                   ci_Theight = 0.2, # height of the T-shaped ends of the CI; default is NULL
                   # Reference line width/type/color (the vertical dashed line in the middle)
                   refline_lwd = 1,
                   refline_lty = "dashed",
                   refline_col = "grey20",
                   # Vertical line width/type/color, for an optional extra vertical line; not shown if none is set
                   vertline_lwd = 1,
                   vertline_lty = "dashed",
                   vertline_col = "grey20",
                   # Change summary color for filling and borders
                   summary_fill = "yellow", # fill color of the large summary diamond
                   summary_col = "#4575b4",
                   # Footnote font size/face/color
                   footnote_cex = 0.6,
                   footnote_fontface = "italic",
                   footnote_col = "red")
Use this theme to draw
forest(bc[,c(1:2,9,10,5)],
       est = bc$OR, # effect estimate
       lower = bc$low, # lower limit of the confidence interval
       upper = bc$hi, # upper limit of the confidence interval
       sizes = bc$se,
       ci_column = 4, # column in which to draw the plot; choose the blank column
       ref_line = 1,
       arrow_lab = c("No Schizophrenia", "Schizophrenia"),
       xlim = c(0, 4),
       ticks_at = c(0.5, 1, 2, 3),
       footnote = "This is the demo data. Please feel free to change\nanything you want.",
       theme = tm)
A forest plot that meets publication standards is now ready.
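For submission, the plot usually needs to be written to a file. The following is a minimal sketch using a base R graphics device. The toy data, the file name, and the width, height and resolution are all placeholder assumptions to be adjusted to the real table and the journal's requirements.

```r
library(grid)
library(forestploter)

# Tiny made-up table, only to make the example self-contained
toy <- data.frame(Outcome = c("Outcome A", "Outcome B"),
                  est = c(0.90, 1.20),
                  low = c(0.70, 1.00),
                  hi  = c(1.15, 1.45))
toy$` ` <- paste(rep(" ", 20), collapse = " ")

# forest() returns a plot object that can be stored and drawn later
p <- forest(toy[, c(1, 5)],
            est = toy$est, lower = toy$low, upper = toy$hi,
            ci_column = 2, ref_line = 1, xlim = c(0, 2))

# Open a PNG device, draw the plot into it, then close the device
out_file <- file.path(tempdir(), "forest_plot.png")
png(out_file, width = 7, height = 3, units = "in", res = 300)
plot(p)
invisible(dev.off())
```

Storing the result of forest() in a variable, as done here with p, also makes it possible to redraw the same plot on a different device such as pdf().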