Mendelian randomization + meta-analysis in R (2): using R and Stata

Many papers now combine Mendelian randomization (MR) with meta-analysis. In the previous post we briefly covered the basics of meta-analysis. Today we look at how MR + meta-analysis is done, using a paper published in a journal with an impact factor of about 11: Appraising the causal role of smoking in multiple diseases: A systematic review and meta-analysis of Mendelian randomization studies.
[Figure: the paper's title]
The topic, smoking, is not particularly novel; the only new element is the combination of MR and meta-analysis. The paper is classified as a meta-analysis, which shows that an MR + meta-analysis study is, at its core, a meta-analysis. The authors begin by noting that the causal relationships between smoking and many diseases remain unclear, and they aim to assess the causal role of smoking in these diseases by pooling the evidence from MR studies.
Let's look at the methods:
[Figure: the paper's Methods section]

It follows a standard meta-analysis workflow, and the data were collected in two steps.
The first step was to search the major databases for articles related to "Mendelian randomization" and "smoking", and to extract the data on the association between smoking and disease from those articles. Inclusion criteria: original full-text articles reporting results that link genetic liability to smoking initiation or lifetime smoking with the risk of one or more diseases of the circulatory, digestive, nervous or musculoskeletal systems, endocrine, metabolic or eye diseases, or cancers. A total of 385 articles were identified. Exclusion criteria: duplicate publications based on the same or overlapping study samples, and studies that used only one or a few (<10) instrumental variables for nicotine dependence, smoking behavior or smoking quantity. From each article the authors extracted the publication year, the sample size, and the odds ratio (OR) for the association. After exclusions, 29 articles were eligible for analysis.
Step 2: Some data, from the Finnish genetic study FinnGen, could not be obtained through the literature search, so the authors analyzed them themselves. They ran MR analyses on FinnGen release R6, which includes 260,405 Finnish participants; individuals with ambiguous sex, non-Finnish ancestry, a genotype missing rate above 5%, or excess heterozygosity (±4 standard deviations) were excluded. In addition, the authors performed de novo MR analyses for osteoarthritis, gout and primary open-angle glaucoma using summary statistics from published GWAS meta-analyses. The second step should therefore account for 27 articles, since 56 were included in the end.
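To make the sample exclusion rules concrete, here is a minimal sketch that applies them to a small table. The table, its column names (sex_ok, finnish, miss_rate, het) and its values are all hypothetical; this is not FinnGen's actual QC pipeline. The "±4 standard deviations" rule is assumed here to mean the heterozygosity rate lying more than 4 SD from the sample mean.

```r
# Hypothetical sample-level QC table; column names and values are invented
qc <- data.frame(
  id        = paste0("s", 1:6),
  sex_ok    = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE),  # FALSE = ambiguous sex
  finnish   = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE),  # TRUE = Finnish ancestry
  miss_rate = c(0.01, 0.02, 0.01, 0.01, 0.08, 0.03),   # genotype missing rate
  het       = c(0.30, 0.31, 0.29, 0.30, 0.32, 0.31),   # heterozygosity rate
  stringsAsFactors = FALSE
)

# TRUE for samples that pass all four exclusion rules described above
qc_keep <- function(d, max_miss = 0.05, het_sd = 4) {
  het_z <- (d$het - mean(d$het)) / sd(d$het)  # standardized heterozygosity
  d$sex_ok & d$finnish & d$miss_rate <= max_miss & abs(het_z) <= het_sd
}

kept <- qc$id[qc_keep(qc)]
print(kept)
```

In this toy table s3 fails the sex check, s4 the ancestry check and s5 the missing-rate check; no sample is anywhere near 4 SD on heterozygosity.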
The study flow chart is shown below:
[Figure: flow chart]
From the flow chart, the authors ended up with 14 articles on circulatory diseases, 8 on digestive diseases, 5 on neurological diseases, 4 on musculoskeletal diseases, 2 on endocrine diseases, 3 on eye diseases and 21 on cancers. The whole process is time-consuming, since every article has to be read and its data extracted.
Next, let's look at the data the authors provide. Appendix Table 1 contains the results of the authors' own MR analyses. There are two sets of results, one for smoking initiation and one for lifetime smoking, and the authors also ran meta-analyses based on these two sets.
[Figures: Appendix Table 1]
Next come the paper's two main tables: Table 2 gives the disease results for smoking initiation and Table 3 those for lifetime smoking. The meta-analyses are based on these two tables.
[Figures: Table 2 and Table 3]
Next, I extract the data and run the analysis. Because there is a lot of data, I use the circulatory diseases for smoking initiation as an example; other diseases are handled the same way. Note that the circulatory system includes many diseases. Take atrial fibrillation: the author has plenty of data for it and combined the results from two sources, the GWAS meta-analysis and FinnGen, before running the meta-analysis. If you have less data, you can analyze each database separately and then summarize the results.
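Combining two sources for one disease, as described for atrial fibrillation, amounts to a small meta-analysis. A common way to do this, assumed here, is a fixed-effect inverse-variance meta-analysis on the log-OR scale: recover each standard error from the 95% CI, weight by 1/SE², pool, and back-transform. The function name pool_or and the example numbers are hypothetical, and the paper may have used a different model.

```r
# Fixed-effect inverse-variance pooling of ORs reported with 95% CIs
pool_or <- function(or, lb, ub) {
  z  <- qnorm(0.975)                   # about 1.96 for a 95% CI
  b  <- log(or)                        # log odds ratios
  se <- (log(ub) - log(lb)) / (2 * z)  # SE recovered from the CI width
  w  <- 1 / se^2                       # inverse-variance weights
  b_pool  <- sum(w * b) / sum(w)       # weighted mean log OR
  se_pool <- sqrt(1 / sum(w))
  c(OR = exp(b_pool),
    LB = exp(b_pool - z * se_pool),
    UB = exp(b_pool + z * se_pool))
}

# e.g. one estimate from a GWAS meta-analysis and one from FinnGen (invented)
res <- pool_or(or = c(1.20, 1.10), lb = c(1.05, 0.95), ub = c(1.37, 1.27))
round(res, 2)
```

With a single study the function just returns that study's OR and CI, which is a quick sanity check.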

# read the extracted data (change the path to your own CSV file)
bc <- read.csv("E:/r/test/smokemeta.csv", sep = ",", header = TRUE)
names(bc)  # check the column names

After extraction, the data look like the picture below. You can extract the data yourself following the author's method or, to save time, use the data I extracted: reply "code" to my official account to get it.
[Figure: the extracted data]
I previously introduced the forestploter package in "Drawing elegant forest plots for Mendelian randomization studies with the R forestploter package"; take a look if you need it. Today I use the forestplot package, which is fairly simple to use, to draw this forest plot.

library(forestplot)

First, create a text column showing each OR with its 95% confidence interval:

bc$`OR (95% CI)` <- sprintf("%.2f (%.2f to %.2f)", bc$OR, bc$LB, bc$UB)

[Figure: bc with the new OR (95% CI) column]
Next, build the matrix of text labels for the plot by selecting the columns you want to display. Here I choose columns 1, 2 and 6 (the outcome, the number of cases, and the OR (95% CI) column just created).

dt1<-as.matrix(bc[,c(1,2,6)])

[Figure: the dt1 matrix]
Note that forestplot does not draw the matrix's column names, so we add a header row as the first row of dt1:

dt1 <- rbind(c("outcome","Cases","OR (95% CI)"),dt1)

[Figure: dt1 with the header row]
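Because dt1 now has one more row (the header) than bc, each numeric vector passed to forestplot needs a leading NA so the rows stay aligned; that is what c(NA, bc$OR) and friends do in the plotting call. The sketch below makes this explicit with a small helper, make_fp_input, which is my own hypothetical function and not part of forestplot. The toy data frame only mimics the column layout of bc; its values are invented.

```r
# Hypothetical helper: build label matrix and numeric vectors together
make_fp_input <- function(d, label_cols,
                          header = c("outcome", "Cases", "OR (95% CI)")) {
  labels <- rbind(header, as.matrix(d[, label_cols]))  # header row first
  list(labels = labels,
       mean  = c(NA, d$OR),  # NA lines up with the header row
       lower = c(NA, d$LB),
       upper = c(NA, d$UB))
}

# Toy data with the same column layout as bc (values are invented)
toy <- data.frame(outcome = c("A", "B"), Cases = c(100, 200),
                  OR = c(1.2, 0.9), LB = c(1.0, 0.7), UB = c(1.4, 1.1),
                  stringsAsFactors = FALSE)
toy$`OR (95% CI)` <- sprintf("%.2f (%.2f to %.2f)", toy$OR, toy$LB, toy$UB)
fp <- make_fp_input(toy, c(1, 2, 6))
print(fp$labels)
```

The pieces can then be passed as forestplot(labeltext = fp$labels, mean = fp$mean, lower = fp$lower, upper = fp$upper, ...).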
The data are now ready to plot:

forestplot(labeltext = dt1, graph.pos = "right",
           mean = c(NA, bc$OR),    # NA for the header row
           lower = c(NA, bc$LB),
           upper = c(NA, bc$UB),
           graphwidth = unit(60, "mm"),    # position (right) and width of the graph
           boxsize = 0.2, line.margin = unit(5, "mm"),    # box size and line spacing
           lineheight = unit(5, "mm"),    # row height
           col = fpColors(box = "grey0", lines = "grey0", summary = "grey0"),
           colgap = unit(1, "mm"),    # gap between columns
           zero = 1,    # reference line at OR = 1
           xticks = c(0, 1, 2))    # x-axis tick positions

[Figure: the resulting forest plot]
The result is almost identical to the author's figure:
[Figure: the author's forest plot]
I say "almost" because there is one small difference: the author's plot lacks the pulmonary embolism result (11,278 cases), although it is present in the original data. I suspect it was left out by mistake when the figure was drawn.
One question remains: many papers report the I² (heterogeneity) statistic and the P value of the meta-analysis. How are these obtained?
[Figures: examples of reported I² and P values]
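If you prefer to stay in R, these numbers can also be computed directly from the OR, LB and UB columns. The sketch below is my own, not the paper's code: it assumes fixed-effect inverse-variance weights on the log-OR scale and returns Cochran's Q, its P value (the P for heterogeneity), I² = max(0, (Q - df)/Q) × 100%, the pooled OR, and the P value for the pooled effect. The function name het_stats and the example values are hypothetical.

```r
# Cochran's Q, P for heterogeneity and I^2 from ORs with 95% CIs
het_stats <- function(or, lb, ub) {
  z  <- qnorm(0.975)
  b  <- log(or)                              # log odds ratios
  se <- (log(ub) - log(lb)) / (2 * z)        # SE from the 95% CI
  w  <- 1 / se^2                             # inverse-variance weights
  b_fe  <- sum(w * b) / sum(w)               # fixed-effect pooled log OR
  Q     <- sum(w * (b - b_fe)^2)             # Cochran's Q
  df    <- length(b) - 1
  I2    <- if (Q > 0) max(0, (Q - df) / Q) * 100 else 0  # I^2 in percent
  p_het <- pchisq(Q, df, lower.tail = FALSE)             # P for heterogeneity
  se_fe <- sqrt(1 / sum(w))
  p_effect <- 2 * pnorm(-abs(b_fe / se_fe))              # P for pooled OR = 1
  c(Q = Q, I2 = I2, p_het = p_het, OR = exp(b_fe), p_effect = p_effect)
}

round(het_stats(or = c(1.10, 1.45, 0.98),
                lb = c(1.02, 1.30, 0.90),
                ub = c(1.19, 1.62, 1.07)), 3)
```

Applied to the rows you want to pool, e.g. het_stats(bc$OR, bc$LB, bc$UB), it should give statistics of the same kind as a log-scale metan run, though the exact numbers depend on the model and input scale used.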
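The article's authors recommend computing these two values in Stata, which makes meta-analysis very simple: just use the metan command. Note that metan pools the values on the scale it is given; for odds ratios the usual practice is to pass ln(OR) and the log-transformed CI limits and add the eform option, so that pooling happens on the log scale.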

metan or lb ub

[Figures: metan output]
The plots can also be customized, but I won't do that here. My result was I² = 80.6% and P = 0.00, which may differ slightly from the author's because his analysis included one fewer study. Finally, the authors also performed a sensitivity analysis using Mendelian randomization methods. Not every paper does this; the paper shown below, for example, does not include a sensitivity analysis.
[Figure: a paper without a sensitivity analysis]
I won’t go into it here. If you are interested, please read my previous articles.


Source: blog.csdn.net/dege857/article/details/133375386