This is the OpenGWAS project (mrcieu.ac.uk)
In Mendelian randomization (MR) studies, the exposure data only needs the SNPs that are significantly associated with the exposure, and this information is easy to find in the various GWAS databases. For the outcome data, however, the same SNPs are usually not significantly associated with the outcome, so these non-significant results often cannot be looked up directly in the article or database. In that case we need to download the complete GWAS summary data. Such data generally covers millions or even tens of millions of SNPs, so the files are fairly large (about 200 MB after compression). Please be aware of this and be prepared.
Next, I will show how to download the complete GWAS summary data from the GWAS Catalog.
First, open the official GWAS Catalog website (https://www.ebi.ac.uk/gwas/) and click Summary statistics (as shown in the figure below).
On the Summary statistics page, click Available studies (as shown in the figure below).
Finally, you will reach the following page (link: https://www.ebi.ac.uk/gwas/downloads/summary-statistics).
The page mainly consists of three parts:
The first block is "List of published studies with summary statistics" (as shown in the figure below): the GWAS studies here have all been published, so their quality is more assured. You can enter keywords in the search box (marked in red) to search for the phenotype of interest.
The second block is "List of prepublished/unpublished studies with summary statistics" (as shown in the figure below): the GWAS studies here are unpublished (they may come from a preprint), so their quality cannot be guaranteed. You can enter keywords in the search box (marked in red) to search for the phenotype of interest. The phenotypes here are likely to be relatively new and complement the published data. When you really can’t find the data elsewhere, it is worth trying here.
The third block is "Additional sources of summary statistics" (as shown in the figure below): this lists information on current GWAS research collaborations (consortia). These consortia generally host their data on their own websites, where the complete GWAS summary data can be downloaded. Marked in red in the picture are the coronary heart disease consortia.
The GWAS Catalog is a treasure trove. This post by Mickey Mouse is only meant to get the discussion started; I hope everyone will explore and use it in more depth. You are also welcome to exchange ideas by private message (WeChat: MedGen16)!
PS: The GWAS Catalog sometimes can only be opened through a proxy or VPN, so set one up in advance!
SSGAC
Sources of GWAS data
Data included
1 Read exposure data
1.2 Save exposure data
Start practicing
Read exposure data
Read outcome data
Harmonise data
MR
Sensitivity analysis
Select significant and independent SNPs to obtain the instrumental variables
The advantage is speed; the disadvantage is that the SNPs may not be independent of each other (linkage disequilibrium)
5 × 10^-8
This shows that the instrumental variable is associated with the exposure but not with the outcome.
Some SNPs may be lost
step1 Read the exposure data in R
Relevance: use the subset function with p < 5 × 10^-8
Independence: use the clump function to remove linkage disequilibrium (LD); the smaller the r2 the better, usually 0.001 and at most 0.1
The distance depends on the number of SNPs; 500 kb is also acceptable
Statistical strength: F > 10 is preferred
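The relevance and strength filters can be sketched as follows. This is a minimal Python illustration of the arithmetic, not the R call itself; the SNP values are made up, and approximating the per-SNP F statistic as (beta / se)^2 is an assumption that the notes do not spell out.

```python
# Toy exposure summary statistics (made-up numbers, for illustration only).
exposure = [
    {"SNP": "rs1", "beta": 0.10, "se": 0.010, "pval": 1e-20},
    {"SNP": "rs2", "beta": 0.02, "se": 0.010, "pval": 4.6e-2},
    {"SNP": "rs3", "beta": 0.06, "se": 0.010, "pval": 2e-9},
]

P_THRESHOLD = 5e-8  # genome-wide significance
F_THRESHOLD = 10    # rule of thumb for instrument strength


def f_statistic(beta, se):
    """Per-SNP F statistic, approximated as (beta / se) squared (an assumption)."""
    return (beta / se) ** 2


# Keep SNPs that pass both the p-value and the F-statistic filters.
instruments = [
    row for row in exposure
    if row["pval"] < P_THRESHOLD
    and f_statistic(row["beta"], row["se"]) > F_THRESHOLD
]
kept = [row["SNP"] for row in instruments]
print(kept)
```

With a genome-wide threshold, any SNP that passes the p-value filter will normally pass F > 10 as well, so the F filter matters mostly when a looser threshold is used.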
1.1 Relevance: use the subset function with p < 5 × 10^-8
1.2 Modify the column names of the file
1.3 Independence: after subsetting, re-read the exposure data with read_exposure_data
clump default: LD r2 < 0.001
You can also clump afterwards with clump_data
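The idea behind clumping can be sketched as a greedy loop: keep the most significant SNP, discard everything in LD with it, and repeat. The Python below is only an illustration under stated assumptions. The LD table and rsIDs are invented, the distance window (kb) is left out, and it is not a reimplementation of clump_data.

```python
def clump(snps, r2, r2_threshold=0.001):
    """Greedy clumping sketch.

    snps: list of (rsid, pval) tuples.
    r2:   dict mapping frozenset({a, b}) to the LD r2 between a and b;
          pairs missing from the dict are treated as unlinked (r2 = 0).
    """
    kept = []
    # Visit SNPs from most to least significant.
    for rsid, _ in sorted(snps, key=lambda s: s[1]):
        # Keep the SNP only if it is below the LD threshold with every kept SNP.
        if all(r2.get(frozenset((rsid, k)), 0.0) < r2_threshold for k in kept):
            kept.append(rsid)
    return kept


snps = [("rs1", 1e-20), ("rs2", 3e-12), ("rs3", 2e-9)]
ld = {frozenset(("rs1", "rs2")): 0.6}  # hypothetical: rs1 and rs2 are in LD
independent = clump(snps, ld)
print(independent)
```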
step2 Read the outcome data
1 read.table
2 merge to get the intersection
2.1 Change the column names
3 read_outcome_data
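The merge step simply keeps the instrument SNPs that also appear in the outcome GWAS. A minimal Python sketch with made-up rsIDs and effect sizes:

```python
# Instrument SNPs selected from the exposure GWAS (hypothetical rsIDs).
exposure_snps = ["rs1", "rs3", "rs7"]

# Toy outcome GWAS keyed by rsID (made-up numbers).
outcome = {
    "rs1": {"beta": 0.020, "se": 0.005},
    "rs3": {"beta": 0.011, "se": 0.005},
    "rs9": {"beta": 0.300, "se": 0.050},
}

# Intersection: instruments that are present in the outcome data.
merged = {snp: outcome[snp] for snp in exposure_snps if snp in outcome}

# Instruments with no outcome record are lost at this step.
missing = [snp for snp in exposure_snps if snp not in outcome]
print(sorted(merged), missing)
```

SNPs listed in missing are the ones that get lost at this step unless a proxy SNP is found for them.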
Summary
Effect allele
Alleles need to be harmonised in code (e.g. A -> T)
Proxy SNP
The proxy SNP threshold is set to r2 = 0.8. The larger the r2, the stronger the linkage disequilibrium between the two SNPs, meaning they track each other closely and one is more likely to be a valid substitute for the other.
But when setting independence, make the LD r2 as small as possible (0.001)
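Proxy selection can be sketched as picking, among the SNPs that are present in the outcome data, the one in strongest LD with the missing instrument, subject to r2 >= 0.8. The function name pick_proxy, the LD table, and the rsIDs below are hypothetical and only illustrate the rule.

```python
def pick_proxy(target, ld_partners, outcome_snps, min_r2=0.8):
    """Return the outcome SNP in strongest LD with target, or None.

    ld_partners: dict mapping a SNP to {partner_snp: r2}.
    outcome_snps: set of SNPs available in the outcome GWAS.
    """
    candidates = [
        (r2, snp)
        for snp, r2 in ld_partners.get(target, {}).items()
        if snp in outcome_snps and r2 >= min_r2
    ]
    # max() compares r2 first, so the strongest proxy wins.
    return max(candidates)[1] if candidates else None


ld_partners = {"rs7": {"rs70": 0.95, "rs71": 0.85, "rs72": 0.40}}
outcome_snps = {"rs1", "rs3", "rs71", "rs72"}
proxy = pick_proxy("rs7", ld_partners, outcome_snps)
print(proxy)
```

Here rs70 has the highest r2 but is absent from the outcome data, and rs72 is below the threshold, so rs71 is chosen.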
Sample overlap
Exposure data: 500,000
Outcome data: 1 million
The SNP data must contain more than 5 million SNPs to be usable; normally it reaches 10 million.
step3 Harmonise
Remove palindromic SNPs
Save the file
Ensure that the exposure SNPs are not associated with the outcome
The SNP is associated with the exposure
The SNP is not associated with the outcome, consistent with the assumption
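Harmonisation means making sure the exposure and outcome effects refer to the same effect allele. The Python sketch below shows the core logic under simplifying assumptions: it drops every palindromic SNP (A/T or C/G) and does not try strand flips or allele-frequency inference, which is stricter than what a full harmonisation routine may do.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(a1, a2):
    """A/T and C/G SNPs look identical on both strands."""
    return COMPLEMENT[a1] == a2


def harmonise(exp_ea, exp_oa, out_ea, out_oa, out_beta):
    """Return the outcome beta aligned to the exposure effect allele.

    Returns None when the SNP should be dropped.
    """
    if is_palindromic(exp_ea, exp_oa):
        return None       # strand cannot be resolved from the alleles alone
    if (out_ea, out_oa) == (exp_ea, exp_oa):
        return out_beta   # already aligned
    if (out_ea, out_oa) == (exp_oa, exp_ea):
        return -out_beta  # effect allele swapped: flip the sign
    return None           # alleles do not match


print(harmonise("A", "G", "G", "A", 0.02))
```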
step4 MR
IVW is a random-effects model
For a continuous outcome, use the beta value, with 0 as the null value
For a binary outcome, the beta is on the log-odds scale, so convert it to an OR and use 1 as the null value
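As a rough illustration of what the IVW estimate and the OR conversion involve, here is a minimal Python sketch. It computes the fixed-effect IVW formula on made-up per-SNP effects; a random-effects version gives the same point estimate and may widen the standard error, which is not shown here.

```python
import math


def ivw(bx, by, se_y):
    """Fixed-effect inverse-variance weighted estimate.

    bx: SNP-exposure effects, by: SNP-outcome effects, se_y: SEs of by.
    """
    weights = [x * x / (s * s) for x, s in zip(bx, se_y)]
    ratios = [y / x for x, y in zip(bx, by)]  # per-SNP Wald ratios
    beta = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se


# Made-up effects; every Wald ratio equals 0.2 in this toy data.
bx = [0.10, 0.06, 0.08]
by = [0.020, 0.012, 0.016]
se_y = [0.005, 0.005, 0.005]

beta, se = ivw(bx, by, se_y)

# Binary outcome: beta is a log odds ratio, so exponentiate it.
odds_ratio = math.exp(beta)
ci_low = math.exp(beta - 1.96 * se)
ci_high = math.exp(beta + 1.96 * se)
print(round(beta, 3), round(odds_ratio, 3), round(ci_low, 3), round(ci_high, 3))
```

A 95% confidence interval for the OR that excludes 1 corresponds to a beta interval that excludes 0.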
Use other methods
mr(dat,method_list=c())
When drawing the scatter plot, choose which methods to plot.
5 Visualization of results
6 Sensitivity analysis includes heterogeneity testing and pleiotropy testing
Heterogeneity testing
If the heterogeneity p-value is < 0.05, heterogeneity is present.
Heterogeneity by itself does not make the results unreliable.
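Heterogeneity is usually summarised with Cochran's Q, which measures how far the per-SNP Wald ratios scatter around the IVW estimate. The Python sketch below uses made-up data with three SNPs; the p-value line relies on the fact that a chi-square distribution with 2 degrees of freedom has survival function exp(-Q/2), so it is only valid for this three-SNP toy case.

```python
import math


def cochran_q(bx, by, se_y):
    """Return Cochran's Q, its degrees of freedom, and I^2 for the IVW model."""
    weights = [x * x / (s * s) for x, s in zip(bx, se_y)]
    ratios = [y / x for x, y in zip(bx, by)]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    df = len(bx) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


# Made-up effects whose Wald ratios (0.1, 0.2, 0.6) disagree with each other.
bx = [0.10, 0.10, 0.10]
by = [0.01, 0.02, 0.06]
se_y = [0.01, 0.01, 0.01]

q, df, i2 = cochran_q(bx, by, se_y)

# Exact chi-square p-value, valid ONLY because df == 2 in this toy example.
p_value = math.exp(-q / 2)
print(round(q, 2), df, round(i2, 3), p_value < 0.05)
```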
Setting NbDistribution to 10,000 is more accurate
6.1 Use run_mr_presso to find the SNPs that contribute most to heterogeneity
nb
Does the outlier change the result? If not, then p > 0.05
Outliers are listed; p < 0.05 indicates that heterogeneity is present
If heterogeneity is large, removing a few SNPs and recalculating will still leave heterogeneity.
6.2 Heterogeneity visualization: funnel plot
The more symmetrical the better
Asymmetry can still occur: even without heterogeneity, the funnel plot may be asymmetrical
6.2 Pleiotropy: mr_pleiotropy_test(). If this result is poor, the analysis has to be dropped and the article cannot be published.
Functional pleiotropy Horizontal pleiotropy
For example, a SNP may affect AD through other phenotypes rather than through the BMI phenotype.
0.078 > 0.05: no pleiotropy
Use egger_intercept to assess pleiotropy
The p-value of the MR-Egger intercept (where the regression line crosses the y-axis) tests whether the intercept differs from zero
If p > 0.05, it is not significant, indicating that the intercept is effectively zero
If p < 0.05, it is significant: when the SNP effect on the exposure is 0, there is still a non-zero effect on the outcome, so the SNPs may affect the outcome through other phenotypes. This indicates horizontal pleiotropy, and such results cannot be used
(When the effect of the SNP on the exposure is 0 it still has a non-zero effect on the outcome, which means other intermediate factors affect the outcome, i.e. horizontal pleiotropy)
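The Egger intercept is the intercept of a weighted regression of the SNP-outcome effects on the SNP-exposure effects. The Python sketch below computes only the intercept and slope on made-up data; the standard error and p-value of the intercept that the pleiotropy test reports are not computed here. It assumes all SNP-exposure effects are already oriented to be positive and uses weights 1 / se_y^2.

```python
def egger(bx, by, se_y):
    """Weighted least squares of by on bx with an intercept.

    Weights are 1 / se_y**2. Returns (intercept, slope).
    """
    w = [1.0 / (s * s) for s in se_y]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, bx)) / sw  # weighted mean of bx
    my = sum(wi * y for wi, y in zip(w, by)) / sw  # weighted mean of by
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, bx))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, bx, by))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


bx = [0.05, 0.10, 0.15, 0.20]
se_y = [0.01, 0.01, 0.01, 0.01]

# Toy outcome effects with a constant pleiotropic offset of 0.005.
by_pleiotropic = [0.005 + 0.2 * x for x in bx]
# Toy outcome effects with no offset.
by_clean = [0.2 * x for x in bx]

print(egger(bx, by_pleiotropic, se_y))
print(egger(bx, by_clean, se_y))
```

In the first case the regression line crosses the y-axis at 0.005, which is the pattern that indicates horizontal pleiotropy; in the second it passes through the origin.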
6.3 leave-one-out
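If the result is robust, every confidence interval should stay on the same side of the dotted line (to the right of it in this example)
Each row drops one SNP and reruns MR on the rest: for example, the first row drops rs3817334 and analyses the remaining SNPs.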
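Leave-one-out analysis can be sketched as a loop that removes each SNP in turn and recomputes the IVW estimate from the rest. The Python below uses made-up effect sizes; rs3817334 is the SNP named in the notes, while rsA and rsB are hypothetical placeholders.

```python
def ivw_estimate(bx, by, se_y):
    """Fixed-effect IVW point estimate from per-SNP effects."""
    weights = [x * x / (s * s) for x, s in zip(bx, se_y)]
    ratios = [y / x for x, y in zip(bx, by)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


def leave_one_out(snps, bx, by, se_y):
    """Map each SNP to the IVW estimate obtained without that SNP."""
    results = {}
    for i, snp in enumerate(snps):
        keep = [j for j in range(len(snps)) if j != i]
        results[snp] = ivw_estimate(
            [bx[j] for j in keep],
            [by[j] for j in keep],
            [se_y[j] for j in keep],
        )
    return results


snps = ["rs3817334", "rsA", "rsB"]
bx = [0.10, 0.10, 0.10]
by = [0.01, 0.02, 0.06]
se_y = [0.01, 0.01, 0.01]

loo = leave_one_out(snps, bx, by, se_y)
for snp, estimate in loo.items():
    print(snp, round(estimate, 3))
```

If dropping a single SNP moves the estimate a lot, as rsB does in this toy data, that SNP is driving the result.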
Summary
Analysis in R
1 Extract the exposure data
2 Import the outcome data
The subsequent steps are the same
Screen each SNP for a second phenotype. If one exists, the SNP may need to be removed.
7 Statistical power calculation
Sample size: the total sample size
α: default 0.05
K: proportion of cases in the total sample
OR: the OR value calculated by MR
R2: the sum of the R2 of all SNPs (60)
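The inputs listed above can be combined in a normal-approximation power formula for a binary outcome. The notes do not say which formula or tool was used, so the Python sketch below is an assumption: it takes power = Φ(|log OR| · sqrt(N · R2 · K · (1 − K)) − z), where z is the two-sided critical value for α. The example numbers are made up.

```python
import math
from statistics import NormalDist


def mr_power_binary(n, k, odds_ratio, r2, alpha=0.05):
    """Approximate MR power for a binary outcome (normal approximation).

    n: total sample size, k: proportion of cases, odds_ratio: causal OR,
    r2: variance in the exposure explained by the instruments.
    """
    nd = NormalDist()
    effect = abs(math.log(odds_ratio))
    ncp = effect * math.sqrt(n * r2 * k * (1 - k))  # non-centrality
    z_crit = nd.inv_cdf(1 - alpha / 2)              # two-sided critical value
    return nd.cdf(ncp - z_crit)


power = mr_power_binary(n=100_000, k=0.5, odds_ratio=1.2, r2=0.02)
print(round(power, 3))
```

Power rises with the total sample size, with R2, and with case proportions closer to 0.5, which is why these are the quantities the calculation asks for.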