Microbiome-Metagenome Analysis Seminar (2023.11)

Benefit announcement: To meet students' learning needs, the Yishengxin training team has decided, after discussion and preparation, to offer online live classes and offline lectures on 16S amplicon analysis, metagenomics, Python, and transcriptomics. Participants who sign up for an online live class may attend one offline session of the same course within one year. We look forward to meeting you both online and offline.

Currently available information:

  • Clinical Genomics online/offline course start time: 2023/6/30-7/2

  • Transcriptomics online/offline course start time: 2023/9/15-17

  • Amplicon online/offline class start time: 2023/10/13-15

  • Metagenome online/offline course start time: 2023/11/24-27

  • Registration link: http://www.ehbio.com/Training/


In response to requests from followers, "Shengxin Baodian" and "Metagenome" have jointly launched a special "Metagenome Analysis" training course in Beijing (online and offline classes run at the same time). It offers a shortcut into bioinformatics and an opportunity for peers to learn and exchange ideas about metagenomic analysis, and it helps students truly understand the analysis principles and complete real analyses. The course uses an original four-stage teaching model (3 days of intensive teaching + 2 weeks of self-practice + a follow-up online lecture and Q&A + review of class videos with repeated practice), so that the four links of "teaching, practice, answering, use" work together and students can analyze big data independently.

On the importance of learning bioinformatics analysis, please read "9-Day Crash Course in Bioinformatics-Becoming an Indispensable Person in the Team". Bioinformatics analysis is inseparable from writing programs. This part is not as difficult as it seems; if you follow along with our hands-on steps, you will be able to understand it. For details, see "Program Learning Experience in Bioinformatics".

Course Introduction

Please read the course introduction in detail. If you are proficient in all the following content, you do not need to attend this training.

The "Metagenome Analysis Course" is the advanced follow-up to amplicon analysis. It mainly covers shotgun metagenomic data analysis and the use of workflows under Linux. If you have just started microbiome analysis, or want to learn plotting and 16S/ITS amplicon analysis, please sign up for the "Amplicon Analysis Seminar" instead.

Metagenomics/microbiome research is one of the most active research fields in the world today. To strengthen technical exchange and dissemination in this field and to promote the development of China's microbiome program, young researchers from the Chinese Academy of Sciences created the "Metagenome" public account, with the goal of building a platform for substantive technical and intellectual exchange in the field. In the three years since it was established, it has shared 3,100+ original articles on professional techniques, gained 150,000+ followers, and accumulated 40 million+ reads.

To meet readers' needs for further study, we are organizing a special training course on metagenomics together with "Shengxin Baodian" for learning and exchanging metagenomic analysis techniques. We will help you get started quickly, save precious time, and produce research results sooner.

The course runs for 3 days with 6 classes per day, 18 classes in total. Every class combines theory with practice (everything taught is an analysis you can learn and carry out yourself). Topics run from Linux and R basics, setting up a Linux server platform for metagenomic analysis, common statistical analysis software on Windows, and chart interpretation and hands-on practice, through the standard reference-based workflow (Reference/Read-based, suitable for human and animal gut samples, etc.) and the reference-free workflow (De Novo/Assembly-based, suitable for plant and environmental samples, etc.), binning (recovering single-organism genomes), statistical analysis, and a range of advanced analyses (concatenated multi-gene phylogenetic trees, network drawing and beautification, network attribute comparison, machine learning, etc.), to figure editing and layout at the CNS level. In 3 days, experienced instructors will take you along a road that could take 3 months or even 3 years of self-study, helping you carry out metagenomic analysis for real and tailor the analysis plan to your own research.

Course Outline

Each class covers one theme in 1 hour, combining theory with hands-on practice so that you learn the principles and then apply them; all material comes from the instructors' years of experience and shared code. The course schedule follows. In the serial numbers, 11 means the first class on the first day, 26 means the sixth class on the second day, and 41 means the centralized online video Q&A two weeks later.

Serial number | Theme | Introduction
11 | Linux basics | Introduction, remote login, file transfer, common commands
12 | Linux software installation | Conda installation and configuration, installation of metagenomics software and database download
13 | Win software installation | Git, R, RStudio, R packages, STAMP, AI (Adobe Illustrator), etc.
14 | Chart interpretation | The meaning and usage scenarios of analysis charts commonly used in articles
15 | R basics | Development history, applications in biology, ggplot2 plotting
16 | Visualization | Data preparation and online plotting of 16 chart types
21 | Introduction to metagenomics | Development history, scope of application of commonly used technologies, and analysis ideas
22 | Metagenome quality control | FastQC, Trimmomatic, MultiQC, KneadData quality control, parallel (parallel computing)
23 | Species and functional composition | MetaPhlAn2 species composition, HUMAnN2 functional composition, linking functions to the species that contribute to them
24 | Comparison and visualization of species and functional differences | GraPhlAn, LEfSe, STAMP, R language statistics
25 | Preparing for publication | Figure layout, data release, code cleanup (optional)
26 | Network drawing | Basics, igraph, Gephi
31 | Species annotation and visualization | Kraken, Kraken2, GraPhlAn, Krona, microbiomeViz, metacoder
32 | Assembly, gene annotation and quantification | MEGAHIT, metaSPAdes, QUAST, Prokka, cd-hit, Salmon
33 | Gene function annotation | KEGG, COG/EggNOG, CAZy/dbCAN2, ARDB/Resfams/CARD, UniRef, VFDB, TCDB
34 | Binning | Theory, MetaWRAP, VizBin
35 | Bacterial genome evolution | Extracting conserved genes from bins, multi-gene phylogenetic trees, interpreting phylogenetic trees, Evolview (basic and advanced), iTOL beautification (advanced), antiSMASH biosynthetic gene clusters
36 | Summary talk | Review and summary of metagenomic analysis routines
37 | Exam (50 questions) | Self-assessment of learning outcomes, review of knowledge points
41 | Q&A (online) | Questions and answers, lecture on the exam content

The brief introduction of the course content is as follows:

1. Analysis platform construction

As the saying goes, "a craftsman who wants to do good work must first sharpen his tools." You cannot analyze big data without an analysis platform of your own. Metagenomic datasets are huge, and processing raw sequencer output on a personal laptop is still difficult. Fortunately, most universities, research institutes, and research groups now have their own servers, and those that do not can rent cloud services such as Alibaba Cloud or Tencent Cloud. Once the hardware is available, turning a server into a capable metagenomic analysis platform is a complicated, specialized task, and it is one you can learn here.


Figure 1. Construction of the metagenomic analysis process - system, installation method and main software

Ubuntu is recommended for the server. The minimum configuration is 32 GB of memory and 8 cores; 256 GB of memory and 24 threads are recommended. The higher the configuration, the faster and smoother the analysis.

A computer without software is just scrap metal, and a server without a metagenomic analysis system can do nothing for your data. Resources on the Internet for building a complete metagenomic analysis workflow are scattered and scarce. The Yishengxin team will share its years of experience in selecting good software and setting it up, together with all source code, so that you can quickly deploy the dozens of commonly used software tools and hundreds of R and Python packages that the workflow depends on to mainstream Linux server systems (Ubuntu 16.04/18.04, CentOS 7, and other mainstream distributions) and have a professional analysis platform of your own.

Figure 2. The statistical analysis and visualization workflow pioneered by Yishengxin and optimized for Win10, which turns a laptop into a big data analysis platform in seconds

Windows 10 is recommended; with 8 GB of memory, analysis is faster and smoother.

The "big data" of high-throughput sequencing refers to the volume of the raw data and of the intermediate processing; the results themselves are not large. Metagenomic analysis usually yields tables of species composition and functional composition for each sample. These tables are the starting point for downstream, advanced, and customized analyses. Most of that work can be done on a laptop, but many people do not know where to start.

In fact, your personal computer is a powerful tool for statistical analysis of such data tables (abundance matrices). The Yishengxin team has built a cross-platform analysis workflow that makes it easy to run most amplicon and metagenome statistics and visualization on an ordinary Windows laptop. The third class will walk you through building a statistical analysis and visualization platform for data tables on your own laptop; it is optimized and tested on Win10, currently the most mainstream system.

We will also take you through configuring the entire analysis and visualization platform on Linux. (Mac is similar to Linux and is not treated separately, but some software may be installed differently there; it has not been tested in depth and is not recommended for the training.)

2. Bioinformatics basics

With a bioinformatics analysis platform in place, using it flexibly still takes some specialized skills. The most important resource in the 21st century is talent, and talented people ideally master three languages; doing so makes you indispensable in any team. The three languages are Chinese, English, and a programming language. Chinese has been used every day since school, and by the doctoral level you have studied English for at least 10 years and can read and write scientific literature in it. As for programming languages, most people learned Visual Basic, Visual FoxPro, or C at university, but very few ever apply them at work. Moreover, these languages are inefficient for life science work, and we do not recommend learning them for this purpose.

The three languages most commonly used in bioinformatics are Shell + R + Python/Perl, and the first two are the foundation for completing a project analysis. In class we will also cover the Shell and R basics that biologists must master, so that you can use the metagenomic analysis platform efficiently and reliably and have the skills needed from big data analysis through visualization to publication. A learning video is provided for preview in advance.

Figure 3. Shell and R learning outline. For the first time, Shell scripts and R analyses are both run with mouse clicks in RStudio, which opens the door to bioinformatics without adding to biologists' time cost

After spending a few hours stepping into big data analysis and visualization, you will discover a whole new world. Many people wish they had started sooner, come to enjoy analysis, and move ahead much faster from then on. Even if you are not interested in programming, the concepts used here will benefit you for a long time and make later related analyses far more efficient. Besides, even elementary school students are learning Python now; if you do not learn it, you will not be able to help your own children with it.

3. Chart Interpretation and Drawing Topics

Many participants lack a systematic bioinformatics background, have trouble understanding the analysis charts in articles, and are at a loss when it comes to drawing them. For this reason we have published two series totalling 16 original articles, which explain how to interpret 8 chart types and how to draw them in R.

These articles are only an introduction. During the training we will use published high-level articles to explain the principles and scope of use of 16 commonly used analysis charts, so that you can not only read the charts but also apply them to your own research and draw them with ease.

To reduce the time and cost of learning to draw with R, the Yishengxin team has developed a free plotting website for 16 commonly used chart types. Charts are drawn with one click, and their style can be customized by changing parameters with the mouse.


Figure 4. The meaning, usage scenarios and drawing of 16 commonly used graphics. This can be achieved using our online drawing tool.

To bring statistical plots together into publication-level multi-panel figures, one class is devoted to editing and laying out figures in Adobe Illustrator. It covers the basic techniques and the essentials, so that your article figures meet the standard of CNS journals and you can become your lab's expert in figure editing and panel assembly.

Figure 5. Example of using AI (Adobe Illustrator) to arrange subfigures into a CNS publication-level multi-panel figure (Science, 2016 cover article)

4. Overview of metagenomics

After building a comprehensive scientific research foundation on the first day, we will start the journey of metagenomic big data analysis.

As professional basic knowledge, we will learn the following.

  1. Background: International Microbiome, China Microbiome Project

  2. Research object: human, animal, plant, environment

  3. Research methods: culturomics, amplicon, metagenome, metatranscriptome, metaproteome, metabolome, metagenome-wide association analysis, metaepigenome...

  4. Research hotspots of metagenomics: culturomics, gut bacteria and disease, metagenome-wide association studies (MWAS), multi-omics joint analysis...

  5. History and Principles of Sequencing Development

  6. Selection of sample preparation, experimental replicates, and sequencing data volume

  7. Common routines for metagenomic analysis of SCI articles

  8. Comparison of the advantages and disadvantages of metagenomics and amplicons

  9. Raw data evaluation and judgment of assembly results


Figure 6. Commonly used methods of metagenomics: scientific questions that can be answered by amplicons, metagenomics, and metatranscriptomes

5. Reference-based analysis workflow for metagenomics

When you first face several to tens of gigabytes of data per sample and do not know where to start, we recommend running a reference-based analysis right away to quickly obtain the species and functional composition of each sample. As the name suggests, the reference-based method uses existing species and functional gene annotation databases directly: after only quality control and alignment, the data yield relative abundance matrices for the corresponding species and functional genes. This method is also strongly recommended in a recent review by Rob Knight, a leading analysis expert in this field; see "Nature Review | Rob Knight and others teach you how to analyze microbiome data (full text translation: 18,000 words)".

The advantages of this method are clear: few steps, fast, and economical in time and labor. It suits fields with good reference databases, such as the human gut, model organisms, and the ocean. The disadvantage is that functional genes from unreported species cannot be identified, so a lot of information is lost when analyzing plant, soil, and extreme-environment samples.
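
As a rough illustration of what a relative abundance matrix is, the sketch below converts per-sample read counts into percentages. The taxon names, sample names, and counts are invented for illustration, and `to_relative_abundance` is a hypothetical helper, not part of MetaPhlAn2 or HUMAnN2; those tools produce their own normalized tables.

```python
from typing import Dict

# Hypothetical read counts per taxon for each sample (illustrative numbers only).
counts: Dict[str, Dict[str, int]] = {
    "sampleA": {"Bacteroides": 600, "Escherichia": 300, "Lactobacillus": 100},
    "sampleB": {"Bacteroides": 50, "Escherichia": 150, "Lactobacillus": 300},
}


def to_relative_abundance(sample_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale one sample's counts so that they sum to 100 (percent)."""
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: 100.0 * n / total for taxon, n in sample_counts.items()}


# Normalize every sample independently so samples of different depth are comparable.
relative = {sample: to_relative_abundance(c) for sample, c in counts.items()}

for sample, profile in relative.items():
    row = ", ".join(f"{taxon}={pct:.1f}%" for taxon, pct in profile.items())
    print(f"{sample}: {row}")
```

Normalizing each sample to the same total is what makes samples sequenced to different depths comparable in downstream statistics.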

Figure 7. The basic idea of metagenomic analysis: the reference-based workflow. Species composition is obtained mainly with MetaPhlAn2, based on all reported microbial genomes, and functional composition is determined from protein databases such as UniRef, EggNOG, and KEGG. 16S amplicon data by themselves contain only species composition; KEGG/COG functional composition can be predicted from them with PICRUSt.

Main knowledge points:

1. Principles for writing up the experimental design

2. Rapid quality control and host removal with the KneadData workflow

3. Quantifying species composition with MetaPhlAn2

4. Quantifying functional composition with HUMAnN2

6. Reference-free (de novo) analysis workflow for metagenomics

Reference-free metagenome analysis has two main purposes: one is to obtain the abundance of unannotated species and genes; the other is to recover the genomes of new species through binning. It sounds appealing, but in practice it is computationally intensive. The workflow involves additional steps such as assembly, gene prediction, construction of a non-redundant gene set, and gene annotation.

Figure 8. Reference-free metagenome analysis workflow.

Key steps and software used:

  1. Data quality control: FastQC, Trimmomatic, MultiQC, khmer

  2. Assembly with MEGAHIT and assembly evaluation with QUAST

  3. Gene annotation: Prokka

  4. Constructing a non-redundant gene set: CD-HIT

  5. Gene abundance estimation: Salmon and similar tools can quickly quantify gene abundance. Overall group differences can then be compared with PCA, PCoA, CCA, and so on, and edgeR, MetaStat, and LEfSe can be used to further analyze which genes differ between groups;

  6. Species annotation: obtain species annotations for the non-redundant gene set, or use Kraken2 to annotate species directly at the reads level, and combine them with the abundance values from step 5 to analyze species differences between groups;

  7. Gene function annotation and classification: metabolic pathway (KEGG) and orthologous group (eggNOG) annotation, combined with the abundance values from step 5, for functional comparison between groups.
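
Ordination methods such as PCoA start from a sample-by-sample distance matrix computed on the abundance table. A minimal sketch of one common choice, Bray-Curtis dissimilarity, is shown below in plain Python. The sample names and abundance values are invented, and `bray_curtis` is an illustrative helper written for this example, not a function from Salmon or from any package named in this course.

```python
from itertools import combinations
from typing import Dict, List

# Hypothetical gene abundance table: sample -> abundance of three genes,
# listed in the same gene order for every sample (illustrative values only).
abundance: Dict[str, List[float]] = {
    "ctrl1": [10.0, 0.0, 5.0],
    "ctrl2": [8.0, 2.0, 5.0],
    "treat1": [0.0, 12.0, 3.0],
}


def bray_curtis(x: List[float], y: List[float]) -> float:
    """Bray-Curtis dissimilarity: sum(|x_i - y_i|) / sum(x_i + y_i)."""
    if len(x) != len(y):
        raise ValueError("abundance vectors must have the same length")
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator


# Pairwise dissimilarities; 0 means identical profiles, 1 means nothing shared.
distances = {
    (s1, s2): bray_curtis(abundance[s1], abundance[s2])
    for s1, s2 in combinations(abundance, 2)
}

for (s1, s2), d in distances.items():
    print(f"{s1} vs {s2}: {d:.3f}")
```

In this toy table the two control samples are closer to each other than either is to the treated sample, which is the kind of pattern an ordination plot makes visible.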

Figure 9. Metatranscriptome analysis workflow. Compared with metagenomics, the metatranscriptome workflow has one extra step to remove rRNA gene sequences. Its disadvantage is that the true species composition cannot be obtained; instead it reflects the active species and the expression levels of functional genes under specific spatio-temporal conditions.

7. Advanced analysis and visualization

  1. Statistical plotting and reproducible computing in R

  2. Recovering single-organism genomes (bins) from metagenomes: MetaWRAP

  3. Bin evaluation and visualization: CheckM, VizBin

  4. Metagenome visualization: Circos

  5. Online pipelines: MEGAN, MG-RAST, EBI Metagenomics

  6. Network analysis: igraph, WGCNA, Cytoscape

  7. Concatenated multi-gene tree construction: RAxML, FastTree, iTOL

  8. Other commonly used tools: GraPhlAn, Krona

Figure 10. Visualization of metagenome composition, abundance, coverage, and other information

Figure 11. Construction and beautification of a phylogenetic tree based on concatenated multiple genes (Levy-2018-NatureGenetics)

What can you gain after studying this course?

Deeply understand the basic idea of ​​biological sequencing data

A comprehensive solution for three modes of metagenomic analysis, as well as statistical analysis of the results

  • Predicting the metagenome from 16S amplicon data with PICRUSt

  • Quantifying species and function from metagenomic data with HUMAnN2

  • De novo metagenomic assembly and binning

Experience in using dozens of software databases

  • Dozens of software installation and use tutorials in this field

  • Understanding and use of common functional annotation databases

Visualization of results as needed

  • Comparison of Differences in Results

  • Various visualization schemes

Lecturer

The lecturers are front-line technical experts in this field from the Institute of Microbiology, the Institute of Genetics and Developmental Biology, the Institute of Genomics, and the Institute of Biophysics of the Chinese Academy of Sciences, as well as the Chinese Academy of Agricultural Sciences, Tsinghua University, Peking University, China Agricultural University, and other institutions.

Liu Yongxin, PhD in Bioinformatics, researcher, doctoral supervisor, Executive Editor-in-Chief of the journal iMeta, and founder of the "Metagenome" official account. His research directions are microbiome method development, food microbiome function research, and science communication. He has published 50+ papers in Science, Nature Biotechnology, Cell Host & Microbe, and other journals as first author (including co-first author) or as the person in charge of microbiome data analysis, with 12,000+ citations. He is a participant in the microbiome analysis platform QIIME 2 project, and has been invited to publish reviews of microbiome research methods in Protein & Cell, Current Opinion in Microbiology, Genetics, and other journals as first and/or corresponding author (including co-authors). In July 2017 he founded the "Metagenome" public account, which has shared more than 3,100 original articles in this field, has 150,000+ followers, and has accumulated 40 million+ reads. Representative works include "Microbiome Chart Interpretation, Analysis Process, and Statistical Drawing" and "QIIME2 Chinese Tutorial".

Chen Tong, Ph.D. in bioinformatics, graduated from the Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, in 2015. He has published as first or main author in Cell Stem Cell (IF=23.2, first author and cover article), Nature Communications, Nucleic Acids Research (x4), Protein & Cell, and other high-level journals, and runs the "Shengxin Baodian" WeChat official account, which is followed by tens of thousands of people and offers a different experience of learning bioinformatics.

Moments from previous courses

The trainees come mainly from universities and research institutes in mainland China. They also include researchers from major companies such as Moutai, Wuliangye, Angel Yeast, and Huawei, and even overseas Chinese who travel thousands of miles from Europe, Australia, the United States, Canada, New Zealand, Singapore, Thailand, and elsewhere to Beijing, or who join the seminars online.


Assistant team

More than a dozen PhDs (including PhD students) from the Chinese Academy of Sciences, Tsinghua University, and Peking University, together with rotating lecturers and teaching assistants, help students learn and fill gaps in their understanding during the training.

Teaching mode

This course focuses on explaining the workflow and on hands-on practice, and adopts an original four-stage teaching method:

  • The first stage is 3 days of intensive teaching;

  • The second stage is 2 weeks of self-practice;

  • The third stage is an online live Q&A session;

  • The fourth stage is continued learning with the training videos;

  • Together, these coordinate the four links of teaching, practice, answering, and use.

Training period

From 9:00 am to 6:00 pm every day, semi-closed teaching. The last hour is a round-table discussion to increase interaction. The last day ends slightly earlier to allow more time for discussion and to make return travel easier for participants.

Check-in (on-site registration): on the day of class

Teaching location

Online and offline classes run simultaneously. Online classes use a conference platform such as Tencent Meeting.

The offline location is C1105, Block C, Caizhi International Building, No. 18 Zhongguancun East Road, Haidian District, Beijing.

Course price

  1. 4,500 yuan/person two weeks before class starts

  2. Places are limited; the registration channel for each course closes automatically once 40 people have signed up

  3. Internship or job opportunities at Yihanbo Gene Technology are available

Course Benefits

  1. Seats are assigned from front to back in the order of successful registration and payment (or pre-payment)

  2. A free basic programming course (http://bioinfo.ke.qq.com)

  3. If several people (N, 10>N>1) sign up as a group and pay at the same time, each person receives a further discount of (N-1) × 100 yuan (up to 500 yuan)

  4. A free Kingston USB drive (32 GB, including training data and scripts)

  5. Share the enrollment information to Moments with a recommendation attached, and send the screenshot to [email protected] to receive a 200 yuan coupon for Shengxin Baodian courses on Tencent Classroom (it can be split across multiple courses)

  6. Yishengxin runs several related courses at the same time, with discounts for registering for more than one. Amplicon (for preliminary research) + metagenomics (high precision) takes your analysis to a higher level.
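
The group discount rule in item 3 can be worked through with a short calculation. The sketch below assumes the 4,500 yuan per-person price from the price list; the function name `group_discount` and the handling of group sizes outside 1 < N < 10 are assumptions made for this example, and the actual price is whatever the registration system calculates.

```python
BASE_PRICE_YUAN = 4500      # per-person price stated in the course price list
DISCOUNT_STEP_YUAN = 100    # each additional group member adds 100 yuan of discount
MAX_DISCOUNT_YUAN = 500     # cap stated in the group discount rule


def group_discount(n_people: int) -> int:
    """Per-person discount for a group of n_people: (N - 1) * 100, capped at 500."""
    if not 1 <= n_people < 10:
        # The announcement only defines the rule for 1 < N < 10. Rejecting other
        # sizes, and giving a single registrant no discount, is an assumption.
        raise ValueError("group size must be between 1 and 9")
    return min((n_people - 1) * DISCOUNT_STEP_YUAN, MAX_DISCOUNT_YUAN)


for n in (1, 3, 6, 9):
    discount = group_discount(n)
    price = BASE_PRICE_YUAN - discount
    print(f"group of {n}: discount {discount} yuan, price {price} yuan per person")
```

With these numbers, a group of 3 saves 200 yuan each, and any group of 6 or more reaches the 500 yuan cap.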

Notes

  1. Bring your own laptop. Win10 is recommended, with at least 4 GB of memory (8 GB is recommended). A cloud computing platform will be provided for course practice as needed

  2. All data and documents of the training course are internal materials, for reference only, and may not be reproduced or published without permission

  3. Audio and video recording are prohibited during class

  4. Students who have paid but cannot attend because of an urgent matter may apply to defer to a later training class, or may apply for a refund

  5. If you apply for a refund at least 2 weeks before the start of the course, 85% of the fee is refunded; if you apply at least 3 working days before the start of the course, 70% of the fee is refunded (if an invoice has already been issued, you must bear the corresponding handling fee)

  6. A registration that has been deferred cannot subsequently be refunded

For more detailed course introductions, please scan the QR code below.

f7ede4185c3777f8f3065ffcb785b030.jpeg

Yishengxin runs several related courses at the same time, with discounts for multi-course registration and for group registration!

  1. Multi-course discount: when registering for n courses, each course is (n-1) × 100 yuan cheaper;

    The multi-course discount is not applied to the first course; it accumulates in the following courses. That is, the discount for the second course becomes available only after the first course has been completed.

  2. Returning-student discount

    The second course is reduced by 100 yuan, the third by 200 yuan, and so on, up to a maximum of 500 yuan.

  3. Group discount: each member's discount is (number of group members - 1) × 100 yuan (if a member withdraws at the time of payment, the discount is calculated from the actual number of participants).

  4. The final price after discounts will not be lower than 4,000 yuan. Discount details change dynamically, and the price calculated by the registration system prevails.

These discounts can be combined with the group discount. We recommend studying amplicon analysis (beginner) and then metagenomics (advanced), in that order, to take your analysis to a higher level and become an indispensable person in your lab. Sign up now!


To become an indispensable person in your lab, copy the link http://www.ehbio.com/Training/ and sign up now!


Origin blog.csdn.net/woodcorpse/article/details/131950508