Sub-journal of Nature | "Second Generation + Third Generation" Metagenomics Reveals Personalized Structural Variations of Gut Microbiota

In 2022, the research paper "Short- and long-read metagenomics expand individualized structural variations in gut microbiomes", published in Nature Communications, established a new method for hybrid assembly of Oxford Nanopore Technologies (ONT) third-generation sequencing data and Illumina second-generation sequencing data, and used it to characterize structural variations (SVs) at fine resolution in hundreds of gut microbiomes from healthy humans. The study showed that long reads significantly improve the quality of metagenomic assemblies and enable reliable detection of a large number of additional SV types (in particular large insertions and inversions).

Journal: Nature Communications

Impact factor: 17.694

Release time: 2022

DOI: 10.1038/s41467-022-30857-9

1. Research background

Gaining insight into the genetic variation of the gut microbiota is essential for understanding its function and its impact on host health and disease. Most insights into the composition and function of the microbiome have come from shotgun metagenomic sequencing data, which support the analysis of single nucleotide polymorphisms (SNPs) and structural variations (SVs) across populations. The relatively long reads of ONT sequencing have been widely used to assemble complex eukaryotic genomes and to resolve difficult regions, including tandem repeats and large structural variations.

2. Experimental design

This study established a new method for hybrid assembly of ONT third-generation and Illumina second-generation sequencing data, and detected more microbial structural variations (SVs), including insertions, deletions and inversions. In addition, a joint metagenomic and metabolomic analysis was performed on a cross-sectional cohort of 100 healthy people and a longitudinal follow-up cohort of 10 people. The experimental design is shown in the figure below.

Figure 1 Experimental design

3. Experimental results

1. Hybrid sequencing improves the quality of human intestinal metagenome assembly

Compared with assembly from Illumina metagenomic data alone, the second-generation + third-generation hybrid assembly produced fewer contigs, the total amount of assembled sequence increased by 5.1%, and the average N50 more than doubled. Binning the contigs yielded metagenome-assembled genomes (MAGs): hybrid assembly produced 9,612 MAGs (20-83 per sample) with an average N50 of 117 kb, and 692 MAGs remained after removing redundancy (Figure 2b, 2c). Of these, 623 are present in the UHGG database, 208 are high-quality MAGs, and 67 are new MAGs. In terms of completeness, 159 non-redundant MAGs contained all three of the 23S, 16S and 5S rRNA sequences, and 448 MAGs (64.74%) contained at least one type of rRNA sequence.

In contrast, Illumina-only assembly yielded 11% fewer MAGs (616), with an average N50 about half that of the hybrid assembly; only 9 MAGs (1.46%) contained all three types of rRNA sequence, and only 258 MAGs (41.88%) contained at least one rRNA sequence.
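Since N50 is the main assembly-quality metric quoted here, a minimal sketch of how it is usually computed may help: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The contig lengths below are hypothetical toy values, not data from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half
        # of the assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assemblies (arbitrary units), for illustration only.
short_read_only = [40, 30, 30, 20, 20, 20, 10, 10, 10, 10]
hybrid = [90, 60, 30, 20, 10]

print("short-read-only N50:", n50(short_read_only))
print("hybrid N50:", n50(hybrid))
```

In this toy case the hybrid assembly has fewer, longer contigs, so its N50 is twice that of the short-read-only assembly, which is the kind of shift the comparison above describes.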

Figure 2 The second-generation + third-generation assembly method enhances the detection and verification of structural variations (SVs)

2. Expanding the detection range of structural variation in the gut microbiota

Long ONT reads make it possible to detect more features of SVs. In this study, various types of SVs were found by comparing MAGs. Using dRep alignment for 189 strains, 317,558 insertions, 34,129 deletions and 1,373 inversions were identified (Fig. 2d). SVs larger than 500 bp accounted for a large proportion of each SV type (Fig. 2e–g).
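As a rough illustration of the kind of summary reported here (counts per SV type and the share of SVs above 500 bp), the sketch below tallies a small list of SV calls. The records, the `summarize_svs` helper and the tuple layout are all hypothetical; the paper's actual SV-calling output format is not described in this article.

```python
from collections import Counter

# Hypothetical SV calls as (type, length in bp); illustrative values only.
sv_calls = [
    ("insertion", 150), ("insertion", 1100), ("insertion", 620),
    ("deletion", 145), ("deletion", 2300),
    ("inversion", 5400),
]


def summarize_svs(calls, min_large_bp=500):
    """Return {sv_type: (count, fraction longer than min_large_bp)}."""
    total = Counter()
    large = Counter()
    for sv_type, length_bp in calls:
        total[sv_type] += 1
        if length_bp > min_large_bp:
            large[sv_type] += 1
    return {t: (total[t], large[t] / total[t]) for t in total}


summary = summarize_svs(sv_calls)
for sv_type, (count, large_fraction) in summary.items():
    print(f"{sv_type}: {count} calls, {large_fraction:.0%} longer than 500 bp")
```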

Two peaks were observed in the length distributions of insertions and deletions, leading to the hypothesis that the two peaks result from different biological processes in prokaryotic genomes, in particular the activity of transposons, prophages and other mobile elements. To test this, SV fragments from the two peaks of insertions and deletions (140~160 bp and 1050~1150 bp, Figure 2e) were randomly selected for analysis. The SVs in the two peaks differed significantly, and mobile elements were more common in the short SV fragments, so short SVs may be related to phage integration and other mobile elements. However, not all SVs carried detectable mobile elements, so this provides only a partial explanation.

Next, the reliability of the detected SVs was further verified by realignment against the reference MAG or the SV-containing sequences in the MAG. Manual inspection confirmed that more than 97% of a randomly selected set of SVs were consistent with the ONT read evidence at the corresponding positions, verifying that single-molecule sequencing can reliably recover specific SVs (Figure 3a). The SVs of the same bacterial gene in the same individual also showed low heterogeneity.

A clear trend in the SV dataset of this study is that the frequency of SVs in bacterial genomes is not uniform across taxa. Analysis at the species level (MAGs) found that the total number of SVs was proportional to the number of MAGs across all samples and to genome size.

Figure 3 Validation and characterization of structural variations (SVs) in the human gut microbiota

3. SV is functionally informative as a highly individualized feature of the gut microbiome

Analysis of 189 MAGs in the two cohorts found 16.7 SVs per Mb of genome between different individuals, whereas the median number of SVs per Mb within the same individual at different time points was 0 (Fig. 3d). SVs can therefore clearly distinguish the bacterial species, and the gut microbiota as a whole, of different individuals.
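The "SVs per Mb" figure is a simple normalization: the number of SVs found in a pairwise genome comparison divided by the genome length in megabases. The sketch below shows that arithmetic and the median over several comparisons. All counts and genome sizes are hypothetical, and the exact normalization used by the authors is an assumption here.

```python
from statistics import median


def svs_per_mb(sv_count, genome_length_bp):
    """SV density: SVs per megabase of genome."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return sv_count / (genome_length_bp / 1_000_000)


# Hypothetical (SV count, genome length in bp) for pairwise comparisons.
between_individuals = [(45, 3_000_000), (30, 2_000_000), (80, 4_000_000)]
within_individual = [(0, 3_000_000), (0, 2_000_000), (4, 4_000_000)]

median_between = median(svs_per_mb(n, size) for n, size in between_individuals)
median_within = median(svs_per_mb(n, size) for n, size in within_individual)

print("median SVs/Mb between individuals:", median_between)
print("median SVs/Mb within an individual:", median_within)
```

A large gap between the two medians, as in this toy example, is what makes SV profiles usable as an individual-specific signature.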

Functional enrichment analysis of SV-associated genes at the population scale revealed a total of 267 pathways associated with insertions and deletions (Fig. 4a). No pathways associated with inversions were found, possibly because inversions were less numerous than insertions and deletions. Of the 30 most affected pathways (ranked by enrichment), 19 were metabolism-related, including 'glycan degradation', 'sphingolipid metabolism' and various carbohydrate metabolism pathways.

Figure 4. Functional correlations of structural variants (SVs) in the human gut microbiota

4. SVs complicate the link between bacteria and metabolites and host phenotypes

Metabolome analysis of samples from the healthy cross-sectional cohort revealed that SVs complicate the correlation between bacterial species and metabolites: strain-level functional differences within the same bacterial species were significantly associated with metabolites. Association analysis between SVs and metabolites found that 70 SVs affected significant associations between bacteria and 74 fecal metabolites, 31 SVs affected associations between bacteria and 66 urine metabolites, and 2 SVs affected associations between bacteria and 2 serum metabolites.

The presence of 12 SV-affected genes made the association between Fusicatenibacter saccharivorans and neotrehalose in fecal samples non-significant (Fig. 4d); similarly, the presence of 33 SV-affected genes removed the significant correlation between Agathobacter rectalis and F1P (Fig. 4e). Among the metabolites and SV-affected genes, four SV-affected metabolites and a total of 11 SV-affected genes were assigned to four KEGG pathways in which both the genes and the metabolites were involved. These findings strongly suggest that SVs shape bacteria-metabolite correlations by affecting the function of related genes.

To further study the effect of SVs on phenotype, the two SV-affected metabolites in the cross-sectional cohort, F1P and neotrehalose, were tested for correlation with fasting blood glucose. Both F1P and neotrehalose were significantly negatively correlated with fasting blood glucose. F. saccharivorans was also significantly negatively associated with fasting glucose, but in the subgroup carrying SVs the association became non-significant (Fig. 4h); the presence of SVs also weakened the association of A. rectalis with glucose (Fig. 4i).

Thus, the findings suggest that because SVs complicate the correlation between bacterial abundance and metabolite concentrations, accounting for SVs can improve the power to detect associations between bacteria and host health phenotypes.
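The idea of an association that holds in samples without an SV but disappears in samples carrying it can be illustrated with a stratified rank correlation. The sketch below is only an assumption about how such a comparison could be set up: it uses a hand-written Spearman correlation, made-up sample values, and no significance testing, and it is not the statistical procedure of the paper.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical samples: (species abundance, metabolite level, carries SV?)
samples = [
    (1.0, 9.0, False), (2.0, 7.5, False), (3.0, 6.0, False), (4.0, 4.0, False),
    (1.5, 5.0, True), (2.5, 8.0, True), (3.5, 4.5, True), (4.5, 7.0, True),
]


def stratified_spearman(rows):
    """Correlate abundance with metabolite separately by SV status."""
    out = {}
    for has_sv in (False, True):
        subset = [(a, m) for a, m, flag in rows if flag == has_sv]
        abundance, metabolite = zip(*subset)
        out[has_sv] = spearman(list(abundance), list(metabolite))
    return out


rho_by_sv = stratified_spearman(samples)
print("rho without SV:", round(rho_by_sv[False], 3))
print("rho with SV:", round(rho_by_sv[True], 3))
```

In the toy data the negative correlation is perfect in the SV-free subgroup and vanishes in the SV-carrying subgroup, so pooling both groups would dilute a relationship that stratifying by SV status recovers.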

5. At the community level, phage and CRISPR structures are highly correlated

All MAGs were analyzed with the machine-learning-based software ProphageHunter, yielding 2,247 phages with genome sizes between 1,236 bp and 91,792 bp, dominated by the long-tailed Siphoviridae and the contractile-tailed Myoviridae (Fig. 5a). Association analysis of phage elements and bacterial genomes yielded 1,077 phage-host pairs (Fig. 5b), of which only 72 were in the MVP database. In contrast, only 1,815 phages were detected in the second-generation sequencing data alone, 80.77% of which were also detected in the hybrid assembly. These results indicate that ONT + second-generation hybrid assembly is more conducive to phage discovery.

In addition to phages, bacterial genomes in the microbiota carry CRISPR-Cas systems, which defend against phage reinfection. Analysis of all MAGs found 150,058 CRISPR spacers, with an average of 1665±560 spacers per sample. Most of the spacers were newly discovered: only 17,600 (11.73%) appeared in the CRISPROpenDB database, and only 22,962 (15.30%) had been found in the gut microbiota of Western populations. In contrast, only 9,542 spacers were found with second-generation sequencing assembly alone. The new metagenomic assembly method is therefore better at discovering genetic elements such as CRISPR spacers.

Beta-diversity analysis of prophages and CRISPR spacers found significantly greater variability between individuals in the cross-sectional cohort than within individuals in the follow-up cohort. Population-level compositional analysis showed strong covariation between prophages and CRISPR spacers, and Procrustes analysis showed that prophage and virus community composition were significantly correlated across individuals in the cross-sectional cohort (Fig. 5c). Analysis of active viral sequences in the metagenomic data found that 47 of the 2,247 identified prophages were potentially active, indicating that bacterial genomes contain a large number of inactive prophages, which helps maintain the stability of SVs.
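Beta diversity compares the composition of two samples. One simple presence/absence measure is the Jaccard distance between the sets of spacers (or prophages) seen in each sample. The sketch below uses that measure purely as an illustration: this article does not state which distance the authors used, and the spacer identifiers are invented.

```python
def jaccard_distance(a, b):
    """1 - |A and B| / |A or B| for two collections of feature IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1 - len(a & b) / len(union)


# Hypothetical spacer IDs per sample.
person_a_t0 = {"s1", "s2", "s3", "s4"}
person_a_t1 = {"s1", "s2", "s3", "s5"}  # same person, later time point
person_b_t0 = {"s3", "s6", "s7", "s8"}  # different person

within_person = jaccard_distance(person_a_t0, person_a_t1)
between_people = jaccard_distance(person_a_t0, person_b_t0)

print("within-person distance:", round(within_person, 3))
print("between-person distance:", round(between_people, 3))
```

A smaller distance between time points of the same person than between different people corresponds to the pattern described above, where spacer and prophage repertoires are stable within an individual but differ across the cohort.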

Figure 5 ONT-improved metagenomes contain highly diverse phage and CRISPR spacers in the human gut microbiome

4. Research conclusion

In summary, this study established a hybrid assembly method based on third-generation and second-generation sequencing, which improves data quality, expands the detectable range of genetic variation, and facilitates the discovery of genetic elements such as prophages and CRISPR spacers. SVs modulate bacterial functions that affect the host metabolome and health, which calls for more nuanced studies of bacterial contributions to human health and disease that go beyond bacterial abundance. Further incorporation of long-read (ONT) sequencing into gut microbiome studies will facilitate in-depth dissection of gut microbiome function at specific time points and deepen researchers' understanding of the various gut-disease axes in humans.

References

Short- and long-read metagenomics expand individualized structural variations in gut microbiomes. Nature Communications, 2022. DOI: 10.1038/s41467-022-30857-9

Origin blog.csdn.net/SHANGHAILINGEN/article/details/127442139