Ordination analysis and R implementation

In fields such as ecology, biology, and statistics, ordination is a family of multivariate statistical techniques used to explore and display the structure of data. It maps the samples or variables of a multidimensional data set into a low-dimensional space so that the relationships among them are easier to understand and visualize. Ordination is often applied to ecological and biological questions such as species composition and ecosystem structure.

1. Common ordination methods:

  1. Principal Component Analysis (PCA): Used to reduce dimensionality and identify the main directions of variation in the data. Suitable for data sets with roughly linear relationships, such as species richness or environmental variables in ecology.

  2. Correspondence Analysis (CA): Mainly used to analyze the relationship between two categorical variables. Often used to analyze the relationship between species and environmental factors in ecology.

  3. Non-metric Multidimensional Scaling (NMDS): Used for data with strongly nonlinear relationships or where Euclidean distance is not appropriate, because it relies only on the rank order of the dissimilarities. Suitable for problems such as habitat similarity analysis in ecology.

  4. Canonical Correspondence Analysis (CCA): An extension of correspondence analysis that constrains the ordination axes to be linear combinations of explanatory variables, so that the axes capture the variation those variables can explain. It relates two tables (for example, species data and environmental data), combining features of correspondence analysis and multiple regression, and is suitable for analyzing relationships between species and environmental variables.

  5. Factor Analysis: Used to identify latent variables (factors) hidden behind the observed data, usually to explore the intrinsic structure of the data.
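
To make the idea of "mapping into a low-dimensional space" from the list above concrete, here is a minimal PCA sketch in base R. The simulated matrix `x` and all variable names are illustrative assumptions and are not part of the tutorial data used later.

```r
# Minimal PCA sketch with base R (simulated data; all names are illustrative)
set.seed(1)

# 50 samples x 4 variables; v1 and v2 are strongly correlated by construction
n <- 50
v1 <- rnorm(n)
x <- cbind(v1 = v1,
           v2 = v1 + rnorm(n, sd = 0.2),
           v3 = rnorm(n),
           v4 = rnorm(n))

# Center and scale the variables, then rotate onto the principal components
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Proportion of the total variance captured by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Sample coordinates on the first two components (what an ordination plot shows)
scores <- pca$x[, 1:2]
head(scores)
```

Because two of the four variables carry nearly the same information, the first component should absorb a larger share of the variance than any single original variable would.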

2. R implementation of classic ordination methods

Load the data.

library(microbiome)
library(phyloseq)
library(ggplot2)
data(dietswap)
pseq <- dietswap

# Convert to compositional data
pseq.rel <- microbiome::transform(pseq, "compositional")

# Pick core taxa with the given prevalence and detection limits
pseq.core <- core(pseq.rel, detection = .1/100, prevalence = 90/100)

# Use relative abundances for the core
pseq.core <- microbiome::transform(pseq.core, "compositional")
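
The two preprocessing steps above (compositional transform and core filtering) can be sketched in base R on a toy count table. This is only an illustration under stated assumptions: the matrix `counts` is simulated, all names are hypothetical, and the strict `>` comparisons are an assumption about how the thresholds are applied rather than a statement of what `core()` does internally.

```r
# Base-R sketch of the two preprocessing steps (toy data; names are illustrative)
set.seed(2)

# Toy count table: 6 taxa (rows) x 10 samples (columns)
counts <- matrix(rpois(60, lambda = rep(c(200, 100, 50, 5, 1, 0.1), times = 10)),
                 nrow = 6,
                 dimnames = list(paste0("taxon", 1:6), paste0("sample", 1:10)))

# Compositional transform: divide each sample (column) by its total count
rel <- sweep(counts, 2, colSums(counts), "/")

# Prevalence: fraction of samples in which a taxon exceeds the detection limit
detection <- 0.1 / 100
prevalence <- rowMeans(rel > detection)

# Keep taxa whose prevalence exceeds 90% (assumed strict threshold)
core_taxa <- names(prevalence)[prevalence > 90 / 100]
rel_core <- rel[core_taxa, , drop = FALSE]

# Re-normalize so that the core taxa sum to 1 within each sample
rel_core <- sweep(rel_core, 2, colSums(rel_core), "/")
print(core_taxa)
```

The second normalization mirrors the last step above: once rare taxa are dropped, the remaining abundances no longer sum to 1 per sample unless they are rescaled.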

Project the samples using the given ordination method and dissimilarity measure.

Multidimensional scaling (MDS / PCoA)

# Ordinate the data
set.seed(4235421)
# proj <- get_ordination(pseq, "MDS", "bray")
ord <- ordinate(pseq, "MDS", "bray")
plot_ordination(pseq, ord, color = "nationality") +
  geom_point(size = 5)
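
For readers who want to see what an MDS / PCoA on Bray-Curtis dissimilarities involves, the following base-R sketch computes the dissimilarity matrix by hand and passes it to `cmdscale()`. The toy matrix `x` and the helper variable names are assumptions for illustration; this is not how `ordinate()` is implemented.

```r
# Base-R sketch of PCoA on Bray-Curtis dissimilarities (toy data; names are illustrative)
set.seed(3)

# Toy abundance table: 8 samples (rows) x 5 taxa (columns)
x <- matrix(rpois(40, lambda = 20), nrow = 8,
            dimnames = list(paste0("sample", 1:8), paste0("taxon", 1:5)))

# Bray-Curtis dissimilarity: sum|x_i - x_j| / sum(x_i + x_j)
num <- as.matrix(dist(x, method = "manhattan"))
den <- outer(rowSums(x), rowSums(x), "+")
bray <- as.dist(num / den)

# Classical (metric) MDS, i.e. PCoA, keeping two axes
pcoa <- cmdscale(bray, k = 2, eig = TRUE)

# Share of the positive eigenvalues carried by each retained axis
pos_eig <- pcoa$eig[pcoa$eig > 0]
axis_share <- pos_eig[1:2] / sum(pos_eig)
print(round(axis_share, 3))

# Sample coordinates, analogous to what plot_ordination() draws
head(pcoa$points)
```

Only the positive eigenvalues are used for the axis shares because Bray-Curtis is not a Euclidean distance, so some eigenvalues can be negative.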

Canonical correspondence analysis (CCA)

# With samples
pseq.cca <- ordinate(pseq, "CCA")
p <- plot_ordination(pseq, pseq.cca,
       type = "samples", color = "nationality")
p <- p + geom_point(size = 4)
print(p)

# With taxa:
p <- plot_ordination(pseq, pseq.cca,
       type = "taxa", color = "Phylum")
p <- p + geom_point(size = 4)
print(p)

Split plot

plot_ordination(pseq, pseq.cca,
       type = "split", shape = "nationality",
       color = "Phylum", label = "nationality")
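
The calls above pass no explanatory variables to the ordination. To illustrate how constraints enter a CCA, here is a small sketch using `cca()` from vegan with the `varespec` and `varechem` example data sets that ship with that package. The choice of the three soil variables is an arbitrary assumption made for illustration and is unrelated to the dietswap data.

```r
# Constrained CCA sketch with vegan's bundled example data
library(vegan)

data(varespec)  # vegetation data: sites (rows) x species cover (columns)
data(varechem)  # soil chemistry measured at the same sites

# Constrain the species ordination by three soil variables (arbitrary choice)
mod <- cca(varespec ~ Al + P + K, data = varechem)

# Fraction of the total inertia explained by the constraints
constrained_share <- mod$CCA$tot.chi / mod$tot.chi
print(round(constrained_share, 3))

# Site coordinates on the first two constrained axes
site_scores <- scores(mod, display = "sites", choices = 1:2)
head(site_scores)
```

The constrained share plays the role that R-squared plays in regression: it shows how much of the species variation the chosen environmental variables can account for.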

t-SNE

t-SNE (t-distributed stochastic neighbor embedding) is a popular, relatively recent nonlinear ordination method.

library(vegan)
library(microbiome)
library(Rtsne) # Load package
set.seed(423542)

method <- "tsne"
trans <- "hellinger"
distance <- "euclidean"

# Hellinger-transform the abundances
ps <- microbiome::transform(pseq, trans)

# Calculate sample dissimilarities; vegdist() compares rows, so put samples in rows
dm <- vegdist(t(abundances(ps)), distance)

# Run t-SNE on the distance matrix
tsne_out <- Rtsne(dm, dims = 2)
proj <- tsne_out$Y
rownames(proj) <- sample_names(ps)

library(ggplot2)
p <- plot_landscape(proj, legend = TRUE, size = 1)
print(p)
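
The combination of a Hellinger transform with Euclidean distance used above can be reproduced by hand in base R. The sketch below uses a simulated table `x` with hypothetical sample and taxon names, and is meant only to show what the two settings compute.

```r
# Base-R sketch of the Hellinger transform followed by Euclidean distance
set.seed(4)

# Toy abundance table: 6 samples (rows) x 4 taxa (columns)
x <- matrix(rpois(24, lambda = 15) + 1, nrow = 6,
            dimnames = list(paste0("sample", 1:6), paste0("taxon", 1:4)))

# Hellinger transform: square root of the relative abundances within each sample
hel <- sqrt(x / rowSums(x))

# After the transform every sample vector has unit length
print(rowSums(hel^2))

# Euclidean distance between transformed samples (the Hellinger distance)
d <- dist(hel, method = "euclidean")
print(round(max(d), 3))
```

Because every transformed sample has unit length and non-negative entries, these distances cannot exceed the square root of 2, which keeps highly abundant taxa from dominating the result.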

Which method is applicable depends on the nature of the data and the objectives of the study. When choosing an ordination method, consider the linearity, distribution, and correlation of the data, as well as any underlying structure you expect. An appropriate ordination method helps you better understand the patterns and relationships in your data set.

Origin blog.csdn.net/qq_42458954/article/details/134719029