Practical applications of multivariate statistical analysis with R in ecology and environmental science

Research in ecology and environmental science often involves many variables of different types. When several response variables (y) have to be analyzed at the same time, multivariate statistical analysis is required. It covers a broad range of methods and is widely used. Classification/grouping and gradient/ordination analysis form its core. Classification/grouping analysis mainly includes clustering (such as hierarchical clustering and k-means clustering) and difference analysis (such as discriminant analysis and the Mantel test); gradient/ordination analysis is divided into unconstrained ordination (such as PCA and CA) and constrained ordination (such as RDA and CCA).

Do you often feel lost when applying multivariate statistical methods, unsure where to start?

First, there are many multivariate methods: classification or ordination? Constrained or unconstrained ordination? Which method or technique best suits my research question or data? Second, many terms in multivariate analysis have alternative names; unconstrained ordination, for example, is also called indirect gradient analysis. Third, multivariate data may be continuous, count, categorical or of mixed type, so how should the appropriate method be chosen for each data type?

The outline below sorts out the different application scenarios of classification/grouping and ordination/gradient analysis methods in multivariate statistics. Specific cases, worked with the relevant R packages, demonstrate how each method is implemented, so that readers become familiar with multivariate data analysis and can approach it with confidence.

Overview of multivariate data analysis in ecology and environmental science (Working with multivariate data)

1. Concepts and definitions of multivariate statistical methods

2. Application scenarios, similarities and differences of the various multivariate statistical methods

3. Data or variable types and structures for multivariate statistical methods

 

Introduction to R and RStudio: getting started and basic plotting

1) Introduction to R and RStudio: background, software and package installation, basic settings, etc.

2) Basic R operations: creating vectors, matrices, data frames and lists, and extracting data from them

3) Reading data files in R, tidying (cleaning) data, saving results, etc. (including the tidyverse)

4) Basic plotting in R (including ggplot): basic plots, layout, and saving publication-quality figures
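The basic data structures listed in item 2 can be illustrated with a few lines of base R. This is only a minimal sketch; all object names and values (abund, m, sites, lst) are made up for illustration.

```r
# Vector, matrix, data frame and list, with basic extraction (base R only).
# All names and values here are hypothetical.
abund <- c(sp1 = 12, sp2 = 0, sp3 = 5, sp4 = 8)          # named numeric vector

m <- matrix(1:6, nrow = 2, byrow = TRUE,
            dimnames = list(c("site1", "site2"), c("a", "b", "c")))

sites <- data.frame(site = c("s1", "s2", "s3"),
                    ph   = c(6.5, 7.1, 5.9),
                    wet  = c(TRUE, FALSE, TRUE))

lst <- list(abund = abund, m = m, sites = sites)          # a list can hold anything

present   <- abund[abund > 0]        # logical indexing on a vector
cell      <- m["site2", "b"]         # matrix element by row and column name
high_ph   <- sites[sites$ph > 6, ]   # data frame rows that satisfy a condition
ph_values <- lst$sites$ph            # nested extraction from a list
```

Name-based indexing, as in m["site2", "b"], tends to be more robust than positional indexing once columns are reordered during data cleaning.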

 

Community data preparation and exploratory analysis

1) Community data preparation: species composition, environmental variables, species functional traits, phylogenetic trees, etc.

2) Community data inspection: missing values, outliers, etc., to avoid model errors (GIGO: garbage in, garbage out)

3) Species diversity calculation: taxonomic diversity (TD), functional diversity (FD) and phylogenetic diversity (PD)

4) Introduction to association measures: species similarity/dissimilarity matrices
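The inspection, diversity and dissimilarity steps listed above can be sketched in base R as follows. The community matrix comm is invented for illustration, and bray_curtis() is a hypothetical helper written out here to show the formula; it is not taken from any package.

```r
# Hypothetical site-by-species abundance matrix (rows = sites, columns = species).
comm <- matrix(c(10, 0,  5,  5,
                  2, 8,  0, 10,
                  0, 0, 20,  0),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("site", 1:3), paste0("sp", 1:4)))

# Basic inspection: no missing values, no negative abundances.
stopifnot(!anyNA(comm), all(comm >= 0))

# Species richness: number of species present at each site.
richness <- rowSums(comm > 0)

# Shannon diversity H' = -sum(p * log(p)) over the species present.
shannon <- apply(comm, 1, function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
})

# Bray-Curtis dissimilarity between every pair of sites:
# sum(|x_i - x_j|) / sum(x_i + x_j). Assumes no site has zero total abundance.
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
  }
  as.dist(d)
}
bc <- bray_curtis(comm)
```

A site with a single species gets a Shannon index of zero, and two sites that share no species get a Bray-Curtis dissimilarity of one, which is a quick sanity check on any implementation.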

Cluster analysis, part 1: non-hierarchical clustering (NHC)

1) Overview of clustering and non-hierarchical clustering methods

2) Non-hierarchical clustering: K-means and related partitioning methods (kmeans; pam; clara)

3) Comparative analysis of K-means clustering of bird habitat data: determining the number of clusters, cluster stability, evaluation of clustering results, and construction of composite cluster values
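A minimal sketch of K-means with a simple check on the number of clusters is given below. The bird habitat data are not available here, so the built-in iris measurements stand in for them; the choice of three clusters is an assumption made only for the demonstration.

```r
# K-means on standardized variables; iris is a stand-in for the habitat data.
set.seed(42)
x <- scale(iris[, 1:4])

# Total within-cluster sum of squares for k = 1..6. Plotting wss against ks
# and looking for an "elbow" is one common way to choose k.
ks  <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# Fit the chosen solution (k = 3 is assumed for this example) and
# cross-tabulate the clusters against the known groups.
km3 <- kmeans(x, centers = 3, nstart = 25)
tab <- table(cluster = km3$cluster, species = iris$Species)
```

Using several random starts (nstart) matters because K-means converges to a local optimum that depends on the initial centers.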

 

Cluster analysis, part 2: hierarchical clustering (HC)

1) Introduction to hierarchical clustering methods: polythetic agglomerative hierarchical clustering (PAHC) vs polythetic divisive hierarchical clustering (PDHC)

2) Comparing the classification results of hierarchical clustering methods (hcluster and agnes): scree plot, silhouette width, cophenetic correlation, etc.

3) Case 1: hierarchical cluster analysis of bird habitat data; case 2: hierarchical cluster analysis of fish habitat data
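Agglomerative clustering and the cophenetic correlation mentioned above can be sketched with base R's hclust(). The built-in USArrests data stand in for the habitat data, and both the average-linkage method and the cut at four groups are assumptions for illustration.

```r
# Agglomerative hierarchical clustering (base R); USArrests is a stand-in dataset.
x  <- scale(USArrests)
d  <- dist(x)                          # Euclidean distances between rows
hc <- hclust(d, method = "average")    # average linkage (UPGMA)

# Cophenetic correlation: how faithfully the dendrogram preserves the
# original pairwise distances (values closer to 1 are better).
coph <- cor(d, cophenetic(hc))

# Cut the tree into a chosen number of groups (k = 4 is assumed here).
groups <- cutree(hc, k = 4)
sizes  <- table(groups)
```

Repeating the fit with other linkage methods (for example "complete" or "ward.D2") and comparing cophenetic correlations is one simple way to compare hierarchical solutions.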

 

Difference analysis: group difference tests

1) Brief introduction to the analysis and testing of group differences in multivariate community data

2) Application of non-parametric multivariate analysis of variance (NP-MANOVA/ADONIS/PERMANOVA), multi-response permutation procedures (MRPP), analysis of similarities (ANOSIM) and the Mantel test (MANTEL) to testing differences in multivariate data

3) Implementing comparisons among multiple groups in multivariate difference tests: MRPP, Mantel

4) Application of the Mantel test in 'standard' community ecology: relationships among spatial sampling distance, environmental factors and species composition, and partial Mantel analysis

5) Case 1: testing differences in suitable habitat for turtles; case 2: analysis of differences in microbial composition data; case 3: analysis of the correlations among fish communities, spatial distance and environmental factors
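The logic of the Mantel test in item 4 can be sketched in base R: correlate two distance matrices, then permute the rows and columns of one of them to build a null distribution. The function mantel_perm() and the simulated data are hypothetical and written only to show the idea; in practice a dedicated package function would normally be used.

```r
# Minimal permutation Mantel test (one-sided, Pearson correlation).
# mantel_perm() is a hypothetical helper, not a package function.
mantel_perm <- function(d1, d2, nperm = 999) {
  m1 <- as.matrix(d1)
  m2 <- as.matrix(d2)
  lower <- lower.tri(m1)
  r_obs <- cor(m1[lower], m2[lower])
  n <- nrow(m1)
  # Permute rows and columns of the first matrix together, so that the
  # distance structure is kept but the pairing with the second is broken.
  r_perm <- replicate(nperm, {
    idx <- sample(n)
    cor(m1[idx, idx][lower], m2[lower])
  })
  p <- (sum(r_perm >= r_obs) + 1) / (nperm + 1)
  list(r = r_obs, p = p)
}

# Simulated example: an environmental variable that follows a spatial gradient.
set.seed(1)
xy  <- matrix(runif(60), ncol = 2)            # hypothetical sampling coordinates
env <- 2 * xy[, 1] + rnorm(30, sd = 0.2)      # environment depends on x position
res <- mantel_perm(dist(xy), dist(env))
```

The permutation is needed because the entries of a distance matrix are not independent, so the usual significance test for a correlation coefficient does not apply.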

 

Discriminant analysis, part 1: linear discriminant analysis (LDA)

1) The multifaceted nature of discriminant analysis (DA)

2) Basic principles and workflow of linear discriminant analysis (LDA): data inspection, assessing assumptions, sample size, variable selection, model determination, result interpretation and model validation

3) Introduction to other discriminant analysis methods (QDA, KNN, etc.)

4) Case: identification and prediction of suitable habitat for turtles

 

Discriminant analysis, part 2: classification and regression trees (CART) and random forest models (RFM)

1) Introduction to classification and regression trees for community data

2) Implementing classification and regression tree (CART) analysis: splitting criterion, node complexity, Gini index, effect of prior probabilities, misclassification cost, tree pruning, Monte Carlo test, variable importance evaluation, model prediction, etc.

3) Implementing the random forest model (RFM): algorithm flow, model evaluation, variable importance evaluation, classification and regression, etc.

4) Case 1: habitat division and prediction for turtle communities based on classification and regression trees

5) Case 2: relationship between the rhizosphere microbial community and plant growth, and variable importance assessment, based on a random forest model

Indirect gradient analysis, part 1 - unconstrained ordination: PCA

 

1) Introduction to unconstrained ordination of community data

2) Basic principles of principal component analysis (PCA): assumptions, data requirements, etc.

3) Case: PCA ordination of fish habitat data - data preparation, inspection (outliers, multivariate normality, linearity, sample independence, etc.), result verification, selection of ordination axes (eigenvalue criterion, cumulative proportion of variance explained, broken-stick criterion, etc.), result interpretation, biplot, etc.
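The axis-selection criteria named in the case above can be sketched with base R's prcomp(). The built-in USArrests data stand in for the fish habitat data, and PCA on the correlation matrix (standardized variables) is assumed.

```r
# PCA on standardized variables; USArrests is a stand-in for the habitat data.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

eig     <- pca$sdev^2            # eigenvalues of the correlation matrix
prop    <- eig / sum(eig)        # proportion of variance per axis
cumprop <- cumsum(prop)          # cumulative proportion explained

# Eigenvalue criterion: keep axes whose eigenvalue exceeds 1, the average
# eigenvalue when variables are standardized.
keep_eigen <- which(eig > 1)

# Broken-stick criterion: expected proportion for axis k when the total
# variance is split at random into p pieces: b_k = (1/p) * sum_{i=k}^{p} 1/i.
p <- length(eig)
broken_stick <- rev(cumsum(1 / (p:1))) / p
keep_stick   <- which(prop > broken_stick)

scores   <- pca$x                # site (row) scores on the ordination axes
loadings <- pca$rotation         # variable loadings on the axes
```

Calling biplot(pca) draws the scores and loadings together. The different criteria do not always agree on how many axes to keep, so they are best read alongside the ecological interpretability of each axis.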

 

Indirect gradient analysis, part 2 - unconstrained ordination: PCoA, CA, DCA and NMDS

1) Introduction to and application scenarios of other unconstrained methods: CA, DCA, PCoA and NMDS

2) Case 1: correspondence analysis (CA) and detrended correspondence analysis (DCA) of bird community composition data - data preparation, assumptions, total inertia, eigenvalues, selection of ordination axes, result interpretation, rare-species effect/arch effect, etc.

3) Case 2: principal coordinates analysis (PCoA) based on a distance/similarity index or matrix - choice of distance/similarity index, model assumptions, the negative eigenvalue problem, result interpretation, ordination diagram, etc.

4) Case 3: application of NMDS ordination - assumptions, basic analysis workflow, evaluation of the ordination (stress value), ordination diagram, etc.

5) Case 4: effects of drugs on gut microflora - PCoA+PERMANOVA

6) Case 5: prediction of multi-dimensional trait characteristics of ants based on a random forest model - RF+PCA+PCoA+PERMANOVA
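PCoA on a Bray-Curtis matrix, including the check for negative eigenvalues mentioned in case 2, can be sketched with base R's cmdscale(). The community matrix is simulated from a Poisson distribution purely for illustration.

```r
# PCoA (classical scaling) on Bray-Curtis dissimilarities; simulated counts.
set.seed(7)
comm <- matrix(rpois(20 * 8, lambda = 5), nrow = 20,
               dimnames = list(paste0("site", 1:20), paste0("sp", 1:8)))

# Bray-Curtis = Manhattan distance divided by the summed totals of both sites.
man <- as.matrix(dist(comm, method = "manhattan"))
tot <- rowSums(comm)
bc  <- as.dist(man / outer(tot, tot, "+"))

# Classical scaling; eig = TRUE also returns the eigenvalues.
pcoa <- cmdscale(bc, k = 2, eig = TRUE)

# Non-Euclidean dissimilarities such as Bray-Curtis can give negative
# eigenvalues; count them and express axis importance over the positive ones.
n_negative <- sum(pcoa$eig < -1e-8)
prop_axes  <- pcoa$eig[1:2] / sum(pcoa$eig[pcoa$eig > 0])

site_scores <- pcoa$points       # coordinates of sites on the first two axes
```

If negative eigenvalues are a concern, cmdscale() accepts add = TRUE, which adds a constant to the off-diagonal dissimilarities so that they become Euclidean; taking the square root of the dissimilarities before the analysis is another commonly used option.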

 

Direct Gradient Analysis - Constrained ordination: RDA 

1) Introduction to constrained ordination of community data: asymmetric vs symmetric constrained ordination

2) The basic workflow of asymmetric constrained ordination of community data: selecting response variables/species (matrix Y), preprocessing the response data (transformation or standardization), choosing the analysis method (RDA/db-RDA/CCA), selecting explanatory/constraining variables (matrix X), and analyzing, interpreting, evaluating and presenting the results

3) Case: effects of landscape, patch and site conditions on the species composition of moth communities in forest landscapes
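The workflow in item 2 can be made concrete by writing RDA out in base R as a multivariate regression of Y on X followed by a PCA of the fitted values. This is a sketch of the principle on simulated data, with hypothetical variable names (moisture, canopy, sp1..sp4); it is not the implementation used by any particular package.

```r
# RDA from first principles: regress Y on X, then run PCA on the fitted values.
set.seed(3)
n <- 30
X <- cbind(moisture = rnorm(n), canopy = rnorm(n))   # hypothetical predictors
B <- matrix(c(1, 0.5, 0, -0.5,
              0, 0.5, 1,  0.5), nrow = 2, byrow = TRUE)
Y <- X %*% B + matrix(rnorm(n * 4, sd = 0.5), n, 4)  # simulated responses
colnames(Y) <- paste0("sp", 1:4)

Yc <- scale(Y, center = TRUE, scale = FALSE)         # center the responses
Xc <- scale(X, center = TRUE, scale = FALSE)

fit  <- lm(Yc ~ Xc)              # one linear model per response column
Yfit <- fitted(fit)              # part of Y explained by X
Yres <- residuals(fit)           # unexplained part

total_var  <- sum(apply(Yc, 2, var))
constr_var <- sum(apply(Yfit, 2, var))
r2 <- constr_var / total_var     # proportion of variance constrained by X

# Canonical (constrained) axes: PCA of the fitted values. Their number is at
# most the number of explanatory variables.
rda_axes    <- prcomp(Yfit, center = FALSE)
n_axes      <- sum(rda_axes$sdev^2 > 1e-10)
site_scores <- rda_axes$x[, seq_len(n_axes), drop = FALSE]
```

A PCA of Yres would give the unconstrained (residual) axes. Centering without scaling corresponds to an analysis on the covariance matrix; standardizing the responses first is the usual choice when they are measured in different units.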

 

Direct Gradient Analysis - Constrained ordination: dbRDA, CCA and symmetric constrained ordination methods 

1) Case 1: distance-based redundancy analysis (dbRDA) - analysis of species composition data and presence/absence (0/1) data

2) Case 2: canonical correspondence analysis (CCA) of community species abundance data - a gradient analysis method for unimodal responses to the environment

3) Case 3: introduction to symmetric constrained ordination methods and fourth-corner analysis of the relationships among species composition, species traits and environmental variables

 

Direct gradient analysis - Constrained ordination: variation partitioning

1) Introduction to variation partitioning in the multivariate statistical analysis of community data

2) Partial regression analysis and variation partitioning

3) Case: partitioning the variation in moth community species composition among forest landscape, patch and site conditions and spatial factors
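Variation partitioning between two sets of explanatory variables can be sketched from three R² values: one for each set alone and one for both together. The helper rda_r2(), the simulated data and the use of the Ezekiel adjusted R² are assumptions made for this illustration.

```r
# Variation partitioning between two explanatory sets (base R sketch).
# rda_r2() is a hypothetical helper returning the (adjusted) R2 of a
# multivariate regression of Y on X.
rda_r2 <- function(Y, X, adjust = TRUE) {
  Y <- scale(Y, center = TRUE, scale = FALSE)
  X <- as.matrix(X)
  fit <- lm(Y ~ X)
  r2 <- sum(apply(fitted(fit), 2, var)) / sum(apply(Y, 2, var))
  if (adjust) {
    n <- nrow(Y)
    p <- ncol(X)
    r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)   # Ezekiel adjustment
  }
  r2
}

# Simulated data: species respond to environment, to space, to both, or to neither.
set.seed(5)
n <- 40
env   <- cbind(e1 = rnorm(n), e2 = rnorm(n))
space <- cbind(x = runif(n), y = runif(n))
Y <- cbind(sp1 = env[, 1] + rnorm(n, sd = 0.5),
           sp2 = 3 * space[, 1] + rnorm(n, sd = 0.5),
           sp3 = env[, 2] + 2 * space[, 2] + rnorm(n, sd = 0.5),
           sp4 = rnorm(n))

r_env   <- rda_r2(Y, env)
r_space <- rda_r2(Y, space)
r_both  <- rda_r2(Y, cbind(env, space))

frac <- c(env_only   = r_both - r_space,          # [a] pure environment
          shared     = r_env + r_space - r_both,  # [b] shared fraction
          space_only = r_both - r_env,            # [c] pure space
          residual   = 1 - r_both)                # [d] unexplained
```

The shared fraction is obtained by subtraction rather than fitted directly, so it can come out slightly negative; that is usually read as zero shared variation rather than as an error.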

 

Plotting the results with ggplot

  1. Preparing community data and statistical results for plotting: extracting and tidying results
  2. Unconstrained ordination diagrams for PCA, CA, PCoA and NMDS: ordination diagrams and biplots
  3. PCoA+PERMANOVA result figure: ordination diagram + grouping + PERMANOVA significance of differences + multiple comparisons
  4. Constrained ordination diagrams for RDA, db-RDA and CCA: triplots and Venn diagrams

 


Origin blog.csdn.net/weixin_46433038/article/details/130467708