PCA and PCoA in Bioinformatics

PCA and PCoA are both dimensionality reduction methods for data sets; the former is principal component analysis and the latter is principal coordinate analysis. In bioinformatics, both are used to examine the similarity or difference between samples.
First, we need to understand why dimensionality reduction is needed in bioinformatics. The simple reason is that the data set has too many dimensions for us to summarize and analyze directly. For example, take a 16S rDNA analysis with 200 samples. If each sample is treated as a dimension, that is 200 dimensions. We cannot picture that many dimensions, just as a being living in two-dimensional space could not picture three-dimensional space.
So we reduce the dimensionality to a number of dimensions we can understand, such as two, which makes statistical analysis and plotting convenient.
PCA (Principal Component Analysis)
PCA applies the idea of dimensionality reduction: it decomposes the variance in the data so that the directions contributing most to the differences between samples are placed on two coordinate axes, as shown in the following figure (a PCA plot of the iris data). Iris flowers of the same species are shown in the same color, and the closer two points are, the more similar the samples. The x-axis is the first principal component and the y-axis is the second; the contribution value on each axis is the share of the total variance that the component explains.
[Figure: PCA plot of the iris data]
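The variance decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code used to draw the figure: the function name `pca` and the six-row table `X` (typed in by hand in the style of the iris measurements) are introduced here only for the example.

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X (samples x features) onto the top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    # SVD of the centered data; the rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates
    var = S ** 2 / (X.shape[0] - 1)              # variance along each component
    ratio = var / var.sum()                      # share of the total variance
    return scores, ratio[:n_components]

# Hypothetical toy table: 6 samples x 4 measurements (illustration only)
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
])

scores, ratio = pca(X)
print(scores)   # x/y coordinates of each sample in the PCA plot
print(ratio)    # contribution of PC1 and PC2 to the total variance
```

The two columns of `scores` are what would be plotted on the x- and y-axes, and the two values in `ratio` are the percentages usually written on the axis labels.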
PCoA (Principal Coordinate Analysis)

PCoA also follows the idea of dimensionality reduction, but it starts from a distance matrix between samples, which can be computed with different distance measures, and projects that matrix onto a few axes. The distances between sample points in the plot approximate the corresponding distances in the distance matrix. (To explain: generally speaking, PCA reduces the dimensionality of the sample abundance data directly, while PCoA first goes through a distance calculation, such as Euclidean distance. One consequence is that if there are only two sets of abundance data, direct dimensionality reduction leaves a single dimension, so a PCA plot cannot be drawn. If distances are computed between all n samples in the groups instead, the result is an n x n matrix, which can be reduced to n x (n-1), and an ordination in the style of PCA is possible again.)
In a PCoA plot, points that lie close together represent samples with high similarity.
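The calculation behind a PCoA plot is short enough to sketch directly with NumPy. The example below is a minimal illustration of classical PCoA (double-centering the squared distance matrix, then an eigendecomposition), not the code behind the figure in this post. The names `euclidean_distance_matrix` and `pcoa` and the small abundance table `abund` are assumptions made up for the example; Euclidean distance is used only because the text mentions it, and any other distance matrix could be passed in.

```python
import numpy as np

def euclidean_distance_matrix(X):
    """Pairwise Euclidean distances between the rows of X (samples x features)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def pcoa(D, n_components=2):
    """Classical PCoA of an n x n distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)         # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pos = np.clip(eigvals, 0.0, None)            # ignore negative eigenvalues
    coords = eigvecs[:, :n_components] * np.sqrt(pos[:n_components])
    explained = pos[:n_components] / pos.sum()   # share of each axis
    return coords, explained, eigvals

# Hypothetical abundance table: 5 samples x 3 taxa (illustration only)
abund = np.array([
    [10, 0, 5],
    [8, 1, 6],
    [0, 12, 3],
    [1, 10, 4],
    [5, 5, 5],
], dtype=float)

D = euclidean_distance_matrix(abund)             # n x n distance matrix
coords, explained, eigvals = pcoa(D)
print(coords)      # x/y coordinates of each sample in the PCoA plot
print(explained)   # contribution of PCoA axis 1 and axis 2
```

For n samples the distance matrix yields at most n - 1 axes with a positive eigenvalue. With Euclidean distances, keeping all of them reproduces the original distances exactly, while the first two axes give the approximation that is actually plotted.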
(Python has relatively few ready-made tools for PCoA, so these plots are mostly drawn in R. I also use R here.)
[Figure: PCoA plot drawn in R]

Origin blog.csdn.net/whiteof/article/details/125500838