Principal Component Analysis and K-means Clustering in R

Example
set.seed(123)
x1 = matrix(rnorm(1000, 0, 0.3), ncol = 10) # 100 x 10 normal random matrix, mean 0, standard deviation 0.3
x2 = matrix(rnorm(1000, 1, 0.3), ncol = 10) # 100 x 10 normal random matrix, mean 1, standard deviation 0.3
X = rbind(x1, x2) # combine into a 200 x 10 random matrix
H = hclust(dist(X)) # hierarchical clustering
plot(H)
rect.hclust(H, 2)
cutree(H, 2) # hierarchical clustering result
km = kmeans(X, 2) # k-means clustering
km$cluster # classification result
plot(X, pch = km$cluster)
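Because the two groups in this simulation are known in advance, one way to check both clusterings is to cross-tabulate the cluster labels against the true group. This is only a sketch; the names `truth`, `hc` and `km_labels` are introduced here for illustration and are not part of the original example.

```r
set.seed(123)
x1 <- matrix(rnorm(1000, 0, 0.3), ncol = 10)  # group 1: mean 0, sd 0.3
x2 <- matrix(rnorm(1000, 1, 0.3), ncol = 10)  # group 2: mean 1, sd 0.3
X <- rbind(x1, x2)
truth <- rep(1:2, each = 100)                 # known group of each row (hypothetical helper)

hc <- cutree(hclust(dist(X)), 2)              # hierarchical clustering labels
km_labels <- kmeans(X, 2)$cluster             # k-means labels

# Cluster numbers are arbitrary, so only the pattern of the table matters,
# not whether cluster 1 corresponds to group 1.
table(truth, hc)
table(truth, km_labels)
```

A table with all counts concentrated in one cell per row indicates that the clustering recovered the simulated groups.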
Example: simulate 2000 samples of 10 normal random variables

set.seed(123)
x1=matrix(rnorm(10000,0,0.2),ncol = 10)
x2=matrix(rnorm(10000,1,0.2),ncol = 10)
X=rbind(x1,x2) # combine into a 2000 x 10 matrix
km=kmeans(X,2)
kc=km$cluster
table(kc) # number of samples in each cluster
plot(X,pch=kc,col=kc)

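Calling plot() on a matrix draws only its first two columns, so the picture uses 2 of the 10 variables. A possible alternative, sketched here with prcomp() (a function the original post does not use), is to plot the clusters on the first two principal component scores instead. The name `p` is hypothetical.

```r
set.seed(123)
x1 <- matrix(rnorm(10000, 0, 0.2), ncol = 10)
x2 <- matrix(rnorm(10000, 1, 0.2), ncol = 10)
X <- rbind(x1, x2)                 # 2000 x 10 matrix
kc <- kmeans(X, 2)$cluster         # k-means labels

p <- prcomp(X)$x[, 1:2]            # scores on the first two principal components
plot(p, pch = kc, col = kc)        # clusters shown in the PC1/PC2 plane
```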
Principal component analysis
Principal component analysis is a statistical method that combines many indicators into a few composite indicators. It is a dimension-reduction technique: many variables are condensed into a few principal components.
Example
Height and weight data for 14 students; draw a scatter plot to show the relationship between the variables.

x1=c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2=c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
cor(x1,x2)

plot(x1,x2,xlim = c(145,180),ylim = c(25,75))

lines(c(146,178),c(30,66),lwd=2)
lines(c(163,166),c(54,47))
library(shape)
lines(getellipse(24,3,mid = c(162,48),angle = 48),lty=3)

The figure illustrates a rotation of the coordinate axes.
Some explanation of principal components: the main purpose of principal component analysis is to explain most of the variation in the original data with fewer variables. It transforms many highly correlated variables into new variables that are uncorrelated with one another, and then selects a few of these new variables that account for most of the variation in the data. These are the so-called principal components.

X=data.frame(x1,x2)
R=cor(X) # correlation coefficient matrix (statistics)
R
S=cov(X) # covariance matrix (mathematics)
S

Later calculations are based on the correlation coefficient matrix.
A principal component is found by looking for the linear function of the variables that has the largest variance.
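The maximum-variance idea can be checked numerically on the height and weight data. The sketch below uses the covariance matrix for simplicity (an assumption; the same check works with standardized data and the correlation matrix). The names `a1`, `y1` and `b` are hypothetical.

```r
x1 <- c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2 <- c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
X <- cbind(x1, x2)

e <- eigen(cov(X))
a1 <- e$vectors[, 1]       # unit-length coefficients of the first component
y1 <- X %*% a1             # the linear function a1[1]*x1 + a1[2]*x2
var(y1)                    # equals the largest eigenvalue
e$values[1]

# Any other unit-length direction gives a variance that is no larger,
# for example a direction at 30 degrees to the x1 axis:
b <- c(cos(pi / 6), sin(pi / 6))
var(X %*% b) <= var(y1)
```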

svd(S) # singular value decomposition of the covariance matrix
svd(R) # singular value decomposition of the correlation matrix

Spectral Decomposition
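For a symmetric positive definite matrix such as this covariance matrix, the singular values returned by svd() coincide with the eigenvalues returned by eigen(), which computes the spectral decomposition directly. A small check, assuming the same height and weight data (the names `sv` and `ev` are hypothetical):

```r
x1 <- c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2 <- c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
S <- cov(cbind(x1, x2))

sv <- svd(S)      # singular value decomposition
ev <- eigen(S)    # spectral (eigen) decomposition

all.equal(sv$d, ev$values)              # same values
all.equal(abs(sv$u), abs(ev$vectors))   # same vectors, up to sign
```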
The usage of the principal component analysis function princomp() is
princomp(X, cor = FALSE, scores = TRUE, ...)
X is a data matrix or data frame; cor states whether to use the correlation matrix (the default is the covariance matrix); scores states whether to output the component scores.
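As a brief illustration of the cor argument, the sketch below fits the height and weight data both ways. The object names `pc_cov` and `pc_cor` are hypothetical.

```r
x1 <- c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2 <- c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
X <- data.frame(x1, x2)

pc_cov <- princomp(X)               # default: based on the covariance matrix
pc_cor <- princomp(X, cor = TRUE)   # based on the correlation matrix

pc_cov$sdev^2        # variances in the original units
pc_cor$sdev^2        # variances of standardized data
sum(pc_cor$sdev^2)   # with cor = TRUE the variances sum to the number of variables
```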

pc=princomp(X)
pc
pc$sdev^2 # variances of the principal components
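These variances are slightly smaller than the eigenvalues of cov(X), because princomp() divides by n when it computes the covariance matrix while cov() divides by n - 1. A short check under that understanding (the name `n` is hypothetical):

```r
x1 <- c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2 <- c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
X <- data.frame(x1, x2)
n <- nrow(X)

pc <- princomp(X)
# princomp variances equal the eigenvalues of cov(X) rescaled by (n - 1) / n
all.equal(unname(pc$sdev^2), eigen(cov(X))$values * (n - 1) / n)

# prcomp() uses the divisor n - 1 and matches eigen(cov(X)) directly
all.equal(prcomp(X)$sdev^2, eigen(cov(X))$values)
```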

Loadings of the principal components:

pc$loadings

The principal component coefficients are used to compute the component scores:

pc$loadings
pc$scores
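To make the link between loadings and scores concrete, the scores can be reproduced by centering the data and multiplying by the loading matrix. This is a sketch of that calculation; `centered` and `manual` are hypothetical names.

```r
x1 <- c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2 <- c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
X <- data.frame(x1, x2)
pc <- princomp(X)

centered <- scale(X, center = TRUE, scale = FALSE)   # subtract the column means
manual <- centered %*% unclass(pc$loadings)          # centered data times loadings

all.equal(unname(manual), unname(pc$scores))         # same values as pc$scores
```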

pc$scores
plot(pc$scores,asp=1)
abline(h=0,v=0,lty=3)

Most of the information is carried by component 1 and very little by component 2; a biplot makes this clearer:

biplot(pc$scores,pc$loadings)
abline(h=0,v=0,lty=3)

(In biplot(scores, loadings, ...), scores are the component scores and loadings are the loading coefficients.)
Summary statistics for evaluating the principal components:

summary(pc)

The output lists the standard deviation, the proportion of variance (variance contribution ratio), and the cumulative proportion (cumulative variance contribution) of each component.
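The two proportions can also be computed by hand from the component standard deviations, which may help in reading the summary table. A minimal sketch; `v`, `prop` and `cum` are hypothetical names.

```r
x1 <- c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2 <- c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
pc <- princomp(data.frame(x1, x2))

v <- pc$sdev^2          # variance of each component
prop <- v / sum(v)      # proportion of variance
cum <- cumsum(prop)     # cumulative proportion
rbind(sdev = pc$sdev, prop = prop, cum = cum)
```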
Steps of principal component analysis:
1. Standardize the raw data to obtain a standardized data matrix
2. Compute the correlation coefficient matrix
3. Find its eigenvalues and eigenvectors
4. Obtain the principal components
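The four steps above can be sketched by hand on the height and weight data. This is an illustration of the steps, not a replacement for princomp(); the names `Z`, `R`, `e` and `scores` are chosen here for the example.

```r
x1 <- c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2 <- c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
X <- cbind(x1, x2)

Z <- scale(X)               # step 1: standardize (mean 0, sd 1 in each column)
R <- cor(X)                 # step 2: correlation matrix, equal to cov(Z)
e <- eigen(R)               # step 3: eigenvalues and eigenvectors
scores <- Z %*% e$vectors   # step 4: principal component scores

e$values                    # variances of the components
apply(scores, 2, var)       # the same values, computed from the scores
```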
Notes on principal components
1. Principal component analysis is best based on the correlation matrix
2. To preserve the maximum-variance property, the axes are usually not rotated in principal component analysis
3. Components with eigenvalues less than 1 are usually discarded, keeping only those with eigenvalues greater than 1
4. In practice, it is sufficient if 3 to 5 components explain 80% of the variance
5. The principal components maximize the variance of the variables, and the components are uncorrelated with one another
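Notes 3 and 4 can be tried out on the simulated 200 x 10 matrix from the clustering example. The sketch below applies both rules to the eigenvalues of the correlation matrix; the names `ev`, `keep_kaiser` and `k80` are hypothetical, and the 80% threshold is taken from note 4.

```r
set.seed(123)
x1 <- matrix(rnorm(1000, 0, 0.3), ncol = 10)
x2 <- matrix(rnorm(1000, 1, 0.3), ncol = 10)
X <- rbind(x1, x2)

pc <- princomp(X, cor = TRUE)    # note 1: use the correlation matrix
ev <- pc$sdev^2                  # eigenvalues of the correlation matrix

keep_kaiser <- which(ev > 1)     # note 3: keep eigenvalues greater than 1
cum <- cumsum(ev) / sum(ev)
k80 <- which(cum >= 0.8)[1]      # note 4: fewest components reaching 80%

keep_kaiser
k80
```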


Origin blog.csdn.net/m0_46445293/article/details/104754559