Example
set.seed(123)
x1 = matrix(rnorm(1000, 0, 0.3), ncol = 10) # 100 x 10 normal random matrix, mean 0, standard deviation 0.3
x2 = matrix(rnorm(1000, 1, 0.3), ncol = 10) # 100 x 10 normal random matrix, mean 1, standard deviation 0.3
X = rbind(x1, x2) # stack into a 200 x 10 random matrix
H = hclust(dist(X)) # hierarchical clustering on the distance matrix
plot(H)
rect.hclust(H, 2)
cutree(H, 2) # hierarchical clustering result
km = kmeans(X, 2) # k-means clustering
km$cluster # classification result
plot(X, pch = km$cluster)
Example: simulate 2000 samples of a matrix of 10 normal random variables
set.seed(123)
x1=matrix(rnorm(10000,0,0.2),ncol = 10)
x2=matrix(rnorm(10000,1,0.2),ncol = 10)
X=rbind(x1,x2) # stack into a 2000 x 10 matrix
km=kmeans(X,2)
kc=km$cluster
table(kc) # cluster sizes
plot(X,pch=kc,col=kc)
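Because the data are simulated, the true group of every row is known, so the two clustering methods can be checked against it. The following is a minimal sketch using the smaller 200 x 10 matrix; the vector name truth is an assumption introduced here for illustration and is not part of the original notes.

```r
set.seed(123)
x1 <- matrix(rnorm(1000, 0, 0.3), ncol = 10)  # group 1: mean 0
x2 <- matrix(rnorm(1000, 1, 0.3), ncol = 10)  # group 2: mean 1
X <- rbind(x1, x2)

truth <- rep(1:2, each = 100)      # known group of each row (hypothetical helper)
hc <- cutree(hclust(dist(X)), 2)   # hierarchical clustering labels
km <- kmeans(X, 2)$cluster         # k-means labels

# Cross-tabulate each result against the known groups.
# Cluster numbers are arbitrary, so look at the pattern, not the labels.
tab_hc <- table(truth, hc)
tab_km <- table(truth, km)
print(tab_hc)
print(tab_km)
```

A table with all counts on one diagonal (in either direction) means the method recovered the two groups.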
Principal component analysis
Principal component analysis is a statistical method that condenses several indicators into a few composite indicators. It is a dimension-reduction technique: many variables are transformed into a small number of principal components.
Example
Height and weight data for 14 students; draw a plot that shows the relationship between the variables
x1=c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2=c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
cor(x1,x2)
plot(x1,x2,xlim = c(145,180),ylim = c(25,75))
lines(c(146,178),c(30,66),lwd=2)
lines(c(163,166),c(54,47))
library(shape)
lines(getellipse(24,3,mid = c(162,48),angle = 48),lty=3)
The figure illustrates a rotation of the coordinate axes.
A note on principal components: the main purpose of principal component analysis is to explain most of the variation in the original data with fewer variables. Many highly correlated variables are transformed into mutually uncorrelated ones, and from these a few new variables are chosen that explain most of the variation in the data. These are the so-called principal components.
X=data.frame(x1,x2)
R=cor(X) # correlation coefficient matrix
R
S=cov(X) # covariance matrix
S
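Later calculations are based on the correlation coefficient matrix.
Finding a principal component means finding a linear function of the variables whose variance is as large as possible.
svd(S) # singular value decomposition of the covariance matrix
svd(R) # singular value decomposition of the correlation matrix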
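For a symmetric, positive definite matrix such as this covariance matrix, the singular value decomposition gives the same numbers as the eigendecomposition, which is why svd() can be used to find the principal directions. A minimal sketch of that check, using the height and weight data (the names sv and ev are assumptions introduced here):

```r
x1 <- c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2 <- c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
S <- cov(data.frame(x1, x2))

sv <- svd(S)     # singular value decomposition
ev <- eigen(S)   # eigendecomposition

print(sv$d)        # singular values
print(ev$values)   # eigenvalues: the same numbers for this matrix

# The direction vectors agree up to sign, so compare absolute values.
print(abs(sv$u))
print(abs(ev$vectors))
```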
Usage of the principal component analysis function princomp():
princomp(x, cor = FALSE, scores = TRUE, ...)
x: data matrix or data frame; cor: whether to use the correlation matrix (the default is the covariance matrix); scores: whether to output the component scores
pc=princomp(X)
pc
pc$sdev^2 # variances of the principal components
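The component variances are the eigenvalues of the covariance matrix, with one detail: princomp() uses divisor n, while cov() uses n - 1, so the eigenvalues of cov(X) have to be rescaled by (n - 1)/n before they match. A short sketch of the comparison (the name by_hand is an assumption introduced here):

```r
x1 <- c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2 <- c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
X <- data.frame(x1, x2)
n <- nrow(X)

pc <- princomp(X)
by_hand <- eigen(cov(X))$values * (n - 1) / n  # rescale from divisor n-1 to n

print(unname(pc$sdev^2))
print(by_hand)
```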
Principal component loadings (principal component coefficients)
pc$loadings
The principal component coefficients are used to compute the component scores.
pc$scores
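The link between loadings and scores can be verified by hand: with the default covariance-based analysis, the scores are the centered data multiplied by the loading matrix. A minimal sketch (the names Xc and manual are assumptions introduced here):

```r
x1 <- c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2 <- c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
X <- data.frame(x1, x2)
pc <- princomp(X)

# Center each column (no rescaling), then project onto the loadings.
Xc <- scale(as.matrix(X), center = TRUE, scale = FALSE)
manual <- Xc %*% unclass(pc$loadings)

print(head(manual))
print(head(pc$scores))
```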
plot(pc$scores,asp=1)
abline(h=0,v=0,lty=3)
Most of the information lies in component 1 and little in component 2; a biplot shows this more clearly.
biplot(pc$scores,pc$loadings)
abline(h=0,v=0,lty=3)
(biplot(scores, loadings, ...): scores are the component scores, loadings are the component loadings)
Summary statistics for evaluating the principal components:
summary(pc)
The output gives the standard deviation, the proportion of variance (variance contribution ratio), and the cumulative proportion (cumulative variance contribution).
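The two proportions reported by summary() follow directly from the component variances. A small sketch of the arithmetic (the names v, prop and cumprop are assumptions introduced here):

```r
x1 <- c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2 <- c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
pc <- princomp(data.frame(x1, x2))

v <- pc$sdev^2          # variance of each component
prop <- v / sum(v)      # proportion of variance
cumprop <- cumsum(prop) # cumulative proportion

print(rbind(sdev = pc$sdev, prop = prop, cumprop = cumprop))
```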
Steps of principal component analysis:
1. Standardize the raw data to obtain a standardized data matrix
2. Compute the correlation coefficient matrix
3. Find the eigenvalues and eigenvectors
4. Obtain the principal components
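The four steps above can be sketched directly in base R. This is an illustration on the height and weight data, not a replacement for princomp(); the names Z, e and scores are assumptions introduced here.

```r
x1 <- c(147,171,175,159,155,152,158,154,164,168,166,159,164,177)
x2 <- c(32,57,64,41,38,35,44,41,54,57,49,47,46,63)
X <- data.frame(x1, x2)

Z <- scale(X)                # step 1: standardize (mean 0, sd 1 per column)
R <- cor(X)                  # step 2: correlation coefficient matrix
e <- eigen(R)                # step 3: eigenvalues and eigenvectors
scores <- Z %*% e$vectors    # step 4: principal components (scores)

print(e$values)              # variance carried by each component
print(apply(scores, 2, var)) # sample variances of the scores match the eigenvalues

# princomp() with cor = TRUE reports the same component variances.
pc_cor <- princomp(X, cor = TRUE)
print(unname(pc_cor$sdev^2))
```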
Notes on principal components
1. Principal component analysis is preferably based on the correlation matrix
2. To keep the variance at its maximum, the axes are usually not rotated in principal component analysis
3. Components with eigenvalues less than 1 are usually discarded, keeping only those with eigenvalues greater than 1
4. In practice, it is enough if 3 to 5 components explain 80% of the variance
5. The principal components maximize the variance of the variables, and the components are uncorrelated with each other
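Notes 3 to 5 can be tried out on a data set with more than two variables. The sketch below uses R's built-in USArrests data, which is an assumption made here for illustration and does not appear in the original notes; the names keep_kaiser and keep_80 are likewise hypothetical.

```r
pc <- princomp(USArrests, cor = TRUE)  # correlation-based analysis (note 1)
ev <- unname(pc$sdev^2)                # eigenvalues of the correlation matrix

keep_kaiser <- sum(ev > 1)             # note 3: eigenvalues greater than 1
cumprop <- cumsum(ev / sum(ev))
keep_80 <- which(cumprop >= 0.8)[1]    # note 4: first component count reaching 80%

print(ev)
print(cumprop)
print(c(kaiser = keep_kaiser, eighty_percent = keep_80))

# Note 5: the component scores are uncorrelated with each other.
print(round(cor(pc$scores), 10))
```

The two rules need not agree on how many components to keep, so it is worth looking at both.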