Andrew Ng "Machine Learning" Course Summary (14): Dimensionality Reduction

Q1 Motivation I: Data Compression

Dimensionality reduction compresses the features, for example reducing two correlated features down to a single dimension:

Reducing three-dimensional data to two dimensions:

For example, reducing 1000-dimensional data to 100-dimensional data reduces the space the data occupies in memory.

Q2 Motivation II: Data Visualization

Fifty-dimensional data cannot be visualized directly; dimensionality reduction can bring it down to two dimensions so that it can then be visualized.

The dimensionality reduction algorithm is only responsible for reducing the number of dimensions; the meaning of the newly generated features is something we have to work out ourselves.

Q3 Principal Component Analysis Problem Formulation

(1) Description of principal component analysis:
The problem is to reduce n-dimensional data to k dimensions; the goal is to find k direction vectors onto which to project the data so that the total projection error is minimized (restated in symbols after item (4) below).

(2) Comparison of principal component analysis and linear regression:

 

The two algorithms are different: the former minimizes the projection error, while the latter minimizes the prediction error; the former does not predict anything, while the latter's goal is to predict an outcome.

In linear regression the projection is vertical (perpendicular to the x-axis), whereas in principal component analysis the projection is perpendicular to the red line (the principal direction). As shown below:

(3) PCA finds a new set of "principal axes" sorted by importance; the more important components come first, and the later, less important dimensions can be dropped as needed.

(4) One advantage of PCA is that it depends entirely on the data: no parameters need to be set by hand, and it is independent of the user. This is also a disadvantage: if the user has some prior knowledge of the data, that knowledge cannot be brought to bear, so the desired effect may not be obtained.
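
Restating item (1) in symbols (my notation; $x_{approx}^{(i)}$ denotes the projection of $x^{(i)}$ onto the span of the k direction vectors $u^{(1)},\ldots,u^{(k)}$), PCA solves

$$\min_{u^{(1)},\ldots,u^{(k)}}\;\frac{1}{m}\sum_{i=1}^{m}\bigl\|x^{(i)} - x_{approx}^{(i)}\bigr\|^{2}$$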

Q4 Principal Component Analysis Algorithm

PCA reduces n-dimensional data to k dimensions:

(1) Mean normalization: subtract the mean of each feature and, if the features are on very different scales, also divide by the standard deviation;

(2) Compute the covariance matrix $\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} (x^{(i)})^{T}$;

(3) Compute the eigenvectors of the covariance matrix, for example via singular value decomposition.

The decomposition yields an $n \times n$ matrix U whose columns are the projection direction vectors that give the minimum error for the data. Keeping only its first k column vectors gives an $n \times k$ matrix, denoted $U_{reduce}$, and the required new feature vector is then computed as $z^{(i)} = U_{reduce}^{T}\, x^{(i)}$.
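
A minimal NumPy sketch of steps (1)-(3) (the variable names X, Sigma, U_reduce and Z are my own; the course does the same computation with Octave's svd):

import numpy as np

def pca(X, k):
    # Reduce the m x n data matrix X to k dimensions with PCA.
    # Assumes X has already been mean-normalized as in step (1).
    m, n = X.shape
    # Step (2): covariance matrix Sigma = (1/m) * X^T * X, of size n x n
    Sigma = (X.T @ X) / m
    # Step (3): eigenvectors of Sigma via SVD; the columns of U are the
    # projection directions, ordered from most to least important
    U, S, _ = np.linalg.svd(Sigma)
    U_reduce = U[:, :k]        # keep the first k columns (n x k)
    Z = X @ U_reduce           # row i of Z equals U_reduce^T x(i)
    return Z, U_reduce, S

Calling Z, U_reduce, S = pca(X, k) returns the compressed data, the projection matrix, and the singular values (the latter come back in Q5 below).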

Q5 Choosing the Number of Principal Components

Principal component analysis minimizes the average squared projection error $\frac{1}{m}\sum_{i=1}^{m}\bigl\|x^{(i)} - x_{approx}^{(i)}\bigr\|^{2}$, while the total variation of the training set is $\frac{1}{m}\sum_{i=1}^{m}\bigl\|x^{(i)}\bigr\|^{2}$.

We want the ratio of the two (average squared projection error divided by total variation) to be as small as possible, for example less than 1%, and we choose the smallest number of dimensions k that satisfies this condition.
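
The singular values S returned by the SVD make this check cheap, because the ratio equals $1 - \frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}}$. A small sketch (choose_k is my own helper name; S is assumed to hold the singular values of the covariance matrix in descending order, as returned by the pca() sketch above):

import numpy as np

def choose_k(S, threshold=0.01):
    # Smallest k such that
    # (average squared projection error) / (total variation) <= threshold,
    # i.e. at least 99% of the variance is retained when threshold = 0.01.
    retained = np.cumsum(S) / np.sum(S)     # variance retained for k = 1, 2, ..., n
    for k, r in enumerate(retained, start=1):
        if 1.0 - r <= threshold:
            return k
    return len(S)                           # fall back to keeping every dimension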

Q6 Reconstruction from Compressed Representation

Dimensionality reduction formula: $z^{(i)} = U_{reduce}^{T}\, x^{(i)}$

Reconstruction (i.e., going from the low-dimensional representation back to the high-dimensional space): $x_{approx}^{(i)} = U_{reduce}\, z^{(i)}$

The figure below illustrates this: the left plot shows the dimensionality reduction, and the right plot shows the reconstruction.
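
Continuing with the variable names from the pca() sketch above, the reconstruction is a single matrix product:

# Each row of X_approx is x_approx(i) = U_reduce * z(i); X_approx is m x n,
# an approximation of the original (mean-normalized) data.
X_approx = Z @ U_reduce.T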

Q7 Advice for Applying PCA

Correct use case:

For 100 x 100 pixel images, i.e. 10,000-dimensional features, use PCA to compress them down to 1,000 dimensions, then run the learning algorithm on the compressed training set. At prediction time, use the Ureduce learned earlier to convert the test set's x into z, and then make the prediction.
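
A sketch of this pipeline under assumed data (X_train, y_train and X_test are placeholders, LogisticRegression stands in for whatever supervised learner is actually used, and pca() is the helper sketched in Q4):

from sklearn.linear_model import LogisticRegression  # example learner, not prescribed by the course

# Mean normalization and U_reduce are both learned on the training set only
mu = X_train.mean(axis=0)
Z_train, U_reduce, _ = pca(X_train - mu, k=1000)

clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)

# At prediction time, reuse the SAME mu and U_reduce to map x to z
Z_test = (X_test - mu) @ U_reduce
predictions = clf.predict(Z_test)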

Incorrect uses:

(1) Trying to use PCA to fix overfitting. PCA cannot solve overfitting; regularization should be used instead.

(2) Treating PCA as a default part of the learning pipeline. In fact, the original features should be used whenever possible; only consider principal component analysis when the algorithm runs too slowly or takes up too much memory.

Origin www.cnblogs.com/henuliulei/p/11286991.html