Machine Learning - Principal Component Analysis (PCA)

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. If you reproduce it, please include the original source link and this statement.
Original link: https://blog.csdn.net/xiao_lxl/article/details/97390597


Top Ten Machine Learning Algorithms - Principal Component Analysis (PCA, principal components analysis)

Dimensionality reduction

The universe is the sum of time and space. Time is one-dimensional, while the number of spatial dimensions is, so far, still an open question. String theory says there are nine; M-theory, which Hawking endorsed, holds that there are ten. These theories explain that the dimensions beyond the three humans perceive are curled up at extremely small spatial scales. Of course, the point here is not to promote the "Three-Body" trilogy, let alone to guide readers toward the true meaning of the universe or to question the nature of life, but to introduce today's machine learning topic: dimensionality reduction.

The dimensionality of data in machine learning is much like the spatial dimensions of the real world. In machine learning, data usually needs to be represented as vectors that are fed into a model for training. But as we all know, processing and analyzing high-dimensional vectors consumes enormous system resources and can even run into the curse of dimensionality. For example, in the CV (Computer Vision) field, extracting pixel features from a 100x100 RGB image already produces a 30,000-dimensional vector; in the NLP (Natural Language Processing) field, a <document, word> feature matrix can produce feature vectors with hundreds of thousands of dimensions. Therefore, representing the original high-dimensional vectors with low-dimensional ones, i.e., reducing the dimensionality, is particularly important. Just imagine: if the universe really were as M-theory describes, with the position of every celestial body given by a ten-dimensional coordinate, no ordinary person could picture the structure of that space. But once we project those bodies onto a two-dimensional plane, the whole universe becomes as intuitive as the Milky Way overhead.
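As a quick sanity check on the arithmetic above, here is a minimal NumPy sketch (the random image is just a stand-in, not data from the original post) that flattens a 100x100 RGB image into a single feature vector:

```python
import numpy as np

# A stand-in for a 100x100 RGB image: height x width x 3 color channels.
image = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)

# Flatten the pixel grid into one feature vector.
feature_vector = image.reshape(-1)
print(feature_vector.shape)  # (30000,) -- 100 * 100 * 3 = 30,000 dimensions
```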

Common dimensionality reduction methods include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Isometric Mapping (Isomap), Locally Linear Embedding (LLE), Laplacian Eigenmaps (LE), Locality Preserving Projections (LPP), and so on. These methods can be categorized along different axes: linear vs. non-linear, supervised vs. unsupervised, global vs. local. Among them, PCA is the most classic method, with over a hundred years of history; it is a linear, unsupervised, global dimensionality reduction algorithm. Today, let's revisit this enduring century-old classic.

Principal component analysis comes from statistics: through an orthogonal transformation, it converts a set of possibly correlated variables into a set of linearly uncorrelated variables; the transformed variables are called principal components.
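To make this concrete, here is a minimal sketch (my own toy example using NumPy, not code from the original post; it anticipates the eigendecomposition-based solution derived later) showing that after an orthogonal change of basis given by the eigenvectors of the covariance matrix, the transformed variables become linearly uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated variables: y is roughly 2*x plus noise.
x = rng.normal(size=500)
y = 2 * x + rng.normal(scale=0.5, size=500)
X = np.column_stack([x, y])              # shape (500, 2)

X_centered = X - X.mean(axis=0)          # PCA assumes centered data
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # orthogonal transform (eigenvectors)

Z = X_centered @ eigvecs                 # project onto the principal components
print(np.round(np.cov(Z, rowvar=False), 4))  # off-diagonal entries ~0: uncorrelated
```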


Practical applications of principal component analysis include data compression, simplification of data representation, and data visualization. It is worth mentioning that domain knowledge is needed to judge whether PCA is appropriate: if the data is too noisy (i.e., the variance of the individual components is large), PCA may not be suitable.

PCA algorithm

PCA principle and objective function

PCA (principal components analysis) aims to find the principal components of the data and use them to characterize the original data, thereby achieving dimensionality reduction. As a simple example, consider a set of data points in three-dimensional space that all lie on a plane through the origin. If we represent the data in the natural coordinate system with axes x, y, z, we need three dimensions, yet in fact the points lie only on a two-dimensional plane. If we rotate the coordinate system so that the plane the data lies on coincides with the x'y' plane, we can represent the original data using just the two dimensions x' and y', with no loss at all, thus completing the dimensionality reduction. The two axes x' and y' contain exactly the principal components we want to find.
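The sketch below illustrates this example numerically (my own toy construction, not from the original post): points are sampled on a plane through the origin in 3D, and the eigenvalues of their covariance matrix show that only two directions carry any variance:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two orthonormal directions spanning a plane through the origin in 3D.
u = np.array([1.0, 0.0, 1.0]) / np.sqrt(2)
v = np.array([0.0, 1.0, 0.0])

# Sample points that lie exactly on that plane: each point is a*u + b*v.
coeffs = rng.normal(size=(200, 2))
points = coeffs @ np.vstack([u, v])          # shape (200, 3)

cov = np.cov(points - points.mean(axis=0), rowvar=False)
eigvals = np.linalg.eigvalsh(cov)
print(np.round(eigvals, 6))  # one eigenvalue is ~0: two dimensions suffice
```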

But in high-dimensional space we usually cannot intuitively picture how the data is distributed as we just did, and it is even harder to pinpoint which axis the principal components correspond to. So let's start from the simplest case, two-dimensional data, to see exactly how PCA works.
[Figure: a set of centered two-dimensional data points (left); the principal axis lies along the green line (right)]
The figure above (left) shows a set of centered data points in two-dimensional space. We can easily see the rough direction of the principal component axis (hereafter the principal axis), namely the axis along the green line in the right panel. Along the green line the data is more spread out, which means the data has greater variance in that direction. In signal processing we regard the signal as having large variance and the noise as having small variance; the ratio of the two is called the signal-to-noise ratio (SNR), and the larger the SNR, the better the data quality. This leads to the goal of PCA: maximize the projection variance, that is, make the variance of the data projected onto the principal axis as large as possible.
Given a centered set of samples $\{x_1, \dots, x_n\}$ (i.e., $\sum_i x_i = 0$), the projection of $x_i$ onto a unit direction $\omega$ is $\omega^{T} x_i$, so the variance of the projected data is
$$D(x) = \frac{1}{n}\sum_{i=1}^{n}\left(\omega^{T} x_i\right)^{2} = \omega^{T}\left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i^{T}\right)\omega = \omega^{T}\Sigma\,\omega ,$$
where $\Sigma$ is the sample covariance matrix. The PCA objective is therefore
$$\max_{\omega}\ \omega^{T}\Sigma\,\omega \quad \text{s.t.}\quad \omega^{T}\omega = 1 .$$
Introducing a Lagrange multiplier $\lambda$ and setting the derivative to zero gives $\Sigma\,\omega = \lambda\,\omega$, so the projected variance equals $D(x) = \omega^{T}\Sigma\,\omega = \lambda$.

PCA solution method

Readers familiar with linear algebra will quickly see that the variance of the projected data is an eigenvalue of the covariance matrix. The maximum variance we seek is simply the largest eigenvalue of the covariance matrix, and the best projection direction is the eigenvector corresponding to that largest eigenvalue. The second-best projection direction lies in the orthogonal complement of the best direction: it is the eigenvector corresponding to the second-largest eigenvalue, and so on. At this point, we have obtained the solution method of PCA:

1) Center the data by subtracting the sample mean from every sample.
2) Compute the sample covariance matrix.
3) Perform an eigendecomposition of the covariance matrix and sort the eigenvalues in descending order.
4) Take the eigenvectors $\omega_1, \dots, \omega_d$ corresponding to the $d$ largest eigenvalues, and map each sample $x_i$ to the $d$-dimensional vector $x_i' = \left[\omega_1^{T} x_i, \dots, \omega_d^{T} x_i\right]^{T}$.
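The following is a minimal NumPy sketch of these steps (my own illustrative implementation, not code from the original post); `X` is assumed to be an array of shape (n_samples, n_features) and `d` the target dimensionality:

```python
import numpy as np

def pca(X, d):
    """Project X (n_samples, n_features) onto its top-d principal components."""
    # 1) Center the data.
    X_centered = X - X.mean(axis=0)
    # 2) Sample covariance matrix.
    cov = np.cov(X_centered, rowvar=False)
    # 3) Eigendecomposition; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # sort descending
    # 4) Keep the eigenvectors of the d largest eigenvalues and project.
    W = eigvecs[:, order[:d]]                  # shape (n_features, d)
    return X_centered @ W

# Usage: reduce correlated 5-D data to 2 dimensions.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))
print(pca(X, 2).shape)  # (300, 2)
```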

PCA: minimum squared error theory

Problem Description

Observe that when solving PCA, the best projection direction is in fact a straight line, which coincides with the goal of linear regression in mathematics. Can we define the objective of PCA from a regression perspective and solve it accordingly?

Analysis


We again consider sample points in two-dimensional space. The maximum-variance solution above is a straight line such that the variance of the sample points projected onto it is maximized. From this line-fitting point of view, it is natural to think of the linear regression problem in mathematics, whose goal is to solve for a linear function whose corresponding line best fits the set of sample points. If we define the objective of PCA from this perspective, the problem turns into a regression problem.

Following this line of thought, in high-dimensional space we are actually looking for a d-dimensional hyperplane such that the sum of squared distances from the data points to this hyperplane is minimized. In the one-dimensional case, the hyperplane degenerates to a straight line: the objective is to project the sample points onto the best straight line, minimizing the sum of squared distances from all points to the line, as shown in the figure below.
[Figure: sample points and their projections onto the best-fitting straight line; PCA minimizes the sum of squared point-to-line distances]
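As an illustration (my own sketch, not from the original post), the snippet below compares the sum of squared point-to-line distances for the top principal direction against a random direction, showing that the principal direction gives the smaller error:

```python
import numpy as np

rng = np.random.default_rng(3)

# Correlated 2-D data, centered.
X = rng.normal(size=(400, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])
X = X - X.mean(axis=0)

def sum_sq_dist_to_line(X, w):
    """Sum of squared distances from points to the line through the origin along w."""
    w = w / np.linalg.norm(w)
    projections = (X @ w)[:, None] * w        # projection of each point onto the line
    return np.sum((X - projections) ** 2)

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
w_pca = eigvecs[:, -1]                        # direction of the largest eigenvalue
w_rand = rng.normal(size=2)

print(sum_sq_dist_to_line(X, w_pca))          # smallest achievable squared error
print(sum_sq_dist_to_line(X, w_rand))         # larger for any other direction
```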

The projection of a point $x_k$ onto the $d$-dimensional subspace spanned by an orthonormal basis $W = \{\omega_1, \omega_2, \dots, \omega_d\}$ can be written as
$$\tilde{x}_k = \sum_{i=1}^{d}\left(\omega_i^{T} x_k\right)\omega_i .$$
The objective is then to minimize the total squared distance between each sample and its projection,
$$\min_{W}\ \sum_{k=1}^{n}\left\lVert x_k - \tilde{x}_k\right\rVert_2^{2}\quad \text{s.t.}\quad \omega_i^{T}\omega_j = \delta_{ij},$$
and expanding a single squared distance gives
$$\left\lVert x_k - \tilde{x}_k\right\rVert_2^{2} = x_k^{T} x_k - 2\,x_k^{T}\tilde{x}_k + \tilde{x}_k^{T}\tilde{x}_k .$$

The first term $x_k^{T} x_k$ has nothing to do with our choice of $W$; it is a constant. Substituting the expression for the projection vector obtained above into the second and third terms respectively, we continue:
$$x_k^{T}\tilde{x}_k = \sum_{i=1}^{d}\left(\omega_i^{T} x_k\right) x_k^{T}\omega_i = \sum_{i=1}^{d}\left(\omega_i^{T} x_k\right)^{2},\qquad \tilde{x}_k^{T}\tilde{x}_k = \sum_{i=1}^{d}\sum_{j=1}^{d}\left(\omega_i^{T} x_k\right)\left(\omega_j^{T} x_k\right)\omega_i^{T}\omega_j = \sum_{i=1}^{d}\left(\omega_i^{T} x_k\right)^{2},$$
where $\omega_i^{T} x_k$ and $\omega_j^{T} x_k$ represent projection lengths, i.e., they are scalars. And when $i \neq j$, $\omega_i^{T}\omega_j = 0$, so only the $d$ diagonal terms of the double sum survive.

$$\left\lVert x_k - \tilde{x}_k\right\rVert_2^{2} = -\sum_{i=1}^{d}\left(\omega_i^{T} x_k\right)^{2} + x_k^{T} x_k = -\sum_{i=1}^{d}\omega_i^{T} x_k x_k^{T}\omega_i + x_k^{T} x_k .$$
The quantity we want to minimize is the sum of this expression over all $k$, which can be written as
$$\arg\min_{W}\ \sum_{k=1}^{n}\left\lVert x_k - \tilde{x}_k\right\rVert_2^{2} \;\Longleftrightarrow\; \arg\max_{W}\ \sum_{k=1}^{n}\sum_{i=1}^{d}\omega_i^{T} x_k x_k^{T}\omega_i \quad \text{s.t.}\quad \omega_i^{T}\omega_j = \delta_{ij}.$$

If we solve for the basis vectors $\omega_1, \omega_2, \dots, \omega_d$ of $W$ one at a time, we will find that this is completely equivalent to the previous method. For example, when $d = 1$, the problem we are actually solving is
$$\max_{\omega}\ \omega^{T}\left(\sum_{k=1}^{n} x_k x_k^{T}\right)\omega \quad \text{s.t.}\quad \omega^{T}\omega = 1 .$$
The best straight line $\omega$ obtained this way coincides with the maximum-variance projection direction, i.e., the eigenvector corresponding to the largest eigenvalue of the covariance matrix. The only difference is a constant multiple of the covariance matrix $\Sigma$ and a constant offset, which do not affect the maximization.
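As a quick numerical check (my own sketch; the variable names are illustrative), the snippet below verifies that, for centered data, the top eigenvector of $\sum_k x_k x_k^{T}$ matches the top eigenvector of the covariance matrix, i.e., the minimum-error and maximum-variance views agree:

```python
import numpy as np

rng = np.random.default_rng(7)

# Centered, correlated 3-D data.
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))
X = X - X.mean(axis=0)

# Maximum-variance view: top eigenvector of the covariance matrix.
cov = np.cov(X, rowvar=False)
w_var = np.linalg.eigh(cov)[1][:, -1]

# Minimum squared error view: top eigenvector of sum_k x_k x_k^T.
scatter = X.T @ X
w_err = np.linalg.eigh(scatter)[1][:, -1]

# The two directions agree up to sign.
print(np.abs(w_var @ w_err))  # ~1.0
```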
