Getting Started with Udacity Machine Learning - Principal Component Analysis (PCA)

Is the following data one-dimensional or two-dimensional?


Exercise 1: PCA finds the center of the new coordinate system at (2, 3). Moving to the right along the new x' axis, △x = 1 corresponds to △y = 1; moving up along the new y' axis, △y = 1 corresponds to △x = -1.

The length of each new axis vector is √2, measured in the original coordinate system.
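A quick check of that length, using the steps given above (this is just the Pythagorean theorem applied in the original coordinates):

    length = √(△x² + △y²) = √(1² + 1²) = √2

The y' step (△x, △y) = (-1, 1) has the same length, so both new axis vectors are √2 long in the original coordinate system.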


Exercise 2: PCA finds the center of the new coordinate system at (3, 3). Moving along the new x' axis, △y = -1 corresponds to △x = 2; moving along the new y' axis, △x = 1 corresponds to △y = 2.

For the x' axis:

    x' = 0.5a + 3.5
    x' + △y = 0.5(a + △x) + 3.5

Subtracting the two equations gives △y = 0.5·△x, so because △y = 1, △x = 2.

For the y' axis (the slopes of two perpendicular lines multiply to -1):

    y' = 2a - 3
    y' + △y = 2(a + △x) - 3

Subtracting gives △y = 2·△x, so because △x = 1, △y = 2.


Exercise: What data can be used for PCA?


Exercise: When the Axis Dominates

Is the long axis dominant (i.e., is the long-axis eigenvalue much larger than the short-axis eigenvalue)? In cases 1 and 3 the long axis captures essentially all of the data. In case 2 the data extends comparably along both the short and long axes, so the two eigenvalues may be of the same order of magnitude, and we don't really gain any information by running PCA.
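A small sketch of how this shows up numerically (synthetic data, not from the course; the cloud shapes and sizes are made up):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)

# Elongated cloud (like cases 1 and 3): the long axis should dominate.
stretched = rng.normal(size=(500, 2)) * np.array([10.0, 1.0])

# Roughly circular cloud (like case 2): both directions are comparable.
blob = rng.normal(size=(500, 2))

print(PCA().fit(stretched).explained_variance_ratio_)  # first ratio close to 1
print(PCA().fit(blob).explained_variance_ratio_)       # two ratios of similar size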



From four features to two


When you don't know in advance how many features you need (for example, when choosing among the size and neighborhood features), you can use SelectKBest (keeps only the K best features) or SelectPercentile (keeps a given percentage of the features).
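A minimal sketch of both selectors (the iris data and the values of k and percentile here are just for illustration):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

X, y = load_iris(return_X_y=True)

# Keep only the K best-scoring features (here K = 2, scored with the ANOVA F-test).
X_k = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# Keep the top 50% of features by the same score.
X_p = SelectPercentile(score_func=f_classif, percentile=50).fit_transform(X, y)

print(X.shape, X_k.shape, X_p.shape)  # (150, 4) (150, 2) (150, 2)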



Composite features:

A composite feature maps two-dimensional features onto a single dimension.


Exercise: Projecting along the dimension with the greatest variance preserves the most information in the original data.
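A sketch of that projection (the house-size / number-of-rooms numbers below are made up for illustration):

import numpy as np
from sklearn.decomposition import PCA

# Two correlated features (e.g. square footage and number of rooms)
# compressed into one composite feature along the direction of maximum variance.
rng = np.random.RandomState(0)
size = rng.uniform(1000, 3000, 200)
rooms = size / 500 + rng.normal(scale=0.5, size=200)
X = np.column_stack([size, rooms])

pca = PCA(n_components=1)
X_1d = pca.fit_transform(X)           # the new one-dimensional feature
print(pca.explained_variance_ratio_)  # most of the variance is preserved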


Exercise: Maximum number of principal components = minimum of number of training points and number of features
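A quick way to see that cap in sklearn (the 3x5 matrix is made up):

import numpy as np
from sklearn.decomposition import PCA

# 3 training points with 5 features each: at most min(3, 5) = 3 components.
X = np.random.RandomState(0).normal(size=(3, 5))
pca = PCA().fit(X)
print(pca.n_components_)  # 3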

PCA Review/Definition:

1. PCA is a systematic way of transforming the input features into their principal components; those principal components are then available to you in place of the original input features, and you use them as the new features in your regression or classification task.

2. A principal component is defined as the direction in the data that maximizes the variance, which minimizes the amount of information lost when you project (compress) the data onto that principal component.

3. You can also rank the principal components: the more of the data's variance a component accounts for, the higher it ranks. The component with the largest variance is the first principal component, the one with the second-largest variance is the second principal component, and so on.

4. The principal components are, in a sense, perpendicular to one another, so mathematically the second principal component never overlaps with the first, the third never overlaps with the first or second, and so on; in that sense you can treat them as independent features (see the short check after this list).

5. There is an upper limit on the number of principal components: the number of input features in the dataset.
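A short check of points 3 and 4 on random data (the data itself is meaningless; it just illustrates the properties):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).normal(size=(100, 4))
pca = PCA().fit(X)

# Point 4: the component directions are orthonormal, so components_ @ components_.T is the identity.
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(pca.n_components_)))

# Point 3: the explained variance ratios come back already sorted, largest first.
print(pca.explained_variance_ratio_)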


When to use PCA:

1. When you want access to latent features that you suspect are showing up in the patterns of your data. Perhaps all you're trying to do is determine whether such a latent feature even exists; in other words, you just want to know the size of the first principal component (example: can you estimate who the big shots at Enron are?).

2. Three use cases for dimensionality reduction

    - Visualizing high-dimensional data: when each data point has three, four, or more features but you can only plot two dimensions, project the data onto the first two principal components and then draw a scatter plot.

    - Reducing suspected noise in the data (almost all data has noise): the hope is that the first or second, i.e. your strongest, principal components capture the true pattern in the data, while the smaller components merely represent noisy variations of that pattern, so you remove the noise by discarding the less important components.

    - Using PCA as preprocessing before another algorithm, i.e. a regression or classification task: if the input dimensionality is high and the algorithm is complex, the algorithm will have high variance and will end up fitting the noise in the data. (Eigenfaces is a method that applies PCA to photos of people. A photo has many pixels, and if you want to identify the person captured in the picture you run some kind of face recognition on it; PCA can shrink the input dimensionality to perhaps one-tenth of its original size, and you then feed the result into an SVM, which does the actual recognition. A minimal pipeline sketch follows this list.)
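A minimal sketch of that eigenfaces-style pipeline, assuming the LFW faces data used in the mini-project (the 150 components, SVM settings, and train/test split are my own choices, not the course's exact parameters):

from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

faces = fetch_lfw_people(min_faces_per_person=70)  # downloads the data on first use
X_train, X_test, y_train, y_test = train_test_split(
    faces.data, faces.target, random_state=42)

# Reduce a few thousand pixel features to 150 components, then classify with an SVM.
model = make_pipeline(PCA(n_components=150, whiten=True),
                      SVC(kernel='rbf', class_weight='balanced'))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))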


PCA in sklearn

>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)
>>> print(pca.explained_variance_ratio_)  
[ 0.99244...  0.00755...]
>>> print(pca.singular_values_)  
[ 6.30061...  0.54980...]

 PCA Mini Project:

Exercise: We mentioned that PCA sorts the principal components, the first having the largest variance, the second having the second-largest variance, and so on. How much variance is explained by the first principal component? 0.19346534. And the second? 0.15116844.

Explained variance ratio (this is, concretely, where the eigenvalues show up):

print(pca.explained_variance_ratio_)  # explained variance

Exercise: Now you will try to keep different numbers of principal components. In multi-class classification problems like this one (where more than two labels are applied), the metric of accuracy is not as intuitive as in the two-class case. Instead, the more commonly used metric is the F1 score.

We'll learn about F1 scores in the Evaluation Metrics course, but it's up to you to figure out for yourself whether a good classifier is characterized by a high or low F1 score. You will determine this by varying the number of principal components and seeing how the F1 score changes accordingly.

When adding more principal components as features to train a classifier, do you want it to perform better or worse?

It could go either way.

While ideally, adding components should provide us additional signal to improve our performance, it is possible that we end up at a complexity where we overfit.


Exercise: Change n_components to the following values: [10, 15, 25, 50, 100, 250]. For each number of principal components, note Ariel Sharon's F1 score. (For 10 principal components, the plotting functionality in the code will fail, but you should still be able to see the F1 scores.)

If you see a high F1 score, does this mean the classifier is performing better or worse? Better.

n_components = 10


n_components = 15


n_components = 25


n_components = 50

n_components = 100


n_components = 150


n_components = 250



Exercise: Do you see any evidence of overfitting when using a large number of principal components? Does PCA dimensionality reduction help improve performance?

Yes. With more principal components the performance eventually decreases; with 250 components the F1 score is noticeably lower than with 100.

Exercise: Choosing the Number of Principal Components

The main method is trial and error: keep increasing the number of components until, at some point, the performance drops or stops improving; that is where you stop.

Train with different numbers of principal components and then observe, for each candidate number, how accurate the algorithm is. After repeating this a few times you will find that there is a point of diminishing returns: beyond it, adding more principal components makes little difference in the results. A sketch of this sweep is below.
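A sketch of that sweep, using sklearn's small digits dataset as a stand-in (the dataset, classifier settings, and component counts are my own choices, not the mini-project's):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for n in [5, 10, 20, 30, 40, 50]:
    model = make_pipeline(PCA(n_components=n), SVC())
    model.fit(X_train, y_train)
    score = f1_score(y_test, model.predict(X_test), average='weighted')
    print(n, round(score, 3))
# Look for the point where the score stops improving (or starts dropping).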
