t-SNE and PCA

1. t-SNE

Reference: Zhihu

  • t-SNE stands for t-distributed stochastic neighbor embedding.
  • Although it is primarily a non-linear dimensionality reduction method for high-dimensional data, it is rarely used as a general-purpose reduction step, because its strengths lie elsewhere:
    • It is best suited to visualization, for example to check how well a model separates the data.
    • It tries to keep the distribution of the data in the low-dimensional space highly similar to the distribution in the original feature space.

It is therefore used mostly to inspect classification results.
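
The idea of keeping the low-dimensional distribution close to the original one can be written down roughly as follows. This is a sketch of the standard t-SNE formulation, not something taken from the post itself; the symbols x_i (original points), y_i (embedded points), n (number of points) and sigma_i (per-point bandwidth) are notation introduced here for illustration.

```latex
% High-dimensional similarities (Gaussian kernel; \sigma_i is set from the perplexity)
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Low-dimensional similarities (Student t with one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Objective: make Q resemble P by minimizing the KL divergence over the y_i
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Because the cost depends only on pairwise similarities, the axes of the resulting plot have no direct meaning; only which points end up near each other does.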

1.1 Reproducing the demo

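# Import TSNE and matplotlib
# (`samples` and `variety_numbers` are assumed to be loaded already:
#  a 2D array of measurements and one integer label per sample)
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE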

# Create a TSNE instance: model
model = TSNE(learning_rate=200)

# Apply fit_transform to samples: tsne_features
tsne_features = model.fit_transform(samples)

# Select the 0th feature: xs
xs = tsne_features[:,0]

# Select the 1st feature: ys
ys = tsne_features[:,1]

# Scatter plot, coloring by variety_numbers
plt.scatter(xs,ys,c=variety_numbers)
plt.show()
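
The demo relies on `samples` and `variety_numbers`, which the post does not define. As a minimal sketch for trying it out, the Iris data bundled with scikit-learn can stand in for them; this substitution and the fixed `random_state` are assumptions made here, not part of the original exercise.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

# Stand-in data (assumption): Iris replaces the undefined `samples` / `variety_numbers`
iris = load_iris()
samples = iris.data            # 150 samples with 4 measurements each
variety_numbers = iris.target  # integer class labels 0, 1, 2

# Same call as in the demo; random_state only makes the run repeatable
model = TSNE(learning_rate=200, random_state=0)
tsne_features = model.fit_transform(samples)

# One 2D coordinate pair per sample
xs = tsne_features[:, 0]
ys = tsne_features[:, 1]
print(tsne_features.shape)  # (150, 2)
```

`xs`, `ys` and `variety_numbers` can then be passed to `plt.scatter` exactly as in the demo.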

2. PCA

  • When there are many feature variables, multicollinearity often exists between them.
  • Principal component analysis (PCA) reduces the dimensionality of high-dimensional data by extracting its principal components.
  • PCA "kills two birds with one stone" in that
    • it selects a small set of representative features, and
    • those features are linearly uncorrelated with one another.
    • In short, it finds the best linear combinations that summarize the original feature space;
      Zhihu has a very straightforward example.

2.1 Mathematical Reasoning

For the derivation, see "Machine learning dimensionality reduction: PCA (very detailed)" and
"Making sense of principal component analysis, eigenvectors & eigenvalues".
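
In brief, the standard argument runs roughly as below. This is a condensed sketch rather than a summary of those references; X (the centered n-by-d data matrix), C, W, Lambda and Z are notation introduced here for illustration.

```latex
% Sample covariance of the centered data matrix X \in \mathbb{R}^{n \times d}
C = \frac{1}{n-1} X^{\top} X

% First principal direction: maximize the projected variance over unit vectors
w_1 = \arg\max_{\lVert w \rVert = 1} w^{\top} C w
\quad\Longrightarrow\quad
C w_1 = \lambda_1 w_1

% All directions at once: C = W \Lambda W^{\top}, \; W^{\top} W = I, \; \lambda_1 \ge \dots \ge \lambda_d
Z = X W,
\qquad
\frac{1}{n-1} Z^{\top} Z = W^{\top} C W = \Lambda
```

Since the covariance of Z is diagonal, the transformed features are pairwise uncorrelated, and the eigenvalue of each direction is the variance it captures.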

2.2 Reproducing the demo

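# Perform the necessary imports
# (`grains` is assumed to be a 2D NumPy array loaded already,
#  with width in column 0 and length in column 1)
import matplotlib.pyplot as plt
from scipy.stats import pearsonr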

# Assign the 0th column of grains: width
width = grains[:,0]

# Assign the 1st column of grains: length
length = grains[:,1]

# Scatter plot width vs length
plt.scatter(width, length)
plt.axis('equal')
plt.show()

# Calculate the Pearson correlation
correlation, pvalue = pearsonr(width, length)

# Display the correlation
print(correlation)

# Import PCA
from sklearn.decomposition import PCA

# Create PCA instance: model
model = PCA()

# Apply the fit_transform method of model to grains: pca_features
pca_features = model.fit_transform(grains)

# Assign 0th column of pca_features: xs
xs = pca_features[:,0]

# Assign 1st column of pca_features: ys
ys = pca_features[:,1]

# Scatter plot xs vs ys
plt.scatter(xs, ys)
plt.axis('equal')
plt.show()

# Calculate the Pearson correlation of xs and ys
correlation, pvalue = pearsonr(xs, ys)

# Display the correlation
print(correlation)

<script.py> output:
    2.5478751053409354e-17
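
The printed value is zero up to floating-point error, which is what PCA is expected to produce: the transformed features are uncorrelated. The sketch below reproduces the effect with plain NumPy instead of scikit-learn. The synthetic `grains` array is a made-up stand-in for the undefined data, and the eigen-decomposition route is one common way to compute PCA, not necessarily what scikit-learn does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for `grains`: two strongly correlated columns
width = rng.normal(3.0, 0.4, size=200)
length = 1.8 * width + rng.normal(0.0, 0.2, size=200)
grains = np.column_stack([width, length])

# Center the data and compute its covariance matrix
centered = grains - grains.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# Eigenvectors of the covariance matrix are the principal directions;
# eigh returns ascending eigenvalues, so reorder to put the largest first
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the centered data onto the principal directions
pca_features = centered @ eigvecs

# Pearson correlation before and after the transformation
corr_before = np.corrcoef(grains[:, 0], grains[:, 1])[0, 1]
corr_after = np.corrcoef(pca_features[:, 0], pca_features[:, 1])[0, 1]
print(corr_before, corr_after)
```

The first number should be close to 1 and the second should be negligibly small, mirroring the before and after correlations in the demo.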

Origin www.cnblogs.com/gaowenxingxing/p/12313864.html