Data representation: sometimes transforming the original features of a dataset to generate new "features" or components works better than using the original features directly. This is what we call data representation.
Feature extraction: in image recognition, data representation is very important, because an image is composed of thousands of pixels, each with its own RGB color values, so we need a data-processing method such as feature extraction. Feature extraction uses computational methods to pull out the information that characterizes an image.
1. PCA principal component analysis for feature extraction
############################# PCA principal component analysis for feature extraction #############################
# import plotting tools
import matplotlib.pyplot as plt
# import the dataset splitting tool
from sklearn.model_selection import train_test_split
# import the dataset loading tool
from sklearn.datasets import fetch_lfw_people
# load the LFW face dataset
faces = fetch_lfw_people(min_faces_per_person=20, resize=0.8)
image_shape = faces.images[0].shape
# plot the photos
fig, axes = plt.subplots(3, 4, figsize=(12, 9),
                         subplot_kw={'xticks': (), 'yticks': ()})
for target, image, ax in zip(faces.target, faces.images, axes.ravel()):
    ax.imshow(image, cmap=plt.cm.gray)
    ax.set_title(faces.target_names[target])
# display the images
plt.show()
# import the neural network
from sklearn.neural_network import MLPClassifier
# split the data
X_train, X_test, y_train, y_test = train_test_split(
    faces.data / 255, faces.target, random_state=62)
# train the neural network
mlp = MLPClassifier(hidden_layer_sizes=[100, 100], random_state=62, max_iter=400)
mlp.fit(X_train, y_train)
# print the model accuracy
print('Model recognition accuracy: {:.2f}'.format(mlp.score(X_test, y_test)))
Model recognition accuracy: 0.88
# import PCA
from sklearn.decomposition import PCA
# whiten the face data with PCA
pca = PCA(whiten=True, n_components=0.9, random_state=62).fit(X_train)
X_train_whiten = pca.transform(X_train)
X_test_whiten = pca.transform(X_test)
# print the shape of the whitened data
print('Shape of whitened data: {}'.format(X_train_whiten.shape))
Shape of whitened data: (50, 21)
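A side note on n_components=0.9: when PCA receives a float between 0 and 1, it keeps the smallest number of components whose cumulative explained variance ratio reaches that fraction. A minimal sketch on synthetic data (not the face dataset):

```python
import numpy as np
from sklearn.decomposition import PCA

# synthetic data: 100 samples, 20 features
rng = np.random.RandomState(62)
X = rng.randn(100, 20)

# a float n_components keeps just enough components
# to explain 90% of the variance
pca = PCA(whiten=True, n_components=0.9).fit(X)
print(pca.n_components_)                    # number of components chosen
print(pca.explained_variance_ratio_.sum())  # at least 0.9
```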
# train the neural network with the whitened data
mlp.fit(X_train_whiten, y_train)
# print the model accuracy
print('Model accuracy with whitened data: {:.2f}'.format(mlp.score(X_test_whiten, y_test)))
Model accuracy with whitened data: 0.94
2. NMF for feature extraction
NMF (Non-Negative Matrix Factorization): matrix factorization decomposes one matrix into the product of several matrices. In non-negative matrix factorization, all values in the original matrix must be greater than or equal to 0, and the matrices produced by the decomposition are also non-negative.
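A minimal sketch of what the factorization does, on a small hand-made non-negative matrix (not the face data): NMF splits X into W and H, both non-negative, with the product W @ H approximating X.

```python
import numpy as np
from sklearn.decomposition import NMF

# a small non-negative matrix: 6 samples, 4 features
X = np.array([[1.0, 1.0, 2.0, 0.0],
              [2.0, 1.0, 3.0, 0.5],
              [3.0, 1.2, 4.2, 0.5],
              [0.0, 2.0, 2.0, 1.0],
              [1.0, 3.0, 4.0, 1.5],
              [2.0, 4.0, 6.0, 2.0]])

# factor X into W (6x2) and H (2x4); every entry stays >= 0
nmf = NMF(n_components=2, init='random', random_state=0, max_iter=2000)
W = nmf.fit_transform(X)
H = nmf.components_

print(W.shape, H.shape)  # (6, 2) (2, 4)
print(bool((W >= 0).all() and (H >= 0).all()))  # True: both factors non-negative
```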
############################# NMF for feature extraction #############################
# import NMF
from sklearn.decomposition import NMF
# process the data with NMF
nmf = NMF(n_components=15, random_state=62).fit(X_train)
X_train_nmf = nmf.transform(X_train)
X_test_nmf = nmf.transform(X_test)
# print the shape of the NMF-processed data
print('Shape of NMF-processed data: {}'.format(X_train_nmf.shape))
Shape of NMF-processed data: (50, 15)
# train the neural network with the NMF-processed data
mlp.fit(X_train_nmf, y_train)
# print the model accuracy
print('Model accuracy after NMF processing: {:.2f}'.format(mlp.score(X_test_nmf, y_test)))
Model accuracy after NMF processing: 0.94
To sum up:
NMF differs from PCA principal component analysis: if we reduce the number of NMF components, NMF regenerates new components, and these new components are completely different from the original ones.
Furthermore, NMF components are not ordered, which is another point on which it differs from PCA.
Also, NMF's n_components parameter does not support floating-point values and can only be set to a positive integer; this too distinguishes it from PCA.
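To illustrate the last point, a sketch on synthetic non-negative data (not the face data): PCA accepts a float for n_components and chooses the component count automatically, while NMF must be given a positive integer.

```python
import numpy as np
from sklearn.decomposition import PCA, NMF

# non-negative synthetic data so NMF can fit it too
rng = np.random.RandomState(62)
X = np.abs(rng.randn(50, 10))

# PCA: a float means "keep components covering this variance fraction"
pca = PCA(n_components=0.9).fit(X)
print(pca.n_components_)        # an integer, chosen automatically

# NMF: n_components must be a positive integer
nmf = NMF(n_components=3, init='random', random_state=62, max_iter=1000).fit(X)
print(nmf.components_.shape)    # (3, 10)
```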
Quoted from the book "Layman's Language Python Machine Learning".