Data representation: sometimes transforming the original features of a dataset to generate new "features" or components works better than using the original features directly. This is what we call data representation.
Feature extraction: in image recognition, data representation is very important, because an image is composed of thousands of pixels, each with its own RGB color values, so we need a data-processing method such as feature extraction. Feature extraction uses computational methods to pull out the information that characterizes an image.
1. PCA principal component analysis for feature extraction
############################# PCA principal component analysis for feature extraction #############################
# import plotting tools
import matplotlib.pyplot as plt
# import the dataset splitting tool
from sklearn.model_selection import train_test_split
# import the dataset loading tool
from sklearn.datasets import fetch_lfw_people
# load the LFW face dataset
faces = fetch_lfw_people(min_faces_per_person=20, resize=0.8)
image_shape = faces.images[0].shape
# plot the photos
fig, axes = plt.subplots(3, 4, figsize=(12, 9),
                         subplot_kw={'xticks': (), 'yticks': ()})
for target, image, ax in zip(faces.target, faces.images, axes.ravel()):
    ax.imshow(image, cmap=plt.cm.gray)
    ax.set_title(faces.target_names[target])
# display the images
plt.show()
# import the neural network
from sklearn.neural_network import MLPClassifier
# split the data
X_train, X_test, y_train, y_test = train_test_split(
    faces.data / 255, faces.target, random_state=62)
# train the neural network
mlp = MLPClassifier(hidden_layer_sizes=[100, 100], random_state=62, max_iter=400)
mlp.fit(X_train, y_train)
# print the model accuracy
print('Model recognition accuracy: {:.2f}'.format(mlp.score(X_test, y_test)))
Model recognition accuracy: 0.88
# import PCA
from sklearn.decomposition import PCA
# whiten the face data with PCA
pca = PCA(whiten=True, n_components=0.9, random_state=62).fit(X_train)
X_train_whiten = pca.transform(X_train)
X_test_whiten = pca.transform(X_test)
# print the shape of the whitened data
print('Shape of whitened data: {}'.format(X_train_whiten.shape))
Shape of whitened data: (50, 21)
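A side note on n_components=0.9: when PCA receives a float between 0 and 1, it keeps the smallest number of components whose cumulative explained variance ratio reaches that fraction. A minimal sketch on synthetic data (not the face dataset):

```python
import numpy as np
from sklearn.decomposition import PCA

# synthetic data: 100 samples, 20 features
rng = np.random.RandomState(62)
X = rng.randn(100, 20)

# a float n_components keeps just enough components
# to explain 90% of the variance
pca = PCA(whiten=True, n_components=0.9).fit(X)
print(pca.n_components_)                    # number of components chosen
print(pca.explained_variance_ratio_.sum())  # at least 0.9
```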
# train the neural network with the whitened data
mlp.fit(X_train_whiten, y_train)
# print the model accuracy
print('Model accuracy with whitened data: {:.2f}'.format(mlp.score(X_test_whiten, y_test)))
Model accuracy with whitened data: 0.94
2. NMF for feature extraction
NMF (Non-Negative Matrix Factorization): matrix factorization decomposes one matrix into the product of several matrices. In non-negative matrix factorization, all values in the original matrix must be greater than or equal to 0, and the matrices produced by the decomposition are also non-negative.
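A minimal sketch of what the factorization does, on a small hand-made non-negative matrix (not the face data): NMF splits X into W and H, both non-negative, with the product W @ H approximating X.

```python
import numpy as np
from sklearn.decomposition import NMF

# a small non-negative matrix: 6 samples, 4 features
X = np.array([[1.0, 1.0, 2.0, 0.0],
              [2.0, 1.0, 3.0, 0.5],
              [3.0, 1.2, 4.2, 0.5],
              [0.0, 2.0, 2.0, 1.0],
              [1.0, 3.0, 4.0, 1.5],
              [2.0, 4.0, 6.0, 2.0]])

# factor X into W (6x2) and H (2x4); every entry stays >= 0
nmf = NMF(n_components=2, init='random', random_state=0, max_iter=2000)
W = nmf.fit_transform(X)
H = nmf.components_

print(W.shape, H.shape)  # (6, 2) (2, 4)
print(bool((W >= 0).all() and (H >= 0).all()))  # True: both factors non-negative
```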
############################# NMF for feature extraction #############################
# import NMF
from sklearn.decomposition import NMF
# process the data with NMF
nmf = NMF(n_components=15, random_state=62).fit(X_train)
X_train_nmf = nmf.transform(X_train)
X_test_nmf = nmf.transform(X_test)
# print the shape of the NMF-processed data
print('Shape of NMF-processed data: {}'.format(X_train_nmf.shape))
Shape of NMF-processed data: (50, 15)
# train the neural network with the NMF-processed data
mlp.fit(X_train_nmf, y_train)
# print the model accuracy
print('Model accuracy after NMF processing: {:.2f}'.format(mlp.score(X_test_nmf, y_test)))
Model accuracy after NMF processing: 0.94
To sum up:
NMF differs from PCA principal component analysis: if we reduce the number of NMF components, NMF regenerates new components, and these new components are completely different from the original ones.
Furthermore, NMF components are not ordered, which is another point on which it differs from PCA.
Also, NMF's n_components parameter does not support floating-point values and can only be set to a positive integer; this too distinguishes it from PCA.
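To illustrate the last point, a sketch on synthetic non-negative data (not the face data): PCA accepts a float for n_components and chooses the component count automatically, while NMF must be given a positive integer.

```python
import numpy as np
from sklearn.decomposition import PCA, NMF

# non-negative synthetic data so NMF can fit it too
rng = np.random.RandomState(62)
X = np.abs(rng.randn(50, 10))

# PCA: a float means "keep components covering this variance fraction"
pca = PCA(n_components=0.9).fit(X)
print(pca.n_components_)        # an integer, chosen automatically

# NMF: n_components must be a positive integer
nmf = NMF(n_components=3, init='random', random_state=62, max_iter=1000).fit(X)
print(nmf.components_.shape)    # (3, 10)
```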
Quoted from the book "Layman's Language Python Machine Learning".