Machine Learning Basics (9): PCA & Gradient Ascent, Part 2

Jupyter Notebook: PCA3
From here on we just use the methods already packaged in sklearn instead of writing our own implementation.

from sklearn.decomposition import PCA

from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

X = digits.data
Y = digits.target

X.shape
(1797, 64)
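Each of the 1797 samples is an 8×8 grayscale image flattened into a 64-dimensional feature vector, which is where the 64 comes from. A quick way to confirm this from the same Bunch object (a small check reusing the digits variable loaded above):

digits.images.shape   # (1797, 8, 8) -- the original 8x8 pixel images
digits.data.shape     # (1797, 64)   -- the same images flattened into 64 features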

Y
X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
Let's first run the KNN algorithm on the raw 64-dimensional data as a baseline.

from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier()

%time knn_clf.fit(X_train,Y_train)
Wall time: 2.99 ms
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=5, p=2,
           weights='uniform')

%time knn_clf.score(X_test, Y_test)
Wall time: 58.8 ms
0.9755555555555555
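Note that %time measures a single run, so the numbers above can fluctuate a bit between executions; for a steadier estimate you could use %timeit instead (an optional alternative, not part of the original notebook):

%timeit knn_clf.score(X_test, Y_test)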
Now let's process the data with PCA dimensionality reduction.
pca = PCA(0.95)  # keep enough components to explain 95% of the variance

pca.fit(X_train)

X_train_transform = pca.transform(X_train)
X_test_transform = pca.transform(X_test)

X_train_transform.shape
(1347, 28)
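Passing a float between 0 and 1 to PCA tells sklearn to keep just enough components to explain that fraction of the variance, which is how it ended up with 28 components here. You can verify this on the fitted pca object (a quick check reusing the objects above):

pca.n_components_                    # number of components actually kept: 28
pca.explained_variance_ratio_.sum()  # cumulative explained variance, just above 0.95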
The data is now down to 28 dimensions. Let's test the time performance again.

knn_clf_2 = KNeighborsClassifier()

%time knn_clf_2.fit(X_train_transform, Y_train)
Wall time: 1.99 ms
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=5, p=2,
           weights='uniform')

%time knn_clf_2.score(X_test_transform, Y_test)
Wall time: 22.9 ms
0.9711111111111111
Without dimensionality reduction, scoring took 58.8 ms with an accuracy of 0.9756.
After PCA, scoring is more than twice as fast, while the accuracy drops by less than 0.005 (0.9756 → 0.9711). That's a very good trade-off.
And with a much larger dataset, the performance gain would be even more noticeable.
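If you want to see how this trade-off shifts with the variance threshold, a small sweep like the one below can be run. This is just a sketch that refits PCA and KNN for a few thresholds, not part of the original notebook:

for ratio in [0.99, 0.95, 0.90, 0.80]:
    p = PCA(ratio)
    p.fit(X_train)
    clf = KNeighborsClassifier()
    clf.fit(p.transform(X_train), Y_train)
    print(ratio, p.n_components_, clf.score(p.transform(X_test), Y_test))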


Reprinted from blog.csdn.net/qq_37982109/article/details/88082337