Learning Diary 2.17

Study Notes (2.17)

1. KNN fruit classification practice

import pandas as pd
data = pd.read_csv(r'E:\Python-ml\F.csv')
pd.read_csv reads a CSV file into a DataFrame.
More parameters are documented at https://blog.csdn.net/brucewong0516/article/details/79092579
2. labelencoder = LabelEncoder()
data.iloc[:, 0] = labelencoder.fit_transform(data.iloc[:, 0])
This is label encoding for supervised learning: LabelEncoder maps the string labels to integers, which are much easier to process in machine learning.
A. data.iloc selects DataFrame data by integer position, filtering by row and column index.
B. data.loc selects DataFrame data by label, using the index and the column headers to pick out the desired rows and columns.
C. ix selection: pandas officially no longer recommends ix, and it has been deprecated since pandas version 0.20.1.
The labels are encoded as numbers from small to large, i.e. in the sorted order of the label values.
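A minimal sketch of the label encoding and the iloc/loc selection styles described above. The fruit table here is made up for illustration, since the original F.csv is not available:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the fruit data: label column first, then features.
data = pd.DataFrame({
    'fruit': ['orange', 'apple', 'banana', 'apple'],
    'mass':  [160, 150, 120, 140],
    'width': [7.5, 7.0, 4.0, 6.8],
})

# LabelEncoder assigns integers to the sorted unique labels:
# apple -> 0, banana -> 1, orange -> 2
labelencoder = LabelEncoder()
data['fruit'] = labelencoder.fit_transform(data['fruit'])
print(list(labelencoder.classes_))   # ['apple', 'banana', 'orange']

# iloc selects by integer position, loc by label:
print(data.iloc[0, 0])               # first row, first column -> 2 (orange)
print(data.loc[0, 'mass'])           # same row, column picked by name -> 160
```

Printing `labelencoder.classes_` is an easy way to see the small-to-large mapping mentioned above.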
3. data.iloc[:, 1:] — this syntax trips me up again and again.
It extracts every row, and all columns from the second column to the last.
The lesson for next time I run into syntax like this: just print data.iloc[:, 1:] and look at the result.

4. knn = KNeighborsClassifier(i)
KNeighborsClassifier, also known as K-nearest neighbors, is a classic pattern-recognition classification method. The classifier in the sklearn library takes the following parameters:
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(
    n_neighbors=5,
    weights='uniform',
    algorithm='auto',
    leaf_size=30,
    p=2,
    metric='minkowski',
    metric_params=None,
    n_jobs=None)
 

Here only the first parameter, n_neighbors, is actually used: a sample is classified the same as its nearest neighbors. In a for loop we keep adjusting the value of K, store the accuracy of the model for each K in a list, and then pick the K with the highest accuracy.
5. x_train, x_test, y_train, y_test = train_test_split(data.iloc[:, 1:], data.iloc[:, 0], test_size=0.3, stratify=data.iloc[:, 0], random_state=20)
The stratify parameter splits the data while preserving the class proportions of the labels.
6. The whole process is actually not very complicated: first get the data, separate the labels from the features, split the dataset into a training set and a test set, then use a loop to find the optimal K value, and put it into the KNN model to get the best classification accuracy.
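The whole pipeline described in steps 1-6 can be sketched as follows. Since the original F.csv is not available, the fruit data here is generated synthetically; the column names and class centers are assumptions for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for F.csv: label in column 0, features in the rest.
rng = np.random.RandomState(0)
rows = []
for label, (mass_c, width_c) in [(0, (150, 7.0)), (1, (120, 4.0)), (2, (160, 7.5))]:
    for _ in range(30):
        rows.append([label, mass_c + rng.randn() * 5, width_c + rng.randn() * 0.3])
data = pd.DataFrame(rows, columns=['label', 'mass', 'width'])

# Separate features (columns from the second on) and labels (first column),
# splitting with the same class proportions in train and test (stratify).
x_train, x_test, y_train, y_test = train_test_split(
    data.iloc[:, 1:], data.iloc[:, 0],
    test_size=0.3, stratify=data.iloc[:, 0], random_state=20)

# Try several K values and keep each model's accuracy in a list.
scores = []
for i in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(x_train, y_train)
    scores.append(knn.score(x_test, y_test))

# Pick the K with the highest accuracy.
best_k = scores.index(max(scores)) + 1
print('best K =', best_k, 'accuracy =', max(scores))
```

On real data the accuracy numbers will differ, but the loop-and-pick structure is the same.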

BP neural network handwritten digit recognition

1. from sklearn.cross_validation import train_test_split
When I ran this line, I got an import error.
After searching on Baidu I found that the cross_validation module is no longer in use; it was merged into the model_selection module. The fix:

from sklearn.model_selection import train_test_split

2. Normalizing the input data
Subtract the minimum of X from every element, then divide by the maximum, and all the data in X is mapped into the range 0 to 1. Why normalize? If X contains values in the hundreds while the weights are typically initialized to a few tenths, multiplying the weights by X feeds values in the tens or hundreds into the activation function. The activation function is an S-shaped curve (a sigmoid): when its input is large, the output approaches 1 and the derivative approaches 0. Weight updates depend on the gradient, so with no gradient the weights cannot change and the network has no way to learn.

# Normalize the input data
X -= X.min()
X /= X.max()
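A quick numeric check of the saturation argument above, using the sigmoid and its derivative (the 255 pixel scale and 0.3 weight are illustrative values, not from the original post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    # Derivative of the sigmoid: s * (1 - s)
    s = sigmoid(z)
    return s * (1 - s)

# A raw pixel-scale input times a small weight still saturates the sigmoid:
raw = 255.0 * 0.3
print(sigmoid(raw), sigmoid_deriv(raw))     # output ~1, derivative ~0

# A normalized input (max pixel mapped to 1) stays in the responsive region:
norm = 1.0 * 0.3
print(sigmoid(norm), sigmoid_deriv(norm))   # output ~0.57, derivative ~0.24
```

With the raw input the gradient is effectively zero, so the weight cannot move; after normalization the derivative is large enough for learning to proceed.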

3. Label binarization
In handwritten digit recognition, the label of an image of a 1 is 1 and the label of a 4 is 4, but this does not match the style of a neural network, whose output-layer neurons each emit a value of 0 or 1.
If an image is the digit 0, its label 0 should be represented like this:
0 -> 1000000000
1 -> 0100000000
2 -> 0010000000
3 -> 0001000000
4 -> 0000100000
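This one-hot encoding can be produced with sklearn's LabelBinarizer; this is a sketch of one way to do it, since the original post does not show which tool it used:

```python
from sklearn.preprocessing import LabelBinarizer

# Digits 0-9 -> ten-element 0/1 vectors, one output neuron per class.
lb = LabelBinarizer()
lb.fit(range(10))

labels = [0, 1, 4]
one_hot = lb.transform(labels)
for d, row in zip(labels, one_hot):
    print(d, '->', ''.join(str(v) for v in row))
# 0 -> 1000000000
# 1 -> 0100000000
# 4 -> 0000100000
```

The position of the single 1 in each vector tells the output layer which neuron should fire for that digit.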

How the BP neural network really works internally is still not clear to me; I will keep fighting.

Origin www.cnblogs.com/Eldq/p/12323177.html