100 Days of Machine Learning | Day 11: Implementing K-NN

100 Days of Machine Learning | Day 1: Data Preprocessing

100 Days of Machine Learning | Day 2: Simple Linear Regression

100 Days of Machine Learning | Day 3: Multiple Linear Regression

100 Days of Machine Learning | Day 4-6: Logistic Regression

100 Days of Machine Learning | Day 7: K-NN

100 Days of Machine Learning | Day 8: Mathematical Principles of Logistic Regression

100 Days of Machine Learning | Day 9-12: SVM

On Day 7 we studied the k-nearest neighbors (k-NN) algorithm: its definition, how it works, the common distance metrics, and how the value of k is chosen. On Day 11 we implement the algorithm through a case study.
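Before turning to scikit-learn, here is a minimal from-scratch sketch of the Day 7 idea (an illustration added here, not code from the original case study): classify a query point by majority vote among its k nearest training points.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    # Euclidean distance from the query point to every training point
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # indices of the k closest training points (y_train is a NumPy array of labels)
    nearest = np.argsort(dists)[:k]
    # majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]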

Step 1: Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Step 2: Import the dataset

dataset = pd.read_csv('../datasets/Social_Network_Ads.csv')
For ease of understanding, we take only Age and EstimatedSalary as features.

X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
Step 3: Split the data into training and test sets

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
Step 4: Feature scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit the scaler on the training set, then transform it
X_test = sc.transform(X_test)        # reuse the training-set statistics on the test set
Step 5: Fit K-NN to the training set

Import the KNeighborsClassifier class from sklearn.neighbors:

from sklearn.neighbors import KNeighborsClassifier
Set the relevant parameters: n_neighbors=5 (the chosen K value; the default is 5), metric='minkowski' (the distance metric; Minkowski distance is the default), and p=2 (a parameter of the Minkowski metric: p=1 gives the Manhattan distance and p=2 the Euclidean distance; the default is 2).
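As a quick numeric illustration of these two special cases (a snippet added here for clarity, not part of the original post):

import numpy as np
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
print(np.abs(a - b).sum())             # Manhattan distance (p=1): 7.0
print(np.sqrt(((a - b) ** 2).sum()))   # Euclidean distance (p=2): 5.0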

classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(X_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')
Step 6: Predict the test set results
y_pred = classifier.predict(X_test)
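matplotlib is imported in Step 1 but not otherwise used in this excerpt; as a minimal sketch for eyeballing the predictions (my addition, assuming the scaled two-feature test set from the steps above):

# color each scaled test point by its predicted class
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred, cmap='coolwarm', edgecolors='k')
plt.xlabel('Age (scaled)')
plt.ylabel('EstimatedSalary (scaled)')
plt.title('K-NN predictions on the test set')
plt.show()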
Step 7: Generate the confusion matrix
A confusion matrix lets us analyze the performance of a classifier; from it a number of metrics can be computed, for example the ROC curve and the accuracy.
from sklearn.metrics import confusion_matrix, classification_report
cm = confusion_matrix(y_test, y_pred)
print(cm)
[[64  4]
 [ 3 29]]
print(classification_report(y_test, y_pred))
Of the 100 test samples, 68 are actually class 0 and 32 are class 1. In this confusion matrix, K-NN predicted 67 (64 + 3) zeros, 3 of which are actually ones, and 33 (29 + 4) ones, 4 of which are actually zeros.
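From these counts the accuracy can be read off directly; as a quick check (a snippet added here, not in the original post):

print((cm[0, 0] + cm[1, 1]) / cm.sum())  # (64 + 29) / 100 = 0.93

Equivalently, sklearn.metrics.accuracy_score(y_test, y_pred) reports the same number.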

Data Download Link:
https://pan.baidu.com/s/1cPBt2DAF2NraOMhbk5-_pQ
extraction code: vl2g
