Python data analysis 7 - data classification

1. Common classification algorithms mainly include:

(1) KNN algorithm

(2) Bayesian method

(3) Decision tree

(4) Artificial Neural Network

(5) Support Vector Machine (SVM)

2. KNN algorithm

(1) KNN application scenarios:

Suppose the known samples include many snacks, many electrical appliances, and many items of clothing. Given an unknown sample, which category should it be assigned to? The KNN algorithm answers this as follows: compute the distance between the unknown sample and every known sample, select the K known samples with the smallest distances, and assign the unknown sample to the category that occurs most often among those K samples.
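To make the scenario concrete, here is a minimal sketch on a toy dataset. The feature values, the label names, and the choice of Euclidean distance with K = 3 are assumptions made for illustration only; they do not come from any real data.

```python
import numpy as np
from collections import Counter

# Hypothetical 2-D feature vectors for six known samples (values are made up).
known = np.array([
    [1.0, 1.0],   # snack
    [1.2, 0.8],   # snack
    [5.0, 5.0],   # appliance
    [5.2, 4.8],   # appliance
    [9.0, 1.0],   # clothing
    [8.8, 1.2],   # clothing
])
labels = ["snack", "snack", "appliance", "appliance", "clothing", "clothing"]
unknown = np.array([1.5, 1.5])

# Euclidean distance from the unknown sample to every known sample.
distances = np.sqrt(((known - unknown) ** 2).sum(axis=1))

# Indices of the K known samples with the smallest distances.
K = 3
nearest = np.argsort(distances)[:K]

# Majority vote among the labels of those K neighbours.
votes = Counter(labels[i] for i in nearest)
predicted = votes.most_common(1)[0][0]
print(predicted)  # snack
```

Two of the three nearest neighbours are snacks, so the unknown sample is classified as a snack.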

(2) KNN algorithm implementation steps

① Calculate the distance between the unknown sample and every point in the dataset of known classes.

② Sort the points by distance in ascending order.

③ Select the first k points (that is, the k points with the smallest distances).

④ Count how often each category occurs among those k points.

⑤ Return the category that occurs most often among the k points as the predicted class of the unknown sample.

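A Python implementation of these steps:

import numpy as np
import operator as op

def kNN(k, datasets, labels, x):
    # Number of known samples (rows) in the training set
    datasize = datasets.shape[0]
    # Step 1: Euclidean distance from x to every known sample
    diffMat = (np.tile(x, (datasize, 1)) - datasets) ** 2
    distance = (diffMat.sum(axis=1)) ** 0.5
    # Step 2: indices of the known samples, sorted by ascending distance
    sort_distance = np.argsort(distance)
    # Steps 3 and 4: count the labels of the k nearest samples
    dic_k = {}
    for i in range(k):
        dic_key = labels[sort_distance[i]]
        dic_k[dic_key] = dic_k.get(dic_key, 0) + 1
    # Step 5: return the label with the highest count
    dic_count = sorted(dic_k.items(), key=op.itemgetter(1), reverse=True)
    return dic_count[0][0]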
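As a usage illustration, the sketch below shows a more compact variant that keeps the same argument order (k, datasets, labels, x). The name `knn_predict` and the four training points are hypothetical. It relies on NumPy broadcasting instead of `tile` and on `collections.Counter` for the vote, which is one possible design choice rather than the only way to write it.

```python
import numpy as np
from collections import Counter

def knn_predict(k, datasets, labels, x):
    """Hypothetical compact variant of kNN with the same argument order."""
    # Broadcasting subtracts x from every row, so np.tile is not needed.
    distance = np.linalg.norm(datasets - x, axis=1)
    # Labels of the k training points closest to x.
    nearest_labels = [labels[i] for i in np.argsort(distance)[:k]]
    # most_common(1) returns [(label, count)] for the most frequent label.
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up training data: two small clusters labelled "A" and "B".
datasets = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ["A", "A", "B", "B"]

print(knn_predict(3, datasets, labels, np.array([0.9, 1.2])))  # A
print(knn_predict(3, datasets, labels, np.array([0.1, 0.2])))  # B
```

Ties in the vote are broken by whichever label was counted first, so an odd k is a common choice when there are two classes.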

 
