Machine Learning: KNN algorithm implemented in Python

  KNN (K-Nearest Neighbors) rests on the idea that each sample can be represented by its k nearest neighbors. KNN is a supervised machine learning algorithm; when used as a classification model, its predictions are discrete class labels.

  How the KNN algorithm works:

  1, compute the distance (a measure of similarity) between each test sample and every training sample;

  2, sort the training samples in ascending order of distance;

  3, take the k nearest neighbors and find the mode of their target values;

  4, use that mode as the predicted result for the test sample.
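  The four steps above can be sketched for a single test sample as follows. This is a minimal illustration, not the article's own implementation: the function name knn_predict, the choice of Euclidean distance, and the toy data are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, sample, k):
    """Predict the label of one sample from its k nearest training samples."""
    # Step 1: Euclidean distance from the sample to every training sample.
    distances = [math.dist(row, sample) for row in train_x]
    # Step 2: training indices sorted by ascending distance.
    order = sorted(range(len(train_x)), key=lambda i: distances[i])
    # Step 3: target values of the k nearest neighbors.
    nearest_labels = [train_y[i] for i in order[:k]]
    # Step 4: the mode of those target values is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (assumed): two well-separated clusters labelled 0 and 1.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_x, train_y, (0.5, 0.5), k=3))
print(knn_predict(train_x, train_y, (5.5, 5.5), k=3))
```

  Sorting every distance keeps the sketch close to the steps as listed; for large training sets a partial selection of the k smallest distances would avoid the full sort.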

  

  KNN in Python: a handwritten digit example

  Each handwritten digit is stored as a txt file made up of 0s and 1s. We first read each txt file and flatten its 32 * 32 grid into 1024 feature values; the leading part of the file name (for example, the 7 in 7_151.txt) is the target value.

  This gives us the training set and the test set.

  The handwritten digit txt files are read and saved as numpy.ndarray data in train.npy and test.npy.

import numpy as np
import os


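def build_data(file_path):
    """
    Load the handwritten digit files and convert them to a numpy.ndarray
    :param file_path: directory containing the txt files
    :return: data_arr, an ndarray with 1024 feature columns and 1 target column
    """
    file_name_list = os.listdir(file_path)
    data_arr = np.zeros(shape=(len(file_name_list), 1025))
    for file_index, file_name in enumerate(file_name_list):
        # np.str was removed in NumPy 1.24; the builtin str works in all versions
        file_content = np.loadtxt(os.path.join(file_path, file_name), dtype=str)
        file_str = ''.join(file_content)
        data_arr[file_index, : 1024] = [int(s) for s in file_str]
        # The number before the underscore in the file name is the target value
        data_arr[file_index, -1] = int(file_name.split('_', 1)[0])
    return data_arr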


def save_file(file_path, data):
    """
    Save the training set or test set data
    :param file_path: file name (without extension) to create under ./data
    :param data: data to save
    :return: None
    """
    if not os.path.exists('./data'):
        os.makedirs('./data')
    np.save('./data/' + file_path, data)
    print("{} saved successfully!".format('./data/' + file_path + '.npy'))


def main():
    train = build_data('./trainingDigits')
    test = build_data('./testDigits')

    print(train)
    print(train.shape)
    print(test)
    print(test.shape)
    save_file('train', train)
    save_file('test', test)


if __name__ == '__main__':
    main()

  


 

  In both the training set and the test set, the first 1024 columns are the feature values and the last column is the target value.

  KNN algorithm implementation:
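  A minimal sketch of such an implementation, assuming the arrays have the layout just described (1024 feature columns followed by one target column). The function names knn_predict, knn_score and fake_digits are hypothetical, Euclidean distance is an assumed choice of metric, and small random arrays stand in for train.npy and test.npy so that the block runs on its own; k_list and score_list follow the names used in the text.

```python
import numpy as np


def knn_predict(train, sample, k):
    """Predict the target of one 1024-value sample by majority vote."""
    features, targets = train[:, :1024], train[:, -1]
    # Euclidean distance from the sample to every training row.
    distances = np.sqrt(((features - sample) ** 2).sum(axis=1))
    # Targets of the k closest training rows.
    nearest = targets[np.argsort(distances)[:k]].astype(int)
    # Mode of the neighbor targets (ties resolve to the smallest label).
    return np.bincount(nearest).argmax()


def knn_score(train, test, k):
    """Fraction of test rows whose prediction matches the last column."""
    predictions = np.array([knn_predict(train, row[:1024], k) for row in test])
    return float((predictions == test[:, -1]).mean())


# Stand-in data (assumption). With the real files, load them instead:
# train = np.load('./data/train.npy'); test = np.load('./data/test.npy')
rng = np.random.default_rng(0)


def fake_digits(n_per_class):
    """Hypothetical generator: class 0 is mostly 0s, class 1 is mostly 1s."""
    rows = []
    for label, p in ((0, 0.1), (1, 0.9)):
        x = (rng.random((n_per_class, 1024)) < p).astype(float)
        rows.append(np.hstack([x, np.full((n_per_class, 1), label)]))
    return np.vstack(rows)


train, test = fake_digits(20), fake_digits(5)
k_list = [1, 3, 5, 7]
score_list = [knn_score(train, test, k) for k in k_list]
for k, score in zip(k_list, score_list):
    print("k={}: accuracy={:.3f}".format(k, score))
```

  The stand-in classes are deliberately easy to separate, so the printed accuracies say nothing about performance on the real digit files.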

 

  score_list holds the accuracy obtained for each value of k in k_list. The results show that different values of k give different accuracy.

  KNN algorithm advantages:

  1, one of the simplest classification algorithms there is: intuitive, easy to understand, and easy to implement;

  2, well suited to multi-class problems, such as user recommendation;

  3, works with both numerical and discrete data, and can be used for regression as well as classification;

  4, relatively insensitive to outliers, since with a large enough k a single outlier is outvoted by the other neighbors.
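  To illustrate the regression use mentioned in point 3, here is a small sketch in which the prediction is the mean of the k nearest targets instead of their mode. The function name knn_regress and the toy data are assumptions made for this example.

```python
import math


def knn_regress(train_x, train_y, sample, k):
    """Predict a numeric target as the mean of the k nearest targets."""
    # Training indices sorted by ascending Euclidean distance to the sample.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], sample))
    nearest = [train_y[i] for i in order[:k]]
    return sum(nearest) / len(nearest)


# Toy data (assumed): the target is ten times the single feature.
train_x = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [10.0, 20.0, 30.0, 100.0]
# The three nearest neighbors of 2.1 have targets 20, 30 and 10.
print(knn_regress(train_x, train_y, (2.1,), k=3))
```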

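  KNN algorithm drawbacks:

  1, high time and space complexity, because the whole training set must be stored and every prediction is compared against all of it;

  2, the prediction depends on the choice of k: different values of k can give different results;

  3, lazy learning: no model is built in advance, so all the computation is deferred to prediction time.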
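  Drawback 2 can be seen on a tiny hand-made data set. In this sketch (the helper knn_predict and the data are assumptions for illustration), the single closest point has label 'a', but two slightly more distant points have label 'b', so the prediction changes between k=1 and k=3.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, sample, k):
    """Majority vote among the k training points closest to the sample."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], sample))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# One 'a' point is closest to the query, but two 'b' points are nearby.
train_x = [(0.0,), (1.0,), (1.2,)]
train_y = ['a', 'b', 'b']
query = (0.4,)

# k=1 follows the single nearest point; k=3 follows the majority.
predictions = {k: knn_predict(train_x, train_y, query, k) for k in (1, 3)}
print(predictions)
```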

 

 

 

 


Origin www.cnblogs.com/bestwishfang/p/11869993.html