【Introduction to KNN Algorithm】

The K-nearest neighbors (KNN) algorithm is an instance-based learning method that classifies a new sample, or predicts a numeric value for it, from the K training samples closest to it. The main idea is that if most of a sample's K nearest neighbors belong to a certain category, the sample probably belongs to that category as well. In classification tasks, KNN predicts by majority vote among the neighbors; in regression tasks, it uses the average of the K nearest neighbors' target values as the prediction.
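The regression case can be illustrated with a minimal sketch. It assumes scikit-learn's KNeighborsRegressor with uniform weights, and the toy data is made up for illustration; it is not part of the original project.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy 1-D training data, invented for this illustration
X_train = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y_train = np.array([1.0, 2.0, 3.0, 10.0, 11.0])

# With uniform weights, the prediction is the plain mean of the K nearest targets
reg = KNeighborsRegressor(n_neighbors=3, weights="uniform")
reg.fit(X_train, y_train)

# The three training points closest to 2.2 are 2.0, 3.0 and 1.0
x_new = np.array([[2.2]])
prediction = reg.predict(x_new)[0]

# Compare with the average computed by hand
manual_mean = np.mean([1.0, 2.0, 3.0])
print(prediction, manual_mean)
```

The two printed values should agree, since the model only averages the targets of the three nearest training points.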

KNN algorithm steps

  1. Calculate the distance: Given a training dataset and a new sample, first calculate the distance between the new sample and every sample in the training set. Commonly used distance metrics include Euclidean distance and Manhattan distance.

  2. Determine the K value: Choose an appropriate K, since it affects the performance of the algorithm. A smaller K makes predictions more sensitive to noise and may lead to overfitting; a larger K reduces the effect of noise but may blur local differences between categories.

  3. Determine neighbors: Select the K samples closest to the new sample as neighbors.

  4. Classification or regression: For classification tasks, the category of the new sample is determined by a majority vote among its K neighbors; for regression tasks, the predicted value is the average of the neighbors' target values.
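The four steps above can be sketched in plain Python for the classification case. This is an illustrative sketch, not an optimized implementation: the function names (euclidean, manhattan, knn_classify) and the toy data are hypothetical, and ties in the vote are not handled specially.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_classify(train_X, train_y, query, k=3, distance=euclidean):
    # Step 1: distance from the query to every training sample
    scored = [(distance(x, query), label) for x, label in zip(train_X, train_y)]
    # Steps 2-3: keep the k closest samples (k is chosen by the caller)
    neighbors = sorted(scored, key=lambda pair: pair[0])[:k]
    # Step 4: majority vote over the neighbors' labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D data with two well-separated groups, invented for illustration
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

label_near_a = knn_classify(train_X, train_y, (1.2, 1.4), k=3)
label_near_b = knn_classify(train_X, train_y, (6.4, 6.2), k=3, distance=manhattan)
print(label_near_a, label_near_b)
```

Passing the distance function as a parameter keeps the neighbor search and the vote independent of the metric, so Euclidean and Manhattan distance can be swapped without touching the rest of the code.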

KNN hands-on project

Project Title: Handwritten Digit Recognition

In this project, we will use the KNN algorithm to recognize handwritten digits. We will use the MNIST dataset, which contains 70,000 images of handwritten digits, each with a corresponding label. We will first convert each image into a feature vector and then use the KNN algorithm to classify it.
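Converting an image into a feature vector can be sketched as follows. The sketch assumes a 28x28 grayscale image with pixel values from 0 to 255, and uses a random array as a stand-in for a real digit. The mnist_784 data loaded in the project code is already stored in this flattened form, so this step is shown only to make the idea concrete.

```python
import numpy as np

# Hypothetical 28x28 grayscale image with pixel intensities in 0..255
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28))

# Flatten row by row into a 784-dimensional vector and scale to [0, 1]
feature_vector = image.reshape(-1).astype(np.float64) / 255.0

print(feature_vector.shape)
```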

Code example:

# Import the required libraries
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the MNIST dataset (downloaded from OpenML on first use)
mnist = fetch_openml('mnist_784')
X, y = mnist['data'], mnist['target']

# Scale pixel values to the range 0 to 1
X = X / 255.0

# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the KNN classifier with K = 5
knn = KNeighborsClassifier(n_neighbors=5)

# Fit the model on the training set
knn.fit(X_train, y_train)

# Predict on the test set
y_pred = knn.predict(X_test)

# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

In this example, we use the scikit-learn library to implement the KNN algorithm and perform handwritten digit recognition on the MNIST dataset.
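The project fixes K at 5, but K can also be chosen by comparing cross-validated accuracy across a few candidates. The sketch below is one possible approach, not the original author's method. To stay fast and avoid a download, it assumes scikit-learn's small bundled 8x8 digits dataset (load_digits) rather than MNIST, and the candidate list is arbitrary.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Small 8x8 digits dataset bundled with scikit-learn (not MNIST)
X, y = load_digits(return_X_y=True)

# Arbitrary candidate values of K, chosen for illustration
candidate_ks = [1, 3, 5, 7, 9]
mean_scores = {}
for k in candidate_ks:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validated accuracy for this K
    scores = cross_val_score(knn, X, y, cv=5)
    mean_scores[k] = scores.mean()

# Pick the K with the highest mean accuracy
best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best K:", best_k)
```

The same loop would work on the MNIST split above, at a much higher running cost, because KNN compares every query against the whole training set.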

Origin blog.csdn.net/qq_66726657/article/details/131926392