Summary on Implementing the k-Nearest Neighbor Algorithm

When to use the KNN algorithm?

Answer: The KNN algorithm can be used for both classification and regression. In industry, however, it is mainly applied to classification problems. When evaluating an algorithm, we usually consider three criteria: 1. Model interpretability, 2. Computation time, 3. Predictive power.
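To illustrate the two uses, here is a minimal sketch with scikit-learn (the tiny one-feature dataset is made up for illustration): a classifier takes the majority vote of the k nearest labels, while a regressor averages the k nearest targets.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[0], [1], [2], [3]]  # toy one-dimensional feature values

# Classification: majority vote among the 3 nearest neighbors.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, [0, 0, 1, 1])
print(clf.predict([[1.1]]))   # neighbors are 1, 2, 0 -> labels 0, 1, 0 -> class 0

# Regression: mean of the 2 nearest targets.
reg = KNeighborsRegressor(n_neighbors=2)
reg.fit(X, [0.0, 0.0, 1.0, 1.0])
print(reg.predict([[1.5]]))   # neighbors 1 and 2 -> mean of 0.0 and 1.0 -> 0.5
```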

What is the principle of the KNN algorithm?        

Answer: We know the class label of every point in the training set. To classify a new point, we compute its distance to every training point, take the K closest ones, and assign the label that occurs most frequently among them.
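The principle can be sketched in a few lines of plain Python (the two-cluster sample set below is hypothetical, chosen only to make the vote obvious):

```python
from collections import Counter

# Hypothetical labeled sample set: (feature vector, class label).
samples = [((1.0, 1.0), "red"), ((1.2, 0.8), "red"),
           ((4.0, 4.0), "blue"), ((4.2, 3.9), "blue")]

def knn_predict(query, k=3):
    # Sort the whole sample set by squared Euclidean distance to the query.
    by_distance = sorted(samples,
                         key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query)))
    # Majority vote among the k nearest labels.
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

print(knn_predict((1.1, 0.9)))  # two "red" neighbors outvote one "blue" -> "red"
```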

How to choose factor K?

The first thing to understand is the effect K has on the algorithm. In the example above, with only 6 training points in total, a given value of K determines the boundary between the two classes. Now let us see how that boundary changes for different values of K.

[Figure: class boundaries for different values of K]

Looking closely, we can see that as K increases, the boundary becomes smoother. As K tends toward the size of the training set, the entire region eventually turns blue or red, depending on whether blue or red points are in the overall majority.
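This limiting behavior is easy to verify numerically (the 6-point training set below is hypothetical, with red in the majority): a small K respects the local structure, while K equal to the whole set always returns the majority class.

```python
from collections import Counter

# Hypothetical 6-point training set: 4 red, 2 blue (red dominates overall).
train = [((1, 1), "red"), ((2, 1), "red"), ((1, 2), "red"),
         ((2, 2), "red"), ((8, 8), "blue"), ((9, 8), "blue")]

def predict(query, k):
    nearest = sorted(train,
                     key=lambda s: (s[0][0] - query[0]) ** 2
                                 + (s[0][1] - query[1]) ** 2)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Near the blue cluster, a small k follows the local neighborhood...
print(predict((8.5, 8.0), k=1))   # "blue"
# ...but k equal to the whole training set always yields the majority class.
print(predict((8.5, 8.0), k=6))   # "red"
```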

We can implement the KNN model with the following steps:

  • Download the data.

  • Initialize a value of K.

  • Iterate over the data points in the training set to make predictions.

STEPS:

  • Calculate the distance between the test data and each row of training data. We choose the most commonly used Euclidean distance as the metric; other metrics include Chebyshev distance, cosine similarity, etc.

  • Sort the distances in ascending order.

  • Take the top K points from the sorted array.

  • Find the most frequent category among these points.

  • Return that category as the predicted class.
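The steps above can be sketched as a scratch implementation (the `knn_classify` helper and its toy two-cluster data are illustrative, not the article's exact code):

```python
import math
from collections import Counter

def euclidean(a, b):
    """Step 1: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(test_point, X_train, y_train, k):
    # Step 2: sort training indices by distance to the test point, ascending.
    order = sorted(range(len(X_train)),
                   key=lambda i: euclidean(test_point, X_train[i]))
    # Step 3: keep the top k indices.
    top_k = order[:k]
    # Steps 4-5: the most frequent class among them is the prediction.
    votes = Counter(y_train[i] for i in top_k)
    return votes.most_common(1)[0][0], top_k

# Toy data (made up for illustration): two well-separated clusters.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_classify((0.2, 0.2), X, y, k=3))  # ('a', [0, 1, 2])
```

Returning the neighbor indices alongside the label makes it easy to cross-check against another implementation later.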

We will use the popular Iris dataset to build the KNN model. You can download it from the following link:

https://gist.githubusercontent.com/gurchetan1000/ec90a0a8004927e57c24b20a6f8c8d35/raw/fcd83b35021a4c1d7f1f1d5dc83c07c8ffc0d3e2/iris.csv)

Copy the data into a plain-text file and rename it to iris.csv (a comma-separated file, not an Excel workbook).

After running the code, we find that both models predict the same class ("Iris-virginica") and the same nearest neighbors ([141 139 120]). So we can conclude that the model is behaving as expected.
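The cross-check described above can be reproduced with scikit-learn's KNeighborsClassifier, whose `kneighbors()` method returns the indices of the nearest training rows. Loading Iris from `sklearn.datasets` instead of the CSV, and holding out the last row as the test point, are assumptions of this sketch, so the neighbor indices it prints need not match the article's exactly.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Hold out the last row as the test point; train on the remaining 149.
X_train, y_train = X[:-1], y[:-1]
test_point = X[-1:]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

print(iris.target_names[model.predict(test_point)[0]])
# kneighbors() returns the distances and indices of the 3 nearest rows,
# so a scratch model's neighbors can be compared index by index.
distances, indices = model.kneighbors(test_point)
print(indices[0])
```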

    Original link: https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/                                                                                                                                                          
