"Introduction to Data Mining" Lab Class - Experiment 4: KNN and Naive Bayes


I. Purpose of the Experiment

1. Master the principles of KNN

2. Master the principles of Naive Bayes

3. Learn to use KNN and Naive Bayes to solve classification problems

II. Experimental Tools

1. Anaconda

2. sklearn

III. Experiment Introduction

1. KNN

How KNN (K-Nearest Neighbor) works: there is a sample data set, also called the training set, in which every sample carries a label, so we know which category each sample belongs to. When new, unlabeled data arrives, we compare each of its features with the corresponding features of the samples in the training set and extract the class labels of the most similar samples (the nearest neighbors). In general, we only consider the k most similar samples; this is the origin of the "k" in the k-nearest neighbor algorithm, and usually k is an integer no greater than 20. Finally, the class that appears most often among these k most similar samples is taken as the class of the new data.

Note: KNN has no explicit training process; it is a typical example of "lazy learning". It simply stores the data during the training phase, so the training cost is essentially zero, and defers all processing until a test sample arrives.
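The select-and-vote procedure described above can be sketched in a few lines of Python; the 2-D points and labels below are toy data made up for illustration:

```python
from collections import Counter
import math

# Toy labeled training set: 2-D points with class labels (hypothetical data).
train = [((1.0, 1.0), 'A'), ((1.2, 0.8), 'A'), ((3.0, 3.0), 'B'),
         ((3.2, 2.9), 'B'), ((1.1, 1.2), 'A')]

def knn_predict(x, train, k=3):
    """Classify point x by majority vote among its k nearest neighbors."""
    # Sort training samples by Euclidean distance to x and keep the k closest.
    neighbors = sorted(train, key=lambda s: math.dist(x, s[0]))[:k]
    # Majority vote over the k nearest labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict((1.0, 0.9), train))  # a point near cluster A → 'A'
```

Note that "training" here is just storing `train`; all the work happens at query time, which is exactly the lazy-learning behavior described above.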

2. Naive Bayes

The core of the Naive Bayes classifier is Bayes' rule, which is given by the following formula:

p(c|x) = \frac{p(x|c)p(c)}{p(x)}

In machine learning, the Naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem, where "naive" means the model makes a strong independence assumption for every feature: correlations between features are not taken into account.

A well-known application of the Naive Bayes classifier is spam filtering; classifying mail by its text features is a common way to identify spam. The classifier selects tokens (usually words in the message) to measure their correlation with spam and non-spam mail, and then uses Bayes' theorem to compute the probability that a message is spam.
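As a toy illustration of Bayes' rule applied to the spam example, suppose we know the prior probability of spam and how often the token "free" appears in each class (all numbers below are hypothetical):

```python
# Hypothetical prior probabilities and per-class token likelihoods.
p_spam = 0.4                 # p(spam)
p_ham = 0.6                  # p(not spam)
p_word_given_spam = 0.30     # p("free" | spam)
p_word_given_ham = 0.05      # p("free" | not spam)

# Evidence p("free") by the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham

# Bayes' rule: p(spam | "free") = p("free" | spam) * p(spam) / p("free")
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.8
```

So seeing the token "free" raises the probability of spam from 0.4 to 0.8 under these assumed numbers; a real filter combines many tokens under the independence assumption.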

IV. Experiment Content

1. Use KNN to classify the iris data.

(1) The data can be loaded as follows:

from sklearn.datasets import load_iris
iris = load_iris()  # fetch the iris data set from the sklearn built-in data sets

(2) Classify the data with a KNN classifier

First, import the iris data set


Obtain and partition the data


Declare, train, and evaluate the model


Test on sample data

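The steps above (import, partition, declare/train/evaluate, test a sample) can be sketched as follows; the split ratio and `n_neighbors` value are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Import the iris data set.
iris = load_iris()
X, y = iris.data, iris.target

# Partition the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Declare, train, and evaluate the model.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))

# Test on one held-out sample.
sample = X_test[:1]
print("predicted class:", iris.target_names[knn.predict(sample)[0]])
```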

2. Use Naive Bayes to model the iris data

Output the per-class probability values for the test samples and the predicted class labels, and
return the model's mean accuracy (score) on the test samples with respect to the given labels.
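A minimal GaussianNB sketch for these outputs, using the same split settings as the KNN example above (the ratio and random seed are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

nb = GaussianNB()
nb.fit(X_train, y_train)

# Per-class probability for each test sample.
proba = nb.predict_proba(X_test)
# Predicted class labels.
pred = nb.predict(X_test)
# Mean accuracy on the test set against the given labels.
print("accuracy:", nb.score(X_test, y_test))
print("first sample class probabilities:", proba[0].round(3))
```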

3. Without using the sklearn classifiers, write your own KNN program (Python is recommended) and classify the iris data.
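One possible hand-written KNN sketch for this task, where sklearn is used only to load the data, not to classify (the 70/30 split and k=5 are arbitrary choices):

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris  # used only for the data set

def knn_classify(X_train, y_train, x, k=5):
    """Predict the label of x by majority vote among the k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every sample
    nearest = np.argsort(dists)[:k]              # indices of the k closest samples
    return Counter(y_train[nearest]).most_common(1)[0][0]

iris = load_iris()
rng = np.random.default_rng(0)
idx = rng.permutation(len(iris.data))
split = int(0.7 * len(idx))
X_train, y_train = iris.data[idx[:split]], iris.target[idx[:split]]
X_test, y_test = iris.data[idx[split:]], iris.target[idx[split:]]

pred = np.array([knn_classify(X_train, y_train, x) for x in X_test])
print("hand-written KNN accuracy:", (pred == y_test).mean())
```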

4. (Optional) Without using the sklearn classifiers, write your own Naive Bayes program (Python is recommended) and classify the iris data.
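For the optional task, a Gaussian Naive Bayes can be hand-written by estimating per-class feature means, variances, and class priors, then choosing the class with the largest log-posterior (a sketch; the class name, split, and seed are my own choices):

```python
import numpy as np
from sklearn.datasets import load_iris  # used only for the data set

class SimpleGaussianNB:
    """Gaussian Naive Bayes: per-class feature means/variances plus class priors."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.vars = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.priors = np.array([(y == c).mean() for c in self.classes])
        return self

    def predict(self, X):
        # For each x, maximize log p(c) + sum_i log N(x_i; mean_ci, var_ci) over c.
        log_post = []
        for x in X:
            ll = -0.5 * np.sum(np.log(2 * np.pi * self.vars)
                               + (x - self.means) ** 2 / self.vars, axis=1)
            log_post.append(np.log(self.priors) + ll)
        return self.classes[np.argmax(log_post, axis=1)]

iris = load_iris()
rng = np.random.default_rng(1)
idx = rng.permutation(len(iris.data))
split = int(0.7 * len(idx))
X_train, y_train = iris.data[idx[:split]], iris.target[idx[:split]]
X_test, y_test = iris.data[idx[split:]], iris.target[idx[split:]]

model = SimpleGaussianNB().fit(X_train, y_train)
pred = model.predict(X_test)
print("hand-written NB accuracy:", (pred == y_test).mean())
```

The small `1e-9` added to each variance avoids division by zero for near-constant features; sklearn's GaussianNB applies a similar variance smoothing.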

V. Experiment Summary (write down what you gained from this experiment, problems encountered, etc.)

In this experiment I taught myself how to use sklearn's KNeighborsClassifier and GaussianNB for modeling and classification, calling the packaged methods to train and test the models.
However, my understanding of KNN is still insufficient and my command of Python is still weak, so I did not manage to write my own KNN program for iris classification. I need to keep studying!


Origin www.cnblogs.com/wonker/p/11062717.html