"Introduction to Data Mining" experimental class - Experiment 7, data mining K-means clustering algorithm

Experiment 7: K-means clustering algorithm

First, the purpose of the experiment

1. Understand the basic principles of the K-means clustering algorithm

2. Learn to implement the K-means algorithm in Python

Second, the experimental tools

1. Anaconda

2. sklearn

3. matplotlib

Third, introduction to the experiment

1. About the K-means algorithm

The k-means algorithm is a clustering algorithm. Clustering groups data by similarity: data objects that are highly similar are placed in the same cluster, and data objects that are highly dissimilar are placed in different clusters. The biggest difference between clustering and classification is that clustering is an unsupervised process, meaning the data objects are processed without any prior knowledge, whereas classification is a supervised process that relies on a training data set carrying prior knowledge.
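The contrast between the two processes can be sketched with sklearn as follows. The six sample points, the labels `y`, and the choice of `KNeighborsClassifier` as the supervised example are assumptions made only for illustration; they are not part of the experiment itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated groups of 2-D points (made-up illustration data).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])  # known labels: the "prior knowledge"

# Unsupervised: only X is passed; the grouping is discovered from the data.
cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised: the classifier is trained on X together with the labels y.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
predicted = clf.predict([[0.1, 0.1], [5.1, 5.1]])

print("cluster ids:", cluster_ids)      # the numbering of clusters is arbitrary
print("predicted labels:", predicted)
```

Note that the cluster ids returned by K-means are only group numbers; unlike the classifier's output, they carry no meaning taken from the labels.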

2. K-means algorithm theory

In k-means, k is the number of clusters and "means" refers to the mean of the data objects in each cluster (each cluster is described by this mean), which is why the method is called the k-means algorithm. k-means is a partition-based clustering algorithm that uses distance as the measure of similarity between data objects: the smaller the distance between two objects, the more similar they are and the more likely they are to belong to the same cluster. There are many ways to compute the distance between data objects; k-means usually uses the Euclidean distance.
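To make the theory concrete, here is a minimal NumPy sketch of the usual iteration (assign each point to the nearest mean by Euclidean distance, then recompute each mean). The function name `kmeans`, its parameters, the random initialization from data points, and the sample data are all assumptions for illustration; sklearn's `KMeans` uses a more careful initialization.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means iteration with Euclidean distance (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance from every point to every center, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # index of the nearest center
        # New center = mean of the points assigned to it
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # the means stopped moving
        centers = new_centers
    return labels, centers

# Made-up data: two tight groups of points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
labels, centers = kmeans(X, k=2)
print(labels)
print(centers)
```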

Fourth, the experiment content

1. Generate 100 random numbers and run k-means clustering on them (k = 3, 4, 5, 6), plotting the results with matplotlib

1) Create 100 random two-dimensional samples as the training set
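One possible way to build such a training set with NumPy is shown below. The uniform distribution on [0, 1) and the fixed seed are assumptions; the original screenshot does not survive, so the author's exact generator is unknown.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed (an assumption) so runs are repeatable
X = rng.random((100, 2))         # 100 samples, 2 features, uniform in [0, 1)
print(X.shape)                   # (100, 2)
```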

2) Clustering with k = 3
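A sketch of this step with sklearn and matplotlib might look like the following. The random data, the seed, the `n_init` and `random_state` values, and the output file name `kmeans_k3.png` are assumptions, not the author's original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.random((100, 2))  # same kind of training set as in step 1)

# Fit K-means with 3 clusters.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = model.labels_              # cluster index of each sample
centers = model.cluster_centers_    # one mean per cluster

# Color each sample by its cluster and mark the cluster means.
fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=25)
ax.scatter(centers[:, 0], centers[:, 1], c="red", marker="x", s=100, label="centers")
ax.set_title("K-means, k = 3")
ax.legend()
fig.savefig("kmeans_k3.png")
```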

3) Clustering with k = 4

4) Clustering with k = 5

5) Clustering with k = 6, and observing the cluster distribution
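To compare the cluster distributions, the values of k used in this experiment can be run in one loop and drawn side by side. This is only a sketch: the data, seed, 2x2 layout, the `cluster_sizes` dictionary, and the file name `kmeans_k3_to_k6.png` are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.random((100, 2))

fig, axes = plt.subplots(2, 2, figsize=(8, 8))
cluster_sizes = {}
for ax, k in zip(axes.ravel(), (3, 4, 5, 6)):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Number of samples that ended up in each cluster.
    cluster_sizes[k] = np.bincount(model.labels_, minlength=k)
    ax.scatter(X[:, 0], X[:, 1], c=model.labels_, cmap="viridis", s=20)
    ax.scatter(*model.cluster_centers_.T, c="red", marker="x", s=80)
    ax.set_title(f"k = {k}")
    # inertia_ is the sum of squared distances of samples to their nearest center.
    print(k, cluster_sizes[k], round(model.inertia_, 3))

fig.tight_layout()
fig.savefig("kmeans_k3_to_k6.png")
```

On uniformly random data there is no natural grouping, so a larger k mostly cuts the plane into smaller regions and the printed inertia tends to shrink as k grows.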

2. Apply the K-means clustering algorithm to the iris data set (and plot the result with matplotlib).
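A minimal sketch for the iris part is given below, using sklearn's bundled `load_iris` data set. The choice of k = 3, plotting the petal length and petal width columns, and the file name `kmeans_iris.png` are assumptions; the original screenshot does not show which settings the author used.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data  # 150 samples x 4 features; the species labels are not used

# Cluster on all four features.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
centers = model.cluster_centers_

# Plot two of the four features (columns 2 and 3: petal length and width).
fig, ax = plt.subplots()
ax.scatter(X[:, 2], X[:, 3], c=model.labels_, cmap="viridis", s=25)
ax.scatter(centers[:, 2], centers[:, 3], c="red", marker="x", s=100)
ax.set_xlabel(iris.feature_names[2])
ax.set_ylabel(iris.feature_names[3])
ax.set_title("K-means on iris, k = 3")
fig.savefig("kmeans_iris.png")
```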

Fifth, experiment summary (what was learned in this experiment, problems encountered, etc.)

Through the study and practice in this experiment, I learned the basic principles of the K-means algorithm and how to build clusters easily with sklearn. By plotting with matplotlib, I could also see how the clusters are distributed for K = 3, 4, 5.

Origin www.cnblogs.com/wonker/p/11079333.html