1. Class practice
    # Class practice
    from sklearn.datasets import load_iris

    # Load the iris data set
    iris = load_iris()
    iris
    iris.keys()
    data = iris['data']     # iris feature data
    target = iris.target    # labels: which species each flower belongs to
    iris.feature_names      # feature names: sepal length, sepal width, petal length, petal width
    # ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
2. Homework
1). Work through the k-means clustering process by hand with playing cards: more than 30 cards, 3 clusters.
Round | Row | Cluster 1 | Cluster 2 | Cluster 3 |
First classification | center | 1 | 8 | 13 |
First classification | sum | 18 | 127 | 86 |
First classification | mean | 18/8 | 127/18 | 86/7 |
Second classification | center | 2.25 | 7.05 | 12.28 |
Second classification | sum | 18 | 107 | 106 |
Second classification | mean | 18/8 | 107/16 | 106/9 |
Third classification | center | 2.25 | 6.68 | 11.77 |
Third classification | sum | 18 | 107 | 106 |
Third classification | mean | 18/8 | 107/16 | 106/9 |
Final clustering | center | 2.25 | 6.68 | 11.77 |
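The arithmetic in the table can be checked with a few lines of Python. This is only a small sketch: it copies the sums and card counts from the rows above (the names `rounds`, `truncate2` and `updated` are my own, not part of the exercise) and recomputes the center that each round hands to the next. The two-decimal values in the table appear to be truncated rather than rounded, so the sketch assumes truncation as well.

```python
import math

# (sum, count) per cluster, copied from the "sum" and "mean" rows of the table
rounds = {
    "first": [(18, 8), (127, 18), (86, 7)],
    "second": [(18, 8), (107, 16), (106, 9)],
    "third": [(18, 8), (107, 16), (106, 9)],
}

def truncate2(x):
    """Cut a number to two decimals without rounding (assumed table convention)."""
    return math.floor(x * 100) / 100

# Mean of each cluster = the center used in the following round
updated = {
    name: [truncate2(total / count) for total, count in groups]
    for name, groups in rounds.items()
}
for name, centers in updated.items():
    print(name, centers)

# k-means stops once an update step leaves the centers unchanged
converged = updated["second"] == updated["third"]
print("converged:", converged)
```

The counts in every round add up to 33 cards, which matches the "more than 30 cards" setup, and the second and third rounds give the same means, which is why the clustering stops there.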
2). * Write the k-means algorithm yourself, cluster the iris petal length data with it, and display the result with a scatter plot. (Bonus points)
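One possible way to approach this item is sketched below. It is a minimal, dependency-free version for one-dimensional data, not a reference solution: the function name `kmeans_1d`, the choice to start the centers at evenly spaced positions in the sorted data, and the sample values are all my own assumptions.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster a list of numbers into k groups; returns (centers, labels)."""
    # Assumption: start the centers at evenly spaced positions of the sorted data
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest center
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster ends up empty
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # centers no longer move, so the clustering has converged
        centers = new_centers
    return centers, labels

# Made-up sample values for illustration only (not iris measurements)
sample = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
centers, labels = kmeans_1d(sample, k=3)
print(centers, labels)
```

To use it for the exercise, the petal lengths could be passed in as `load_iris().data[:, 2].tolist()` and the result drawn with `plt.scatter(values, labels, c=labels, cmap="rainbow")`, the same way as in the sklearn version of the task.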
3). Use sklearn.cluster.KMeans to cluster the iris petal length data and display the result with a scatter plot.
    from sklearn.datasets import load_iris
    from sklearn.cluster import KMeans
    import matplotlib.pyplot as plt

    # Load the iris data
    iris = load_iris()
    data = iris['data']                # iris feature data
    petal = data[:, 2]                 # petal length
    # Reshape into a single column; -1 lets NumPy infer the number of rows
    X_petal = petal.reshape(-1, 1)
    model1 = KMeans(n_clusters=3)      # build the model with 3 cluster centers
    model1.fit(X_petal)                # train the model
    Y_petal = model1.predict(X_petal)  # cluster index predicted for each petal length
    # x axis: petal length; y axis: cluster index; c colors by cluster, cmap sets the color map
    plt.rcParams['font.sans-serif'] = ['SimHei']  # lets Chinese labels display correctly
    plt.scatter(X_petal[:, 0], Y_petal, c=Y_petal, cmap="rainbow")
    plt.xlabel("petal length (cm)")
    plt.ylabel("iris classification")
    # Note: k-means cluster indices are arbitrary, so these names may not match the true species
    plt.yticks(range(3), labels=['setosa', 'versicolor', 'virginica'])
    plt.show()
4). Cluster the complete iris data and display the result with a scatter plot.
    from sklearn.datasets import load_iris
    from sklearn.cluster import KMeans
    import matplotlib.pyplot as plt

    # Load the iris data
    iris = load_iris()
    X_iris = iris.data               # complete iris data (all four features)
    model = KMeans(n_clusters=3)     # build the model with 3 cluster centers
    model.fit(X_iris)                # train the model
    Y_iris = model.predict(X_iris)   # cluster index predicted for every sample
    # Plot petal length against petal width, colored by cluster
    plt.scatter(X_iris[:, 2], X_iris[:, 3], c=Y_iris, cmap="rainbow")
    plt.xlabel("petal length (cm)")
    plt.ylabel("petal width (cm)")
    plt.show()
5). Think about what the k-means algorithm can be used for.
The k-means algorithm is a clustering algorithm: it groups data that has no labels.
In real life, it can support market segmentation, where customers are divided into different segments for targeted marketing and service;
it can also be used in social network analysis, where observing the interactions between people helps find groups of people who are related to each other.