1. The steps of machine learning
Data, model selection, training, testing, prediction
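As a rough illustration of these five steps, the sketch below runs them on the iris data. The choice of a k-nearest-neighbors classifier, the 70/30 split and `random_state=0` are assumptions made for the example; they are not part of the notes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1) Data: load the samples and split them into training and test sets
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# 2) Model selection (assumption: any classifier would do here)
model = KNeighborsClassifier(n_neighbors=5)

# 3) Training
model.fit(X_train, y_train)

# 4) Testing: accuracy on the held-out samples
accuracy = model.score(X_test, y_test)

# 5) Prediction for one new sample (four measurements in cm)
predicted = model.predict([[5.1, 3.5, 1.4, 0.2]])
print(accuracy, iris.target_names[predicted[0]])
```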
2. Install the machine learning library sklearn
pip list    # view the installed packages and their versions
python -m pip install --upgrade pip
pip install -U scikit-learn
pip uninstall scikit-learn
pip uninstall numpy
pip uninstall scipy
pip install scipy
pip install numpy
pip install scikit-learn
https://scikit-learn.org/stable/install.html
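One simple way to confirm that the installation worked is to import the three libraries from Python and print their versions. This is only a quick check, not an official procedure:

```python
import numpy
import scipy
import sklearn

# Print the version of each installed library
for module in (numpy, scipy, sklearn):
    print(module.__name__, module.__version__)
```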
2. Import sklearn's dataset
from sklearn.datasets import load_iris
iris = load_iris()
iris.keys()
X = iris.data # feature matrix: one row per sample, one column per feature
y = iris.target # class label of each sample
iris.feature_names # feature names
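To get a feel for what was loaded, the shapes and names can be printed. The variable name `petal_length` is just a label chosen for this sketch:

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 samples, 4 features
print(y.shape)             # (150,)
print(iris.feature_names)  # sepal/petal length and width, in cm
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']

# Petal length is the third feature (column index 2)
petal_length = X[:, 2]
```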
3. K-means algorithm
K-means is an iterative process. The algorithm is divided into four steps:
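Notation (x, k, y): x is the sample data, k is the number of clusters, and y holds the cluster label assigned to each sample; kc denotes the cluster centers.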
1) Select K objects from the data space as the initial centers; each object represents one cluster center.
def initcenter(x, k):  # returns kc, the initial cluster centers
2) Assign each data object in the sample to the cluster whose center is closest to it (the most similar one), measured by Euclidean distance.
def nearest(kc, xi):  # xi = x[i]; returns j, the index of the closest center
def xclassify(x, y, kc):  # sets y[i] = j for every sample and returns y
3) Update the cluster centers: take the mean of all objects in each cluster as that cluster's new center, and calculate the value of the objective function.
def kcmean(x, y, kc, k):  # returns kc, flag (the new centers and whether they changed)
4) Check whether the cluster centers and the value of the objective function have changed. If they have not, output the result; if they have, return to step 2).
while flag:
    y = xclassify(x, y, kc)
    kc, flag = kcmean(x, y, kc, k)
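The four steps above can be sketched with NumPy as follows. The function names follow the outline, but the bodies are only one possible implementation. Taking the first k samples as initial centers is an assumption, the toy data is made up for illustration, and convergence is judged only by whether the centers moved (the objective function value is not computed here).

```python
import numpy as np

def initcenter(x, k):
    """Step 1: use the first k samples as initial centers (a simple assumption)."""
    return x[:k].astype(float).copy()

def nearest(kc, xi):
    """Return the index j of the center closest to sample xi (Euclidean distance)."""
    distances = np.linalg.norm(kc - xi, axis=1)
    return int(np.argmin(distances))

def xclassify(x, y, kc):
    """Step 2: assign every sample to its nearest center."""
    for i in range(x.shape[0]):
        y[i] = nearest(kc, x[i])
    return y

def kcmean(x, y, kc, k):
    """Step 3: recompute each center as the mean of its members.
    Returns the new centers and a flag that is True if any center moved."""
    new_kc = kc.copy()
    for j in range(k):
        members = x[y == j]
        if len(members) > 0:      # keep the old center if a cluster is empty
            new_kc[j] = members.mean(axis=0)
    flag = not np.allclose(new_kc, kc)
    return new_kc, flag

# Toy data (made up for illustration): three obvious 1-D groups
x = np.array([[1.0], [4.0], [9.0], [1.2], [0.8], [5.0], [5.4], [9.2], [8.8]])
k = 3
y = np.zeros(x.shape[0], dtype=int)
kc = initcenter(x, k)

flag = True
while flag:                       # Step 4: repeat until the centers stop moving
    y = xclassify(x, y, kc)
    kc, flag = kcmean(x, y, kc, k)

print(y)           # cluster label of each sample
print(kc.ravel())  # final cluster centers
```

The same functions should work on the iris petal length if `x` is replaced by `iris.data[:, 2:3]`. Because the result depends on the initial centers, a different choice in `initcenter` can lead to a different clustering.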
Refer to the official documentation:
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans
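A minimal usage sketch of `sklearn.cluster.KMeans` on the iris petal length. The values `n_init=10` and `random_state=0` are assumed here only to make the run reproducible:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2].reshape(-1, 1)  # KMeans expects a 2-D array

# n_clusters=3 matches the three iris species
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(petal_length)

print(model.cluster_centers_.ravel())  # one center per cluster
print(labels[:10])                     # cluster index of the first ten samples
```

Note that the cluster indices 0, 1, 2 are arbitrary and need not line up with the species labels in `iris.target`.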
4. Homework:
1). Work through the k-means clustering process by hand with playing cards: more than 30 cards, 3 types.
2). * Write the k-means algorithm yourself, cluster the iris petal length data, and display the result with a scatter plot. (Bonus points)
3). Use sklearn.cluster.KMeans to cluster the iris petal length data and display the result with a scatter plot.
4). Cluster the complete iris data and display the result with a scatter plot.
5). Think about it: what is the k-means algorithm used for?
Answer: Classifying things into groups, planning consolidated processing, etc.
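For the scatter plot exercises, one possible starting point is sketched below. It clusters all four features and plots petal length against petal width. The choice of those two axes, the colormap and the output file name `iris_kmeans.png` are assumptions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data  # all four features

model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X)

# Plot petal length (column 2) against petal width (column 3), colored by cluster
fig, ax = plt.subplots()
ax.scatter(X[:, 2], X[:, 3], c=labels, cmap="rainbow", s=20)
ax.scatter(model.cluster_centers_[:, 2], model.cluster_centers_[:, 3],
           c="black", marker="x", s=80, label="cluster centers")
ax.set_xlabel(iris.feature_names[2])
ax.set_ylabel(iris.feature_names[3])
ax.legend()
fig.savefig("iris_kmeans.png")  # hypothetical output file name
```

In an interactive session, `plt.show()` can be used instead of saving to a file (and the `matplotlib.use("Agg")` line dropped).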