Machine learning day 13: unsupervised learning

Unsupervised learning

In unsupervised learning, we feed the machine a large amount of feature data and expect it to discover, through learning, the common features or structure in the data, or the associations between data items. For example, a video website can group its users by viewing behavior and apply a different recommendation strategy to each group.
Unlike supervised learning, this kind of problem does not aim to predict a particular output.
Unsupervised learning covers two major types of methods: data clustering and feature-variable association. In both cases the input data carries no label information.
Clustering algorithms usually search for the best partition of the data over multiple iterations, while feature-variable association uses various correlation-analysis methods to find relationships between variables.
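As a small illustration of correlation analysis, the sketch below computes the Pearson correlation coefficient between two feature columns. Pearson correlation is only one common choice (an assumption here; the post does not name a specific method), and the feature names `hours` and `finished` are hypothetical. A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates no linear relationship.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length feature columns."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel, so they are omitted).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-user features: hours watched and number of videos finished.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
finished = [2.0, 4.0, 6.0, 8.0, 10.0]
print(round(pearson(hours, finished), 6))  # 1.0 (perfectly linear)
```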

K-means clustering

SVM, logistic regression (LR), and decision tree (DT) algorithms are mainly used for classification: a classifier is trained on samples whose classes are already known and is then used to classify unknown samples.
Clustering is different. Without knowing any sample labels in advance, it divides the data into several groups (clusters) based on the relationships among the data points.
Classification is supervised learning; clustering is unsupervised learning. K-means is the most basic and most widely used clustering algorithm. Its idea is to search iteratively for a partition of the data into K clusters that minimizes the cost function of the clustering result. Here, for example, the cost function can be defined as the sum of squared distances between each sample and the center of the cluster it belongs to:

$$J = \sum_{i=1}^{M} \left\| x_i - \mu_{c_i} \right\|^2$$

where $x_i$ is the i-th sample, $c_i$ is the cluster to which $x_i$ belongs, $\mu_{c_i}$ is the center point of that cluster, and $M$ is the total number of samples.
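The iterative idea above can be sketched in plain Python. This is a minimal Lloyd-style sketch (the standard scheme that alternates an assignment step and a center-update step), assuming random initialisation from the samples themselves; the names `kmeans`, `cost`, and `squared_distance` are illustrative and not taken from the original post. Neither step can increase the cost J, so the loop stops once the assignments no longer change.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def cost(points, centers, labels):
    """J: sum of squared distances from each sample to its assigned center."""
    return sum(squared_distance(p, centers[c]) for p, c in zip(points, labels))


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal Lloyd-style K-means (hypothetical helper, not a library API)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of samples")
    rng = random.Random(seed)
    # Initialise the centers with k samples drawn without replacement.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so J can no longer decrease
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers, labels


# Toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (7.8, 8.2)]
centers, labels = kmeans(points, k=2)
print(labels, round(cost(points, centers, labels), 3))
```

Because the result depends on the initial centers, practical implementations commonly run several random initialisations and keep the partition with the lowest J.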



Source: blog.51cto.com/15069488/2578587