The essential difference between classification and clustering - machine learning

Two common types of problem in machine learning are classification and clustering.

Comparison of classification and clustering

  • Cluster analysis studies how to divide samples into several classes without any training on labeled data.
  • In classification, the classes that exist in the target database are known in advance, and the task is to mark which class each record belongs to.
  • Clustering aggregates a set of given unlabeled patterns into meaningful clusters. The aim is to group all records into different classes or clusters without knowing in advance how many classes the target database contains, such that similarity, based on some measure (e.g. distance), is maximized within the same cluster and minimized between different clusters.
  • Unlike classification, clustering is unsupervised learning: it does not rely on predefined classes or training instances with class labels, and the labels must be determined automatically by the clustering algorithm, whereas the instances or data samples in classification learning carry class labels.

Classification

Classification has been defined in several ways, but the meaning is the same.

  • Classification: The classification task is to learn a target function f that maps each attribute set x to a predefined class label y.

  • Classification trains a learning machine (that is, obtains a target function) from given samples with known class labels, so that it can classify samples whose class is unknown. This is supervised learning.

  • Classification: Learn the relationship between sample attributes and class labels.
    In other words, we obtain a classification model (that is, a function from sample attributes to class labels) from some known samples (which include both attributes and class labels), and then use this target function to classify sample data that contain only attributes.
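
As a minimal sketch of this idea, the following Python learns a function from labeled samples and then applies it to samples that have only attributes. It uses a nearest-centroid rule, chosen here only because it is short; the data, the class names, and the helper names `fit_centroids` and `predict` are made up for illustration.

```python
from collections import defaultdict
from math import dist

def fit_centroids(samples, labels):
    """Learn one centroid per known class from labeled samples."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    # Centroid = coordinate-wise mean of the samples in each class.
    return {
        y: tuple(sum(col) / len(pts) for col in zip(*pts))
        for y, pts in grouped.items()
    }

def predict(centroids, x):
    """The learned function f: map an attribute vector x to a class label y."""
    return min(centroids, key=lambda y: dist(centroids[y], x))

# Hypothetical training data: attributes plus known class labels.
train_x = [(150, 50), (155, 55), (180, 80), (185, 90)]
train_y = ["small", "small", "large", "large"]

model = fit_centroids(train_x, train_y)

# New samples contain attributes only; the model supplies the label.
print(predict(model, (152, 53)))  # small
print(predict(model, (178, 85)))  # large
```

Note that the set of possible outputs is fixed by the labels seen during training: the model can never answer with a class that was not in `train_y`.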

Limitations of Classification Algorithms

As a supervised learning method, classification requires that each category be clearly known in advance, and it assumes that every item to be classified belongs to one of those categories. In many cases these conditions are not met, especially when dealing with massive data. Preprocessing the data so that it meets the requirements of a classification algorithm can be very costly, and in that situation a clustering algorithm can be considered instead.

Clustering

Some concepts related to clustering are as follows:

  • Clustering means that we do not know the category label of any sample in advance. We want some algorithm to divide a group of samples of unknown category into several categories. When clustering, we do not care what a given category actually is; the goal is only to bring similar things together. In machine learning this is called unsupervised learning.
  • Clustering is usually defined in terms of some distance or similarity between samples: similar (or close) samples are grouped into the same class, and dissimilar (or distant) samples are placed in different classes.
  • The goal of clustering: objects within a group are similar (related) to each other, while objects in different groups are different (unrelated). The greater the similarity within groups and the greater the difference between groups, the better the clustering.
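
The goal stated above can be checked numerically. The sketch below compares two hypothetical groupings of the same four points, using Euclidean distance as the (assumed) similarity measure; the function names and the data are illustrative only.

```python
from itertools import combinations
from math import dist

def mean_intra_distance(clusters):
    """Average distance between pairs of points that share a cluster.

    Assumes at least one cluster holds two or more points.
    """
    pairs = [dist(a, b) for c in clusters for a, b in combinations(c, 2)]
    return sum(pairs) / len(pairs)

def mean_inter_distance(clusters):
    """Average distance between pairs of points from different clusters."""
    pairs = [
        dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    ]
    return sum(pairs) / len(pairs)

# The same four points, grouped in two different ways.
good = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]   # neighbours together
bad = [[(0, 0), (10, 10)], [(0, 1), (10, 11)]]    # neighbours split apart

for name, clusters in (("good", good), ("bad", bad)):
    print(name, mean_intra_distance(clusters), mean_inter_distance(clusters))
```

For the better grouping, the within-cluster distance is much smaller than the between-cluster distance; for the worse grouping the relation is reversed.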


Explanation

Because I have been studying two algorithms recently, I will use them to illustrate the difference between classification algorithms and clustering algorithms.
One of the differences between SVM and the bisecting K-means algorithm: the support vector machine (SVM) is a classification algorithm, while bisecting K-means is a clustering algorithm.

Page 306 of the book "Introduction to Data Mining (Complete Edition)" makes the following point: cluster analysis can be seen as a form of classification, in that it labels objects with class (cluster) labels, but it derives these labels only from the data. In contrast, the classification discussed earlier is supervised classification: a model developed from objects whose class labels are known is used to assign class labels to new, unlabeled objects. For this reason, cluster analysis is sometimes called unsupervised classification. In data mining, when the term classification is used without any qualification, it usually refers to supervised classification.

Therefore, one of the differences between SVM and the bisecting K-means algorithm is that the support vector machine (SVM) is a supervised classification algorithm, while bisecting K-means is an unsupervised classification algorithm.
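
To make the unsupervised side concrete, here is a rough sketch of bisecting K-means in plain Python. It is not taken from the book or from any library; it assumes that the cluster chosen for splitting is the one with the largest sum of squared errors (SSE), and that each split starts from the first point and the point farthest from it, so that the run is deterministic. The data are made up.

```python
from math import dist

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(dist(p, c) ** 2 for p in points)

def two_means(points, iterations=10):
    """Split one cluster in two with basic 2-means (assumed initialization)."""
    centers = [points[0], max(points, key=lambda p: dist(p, points[0]))]
    halves = [list(points), []]
    for _ in range(iterations):
        halves = [[], []]
        for p in points:
            nearer = 0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
            halves[nearer].append(p)
        if not halves[0] or not halves[1]:
            break  # cannot split further (e.g. all points identical)
        centers = [centroid(h) for h in halves]
    return halves

def bisecting_kmeans(points, k):
    """Start with one cluster and keep bisecting until there are k clusters."""
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(clusters, key=sse)  # assumed rule: split the largest SSE
        left, right = two_means(worst)
        if not left or not right:
            break  # nothing left that can be split
        clusters.remove(worst)
        clusters.extend([left, right])
    return clusters

# No class labels are given, only attribute vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
clusters = bisecting_kmeans(points, 3)
for i, c in enumerate(clusters):
    print(i, c)
```

The cluster indices printed here are arbitrary identifiers derived from the data alone, which is the sense in which clustering is "unsupervised classification". An SVM, by contrast, could not be trained on `points` at all without a class label for each one.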

