Machine Learning 12 - Linear Models for Unsupervised Learning

1 Overview of Unsupervised Learning

We all know that labeled data is very valuable. Generally speaking, data is easy to obtain, but labels are much harder to get. Unsupervised learning is therefore critical in machine learning: making good use of large amounts of unlabeled data matters both for cold-starting a business and for its continuous iteration.

Unsupervised learning can be roughly divided into:

  1. Reducing the complex to the simple, including:
    1. Clustering: grouping unlabeled data into clusters, such that data within a cluster is similar and data across clusters is not.
    2. Dimensionality reduction / feature extraction: extracting features from unlabeled data such as images and text, e.g. PCA, Auto-Encoder, MF.
  2. Generating something from nothing: mainly the various generative models.

This article mainly covers unsupervised linear models, including clustering, PCA, MF, etc.

2 Clustering

2.1 Types of clustering algorithms

Clustering is very important in real business scenarios, especially for cold-starting a business. It can be used for intent-category mining, knowledge-base construction, topic mining, and so on. Combined with labeled data, it can also be used to discover label noise. There are many clustering algorithms, as follows:

  1. Partition-based clustering: k-means, k-medoids, k-modes, k-medians, kernel k-means
  2. Hierarchical clustering: Agglomerative, Divisive, BIRCH, ROCK, Chameleon, HAC
  3. Density-based clustering: DBSCAN, OPTICS, HDBSCAN
  4. Grid-based clustering: STING
  5. Model-based clustering: GMM
  6. Graph-based clustering: Spectral Clustering

2.2 Clustering algorithm steps (k-means, DBSCAN)

The k-means steps are:

  1. Randomly select k samples as the initial mean vectors (cold start)
  2. Assign each sample to the cluster of its nearest mean vector
  3. Once the clusters are formed, recompute each cluster's mean vector
  4. Repeat from step 2
  5. Stop when the clusters are identical across two consecutive iterations
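Below is a minimal sketch of these steps using scikit-learn's KMeans; the toy blob data and the choice k=3 are purely illustrative.

```python
# Minimal k-means sketch with scikit-learn; data and k are illustrative.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data: 300 points scattered around 3 centers
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# n_init=10 restarts from 10 random initial mean vectors (step 1)
# and keeps the best run; fit_predict performs steps 2-5 internally
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # final mean vectors, one per cluster
```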

The DBSCAN steps are:

  1. Based on the neighborhood parameters (radius eps and minimum point count MinPts), find all core points
  2. Pick a core point, gather every sample density-reachable from it into a new cluster, and remove that cluster's core points from the core set
  3. Repeat step 2 on the remaining core set
  4. Stop when the core set is empty or no new cluster can be formed
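A matching sketch with scikit-learn's DBSCAN follows; the eps and min_samples values are illustrative stand-ins for the neighborhood parameters mentioned above.

```python
# Minimal DBSCAN sketch with scikit-learn; parameter values are illustrative.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Toy non-convex data: two interleaved half-moons, hard for k-means
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# eps: neighborhood radius; min_samples: neighbors needed for a core point
db = DBSCAN(eps=0.3, min_samples=5)
labels = db.fit_predict(X)  # label -1 marks noise (non-core, unreachable)

print(set(labels))
```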


2.3 Clustering evaluation metrics

Clustering evaluation metrics fall into the following groups; sklearn implements them, so they can be called directly:

  1. Without ground-truth labels (internal): Davies-Bouldin Index (DBI)
  2. With ground-truth labels (external): Rand index, NMI (normalized mutual information), homogeneity, etc.
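As a quick sketch, here is how those metrics can be called from sklearn; the KMeans clusterer and blob data are placeholders.

```python
# Sketch of clustering evaluation with sklearn; data/clusterer are placeholders.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    davies_bouldin_score,          # internal: no ground-truth labels needed
    adjusted_rand_score,           # external: needs ground-truth labels
    normalized_mutual_info_score,  # external
    homogeneity_score,             # external
)

X, y_true = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

print(davies_bouldin_score(X, labels))              # lower is better
print(adjusted_rand_score(y_true, labels))          # 1.0 is a perfect match
print(normalized_mutual_info_score(y_true, labels))
print(homogeneity_score(y_true, labels))
```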


3 PCA (Principal Component Analysis)

PCA (Principal Component Analysis) uses dimensionality reduction to convert many indicators into a few. For example, face recognition can be reduced to recognizing eyes, nose, mouth, etc. This is the meaning of principal component analysis.

PCA is a linear transformation that maps the data into a new coordinate system, such that the direction of largest variance of the projected data lies on the first coordinate (the first principal component), the second largest variance on the second coordinate (the second principal component), and so on. The idea of PCA is to reduce dimensionality while retaining the features that contribute most to the data's variance; these main features of the data are called the principal components.
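For reference, the first principal component can be written as the unit direction maximizing the projected variance (a standard formulation, added here for clarity):

```latex
w_1 = \arg\max_{\|w\|=1} \mathrm{Var}(Xw) = \arg\max_{\|w\|=1} w^{\top} \Sigma w
```

where \Sigma is the covariance matrix of the centered data X. The solution is the eigenvector of \Sigma with the largest eigenvalue; later principal components are the remaining eigenvectors in order of decreasing eigenvalue.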

The figure below shows PCA projecting onto a one-dimensional space: the direction in which the projected data has the largest variance is the first principal component, i.e. the most important feature.

[Figure: data projected onto the direction of maximum variance, the first principal component]

PCA can uncover the main features of the data. In the figure below, the digit 7 is decomposed into three main components.

[Figure: the digit 7 reconstructed from three principal components]
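A short sketch of this kind of decomposition on scikit-learn's built-in 8x8 digits data; the choice of 3 components mirrors the figure but is otherwise illustrative.

```python
# PCA sketch on the digits data; n_components=3 is illustrative.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 8x8 digit images flattened to 64 dims

pca = PCA(n_components=3)             # keep the 3 largest-variance directions
X_reduced = pca.fit_transform(X)      # project 64-dim samples down to 3 dims

print(pca.explained_variance_ratio_)  # variance share kept by each component
print(pca.components_.shape)          # (3, 64): each row is a principal direction
```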

The disadvantages of PCA are:

  1. It is unsupervised, and thus less accurate. Supervised LDA (Linear Discriminant Analysis) works better, but requires labeled data.
  2. As a linear model, the features it captures are still shallow. Compared with an Auto-Encoder, which can be built on a deep model, its feature-extraction capability is much weaker.


4 MF (Matrix Factorization)

Matrix factorization can also extract basic components and features; it can be implemented with SVD, among other methods. Below are handwriting-recognition features extracted by MF; as you can see, the basic strokes are recovered.

[Figure: basic stroke components extracted by matrix factorization]
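The post mentions SVD; the sketch below uses scikit-learn's NMF instead, since its non-negative factors are easy to read as additive stroke-like parts. The substitution and the choice of 10 components are assumptions for illustration, not the original author's setup.

```python
# Matrix factorization sketch with NMF (assumed substitute for the post's SVD).
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF

X, _ = load_digits(return_X_y=True)  # pixel intensities are non-negative

# Factor X (n_samples x 64) into W (n_samples x 10) and H (10 x 64);
# each row of H behaves like a basic stroke-shaped component
nmf = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=42)
W = nmf.fit_transform(X)
H = nmf.components_

print(W.shape, H.shape)  # (1797, 10) (10, 64) for the digits dataset
```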


Origin blog.csdn.net/u013510838/article/details/108549365