Artificial Intelligence_04

Unsupervised Learning

A method of machine learning that automatically classifies or clusters input data given no previously labeled training examples.

Advantages:
  • Algorithms are not constrained by supervisory information (label bias) and can take new information into account

  • No labeled data is required, which greatly expands the pool of usable data samples

Main applications:

Cluster analysis (the most widely used), association rules, dimensionality reduction

Cluster analysis

Also known as group analysis. Objects are automatically divided into different categories according to the similarity of certain attributes.

KMeans clustering:

  • Assign each data point to a category according to its distance from the center points

  • Update each center point based on the data in its category

  • Repeat the process until convergence

Features:

1. Simple implementation and fast convergence

2. Need to specify the number of categories

Mean shift clustering (Meanshift):

  • Retrieve the data points within a certain radius of the center point

  • Update the center

  • Repeat the process until the center point is stable

Features:

1. Automatically discover the number of categories without manual selection

2. You need to select the radius of the area

DBSCAN algorithm (density-based spatial clustering algorithm):

  • Identify valid (core) data points based on the point density of their neighborhood
  • Expand outward from the valid data until no new points are added

Features:

1. Filter noise data

2. No need to artificially select the number of categories

3. Results are affected when different regions of the data have different densities
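
The two DBSCAN steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function name `dbscan`, the parameters `eps` (neighborhood radius) and `min_pts` (density threshold), and the sample data are all assumptions made for this example.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # not dense enough; may become a border point later
            continue
        labels[i] = cluster              # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:     # j is also a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels

# Hypothetical sample data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point never reaches the density threshold, so it keeps the noise label, and the number of clusters falls out of the data instead of being chosen beforehand.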

K nearest neighbor classification (KNN, one of the simplest machine learning algorithms; it belongs to supervised learning):

Given a training data set and a new input instance, find the K instances in the training set that are closest to it (its K nearest neighbors). If most of these K instances belong to a certain class, classify the input instance into that class.
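
A minimal sketch of this majority vote, assuming Euclidean distance and a training set stored as (point, label) pairs. The function name `knn_classify` and the sample data are made up for illustration.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Sort the training pairs by distance to the query and keep the k nearest.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Count the labels of those neighbors and return the most frequent one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data: class "A" near (1, 1), class "B" near (7, 7).
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
         ((6.0, 6.0), "B"), ((7.0, 7.5), "B"), ((6.5, 8.0), "B")]
prediction = knn_classify(train, (2.0, 2.0), k=3)
print(prediction)
```

Unlike the clustering methods in this section, this needs labels for the training data, which is why KNN counts as supervised learning.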

K-means clustering (KMeans Analysis):

Clusters data around k center points in the space, assigning each object to the center closest to it. It is the most basic and also the most important of the clustering algorithms.

Formula:
Distance between a data point and the center point of each cluster: $dist(x_i, u_j^t)$

Classification by distance: $x_i \in u^t_{nearest}$

Center update: $u_j^{t+1} = \frac{1}{k}\sum_{x_i \in S_j} x_i$

$S_j$: the cluster of region $j$ at time $t$; $k$: the number of points contained in $S_j$

$x_i$: a point contained in $S_j$; $u_j^t$: the center of region $j$ at time $t$
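
As a small worked example of the center update, suppose (purely for illustration) that cluster $S_j$ currently contains the three points $(1,1)$, $(2,3)$ and $(3,2)$. The new center is then their average:

```latex
\begin{aligned}
S_j &= \{(1,1),\ (2,3),\ (3,2)\}, \qquad k = 3 \\
u_j^{t+1} &= \frac{1}{3}\big[(1,1) + (2,3) + (3,2)\big] = \frac{1}{3}\,(6,6) = (2,2)
\end{aligned}
```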

Algorithm flow:

1. Select the number of clusters k

2. Determine the initial cluster centers

3. For each point, find the nearest cluster center to determine the category to which the point belongs

4. Update the clustering center according to the data of each category

5. Repeat the above steps until convergence (the center point will not change)
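
The flow above can be sketched in plain Python as follows. This is an illustrative sketch, not a production implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, the choice to draw the initial centers from the data points, and the sample data are all assumptions.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # step 2: pick k initial centers from the data
    for _ in range(max_iter):
        # Step 3: assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Step 4: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:           # step 5: centers no longer change
            break
        centers = new_centers
    return centers, clusters

# Hypothetical sample data: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, clusters = kmeans(points, k=2)
print(sorted(centers))
```

Because the initial centers are drawn at random, the order of the returned centers depends on `seed`, and on less cleanly separated data the final clusters can depend on it too.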

Advantages:

  • The principle is simple, the implementation is easy, and the convergence speed is fast
  • Few parameters, easy to use

Disadvantages:

  • The number of clusters must be set in advance
  • The initial cluster centers are selected randomly, so results may differ between runs

Mean shift clustering

A clustering algorithm based on density gradient ascent (finding cluster centers along the direction of density increase)

Formula:
Mean shift: $M(x) = \frac{1}{k}\sum_{x_i \in S_h}(x_i - u)$

Center update: $u^{t+1} = M^t + u^t$

$S_h$: the high-dimensional spherical region with center point $u$ and radius $h$; $k$: the number of points contained in $S_h$

$x_i$: a point contained in $S_h$; $M^t$: the mean shift obtained at time $t$; $u^t$: the center at time $t$

Algorithm flow:

1. Randomly select an unclassified point as the center point

2. Find the points that lie within the bandwidth of the center point, and record them as the set S

3. Calculate the offset vectors from the center point to each element in S, and average them to obtain the shift vector M

4. Move the center point by the vector M

5. Repeat steps 2-4 until convergence

6. Repeat steps 1-5 until all points are classified

7. Classification: for each point, count how often each class visited it; the point is assigned to the class with the highest visit frequency
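
A rough sketch of this flow in plain Python, using a flat (uniform) window. Several details are assumptions made for the example and are not taken from the text above: the function name `mean_shift`, the `tol` and `max_iter` parameters, taking start points in order instead of at random, and merging two classes when their converged centers are closer than half the bandwidth.

```python
import math

def mean_shift(points, bandwidth, tol=1e-6, max_iter=100):
    centers = []                          # one converged center per discovered class
    visits = []                           # visits[c][i]: how often class c's window covered point i
    classified = [False] * len(points)
    for start in range(len(points)):      # step 1 (in order here, rather than at random)
        if classified[start]:
            continue
        center = points[start]
        count = [0] * len(points)
        for _ in range(max_iter):
            # Step 2: S = indices of points within the bandwidth of the center.
            S = [i for i, p in enumerate(points) if math.dist(p, center) <= bandwidth]
            for i in S:
                count[i] += 1
                classified[i] = True
            # Step 3: M = average offset from the center to the points in S.
            M = tuple(sum(points[i][d] - center[d] for i in S) / len(S)
                      for d in range(len(center)))
            # Step 4: move the center by M.
            center = tuple(c + m for c, m in zip(center, M))
            if math.hypot(*M) < tol:      # step 5: the center has stopped moving
                break
        # Assumed merge rule: treat nearby converged centers as the same class.
        for c, other in enumerate(centers):
            if math.dist(center, other) < bandwidth / 2:
                visits[c] = [a + b for a, b in zip(visits[c], count)]
                break
        else:
            centers.append(center)
            visits.append(count)
    # Step 7: each point joins the class whose window visited it most often.
    labels = [max(range(len(centers)), key=lambda c: visits[c][i])
              for i in range(len(points))]
    return centers, labels

# Hypothetical sample data: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, labels = mean_shift(points, bandwidth=2.5)
print(labels)
```

Only the bandwidth is supplied; the number of classes comes from how many distinct centers the shifts converge to.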

Origin blog.csdn.net/qq_45104014/article/details/129298051