Basic algorithms for big data processing

Course address: http://www.auto-mooc.com/mooc/detail?mooc_id=BA91C867A68E92651FBF224828ECAE6E&major_id=E1007D8658541BD264785AA3709ADA25

These are my study notes.

1.0 Basic data algorithms

1.1 Clustering algorithms

Class: a set of similar elements.

Classification

The categories are defined in advance and their number is fixed. Labels are assigned according to certain criteria, and samples are then classified according to those labels.

Clustering

No categories are defined in advance, and the number of classes is not known. No manual labeling is needed and no classifier is trained beforehand; the categories are generated automatically during the clustering process.

K-means clustering algorithm

[Figure: K-means clustering process; panels b, c, and d are referenced in the steps below]
Steps:
1. Randomly choose the initial centroids (Fig. b).
2. Compute the distance from each sample to every centroid.
3. Assign each sample to the cluster of its nearest centroid (Fig. c).
4. Recompute the centroid of each cluster (Fig. d).
5. Repeat from step 2 until the centroids stop changing.
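
The steps above can be sketched in a few lines of Python. This is only a minimal sketch, assuming Euclidean distance and initial centroids drawn at random from the samples; the function name kmeans, its parameters, and the sample points are illustrative and do not come from the course.

```python
import math
import random


def kmeans(samples, k, max_iters=100, seed=0):
    """Cluster n-dimensional points; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct samples as the initial centroids.
    centroids = [list(p) for p in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for _ in range(max_iters):
        # Steps 2-3: assign each sample to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in samples
        ]
        # Step 4: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(samples, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        # Step 5: stop once the centroids no longer move, otherwise loop again.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Two well-separated groups of made-up points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The first three points should end up sharing one label and the last three the other; which group gets label 0 depends on the random initial centroids.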


SOM (self-organizing map) clustering


The difference between KNN and K-means

Reference: https://www.tuicool.com/articles/qamYZv

[Figure: labeled blue squares and red triangles around an unlabeled green circle]
The KNN algorithm works as follows:

As the figure above shows, the data set is already labeled: one class is the blue squares and the other is the red triangles. The green circle is the point we want to classify.

If K = 3, the three points nearest to the green point are two red triangles and one blue square. These three points vote, so the green point is classified as a red triangle.

If K = 5, the five points nearest to the green point are two red triangles and three blue squares. These five points vote, so the green point is classified as a blue square.

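As we can see, KNN is essentially a statistics-based method. In fact, many machine learning algorithms are based on statistics. This is also the key difference from K-means: KNN is a supervised classification method that needs labeled data, whereas K-means is an unsupervised clustering method that needs no labels.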
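
The voting procedure described above can be sketched as follows. The coordinates are hypothetical and were chosen only so that the neighbor counts match the K = 3 and K = 5 cases in the text; the function name knn_classify and the use of Euclidean distance are likewise assumptions.

```python
import math
from collections import Counter


def knn_classify(labeled_points, query, k):
    """Return the majority label among the k labeled points nearest to query."""
    by_distance = sorted(labeled_points, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical coordinates chosen to mirror the vote counts in the text.
training = [
    ((0.5, 0.0), "red triangle"),
    ((0.0, 0.6), "red triangle"),
    ((-0.8, 0.0), "blue square"),
    ((0.0, -1.0), "blue square"),
    ((1.1, 0.0), "blue square"),
    ((2.0, 2.0), "red triangle"),
    ((-2.0, -2.0), "blue square"),
]
green = (0.0, 0.0)

print(knn_classify(training, green, k=3))  # red triangle (2 red vs 1 blue)
print(knn_classify(training, green, k=5))  # blue square (2 red vs 3 blue)
```

The same query point gets a different label depending on K, which is why K is usually treated as a parameter to tune.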

Cluster performance metrics


Distance calculation:


Mahalanobis distance (what is it? I will study it when I work on radar clustering)
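
For reference, a commonly used definition of the Mahalanobis distance is sketched below. The symbols are not from the original notes: x is a sample, \mu is the mean of the distribution it is compared against, and \Sigma is that distribution's covariance matrix.

```latex
d_M(x, \mu) = \sqrt{(x - \mu)^{\top} \, \Sigma^{-1} \, (x - \mu)}
```

When \Sigma is the identity matrix, this reduces to the ordinary Euclidean distance between x and \mu.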

1.2 Dimensionality reduction algorithms


Covariance matrix (what is it? To be studied later)
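
As a rough illustration of what a covariance matrix is, the sketch below computes the sample covariance matrix of a small made-up data set in plain Python. The function name covariance_matrix and the choice of the n - 1 denominator are assumptions, not something stated in the course material.

```python
def covariance_matrix(samples):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(row[d] for row in samples) / n for d in range(dims)]
    # Entry (i, j) sums the products of the deviations of features i and j
    # from their means, divided by n - 1 (the unbiased estimate).
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
            for j in range(dims)
        ]
        for i in range(dims)
    ]


# Made-up data in which the second feature is exactly twice the first.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
cov = covariance_matrix(data)
print(cov)  # [[1.0, 2.0], [2.0, 4.0]]
```

The diagonal entries are the variances of each feature, and the off-diagonal entries show how strongly two features vary together.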


1.3 Regression algorithms



Origin blog.csdn.net/djfjkj52/article/details/104307351