Classic Data Mining Algorithms

Copyright: original work of CDA Data Analyst, reprinted with authorization. https://blog.csdn.net/yoggieCDA/article/details/90781251


Data analysis and data mining involve many algorithms, and doing this work well requires learning them. Each of the classic algorithms has its own strengths and has had a profound impact on the field. So which data mining algorithms count as classics? This article introduces them one by one.

1. K-Means algorithm

K-means is a clustering algorithm: it partitions n objects into k groups according to their attributes, where k is less than n. It resembles the expectation-maximization algorithm for mixtures of normal distributions, in that both try to find the centers of natural clusters in the data. It assumes that the object attributes form a vector space, and its goal is to minimize the sum of squared errors within each group. It is one of the most commonly used algorithms in data mining.
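The clustering procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard iterative scheme (often called Lloyd's algorithm), not a production implementation. The function names, the random initialization from the data points, and the sample data are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Return, for every point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def k_means(points, k, max_iters=100, seed=0):
    """Partition `points` (a list of tuples) into k groups.

    Assumption for this sketch: initial centers are k distinct points
    drawn at random from the data.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(max_iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New center = coordinate-wise mean of the group's members.
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep the old center if a group happens to become empty.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # Centers stopped moving: converged.
        centers = new_centers
    labels = assign(points, centers)
    # Within-group sum of squared errors, the quantity k-means minimizes.
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels, sse = k_means(points, k=2)
print(centers, labels, sse)
```

The result depends on the initial centers, so in practice the algorithm is usually run several times with different seeds and the run with the smallest sum of squared errors is kept.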

2. SVM

A support vector machine, abbreviated SV machine (in papers usually SVM), is a supervised learning method widely used in statistical classification and regression analysis. An SVM maps the input vectors into a higher-dimensional space and constructs a maximum-margin hyperplane in that space. Two parallel hyperplanes are built, one on each side of the hyperplane that separates the data, and the separating hyperplane is chosen to maximize the distance between them. The assumption is that the larger the distance, or gap, between the parallel hyperplanes, the smaller the classifier's total error. This property is what has made the algorithm so successful.
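To make the idea of a maximum-margin hyperplane concrete, here is a rough sketch of a linear SVM in plain Python. It is a simplification under stated assumptions: it trains by subgradient descent on the regularized hinge loss rather than with the quadratic-programming solvers usually used, and it works directly in the input space without any mapping to a higher-dimensional space. The function names, hyperparameter values, and sample data are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b for the hyperplane w.x + b = 0; labels must be +1 or -1.

    Assumption for this sketch: per-sample subgradient steps on
    lam/2 * ||w||^2 + max(0, 1 - y * (w.x + b)).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * (dot(w, x) + b) < 1:
                # Point is inside the margin or misclassified:
                # the hinge term contributes to the subgradient.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regularizer acts; shrinking w widens the margin.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


def margin_width(w):
    """Distance between the parallel hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(dot(w, w))


# Hypothetical, linearly separable sample data.
samples = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
print(w, b, margin_width(w))
print([predict(w, b, x) for x in samples])
```

The regularization term shrinks w, and since the gap between the two parallel hyperplanes is 2 / ||w||, a smaller w means a wider gap. The hinge term pushes back whenever a training point falls inside that gap.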

3. C4.5

C4.5 is a decision tree algorithm for classification in machine learning. It inherits the advantages of the ID3 algorithm and improves on it in four respects. First, it prunes the tree during construction. Second, it can discretize continuous attributes. Third, it selects attributes by information gain ratio, which overcomes the bias of plain information gain toward attributes with many distinct values. Fourth, it can handle incomplete data. The advantages of C4.5 are that the classification rules it generates are easy to understand and its accuracy is high. The disadvantage is that, while the tree is being constructed, the data set must be scanned and sorted multiple times, which makes the algorithm inefficient.
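The third improvement, selecting attributes by information gain ratio, is easy to show with a small calculation. The sketch below covers only the attribute-scoring step for discrete attributes, not a full C4.5 tree builder. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    total = len(items)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(items).values()
    )


def information_gain(values, labels):
    """Reduction in label entropy after splitting on a discrete attribute."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the split information of the attribute."""
    split_info = entropy(values)  # Entropy of the attribute's own values.
    if split_info == 0:
        return 0.0  # Attribute has a single value: it cannot split the data.
    return information_gain(values, labels) / split_info


# Hypothetical toy data: 8 rows, two candidate attributes.
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
row_id = [1, 2, 3, 4, 5, 6, 7, 8]                  # unique per row
windy = ["f", "f", "f", "f", "t", "t", "t", "t"]   # two values

print(information_gain(row_id, labels), information_gain(windy, labels))
print(gain_ratio(row_id, labels), gain_ratio(windy, labels))
```

Both attributes separate the labels perfectly, so plain information gain cannot tell them apart, even though a row identifier is useless for prediction. The gain ratio divides by the split information, which is large for the many-valued `row_id`, so the two-valued `windy` attribute scores higher.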

This article covered three data mining algorithms: k-means, support vector machines, and C4.5. All three are common and important in data mining and are worth studying closely. I hope this article helps you understand data mining better.
