Classification
Classification is a core problem in supervised learning: from labeled data, learn a classification decision function or classification model (a classifier), then use it to predict the output for new inputs, where the output variable takes a finite number of discrete values.
 It covers both binary and multiclass classification problems.
 Algorithms: decision trees, naive Bayes, SVM, logistic regression.
1. Decision Trees
 Tree structure: each non-leaf node represents a test on a feature attribute, each branch edge represents one range of that attribute's values, and each leaf node stores a class.
 Decision process: starting from the root, test the corresponding feature attribute of the sample and follow the branch matching its value until a leaf node is reached; the class stored at that leaf is the decision result.
 Building a decision tree:
 Feature selection: select the features with the strongest ability to discriminate between the classes in the training data;
 Entropy: measures uncertainty.
 Information gain = entropy (before the split) − entropy (after the split).
 Information gain ratio = penalty parameter × information gain; when a feature has many values, its penalty parameter is small.
 Gini index: measures the uncertainty of a set; the larger the Gini index, the higher the impurity.
 Tree generation: recursively construct the decision tree, choosing the split at each node according to the feature-selection criterion;
 Tree pruning: cut some subtrees or leaf nodes from the generated tree to simplify the model and improve its generalization ability.
 An ideal tree has the fewest leaf nodes, the smallest leaf depth, or both few leaves and small depth.
 Pre-pruning: prune by stopping tree growth early; once growth stops, the node becomes a leaf that holds the most frequent class of its data subset.
 Post-pruning (common): first grow a complete tree, then replace subtrees at nodes with insufficient confidence by leaf nodes, each labeled with the most frequent class in that subtree.
 Core of each algorithm: the tree structure it builds and the feature-selection criterion it uses
 ID3: classification, multiway tree, information gain
 C4.5: classification, multiway tree, information gain ratio
 CART: classification and regression, binary tree, Gini index (classification) or mean squared error (regression)
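The three feature-selection measures above can be sketched in a few lines of plain Python (a minimal illustration; the label lists are made-up examples):

```python
import math

def entropy(labels):
    """Shannon entropy H = -sum(p * log2(p)): the uncertainty of a label set."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(labels, groups):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    after = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - after

def gini(labels):
    """Gini index 1 - sum(p^2): the larger, the more impure the set."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

labels = ["yes", "yes", "no", "no"]
print(entropy(labels))                                           # 1.0 for a 50/50 split
print(gini(labels))                                              # 0.5
print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))  # 1.0: a perfect split
```

ID3 would pick the split with the largest `information_gain`; CART would pick the one with the smallest weighted Gini index after the split.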
2. Naive Bayes
Naive Bayes is a classification method based on Bayes' theorem and the assumption of conditional independence between features. Its core is the Bayes formula:
P(c | x) = P(x | c) P(c) / P(x)

Calculation process

 Compute the prior probability of each class;
 Compute the conditional probability of each attribute value given each class;
 Compute the posterior probability of each class and pick the largest.

Laplace correction: applied to both the prior and the conditional probabilities, so that unseen attribute values do not produce zero-probability estimates.
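The three-step calculation with the Laplace correction can be sketched from scratch for categorical features (a minimal sketch; the toy weather data and the smoothing strength alpha are illustrative assumptions):

```python
from collections import Counter, defaultdict

def train_nb(X, y, alpha=1.0):
    """Laplace-corrected priors and per-feature conditional probabilities
    for categorical data; alpha is the smoothing strength."""
    classes = sorted(set(y))
    n = len(y)
    class_count = Counter(y)
    # prior with Laplace correction: (N_c + alpha) / (N + alpha * |classes|)
    prior = {c: (class_count[c] + alpha) / (n + alpha * len(classes))
             for c in classes}
    n_features = len(X[0])
    values = [sorted({row[j] for row in X}) for j in range(n_features)]
    cond = defaultdict(dict)
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        for j in range(n_features):
            counts = Counter(r[j] for r in rows)
            for v in values[j]:
                # conditional with Laplace correction over the feature's values
                cond[c][(j, v)] = ((counts[v] + alpha)
                                   / (len(rows) + alpha * len(values[j])))
    return prior, cond

def predict_nb(x, prior, cond):
    """Pick the class with the largest (unnormalised) posterior."""
    scores = {c: prior[c] for c in prior}
    for c in prior:
        for j, v in enumerate(x):
            scores[c] *= cond[c].get((j, v), 0.0)
    return max(scores, key=scores.get)

X = [["sunny", "hot"], ["sunny", "mild"], ["rain", "mild"], ["rain", "cool"]]
y = ["no", "no", "yes", "yes"]
prior, cond = train_nb(X, y)
print(predict_nb(["rain", "mild"], prior, cond))  # → yes
```

Multiplying the prior by the per-feature conditionals is exactly the conditional-independence assumption; the denominator P(x) is the same for every class, so it can be dropped when comparing posteriors.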

Advantages:
 The algorithm's logic is simple and easy to implement;
 Classification has small time and space overhead.

Disadvantages:
In theory the naive Bayes model has the smallest error rate compared with other classification methods, but in practice this is not always the case. The model assumes that the attributes are conditionally independent, an assumption that often fails in real applications; when there are many attributes or the attributes are strongly correlated, classification performance suffers.
3. Support Vector Machine (SVM)
 SVM is a supervised learning method. Its main idea is to construct an optimal separating hyperplane that maximizes the distance from the hyperplane to the nearest samples of the two classes on either side, giving the classifier good generalization ability.
 Adding a penalty coefficient yields a soft margin, for data that is not completely linearly separable.
 A kernel function maps the samples into a higher-dimensional space where they become separable. Common kernels: linear, polynomial, Gaussian (RBF), and hybrid kernels.
 Advantages:
 Compared with other classification algorithms it does not require too many training samples, and thanks to the kernel function it can handle high-dimensional samples;
 It minimizes structural risk, i.e., the accumulated error between the classifier's approximate model of the problem and the problem's true solution;
 SVM handles linearly inseparable data well through slack variables (also called the penalty term) and the kernel trick; this nonlinear capability is the essence of SVM.
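The soft-margin idea can be sketched as subgradient descent on the hinge-loss objective for a linear SVM (a minimal sketch; the learning rate, epoch count, and toy data are illustrative assumptions, and real SVM solvers use quadratic programming or SMO instead):

```python
import random

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200, seed=0):
    """Subgradient descent on the soft-margin objective
    0.5 * ||w||^2 + C * sum(max(0, 1 - y_i * (w . x_i + b))).
    Labels must be +1 / -1; C is the penalty coefficient."""
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1:
                # inside the margin: hinge loss and regulariser both contribute
                w = [wj - lr * (wj - C * y[i] * xj) for wj, xj in zip(w, X[i])]
                b += lr * C * y[i]
            else:
                # outside the margin: only the regulariser shrinks w
                w = [wj - lr * wj for wj in w]
    return w, b

def predict(x, w, b):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# toy linearly separable data
X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(x, w, b) for x in X])  # → [1, 1, -1, -1]
```

A larger C punishes margin violations more heavily (a harder margin); a smaller C tolerates more misclassified or in-margin points.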
4. Logistic Regression

Logistic regression is a classification algorithm that can handle both binary and multiclass classification, using the binomial and multinomial logistic regression models respectively.

Logistic regression first constructs a generalized linear regression function, then uses the sigmoid function g(z) to map its continuous value to a discrete class.
Sigmoid function: grows quickly near the center and slowly at both ends, with values ranging between 0 and 1.

Loss function: the log-likelihood loss, which decreases rapidly. Logistic regression is a special case of the maximum entropy model.
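The pipeline above (linear function → sigmoid → log-likelihood loss minimized by gradient descent) can be sketched as follows (a minimal sketch; the toy data, learning rate, and epoch count are illustrative assumptions):

```python
import math

def sigmoid(z):
    """Maps any real z into (0, 1); steep near the centre, flat at both ends."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=1000):
    """Stochastic gradient descent on the negative log-likelihood of
    p = sigmoid(w . x + b); the per-sample gradient w.r.t. the logit is (p - y)."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = p - t
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
            b -= lr * err
    return w, b

# toy one-dimensional data: class 1 for the larger inputs
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w, b = train_logreg(X, y)
probs = [sigmoid(w[0] * x[0] + b) for x in X]
print([round(p) for p in probs])  # → [0, 0, 1, 1]
```

The simple form of the gradient, (p − y) · x, is what makes the log-likelihood loss decrease so rapidly for this model.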

Maximum entropy model: keep all of the uncertainty, which reduces the risk to a minimum. When predicting the probability distribution of a random event, the prediction should satisfy all known constraints and make no subjective assumptions about the unknown.
Features:
 Formally simple and elegant;
 It is the only model that can both satisfy the constraints imposed by various information sources and guarantee smoothness;
 The amount of computation is huge, so the quality of the engineering implementation determines whether the model is practical.
5. Ensemble Learning

Definition: combine multiple weak classifiers so that together they complete the learning task, building a strong classifier.

Two types of methods:
 Bagging (Bootstrap aggregating): constructs classifiers by randomly resampling the data
 Obtain N data sets by sampling with replacement
 Learn one model on each data set
 Combine the N models' outputs by voting to produce the final prediction
 Boosting: improves classifier performance based on errors, building each new classifier to focus on the samples the existing classifiers misclassify.
 The initial distribution over the samples is uniform;
 After each round, increase the weights of the misclassified samples in the training set, so that the next base classifier focuses on these hard samples;
 Assign each base classifier a weight: the higher its accuracy, the larger its weight.
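The Bagging recipe above (bootstrap samples → one weak model each → majority vote) can be sketched with decision stumps as the base learners (a minimal sketch; the stump learner, toy data, and seed are illustrative assumptions):

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    """Draw len(X) points with replacement: one bootstrap replicate."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def train_stump(X, y):
    """Weak base learner: a decision stump thresholding the single feature."""
    best = None
    for t in sorted({x[0] for x in X}):
        for sign in (1, -1):
            acc = sum((sign if x[0] >= t else -sign) == yy
                      for x, yy in zip(X, y))
            if best is None or acc > best[0]:
                best = (acc, t, sign)
    _, t, sign = best
    return lambda x: sign if x[0] >= t else -sign

def bagging(X, y, n_models=5, seed=0):
    """Train n_models stumps on bootstrap replicates; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        Xb, yb = bootstrap_sample(X, y, rng)
        models.append(train_stump(Xb, yb))
    def vote(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return vote

# toy one-dimensional data with two well-separated values
X = [[1.0]] * 4 + [[9.0]] * 4
y = [-1] * 4 + [1] * 4
clf = bagging(X, y)
print([clf(x) for x in X])
```

Because every replicate is drawn independently, the stumps could be trained in parallel; the majority vote is the equal weighting mentioned in the differences below.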

Differences
 In Bagging the training sets are independent of each other, so the base classifiers are independent and can be trained in parallel; in Boosting each training set depends on the result of the previous round, so training cannot be parallelized.
 In Bagging the predictors are weighted equally; in Boosting the final prediction is a weighted combination of the predictors.

Advantages: almost all state-of-the-art prediction algorithms use ensembles; they predict more accurately than a single model and are widely used in major competitions.

Disadvantages: requires a lot of maintenance work.

Representative algorithms: Random Forest, AdaBoost.