Supervised Learning Methods


1. Applicable Problems

Supervised learning is about learning a model that can predict the appropriate output for a given input. It includes classification, tagging, and regression.

  • Classification: the problem of predicting a class label from an instance's feature vector.
  • Tagging: the problem of predicting a tag sequence (or state sequence) from an observation sequence (see the data-shape sketch below).
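As a data-shape illustration (a toy example of ours, not from the original text): classification maps one feature vector to one label, while tagging maps a whole observation sequence to a tag sequence of the same length.

```python
# Toy data shapes for classification vs. tagging (illustrative only).

# Classification: one feature vector -> one class label.
x = [5.1, 3.5, 1.4]                      # a single instance's feature vector
y = "class_A"                            # the class label to predict

# Tagging: an observation sequence -> a tag sequence of equal length.
observations = ["I", "love", "Paris"]    # e.g., words in a sentence
tags = ["PRON", "VERB", "PROPN"]         # one tag per observation
assert len(observations) == len(tags)
```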

The perceptron, k-nearest neighbor, naive Bayes, and decision trees are simple classification methods; their models are intuitive, simple, and easy to implement.

Logistic regression, the maximum entropy model, the support vector machine, and boosting are more complex but more effective classification methods, often achieving higher classification accuracy.

The hidden Markov model and the conditional random field are the main tagging methods; conditional random fields usually achieve higher tagging accuracy.

2. Models

2.1 Probabilistic and non-probabilistic models

The prediction model can be written either as a conditional probability distribution $P(Y|X)$ or as a decision function $Y=f(X)$.

  • Naive Bayes and the hidden Markov model are probabilistic models.
  • The perceptron, k-nearest neighbor, the support vector machine, and boosting are non-probabilistic models.
  • Decision trees, logistic regression, the maximum entropy model, and conditional random fields can be viewed either as probabilistic or as non-probabilistic models (a minimal contrast is sketched below).
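A minimal contrast, assuming scikit-learn is available (the dataset and model choices are ours): a probabilistic model exposes conditional class probabilities $P(Y|X)$, while a non-probabilistic model exposes only a decision score $f(X)$.

```python
# Probabilistic vs. non-probabilistic models, assuming scikit-learn
# is installed (dataset and model choices are illustrative).
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import Perceptron

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

nb = GaussianNB().fit(X, y)          # probabilistic: models P(Y|X)
print(nb.predict_proba(X[:1]))       # conditional class probabilities

p = Perceptron().fit(X, y)           # non-probabilistic: decision function f(X)
print(p.decision_function(X[:1]))    # a signed score, not a probability
```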

2.2 Discriminative and generative methods

A method that directly learns the conditional probability distribution $P(Y|X)$ or the decision function $Y=f(X)$ is a discriminative method.

  • The corresponding models are discriminative models: the perceptron, k-nearest neighbor, decision trees, logistic regression, the maximum entropy model, the support vector machine, boosting, and conditional random fields.

A method that first learns the joint probability distribution $P(X,Y)$ and from it derives the conditional probability distribution $P(Y|X)$ is a generative method.

  • The corresponding models are generative models: naive Bayes and the hidden Markov model (a toy sketch follows below).
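A toy sketch of the generative route under the assumption of binary features (the counting scheme and all names are ours): estimate $P(Y)$ and $P(X|Y)$ from data, then obtain $P(Y|X)$ by Bayes' rule.

```python
# A toy generative method with binary features (all names are ours):
# estimate P(Y) and P(X|Y) by counting, then get P(Y|X) via Bayes' rule.
import numpy as np

X = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])   # binary feature vectors
y = np.array([1, 1, 0, 0])                       # class labels

classes = np.unique(y)
prior = np.array([(y == c).mean() for c in classes])        # P(Y=c)
lik = np.array([X[y == c].mean(axis=0) for c in classes])   # P(X_j=1 | Y=c)

def posterior(x):
    # P(Y=c | x) is proportional to P(Y=c) * prod_j P(x_j | Y=c)
    # (the naive Bayes conditional-independence assumption)
    px = prior * np.prod(lik ** x * (1 - lik) ** (1 - x), axis=1)
    return px / px.sum()

print(posterior(np.array([1, 0])))   # conditional distribution over classes
```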

2.3 Feature space

Decision trees are defined on a general feature space, which may contain continuous or discrete variables.

The feature space of the perceptron, the support vector machine, and k-nearest neighbor is Euclidean space (more generally, Hilbert space).

The boosting model is a linear combination of weak classifiers, so its feature space is the feature space of the weak classifiers.


2.4 Linear and non-linear models

The perceptron is a linear model.
Logistic regression, the maximum entropy model, and conditional random fields are log-linear models.
k-nearest neighbor, decision trees, support vector machines (with kernel functions), and boosting are non-linear models (a small demonstration follows below).
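A quick way to see the linear/non-linear distinction (a sketch assuming scikit-learn; the XOR example is ours): a linear model cannot fit XOR, while a kernel SVM can.

```python
# Linear vs. non-linear models on XOR data, assuming scikit-learn
# (this example is ours, not from the original text).
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])               # XOR: not linearly separable

linear = Perceptron(max_iter=1000).fit(X, y)
print(linear.score(X, y))                # a linear model cannot fit XOR

nonlinear = SVC(kernel="rbf").fit(X, y)  # kernel SVM: non-linear model
print(nonlinear.score(X, y))             # typically fits XOR perfectly
```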

3. Learning Strategies

For binary classification in supervised learning, the support vector machine, logistic regression & the maximum entropy model, and boosting use the hinge loss function, the logistic loss function, and the exponential loss function respectively, written as:

$[1-y f(x)]_{+}$

$\log [1+\exp (-y f(x))]$

$\exp (-y f(x))$

These three loss functions are all upper bounds of the 0-1 loss function and have similar shapes.
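The upper-bound claim can be checked numerically; note that for the bound to hold exactly, the logistic loss must be taken with a base-2 logarithm (this rescaling note and the sketch below are ours):

```python
# A quick numeric check with NumPy (our sketch): the hinge and exponential
# losses dominate the 0-1 loss pointwise; the logistic loss does so after
# rescaling to a base-2 logarithm.
import numpy as np

m = np.linspace(-2.0, 2.0, 9)            # margin values y * f(x)
zero_one = (m <= 0).astype(float)        # 0-1 loss
hinge = np.maximum(0.0, 1.0 - m)         # [1 - y f(x)]_+
logistic = np.log2(1.0 + np.exp(-m))     # base-2 logistic loss
exponential = np.exp(-m)                 # exp(-y f(x))

assert np.all(hinge >= zero_one)
assert np.all(logistic >= zero_one)
assert np.all(exponential >= zero_one)
```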

[Figure: the hinge, logistic, and exponential losses compared with the 0-1 loss]

Support vector machines, logistic regression & the maximum entropy model, and boosting can therefore be viewed as using different surrogate loss functions to represent the classification loss, defining an empirical risk or structural risk function, and thereby carrying out the binary classification learning task.

The learning strategy is to minimize the structural risk function:

$\min _{f \in H} \frac{1}{N} \sum_{i=1}^{N} L\left(y_{i}, f\left(x_{i}\right)\right)+\lambda J(f)$

The first term is the empirical risk (empirical loss) and the second term is the regularization term; $L(y, f(x))$ is the loss function, $J(f)$ is the model complexity, and $\lambda \geq 0$ is the coefficient balancing the two. A small gradient-descent sketch of this objective appears after the list below.

  • The support vector machine uses the $L_2$ norm to represent model complexity.
  • The original logistic regression and maximum entropy models have no regularization term; an $L_2$-norm regularization term can be added to them.
  • Boosting has no explicit regularization term; early stopping is used to achieve the effect of regularization.
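As referenced above, here is a minimal sketch of minimizing this structural risk: logistic loss plus an $L_2$ penalty, fitted by plain gradient descent (the function name, hyperparameters, and synthetic data are our illustrative choices, not a reference implementation).

```python
# Structural risk minimization: logistic loss plus an L2 penalty,
# fitted by plain gradient descent (all names and data are ours).
import numpy as np

def fit_regularized(X, y, lam=0.1, lr=0.1, steps=500):
    """Minimize (1/N) sum_i log(1 + exp(-y_i w.x_i)) + (lam/2) ||w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        # gradient of the empirical risk (mean logistic loss)
        grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= lr * (grad + lam * w)       # add the regularization gradient
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))   # labels in {-1, +1}
print(fit_regularized(X, y))
```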

The learning of probabilistic models can be formalized as maximum likelihood estimation, or as maximum a posteriori (MAP) estimation within Bayesian estimation.

The learning strategy is then to minimize the log-likelihood loss or the regularized log-likelihood loss.

The log-likelihood loss can be written as $-\log P(y|x)$.

In MAP estimation, the regularization term is the negative logarithm of the prior probability.
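To make this explicit, assume for illustration a prior of the form $P(w) \propto \exp(-\lambda\|w\|^{2})$ over the model parameters $w$ (the prior choice here is our assumption, not from the original text). Then

$\hat{w}_{\mathrm{MAP}}=\arg \max _{w}[\log P(y | x, w)+\log P(w)]=\arg \min _{w}[-\log P(y | x, w)+\lambda\|w\|^{2}]$

so the negative log-prior contributes exactly the regularization term $\lambda\|w\|^{2}$ (up to a constant), and MAP estimation coincides with regularized maximum likelihood estimation.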


The learning strategy of decision trees is regularized maximum likelihood estimation: the loss function is the log-likelihood loss, and the regularization term is the complexity of the decision tree.

The learning strategy of logistic regression & the maximum entropy model and of conditional random fields can be viewed either as maximum likelihood estimation (or regularized maximum likelihood estimation), or as minimizing the logistic loss (or the regularized logistic loss).

The unsupervised learning of the naive Bayes model and the hidden Markov model is also maximum likelihood estimation or MAP estimation, but in this case the models contain hidden variables.

4. Learning Algorithms

Once a statistical learning problem is given a concrete formulation, it becomes an optimization problem.

  • For the naive Bayes method and the supervised learning of hidden Markov models, the optimal solution, namely the maximum likelihood estimate, can be computed directly from probability formulas.

  • The perceptron, logistic regression & the maximum entropy model, and conditional random fields use gradient descent or quasi-Newton methods, which are general solutions to unconstrained optimization problems.

  • Support vector machine learning can be solved through the dual problem of convex quadratic programming, for example with the sequential minimal optimization (SMO) algorithm.

  • Decision tree learning is a typical example of a heuristic algorithm: feature selection, tree generation, and pruning can be viewed as heuristically performing regularized maximum likelihood estimation.

  • Boosting exploits the facts that its model is an additive model and its loss function is the exponential loss, learning the model heuristically in a forward stagewise manner so as to approximately optimize the objective function.

  • The EM algorithm is an iterative method for estimating the parameters of probabilistic models with hidden variables; its convergence is guaranteed, but convergence to the global optimum is not (see the sketch after this list).

  • The learning of support vector machines, logistic regression & the maximum entropy model, and conditional random fields is a convex optimization problem, for which the existence of a global optimal solution is guaranteed. The other learning problems are not convex optimization problems.
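As forward-referenced above, here is a compact EM example (our sketch; the mixture model, initialization, and names are illustrative assumptions): fitting a two-component one-dimensional Gaussian mixture, where component membership is the hidden variable.

```python
# A compact EM sketch for a two-component 1-D Gaussian mixture, a standard
# hidden-variable model (initialization and names are our choices).
import numpy as np

def em_gmm2(x, steps=100):
    mu = np.array([x.min(), x.max()])        # crude initialization
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(steps):
        # E-step: responsibility of each component for each point
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood re-estimation
        nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return pi, mu, sigma

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 1, 150)])
print(em_gmm2(x))   # converges, though only to a local optimum in general
```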


Origin: blog.csdn.net/qq_21201267/article/details/105345344