Machine Learning: Logistic Regression and Maximum Entropy

I. Overview

(1) Maximum entropy model

The maximum entropy principle is a learning criterion for probability models and can be applied to various kinds of probability models; the maximum entropy model is the model selected by this criterion.

Take the conditional probability distribution model as an example:

Model: the maximum entropy model in its parametric form,

$$P_w(y \mid x) = \frac{1}{Z_w(x)} \exp\left(\sum_{i=1}^{n} w_i f_i(x, y)\right), \qquad Z_w(x) = \sum_{y} \exp\left(\sum_{i=1}^{n} w_i f_i(x, y)\right)$$

where the $f_i(x, y)$ are feature functions and the $w_i$ are their weights.

Strategy:

Learning is posed as a constrained optimization problem (maximize the conditional entropy subject to the feature-expectation constraints), which is solved through its dual form.

Within this same parametric family, maximizing the dual objective with respect to $w$ yields the parameter $w$ and hence the model $P_w(y \mid x)$.

 (2) Logistic regression

Binomial logistic regression model:

$$P(Y=1 \mid x) = \frac{\exp(w \cdot x)}{1 + \exp(w \cdot x)}, \qquad P(Y=0 \mid x) = \frac{1}{1 + \exp(w \cdot x)}$$

Multinomial logistic regression model (for $K$ classes):

$$P(Y=k \mid x) = \frac{\exp(w_k \cdot x)}{1 + \sum_{j=1}^{K-1} \exp(w_j \cdot x)}, \; k = 1, \dots, K-1, \qquad P(Y=K \mid x) = \frac{1}{1 + \sum_{j=1}^{K-1} \exp(w_j \cdot x)}$$

Optimization:

Maximize the likelihood function of the training data to obtain the parameter $w$.

With $w$ determined, the model computes the probability of each class for a given input, and the class with the larger probability is taken as the classification result.

II. Main content

(1) Maximum entropy

The principle of maximum entropy is a criterion for probabilistic model learning. It states that, when learning a probability model, among all possible probability models (distributions) the model with the largest entropy is the best one. Constraints are usually used to determine the set of candidate probability models, so the principle can also be stated as: select the model with the largest entropy from the set of models that satisfy the constraints.

//================== Supplement =====================//

Intuitively, the principle of maximum entropy holds that the probability model to be selected must first satisfy the existing facts, i.e., the constraints. In the absence of further information, the uncertain parts are treated as "equally likely", that is, assigned equal probabilities, which makes the entropy largest. The principle of maximum entropy thus expresses "equal possibility" through the maximization of entropy: "equal probability" is not easy to operate on directly, whereas entropy is a numerical index that can be optimized.

Entropy is therefore a quantitative indicator of how uniform, and hence how uncertain, a distribution is: the greater the entropy, the greater the disorder and uncertainty.
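As a quick illustration (a minimal sketch, not from the original post), the snippet below computes the entropy of a few distributions over three outcomes and shows that the uniform distribution attains the largest entropy:

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i * log(p_i), skipping zero entries."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Three distributions over the same three outcomes.
candidates = {
    "uniform": [1/3, 1/3, 1/3],
    "skewed":  [0.7, 0.2, 0.1],
    "certain": [1.0, 0.0, 0.0],
}
for name, p in candidates.items():
    print(f"{name:8s} H = {entropy(p):.4f}")
# The uniform distribution attains the largest entropy (log 3 ≈ 1.0986),
# matching the "equally likely" intuition behind the maximum entropy principle.
```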

Figure 6.2 provides a geometric interpretation of probabilistic model selection under the principle of maximum entropy. The set of probability models P can be represented by a simplex in Euclidean space, such as the triangle (2-simplex) in the left panel. A point represents a model, and the whole simplex represents the collection of models. A straight line in the right panel corresponds to one constraint, and the intersection of the lines corresponds to the set of models that satisfy all the constraints. In general there are still infinitely many such models. The purpose of learning is to select the optimal model from this set of possible models, and the principle of maximum entropy gives a criterion for this selection.

//=====================================// 

The principle of maximum entropy can be applied to various probability models. Here we take the conditional probability model as an example to explain the solution process.

Constraints:

The difference between the maximum entropy model and naive Bayes is that the maximum entropy model incorporates multiple constraints.

(1) The constraints are introduced below; they are represented by feature functions $f(x, y)$, each of which is a binary indicator:

$$f(x, y) = \begin{cases} 1, & x \text{ and } y \text{ satisfy a certain fact} \\ 0, & \text{otherwise} \end{cases}$$

A model may contain several such constraints. 

(2) Another constraint is constructed from the data: the expected value of the feature function with respect to the model, using $P(y \mid x)$ together with the empirical marginal $\tilde P(x)$, must equal its expected value with respect to the empirical joint distribution $\tilde P(x, y)$:

$$\sum_{x, y} \tilde P(x) P(y \mid x) f(x, y) \;=\; \sum_{x, y} \tilde P(x, y) f(x, y), \quad \text{i.e.} \quad E_P(f) = E_{\tilde P}(f)$$

This holds because, if the model is able to capture the regularities in the training set, then $P(X, Y) = P(Y \mid X) P(X)$, which gives the above equation; it is one of the constraints that must be satisfied.

Here $\tilde P(x)$ and $\tilde P(x, y)$ are empirical distributions computed from the training set, while $P(y \mid x)$ is the distribution to be learned.
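As an illustration (a minimal sketch with a made-up toy dataset, not code from the original post), the snippet below estimates the empirical distributions $\tilde P(x)$ and $\tilde P(x, y)$ from a small training set and evaluates the empirical expectation $E_{\tilde P}(f)$ of one indicator feature:

```python
from collections import Counter

# Hypothetical toy training set of (x, y) pairs.
data = [("sunny", "play"), ("sunny", "play"), ("rainy", "stay"),
        ("rainy", "play"), ("sunny", "stay"), ("rainy", "stay")]
N = len(data)

# Empirical distributions from counts.
p_x  = {x: c / N for x, c in Counter(x for x, _ in data).items()}   # ~P(x)
p_xy = {xy: c / N for xy, c in Counter(data).items()}               # ~P(x, y)

# An indicator feature function f(x, y): fires when x is sunny and y is play.
def f(x, y):
    return 1.0 if (x == "sunny" and y == "play") else 0.0

# Empirical expectation E_~P(f) = sum_{x,y} ~P(x, y) f(x, y).
E_f_empirical = sum(p * f(x, y) for (x, y), p in p_xy.items())
print(p_x, E_f_empirical)  # the model's expectation E_P(f) is constrained to match this value
```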

By comparison, naive-Bayes-style estimation simply summarizes the empirical joint distribution and the empirical marginal distribution from the training data set, and the conditional probability distribution is then obtained by the formula $P(Y \mid X) = \tilde P(X, Y) / \tilde P(X)$.

 

This is the calculation method of the model without constraints. If constraints are included, a more general method is used: the maximum entropy model.
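For concreteness (a minimal sketch continuing the hypothetical toy dataset above, not code from the original post), the frequency-based estimate of $P(Y \mid X)$ looks like this:

```python
from collections import Counter

data = [("sunny", "play"), ("sunny", "play"), ("rainy", "stay"),
        ("rainy", "play"), ("sunny", "stay"), ("rainy", "stay")]

count_x  = Counter(x for x, _ in data)
count_xy = Counter(data)

# P(y | x) ≈ count(x, y) / count(x): a pure frequency estimate, with no extra constraints.
def p_y_given_x(y, x):
    return count_xy[(x, y)] / count_x[x]

print(p_y_given_x("play", "sunny"))  # 2/3
print(p_y_given_x("stay", "rainy"))  # 2/3
```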

 

//================ Preliminary knowledge ==================//

The formula for calculating entropy is:

$$H(P) = -\sum_{x} P(x) \log P(x)$$

The formula for calculating conditional entropy is:

$$H(Y \mid X) = -\sum_{x, y} P(x, y) \log P(y \mid x)$$

In the maximum entropy model, the entropy of the conditional distribution $P(y \mid x)$ is taken with respect to the empirical marginal $\tilde P(x)$:

$$H(P) = -\sum_{x, y} \tilde P(x) P(y \mid x) \log P(y \mid x)$$

//========================================// 
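As a check on these formulas (a minimal sketch, again using the hypothetical toy dataset rather than anything from the original post), conditional entropy can be computed directly from the empirical distributions:

```python
import math
from collections import Counter

data = [("sunny", "play"), ("sunny", "play"), ("rainy", "stay"),
        ("rainy", "play"), ("sunny", "stay"), ("rainy", "stay")]
N = len(data)

p_x  = {x: c / N for x, c in Counter(x for x, _ in data).items()}
p_xy = {xy: c / N for xy, c in Counter(data).items()}

# H(Y|X) = -sum_{x,y} P(x,y) log P(y|x), with P(y|x) = P(x,y) / P(x).
H_y_given_x = -sum(p * math.log(p / p_x[x]) for (x, y), p in p_xy.items() if p > 0)
print(H_y_given_x)  # the uncertainty about Y that remains once X is known
```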

Build an optimization strategy:

Now that we have the formula for the conditional entropy and the constraints above, the goal is to find, among the models that satisfy the constraints, the one with maximum entropy:

$$\max_{P} \; H(P) = -\sum_{x, y} \tilde P(x) P(y \mid x) \log P(y \mid x) \quad \text{s.t.} \quad E_P(f_i) = E_{\tilde P}(f_i), \; i = 1, \dots, n, \qquad \sum_{y} P(y \mid x) = 1$$

A small conversion turns this into the equivalent minimization problem:

$$\min_{P} \; -H(P) \quad \text{subject to the same constraints.}$$

The solution to the above constrained problem is the solution of the maximum entropy model. 

When solving, the constrained problem above is transformed into an unconstrained one by introducing Lagrange multipliers $w_0, w_1, \dots, w_n$ and the Lagrangian

$$L(P, w) = -H(P) + w_0\Bigl(1 - \sum_{y} P(y \mid x)\Bigr) + \sum_{i=1}^{n} w_i \bigl(E_{\tilde P}(f_i) - E_P(f_i)\bigr)$$

The primal problem $\min_P \max_w L(P, w)$ is then replaced by its dual $\max_w \min_P L(P, w)$.

The inner minimization is solved by taking the partial derivative of $L(P, w)$ with respect to $P(y \mid x)$ and setting it to zero; the resulting solution depends only on $w$:

$$P_w(y \mid x) = \frac{1}{Z_w(x)} \exp\left(\sum_{i=1}^{n} w_i f_i(x, y)\right)$$

$$Z_w(x) = \sum_{y} \exp\left(\sum_{i=1}^{n} w_i f_i(x, y)\right)$$

So the dual objective is:

$$\Psi(w) = \min_{P} L(P, w) = L(P_w, w)$$

The $w$ obtained by maximizing $\Psi(w)$ (which can be shown to equal the log-likelihood of $P_w$ on the training data) is the parameter of the model, and it also determines the conditional probability $P_w(y \mid x)$.
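To make the outer maximization concrete (a minimal sketch with hypothetical features and data, not the original author's code), the snippet below fits a tiny maximum entropy model by plain gradient ascent on the log-likelihood of $P_w$; the gradient for each weight is the difference between the empirical and the model expectation of its feature:

```python
import math
from collections import Counter

# Hypothetical toy training set and label set.
data = [("sunny", "play"), ("sunny", "play"), ("rainy", "stay"),
        ("rainy", "play"), ("sunny", "stay"), ("rainy", "stay")]
labels = ["play", "stay"]
N = len(data)

# Hypothetical indicator features f_i(x, y).
features = [
    lambda x, y: 1.0 if (x == "sunny" and y == "play") else 0.0,
    lambda x, y: 1.0 if (x == "rainy" and y == "stay") else 0.0,
]
w = [0.0] * len(features)

p_x  = {x: c / N for x, c in Counter(x for x, _ in data).items()}
p_xy = {xy: c / N for xy, c in Counter(data).items()}

def p_w(y, x):
    """P_w(y | x) = exp(sum_i w_i f_i(x, y)) / Z_w(x)."""
    scores = {yy: math.exp(sum(wi * fi(x, yy) for wi, fi in zip(w, features)))
              for yy in labels}
    return scores[y] / sum(scores.values())

# Empirical expectations E_~P(f_i): the fixed targets of the constraints.
E_emp = [sum(p * fi(x, y) for (x, y), p in p_xy.items()) for fi in features]

lr = 1.0
for step in range(200):
    # Model expectations E_P(f_i) = sum_x ~P(x) sum_y P_w(y|x) f_i(x, y).
    E_mod = [sum(px * p_w(y, x) * fi(x, y) for x, px in p_x.items() for y in labels)
             for fi in features]
    # Gradient of the log-likelihood (dual objective): empirical minus model expectation.
    w = [wi + lr * (e - m) for wi, e, m in zip(w, E_emp, E_mod)]

print(w, p_w("play", "sunny"))  # at convergence E_P(f_i) ≈ E_~P(f_i)
```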

(2) Logistic regression

Model: for input $x$ (with the bias term absorbed into $x$ and $w$), the binomial logistic regression model is

$$P(Y=1 \mid x) = \frac{\exp(w \cdot x)}{1 + \exp(w \cdot x)}, \qquad P(Y=0 \mid x) = \frac{1}{1 + \exp(w \cdot x)}$$

Strategy: maximum likelihood estimation, i.e., choose the parameter $w$ that maximizes the likelihood of the training data (equivalently, minimizes the negative log-likelihood, the logistic loss).


Optimization:

Using maximum likelihood estimation. For a training set $\{(x_i, y_i)\}_{i=1}^{N}$ with $y_i \in \{0, 1\}$, the log-likelihood function is

$$L(w) = \sum_{i=1}^{N} \bigl[ y_i (w \cdot x_i) - \log\bigl(1 + \exp(w \cdot x_i)\bigr) \bigr]$$

and the estimate $\hat w$ is the value of $w$ that maximizes $L(w)$.

Multinomial logistic regression: the extension of the binomial model to $K$ classes (formula as given in the overview), learned in the same way by maximum likelihood estimation.

  • Algorithm: commonly used methods include gradient descent (as sketched below), the Newton method, and the improved iterative scaling method.
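As an illustration of the gradient-based option above (a minimal sketch using made-up one-dimensional data, not the original author's code), the snippet below maximizes the binomial log-likelihood $L(w)$ by gradient ascent, which is equivalent to gradient descent on the negative log-likelihood:

```python
import math

# Hypothetical 1-D training data: feature value and binary label.
# Each x is augmented with a constant 1 so that w = (weight, bias).
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.5, 1), (3.0, 1), (3.5, 1)]
w = [0.0, 0.0]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

lr = 0.1
for step in range(2000):
    grad = [0.0, 0.0]
    for x, y in data:
        xs = (x, 1.0)                                         # augmented input (feature, bias)
        p1 = sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))   # P(Y=1 | x)
        # Gradient of the log-likelihood: sum_i (y_i - P(Y=1|x_i)) * x_i
        for j in range(2):
            grad[j] += (y - p1) * xs[j]
    w = [wi + lr * g for wi, g in zip(w, grad)]

# With w fixed, classify by comparing P(Y=1|x) against 0.5.
print(w, sigmoid(w[0] * 2.0 + w[1]))
```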

 


Source: blog.csdn.net/stephon_100/article/details/125242834