Machine learning---Linear discriminant analysis

1. Basic idea

        Linear Discriminant Analysis (LDA), also called Fisher Linear Discriminant (FLD), is a classic algorithm for pattern recognition. It was first proposed by Ronald Fisher in 1936 and later introduced by Belhumeur into the fields of pattern recognition and artificial intelligence.

       The basic idea of linear discriminant analysis is to project high-dimensional pattern samples onto an optimal discriminant vector space, which extracts classification information while compressing the dimensionality of the feature space. After projection, the pattern samples have the maximum between-class distance and the minimum within-class distance in the new subspace; that is, the patterns have the best separability in that space.

       It is therefore an effective feature extraction method: after projection, the between-class scatter matrix of the pattern samples is maximized while the within-class scatter matrix is minimized.

       LDA and PCA (Principal Component Analysis) are both commonly used dimensionality reduction techniques. PCA finds a good projection direction mainly from the perspective of the feature covariance, whereas LDA also takes the class labels into account: after projection, data points of different categories should be far apart while data points of the same category should be compact.
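To make this difference concrete, here is a minimal sketch (assuming NumPy and scikit-learn are available; the synthetic data and the gap measure are only illustrative) that reduces the same labeled 2-D data to one dimension with PCA and with LDA:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two elongated clusters: class 0 and class 1 differ along x2 but spread widely along x1.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], scale=[3.0, 0.5], size=(100, 2))
X1 = rng.normal(loc=[0.0, 2.0], scale=[3.0, 0.5], size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# PCA ignores the labels and keeps the direction of largest variance (the x1 axis here).
z_pca = PCA(n_components=1).fit_transform(X)

# LDA uses the labels and picks the direction that best separates the two classes.
z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

def class_mean_gap(z, y):
    """Distance between the projected class means, in units of the overall spread."""
    return abs(z[y == 0].mean() - z[y == 1].mean()) / z.std()

print("gap after PCA:", class_mean_gap(z_pca, y))
print("gap after LDA:", class_mean_gap(z_lda, y))
```

On this data PCA keeps the high-variance x1 direction, along which the two classes overlap, while LDA keeps the direction along which the class means differ, so its gap is noticeably larger.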

2. LDA

        Consider two categories, one green and one red. The first figure below shows the original two-class data; the task is to reduce it from two dimensions to one. If the data are projected directly onto the x1 axis or the x2 axis, points of different categories overlap and the classification quality drops. The straight line in the second figure below is the one computed by the LDA method: after mapping onto it, the distance between the red and green categories is the largest, and the points inside each category are the least dispersed (equivalently, the most tightly clustered).

First image below: When projected onto this line, the two classes are not well separated.

Second image below: This line successfully separates the two classes while reducing the dimensionality of the problem from two features (x1, x2) to a single scalar value y.

LDA is a linear classifier. For a K-class classification problem there are K linear functions:
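In the usual notation, with a weight vector w_k and a bias w_{k0} for each class k, these functions can be written as

    y_k(x) = w_k^T x + w_{k0}, \quad k = 1, \dots, K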

       When the condition is met that y_k > y_j for all j ≠ k, we say that x belongs to category k. In other words, each category has a formula that computes a score; among the scores produced by all the formulas, the largest one determines the category to which x belongs.
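Equivalently, the predicted category can be written as

    k^* = \arg\max_k \, y_k(x)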

       The formula above is in fact a kind of projection: it projects a high-dimensional point onto a straight line. Given a data set labeled with categories, the goal of LDA is to project it onto a line so that the points are separated by category as well as possible. For K = 2, i.e. the binary classification problem, this is shown in the figure below:

The red square points are the original points of class 0 and the blue square points are the original points of class 1; the line passing through the origin is the projection line. The figure shows clearly that, after projection, the red points and the blue points are separated by the origin.

3. Optimization function

Assume that the straight line (projection function) used to distinguish the two categories is:
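A common choice, taking w as the projection direction, is

    y = w^T x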

 One goal of LDA is to make different categories as far apart as possible after projection, and points within the same category as close together as possible, so we need to define several key quantities:

The original center point (mean) of category i (where D_i denotes the set of points belonging to category i) is:
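In the usual notation, with n_i the number of points in D_i, this class mean is

    m_i = \frac{1}{n_i} \sum_{x \in D_i} x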

The projected center point of category i is:
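Projecting each point with y = w^T x and averaging, the projected mean can be written as

    \tilde{m}_i = \frac{1}{n_i} \sum_{x \in D_i} w^T x = w^T m_i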

The dispersion (scatter) of the points of category i after projection is measured by:
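A standard way to write this, with Y_i denoting the projected values of D_i, is

    \tilde{s}_i^2 = \sum_{y \in Y_i} (y - \tilde{m}_i)^2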

Finally, we obtain the following formula, which is the objective function that LDA optimizes over the projection direction w:
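For the two-class case this objective is commonly written as

    J(w) = \frac{|\tilde{m}_1 - \tilde{m}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}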

The denominator is the sum of the variances within each category; the larger it is, the more dispersed the points within a category are. The numerator is the squared distance between the projected class center points. We can therefore find the optimal w by maximizing J(w).

The goal of classification is to make the points within a category as close as possible (concentrated), and the points between categories as far apart as possible.

We define a scatter matrix for each category before projection. Its meaning is that the closer the points of the input set D_i of a category are to that category's center point m_i, the smaller the values of the elements of S_i; if the points of the category all cluster tightly around m_i, the elements of S_i approach 0.

Definition (the scatter matrix S_i of category i before projection):
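In the usual matrix notation this is

    S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^T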

Definition (the within-class scatter matrix S_W):
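For the two-class case this is simply the sum of the per-class scatter matrices:

    S_W = S_1 + S_2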

Simplification:
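Expanding \tilde{s}_i^2 with y = w^T x and using the definition of S_i, the standard chain of equalities is

    \tilde{s}_i^2 = \sum_{x \in D_i} (w^T x - w^T m_i)^2 = \sum_{x \in D_i} w^T (x - m_i)(x - m_i)^T w = w^T S_i w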

Bringing in the definition of S_W, we get:
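In the two-class case the denominator of J(w) therefore becomes

    \tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_1 w + w^T S_2 w = w^T S_W w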

Similarly, the numerator of J(w) becomes:
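In the usual notation, introducing the between-class scatter matrix S_B, the chain of equalities is

    |\tilde{m}_1 - \tilde{m}_2|^2 = (w^T m_1 - w^T m_2)^2 = w^T (m_1 - m_2)(m_1 - m_2)^T w = w^T S_B w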

In this way, the objective optimization function can be transformed into:
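Combining the two results above gives the familiar Rayleigh-quotient form

    J(w) = \frac{w^T S_B w}{w^T S_W w}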

       Now the Lagrange multiplier method can be used, but there is still one problem: if the numerator and the denominator can each take arbitrary values, there will be infinitely many solutions. We therefore constrain the denominator to have length 1 and use this as the constraint of the Lagrange multiplier method, obtaining:
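A standard way to write this constrained problem and its stationarity condition is

    \max_w \, w^T S_B w \quad \text{subject to} \quad w^T S_W w = 1

    L(w, \lambda) = w^T S_B w - \lambda \, (w^T S_W w - 1)

    \frac{\partial L}{\partial w} = 2 S_B w - 2 \lambda S_W w = 0 \;\Rightarrow\; S_B w = \lambda S_W w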

Such an equation is a generalized eigenvalue problem. If S_W is invertible, multiplying both sides of the stationarity condition by the inverse of S_W gives:
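This turns the generalized eigenvalue problem into an ordinary one:

    S_W^{-1} S_B w = \lambda w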

The gratifying result is that w is an eigenvector of the matrix S_W^{-1} S_B. This formulation is called the Fisher linear discriminant.

Looking again at the formula for S_B given above:
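For two classes it is the rank-one matrix

    S_B = (m_1 - m_2)(m_1 - m_2)^T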

so:
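Since (m_1 - m_2)^T w is just a scalar (call it \lambda_w here), S_B w always points along m_1 - m_2:

    S_B w = (m_1 - m_2)(m_1 - m_2)^T w = \lambda_w (m_1 - m_2)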

Substituting this into the eigenvalue formula above, we get:
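With the scalar \lambda_w as above, this reads

    S_W^{-1} (m_1 - m_2) \, \lambda_w = \lambda w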

Since scaling w up or down by any factor does not affect the result, the unknown constants on both sides can be cancelled to obtain:
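Up to an arbitrary scale factor,

    w \propto S_W^{-1} (m_1 - m_2)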

At this point, we only need the means and the scatter (variance) of the original samples to find the best direction w. This is the linear discriminant analysis that Fisher proposed in 1936.
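To make this closed-form result concrete, here is a small NumPy sketch (the function name fisher_direction and the synthetic data are only illustrative) that computes the direction w proportional to S_W^{-1} (m_1 - m_2) directly from two labeled point sets:

```python
import numpy as np

def fisher_direction(X0, X1):
    """Two-class Fisher direction, proportional to S_W^{-1} (m_1 - m_0).

    X0, X1: arrays of shape (n_i, d) holding the samples of the two classes.
    Returns a unit-length projection vector w of shape (d,).
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)

    # Within-class scatter S_W = S_0 + S_1, with S_i = sum (x - m_i)(x - m_i)^T.
    d0, d1 = X0 - m0, X1 - m1
    Sw = d0.T @ d0 + d1.T @ d1

    # Solve S_W w = m_1 - m_0 instead of forming the inverse explicitly.
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

# Example: two overlapping 2-D clusters.
rng = np.random.default_rng(1)
X0 = rng.normal(loc=[0.0, 0.0], scale=[2.0, 0.5], size=(200, 2))
X1 = rng.normal(loc=[1.0, 2.0], scale=[2.0, 0.5], size=(200, 2))
w = fisher_direction(X0, X1)

print("w =", w)
print("projected class means:", (X0 @ w).mean(), (X1 @ w).mean())
```

Solving the linear system rather than inverting S_W is numerically safer and yields the same direction; since any rescaling of w does not affect the result, the vector is normalized to unit length.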

Origin blog.csdn.net/weixin_43961909/article/details/132354454