Principles of Linear Discriminant Analysis (LDA)

Project Address: https://github.com/Daya-Jin/ML_for_learner/blob/master/discriminant_analysis/LinearDiscriminantAnalysis.ipynb
Original blog: https://daya-jin.github.io/2018/12/05/LinearDiscriminantAnalysis/

LDA

Univariate binary classification

Suppose we have a univariate binary classification problem in which the label $y\in\{0,1\}$ follows a Bernoulli distribution with class priors $\pi_{0}$ and $\pi_{1}$, and the class-conditional density of the feature $x$ is Gaussian with class-specific means but a common variance:

$$P(x|y=k)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu_{k})^{2}}{2\sigma^{2}}\right),\qquad k\in\{0,1\}$$

By Bayes' rule, given a sample $x$, the posterior probabilities of the two classes are:

$$P(y=k|x)=\frac{\pi_{k}P(x|y=k)}{\pi_{0}P(x|y=0)+\pi_{1}P(x|y=1)},\qquad k\in\{0,1\}$$

The log-odds between the two classes can be written as:

$$\ln\frac{P(y=1|x)}{P(y=0|x)}=\ln\frac{\pi_{1}}{\pi_{0}}-\frac{\mu_{1}^{2}-\mu_{0}^{2}}{2\sigma^{2}}+\frac{\mu_{1}-\mu_{0}}{\sigma^{2}}x$$

which is linear in $x$.

From the equation above, the LDA linear discriminant function of a sample $x$ with respect to class $k$ can be written as:

$$\delta_{k}(x)=\frac{\mu_{k}}{\sigma^{2}}x-\frac{\mu_{k}^{2}}{2\sigma^{2}}+\ln\pi_{k}$$

Univariate multi-class classification

It is easy to see that for a multi-class problem the predicted output of the LDA model is:

$$\hat{y}=\arg\max_{k}\delta_{k}(x)$$

where $\pi_{k}$ in $\delta_{k}(x)$ is the prior probability of class $k$.
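
To make this concrete, here is a minimal NumPy sketch of univariate LDA (my own illustrative code with assumed names, not the notebook's implementation): it estimates the priors, class means and pooled variance from the data, then predicts by taking the arg-max of $\delta_{k}(x)$.

```python
import numpy as np

def fit_univariate_lda(x, y):
    """Plug-in estimates: priors, class means and a pooled (shared) variance."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([x[y == k].mean() for k in classes])
    # Pooled variance: average squared deviation from each sample's own class mean
    resid = np.concatenate([x[y == k] - x[y == k].mean() for k in classes])
    var = np.mean(resid ** 2)
    return classes, priors, means, var

def predict_univariate_lda(x_new, classes, priors, means, var):
    """delta_k(x) = mu_k/sigma^2 * x - mu_k^2/(2*sigma^2) + ln(pi_k)."""
    scores = (means / var) * x_new[:, None] - means ** 2 / (2 * var) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]

# Toy usage
x = np.array([1.0, 1.2, 0.8, 3.0, 3.2, 2.9])
y = np.array([0, 0, 0, 1, 1, 1])
params = fit_univariate_lda(x, y)
print(predict_univariate_lda(np.array([0.9, 3.1]), *params))  # -> [0 1]
```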

Multivariate multi-class classification

More generally, consider multivariate data with $d$ features. Under the assumption that all classes share a common covariance matrix $\Sigma$, the class-conditional density can be written as:

$$P(x|y=k)=\frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu_{k})^{T}\Sigma^{-1}(x-\mu_{k})\right)$$

The linear discriminant function becomes:

$$\delta_{k}(x)=x^{T}\Sigma^{-1}\mu_{k}-\frac{1}{2}\mu_{k}^{T}\Sigma^{-1}\mu_{k}+\ln\pi_{k}$$

The predicted output of the LDA model is:

$$\hat{y}=\arg\max_{k}\delta_{k}(x)$$

Each parameter is estimated from the observed data:

  • $\hat{\pi}_{k}=N_{k}/N$, where $N$ is the total number of samples and $N_{k}$ is the number of samples in class $k$
  • $\hat{\mu}_{k}=\frac{1}{N_{k}}\sum_{i:y_{i}=k}x_{i}$, where $\{i:y_{i}=k\}$ denotes the set of samples belonging to class $k$
  • $\hat{\Sigma}=\frac{1}{N-K}\sum_{k=1}^{K}\sum_{i:y_{i}=k}(x_{i}-\hat{\mu}_{k})(x_{i}-\hat{\mu}_{k})^{T}$, where $K$ is the number of classes

It can be seen that LDA is essentially a simple Bayesian model and does not use a maximum-likelihood optimization strategy; the parameters are estimated directly from the data.
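
The multivariate version follows the same pattern. Below is a minimal, self-contained sketch under the assumptions above (shared covariance, plug-in estimates); it is illustrative code of my own, not the project's notebook, though its predictions can be compared against `sklearn.discriminant_analysis.LinearDiscriminantAnalysis` on toy data.

```python
import numpy as np

class SimpleLDA:
    """Gaussian LDA with a pooled covariance matrix (illustrative sketch)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        N, d = X.shape
        K = len(self.classes_)
        self.priors_ = np.array([np.mean(y == k) for k in self.classes_])
        self.means_ = np.array([X[y == k].mean(axis=0) for k in self.classes_])
        # Pooled within-class covariance, normalized by N - K
        Sigma = np.zeros((d, d))
        for k, mu in zip(self.classes_, self.means_):
            Xc = X[y == k] - mu
            Sigma += Xc.T @ Xc
        self.Sigma_inv_ = np.linalg.inv(Sigma / (N - K))
        return self

    def decision_function(self, X):
        # delta_k(x) = x^T Sigma^{-1} mu_k - 0.5 * mu_k^T Sigma^{-1} mu_k + ln(pi_k)
        lin = X @ self.Sigma_inv_ @ self.means_.T                        # (n, K)
        quad = 0.5 * np.sum(self.means_ @ self.Sigma_inv_ * self.means_, axis=1)
        return lin - quad + np.log(self.priors_)

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]
```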

QDA

The LDA model relies on an assumption: the class-conditional densities of the features are Gaussians with unequal means but equal variances. What if the variances are unequal in a real-world situation? The figure below illustrates the equal-variance and unequal-variance cases:

[Figure: class-conditional Gaussians with equal variances vs. unequal variances]

Similarly, one can obtain the QDA (Quadratic Discriminant Analysis) discriminant function:

$$\delta_{k}(x)=-\frac{1}{2}\ln|\Sigma_{k}|-\frac{1}{2}(x-\mu_{k})^{T}\Sigma_{k}^{-1}(x-\mu_{k})+\ln\pi_{k}$$

The predicted output of the QDA model is:

$$\hat{y}=\arg\max_{k}\delta_{k}(x)$$

Each parameter is estimated from the observed data (a code sketch follows this list):

  • $\hat{\pi}_{k}=N_{k}/N$, where $N$ is the total number of samples and $N_{k}$ is the number of samples in class $k$
  • $\hat{\mu}_{k}=\frac{1}{N_{k}}\sum_{i:y_{i}=k}x_{i}$, where $\{i:y_{i}=k\}$ denotes the set of samples belonging to class $k$
  • $\hat{\Sigma}_{k}=\frac{1}{N_{k}-1}\sum_{i:y_{i}=k}(x_{i}-\hat{\mu}_{k})(x_{i}-\hat{\mu}_{k})^{T}$, a separate covariance matrix estimated for each class $k$
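
Here is the corresponding QDA sketch (again illustrative code of my own, not the notebook's): the only changes relative to LDA are the per-class covariance matrices and the extra log-determinant term in the discriminant.

```python
import numpy as np

def fit_qda(X, y):
    """Per-class priors, means and covariance matrices (plug-in estimates)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    covs = [np.cov(X[y == k], rowvar=False) for k in classes]   # denominator N_k - 1
    return classes, priors, means, covs

def qda_scores(X, priors, means, covs):
    """delta_k(x) = -0.5*ln|Sigma_k| - 0.5*(x-mu_k)^T Sigma_k^{-1} (x-mu_k) + ln(pi_k)."""
    scores = []
    for pi_k, mu_k, S_k in zip(priors, means, covs):
        Xc = X - mu_k
        maha = np.sum(Xc @ np.linalg.inv(S_k) * Xc, axis=1)     # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(S_k)
        scores.append(-0.5 * logdet - 0.5 * maha + np.log(pi_k))
    return np.stack(scores, axis=1)                             # shape (n_samples, n_classes)

def qda_predict(X, classes, priors, means, covs):
    return classes[np.argmax(qda_scores(X, priors, means, covs), axis=1)]
```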

LDA from Fisher's perspective

To be added; I have not fully understood this part yet.

LDA for dimensionality reduction

For data with $K$ classes, assuming the "birds of a feather flock together" condition holds, there is one center per class. Without affecting classification performance, the data can be mapped into a space of at most $K-1$ dimensions: two cluster centers can be projected onto a single line and still be separated, and for $K$ classes a $(K-1)$-dimensional projection space can be found. LDA can therefore also be used as a supervised dimensionality-reduction algorithm. Its core idea is to map the original data into a new space such that the class means in the new space are as far apart as possible while the variance within each class is as small as possible. In the binary case this yields an intuitive optimization objective:

$$J=\frac{\|w^{T}\mu_{0}-w^{T}\mu_{1}\|_{2}^{2}}{w^{T}\Sigma_{0}w+w^{T}\Sigma_{1}w}$$

To extend this idea to higher-dimensional, multi-class settings, we first introduce a few concepts:

  • Between-class scatter matrix: $S_{b}=\sum_{k=1}^{K}N_{k}(\mu_{k}-\mu)(\mu_{k}-\mu)^{T}$, where $\mu_{k}$ is the mean of class $k$ and $\mu$ is the mean of all the data
  • Within-class scatter matrix: $S_{w}=\sum_{k=1}^{K}\sum_{i:y_{i}=k}(x_{i}-\mu_{k})(x_{i}-\mu_{k})^{T}$

In the method proposed by Fisher, the dimensionality-reduction mapping can be written as:

$$z=W^{T}x$$

where $W$ is the projection matrix and $x$ the original data. The between-class scatter of the projected data is then $W^{T}S_{b}W$ and the within-class scatter is $W^{T}S_{w}W$, so the dimensionality-reduction objective is equivalent to maximizing the generalized Rayleigh quotient:

$$J(W)=\frac{W^{T}S_{b}W}{W^{T}S_{w}W}$$

This optimization problem is also equivalent to the constrained form:

$$\max_{W}\ \operatorname{tr}\left(W^{T}S_{b}W\right)\qquad\text{s.t.}\quad W^{T}S_{w}W=I$$

Solving this with Lagrange multipliers gives:

$$S_{b}W=\lambda S_{w}W$$

Assuming $S_{w}$ is invertible:

$$S_{w}^{-1}S_{b}W=\lambda W$$

It is clear that this is an eigenvalue problem: the columns of $W$ are the eigenvectors of $S_{w}^{-1}S_{b}$ associated with its largest eigenvalues.
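
To make the eigenvalue formulation concrete, the following sketch (my own illustrative code, not the project's) builds $S_{w}$ and $S_{b}$, solves the eigenproblem of $S_{w}^{-1}S_{b}$, and projects the data onto the leading eigenvectors. Since $S_{b}$ has rank at most $K-1$, at most $K-1$ eigenvalues are non-zero, which is why LDA can reduce the data to at most $K-1$ dimensions.

```python
import numpy as np

def lda_transform(X, y, n_components):
    """Project X onto the top eigenvectors of S_w^{-1} S_b."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for k in classes:
        X_k = X[y == k]
        mu_k = X_k.mean(axis=0)
        Xc = X_k - mu_k
        S_w += Xc.T @ Xc                                   # within-class scatter
        diff = (mu_k - mu).reshape(-1, 1)
        S_b += X_k.shape[0] * (diff @ diff.T)              # between-class scatter
    # Eigen-decomposition of S_w^{-1} S_b; keep eigenvectors with the largest eigenvalues
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:n_components]].real
    return X @ W
```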

Implementation guide

The complete code is available at the project address given at the top of this post.


Origin: https://blog.csdn.net/weixin_34007879/article/details/90882179