【ZJU-Machine Learning】Probabilistic Classification Method

Fundamental problem


Our focus here is on the prior probabilities.
During training, the sample distributions of the training set and the test set should be approximately the same, so that their prior probabilities are also approximately the same.


Probability density estimation problem

Naive Bayes classification problem

1. Restriction: Naive Bayes assumes that the features (here, the words in a document) are conditionally independent of one another given the class.
2. Application example:
Spam classification has two classes: spam and not spam. During training, each sample is an email file, and its basic unit is a word.
We need to learn: the class prior P(c) and the class-conditional probability P(d|c) for each class.
3. Calculation of P(d|c)

If a word never appears in the training samples of a class but does appear in a test sample, the corresponding probability p becomes 0 and the whole product collapses.
Therefore, we make the following improvement: apply add-one (Laplace) smoothing to the word counts.

4. Decision:
Compare P(c1)P(d|c1) with P(c2)P(d|c2) (equivalently, compare the posteriors). If the former is larger, then d belongs to C1; otherwise it belongs to C2.
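
As a concrete illustration of steps 2-4 above, here is a minimal sketch of a Naive Bayes spam classifier with add-one (Laplace) smoothing; the toy emails, labels, and function names are made up for the example.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """docs: list of word lists; labels: 'spam' or 'ham' for each doc."""
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}      # P(c)
    counts = {c: Counter() for c in classes}                         # word counts per class
    for words, c in zip(docs, labels):
        counts[c].update(words)
    vocab = {w for words in docs for w in words}
    total = {c: sum(counts[c].values()) for c in classes}
    def log_p_word(w, c):                                            # Laplace-smoothed P(w|c)
        return math.log((counts[c][w] + 1) / (total[c] + len(vocab)))
    return prior, log_p_word, vocab

def classify(words, prior, log_p_word, vocab):
    scores = {}
    for c in prior:
        # log P(c) + sum of log P(w|c): the product becomes a sum of logs
        scores[c] = math.log(prior[c]) + sum(log_p_word(w, c) for w in words if w in vocab)
    return max(scores, key=scores.get)

# toy data (made up for the example)
docs = [["win", "money", "now"], ["meeting", "at", "noon"],
        ["cheap", "money"], ["lunch", "at", "noon"]]
labels = ["spam", "ham", "spam", "ham"]
prior, log_p_word, vocab = train_nb(docs, labels)
print(classify(["money", "now"], prior, log_p_word, vocab))   # -> spam
```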

Gaussian probability density estimation

Here, taking the ln turns the product in p(x|c) into a sum.
The class-conditional density is the multivariate Gaussian
p(x|c) = (2π)^(-d/2) |Σ|^(-1/2) exp( -1/2 (x-μ)ᵀ Σ⁻¹ (x-μ) ),
where d is the dimension of x, μ is the mean, and Σ is the covariance matrix.

Summary: steps for Gaussian probability density estimation
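A minimal sketch of these steps, assuming maximum-likelihood estimates of μ and Σ per class and a decision based on log-prior plus log-likelihood (the data and names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian_classifier(X, y):
    """Estimate P(c), mu_c and Sigma_c for each class by maximum likelihood."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),            # prior P(c)
                     Xc.mean(axis=0),             # mu
                     np.cov(Xc, rowvar=False))    # Sigma
    return params

def predict(x, params):
    # ln turns the product P(c) * p(x|c) into a sum of logs
    scores = {c: np.log(p) + multivariate_normal.logpdf(x, mean=mu, cov=S)
              for c, (p, mu, S) in params.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)), rng.normal([3, 3], 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
params = fit_gaussian_classifier(X, y)
print(predict(np.array([2.5, 3.2]), params))   # -> 1
```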

Gaussian Mixture Model

When a single Gaussian model cannot fit the data, we use several Gaussian models together. Fitting such a mixture is a non-convex problem; in general we cannot find the global optimum, only a local optimum.

If we continue by setting the partial derivatives to zero, the computation becomes particularly complicated. Instead, iterative methods are used: general-purpose local optimizers such as gradient descent, and the EM algorithm.
The general-purpose methods apply to any local-optimization problem, whereas the EM algorithm only handles a particular class of local-extremum problems.
To make the problem suitable for gradient descent, we take the negative of E, so that the maximization problem becomes a minimization problem.

The EM algorithm has the following three advantages:
1. No parameters need to be tuned
2. Simple to program
3. Elegant theory

EM algorithm

(1) Basic idea:

This idea is not exactly the same as the EM algorithm; it uses a hard-assignment method.
The problem we face is a chicken-and-egg problem. For the sample points, if we knew which cluster each point belongs to, we could fit a Gaussian to that cluster and obtain its μ and Σ; conversely, to find the Gaussians we need to look at the distribution density of the sample points. For such a problem, we can first randomly assume the Gaussian model each point belongs to, and then iterate. In the hard-assignment method, at each iteration every point is assigned to the Gaussian model under which it has the higher probability, and the models are then re-estimated; this is essentially the same idea as the K-means algorithm.

When we know neither the assignments nor the parameters of the data, we first randomly initialize one of them and then iterate.

Hard assignment: if p1 > p2, then x belongs to model 1.
Soft assignment: x belongs to model 1 with weight p1/(p1+p2) and to model 2 with weight p2/(p1+p2).
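
A small numeric sketch of the two kinds of assignment (the two Gaussians and the point x are arbitrary examples):

```python
import numpy as np
from scipy.stats import norm

x = 1.2
p1 = 0.5 * norm.pdf(x, loc=0.0, scale=1.0)   # pi_1 * N(x; mu_1, sigma_1)
p2 = 0.5 * norm.pdf(x, loc=3.0, scale=1.0)   # pi_2 * N(x; mu_2, sigma_2)

hard = 1 if p1 > p2 else 2                   # hard assignment: winner takes all
soft = (p1 / (p1 + p2), p2 / (p1 + p2))      # soft assignment: responsibilities

print(hard)   # model 1
print(soft)   # approximately (0.71, 0.29)
```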

(2) The EM algorithm for the Gaussian mixture model

The mixture density is p(x) = Σ_k π_k N(x; μ_k, Σ_k), where N(·) denotes the Gaussian density function.

The N samples are divided among the k Gaussian components in proportion to these probabilities; N_k is the effective number of samples assigned to component k, and the sum of the N_k is N.
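
A compact sketch of one EM iteration for a Gaussian mixture, following the responsibility / N_k bookkeeping described above (the toy data and initialization are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pi, mu, Sigma):
    """One E-step and M-step for a Gaussian mixture with K components."""
    N, K = len(X), len(pi)
    # E-step: responsibilities gamma[n, k] proportional to pi_k * N(x_n; mu_k, Sigma_k)
    gamma = np.column_stack([pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
                             for k in range(K)])
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: N_k is the effective number of points in component k; the N_k sum to N
    Nk = gamma.sum(axis=0)
    pi_new = Nk / N
    mu_new = [gamma[:, k] @ X / Nk[k] for k in range(K)]
    Sigma_new = [((X - mu_new[k]).T * gamma[:, k]) @ (X - mu_new[k]) / Nk[k] for k in range(K)]
    return pi_new, mu_new, Sigma_new

# toy run with two 2-D components
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
pi, mu, Sigma = [0.5, 0.5], [X[0], X[-1]], [np.eye(2), np.eye(2)]
for _ in range(20):
    pi, mu, Sigma = em_step(X, pi, mu, Sigma)
print(np.round(pi, 2), np.round(mu[0], 2), np.round(mu[1], 2))
```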

(3) Example of the EM algorithm: K-means clustering

Indicator function I(x): when the input is true, the output is 1; when the input is false, the output is 0.
Proof of convergence: the assignment step does not increase E, and the mean-update step does not increase E either, because the mean of a cluster minimizes the sum of squared distances to the points in that cluster. Since E is bounded below, the iteration must converge.
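
A minimal K-means sketch of the two alternating steps above (toy data; the initialization from random data points and the fixed iteration count are simple choices for illustration):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]      # random initial means
    for _ in range(iters):
        # assignment step: each point goes to its nearest center (the indicator I(x))
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: each center becomes the mean of its points, which cannot increase E
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
centers, labels = kmeans(X, 2)
print(np.round(centers, 2))
```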

Example of its application: Image vector quantization based on K-means clustering

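One common way to realize this is to cluster small image patches with K-means and replace each patch by its nearest codeword; a sketch, assuming grayscale patches and scikit-learn's KMeans (the patch size and codebook size are arbitrary choices here):

```python
import numpy as np
from sklearn.cluster import KMeans

def vector_quantize(img, block=4, codebook_size=64):
    """Quantize a grayscale image by clustering its block x block patches with K-means."""
    h, w = (img.shape[0] // block) * block, (img.shape[1] // block) * block
    patches = (img[:h, :w]
               .reshape(h // block, block, w // block, block)
               .swapaxes(1, 2)
               .reshape(-1, block * block))                 # one row per patch
    km = KMeans(n_clusters=codebook_size, n_init=4, random_state=0).fit(patches)
    coded = km.cluster_centers_[km.labels_]                 # replace each patch by its codeword
    return (coded.reshape(h // block, w // block, block, block)
                 .swapaxes(1, 2)
                 .reshape(h, w))

img = np.random.default_rng(0).integers(0, 256, (64, 64)).astype(float)  # stand-in image
print(vector_quantize(img).shape)   # (64, 64)
```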

Voiceprint recognition

(1) Preprocessing: removal of silence
Because silent segments are highly similar to one another, they carry no useful information and need to be removed.
The zero-crossing rate is the number of times the speech waveform crosses the zero axis per unit time (usually one frame). It is useful for recognizing consonants.
During preprocessing, an energy threshold is applied first; the zero-crossing rate is then computed on the parts removed by the energy screening, and if a removed part has a high zero-crossing rate, that section is restored.
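
A rough sketch of this two-stage screening; the frame length, thresholds, and the exact zero-crossing-rate definition used here are my own illustrative choices:

```python
import numpy as np

def remove_silence(signal, sr, frame_ms=20, energy_ratio=0.1, zcr_thresh=0.25):
    """Keep frames with high energy, then restore low-energy frames whose
    zero-crossing rate is high (likely unvoiced consonants)."""
    n = int(sr * frame_ms / 1000)
    frames = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    energy = np.array([np.sum(f ** 2) for f in frames])
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])
    keep = energy > energy_ratio * energy.max()        # energy screening
    keep |= (~keep) & (zcr > zcr_thresh)                # restore high-ZCR frames
    kept = [f for f, k in zip(frames, keep) if k]
    return np.concatenate(kept) if kept else signal[:0]

sr = 16000
t = np.arange(sr) / sr
sig = np.concatenate([np.zeros(sr // 2), np.sin(2 * np.pi * 200 * t)])  # silence + tone
print(len(sig), len(remove_silence(sig, sr)))
```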

(2) Extracted features: Mel-frequency cepstral coefficients (MFCC)

That is, from a piece of speech we infer the shape of the vocal organs (i.e., the speaker's physical characteristics).
The speech is divided into small segments. The recommended setting is: each segment is 20 ms long and a new segment is taken every 10 ms, so that 1 second of speech yields 100 segments.

For each segment, a cepstral vector is extracted. The dimension of the vector can be adjusted in the settings; typically it is 12, 24, or 36.
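
A sketch of the 20 ms / 10 ms segmentation; the MFCC computation is delegated to librosa here, which is just one possible choice, and n_mfcc=12 matches the 12-dimensional case:

```python
import numpy as np
import librosa   # used only for the MFCC computation; any MFCC implementation would do

sr = 16000
y = np.random.default_rng(0).standard_normal(sr).astype(np.float32)  # 1 second of stand-in audio

frame = int(0.020 * sr)   # 20 ms segment length
hop   = int(0.010 * sr)   # new segment every 10 ms -> about 100 segments per second

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12, n_fft=frame, hop_length=hop)
print(mfcc.shape)         # (12, ~101): one 12-dimensional cepstral vector per segment
```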
(3) Fitting a Gaussian mixture to the MFCC features

For the input Xi (the MFCC feature vectors), the number of mixture components K is set to 64, and each covariance matrix Σ is a diagonal matrix.
The mixture weights π sum to 1, so the last weight can be obtained by subtracting the others from 1; π therefore contributes 63 free parameters.
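
A sketch of fitting such a model to one speaker's MFCC frames with scikit-learn; the library choice and the stand-in training data are assumptions for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# stand-in for one speaker's MFCC frames: N frames x 12 coefficients
X_i = np.random.default_rng(0).standard_normal((5000, 12))

gmm = GaussianMixture(n_components=64,         # K = 64 mixture components
                      covariance_type='diag',  # diagonal Sigma for each component
                      max_iter=200,
                      random_state=0).fit(X_i)

# weights_ sum to 1, so only 63 of the 64 pi values are free parameters
print(gmm.weights_.sum(), gmm.weights_.shape)
```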
(4) Data

XM2VTS consists of 295 people; each person has 24 sound files, recorded in four sessions of 6 files each, with an interval of one month between sessions.

Files 1 and 4: 'zero one two three four five six seven eight ten'

Files 2 and 5: 'five zero six nine two eight one three seven four'

Files 3 and 6: 'Joe took father's green shoe bench out'

The 12 files from the first two sessions are used for training, and the 12 files from the last two sessions for testing.
Disadvantage: the method cannot tolerate noise (noise changes the distribution of X); otherwise the accuracy drops significantly.
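
At test time, one natural way to score a file against the enrolled speakers is to sum the per-frame log-likelihoods under each speaker's GMM; a sketch with toy models (and a smaller K than 64, purely for speed):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# toy "enrollment": one GMM per speaker, trained on that speaker's MFCC frames
speakers = {}
for name, shift in [("spk_A", 0.0), ("spk_B", 2.0)]:
    frames = rng.standard_normal((2000, 12)) + shift
    speakers[name] = GaussianMixture(n_components=8, covariance_type='diag',
                                     random_state=0).fit(frames)

# test utterance: sum the per-frame log-likelihoods under each speaker model
test = rng.standard_normal((300, 12)) + 2.0
scores = {name: gmm.score_samples(test).sum() for name, gmm in speakers.items()}
print(max(scores, key=scores.get))   # -> spk_B
```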

Derivation of the EM algorithm

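The derivation figures are not reproduced here; as a reference point, the usual lower-bound argument reads as follows in standard notation (which may differ from the course's):

```latex
\ln p(X \mid \theta)
  = \ln \sum_{z} p(X, z \mid \theta)
  = \ln \sum_{z} q(z)\, \frac{p(X, z \mid \theta)}{q(z)}
  \;\ge\; \sum_{z} q(z) \ln \frac{p(X, z \mid \theta)}{q(z)}
  \qquad \text{(Jensen's inequality)}
```

Equality holds when q(z) = p(z | X, θ); EM alternately tightens this bound (E-step) and maximizes it over θ (M-step).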

The general form of the EM algorithm

Based on the above derivation of the EM algorithm, we derive the general form of the EM algorithm:
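In standard notation (which may differ from the slides), that general form can be written as:

```latex
\text{E-step:}\quad Q(\theta, \theta^{(t)})
   = \mathbb{E}_{z \sim p(z \mid X, \theta^{(t)})}\big[\ln p(X, z \mid \theta)\big]
\qquad
\text{M-step:}\quad \theta^{(t+1)} = \arg\max_{\theta}\, Q(\theta, \theta^{(t)})
```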

Convergence proof of EM algorithm

Since E(θ) ≤ 0 (it is a sum of logarithms of probabilities) and each iteration does not decrease it, the sequence is monotone and bounded above, so it must converge.

Mapping the general EM algorithm onto the K-means algorithm

Algorithm steps: randomly initialize the k means; assign each sample to its nearest mean; recompute each mean as the average of the samples assigned to it; repeat until the assignments no longer change.
Disadvantage of the EM algorithm:
It is not fully reliable. The algorithm places certain requirements on the choice of the initial values, and that choice involves randomness; different initial values can lead to very different final results.
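
A common mitigation, not from the original notes, is to run the algorithm from several random initializations and keep the best run; for example, scikit-learn's n_init does exactly this:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).standard_normal((1000, 2))

# n_init restarts EM from several random initializations and keeps the
# solution with the highest log-likelihood, reducing sensitivity to the start
gmm = GaussianMixture(n_components=3, n_init=10, random_state=0).fit(X)
print(gmm.lower_bound_)   # log-likelihood lower bound of the best of the restarts
```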

Source: blog.csdn.net/qq_45654306/article/details/113879862