Maximum likelihood estimation for the Bernoulli distribution (cross-entropy minimization, classification)

Bernoulli distribution

The Bernoulli distribution, also known as the 0-1 distribution, is a discrete probability distribution. The classic example is a single coin toss: each toss has exactly two outcomes, heads or tails. If the probability of heads is \(p\), then the probability of tails is \(1-p\). Thus, for the random variable \(X\):
\[\begin{aligned} f(X=1) &= p \\ f(X=0) &= 1-p \end{aligned}\]
Since the random variable \(X\) takes only the two values 0 and 1, its probability distribution function can be written as:
\[f(x) = p^x (1-p)^{1-x} \qquad 0 < p < 1 \tag{1}\]
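
As a minimal sketch, equation \((1)\) maps directly to a few lines of Python (the function names are mine, and only the standard library is used):

```python
import random

def bernoulli_pmf(x, p):
    """Probability mass function f(x) = p^x * (1-p)^(1-x), per equation (1)."""
    assert x in (0, 1) and 0 < p < 1
    return p ** x * (1 - p) ** (1 - x)

def bernoulli_sample(p):
    """Draw one Bernoulli(p) sample, e.g. one toss of a biased coin."""
    return 1 if random.random() < p else 0

print(bernoulli_pmf(1, 0.3))  # P(X = 1) = 0.3
print(bernoulli_pmf(0, 0.3))  # P(X = 0) = 0.7
```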

Mathematical Expectation

In probability theory and statistics, the mathematical expectation (or mean) of a random variable is the sum of every possible outcome multiplied by its probability. It reflects the average value that the random variable takes.

Discrete

The mathematical expectation of a discrete random variable \(X\) is the sum of the products of each possible value \(x_i\) and its corresponding probability \(p(x_i)\). That is, if the random variable takes values in the set \(\lbrace x_1, x_2, ..., x_n \rbrace\) with corresponding probabilities \(\lbrace p(x_1), p(x_2), ..., p(x_n) \rbrace\), then:
\[E(X) = \sum_{i=1}^n x_i p(x_i) \tag{2}\]
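
A small sketch of equation \((2)\) in Python (the helper name expectation is mine; the fair-die example is purely illustrative):

```python
def expectation(values, probs):
    """E(X) = sum of x_i * p(x_i) over all values, per equation (2)."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(values, probs))

# A fair six-sided die: E(X) = (1 + 2 + ... + 6) / 6 = 3.5
print(expectation([1, 2, 3, 4, 5, 6], [1 / 6] * 6))
```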


Thus, for the Bernoulli distribution, the mathematical expectation is:
\[E(X) = 1 \cdot p + 0 \cdot (1-p) = p\]
For a random variable \(X\), the variance and the mathematical expectation satisfy:
\[Var(X) = E((X - E(X))^2) = E(X^2) - [E(X)]^2 \tag{3}\]

The variance of a random variable measures the degree to which the random variable deviates from its mathematical expectation.

Equation \((3)\) is derived as follows:
\[\begin{aligned} Var(X) &= E((X - E(X))^2) \\ &= E(X^2 - 2X \cdot E(X) + [E(X)]^2) \\ &= E(X^2) - 2 \cdot E(X) \cdot E(X) + [E(X)]^2 \\ &= E(X^2) - [E(X)]^2 \end{aligned}\]

For the Bernoulli distribution, \(X^2 = X\) (since \(X\) is either 0 or 1), so \(E(X^2) = E(X) = p\). Thus, the variance is:
\[Var(X) = p - p^2 = p(1-p)\]
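
Both results are easy to check empirically. A quick sketch that samples a biased coin and compares the sample mean and variance with \(p\) and \(p(1-p)\) (the sample size and seed are arbitrary):

```python
import random

random.seed(0)
p = 0.3
samples = [1 if random.random() < p else 0 for _ in range(100_000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)

print(f"empirical mean {mean:.4f} vs E(X) = {p}")             # ~0.30
print(f"empirical var  {var:.4f} vs Var(X) = {p * (1 - p)}")  # ~0.21
```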

Maximum likelihood estimation

In statistics, maximum likelihood estimation (MLE) is a method for estimating the parameters of a probability model. The idea is to use the observed sample results to infer the parameter values that are most likely to have produced them.

Since the samples in the sample set are independent and identically distributed, we can derive the maximum likelihood estimate of the Bernoulli parameter \(p\). Denote the known sample set as:
\[D = \lbrace x_1, x_2, ..., x_n \rbrace\]
Its likelihood function is:
\[\begin{split} L(p | x_1, ..., x_n) &= f(X | p) \\ &= f(x_1, x_2, ..., x_n | p) \\ &= \prod_{i=1}^n f(x_i | p) \\ &= \prod_{i=1}^n p^{x_i} (1-p)^{1-x_i} \end{split} \tag{4}\]

Because the likelihood is a product, it is usually more convenient to work with its logarithm, i.e. the log-likelihood function. The log-likelihood function is:
\[\begin{split} L &= \log \prod_{i=1}^n f(x_i | p) \\ &= \sum_{i=1}^n \log f(x_i | p) \\ &= \sum_{i=1}^n [x_i \log p + (1 - x_i) \log(1 - p)] \end{split} \tag{5}\]
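
Equation \((5)\) translates directly into code. A minimal sketch (the function name and the sample values are mine): evaluating the log-likelihood at a few candidate values of \(p\) already hints at where the maximum lies.

```python
import math

def log_likelihood(samples, p):
    """Bernoulli log-likelihood, per equation (5); samples must be 0/1."""
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in samples)

samples = [1, 0, 1, 1, 0, 1]  # hypothetical coin tosses
for p in (0.3, 0.5, 2 / 3, 0.9):
    print(f"p = {p:.3f}  L = {log_likelihood(samples, p):.4f}")
# L is largest at p = 2/3, which is exactly the sample mean
```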

Equation \((5)\) is, up to a sign, exactly the cross-entropy used in logistic regression: maximizing the log-likelihood is equivalent to minimizing the cross-entropy. Hence:
\[\begin{split} \hat{p} &= \arg\max_p L(p | X) \\ &= \arg\max_p \sum_{i=1}^n [x_i \log p + (1 - x_i) \log(1 - p)] \end{split}\]

Thus, maximum likelihood estimation amounts to finding the extremum point of the likelihood function. Differentiating the log-likelihood function with respect to the parameter \(p\) and setting the derivative to zero:
\[\begin{aligned} \frac{\partial L}{\partial p} &= \sum_{i=1}^n \left[ \frac{x_i}{p} + \frac{1 - x_i}{p - 1} \right] \\ &= \sum_{i=1}^n \frac{p - x_i}{p(p - 1)} = 0 \end{aligned}\]

Solving yields the Bernoulli maximum likelihood estimate:
\[\begin{aligned} & \sum_{i=1}^n (p - x_i) = 0 \\ \implies \ & p = \frac{1}{n} \sum_{i=1}^n x_i \end{aligned}\]
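
A minimal check of this closed-form result, reusing the hypothetical tosses from above: the MLE is the sample mean, and the derivative from the previous step vanishes there.

```python
samples = [1, 0, 1, 1, 0, 1]  # hypothetical coin tosses
n = len(samples)
p_hat = sum(samples) / n      # the MLE: the sample mean
print(p_hat)                  # 0.6666...

# The derivative sum_i (p - x_i) / (p * (p - 1)) is (numerically) zero at p_hat:
grad = sum((p_hat - x) / (p_hat * (p_hat - 1)) for x in samples)
print(abs(grad) < 1e-9)       # True
```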

Summary

The general steps of maximum likelihood estimation for a probability model are as follows (a code sketch follows the list):

1. Write down the probability distribution function of the random variable;
2. Write down the likelihood function;
3. Take the logarithm of the likelihood function and simplify;
4. Differentiate with respect to the parameters to find the extremum points of the likelihood function;
5. Solve the likelihood equation.
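
When step 5 has no closed-form solution, steps 4 and 5 can be carried out numerically instead. A sketch of the same Bernoulli problem done that way, assuming SciPy is available (minimize_scalar minimizes, so the log-likelihood is negated):

```python
import math
from scipy.optimize import minimize_scalar

samples = [1, 0, 1, 1, 0, 1, 1, 0]  # hypothetical observations

def neg_log_likelihood(p):
    """Steps 1-3: distribution -> likelihood -> log, negated for a minimizer."""
    return -sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in samples)

# Steps 4-5, done numerically:
res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, sum(samples) / len(samples))  # both ~0.625, the sample mean
```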

Readers familiar with logistic regression will have noticed that the derivation behind logistic regression is, in essence, maximum likelihood estimation. In logistic regression, the probability distribution function is no longer \(f(x) = p^x (1-p)^{1-x}\), but:
\[P(y | x; \theta) = (h_{\theta}(x))^y (1 - h_{\theta}(x))^{1-y} \tag{6}\]
where:
\[h_{\theta}(x) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-\theta^T x}} \tag{7}\]
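
Putting equations \((6)\) and \((7)\) together, each training example contributes a Bernoulli log-likelihood term in which \(h_{\theta}(x)\) plays the role of \(p\). A minimal sketch (the function names and the tiny data set are mine):

```python
import math

def sigmoid(z):
    """Equation (7): h_theta(x) = 1 / (1 + e^(-theta^T x))."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood_lr(theta, X, y):
    """Sum of per-example Bernoulli log-likelihood terms, per equation (6)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * f for t, f in zip(theta, x_i)))
        total += y_i * math.log(h) + (1 - y_i) * math.log(1 - h)
    return total

# Hypothetical 2-feature data; the first feature is a constant bias term.
X = [(1.0, 0.5), (1.0, -1.2), (1.0, 2.0)]
y = [1, 0, 1]
print(log_likelihood_lr((0.1, 0.8), X, y))
```

Maximizing this sum over \(\theta\) (equivalently, minimizing its negative) is exactly the cross-entropy minimization used to train logistic regression.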

