Some probability theory in machine learning

Conditional Probability

P(B|A) denotes the conditional probability that B occurs given that A has occurred.
Formula:
P(B|A) = P(AB) / P(A)
P(AB) = P(B|A) ∗ P(A) = P(A|B) ∗ P(B)
P(A|B) = P(B|A) ∗ P(A) / P(B)
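A quick numeric check of the identities above, using made-up example probabilities (the specific numbers are not from the article):

```python
# Made-up example probabilities for illustration.
P_A = 0.5          # P(A)
P_B_given_A = 0.4  # P(B|A)
P_B = 0.25         # P(B)

# Multiplication rule: P(AB) = P(B|A) * P(A)
P_AB = P_B_given_A * P_A

# Bayes form: P(A|B) = P(B|A) * P(A) / P(B)
P_A_given_B = P_B_given_A * P_A / P_B

print(P_AB, P_A_given_B)  # → 0.2 0.8
```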

Total probability formula

B1, B2, B3, ..., Bn is a partition of the sample space S; then
P(A) = P(B1)P(A|B1) + P(B2)P(A|B2) + ... + P(Bn)P(A|Bn) = ∑_{i=1}^{n} P(Bi)P(A|Bi)
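The sum can be computed directly. A minimal sketch with a hypothetical 3-way partition (the priors and conditional probabilities are assumed values):

```python
# Hypothetical partition B1, B2, B3 of the sample space S.
priors = [0.3, 0.5, 0.2]       # P(B_i); must sum to 1
likelihoods = [0.1, 0.4, 0.7]  # P(A | B_i), assumed values

# Total probability: P(A) = sum_i P(B_i) * P(A|B_i)
P_A = sum(p * l for p, l in zip(priors, likelihoods))
print(P_A)  # 0.03 + 0.20 + 0.14 = 0.37
```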

Bayesian formula

P(Bi|A) = P(A|Bi) ∗ P(Bi) / ∑_{j=1}^{n} P(A|Bj)P(Bj)
Several understandings and explanations of Bayesian formula

P(A|B) = P(B|A) ∗ P(A) / P(B)
where P(A) is the prior probability; in machine learning it usually refers to the probability of a given class.

P(B|A) is the conditional probability (the class-conditional likelihood): the probability that B occurs given class A.

P(A|B) is the posterior probability: given that event B has occurred, the probability that the sample belongs to class A.
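Putting the pieces together, a sketch of the posterior computation P(Bi|A), with assumed priors and class-conditional likelihoods and the total probability formula supplying the denominator:

```python
# Assumed prior class probabilities and class-conditional likelihoods.
priors = [0.3, 0.5, 0.2]       # P(B_i)
likelihoods = [0.1, 0.4, 0.7]  # P(A | B_i)

# Denominator via total probability: P(A) = sum_j P(B_j) * P(A|B_j)
evidence = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes: P(B_i | A) = P(A|B_i) * P(B_i) / P(A)
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]
print(posteriors)  # the posteriors sum to 1
```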

Maximum-likelihood

Principle
Maximum likelihood estimation works backward from an observed sample to the parameter value most likely to have produced it. It is a statistical method based on the maximum-likelihood principle, an application of probability theory to statistics. It provides a way to estimate model parameters from observed data, in the setting where "the model is determined but the parameters are unknown". Run the experiments, observe the results, and choose the parameter value that maximizes the probability of the observed sample; that value is the maximum likelihood estimate.

Since the samples in the sample set are independent and identically distributed, a single sample set D suffices to estimate the parameter vector θ. Write the known sample set as
D = {x1, x2, x3, ..., xn}
The likelihood function of D is
$$ l(\theta) = p(D \mid \theta) = p(x_1, x_2, x_3, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} P(x_i \mid \theta) $$

How to maximize the likelihood function

Find the value of θ that maximizes the probability of observing this sample set.

$$ \hat{\theta} = \arg\max_\theta l(\theta) = \arg\max_\theta \prod_{i=1}^{n} P(x_i \mid \theta) $$
Intuitively, we pick the θ under which the observed sequence D is most probable. The product is awkward to compute directly, so we transform it by taking the logarithm.

$$ \hat{\theta} = \arg\max_\theta l(\theta) = \arg\max_\theta \prod_{i=1}^{n} P(x_i \mid \theta) = \arg\max_\theta \ln\Big( \prod_{i=1}^{n} P(x_i \mid \theta) \Big) = \arg\max_\theta \sum_{i=1}^{n} \ln P(x_i \mid \theta) $$
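As a concrete sketch, consider estimating a Bernoulli parameter θ from a small made-up sample of coin flips (the data and grid search are illustrative, not from the article). The log-likelihood ∑ᵢ ln P(xᵢ|θ) is maximized at the sample mean, which we can verify numerically:

```python
import math

# Hypothetical i.i.d. Bernoulli sample (coin flips).
data = [1, 0, 1, 1, 0, 1, 1, 0]

def log_likelihood(theta, xs):
    # sum_i ln P(x_i | theta) for a Bernoulli(theta) model
    return sum(math.log(theta if x == 1 else 1 - theta) for x in xs)

# Closed-form MLE for Bernoulli: the sample mean.
theta_hat = sum(data) / len(data)  # 5/8 = 0.625

# Numeric check: no theta on a fine grid beats theta_hat.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda t: log_likelihood(t, data))
print(theta_hat, best)  # → 0.625 0.625
```

The log transform turns the product of probabilities into a sum, which is numerically stabler and easier to differentiate; since ln is monotone, the argmax is unchanged.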

Origin blog.csdn.net/qq_38851184/article/details/106534210