Li Hongyi Machine Learning Notes - Probabilistic Generative Model

A very interesting lecture, though I still have doubts about how to use P(x) to actually generate x.

In a neural network, a neuron computes y=w*x+b. Why does it take this form? This lecture boils down to exactly that question at the end.

To give a concrete example, suppose the training set contains 71 points of class A and 69 points of class B.

We assume that the 71 points of class A follow a Gaussian distribution. The density function takes a point x (the feature vector of an instance) as input and outputs the likelihood of that point under the class, which gives us P(x|A) and, in the same way, P(x|B).

The function has two parameters, the mean μ and the covariance matrix Σ:
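For reference, the standard multivariate Gaussian density for a D-dimensional point x (this is the formula the original post's figure presumably showed) is:

$$
f_{\mu,\Sigma}(x) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)
$$

The larger the output, the more probable x is under that class's Gaussian.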

Using maximum likelihood estimation, we choose the parameters that maximize the likelihood function L; this determines μ and Σ for each class.
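Concretely, for the 71 class-A points x¹, …, x⁷¹, the likelihood and its standard closed-form maximizers are:

$$
L(\mu,\Sigma)=\prod_{i=1}^{71} f_{\mu,\Sigma}(x^{i}),\qquad
\mu^{*}=\frac{1}{71}\sum_{i=1}^{71}x^{i},\qquad
\Sigma^{*}=\frac{1}{71}\sum_{i=1}^{71}\bigl(x^{i}-\mu^{*}\bigr)\bigl(x^{i}-\mu^{*}\bigr)^{\top}
$$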

From this we obtain the parameters μ₁, μ₂, Σ₁, Σ₂ (one Gaussian per class).

To reduce the number of parameters, we force the two classes to share a single covariance matrix: Σ = Σ₁ = Σ₂.
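In the lecture, the shared Σ that maximizes the joint likelihood is the class-size-weighted average of the two per-class estimates; with 71 and 69 points out of 140 that is:

$$
\Sigma = \frac{71}{140}\,\Sigma_{1} + \frac{69}{140}\,\Sigma_{2}
$$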

Maximizing the likelihood again in the same way, we obtain μ₁, μ₂, and the shared Σ.

So why go to all this trouble?

To determine which class a new point x belongs to.

The probability that x belongs to A is P(A|x)

The probability that x belongs to B is P(B|x)

We solve for these with Bayes' formula:
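For class A, with the priors estimated from the class counts as P(A) = 71/140 and P(B) = 69/140:

$$
P(A\mid x) = \frac{P(x\mid A)\,P(A)}{P(x\mid A)\,P(A) + P(x\mid B)\,P(B)}
$$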

But that is not the end.

Next, we rewrite P(A|x) in the form of the sigmoid activation function by dividing the numerator and denominator by P(x|A)P(A):
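Written out (standard algebra, matching the lecture's derivation):

$$
P(A\mid x) = \frac{1}{1 + \dfrac{P(x\mid B)\,P(B)}{P(x\mid A)\,P(A)}} = \frac{1}{1+e^{-z}} = \sigma(z), \qquad
z = \ln\frac{P(x\mid A)\,P(A)}{P(x\mid B)\,P(B)}
$$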

Then we simplify the expression for z.
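Substituting the two Gaussian densities with the shared Σ, the quadratic terms in x cancel and z collapses to a linear function of x (the priors contribute the ln(71/69) term):

$$
z = \underbrace{(\mu_{1}-\mu_{2})^{\top}\Sigma^{-1}}_{w^{\top}}\,x
\;\underbrace{-\tfrac{1}{2}\mu_{1}^{\top}\Sigma^{-1}\mu_{1} + \tfrac{1}{2}\mu_{2}^{\top}\Sigma^{-1}\mu_{2} + \ln\frac{71}{69}}_{b}
$$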

And this is exactly the y=w*x+b form we see in the neural network.

Therefore, the meaning of the fully connected layer in classification is this: assume the points in each class obey a Gaussian distribution with a shared covariance; that distribution has two parameters, which determine w and b. Equivalently, the layer computes Result = sigmoid(w*x+b) = P(A|x), and backpropagation then updates w and b directly instead of estimating μ and Σ.
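To make the whole pipeline concrete, here is a minimal NumPy sketch (my own illustration, not code from the course; the helper names `fit_shared_gaussian` and `prob_a` are made up for this example). It fits the two class Gaussians by maximum likelihood with a shared covariance, then reads off the equivalent w and b:

```python
import numpy as np

def fit_shared_gaussian(X_a, X_b):
    """Fit class-conditional Gaussians with a shared covariance (MLE),
    then return w, b such that P(A|x) = sigmoid(w @ x + b)."""
    n_a, n_b = len(X_a), len(X_b)
    mu_a, mu_b = X_a.mean(axis=0), X_b.mean(axis=0)

    # Per-class MLE covariances, then the class-size-weighted shared covariance.
    cov_a = (X_a - mu_a).T @ (X_a - mu_a) / n_a
    cov_b = (X_b - mu_b).T @ (X_b - mu_b) / n_b
    cov = (n_a * cov_a + n_b * cov_b) / (n_a + n_b)

    # Read off the linear parameters from the z = w*x + b derivation above.
    cov_inv = np.linalg.inv(cov)
    w = cov_inv @ (mu_a - mu_b)
    b = (-0.5 * mu_a @ cov_inv @ mu_a
         + 0.5 * mu_b @ cov_inv @ mu_b
         + np.log(n_a / n_b))
    return w, b

def prob_a(x, w, b):
    """P(A|x) = sigmoid(w @ x + b)."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

# Toy usage with the class sizes from the notes: 71 points in A, 69 in B.
rng = np.random.default_rng(0)
X_a = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(71, 2))
X_b = rng.normal(loc=[-1.0, -1.0], scale=1.0, size=(69, 2))
w, b = fit_shared_gaussian(X_a, X_b)
print(prob_a(np.array([2.0, 2.0]), w, b))    # close to 1
print(prob_a(np.array([-1.0, -1.0]), w, b))  # close to 0
```

Here w and b are computed in closed form from μ₁, μ₂, and Σ; a discriminative model (logistic regression) would instead learn the same w and b by gradient descent, which is exactly the backpropagation view above.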
