A very interesting lecture, though I still have doubts about how a learned P(x) can actually be used to generate x.
In a neural network, each unit computes y = w*x + b. Why does it take this form? The lecture eventually boils down to answering exactly that.
To give a practical example, suppose the training set contains 71 points of class A and 69 points of class B.
We assume the 71 points of class A follow a Gaussian distribution. The function in the figure above takes a point as input (the feature vector of an instance) and outputs the probability density of that point; this is how we get P(x|A) and, likewise, P(x|B).
The Gaussian has two parameters, the mean μ and the covariance Σ.
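For reference (the original presumably showed this as an image), the density of a D-dimensional Gaussian with mean μ and covariance Σ is:

$$f_{\mu,\Sigma}(x)=\frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}}\exp\!\left(-\frac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)$$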
Using maximum likelihood estimation, we choose the parameters that make the likelihood function L of generating the 71 observed points as large as possible.
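Written out (these closed forms are standard, not spelled out in the original post), the likelihood of class A's 71 points and the parameters that maximize it are:

$$L(\mu,\Sigma)=\prod_{i=1}^{71}f_{\mu,\Sigma}(x^{i}),\qquad
\mu^{*}=\frac{1}{71}\sum_{i=1}^{71}x^{i},\qquad
\Sigma^{*}=\frac{1}{71}\sum_{i=1}^{71}(x^{i}-\mu^{*})(x^{i}-\mu^{*})^{\top}$$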
From this we obtain the parameters μ1, Σ1 for class A and μ2, Σ2 for class B.
To reduce the number of parameters, we force the two classes to share one covariance: Σ = Σ1 = Σ2.
Maximizing the joint likelihood of both classes in the same way, we obtain μ1, μ2, and the shared Σ (see the sketch below).
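A minimal NumPy sketch of this estimation step, using two toy 2-D point clouds `x_a` and `x_b` as stand-ins for the real training data (the variable names and the toy data are mine, not from the lecture); the shared covariance is taken as the count-weighted average of the per-class covariances, which is the usual choice:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for the 71 class-A and 69 class-B training points (2-D features).
x_a = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(71, 2))
x_b = rng.normal(loc=[-1.0, 0.0], scale=1.5, size=(69, 2))

# Maximum-likelihood estimates: per-class means and covariances.
mu1, mu2 = x_a.mean(axis=0), x_b.mean(axis=0)
sigma1 = np.cov(x_a, rowvar=False, bias=True)   # bias=True -> divide by N (the MLE)
sigma2 = np.cov(x_b, rowvar=False, bias=True)

# Shared covariance: count-weighted average of the two class covariances.
n1, n2 = len(x_a), len(x_b)
sigma = (n1 * sigma1 + n2 * sigma2) / (n1 + n2)
print(mu1, mu2)
print(sigma)
```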
So why go through all of this?
To determine which class a new point x belongs to.
The probability that x belongs to A is P(A|x)
The probability that x belongs to B is P(B|x)
Use Bayes' rule to compute them:

$$P(A|x)=\frac{P(x|A)\,P(A)}{P(x|A)\,P(A)+P(x|B)\,P(B)}$$

where the priors P(A) = 71/140 and P(B) = 69/140 come from the class counts.
But that is not the end.
Next, rewrite P(A|x) in the form of the sigmoid activation function: dividing numerator and denominator by P(x|A)P(A) gives P(A|x) = 1 / (1 + e^(-z)) = sigmoid(z), where z = ln[ P(x|A)P(A) / (P(x|B)P(B)) ],
and then simplify the expression for z.
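Carrying out that simplification (a standard derivation: plug the two shared-covariance Gaussians into z, and the quadratic terms in x cancel) leaves a function that is linear in x:

$$z=\underbrace{(\mu^{1}-\mu^{2})^{\top}\Sigma^{-1}}_{w^{\top}}x\;\underbrace{-\,\frac{1}{2}(\mu^{1})^{\top}\Sigma^{-1}\mu^{1}+\frac{1}{2}(\mu^{2})^{\top}\Sigma^{-1}\mu^{2}+\ln\frac{71}{69}}_{b}$$

where 71 and 69 are the class counts entering through the priors.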
And this is exactly the y = w*x + b form from the neural network.
Therefore, the meaning of the fully connected layer in classification is this: assume the points of each class follow a Gaussian distribution; that distribution has two parameters (mean and covariance), which keep being updated as training goes on; the output is then result = sigmoid(w*x + b) = P(A|x), and backpropagation updates the parameters.
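A short sketch that checks this equivalence, continuing from the NumPy estimation above (it reuses `mu1`, `mu2`, `sigma`, `n1`, `n2` from that block; the names are mine): the posterior P(A|x) computed directly with Bayes' rule and the two shared-covariance Gaussians matches sigmoid(w·x + b) with w and b taken from the closed-form expressions for z.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Priors from the class counts.
p_a, p_b = n1 / (n1 + n2), n2 / (n1 + n2)

# w and b from the closed-form expressions for z.
sigma_inv = np.linalg.inv(sigma)
w = sigma_inv @ (mu1 - mu2)
b = -0.5 * mu1 @ sigma_inv @ mu1 + 0.5 * mu2 @ sigma_inv @ mu2 + np.log(n1 / n2)

x = np.array([0.5, 1.0])  # an arbitrary query point

# Posterior P(A|x) via Bayes' rule with the two shared-covariance Gaussians.
like_a = multivariate_normal.pdf(x, mean=mu1, cov=sigma)
like_b = multivariate_normal.pdf(x, mean=mu2, cov=sigma)
posterior = like_a * p_a / (like_a * p_a + like_b * p_b)

# The same probability via sigmoid(w·x + b).
sigmoid = 1.0 / (1.0 + np.exp(-(w @ x + b)))
print(posterior, sigmoid)  # the two values agree
```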