Laplace Approximation for Bayesian Inference

This article describes how to use the Laplace approximation to solve for the Bayesian posterior probability distribution. The previous article, Maximum Posterior Probability (MAP) for Bayesian Inference, introduced a point estimation method for the posterior and defined the posterior probability distribution as:

\[p(w|t,X)=\frac{p(t|X,w)p(w)}{p(t|X)}\]

The denominator on the right-hand side, \(p(t|X)\), does not depend on \(w\) and can therefore be regarded as a constant.

Define the function \(g\) as follows:

\[g(w;X,t,\sigma^2)=p(t|X,w)p(w|\sigma^2)\]

Therefore, the ratio of \(g\) to \(p(w|t,X)\) is a constant. The previous article used point estimation to approximate \(p(w|t,X)\); this article introduces the Laplace approximation method instead.
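To make the proportionality explicit: since the denominator \(p(t|X)\) does not depend on \(w\),

\[p(w|t,X)=\frac{p(t|X,w)p(w|\sigma^2)}{p(t|X)}=\frac{g(w;X,t,\sigma^2)}{p(t|X)}\propto g(w;X,t,\sigma^2)\]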

What is the Laplace approximation?

Since \(p(w|t,X)\) cannot be computed directly, we work with \(g(w;X,t,\sigma^2)\) instead. The Laplace approximation first assumes that the posterior follows a Gaussian distribution, and then uses a second-order Taylor expansion of \(\log g(w;X,t,\sigma^2)\) at \(w^*\) to identify that Gaussian. Here \(w^*\) is the optimal parameter obtained earlier using Newton's method.
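The article takes \(w^*\) from the earlier MAP article. For reference, here is a minimal sketch of Newton's method under the assumption of a logistic regression likelihood with a \(N(0,\sigma^2 I)\) prior (consistent with the sigmoid used later); the function name `newton_map` and the interface are illustrative, not from the original article:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def newton_map(X, t, sigma2, n_iter=20):
    """Find w* = argmax log g(w) = log p(t|X,w) + log p(w|sigma2)
    for logistic regression with a N(0, sigma2 * I) prior.
    Returns w* and the Hessian of log g at w* (needed later)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (t - p) - w / sigma2                     # gradient of log g
        H = -(X.T * (p * (1 - p))) @ X - np.eye(d) / sigma2   # Hessian of log g
        w = w - np.linalg.solve(H, grad)                      # Newton update
    return w, H
```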

The probability density of the Gaussian distribution is:

\[\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(w-\mu)^2}{2\sigma^2}\right)\]

Once the mean \(\mu\) and the variance \(\sigma^2\) are known, the Gaussian form of \(g\) is also obtained.

Taylor expansion

According to the introduction above, at \(w^*\) the first derivative of \(\log g(w;X,t,\sigma^2)\) is equal to 0 and the second derivative is less than 0 (for multivariate functions, the Hessian matrix is negative definite), since \(w^*\) is a maximizer. The second-order Taylor expansion around \(w^*\) is therefore:

\[\log g(w;X,t,\sigma^2)\approx \log g(w^*;X,t,\sigma^2)+(w-w^*)\left.\frac{\partial \log g}{\partial w}\right|_{w=w^*}+\frac{1}{2}(w-w^*)^2\left.\frac{\partial^2 \log g}{\partial w^2}\right|_{w=w^*}\]

Since the first derivative at \(w^*\) is 0, this simplifies to:

\[\log g(w;X,t,\sigma^2)\approx \log g(w^*;X,t,\sigma^2)-\frac{v}{2}(w-w^*)^2 \qquad (\text{formula 1})\]

where \(v\) is minus the second derivative at \(w^*\):

\[v=-\left.\frac{\partial^2 \log g(w;X,t,\sigma^2)}{\partial w^2}\right|_{w=w^*}\]

Taking the logarithm of the Gaussian density above gives:

\[\log K-\frac{(w-\mu)^2}{2\sigma^2} \qquad (\text{formula 2})\]

where \(K=\frac{1}{\sqrt{2\pi\sigma^2}}\) does not depend on \(w\). Comparing formula 1 (the expansion of \(\log g(w;X,t,\sigma^2)\)) with formula 2, we can see that:

\(\mu=w^*\)

\(\sigma^2=\frac{1}{v}\)

So far, we have solved for the Gaussian distribution approximating \(g(w;X,t,\sigma^2)\); because the ratio of \(g\) to \(p(w|t,X)\) is a constant, the distribution of the posterior probability \(p(w|t,X)\) is also obtained.
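Putting the pieces together, here is a minimal sketch of the whole procedure on synthetic data, reusing the `newton_map` sketch above (the data, dimensions, and prior variance are illustrative assumptions, not from the article):

```python
# Toy data: n=100 points in d=2 dimensions, labels t in {0, 1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
t = (sigmoid(X @ np.array([1.5, -2.0])) > rng.uniform(size=100)).astype(float)

sigma2 = 1.0                 # prior variance (illustrative choice)
w_star, H = newton_map(X, t, sigma2)

# Laplace approximation: p(w|t,X) ~ N(mu, Sigma)
mu = w_star                  # mean = MAP solution w*
Sigma = np.linalg.inv(-H)    # covariance = inverse negative Hessian (1/v in 1D)
```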

Predict using the expected value of the posterior probability distribution

For a new sample \(x_{new}\), the probability of classifying it as the positive class is \(P(T_{new}=1|x_{new},X,t,\sigma^2)\).

This probability is computed as an expectation under the distribution that \(p(w|t,X)\) obeys. From the Laplace approximation above, we know that it follows a normal distribution:

\(p(w|t,X) \sim N(\mu,\sigma^2)\)

If \(w\) is multivariate, then

\(p(w|t,X) \sim N(\mu,\Sigma)\), where \(\mu\) is a vector and \(\Sigma\) is a covariance matrix.
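The article does not spell out what \(\Sigma\) is; in the standard multivariate Laplace approximation it is the inverse of the negative Hessian of \(\log g\) at \(w^*\), the direct analogue of \(\sigma^2=1/v\):

\[\mu=w^*,\qquad \Sigma=\left(-\left.\nabla^2 \log g(w;X,t,\sigma^2)\right|_{w=w^*}\right)^{-1}\]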

Therefore:

\[P(T_{new}=1|x_{new},X,t,\sigma^2)=E_{N(\mu,\Sigma)}\left[P(T_{new}=1|x_{new},w)\right]\]

The Gaussian distribution is a distribution over continuous random variables, so taking this expected value means integrating against the probability density function. The density is a function of \(w\), but because the integrand is the nonlinear sigmoid:

\[P(T_{new}=1|x_{new},w)=\frac{1}{1+\exp(-w^Tx_{new})}\]

the integral over \(w\) cannot be computed in closed form; that is, \(E_{N(\mu,\Sigma)}(P(T_{new}=1|x_{new},w))\) cannot be solved analytically.

Fortunately, we only need an expected value under a Gaussian distribution, so we can draw \(N_s\) samples \(w_s \sim N(\mu,\Sigma)\) and approximate the expectation by their average:

\[E_{N(\mu,\Sigma)}(P(T_{new}=1|x_{new},w))\approx\frac{1}{N_s}\sum_{s=1}^{N_s}\frac{1}{1+\exp(-w_s^Tx_{new})}\]

Thus, the predicted probability is obtained. So far, the use of the Laplace approximation to solve the posterior probability distribution has been introduced.
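A minimal sketch of this Monte Carlo estimate, continuing the code above (`mu`, `Sigma`, and `sigmoid` as defined earlier; the sample count is an arbitrary choice):

```python
def predict_mc(x_new, mu, Sigma, n_samples=1000):
    """Approximate E[sigmoid(w^T x_new)] under w ~ N(mu, Sigma) by Monte Carlo."""
    rng = np.random.default_rng(1)
    W = rng.multivariate_normal(mu, Sigma, size=n_samples)  # shape (n_samples, d)
    return sigmoid(W @ x_new).mean()

p_new = predict_mc(np.array([0.5, -0.5]), mu, Sigma)  # predicted P(T_new = 1)
```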

Decision boundary

The point estimation method yields a single value \(w^*\), while the Laplace approximation yields a posterior for \(w\) that is a random variable following a Gaussian distribution.

In the point estimation method, the decision boundary is a straight line; with the Laplace approximation, averaging predictions over the uncertainty in \(w\) produces probability contours, many of which are curved.
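One way to see this (an illustrative sketch, reusing `predict_mc` from above) is to evaluate the averaged prediction on a grid and plot the contours of \(P(T_{new}=1)\):

```python
import matplotlib.pyplot as plt

# Evaluate the Monte Carlo prediction on a 2-D grid of candidate inputs.
xs = np.linspace(-3, 3, 50)
grid = np.array([[predict_mc(np.array([a, b]), mu, Sigma) for a in xs] for b in xs])

plt.contour(xs, xs, grid, levels=[0.25, 0.5, 0.75])  # curved probability contours
plt.show()
```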

Summary

The Laplace approximation is another way to calculate the posterior probability. It first assumes that the posterior probability \(p(w|t,X)\) follows a Gaussian distribution. Then a Taylor expansion of \(\log g(w;X,t,\sigma^2)\), which differs from the posterior \(p(w|t,X)\) only by a constant factor, is performed at \(w^*\) to solve for this Gaussian distribution.

\(w^*\) is solved by Newton's method, as mentioned above.

With this Gaussian distribution, for each new sample we calculate the expected value of the predicted probability under it, which is the model's prediction for that sample.

Original: http://www.cnblogs.com/hapjin/p/8848480.html
