Parameter Estimation - Grid-Based Approach

Introduction

The simplest numerical approach to approximating the posterior distribution $\Pr(\theta \mid \boldsymbol{d})$ is to base all computations on an array of (not necessarily normalized) density values evaluated on a regular grid. Integrals over distributions are then approximated as simple sums, and other standard derived quantities are fairly simple to compute.
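As a minimal one-dimensional illustration of replacing an integral with a sum, the sketch below approximates the normalizing constant and the mean of a hypothetical unnormalized posterior $\exp(-x^2/2)$ by summing the density at the centres of equal-width cells and multiplying by the cell width. The density, the range $[-8, 8]$ and the 100 cells are arbitrary illustrative choices, not taken from the source.

```python
import math

def midpoint_sum(f, x_min, x_max, n):
    """Approximate the integral of f over [x_min, x_max] by a sum over n cell centres."""
    dx = (x_max - x_min) / n
    return dx * sum(f(x_min + (i + 0.5) * dx) for i in range(n))

def unnorm_post(x):
    # Hypothetical unnormalized posterior: a unit Gaussian without its 1/sqrt(2*pi) factor.
    return math.exp(-0.5 * x * x)

z = midpoint_sum(unnorm_post, -8.0, 8.0, 100)                          # normalizing constant
mean = midpoint_sum(lambda x: x * unnorm_post(x), -8.0, 8.0, 100) / z  # posterior mean
print(z, math.sqrt(2.0 * math.pi), mean)
```

Because the density is never normalized before summing, the same sum that gives the mean also gives the normalizing constant for free.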

Pros and cons of the grid approach

Advantages of grid-based methods

  1. Accuracy/speed: the regular spacing of the evaluations means there is minimal redundancy, and the error in estimating quantities such as a standard deviation decreases as the number of grid points increases;
  2. Repeatability: provided the grid parameters are held fixed and the posterior density can be evaluated directly, recomputing gives exactly the same result;
  3. Simplicity: the algorithm for constructing a regular grid is not complicated;
  4. No normalization requirement: a grid of unnormalized posterior values can be used directly.

Disadvantages, or limitations, of grid-based methods

  1. Some extrinsic knowledge is required of the range of parameter values over which the posterior is significant (the prior range is formally sufficient, but it is usually far too broad to be useful);
  2. Any error in the approximated posterior is "systematic": it is periodic, or varies on a scale comparable to the spacing between grid points;
  3. The approach is only suitable for low-dimensional problems: with more than three or four parameters the computation typically becomes infeasible;
  4. Writing code to handle grids of arbitrary dimensions can be unwieldy.
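The dimensionality limit follows from simple arithmetic: with $n$ points along each axis, a grid over $N_{\mathrm{p}}$ parameters needs $n^{N_{\mathrm{p}}}$ posterior evaluations. The short sketch below tabulates this for a hypothetical $n = 100$; the function name `grid_evaluations` is illustrative.

```python
def grid_evaluations(points_per_dim: int, n_params: int) -> int:
    """Number of posterior evaluations for a regular grid with the same
    number of points along every parameter axis."""
    return points_per_dim ** n_params

# Hypothetical resolution of 100 points per axis, for 1 to 6 parameters.
for n_params in range(1, 7):
    print(n_params, grid_evaluations(100, n_params))
```

At an assumed cost of one millisecond per evaluation, four parameters already means $10^8$ evaluations, or roughly 28 hours.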

The algorithms needed to generate the array of posterior samples and process it into useful output are very simple, certainly much simpler than the corresponding mathematical expressions, especially in many dimensions, so it is more natural to describe them as algorithms. Here a two-dimensional posterior $\Pr(\theta \mid \boldsymbol{d}) = \Pr(x, y \mid \boldsymbol{d})$ is used, so that $N_{\mathrm{p}} = 2$ and $\theta = (x, y)$; the generalization to higher- (or lower-) dimensional problems is conceptually straightforward.
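One way to keep arbitrary-dimensional grid code manageable, sketched here under the assumption that points are placed at cell centres as in the grid-generation algorithm of this section, is to build each axis separately and take their Cartesian product with `itertools.product`. The helper names `cell_centres` and `nd_grid` are hypothetical.

```python
import itertools
from typing import Iterator, List, Sequence, Tuple

def cell_centres(v_min: float, v_max: float, n: int) -> List[float]:
    """Centres of n equal-width cells spanning [v_min, v_max]."""
    return [v_min + (i - 0.5) / n * (v_max - v_min) for i in range(1, n + 1)]

def nd_grid(bounds: Sequence[Tuple[float, float]],
            counts: Sequence[int]) -> Iterator[Tuple[float, ...]]:
    """Iterate over every cell-centre point of a regular grid.

    bounds[k] = (min, max) and counts[k] = number of cells along axis k;
    the number of dimensions is len(bounds)."""
    axes = [cell_centres(lo, hi, n) for (lo, hi), n in zip(bounds, counts)]
    return itertools.product(*axes)

points = list(nd_grid([(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)], [2, 3, 4]))
print(len(points))  # 2 * 3 * 4 = 24
print(points[0])
```

Because `itertools.product` is lazy, points are produced one at a time, but the number of evaluations still grows as the product of `counts`.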

Grid generation

For grid-based methods, the ability to evaluate the distribution $\Pr(x, y \mid \boldsymbol{d})$ is not a sufficient starting point. Some information is also needed to decide the range of parameter values to consider: it must be possible to determine $x_{\min}$, $x_{\max}$, $y_{\min}$ and $y_{\max}$ such that $1 - \Pr(x_{\min} \leq x \leq x_{\max}, y_{\min} \leq y \leq y_{\max} \mid \boldsymbol{d}) \ll 1$ (i.e., the region bounded by $x_{\min}$, $x_{\max}$, $y_{\min}$ and $y_{\max}$ contains almost all of the probability). However, simply making this range arbitrarily large does not work, since most, if not all, of the grid points would then fall in regions of negligible probability. There are general algorithms for locating the high-probability region (sampling methods), but having to resort to them would largely defeat the purpose of a grid-based method. So here it is assumed that reasonable values of $x_{\min}$, $x_{\max}$, $y_{\min}$ and $y_{\max}$ are known from some external information, although it cannot be overemphasized that if such constraints are not readily available, grid-based methods can immediately become useless.

The next decision is the resolution of the grid, defined by the number of columns, $N_{\mathrm{c}}$, and the number of rows, $N_{\mathrm{r}}$. This choice is a trade-off between accuracy and speed.

Each of $N_{\mathrm{c}}$ and $N_{\mathrm{r}}$ should be at least 10; for the posterior distributions typically encountered in real problems, values greater than $10^2$ are usually unnecessary. As with the choice of range, this is subject to a certain amount of trial and error.
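The trial-and-error choice of resolution can be automated in a simple way, sketched below as one possible heuristic (not the source's prescription): start from $N_{\mathrm{c}} = N_{\mathrm{r}} = 10$, keep doubling, and stop once a summary statistic (here the posterior standard deviation of $x$) changes by less than a tolerance. The toy posterior, the grid limits, the tolerance and all function names are hypothetical.

```python
import math

def log_post(x, y):
    # Hypothetical toy log-posterior (unnormalized): independent Gaussians
    # with x ~ N(1, 0.5^2) and y ~ N(-0.5, 1).
    return -0.5 * (((x - 1.0) / 0.5) ** 2 + (y + 0.5) ** 2)

def grid_sd_x(n, x_min=-2.0, x_max=4.0, y_min=-5.0, y_max=4.0):
    """Posterior standard deviation of x from an n-by-n grid of cell centres."""
    dx = (x_max - x_min) / n
    dy = (y_max - y_min) / n
    total = sum_x = sum_x2 = 0.0
    for c in range(n):
        x = x_min + (c + 0.5) * dx
        for r in range(n):
            y = y_min + (r + 0.5) * dy
            p = math.exp(log_post(x, y))  # unnormalized; normalized via total below
            total += p
            sum_x += p * x
            sum_x2 += p * x * x
    mean = sum_x / total
    return math.sqrt(sum_x2 / total - mean * mean)

def choose_resolution(tol=1e-3, n=10, n_max=320):
    """Double n until the estimate changes by less than tol (or n_max is reached)."""
    prev = grid_sd_x(n)
    while n < n_max:
        n *= 2
        cur = grid_sd_x(n)
        if abs(cur - prev) < tol:
            return n, cur
        prev = cur
    return n, prev

n, sd = choose_resolution()
print(n, sd)
```

For this smooth toy posterior the estimate has already settled at 20 cells per axis; a much coarser grid (e.g. 5 cells per axis) noticeably underestimates the width.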

The grid therefore covers the range $x_{\min} \leq x \leq x_{\max}$, $y_{\min} \leq y \leq y_{\max}$ with an $N_{\mathrm{c}} \times N_{\mathrm{r}}$ array of cells, each of area $\Delta x \times \Delta y$, where $\Delta x = (x_{\max} - x_{\min})/N_{\mathrm{c}}$ and $\Delta y = (y_{\max} - y_{\min})/N_{\mathrm{r}}$. From this point, follow the algorithm below:

1. For each combination of column index $c \in \{1, 2, \ldots, N_{\mathrm{c}}\}$ and row index $r \in \{1, 2, \ldots, N_{\mathrm{r}}\}$, calculate

$$\left(x_{c}, y_{r}\right)=\left[x_{\min }+\frac{c-1 / 2}{N_{\mathrm{c}}}\left(x_{\max }-x_{\min }\right),\; y_{\min }+\frac{r-1 / 2}{N_{\mathrm{r}}}\left(y_{\max }-y_{\min }\right)\right],$$

which selects the point at the centre of each grid cell.

2. For each element of the array, compute the unnormalized posterior probability,

$$p_{c, r}^{\prime}=\Pr\left(x_{c}, y_{r}\right) \Pr\left(\boldsymbol{d} \mid x_{c}, y_{r}\right) .$$

3. Normalize the posterior samples numerically by calculating

$$p_{c, r}=\frac{p_{c, r}^{\prime}}{\sum_{c^{\prime}=1}^{N_{\mathrm{c}}} \sum_{r^{\prime}=1}^{N_{\mathrm{r}}} p_{c^{\prime}, r^{\prime}}^{\prime}} .$$

This step is not always necessary, but it is numerically cheap and simplifies subsequent analysis.
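The three steps above can be sketched in plain Python as follows, with `p[c][r]` indexed by column (the $x$ direction) and then row (the $y$ direction). The flat prior and Gaussian likelihood are hypothetical stand-ins for a real model. Working with logarithms and subtracting the maximum before exponentiating is a common precaution against underflow that is not part of the algorithm as stated; it only rescales every $p'_{c,r}$ by the same constant, which step 3 removes.

```python
import math

def cell_centres(v_min, v_max, n):
    """Step 1 along one axis: v_min + (i - 1/2)/n * (v_max - v_min), i = 1..n."""
    return [v_min + (i - 0.5) / n * (v_max - v_min) for i in range(1, n + 1)]

def log_prior(x, y):
    # Hypothetical prior: uniform over the grid region (constant log density).
    return 0.0

def log_likelihood(x, y):
    # Hypothetical likelihood: Gaussian in x (mean 1, sd 0.5) and y (mean -0.5, sd 1).
    return -0.5 * (((x - 1.0) / 0.5) ** 2 + (y + 0.5) ** 2)

def grid_posterior(x_min, x_max, y_min, y_max, n_c, n_r):
    """Return cell centres xs, ys and normalized cell probabilities p[c][r]."""
    xs = cell_centres(x_min, x_max, n_c)  # column centres x_c
    ys = cell_centres(y_min, y_max, n_r)  # row centres y_r
    # Step 2: log of the unnormalized posterior, log prior + log likelihood.
    log_p = [[log_prior(x, y) + log_likelihood(x, y) for y in ys] for x in xs]
    # Subtract the maximum before exponentiating to avoid underflow.
    log_max = max(max(row) for row in log_p)
    p_unnorm = [[math.exp(v - log_max) for v in row] for row in log_p]
    # Step 3: divide by the sum over all cells.
    total = sum(sum(row) for row in p_unnorm)
    p = [[v / total for v in row] for row in p_unnorm]
    return xs, ys, p

xs, ys, p = grid_posterior(-2.0, 4.0, -5.0, 4.0, n_c=60, n_r=90)
# Once normalized, posterior expectations are plain weighted sums over cells.
mean_x = sum(p[c][r] * xs[c] for c in range(len(xs)) for r in range(len(ys)))
print(sum(map(sum, p)), mean_x)
```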

A piecewise-constant approximation to the posterior distribution is now given by

$$\Pr(x, y \mid \boldsymbol{d}) \simeq \frac{N_{\mathrm{c}} N_{\mathrm{r}}}{\left(x_{\max }-x_{\min }\right)\left(y_{\max }-y_{\min }\right)} \sum_{c=1}^{N_{\mathrm{c}}} \sum_{r=1}^{N_{\mathrm{r}}} \Theta\left[x-\left(x_{c}-\Delta x / 2\right)\right] \Theta\left[\left(x_{c}+\Delta x / 2\right)-x\right] \Theta\left[y-\left(y_{r}-\Delta y / 2\right)\right] \Theta\left[\left(y_{r}+\Delta y / 2\right)-y\right] p_{c, r},$$

where $\Theta$ is the Heaviside step function and the prefactor is the inverse of the cell area, which converts each cell probability $p_{c,r}$ into a density. This approximation is zero outside the area covered by the grid. More sophisticated interpolation schemes can also be used to get from $\{p_{c,r}\}$ to a distribution defined for all $x$ and $y$, but the point is that the continuous function $\Pr(x, y \mid \boldsymbol{d})$ is now encoded (albeit approximately) in the finite set of numbers $\{p_{c,r}\}$.
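Evaluating the piecewise-constant approximation at an arbitrary point amounts to finding the cell that contains it and dividing that cell's probability by the cell area. In the hypothetical helper below, a point on an edge shared by two cells is assigned to the cell above it, and a point on the upper grid boundary to the last cell; this is one of several possible conventions for where the step functions switch on.

```python
def piecewise_density(x, y, p, x_min, x_max, y_min, y_max):
    """Evaluate the piecewise-constant approximation to Pr(x, y | d).

    p[c][r] holds normalized cell probabilities (c indexes x, r indexes y).
    Returns p[c][r] / (dx * dy) for the cell containing (x, y), and 0 outside."""
    n_c = len(p)
    n_r = len(p[0])
    if not (x_min <= x <= x_max and y_min <= y <= y_max):
        return 0.0  # zero outside the area covered by the grid
    dx = (x_max - x_min) / n_c
    dy = (y_max - y_min) / n_r
    # Cell indices; min() keeps points on the upper boundary inside the last cell.
    c = min(int((x - x_min) / dx), n_c - 1)
    r = min(int((y - y_min) / dy), n_r - 1)
    return p[c][r] / (dx * dy)

p = [[0.1, 0.2], [0.3, 0.4]]  # hypothetical 2 x 2 grid on [0, 2] x [0, 1], sums to 1
print(piecewise_density(0.5, 0.25, p, 0.0, 2.0, 0.0, 1.0))  # 0.2
```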

References

Mortlock, D. Parameter estimation (last modified September 12, 2013). https://docslib.org/doc/8720468/parameter-estimation-daniel-mortlock-mortlock-ic-ac-uk-last-modi-ed-september-12-2013

Feroz, F., Hobson, M. P., and Bridges, M. (2009). MULTINEST: an efficient and robust Bayesian inference tool for cosmology and particle physics. Monthly Notices of the Royal Astronomical Society, 398, 1601–1614.

Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–511.

Skilling, J. (2004). Nested sampling. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering: 24th International Workshop, AIP Conference Proceedings, 735, 395–405.


Original post: blog.csdn.net/weixin_48266700/article/details/128807748