Notes on the Derivation of the Normal Distribution

This article is based on an earlier Zhihu post on the derivation of the normal distribution. I found it enlightening, so I took notes.

[Figure: the bell curve of the normal distribution, from Introduction To The Normal Distribution (Bell Curve), by Saul Mcleod, PhD, https://www.simplypsychology.org/normal-distribution.html]

Assume the measurement error has a probability density function $f(t)$, and that we have $n$ independent observations $x_1, x_2, \cdots, x_n$ of a quantity whose true value is $\mu$. The errors are then:

$$\begin{aligned} \varepsilon_{1} & =x_{1}-\mu \\ \varepsilon_{2} & =x_{2}-\mu \\ & \;\,\vdots \\ \varepsilon_{n} & =x_{n}-\mu \end{aligned}$$

Everyday experience suggests that, over a large number of observations, the error $\varepsilon$ fluctuates around $0$: small errors appear frequently, while observations with large $|\varepsilon|$ appear rarely. Form the likelihood function:

$$\begin{aligned} L(\mu) & =\prod_{i=1}^{n} f\left(\varepsilon_{i}\right) \\ & =f\left(x_{1}-\mu\right) f\left(x_{2}-\mu\right) \cdots f\left(x_{n}-\mu\right) \end{aligned}$$
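To make this concrete, here is a minimal numerical sketch. It assumes a specific error density, a unit Gaussian, purely for illustration (this is, of course, the density the derivation will eventually justify); the argument itself works for any candidate $f$.

```python
import numpy as np

def f(t):
    """Assumed error density, for illustration only: a unit Gaussian."""
    return np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(0)
mu_true = 5.0
x = mu_true + rng.normal(size=10)  # n = 10 noisy observations

def likelihood(mu):
    # L(mu) = product over i of f(x_i - mu)
    return np.prod(f(x - mu))

print(likelihood(mu_true), likelihood(mu_true + 1.0))  # drops away from the true value
```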

Take the natural logarithm of $L(\mu)$:

$$\begin{aligned} \ln [L(\mu)] & =\ln \left[\prod_{i=1}^{n} f\left(\varepsilon_{i}\right)\right] \\ & =\ln \left[f\left(x_{1}-\mu\right) f\left(x_{2}-\mu\right) \cdots f\left(x_{n}-\mu\right)\right] \\ & =\ln \left[f\left(x_{1}-\mu\right)\right]+\ln \left[f\left(x_{2}-\mu\right)\right]+\cdots+\ln \left[f\left(x_{n}-\mu\right)\right] \\ & =\sum_{i=1}^{n} \ln \left[f\left(x_{i}-\mu\right)\right] \end{aligned}$$
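The sum-of-logs form is also the numerically sensible one: for large $n$ the raw product underflows to zero in floating point, while the log-likelihood stays finite. A small sketch, again assuming a unit Gaussian $f$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = 5.0 + rng.normal(size=2000)  # n = 2000 observations
mu = 5.0

def log_f(t):
    # log of the unit Gaussian density
    return -0.5 * t**2 - 0.5 * np.log(2 * np.pi)

print(np.prod(np.exp(log_f(x - mu))))  # underflows to 0.0 at this n
print(np.sum(log_f(x - mu)))           # finite, usable log-likelihood
```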

To maximize $\ln [L(\mu)]$, take its partial derivative with respect to $\mu$ and set it equal to $0$:

$$\begin{aligned} \frac{\partial \ln [L(\mu)]}{\partial \mu} & =\frac{\partial \sum_{i=1}^{n} \ln \left[f\left(x_{i}-\mu\right)\right]}{\partial \mu} \\ & =-\sum_{i=1}^{n} \frac{f^{\prime}\left(x_{i}-\mu\right)}{f\left(x_{i}-\mu\right)} \\ & =0 \end{aligned}$$
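This is just the first-order condition for a maximum. As a sanity check (once more assuming a Gaussian $f$ to make the likelihood concrete), numerically maximizing the log-likelihood over $\mu$ recovers the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = 5.0 + rng.normal(size=100)

def neg_log_lik(mu):
    # -ln L(mu) for a unit Gaussian f, dropping the additive constant
    return 0.5 * np.sum((x - mu) ** 2)

res = minimize_scalar(neg_log_lik)
print(res.x, x.mean())  # the maximizer coincides with the sample mean
```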

Let $g(t)=\frac{f^{\prime}(t)}{f(t)}$; the stationarity condition then becomes:

$$\sum_{i=1}^{n} g\left(x_{i}-\mu\right)=0$$
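Anticipating the answer, one can check symbolically that for a Gaussian density $g(t)=f^{\prime}(t)/f(t)$ is linear in $t$, which is exactly the form the argument below will force. A sympy sketch:

```python
import sympy as sp

t = sp.symbols("t", real=True)
sigma = sp.symbols("sigma", positive=True)

# Gaussian density with mean 0 and standard deviation sigma
f = sp.exp(-t**2 / (2 * sigma**2)) / (sp.sqrt(2 * sp.pi) * sigma)

g = sp.simplify(sp.diff(f, t) / f)
print(g)  # -t/sigma**2, i.e. g(t) = k*t with k = -1/sigma**2
```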

At this step the exciting part begins; this is where Gauss's brilliance shows. He postulated that the estimate of $\mu$ should be the sample mean $\bar{x}$, so the equation above becomes

$$\sum_{i=1}^{n} g\left(x_{i}-\bar{x}\right)=0$$

where

$$\bar{x}=\frac{1}{n} \sum_{i=1}^{n} x_{i}$$
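The property doing the work here is that residuals about the sample mean sum to zero, so any $g$ of the form $g(t)=kt$ satisfies the equation automatically. A quick numerical check with arbitrary made-up data:

```python
import numpy as np

x = np.array([2.0, 3.5, 4.1, 5.7, 9.9])  # arbitrary observations
xbar = x.mean()

print(np.sum(x - xbar))           # ~0 up to floating-point error
print(np.sum(-2.0 * (x - xbar)))  # g(t) = k*t sums to zero for any k
```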

Now take the partial derivative of $\sum_{i=1}^{n} g\left(x_{i}-\bar{x}\right)=0$ with respect to each $x_i$ in turn. Differentiating with respect to $x_1$, for example, gives:

$$\begin{aligned} \frac{\partial \sum_{i=1}^{n} g\left(x_{i}-\bar{x}\right)}{\partial x_{1}} & =\frac{\partial \sum_{i=1}^{n} g\left(x_{i}-\frac{1}{n} \sum_{j=1}^{n} x_{j}\right)}{\partial x_{1}} \\ & =g^{\prime}\left(x_{1}-\bar{x}\right)\left(1-\frac{1}{n}\right)+g^{\prime}\left(x_{2}-\bar{x}\right)\left(-\frac{1}{n}\right)+\cdots+g^{\prime}\left(x_{n}-\bar{x}\right)\left(-\frac{1}{n}\right) \\ & =0 \end{aligned}$$

Treating each $g^{\prime}\left(x_{i}-\bar{x}\right)$ as an unknown, the $n$ equations obtained this way form a homogeneous linear system, which can be written in matrix form $\boldsymbol{M} \boldsymbol{X}=\mathbf{0}$:

$$\left(\begin{array}{cccc} 1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1-\frac{1}{n} & \cdots & -\frac{1}{n} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1-\frac{1}{n} \end{array}\right)\left(\begin{array}{c} g^{\prime}\left(x_{1}-\bar{x}\right) \\ g^{\prime}\left(x_{2}-\bar{x}\right) \\ \vdots \\ g^{\prime}\left(x_{n}-\bar{x}\right) \end{array}\right)=\left(\begin{array}{c} 0 \\ 0 \\ \vdots \\ 0 \end{array}\right)$$

For the coefficient matrix $\boldsymbol{M}$ of this system, add rows $2, 3, \cdots, n$ one by one to row $1$, which yields the following matrix:

$$\boldsymbol{M}=\left(\begin{array}{cccc} 1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1-\frac{1}{n} & \cdots & -\frac{1}{n} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1-\frac{1}{n} \end{array}\right) \rightarrow\left(\begin{array}{cccc} 0 & 0 & \cdots & 0 \\ -\frac{1}{n} & 1-\frac{1}{n} & \cdots & -\frac{1}{n} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1-\frac{1}{n} \end{array}\right)$$
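A numerical look at $\boldsymbol{M}$ for a small case (here $n=5$, chosen arbitrarily) confirms what the next paragraph argues: $\boldsymbol{M}$ is the centering matrix $\boldsymbol{I}-\frac{1}{n}\boldsymbol{J}$, its determinant vanishes, its rank is $n-1$, and the all-ones vector lies in its null space:

```python
import numpy as np

n = 5
M = np.eye(n) - np.ones((n, n)) / n  # the coefficient matrix above

print(np.linalg.det(M))          # ~0: the system is singular
print(np.linalg.matrix_rank(M))  # n - 1 = 4: null space is one-dimensional
print(M @ np.ones(n))            # all zeros: (1, ..., 1) spans the null space
```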

After this row operation the first row is all zeros, so $\det \boldsymbol{M}=0$. By itself this only tells us the system has infinitely many solutions; to pin them down we need $\operatorname{rank}(\boldsymbol{M})$. One can check that $\operatorname{rank}(\boldsymbol{M})=n-1$, so the solution space is one-dimensional, and the general solution of the system can be written as

$$\boldsymbol{X}=\left(\begin{array}{c} g^{\prime}\left(x_{1}-\bar{x}\right) \\ g^{\prime}\left(x_{2}-\bar{x}\right) \\ \vdots \\ g^{\prime}\left(x_{n}-\bar{x}\right) \end{array}\right)=k\left(\begin{array}{c} 1 \\ 1 \\ \vdots \\ 1 \end{array}\right)$$

That is, $g^{\prime}\left(x_{1}-\bar{x}\right)=g^{\prime}\left(x_{2}-\bar{x}\right)=\cdots=g^{\prime}\left(x_{n}-\bar{x}\right)=k$. Since the observations $x_{i}$ are arbitrary, this forces $g^{\prime}(t)=k$ for every $t$; integrating gives:

$$g(t)=k t+b$$

Positive and negative errors of the same magnitude should be equally likely, so $f(-t)=f(t)$, which makes $g(t)=\frac{f^{\prime}(t)}{f(t)}$ an odd function and forces $b=0$; moreover $k<0$, since the density must decay for large $|t|$. Now solve the differential equation $\frac{f^{\prime}(t)}{f(t)}=k t$:

$$\begin{aligned} \int \frac{f^{\prime}(t)}{f(t)} \mathrm{d} t=\int k t \mathrm{~d} t & \Leftrightarrow \int \frac{\mathrm{d}[f(t)]}{f(t)}=\frac{1}{2} k t^{2}+c \\ & \Leftrightarrow \ln [f(t)]=\frac{1}{2} k t^{2}+c \\ & \Leftrightarrow f(t)=K \mathrm{e}^{\frac{1}{2} k t^{2}} \end{aligned}$$
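The same ODE can be handed to a computer algebra system; a sympy sketch confirms the exponential-of-a-quadratic solution:

```python
import sympy as sp

t, k = sp.symbols("t k")
f = sp.Function("f")

# f'(t) = k * t * f(t), i.e. f'(t)/f(t) = k*t
sol = sp.dsolve(sp.Eq(f(t).diff(t), k * t * f(t)), f(t))
print(sol)  # Eq(f(t), C1*exp(k*t**2/2))
```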

At the same time, $f(t)$ is a probability density function, so its integral from $-\infty$ to $+\infty$ must equal $1$ (normalization). Since $k<0$, we may write $k=-\frac{1}{\sigma^{2}}$:

$$\begin{aligned} \int_{-\infty}^{+\infty} f(t) \,\mathrm{d} t & =\int_{-\infty}^{+\infty} K \mathrm{e}^{\frac{1}{2} k t^{2}} \,\mathrm{d} t \\ & =K \int_{-\infty}^{+\infty} \mathrm{e}^{-\frac{t^{2}}{2 \sigma^{2}}} \,\mathrm{d} t \\ & =K \sqrt{\left[\sqrt{2} \sigma \int_{-\infty}^{+\infty} \mathrm{e}^{-\left(\frac{t}{\sqrt{2} \sigma}\right)^{2}} \mathrm{~d}\left(\frac{t}{\sqrt{2} \sigma}\right)\right]\left[\sqrt{2} \sigma \int_{-\infty}^{+\infty} \mathrm{e}^{-\left(\frac{s}{\sqrt{2} \sigma}\right)^{2}} \mathrm{~d}\left(\frac{s}{\sqrt{2} \sigma}\right)\right]} \\ & =K \sqrt{2} \sigma \sqrt{\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \mathrm{e}^{-\left(u^{2}+v^{2}\right)} \mathrm{d} u \,\mathrm{d} v} \\ & =K \sqrt{2} \sigma \sqrt{\int_{0}^{2 \pi} \mathrm{d} \theta \int_{0}^{+\infty} \mathrm{e}^{-r^{2}} r \,\mathrm{d} r} \\ & =K \sqrt{2} \sigma \sqrt{\pi} \\ & =1 \end{aligned}$$
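The key fact used here is the Gaussian integral $\int_{-\infty}^{+\infty} \mathrm{e}^{-t^{2} /\left(2 \sigma^{2}\right)} \mathrm{d} t=\sqrt{2 \pi}\, \sigma$. A quick numerical confirmation with scipy, using an arbitrary $\sigma$:

```python
import numpy as np
from scipy.integrate import quad

sigma = 1.7  # arbitrary positive value
integral, _ = quad(lambda t: np.exp(-t**2 / (2 * sigma**2)), -np.inf, np.inf)

print(integral, np.sqrt(2 * np.pi) * sigma)  # both ~4.2614
print(1.0 / integral)                        # the normalizing constant K
```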

Solving $K \sqrt{2} \sigma \sqrt{\pi}=1$ gives $K=\frac{1}{\sqrt{2 \pi}\, \sigma}$, so finally the probability density function is obtained:

$$f(t)=\frac{1}{\sqrt{2 \pi}\, \sigma} \mathrm{e}^{-\frac{1}{2}\left(\frac{t}{\sigma}\right)^{2}}$$
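As a last check, the derived density matches scipy's zero-mean normal pdf with scale $\sigma$:

```python
import numpy as np
from scipy.stats import norm

sigma = 1.7
t = np.linspace(-5.0, 5.0, 11)

derived = np.exp(-0.5 * (t / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
print(np.allclose(derived, norm.pdf(t, loc=0.0, scale=sigma)))  # True
```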


Origin: blog.csdn.net/m0_51143578/article/details/132915387