State Estimation Study Notes in Robotics (1) Chapter 2 Basics of Probability Theory
- 2.1 Probability Density Function
- 2.2 Gaussian probability density function
- 2.2.1 Definition
- 2.2.3 Joint Gaussian probability density function, decomposition and inference
- 2.2.4 Statistical independence, irrelevance
- 2.2.5 Linear Transformation of Gaussian Random Variables
- 2.2.6 Normalized product of Gaussian probability density function
- 2.2.7 Sherman-Morrison-Woodbury equation
- 2.2.8 Nonlinear Transformation of Gaussian Random Variables
- 2.3 Gaussian process
The so-called state estimation problem is essentially the problem of estimating the internal state of a system from the system's prior model and observation data.
2.1 Probability Density Function
2.1.1 Definition
Define x as a random variable on the interval [a, b], subject to a probability density function $p(x)$. Then $p(x)$ must satisfy:

$$\int_{a}^{b} p(x)\,dx = 1$$

The integral of $p(x)$ equals 1 in order to satisfy the axiom of total probability.
For conditional probability, let $p(x|y)$ denote the probability density of $x \in [a,b]$ conditioned on $y \in [r,s]$; it satisfies:

$$(\forall y)\quad \int_{a}^{b} p(x|y)\,dx = 1$$

The joint probability density function of an N-dimensional continuous random variable can also be written $p(x)$, where $x = (x_1, \dots, x_N)$ and each $x_i \in [a_i, b_i]$; strictly speaking, this is $p(x_1, x_2, \dots, x_N)$ rather than $p(x)$.
One can likewise write $p(x, y)$ for the joint density of x and y.
2.1.2 Bayesian formula and inference
A joint probability density can be decomposed into the product of a conditional probability density and an unconditional probability density:
$$p(x,y) = p(x|y)\,p(y) = p(y|x)\,p(x)$$

From this, Bayes' formula can be obtained:

$$p(x|y) = \frac{p(y|x)\,p(x)}{p(y)}$$

Since

$$p(y) = p(y)\underbrace{\int p(x|y)\,dx}_{1} = \int p(x|y)\,p(y)\,dx = \int p(x,y)\,dx = \int p(y|x)\,p(x)\,dx$$

we have

$$p(x|y) = \frac{p(y|x)\,p(x)}{p(y)} = \frac{p(y|x)\,p(x)}{\int p(y|x)\,p(x)\,dx}$$

Therefore, if the prior probability density $p(x)$ of the state and the sensor model $p(y|x)$ are known, the posterior probability density $p(x|y)$ can be computed.
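The update $p(x|y) = p(y|x)p(x)/\int p(y|x)p(x)\,dx$ can be evaluated numerically on a grid. A minimal numpy sketch, where the prior, sensor noise, and measurement values are all illustrative choices:

```python
import numpy as np

# Evaluate Bayes' formula p(x|y) ∝ p(y|x) p(x) on a discrete 1-D grid.
x = np.linspace(-5.0, 5.0, 1001)
dx = x[1] - x[0]

def gauss(z, mu, var):
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

prior = gauss(x, 0.0, 4.0)        # p(x): state believed near 0, variance 4
y = 1.2                           # observed value (illustrative)
likelihood = gauss(y, x, 1.0)     # p(y|x): sensor model with noise variance 1

# Denominator p(y) = ∫ p(y|x) p(x) dx, approximated by a Riemann sum.
evidence = np.sum(likelihood * prior) * dx
posterior = likelihood * prior / evidence

print(np.sum(posterior) * dx)     # integrates to 1, as required
```

For Gaussian prior and likelihood this reproduces the closed-form posterior mean $(\sigma_l^2\mu_p + \sigma_p^2 y)/(\sigma_p^2+\sigma_l^2)$, here 0.96.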
2.1.3 Moments
The zeroth moment of a probability density is the total probability of the whole event, which always equals 1.

The first moment is called the expectation, denoted $\mu$:

$$\mu = E[x] = \int x\,p(x)\,dx$$

For a general matrix function $F(x)$:

$$E[F(x)] = \int F(x)\,p(x)\,dx$$

or, written out elementwise,

$$E[F(x)] = \left[ E[f_{ij}(x)] \right], \qquad E[f_{ij}(x)] = \int f_{ij}(x)\,p(x)\,dx$$

The second (central) moment is called the covariance matrix $\Sigma$:

$$\Sigma = E\left[ (x-\mu)(x-\mu)^{T} \right]$$

The third and fourth moments are called skewness and kurtosis, respectively.
2.1.5 Statistical independence and uncorrelation
Two random variables x and y are statistically independent if their joint probability density function can be factorized as follows:
$$p(x,y) = p(x)\,p(y)$$

Similarly, the two variables are uncorrelated if their expectations satisfy:

$$E[xy^{T}] = E[x]\,E[y]^{T}$$

"Uncorrelated" is weaker than "independent": if two random variables are statistically independent, they must be uncorrelated; but if two random variables are uncorrelated, they are not necessarily statistically independent.
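A standard counterexample for the last point can be sketched with numpy: $x \sim N(0,1)$ and $y = x^2$ are uncorrelated, yet obviously dependent:

```python
import numpy as np

# x ~ N(0,1) and y = x^2 are uncorrelated (E[xy] = E[x^3] = 0)
# but clearly dependent, since y is a deterministic function of x.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x ** 2

corr = np.corrcoef(x, y)[0, 1]
print(corr)  # close to 0: uncorrelated
# Independence would require E[f(x) g(y)] = E[f(x)] E[g(y)] for all f, g;
# but E[x^2 * y] = E[x^4] = 3, while E[x^2] * E[y] = 1 * 1 = 1.
```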
2.1.6 Normalized product
If $p_1(x)$ and $p_2(x)$ are two different probability density functions of a random variable x, their normalized product is defined as:

$$p(x) = \eta\,p_1(x)\,p_2(x)$$

where

$$\eta = \left( \int p_1(x)\,p_2(x)\,dx \right)^{-1}$$

is a constant normalization factor that ensures $p(x)$ satisfies the axiom of total probability.
2.2 Gaussian probability density function
2.2.1 Definition
In the one-dimensional case, the Gaussian probability density function is expressed as:
$$p(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2} \right)$$

(Figure: the one-dimensional Gaussian PDF, the familiar bell curve.)

The Gaussian distribution of a multidimensional variable is expressed as:

$$p(x|\mu,\Sigma) = \frac{1}{\sqrt{(2\pi)^N \det\Sigma}} \exp\left( -\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu) \right)$$

where

$$\mu = E[x] = \int_{-\infty}^{\infty} x\,\frac{1}{\sqrt{(2\pi)^N \det\Sigma}} \exp\left( -\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu) \right) dx$$

$$\Sigma = E\left[ (x-\mu)(x-\mu)^{T} \right] = \int_{-\infty}^{\infty} (x-\mu)(x-\mu)^{T}\,\frac{1}{\sqrt{(2\pi)^N \det\Sigma}} \exp\left( -\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu) \right) dx$$

Conventionally, the normal (Gaussian) distribution is written:

$$x \sim N(\mu, \Sigma)$$

If the random variable x satisfies $x \sim N(0, \mathbf{1})$, where $\mathbf{1}$ is the N×N identity matrix, x is said to obey the standard normal distribution.
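The multidimensional density above can be evaluated directly with numpy; the function name and test point below are illustrative:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the N-dimensional Gaussian density at x (a numpy sketch)."""
    N = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** N * np.linalg.det(Sigma))
    # solve() avoids forming Sigma^{-1} explicitly
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

mu = np.array([0.0, 0.0])
Sigma = np.eye(2)                       # standard normal, N = 2
p0 = gaussian_pdf(np.zeros(2), mu, Sigma)
print(p0)  # at the mean this equals 1 / (2*pi)
```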
2.2.3 Joint Gaussian probability density function, decomposition and inference
Suppose a pair of variables (x, y) obeys a multivariate normal distribution; their joint probability density function is:

$$p(x,y) = N\left( \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix} \right)$$

The covariance matrix can be factored using the Schur complement:

$$\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix} = \begin{bmatrix} 1 & \Sigma_{xy}\Sigma_{yy}^{-1} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} & 0 \\ 0 & \Sigma_{yy} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \Sigma_{yy}^{-1}\Sigma_{yx} & 1 \end{bmatrix}$$

where 1 is the identity matrix. Inverting both sides gives:

$$\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ -\Sigma_{yy}^{-1}\Sigma_{yx} & 1 \end{bmatrix} \begin{bmatrix} \left( \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \right)^{-1} & 0 \\ 0 & \Sigma_{yy}^{-1} \end{bmatrix} \begin{bmatrix} 1 & -\Sigma_{xy}\Sigma_{yy}^{-1} \\ 0 & 1 \end{bmatrix}$$

Substituting this into the exponent of the joint Gaussian and observing the result carefully, the simplified exponent is the sum of two quadratic terms, so the joint density factors as $p(x,y) = p(x|y)\,p(y)$ with:

$$x \mid y \sim N\left( \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y),\ \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \right)$$

$$y \sim N\left( \mu_y,\ \Sigma_{yy} \right)$$
Both factors p(x|y) and p(y) are Gaussian probability density functions, so if the observed value y is known, the density of x given y can be obtained by evaluating p(x|y) with formula (2.53b).
This is in fact the core of Gaussian inference: starting from the known prior distribution of the state, the observation model is used to narrow the range and obtain the posterior distribution. Formula (2.53b) shows that the mean is adjusted and the covariance matrix becomes smaller; that is, the uncertainty decreases.
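The conditioning formulas can be sketched in a few lines of numpy; the means and covariances below are made-up illustrative numbers:

```python
import numpy as np

# Gaussian inference: given the joint N([mu_x; mu_y], [[Sxx, Sxy], [Syx, Syy]]),
# condition on an observed y to obtain p(x|y).
mu_x = np.array([1.0]); mu_y = np.array([2.0])
Sxx = np.array([[2.0]]); Sxy = np.array([[0.8]])
Syx = Sxy.T;             Syy = np.array([[1.0]])

y_obs = np.array([3.0])
K = Sxy @ np.linalg.inv(Syy)             # Sigma_xy Sigma_yy^{-1}
mu_post = mu_x + K @ (y_obs - mu_y)      # conditional mean: shifted by the data
S_post = Sxx - K @ Syx                   # conditional covariance: shrinks

print(mu_post, S_post)
```

Note that `S_post` is smaller than `Sxx`, matching the statement that the uncertainty decreases after conditioning.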
2.2.4 Statistical independence, irrelevance
For Gaussian probability density functions, statistical independence and uncorrelatedness are equivalent: two statistically independent Gaussian variables are uncorrelated (which holds for general probability density functions as well), and two uncorrelated Gaussian variables are also statistically independent (which does not hold for general probability density functions).
2.2.5 Linear Transformation of Gaussian Random Variables
Suppose there is a Gaussian random variable $x \in \mathbb{R}^N$, $x \sim N(\mu_x, \Sigma_{xx})$, and another random variable y related to x through a constant matrix $G \in \mathbb{R}^{M \times N}$:

$$y = Gx$$

Then y is also Gaussian:

$$y \sim N(\mu_y, \Sigma_{yy}) = N\left( G\mu_x,\ G\Sigma_{xx}G^{T} \right)$$
Another derivation is the change-of-variables method. Assume the mapping is injective, i.e. no two values of x map to the same y; in fact, this injectivity condition can be replaced by the stricter condition that G is invertible (so M = N).
According to the axiom of total probability, $\int_{a}^{b} p(x)\,dx = 1$. A small region of x maps to y according to:

$$dy = |\det G|\,dx$$

Substituting this into the integral above yields:

$$\mu_y = G\mu_x, \qquad \Sigma_{yy} = G\Sigma_{xx}G^{T}$$
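Both $\mu_y = G\mu_x$ and $\Sigma_{yy} = G\Sigma_{xx}G^T$ can be checked by Monte Carlo sampling; the matrix G and the moments below are arbitrary illustrative values:

```python
import numpy as np

# Verify the linear-transformation rule for a Gaussian by sampling.
rng = np.random.default_rng(1)
mu_x = np.array([1.0, -1.0])
Sxx = np.array([[1.0, 0.3],
                [0.3, 2.0]])
G = np.array([[2.0, 0.0],
              [1.0, 1.0]])

xs = rng.multivariate_normal(mu_x, Sxx, size=200_000)
ys = xs @ G.T                      # y = G x applied to every sample

mu_y_mc = ys.mean(axis=0)
Syy_mc = np.cov(ys.T)
print(mu_y_mc, G @ mu_x)           # sample mean vs. G mu_x
print(Syy_mc, G @ Sxx @ G.T)       # sample covariance vs. G Sxx G^T
```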
However, if M < N, the linear map is not injective, and the distribution of y cannot be found by change of variables in a definite integral. If M < N with rank(G) = M, the linear mapping from y back to x can be considered instead, but this is somewhat troublesome: the mapping embeds the variable into a larger space, so the actual covariance matrix of x would grow. To avoid this problem, the information-matrix form is adopted. The information matrix is the inverse of the covariance matrix; it expresses the reliability of a measurement: the smaller the uncertainty, the greater the reliability. Let

$$u = \Sigma_{yy}^{-1} y, \qquad u \sim N\left( \Sigma_{yy}^{-1}\mu_y,\ \Sigma_{yy}^{-1} \right)$$

Similarly, let

$$v = \Sigma_{xx}^{-1} x, \qquad v \sim N\left( \Sigma_{xx}^{-1}\mu_x,\ \Sigma_{xx}^{-1} \right)$$

Since the mapping from y to x is not unique, a particular mapping must be chosen; set:

$$v = G^{T} u \quad \Leftrightarrow \quad \Sigma_{xx}^{-1} x = G^{T}\Sigma_{yy}^{-1} y$$

Then the expectation can be calculated.
It is worth noting that if $\Sigma_{xx}^{-1}$ is not full rank, $\Sigma_{xx}$ and $\mu_x$ cannot be recovered, so the distribution can only be expressed in information form. Distributions expressed in this information form can still be fused.
2.2.6 Normalized product of Gaussian probability density function
A useful property of Gaussian probability density functions is that the normalized product of K Gaussian probability density functions is still a Gaussian probability density function:

$$\eta \prod_{k=1}^{K} N(\mu_k, \Sigma_k) = N(\mu, \Sigma), \qquad \Sigma^{-1} = \sum_{k=1}^{K} \Sigma_k^{-1}, \qquad \Sigma^{-1}\mu = \sum_{k=1}^{K} \Sigma_k^{-1}\mu_k$$

Here $\eta$ is a normalizing constant that ensures the resulting density satisfies the axiom of total probability. The Gaussian normalized product is used when fusing multiple estimates together.
A similar result holds for the normalized product of linearly transformed Gaussian random variables.
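As a concrete check of the fusion property: the normalized product of K one-dimensional Gaussians is itself Gaussian, with inverse variance equal to the sum of the inverse variances and mean given by the information-weighted average (a standard result). A minimal numpy sketch with made-up estimates:

```python
import numpy as np

# Fuse K Gaussian estimates of the same scalar state via the normalized product.
# Information form: 1/var = sum(1/var_k), mu/var = sum(mu_k/var_k).
mus = np.array([0.0, 2.0, 1.0])          # illustrative estimate means
variances = np.array([1.0, 4.0, 2.0])    # illustrative estimate variances

info = np.sum(1.0 / variances)           # fused information (inverse variance)
fused_var = 1.0 / info
fused_mu = fused_var * np.sum(mus / variances)
print(fused_mu, fused_var)
```

The fused variance is smaller than every individual variance, which is exactly why fusion is useful.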
2.2.7 Sherman-Morrison-Woodbury equation
The Sherman-Morrison-Woodbury (SMW) identity is also known as the matrix inversion lemma.
An invertible matrix partitioned into blocks can be decomposed into a lower-diagonal-upper (LDU) form or an upper-diagonal-lower (UDL) form as follows:

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & BD^{-1} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix} \begin{bmatrix} 1 & 0 \\ D^{-1}C & 1 \end{bmatrix}$$

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ CA^{-1} & 1 \end{bmatrix} \begin{bmatrix} A & 0 \\ 0 & D - CA^{-1}B \end{bmatrix} \begin{bmatrix} 1 & A^{-1}B \\ 0 & 1 \end{bmatrix}$$

Then invert both sides of each equation. For the first decomposition, this gives:

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ -D^{-1}C & 1 \end{bmatrix} \begin{bmatrix} \left( A - BD^{-1}C \right)^{-1} & 0 \\ 0 & D^{-1} \end{bmatrix} \begin{bmatrix} 1 & -BD^{-1} \\ 0 & 1 \end{bmatrix}$$

For the second, it gives:

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} 1 & -A^{-1}B \\ 0 & 1 \end{bmatrix} \begin{bmatrix} A^{-1} & 0 \\ 0 & \left( D - CA^{-1}B \right)^{-1} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -CA^{-1} & 1 \end{bmatrix}$$

Comparing the corresponding blocks of Equation (2.73) and Equation (2.74), identities such as the following can be obtained:

$$\left( A - BD^{-1}C \right)^{-1} = A^{-1} + A^{-1}B\left( D - CA^{-1}B \right)^{-1}CA^{-1}$$
These identities will be used frequently later when manipulating the covariance matrices of Gaussian probability density functions.
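The SMW identity is easy to check numerically; the block sizes below are arbitrary, and the diagonal offset is just a crude way to keep the random matrices invertible:

```python
import numpy as np

# Numerical check of (A - B D^{-1} C)^{-1} = A^{-1} + A^{-1} B (D - C A^{-1} B)^{-1} C A^{-1}
rng = np.random.default_rng(2)
n, m = 4, 3
A = rng.standard_normal((n, n)) + 5 * np.eye(n)   # diagonally dominated -> invertible
D = rng.standard_normal((m, m)) + 5 * np.eye(m)
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))

Ainv, Dinv = np.linalg.inv(A), np.linalg.inv(D)
lhs = np.linalg.inv(A - B @ Dinv @ C)
rhs = Ainv + Ainv @ B @ np.linalg.inv(D - C @ Ainv @ B) @ C @ Ainv
print(np.max(np.abs(lhs - rhs)))   # tiny: the identity holds to machine precision
```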
2.2.8 Nonlinear Transformation of Gaussian Random Variables
Next, study a Gaussian distribution passed through a stochastic nonlinear transformation, that is, compute:

$$p(y) = \int_{-\infty}^{\infty} p(y|x)\,p(x)\,dx$$

with

$$p(y|x) = N\left( g(x),\ R \right), \qquad p(x) = N(\mu_x, \Sigma_{xx})$$

Here $g(\cdot): x \mapsto y$ is a nonlinear map, affected by Gaussian noise with covariance R. This type of stochastic nonlinear mapping will be needed later to model sensors. Passing a Gaussian distribution through a nonlinear transformation is also required, for example, in Bayesian inference, where the denominator of Bayes' formula often involves such a transformation.
Nonlinear Mapping in the Scalar Case
First consider a simplified situation: x is a scalar, and the nonlinear function $g(\cdot)$ is deterministic (i.e. R = 0). Let $x \in \mathbb{R}$ be a Gaussian random variable, $x \sim N(0, \sigma^2)$, with PDF:

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2}\frac{x^2}{\sigma^2} \right)$$

Now consider the nonlinear mapping:

$$y = \exp(x)$$

which is clearly invertible:

$$x = \ln(y)$$

On an infinitesimal interval, x and y are related by:

$$dy = \exp(x)\,dx \quad \text{or} \quad dx = \frac{1}{y}\,dy$$

According to the axiom of total probability, substituting into $\int p(x)\,dx = 1$ gives:

$$p(y) = \frac{1}{y}\,\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2}\frac{(\ln y)^2}{\sigma^2} \right), \qquad y > 0$$

This is the exact expression for p(y). When $\sigma^2 = 1$, its graph is shown in the figure below: the area under the curve from y = 0 to $\infty$ is 1. Drawing a large number of samples of x and passing them through the nonlinear transformation $g(\cdot)$ produces a gray histogram that approximates the black curve; the approximation agrees with the true density, which verifies the transformation above.

Note that, because of the nonlinear transformation, p(y) no longer obeys a Gaussian distribution.
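The sampling experiment described above can be reproduced numerically; the grid range and sample count below are arbitrary choices:

```python
import numpy as np

# Monte Carlo check of the change of variables: samples of y = exp(x),
# x ~ N(0, 1), should follow p(y) = (1/y) N(ln y; 0, 1), a log-normal density.
rng = np.random.default_rng(3)
sigma2 = 1.0
xs = rng.normal(0.0, np.sqrt(sigma2), size=1_000_000)
ys = np.exp(xs)

# Compare a normalized histogram of y with the exact density at bin centers.
edges = np.linspace(0.001, 15.0, 301)
hist, _ = np.histogram(ys, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
p_exact = np.exp(-0.5 * np.log(centers) ** 2 / sigma2) / (
    centers * np.sqrt(2 * np.pi * sigma2))
print(np.max(np.abs(hist - p_exact)))   # small: histogram matches the density
```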
General case of linearization
However, a closed-form solution cannot be obtained for every $g(\cdot)$, and in the multidimensional case the computation becomes extremely complicated. Moreover, when the nonlinear transformation is stochastic (R > 0), the extra noise input makes the mapping non-invertible, so a different approach is required. Of the many available methods, the most commonly used is introduced here: linearization.
Linearizing the nonlinear transformation gives:

$$g(x) \approx \mu_y + G(x - \mu_x)$$

where

$$\mu_y = g(\mu_x), \qquad G = \left. \frac{\partial g(x)}{\partial x} \right|_{x=\mu_x}$$

and G is the Jacobian of $g(\cdot)$ with respect to x. After linearization, a "closed-form solution" of the problem above can be obtained; it is really an approximate solution, valid only when the nonlinearity of the mapping is weak.
The figure above depicts the result of passing a one-dimensional Gaussian PDF through the nonlinear transformation $g(\cdot)$, where $g(\cdot)$ has been linearized.
Starting from

$$p(y) = \int_{-\infty}^{\infty} p(y|x)\,p(x)\,dx$$

we can obtain the following, where $\eta$ is a normalization constant and the matrix F is defined such that:

$$F^{T}\left( G^{T}R^{-1}G + \Sigma_{xx}^{-1} \right) = R^{-1}G$$

The square must then be completed inside the integral. The second factor is independent of x and need not be integrated; the first factor is a Gaussian in x, so integrating over x yields a constant that can be merged into $\eta$. With $\rho$ denoting a new normalization constant, p(y) then reduces to the Gaussian distribution of y:

$$y \sim N(\mu_y, \Sigma_{yy}) = N\left( g(\mu_x),\ R + G\Sigma_{xx}G^{T} \right)$$
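Taking the illustrative scalar map $g(x) = \exp(x)$ with a small input variance (so the mapping is nearly linear), the linearized result $y \sim N(g(\mu_x),\ R + G\Sigma_{xx}G^T)$ can be compared against Monte Carlo:

```python
import numpy as np

# Linearized pass-through of a Gaussian for y = g(x) + noise, g(x) = exp(x).
mu_x, var_x = 0.5, 0.01    # small variance -> weak nonlinearity over the support
R = 0.001                  # additive noise covariance (illustrative)

G = np.exp(mu_x)                     # Jacobian of exp(.) at mu_x
mu_y_lin = np.exp(mu_x)              # g(mu_x)
var_y_lin = R + G * var_x * G        # R + G Sigma_xx G^T (scalar case)

# Monte Carlo reference
rng = np.random.default_rng(4)
xs = rng.normal(mu_x, np.sqrt(var_x), size=1_000_000)
ys = np.exp(xs) + rng.normal(0.0, np.sqrt(R), size=xs.size)
print(mu_y_lin, ys.mean())           # close, since g is nearly linear here
print(var_y_lin, ys.var())
```

With a larger `var_x` the gap grows, illustrating that the linearized solution is only valid when the nonlinearity is weak.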
2.3 Gaussian process
A variable $x \in \mathbb{R}^N$ satisfying a Gaussian distribution is written $x \sim N(\mu, \Sigma)$, and random variables of this type are used to express discrete-time state quantities.
Next, the continuous-time state quantity x(t) is discussed. To do this, the Gaussian process must first be introduced. The figure below depicts a trajectory represented by a Gaussian process: the mean at each instant is described by a mean function $\mu(t)$, and the variance across two different instants is described by a covariance function $\Sigma(t, t')$.
The black solid line is the mean function, and the shaded area is determined by the covariance function.
The entire trajectory can be considered a random variable in a collection of functions: the closer a function is to the mean function, the more likely the trajectory. The covariance function describes the smoothness of the trajectory through the correlation between the random variables at two times t and t'. We denote this random-variable function as:

$$x(t) \sim \mathcal{GP}\left( \mu(t),\ \Sigma(t, t') \right)$$

indicating that the continuous-time trajectory is a Gaussian process. In fact, Gaussian processes are not limited to one-dimensional expressions with respect to time.
If we are interested only in a specific time $\tau$, we can write:

$$x(\tau) \sim N\left( \mu(\tau),\ \Sigma(\tau, \tau) \right)$$

where $\Sigma(\tau, \tau)$ is an ordinary covariance matrix. Marginalizing out all other times leaves only this particular time $\tau$, at which $x(\tau)$ can be regarded as an ordinary Gaussian random variable.
Gaussian processes take different forms; a commonly used one is the zero-mean, white-noise Gaussian process. Zero-mean white noise $\omega(t)$ satisfies:

$$\omega(t) \sim \mathcal{GP}\left( 0,\ Q\,\delta(t - t') \right)$$

where Q is the power spectral density matrix and $\delta(t - t')$ is the Dirac delta function. Since the covariance depends only on the time difference t − t', the zero-mean white-noise process is a stationary noise process.
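To make the mean-function/covariance-function idea concrete (this is not the white-noise process itself), sample trajectories can be drawn from a Gaussian process on a time grid; the squared-exponential covariance below is an assumed illustrative choice:

```python
import numpy as np

# Draw sample trajectories from a GP with mean function mu(t) = 0 and an
# assumed squared-exponential covariance Sigma(t, t') = exp(-(t - t')^2 / 2).
rng = np.random.default_rng(5)
t = np.linspace(0.0, 5.0, 100)

mu = np.zeros_like(t)                                   # mean function mu(t)
Sigma = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2)   # covariance Sigma(t, t')
Sigma += 1e-9 * np.eye(t.size)                          # jitter for numerical stability

samples = rng.multivariate_normal(mu, Sigma, size=3)    # three sample trajectories
print(samples.shape)  # (3, 100): each row is one trajectory on the grid
```

Trajectories drawn this way wander around the mean function, and the covariance length scale controls how smooth they are.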
Reference:
Timothy D. Barfoot, *State Estimation for Robotics*