Kalman filter--from derivation to application (1)

Foreword

          The Kalman filter is a set of recursive mathematical equations derived to minimize the mean square error when estimating the state of a linear system; it can also be derived from the perspective of Bayesian inference.

          This article will be divided into two parts:

The first part uses examples to introduce the principle of the Kalman filter intuitively from the minimum mean square error perspective, and gives a fairly detailed mathematical derivation.

The second part demonstrates the practical application of the Kalman filter through two examples. A uniform-acceleration model is introduced in detail, and the influence of how the system state model is set up on the filtering result is compared visually.


Part One

Let's first look at a joke that can help understand the Kalman filter:

There is a winding path in a green meadow that leads to a big tree. A request is made: from the starting point follow the path to the tree.

"It's easy," said A, and he followed the path exactly to the tree.

Now, the difficulty has been increased: blindfolded.

"It's not difficult, I used to be a special forces soldier." B said, so he walked crookedly to the tree. "Oh, I haven't practiced for a long time, I'm rusty." (Only based on my own predictive ability)

"Look at me, I have a DIY GPS!" C said, and he wobbled like a drunken man to the tree. "Alas, this GPS is not done well, the drift is too large." (Only relying on measurements with a lot of noise from the outside world)

"I'll give it a try." A man who had been a special forces soldier next to him took a GPS, blindfolded his eyes, and walked smoothly along the path to the tree. (You can predict by yourself + feedback from measurement results)

"So amazing! Who are you?"
"Kalman!"
"Kalman?! You are Kalman?" Everyone was surprised.
"I mean this GPS is stuck and slow."

This passage is quoted from highgear's post "Teach a Man to Fish: The Kalman Filter... The Big Leak...".

This little joke neatly captures the core of Kalman filtering: prediction + measurement feedback. Keep this idea in mind.

-------------------------------------------------- Separator --------------------------------------------------

Before introducing the Kalman filter, let's briefly explain a few concepts that will be used along the way: what covariance is and what it means, what minimum mean square error estimation is, and what the multivariate Gaussian distribution is. If you already understand these, skip ahead to the dividing line below.

   Mean squared error: the expected value of the squared "error" (the error being the difference between each estimate and the true value). That is, when there are multiple samples, the mean squared error equals the sum over samples of each sample's squared error multiplied by that sample's probability of occurrence.

   Variance: variance describes the degree of dispersion of a random variable; it is the expected squared deviation of the variable from its expected value.

Note that the two concepts are slightly different; when the expected value of your samples equals the true value, the two coincide exactly. Minimum mean square error estimation means that, when estimating parameters, we minimize the expected value of the squared error between the estimate and the true value.
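To make the distinction concrete, the two quantities can be written as follows (standard definitions, added here for reference), where $x$ is the true value, $\hat{x}$ an estimate of it, and $X$ a random variable:

$$\mathrm{MSE}(\hat{x}) = E\big[(\hat{x} - x)^2\big], \qquad \mathrm{Var}(X) = E\big[(X - E[X])^2\big]$$

Minimum mean square error estimation then chooses $\hat{x}$ so as to minimize $E\big[(\hat{x} - x)^2\big]$.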

   Covariance between two real random variables X and Y:

$$\mathrm{cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]$$

It represents the joint deviation of the two variables; when Y = X it reduces to the variance. Here is my informal way of understanding covariance. Set the expectation operator aside for a moment, i.e., assume the samples X and Y each occur with probability 1; then the covariance formula becomes:

$$(X - E[X])(Y - E[Y])$$

This is a product of two terms, which immediately brings to mind the correlation calculations used in digital image processing. If the two variables trend together, that is, when one is above its own expected value the other is also above its own expected value, then their covariance is positive. If they trend in opposite directions, that is, one is above its expected value while the other is below its own, then their covariance is negative. A covariance matrix is simply a matrix built from more such elements; the diagonal of the covariance matrix contains the variances. For the exact formula, please refer to the wiki.
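As a quick illustration (my own example, not from the original post), the following NumPy snippet shows the sign behavior described above:

```python
import numpy as np

# Two variables that trend together, and two that trend oppositely.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_same = 2.0 * x + np.array([0.1, -0.2, 0.05, 0.1, -0.1])      # same trend as x
y_opposite = -2.0 * x + np.array([0.1, -0.2, 0.05, 0.1, -0.1])  # opposite trend

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is cov(X, Y).
print(np.cov(x, y_same)[0, 1])      # positive: the trends agree
print(np.cov(x, y_opposite)[0, 1])  # negative: the trends oppose
```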

   In fact, this form of multiplication is also somewhat like a vector projection, i.e., the inner product of two vectors. Going a step further, think of how the spectral coefficients are determined in the Fourier transform: to find the spectrum of a function f(t) at some frequency w, we compute <f(t), cos(wt)>, where < , > denotes the inner product; in plain terms, we project f(t) onto cos(wt). Explaining the essence of Fourier analysis would take another blog post; it is mentioned here only because connecting these ideas helps understanding.

Gaussian distribution: the probability density function is shown in the figure below. The four curves have different variances; the variance determines how wide or narrow, tall or short the curve is. (Image source: Wikipedia)

Multivariate Gaussian distribution: the extension of the Gaussian distribution from low dimensions to high dimensions. The image is as follows.


Please google the formula for the multivariate Gaussian distribution. The variance in the one-dimensional Gaussian formula becomes a covariance matrix. For the bivariate case shown in the three graphs above, the covariance matrix has the form:

$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}$$

Note that the main diagonal of the covariance matrix contains the variances, and the off-diagonal entries are the covariances between the two variables. For the bivariate Gaussian above, the larger the covariance, the flatter the surface, i.e., the more strongly the two dimensions are coupled.
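A small NumPy sketch (my own illustration; the particular matrices are made-up values) shows how the covariance matrix shapes the samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same variances, different covariance: a larger covariance couples the axes more.
cov_round = np.array([[1.0, 0.0], [0.0, 1.0]])   # circular cloud
cov_flat  = np.array([[1.0, 0.9], [0.9, 1.0]])   # elongated, "flatter" cloud

samples_round = rng.multivariate_normal(mean=[0, 0], cov=cov_round, size=1000)
samples_flat  = rng.multivariate_normal(mean=[0, 0], cov=cov_flat, size=1000)

# The sample covariance recovers the matrices we put in (up to sampling noise).
print(np.cov(samples_round.T).round(2))
print(np.cov(samples_flat.T).round(2))
```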

-------------------------------------------------- Separator --------------------------------------------------

       In this part, each mathematical point is followed by a corresponding example and an intuitive analysis to help understanding.

       First, assume we know the state difference equation of a linear system:

$$x_k = A\,x_{k-1} + B\,u_k + w_{k-1}$$

where x is the system state vector, of size n×1. A is the transition matrix, of size n×n. u is the system input, of size k×1. B is the matrix that maps the input into the state, of size n×k. The random variable w is the system (process) noise. Pay attention to the sizes of these matrices; they matter a great deal when you actually program this.

       Let's look at a concrete example of uniformly accelerated motion.

       Consider a cart undergoing uniformly accelerated motion under a net force f_t. From the displacement and velocity formulas for uniform acceleration, we obtain the displacement and velocity change from time t-1 to time t:

$$x_t = x_{t-1} + \dot{x}_{t-1}\,\Delta t + \frac{1}{2}\,\frac{f_t}{m}\,\Delta t^2, \qquad \dot{x}_t = \dot{x}_{t-1} + \frac{f_t}{m}\,\Delta t$$

The state vector of this system contains the displacement and the velocity, denoted $x_t$ and its derivative $\dot{x}_t$. The control input u is the acceleration, so we have the following:

$$\mathbf{x}_t = \begin{bmatrix} x_t \\ \dot{x}_t \end{bmatrix}, \qquad u_t = \frac{f_t}{m}$$

So the state equation of this system is:

$$\begin{bmatrix} x_t \\ \dot{x}_t \end{bmatrix} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_{t-1} \\ \dot{x}_{t-1} \end{bmatrix} + \begin{bmatrix} \tfrac{\Delta t^2}{2} \\ \Delta t \end{bmatrix} u_t + w_{t-1}$$

Here the corresponding matrix A has size 2×2 and matrix B has size 2×1.
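A minimal NumPy sketch of this model might look like the following (my own illustration; the time step dt, mass m, and force f are made-up values):

```python
import numpy as np

dt = 0.1   # time step (made-up value for illustration)
m = 1.0    # cart mass
f = 2.0    # constant net force, so the acceleration is u = f / m

# State transition matrix A (2x2) and input matrix B (2x1) for the
# constant-acceleration model; the state is [displacement, velocity].
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.5 * dt ** 2],
              [dt]])

x = np.array([[0.0],    # initial displacement
              [0.0]])   # initial velocity
u = f / m

# One prediction step: x_t = A x_{t-1} + B u
x_next = A @ x + B * u
print(x_next)   # displacement 0.01, velocity 0.2 after one step
```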

      It may seem that with this model we can fully estimate the system state: the velocity can be computed, and so can the displacement. So why do we still need Kalman? The problem is that many real systems are so complex that they simply cannot be modeled. Moreover, even if you build a fairly accurate model, as soon as you make an error at some step, the recursion may keep amplifying that error by a factor of A (A being the state transition matrix), until the final estimate becomes completely unusable. Going back to the joke at the beginning: if the special forces soldier who relies entirely on prediction strays from the correct path at some step, then when he makes his next prediction while standing on the wrong path (which he believes to be correct), the path he takes will certainly be wrong too, and he will drift further and further off.

     That being so, we introduce feedback. From the viewpoint of the Bayesian model in probability theory, the predicted result above is the prior, and the measured result is the posterior.

     The measurement is, of course, a mapping of the system state variables, with an equation of the following form:

$$z_k = H\,x_k + v_k$$

Note that z is the measurement, of size m×1 (not n×1, and not 1×1; this will be explained later). H is the transformation matrix from the state variables to the measurement, of size m×n. The random variable v is the measurement noise.

     For the uniform-acceleration model, suppose the cart accelerates uniformly away from us while we stand at the origin and measure its distance from us with an ultrasonic instrument.

$$z_t = \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} x_t \\ \dot{x}_t \end{bmatrix} + v_t$$

That is, the measurement equals the displacement directly. You might ask: why not just use the measurement directly? Because the measurement noise is too large to use it directly for computation. Imagine a cart that is actually accelerating uniformly in one direction, but the displacement you measure jumps back and forth (due to noise); relying on the measurements alone, you would conclude the cart drives forward for a moment and then backward.

For the system noise w and the measurement noise v in the state equations, assume they follow the multivariate Gaussian distributions below and that w and v are mutually independent, where Q and R are the covariance matrices of the noise variables.

$$w \sim N(0,\,Q), \qquad v \sim N(0,\,R)$$

At this point a natural question arises: why must the noise model follow a Gaussian distribution? Please keep reading.

      For the uniformly accelerating cart model, assume the system noise exists only in the velocity component, with a constant variance of 0.01, and the system noise in the displacement component is 0. The measurement is displacement only, so its covariance matrix is 1×1, i.e., just the measurement noise variance itself.

Then:

$$Q = \begin{bmatrix} 0 & 0 \\ 0 & 0.01 \end{bmatrix}, \qquad R = \big[\,\mathrm{var}(v)\,\big]$$

In Q, the system-noise variance added to the velocity is 0.01, that on the displacement is 0, and the covariance between them is 0, i.e., the noise components are uncorrelated.
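Continuing the same sketch, the measurement matrix H and the noise covariances Q and R for this example could be set up as follows (my own illustration; the sensor variance r is a made-up value):

```python
import numpy as np

# Measurement matrix H (1x2): we observe the displacement only.
H = np.array([[1.0, 0.0]])

# Process noise covariance Q (2x2): noise only on the velocity component.
Q = np.array([[0.0, 0.0],
              [0.0, 0.01]])

# Measurement noise covariance R (1x1): variance of the ultrasonic sensor.
r = 1.0   # made-up sensor variance
R = np.array([[r]])

# Simulating one noisy measurement from a true state [displacement, velocity]:
rng = np.random.default_rng(1)
x_true = np.array([[5.0], [2.0]])
z = H @ x_true + rng.normal(0.0, np.sqrt(r), size=(1, 1))
print(z)   # noisy distance reading around 5.0
```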

     

      Now we have the theoretical prediction (prior) and the measurement (posterior). How do we combine the two to obtain the optimal estimate? The first idea that comes to mind is weighting, or call it feedback.

      We denote the predicted (prior) value by $\hat{x}_k^-$, the estimated value by $\hat{x}_k$, and the prediction of the measurement by $H\hat{x}_k^-$. In the derivation below, please keep the distinction between the estimate and the prediction in mind and do not conflate them. Following the general idea of feedback, we obtain the estimate:

$$\hat{x}_k = \hat{x}_k^- + K\,(z_k - H\hat{x}_k^-)$$

Here $(z_k - H\hat{x}_k^-)$ is called the residual, i.e., the gap between what was predicted and what you actually measured. If this term equals 0, the prediction and the measurement agree perfectly. This feedback-recursion form also reminds me of an iterative method for solving systems of linear equations in numerical analysis, the Gauss-Seidel iteration; take a look if you are interested.

      The key now is to determine K. This is where the minimum mean square error comes into play. Incidentally, this also answers why the noise must be Gaussian: in parameter estimation, one standard criterion is maximum likelihood estimation, whose core idea is that since these mutually independent samples in your hand did occur, the product of their probabilities should be maximized (they occurred because their probability is large). If the samples follow a Gaussian distribution, taking the logarithm ln of the product of their probabilities yields a function that is a constant plus the samples' minimum mean square error form. Therefore, the seemingly intuitive minimum mean square error criterion has its theoretical origin there. (Please google the detailed derivation; forgive me, explaining everything would make this article lose its focus.)
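To spell out that step (a standard argument, added here for completeness): if the independent errors $e_i = z_i - \hat{z}_i$ are zero-mean Gaussian with variance $\sigma^2$, the log-likelihood of the samples is

$$\ln \prod_i \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\Big(-\frac{e_i^2}{2\sigma^2}\Big) = \text{const} - \frac{1}{2\sigma^2}\sum_i e_i^2,$$

so maximizing the likelihood is the same as minimizing the sum of squared errors, which is exactly the minimum mean square error criterion.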

     First look at the covariance matrix of the error between the estimate and the true value. As a reminder, the diagonal elements of a covariance matrix are the variances; we compute this covariance matrix precisely so that we can use the sum of its diagonal elements to obtain the mean square error.

$$e_k = x_k - \hat{x}_k, \qquad P_k = E\big[e_k\,e_k^T\big]$$

Note here that $e_k$ is a vector composed of the errors of the individual state variables. In the uniform-acceleration model, $e_k$ consists of the displacement error and the velocity error, and the covariance matrix they form is written as:

$$P_k = \begin{bmatrix} E[S_{err}^2] & E[S_{err}V_{err}] \\ E[V_{err}S_{err}] & E[V_{err}^2] \end{bmatrix}$$

Here $S_{err}$ denotes the displacement error and $V_{err}$ the velocity error; the diagonal entries are their respective variances.

Substituting the estimate obtained earlier into this and simplifying yields:

$$P_k = E\Big[\big((x_k - \hat{x}_k^-) - K(z_k - H\hat{x}_k^-)\big)\big((x_k - \hat{x}_k^-) - K(z_k - H\hat{x}_k^-)\big)^T\Big] \qquad (1)$$

Similarly, the covariance matrix of the error between the prediction and the true value is:

$$P_k^- = E\big[(x_k - \hat{x}_k^-)(x_k - \hat{x}_k^-)^T\big]$$

Note that the system state variable x and the measurement noise are mutually independent. Expanding equation (1), with $z_k = Hx_k + v_k$ substituted in, gives:

$$P_k = E\Big[\big((I - KH)(x_k - \hat{x}_k^-) - Kv_k\big)\big((I - KH)(x_k - \hat{x}_k^-) - Kv_k\big)^T\Big]$$

Finally we obtain:

$$P_k = (I - KH)\,P_k^-\,(I - KH)^T + K R K^T$$

Expanding further:

$$P_k = P_k^- - KHP_k^- - P_k^- H^T K^T + K\,(HP_k^- H^T + R)\,K^T$$

      Now the minimum mean square error formally enters the stage. Recall from earlier that the diagonal elements of a covariance matrix are the variances. So we sum the diagonal elements of the matrix P; denote this operator by the letter T — its formal name is the trace of a matrix.

$$T[P_k] = T[P_k^-] - 2\,T[KHP_k^-] + T\big[K\,(HP_k^- H^T + R)\,K^T\big]$$

The minimum mean square error criterion means minimizing the expression above: differentiate with respect to the unknown K, set the derivative to zero, and solve for K.
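This differentiation uses two standard matrix-trace identities (with $S$ symmetric):

$$\frac{\partial}{\partial K}\,T[KB] = B^T, \qquad \frac{\partial}{\partial K}\,T[KSK^T] = 2KS.$$

Applying them with $B = HP_k^-$ and $S = HP_k^-H^T + R$ gives: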

$$\frac{dT[P_k]}{dK} = -2\,(HP_k^-)^T + 2\,K\,(HP_k^- H^T + R) = 0$$

$$K = P_k^- H^T\,(HP_k^- H^T + R)^{-1}$$

Note that in this formula for K, the transformation matrix H is constant and the measurement noise covariance R is also constant, so the magnitude of K depends on the error covariance of the prediction, $P_k^-$. To build intuition, further assume that all matrices in the formula are 1×1 and that H = 1 (nonzero). Then K can be written as:

$$K = \frac{P_k^-}{P_k^- + R}$$

Therefore, the larger $P_k^-$ is, the larger K becomes, and the estimate gives more weight to the measurement feedback. If $P_k^-$ equals 0, i.e., the prediction equals the true value, then K = 0 and the estimate equals the prediction (the prior).
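For instance, plugging in made-up scalar numbers: with $P_k^- = 1$ and $R = 0.1$ the gain is $K = 1/1.1 \approx 0.91$, so the estimate leans heavily on the measurement; with $P_k^- = 0.01$ and $R = 0.1$ the gain is $K = 0.01/0.11 \approx 0.09$, so the estimate stays close to the prediction.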

Substituting the computed K back into the expression for $P_k$ simplifies the estimation error covariance $P_k$:

$$P_k = (I - KH)\,P_k^-$$

Thus K can be computed at every step of the recursion, and the estimation error covariance of every step can also be computed. But there is still one quantity in the formula for K that we have not yet calculated: $P_k^-$, the covariance matrix of the error between the prediction and the true value. It is computed recursively as follows.

 Note first that the recursive form of the predicted value is:     

$$\hat{x}_k^- = A\hat{x}_{k-1} + Bu_k$$

$$e_k^- = x_k - \hat{x}_k^- = (A x_{k-1} + B u_k + w_{k-1}) - (A\hat{x}_{k-1} + B u_k) = A e_{k-1} + w_{k-1}$$

$$P_k^- = E\big[e_k^- (e_k^-)^T\big] = E\big[(A e_{k-1} + w_{k-1})(A e_{k-1} + w_{k-1})^T\big]$$

Since the system state variables and noise are independent, it can be written as:

$$P_k^- = A\,E\big[e_{k-1} e_{k-1}^T\big]\,A^T + E\big[w_{k-1} w_{k-1}^T\big]$$

$$P_k^- = A P_{k-1} A^T + Q$$

This gives the recursive formula for $P_k^-$. Therefore we only need to set the initial $\hat{x}_0$ and $P_0$, and the recursion can run step after step.

Here is a summary of the recursive process and the idea:

The first step is to compute the prediction and the covariance matrix of the error between the prediction and the true value:

$$\hat{x}_k^- = A\hat{x}_{k-1} + Bu_k, \qquad P_k^- = A P_{k-1} A^T + Q$$

With these two, the Kalman gain K can be calculated, and the estimate can then be obtained:

$$K = P_k^- H^T\,(HP_k^- H^T + R)^{-1}, \qquad \hat{x}_k = \hat{x}_k^- + K\,(z_k - H\hat{x}_k^-)$$

Finally, the error covariance matrix between the estimated value and the true value is calculated to prepare for the next recursion.

$$P_k = (I - KH)\,P_k^-$$
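Putting the recursion together, here is a compact NumPy sketch of the whole filter applied to the accelerating-cart example above (my own illustration under the earlier assumptions; dt, the input acceleration, the sensor variance, and the initial P_0 are made-up values):

```python
import numpy as np

rng = np.random.default_rng(42)

# --- model: constant-acceleration cart, measuring displacement only ---
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt ** 2], [dt]])
H = np.array([[1.0, 0.0]])
Q = np.array([[0.0, 0.0], [0.0, 0.01]])   # process noise on velocity only
R = np.array([[1.0]])                      # measurement noise variance (made up)
u = 2.0                                    # constant acceleration input

# --- simulate the true trajectory and noisy distance measurements ---
steps = 100
x_true = np.array([[0.0], [0.0]])
truth, meas = [], []
for _ in range(steps):
    x_true = A @ x_true + B * u
    truth.append(x_true.copy())
    meas.append(H @ x_true + rng.normal(0.0, np.sqrt(R[0, 0]), size=(1, 1)))

# --- Kalman filter recursion ---
x_est = np.array([[0.0], [0.0]])   # initial estimate x_0
P = np.eye(2)                      # initial error covariance P_0 (made up)
estimates = []
for z in meas:
    # 1) predict: x_k^- = A x_{k-1} + B u,  P_k^- = A P A^T + Q
    x_pred = A @ x_est + B * u
    P_pred = A @ P @ A.T + Q
    # 2) update: K = P_k^- H^T (H P_k^- H^T + R)^{-1}, then correct with the residual
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    x_est = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(2) - K @ H) @ P_pred
    estimates.append(x_est.copy())

# Compare the final filtered displacement with the truth and the raw measurement.
print("true displacement:     ", truth[-1][0, 0])
print("measured displacement: ", meas[-1][0, 0])
print("estimated displacement:", estimates[-1][0, 0])
```

Because the raw distance readings are noisy while the model itself is fairly accurate here, the filtered displacement should track the true trajectory much more smoothly than the measurements alone.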

This concludes the theoretical derivation of the Kalman filter. Some details, such as what happens to the results when the state equation is set up incorrectly in practice, together with some conclusions, are left to the second part.


(When reprinting, please credit the author and source: http://blog.csdn.net/heyijia0327. Do not use for commercial purposes without permission.)
