Methods for Solving Nonlinear Least Squares Problems

 Personal blog: http://www.chenjianqu.com/

Original link: http://www.chenjianqu.com/show-85.html

Least squares method

    The least squares method is a mathematical optimization technique. It finds the best-fitting function for a set of data by minimizing the sum of squared errors. With least squares, unknown parameters can be estimated easily so that the sum of squared differences between the estimated data and the observed data is minimized. Least squares can also be used for curve fitting. Mathematically:

    min_x Σ_i ( y_i − f_i(x) )²

    Here y_i is the i-th observed value (the true value, or target), and f_i(x) is the i-th value predicted from the parameters x; it is an estimate of y_i. The goal of least squares is to estimate the unknown parameters x so as to minimize the difference between the observed values and the estimated values.

Linear Least Squares

    "Linear" means that the fitting function is linear in the unknown parameters, e.g. f(x) = t0 + t1·x1 + ... + tq·xq. Linear least-squares problems are relatively simple to solve.
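    For example, here is a minimal sketch in C++ with Eigen (the same library used by the curve-fitting code at the end of this post); the data points are made up for illustration. It fits y = t0 + t1·x by solving min ||A·t − b||² with a QR decomposition:

#include <Eigen/Dense>
#include <iostream>

int main() {
  // Fit y = t0 + t1 * x to four (hypothetical) sample points.
  // Each row of A is [1, x_i]; b holds the observed y_i.
  Eigen::MatrixXd A(4, 2);
  Eigen::VectorXd b(4);
  A << 1, 0,
       1, 1,
       1, 2,
       1, 3;
  b << 0.9, 2.1, 2.9, 4.2;

  // Closed-form solution of min ||A*t - b||^2 via QR decomposition.
  Eigen::VectorXd t = A.colPivHouseholderQr().solve(b);
  std::cout << "t0, t1 = " << t.transpose() << std::endl;
  return 0;
}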

Nonlinear least squares

    "Nonlinear" means that f(x) cannot be written as a linear function of the parameters; the relationship is nonlinear. Consider a simple nonlinear least squares problem:

    min_x F(x) = (1/2) ||f(x)||²

    Here f(x) is the fitting function, which can be any scalar nonlinear function, and F(x) is the objective function. We want to find an x (the parameter to be optimized) that minimizes F(x). There are several ways to solve this problem:

    1. Direct differentiation: set dF/dx = 0 and solve for x, but this is often difficult to do.

    2. Iterative methods:

    1. Start from an initial value x0.
    2. For the k-th iteration, find an increment ∆xk such that ||f(xk + ∆xk)||² decreases.
    3. If ∆xk is small enough, stop.
    4. Otherwise, set x_{k+1} = xk + ∆xk and return to step 2.

    This turns the problem of finding a zero of the objective function's derivative into the problem of repeatedly finding a descent increment ∆xk. The following sections describe how to find this increment.
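    A minimal sketch of this iterative framework, assuming Eigen and a hypothetical find_increment callback that stands in for the concrete methods discussed in the rest of this post:

#include <Eigen/Dense>
#include <functional>

// Generic descent loop: repeatedly ask find_increment for a step dx,
// stop when the step becomes tiny, otherwise apply x_{k+1} = x_k + dx.
Eigen::VectorXd iterate(
    Eigen::VectorXd x,
    const std::function<Eigen::VectorXd(const Eigen::VectorXd &)> &find_increment,
    int max_iters = 100, double eps = 1e-8) {
  for (int k = 0; k < max_iters; ++k) {
    Eigen::VectorXd dx = find_increment(x);  // find the increment ∆xk
    if (dx.norm() < eps) break;              // small enough: stop
    x += dx;                                 // otherwise update and continue
  }
  return x;
}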

 

First- and second-order gradient methods

    Consider the k-th iteration, where we look for ∆xk. The most direct way is to take the Taylor expansion of the objective function F(x) around xk:

    F(xk + ∆xk) ≈ F(xk) + J(xk)^T ∆xk + (1/2) ∆xk^T H(xk) ∆xk

    Here J(xk) is the first derivative of F(x) with respect to x (the gradient, or Jacobian matrix), and H(xk) is the second derivative (the Hessian matrix).

    If we keep only the first-order term, the method is called the first-order gradient method or steepest descent method: take ∆x = -J(xk), i.e. the increment points in the direction opposite to the gradient, usually scaled by a step size λ.

    If we also keep the second-order term, the increment equation becomes:

    ∆x* = argmin_∆x [ F(x) + J(x)^T ∆x + (1/2) ∆x^T H ∆x ]

    Taking the derivative of the right-hand side with respect to ∆x and setting it to zero gives J + H∆x = 0, i.e. H∆x = -J. Solving this linear equation yields the increment; the method is called the second-order gradient method, or Newton's method.

    The first-order gradient method is too greedy and tends to follow a zigzag path, which increases the number of iterations; the second-order gradient method requires computing the Hessian H of the objective function, which is very difficult for large-scale problems, so we usually try to avoid computing H.
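    As a rough illustration of the two kinds of increments, here is a sketch on a made-up toy objective F(x) = (x0 − 1)² + 10(x1 − 2)², chosen only because its gradient and Hessian are easy to write by hand (it is not part of the derivation above):

#include <Eigen/Dense>
#include <iostream>

// Toy objective F(x) = (x0 - 1)^2 + 10 * (x1 - 2)^2.
Eigen::Vector2d gradient(const Eigen::Vector2d &x) {
  return Eigen::Vector2d(2.0 * (x[0] - 1.0), 20.0 * (x[1] - 2.0));
}

Eigen::Matrix2d hessian() {
  Eigen::Matrix2d H;
  H << 2.0, 0.0,
       0.0, 20.0;
  return H;
}

int main() {
  Eigen::Vector2d x(0.0, 0.0);     // current estimate x_k

  // First-order (steepest descent) increment: ∆x = -λ * J(x_k)
  double lambda = 0.05;
  Eigen::Vector2d dx_first = -lambda * gradient(x);

  // Second-order (Newton) increment: solve H * ∆x = -J(x_k)
  Eigen::Vector2d dx_newton = hessian().ldlt().solve(-gradient(x));

  std::cout << "steepest descent step: " << dx_first.transpose() << std::endl;
  std::cout << "Newton step:           " << dx_newton.transpose() << std::endl;
  return 0;
}

    For this quadratic toy objective the Newton step jumps straight to the minimum (1, 2) in a single iteration, while steepest descent needs many small steps.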

 

Gauss-Newton method

    Take the first-order Taylor expansion of the fitting function f(x) (not the objective function F(x), otherwise this would just be Newton's method again):

    f(x + ∆x) ≈ f(x) + J(x)^T ∆x

    Here J(x) is the derivative of f(x) with respect to x, an n×1 column vector. Our goal is to find the increment ∆x that minimizes ||f(x + ∆x)||². To find ∆x we need to solve a linear least squares problem:

    ∆x* = argmin_∆x (1/2) || f(x) + J(x)^T ∆x ||²

    Take the derivative of this objective with respect to ∆x and set it to zero. To do so, first expand the squared term:

    (1/2) || f(x) + J(x)^T ∆x ||² = (1/2) [ ||f(x)||² + 2 f(x) J(x)^T ∆x + ∆x^T J(x) J(x)^T ∆x ]

    Taking the derivative with respect to ∆x and setting it to zero gives J(x) f(x) + J(x) J(x)^T ∆x = 0, namely:

    J(x) J(x)^T ∆x = -J(x) f(x)

    This is a linear equation in the variable ∆x, called the increment equation, the Gauss-Newton equation, or the normal equation. Letting H = J(x) J(x)^T and g = -J(x) f(x), the Gauss-Newton equation becomes:

    H∆x=g

    Compared with Newton's method, where H∆x = -J, the Gauss-Newton method uses J(x) J(x)^T as an approximation of the Hessian matrix, thereby avoiding the computation of H. Solving the Gauss-Newton equation is the core of the whole optimization problem; if we can solve it, the steps of the Gauss-Newton method are as follows:

    1. Start from an initial value x0.
    2. For the k-th iteration, compute the current Jacobian J(xk) and the error f(xk).
    3. Solve the increment equation H ∆xk = g.
    4. If ∆xk is small enough, stop; otherwise set x_{k+1} = xk + ∆xk and return to step 2.

    Solving the increment equation requires H⁻¹, i.e. H must be invertible. In practice, however, J J^T is only positive semi-definite: H may be singular or ill-conditioned, in which case the increment is unstable and the algorithm may fail to converge. Even if H is neither singular nor ill-conditioned, if the computed step ∆x is too large, convergence of the iteration still cannot be guaranteed.
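    As a minimal sketch of that core step, assuming the residuals and their 3×1 Jacobians have already been evaluated at the current estimate (the per-residual accumulation mirrors the curve-fitting code at the end of this post):

#include <Eigen/Dense>
#include <vector>

// One Gauss-Newton increment for a problem with scalar residuals f_i(x)
// and 3 parameters. Each J_i is a 3x1 column vector, matching the
// H = sum J_i J_i^T, g = -sum J_i f_i convention used above.
Eigen::Vector3d gauss_newton_step(const std::vector<double> &residuals,
                                  const std::vector<Eigen::Vector3d> &jacobians) {
  Eigen::Matrix3d H = Eigen::Matrix3d::Zero();
  Eigen::Vector3d g = Eigen::Vector3d::Zero();
  for (size_t i = 0; i < residuals.size(); ++i) {
    H += jacobians[i] * jacobians[i].transpose();
    g += -residuals[i] * jacobians[i];
  }
  // Solve the Gauss-Newton equation H * ∆x = g. ldlt() only needs H to be
  // positive semi-definite, but a singular or ill-conditioned H can still
  // produce a useless increment, which is exactly the weakness noted above.
  return H.ldlt().solve(g);
}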

 

Levenberg-Marquardt method

    This method generally converges more slowly than the Gauss-Newton method; it is also called the damped Newton method. The second-order Taylor approximation used in the Gauss-Newton method is only a good approximation near the expansion point, so it is natural to restrict ∆x to a region, called the trust region (Trust Region). This region defines where the second-order approximation is valid; such methods are therefore also called trust region methods. Inside the trust region, we believe the approximation is valid; outside it, the approximation may break down.

    So how do we determine the size of the trust region? A good approach is to base it on the difference between the approximate model and the actual function: if the difference is small, the approximation works well and we enlarge the region; conversely, if the difference is large, we shrink it. We define a ratio ρ (formula (6.34) in [1]) to measure how good the approximation is:

    ρ = ( f(x + ∆x) − f(x) ) / ( J(x)^T ∆x )

    The numerator of ρ is the actual decrease of the function, and the denominator is the decrease predicted by the approximate model. If ρ is close to 1, the approximation is good. If ρ is too small, the actual decrease is much less than the predicted decrease, the approximation is considered poor, and the region should be shrunk. Conversely, if ρ is relatively large, the actual decrease is larger than predicted, and we can enlarge the region. The improved algorithm steps are as follows:

    1. Start from an initial value x0 and an initial trust-region radius μ.
    2. For the k-th iteration, solve the constrained problem (6.35): min over ∆xk of (1/2) || f(xk) + J(xk)^T ∆xk ||², subject to ||D ∆xk||² ≤ μ, where μ is the trust-region radius and D is a coefficient matrix (discussed below).
    3. Compute ρ.
    4. If ρ > 3/4, set μ = 2μ; if ρ < 1/4, set μ = 0.5μ.
    5. If ρ is greater than some threshold, the approximation is considered acceptable: set x_{k+1} = xk + ∆xk.
    6. Check for convergence; if not converged, return to step 2, otherwise stop.

    Here the factors by which the region is enlarged or shrunk, and the acceptance threshold, are empirical values. In formula (6.35), the increment is constrained to a sphere of radius μ, and the increment is considered valid only inside this sphere. With the matrix D, the sphere becomes an ellipsoid. In the optimization method proposed by Levenberg, D is taken to be the identity matrix I, which directly constrains ∆xk to a ball. Marquardt proposed taking D to be a non-negative diagonal matrix, in practice usually the square root of the diagonal elements of J^T J, so that the constraint range is larger along dimensions where the gradient is small.

    To obtain the increment we need to solve formula (6.35), which is a sub-optimization problem with an inequality constraint. Using a Lagrange multiplier, we bring the constraint into the objective function and form the Lagrangian:

    L(∆xk, λ) = (1/2) || f(xk) + J(xk)^T ∆xk ||² + (λ/2) ( ||D ∆xk||² − μ )

    Here λ is the Lagrange multiplier. As in the Gauss-Newton method, setting the derivative of this Lagrangian with respect to ∆x to zero, the core is still a linear equation for the increment: (H + λD^T D) ∆xk = g. Compared with the Gauss-Newton increment equation there is an extra term λD^T D; if D = I, the equation to solve is:

     (H + λI) ∆xk = g

    When the parameter λ is small, H dominates, meaning the quadratic approximation is good in this region, and the Levenberg-Marquardt method behaves more like the Gauss-Newton method. When λ is large, λI dominates, and the Levenberg-Marquardt method behaves more like the first-order gradient method (steepest descent), indicating that the quadratic approximation is not good enough nearby.

    The Levenberg-Marquardt way of solving for the increment can, to some extent, avoid the problems of singular or ill-conditioned coefficient matrices in the linear system, and provides a more stable and more accurate increment ∆x. In practice there are many other ways to compute the increment, such as the Dog-Leg method.
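    Below is a minimal sketch of a Levenberg-Marquardt loop with D = I. The damping schedule (halving or quadrupling λ) and the acceptance test are simplified empirical choices rather than the exact algorithm from [1]; build_normal_eq and cost are hypothetical callbacks supplied by the caller.

#include <Eigen/Dense>
#include <functional>

// Damped Gauss-Newton (Levenberg-Marquardt with D = I): solve
// (H + λI) ∆x = g, then accept or reject the step based on the ratio ρ
// between the actual cost decrease and the decrease predicted by the model.
Eigen::VectorXd levenberg_marquardt(
    Eigen::VectorXd x,
    const std::function<void(const Eigen::VectorXd &, Eigen::MatrixXd &,
                             Eigen::VectorXd &)> &build_normal_eq,
    const std::function<double(const Eigen::VectorXd &)> &cost,
    int max_iters = 100) {
  const int n = x.size();
  double lambda = 1e-3;                       // initial damping (empirical)
  for (int k = 0; k < max_iters; ++k) {
    Eigen::MatrixXd H(n, n);
    Eigen::VectorXd g(n);
    build_normal_eq(x, H, g);                 // fill H = Σ Ji Ji^T, g = -Σ Ji fi

    // Solve the damped increment equation (H + λI) ∆x = g.
    Eigen::VectorXd dx =
        (H + lambda * Eigen::MatrixXd::Identity(n, n)).ldlt().solve(g);
    if (dx.norm() < 1e-10) break;

    // ρ = actual decrease / decrease predicted by the quadratic model.
    double actual = cost(x) - cost(x + dx);
    double predicted = 0.5 * dx.dot(lambda * dx + g);
    double rho = actual / predicted;

    if (rho > 0) {                            // good step: accept, relax damping
      x += dx;
      lambda *= 0.5;
    } else {                                  // poor step: reject, increase damping
      lambda *= 4.0;
    }
  }
  return x;
}

    The predicted decrease 0.5·∆x^T(λ∆x + g) comes from substituting the damped increment equation into the quadratic model.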

 

Curve fitting

    The curve to fit is y = exp(a·x² + b·x + c) + w, where a, b, c are the curve parameters and w is Gaussian noise with zero mean and standard deviation σ. Given N observation points, we use the Gauss-Newton method to solve the following least squares problem to estimate the curve parameters:

    min_{a,b,c} (1/2) Σ_{i=1}^{N} ( y_i − exp(a·xi² + b·xi + c) )²

    Define the error as e_i = y_i − exp(a·xi² + b·xi + c). The state variables are a, b, c. The derivative of each error term with respect to the state variables is:

    ∂e_i/∂a = −xi² · exp(a·xi² + b·xi + c)
    ∂e_i/∂b = −xi · exp(a·xi² + b·xi + c)
    ∂e_i/∂c = −exp(a·xi² + b·xi + c)

    So the Jacobian is:

    J_i = [ ∂e_i/∂a, ∂e_i/∂b, ∂e_i/∂c ]^T

    and the Gauss-Newton equation is:

    ( Σ_{i=1}^{N} J_i σ⁻² J_i^T ) ∆x = Σ_{i=1}^{N} ( −J_i σ⁻² e_i ),  i.e.  H ∆x = g

    The code is as follows:

CMakeLists.txt

cmake_minimum_required(VERSION 2.6)
project(gaussnewtontest)
# enable C++11 support
set( CMAKE_CXX_FLAGS "-std=c++11" )
include_directories("/usr/include/eigen3")
find_package( OpenCV REQUIRED )
include_directories( ${OpenCV_INCLUDE_DIRS} )
add_executable(gaussnewtontest main.cpp)
target_link_libraries(gaussnewtontest ${OpenCV_LIBS} )
install(TARGETS gaussnewtontest RUNTIME DESTINATION bin)

main.cpp

#include <iostream>
#include <cmath>
#include <opencv2/opencv.hpp>
#include <Eigen/Core>
#include <Eigen/Dense>

using namespace std;
using namespace Eigen;

int main(int argc, char **argv) {
  double ar = 1.0, br = 2.0, cr = 1.0;         // ground-truth parameter values
  double ae = 2.0, be = -1.0, ce = 5.0;        // initial estimates of the parameters
  int N = 100;                                 // number of data points
  double w_sigma = 1.0;                        // sigma of the Gaussian noise
  double inv_sigma = 1.0 / w_sigma;
  cv::RNG rng;                                 // OpenCV random number generator
  vector<double> x_data, y_data;               // data
  for (int i = 0; i < N; i++) {
    double x = i / 100.0;
    x_data.push_back(x);
    y_data.push_back(exp(ar * x * x + br * x + cr) + rng.gaussian(w_sigma));  // rng.gaussian takes the standard deviation
  }
  
  // start Gauss-Newton iterations
  int iterations = 100;           // number of iterations
  double cost = 0, lastCost = 0;  // cost of the current and the previous iteration
  
  for (int iter = 0; iter < iterations; iter++) {
    Matrix3d H = Matrix3d::Zero();             // Hessian = J^T W^{-1} J in Gauss-Newton
    Vector3d b = Vector3d::Zero();             // bias
    cost = 0;
    for (int i = 0; i < N; i++) {
      double xi = x_data[i], yi = y_data[i];  // the i-th data point
      double error = yi - exp(ae * xi * xi + be * xi + ce);
      Vector3d J; // Jacobian of the error term (as a column vector)
      J[0] = -xi * xi * exp(ae * xi * xi + be * xi + ce);  // de/da
      J[1] = -xi * exp(ae * xi * xi + be * xi + ce);  // de/db
      J[2] = -exp(ae * xi * xi + be * xi + ce);  // de/dc
      H += inv_sigma * inv_sigma * J * J.transpose();
      b += -inv_sigma * inv_sigma * error * J;
      cost += error * error;
    }
    
    // solve the linear system H * dx = b
    Vector3d dx = H.ldlt().solve(b);
    if (std::isnan(dx[0])) {
      cout << "result is nan!" << endl;
      break;
    }
    
    if (iter > 0 && cost >= lastCost) {
      cout << "cost: " << cost << ">= last cost: " << lastCost << ", break." << endl;
      break;
    }
    
    ae += dx[0];
    be += dx[1];
    ce += dx[2];
    lastCost = cost;
    cout << "total cost: " << cost << ", \t\tupdate: " << dx.transpose() <<
         "\t\testimated params: " << ae << "," << be << "," << ce << endl;
  }
  
  
  cout << "estimated abc = " << ae << ", " << be << ", " << ce << endl;
  return 0;
}

 

 

References

[1] Gao Xiang et al. 14 Lectures on Visual SLAM (视觉SLAM十四讲).

[2] Linear least squares and nonlinear least squares. https://www.jianshu.com/p/bf6ec56e26bd

 
