Newton's method, Gauss-Newton method, and Levenberg-Marquardt (LM) method (with code) [nonlinear least-squares solving]




Personal notes:
These notes cover Newton's method, the Gauss-Newton method, the Levenberg-Marquardt (LM) algorithm, and related material. With implementation as the goal, the text mainly gives the derivation results, code, and some practical applications; the articles and materials used for the derivations are listed at the end.

One: Newton's method

1 Overview

Newton's method is a function-approximation method. Its basic idea: near the current iterate $x^{(k)}$, approximate the objective function $f(x)$ by its second-order Taylor polynomial, and take the direction from $x^{(k)}$ toward the minimizer of that quadratic approximation as the search direction $p^{(k)}$.
Consider the unconstrained problem: $\min f(x),\ x \in \mathbb{R}^n$,
where $f(x)$ has continuous second-order partial derivatives at $x^{(k)}$ and the Hessian matrix $\nabla^2 f(x^{(k)})$ is positive definite.
Let $x^{(k)}$ be the $k$-th estimate of the minimizer of $f(x)$, and expand $f(x)$ in a second-order Taylor series about $x^{(k)}$:
$$f(x) = f(x^{(k)}) + \nabla f(x^{(k)})^T (x - x^{(k)}) + \frac{1}{2}(x - x^{(k)})^T \nabla^2 f(x^{(k)}) (x - x^{(k)}) + o\!\left(\|x - x^{(k)}\|^2\right)$$
The first three terms are the main part; the last term is a higher-order infinitesimal.

2: Newton's direction and Newton's method

The main part of the above expansion is the quadratic model

$$Q(x) = f(x^{(k)}) + \nabla f(x^{(k)})^T (x - x^{(k)}) + \frac{1}{2}(x - x^{(k)})^T \nabla^2 f(x^{(k)}) (x - x^{(k)})$$

Near $x^{(k)}$, $Q(x)$ approximates $f(x)$, i.e. $Q(x) \approx f(x)$.
So the minimizer of $Q(x)$ can be used to approximate the minimizer of $f(x)$; we look for the stationary point of $Q(x)$:

$$\nabla Q(x) = \nabla f(x^{(k)}) + \nabla^2 f(x^{(k)})\,(x - x^{(k)}) = 0$$

Setting the gradient $\nabla Q(x) = 0$ yields the stationary point $x^{(k+1)}$ of $Q(x)$, where $p^{(k)}$ is the Newton direction and the step size $\lambda_k$ is 1:

$$x^{(k+1)} = x^{(k)} - \left[\nabla^2 f(x^{(k)})\right]^{-1} \nabla f(x^{(k)}) = x^{(k)} + \lambda_k\, p^{(k)}$$

Equivalently, writing $H = \nabla^2 f(x^{(k)})$ for the Hessian matrix:

$$p^{(k)} = -H^{-1} \nabla f(x^{(k)}), \qquad x^{(k+1)} = x^{(k)} + p^{(k)}$$

3: Basic steps of Newton's method

1: Choose the initial data: the initial point $x^{(0)}$ and the termination tolerance $\varepsilon > 0$; set $k := 0$.
2: Compute the gradient vector $\nabla f(x^{(k)})$ and its norm $\|\nabla f(x^{(k)})\|$:

If $\|\nabla f(x^{(k)})\| < \varepsilon$, stop the iteration and output $x^{(k)}$; otherwise go to the next step.

3: Construct the Newton direction:

$$p^{(k)} = -\left[\nabla^2 f(x^{(k)})\right]^{-1} \nabla f(x^{(k)})$$

4: Iterate:

Set $x^{(k+1)} = x^{(k)} + p^{(k)}$ as the next iterate, let $k := k + 1$, and return to step 2.

4: Example

Use Newton's method to minimize the function $f(x) = x_1^2 + 25x_2^2$, with initial point $x^{(0)} = (2, 2)^T$ and $\varepsilon = 10^{-6}$.
Solution:
(1) Find the gradient and Hessian matrix:
$$\nabla f(x) = \begin{bmatrix} 2x_1 \\ 50x_2 \end{bmatrix}, \qquad \nabla^2 f(x) = \begin{bmatrix} 2 & 0 \\ 0 & 50 \end{bmatrix}$$

(2) Determine Newton's direction:
$$p^{(0)} = -\left[\nabla^2 f(x^{(0)})\right]^{-1} \nabla f(x^{(0)}) = -\begin{bmatrix} 2 & 0 \\ 0 & 50 \end{bmatrix}^{-1} \begin{bmatrix} 4 \\ 100 \end{bmatrix} = \begin{bmatrix} -2 \\ -2 \end{bmatrix}, \qquad x^{(1)} = x^{(0)} + p^{(0)} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

That is, $x_1 = 0,\ x_2 = 0$. Since $\nabla f(x^{(1)}) = 0$, the iteration stops after a single step (Newton's method minimizes a quadratic exactly in one iteration).

A 3D surface plot confirms that the function has its minimum at $x_1 = 0,\ x_2 = 0$:
[figure: surface plot of $f(x) = x_1^2 + 25x_2^2$]
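The post gives C++ code only for Gauss-Newton and LM; purely as an illustration of the steps above, here is a minimal, self-contained C++/Eigen sketch of Newton's method applied to this example. The gradient and Hessian are hard-coded for $f(x) = x_1^2 + 25x_2^2$, and the structure and names are my own, not from the original post.

    #include <Eigen/Dense>
    #include <iostream>

    // Minimal Newton iteration for f(x) = x1^2 + 25*x2^2 (illustrative sketch only).
    int main()
    {
        Eigen::Vector2d x(2.0, 2.0);          // initial point x(0)
        const double epsilon = 1e-6;          // termination tolerance on ||∇f||
        const int maxIter = 100;

        for (int k = 0; k < maxIter; ++k)
        {
            // gradient ∇f(x) = (2*x1, 50*x2)^T
            Eigen::Vector2d grad(2.0 * x(0), 50.0 * x(1));
            if (grad.norm() < epsilon)
                break;                        // step 2: ||∇f|| < ε, stop

            // Hessian H = diag(2, 50) (constant for this quadratic)
            Eigen::Matrix2d H;
            H << 2.0, 0.0,
                 0.0, 50.0;

            // step 3: Newton direction p = -H^{-1} ∇f (solve instead of an explicit inverse)
            Eigen::Vector2d p = H.ldlt().solve(-grad);

            // step 4: x(k+1) = x(k) + p
            x += p;
        }
        std::cout << "minimizer: " << x.transpose() << std::endl;   // expect (0, 0)
        return 0;
    }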
A drawback of Newton's method is apparent here: computing (and inverting) the Hessian matrix is expensive. The Gauss-Newton method introduced below builds on Newton's method but replaces the Hessian with an approximation built from the Jacobian matrix.

Two: Gauss-Newton method (Gauss-Newton algorithm)

1 Overview

The Gauss-Newton algorithm is used to solve nonlinear least-squares problems, i.e. to minimize a sum of squared function values. It is an extension of Newton's method for finding the minimum of a nonlinear function. Since a sum of squares is non-negative, the algorithm can be viewed as using Newton's method to iteratively drive the sum toward zero, thereby minimizing it. Its advantage is that it does not require the second derivatives (the Hessian matrix), which can be challenging to compute.

Note: The Gauss-Newton method is aimed at nonlinear least-squares problems. Reference: Detailed Least Squares.

2: Gauss-Newton method derivation

①: Problem setup: the independent variable $x$ and the dependent variable $y$ are observed at $m$ points:
$$X = [x_1, x_2, ..., x_m]^T, \qquad Y = [y_1, y_2, ..., y_m]^T$$
②: Model function: $Y = f(X; \beta_1, \beta_2, ..., \beta_n)$, abbreviated $f(x; \beta)$,

where $x$ is the independent variable, $y$ is the dependent variable, and $\beta$ is the parameter vector to be optimized; $x$ and $y$ are known.

③: Objective function: $\min S = \displaystyle\sum_{i=1}^{m}\left(f(x_i;\beta)-y_i\right)^2$
④: Prediction residual at the $i$-th observation point: $r_i = f(x_i;\beta)-y_i$; the residuals form the vector $R = [r_1, r_2, ..., r_m]^T$
⑤: The objective function in ③ can therefore be written as $\min S = \displaystyle\sum_{i=1}^{m}r_i^2 = R^TR$
⑥: Gradient of the objective function: $\nabla S(\beta) = \left[\frac{\partial S}{\partial \beta_1},\frac{\partial S}{\partial \beta_2},...,\frac{\partial S}{\partial \beta_n}\right]^T$, where the partial derivative with respect to each parameter $\beta_j$ is $\frac{\partial S}{\partial \beta_j} = 2 \displaystyle\sum_{i=1}^{m}r_i\frac{\partial r_i}{\partial \beta_j}$

⑦: The partial derivatives of $R = [r_1, r_2, ..., r_m]^T$ with respect to $\beta$ can be collected into the Jacobian matrix

$$J(R(\beta)) = \begin{bmatrix} \frac{\partial r_1}{\partial \beta_1}&...&\frac{\partial r_1}{\partial \beta_n}\\ \vdots & \ddots & \vdots \\ \frac{\partial r_m}{\partial \beta_1}&...&\frac{\partial r_m}{\partial \beta_n}\end{bmatrix}$$

⑧: In matrix form, the gradient (first-order partial derivatives) of the objective function is
$$\nabla S(\beta) = \left[\frac{\partial S}{\partial \beta_1},...,\frac{\partial S}{\partial \beta_n}\right]^T, \qquad \frac{\partial S}{\partial \beta_j} = 2 \sum_{i=1}^{m}r_i\frac{\partial r_i}{\partial \beta_j} \;\Rightarrow\; \nabla S = 2J^TR$$
⑨: Hessian matrix of the objective function (second-order partial derivatives):
starting from the gradient component $\frac{\partial S}{\partial \beta_j} = 2 \sum_{i=1}^{m}r_i\frac{\partial r_i}{\partial \beta_j}$, the Hessian entry is $\frac{\partial^2 S}{\partial \beta_k\partial \beta_j} = 2 \frac{\partial}{\partial \beta_k}\left(\sum_{i=1}^{m}r_i\frac{\partial r_i}{\partial \beta_j}\right)$.
Applying the product rule gives $\frac{\partial^2 S}{\partial \beta_k\partial \beta_j} = 2 \sum_{i=1}^{m}\left(\frac{\partial r_i}{\partial \beta_k}\frac{\partial r_i}{\partial \beta_j} + r_i\frac{\partial^2 r_i}{\partial \beta_k\partial \beta_j}\right)$,
where the matrix $O$ has entries $O_{kj} = \displaystyle\sum_{i=1}^{m}r_i\frac{\partial^2 r_i}{\partial \beta_k\partial \beta_j}$.
Hessian matrix: $H = 2(J^TJ + O)$
⑩: Written in the form of Newton's method: if the model fits well, the residuals $r_i$ inside $O$ are close to 0, so $O$ is dropped to simplify the computation, giving $H \approx 2J^TJ$ and the Gauss-Newton update

$$\beta^{(k+1)} = \beta^{(k)} - (J^TJ)^{-1}J^TR$$

3: Gauss-Newton method algorithm flow

1: Choose an initial parameter vector $\beta_0$ (default: a vector of all ones). Set $\varepsilon = 10^{-10}$.
2: At the $k$-th iteration, compute the current Jacobian matrix $J(\beta_k)$ and the residual vector $R = f(\beta_k)$.
3: Solve the incremental equation $H \Delta\beta_k = -g$:
$\Rightarrow \Delta\beta_k = -H^{-1}g$
$\Rightarrow \Delta\beta_k \approx -(J^TJ)^{-1}J^TR$. Here $J^TJ$ approximates the Hessian matrix $H$, and $g = J^TR$ is the gradient.
4: If $\|\Delta\beta_k\| < \varepsilon$, stop. Otherwise set $\beta_{k+1} = \beta_k + \Delta\beta_k$ and return to step 2.

4: Gauss-Newton method C++ code

    /* Gauss-Newton algorithm (GNA): solves nonlinear least-squares problems.
     * Given a residual function and its Jacobian, it optimizes the given parameters.
     */
    template <class _T,class _ResidualsVector,class _JacobiMat>
    Eigen::VectorXd GaussNewtonAlgorithm(Eigen::VectorXd params,_T otherArgs, _ResidualsVector ResidualsVector,_JacobiMat JacobiMat,double _epsilon = 1e-10,int _maxIteCount = 999)
    {
        int k=0;
        // ε termination tolerance
        double epsilon = _epsilon;
        // maximum number of iterations
        int maxIteCount = _maxIteCount;

        // found == true ends the loop
        bool found = false;
        while(!found && k<maxIteCount)
        {
            // next iteration
            k++;
            // residual vector r = ^y (predicted) - y (observed)
            Eigen::VectorXd residual = ResidualsVector(params,otherArgs);

            // Jacobian matrix
            Eigen::MatrixXd Jac = JacobiMat(params,otherArgs);

            // Δx = - (Jac^T * Jac)^-1 * Jac^T * r
            Eigen::VectorXd delta_x =  - (((Jac.transpose() * Jac).inverse()) * Jac.transpose() * residual).array();

            qDebug()<<QString("Gauss-Newton: iteration %1 --- precision: %2").arg(k).arg(delta_x.array().abs().sum());
            // stop once the required precision is reached
            if(delta_x.array().abs().sum() < epsilon)
            {
                found = true;
            }

            // x(k+1) = x(k) + Δx
            params = params + delta_x;
        }
        return params;
    }

Eigen::MatrixXd / Eigen::VectorXd are matrix and vector classes from the Eigen library, which is used here to simplify the linear-algebra operations.
The approximate Hessian matrix $J(x)^TJ(x)$ may be singular or ill-conditioned, which motivates the LM algorithm introduced next.
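As a side note on the implementation above: instead of forming $(J^TJ)^{-1}$ explicitly with .inverse(), the Gauss-Newton step can be obtained by solving the normal equations, or a least-squares problem on $J$ directly, which behaves better when $J^TJ$ is nearly singular. A possible variation of the inner step, keeping the same variable names as the code above (an illustrative sketch, not the original author's code):

    // solve (J^T J) * delta_x = -J^T r via a rank-revealing QR on J itself,
    // i.e. delta_x = argmin || J * delta_x + r ||_2
    Eigen::VectorXd delta_x = Jac.colPivHouseholderQr().solve(-residual);

    // alternatively, solve the normal equations with LDLT (cheaper, less robust):
    // Eigen::VectorXd delta_x = (Jac.transpose() * Jac).ldlt().solve(-Jac.transpose() * residual);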

Three: Levenberg-Marquardt method (Levenberg-Marquardt algorithm)

1 Overview

In mathematics and computing, the Levenberg–Marquardt algorithm (LMA or simply LM), also known as the damped least squares method (DLS), is used to solve nonlinear least squares problems. These minimization problems arise especially in least squares curve fitting. LMA interpolates between Gauss-Newton Algorithm (GNA) and Gradient Descent. LMA is more robust than GNA, which means that in many cases it can find a solution even if it starts off far from the final minimum. For well-behaved functions and reasonable startup parameters, LMA tends to be slower than GNA. LMA can also be viewed as a Gauss-Newton using a trust region approach.
LMA is used in many software applications to solve general curve-fitting problems. By building on the Gauss-Newton algorithm, it usually converges faster than first-order methods. However, like other iterative optimization algorithms, LMA only finds a local minimum, which is not necessarily the global minimum. Like other numerical minimization algorithms, the Levenberg-Marquardt algorithm is an iterative procedure: the user must supply an initial guess for the parameter vector $\beta$. When there is only one minimum, an uninformed standard guess such as $\beta = [1, 1, ..., 1]^T$ works fine; when there are multiple minima, the algorithm converges to the global minimum only if the initial guess is already reasonably close to the final solution.

2: LM algorithm process

1: Choose an initial parameter vector $\beta_0$ (default: a vector of all ones). The initial damping $\mu_0$ can be chosen from $H_0 = J(\beta_0)^TJ(\beta_0)$; usually $\mu_0 = \tau \cdot \max_i\{H_{ii}^{(0)}\}$ with $\tau = 10^{-6}$. Here we set $\varepsilon_1 = 10^{-10}$ and $\varepsilon_2 = 10^{-10}$.
2: If $\|g\|_\infty \le \varepsilon_1$ holds, stop.
3: At the $k$-th iteration, compute the current Jacobian matrix $J(\beta_k)$ and the residual vector $R = f(\beta_k)$.
4: Solve the incremental equation $(H + \mu_k I)\Delta\beta_k = -g$:
$\Rightarrow \Delta\beta_k = -(H + \mu_k I)^{-1}g$
$\Rightarrow \Delta\beta_k \approx -(J^TJ + \mu_k I)^{-1}J^TR$. Here $J^TJ$ approximates the Hessian matrix $H$, and $g = J^TR$.
5: If $\|\Delta\beta_k\| \le \varepsilon_2(\|\beta_k\| + \varepsilon_2)$ holds, stop. Otherwise set $\beta_{k+1} = \beta_k + \Delta\beta_k$.
6: Compute the gain ratio $\rho = \dfrac{\|F(\beta_k)\|_2^2 - \|F(\beta_k + \Delta\beta_k)\|_2^2}{L(0) - L(\Delta\beta_k)}$. If $\rho > 0$, recompute the current Hessian approximation $H \approx J(\beta_k)^TJ(\beta_k)$ and the gradient $g = J(\beta_k)^Tf(\beta_k)$; if $\|g\|_\infty \le \varepsilon_1$ holds, stop. Update $\mu_k$: $\mu_k = \mu_k \cdot \max\{\frac{1}{3},\, 1-(2\rho-1)^3\}$, $v = 2$. If $\rho \le 0$: $\mu_k = \mu_k \cdot v$, $v = 2v$. Return to step 2.
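The denominator $L(0) - L(\Delta\beta_k)$ of the gain ratio in step 6 is evaluated in the code below as $0.5\,\Delta\beta^T(\mu\Delta\beta - g)$. As a clarifying aside (my own restatement of a standard identity, using the convention $F(\beta) = \tfrac{1}{2}\|R(\beta)\|^2$, $g = J^TR$): with the linearized model $L(\Delta\beta) = \tfrac{1}{2}\|R + J\Delta\beta\|^2$ and the damped normal equations $(J^TJ + \mu I)\Delta\beta = -g$, one gets

$$L(0) - L(\Delta\beta) = -\Delta\beta^T g - \tfrac{1}{2}\Delta\beta^T (J^TJ)\,\Delta\beta = -\Delta\beta^T g - \tfrac{1}{2}\Delta\beta^T\left(-g - \mu\Delta\beta\right) = \tfrac{1}{2}\,\Delta\beta^T\left(\mu\Delta\beta - g\right)$$

which is positive for $\mu > 0$, so the sign of $\rho$ is determined by the actual decrease of the sum of squares in the numerator.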

The final pseudocode is as follows:

[figure: LM algorithm pseudocode]

3: Levenberg-Marquardt method C++ code


    /* Levenberg-Marquardt algorithm (LMA) == Gauss-Newton with a trust-region strategy; more robust.
     * params          initial parameters, to be optimized
     * otherArgs       additional arguments
     * _ResidualsVector user-supplied functor: returns the difference between predicted and observed values
     * _JacobiMat      user-supplied functor: returns the current Jacobian matrix
     * _epsilon        convergence tolerance
     * _maxIteCount    maximum number of iterations
     * the function returns as soon as either _epsilon or _maxIteCount is reached
     */
    template <class _T,class _ResidualsVector,class _JacobiMat>
    Eigen::VectorXd LevenbergMarquardtAlgorithm(Eigen::VectorXd &params,_T otherArgs, _ResidualsVector ResidualsVector,_JacobiMat JacobiMat,double _epsilon = 1e-12,quint32 _maxIteCount = 99)
    {
        quint32 iterCount=0;
        double currentEpsilon =0.;
        QElapsedTimer eTimer;

        // ε termination tolerance
        double epsilon = _epsilon;
        double _currentEpsilon=0.0;
        // τ
        double tau = 1e-6;

        // maximum number of iterations
        quint32 maxIteCount = _maxIteCount;
        quint32 k=0;
        int v=2;

        // Jacobian matrix
        Eigen::MatrixXd Jac = JacobiMat(params,otherArgs);

        // approximate the Hessian with the Jacobian
        Eigen::MatrixXd Hessen = Jac.transpose() * Jac ;

        // residual vector r = ^y (predicted) - y (observed)
        Eigen::VectorXd residual = ResidualsVector(params,otherArgs);
        // gradient
        Eigen::MatrixXd g = Jac.transpose() * residual;

        // found == true ends the loop
        bool found = ( g.lpNorm<Eigen::Infinity>() <= epsilon );

        // damping parameter μ
        double mu =  tau * Hessen.diagonal().maxCoeff();
        eTimer.restart();
        while(!found && k<maxIteCount)
        {
            k++;
            // LM step: (H + μI) Δx = -g; the commented-out line replaces I with the Hessian diagonal
            //Eigen::MatrixXd delta_x = - (Hessen + mu*Hessen.asDiagonal().diagonal()).inverse() * g;

            Eigen::VectorXd delta_x = - (Hessen + mu*Eigen::MatrixXd::Identity(Hessen.cols(), Hessen.cols())).inverse() * g;

            if( delta_x.lpNorm<2>() <= epsilon * (params.lpNorm<2>() + epsilon ))
            {
                currentEpsilon = delta_x.lpNorm<2>();
                found = true;
            }
            else
            {
                Eigen::VectorXd newParams = params + delta_x;
                //L(0) - L(delta) = 0.5*(delta^T)*(μ*delta - g)
                //ρ     =    (F(x) - F(x_new)) / (L(0) - L(delta));
                double rho = (ResidualsVector(params,otherArgs).array().pow(2).sum() - ResidualsVector(newParams,otherArgs).array().pow(2).sum())
                        / (0.5*delta_x.transpose()*(mu * delta_x - g)).sum();

                if(rho>0)
                {
                    // step accepted: update the parameters and re-linearize
                    params = newParams;
                    Jac = JacobiMat(params,otherArgs);
                    Hessen = Jac.transpose() * Jac ;
                    // residual vector r = ^y (predicted) - y (observed)
                    residual = ResidualsVector(params,otherArgs);
                    g = Jac.transpose() * residual;
                    _currentEpsilon = g.lpNorm<Eigen::Infinity>();
                    found = (_currentEpsilon  <= epsilon );
                    mu = mu* qMax(1/3.0 , 1-qPow(2*rho -1,3));
                    v=2;
                }
                else
                {
                    // step rejected: increase the damping
                    mu = mu*v;
                    v = 2*v;
                }
            }
            iterCount=k;
            currentEpsilon = _currentEpsilon;
            // report the current iteration count, current precision, and time per iteration
            qDebug()<<QString("iteration: %1 , convergence precision: %2 , iteration time (ms): %3 ").arg(iterCount).arg(currentEpsilon).arg(eTimer.restart());
        }
        return params;
    }

//====================================================================
//====================================================================

// Example: curve fitting optimized with the LM algorithm; the residual-vector and Jacobian functors are user-supplied.
#define DERIV_STEP 1e-5

// residual vector for the curve fit
class LineFitResidualsVector
{
public:
    Eigen::VectorXd  operator()(const Eigen::VectorXd& parameter,const QList<Eigen::MatrixXd> &otherArgs)
    {
        Eigen::MatrixXd inValue = otherArgs.at(0);
        Eigen::VectorXd outValue = otherArgs.at(1);
        int dataCount = inValue.rows();
        int paramsCount = parameter.rows();
        // residual vector
        Eigen::VectorXd residual = Eigen::VectorXd::Zero(dataCount);
        // residual r = ^y (predicted) - y (observed)
        for(int i=0;i<dataCount;++i)
        {
            for(int j=0;j<paramsCount;++j)
            {
                // curve model y = a1*x^0 + a2*x^1 + a3*x^2 + ...  (set according to the number of parameters a1, a2, a3, ...)
                residual(i) += parameter(j) * inValue(i,j);
            }
        }

        return residual - outValue;
    }
};


// Jacobian for the curve fit -- computed by numeric partial derivatives
class LineFitJacobi
{
    // numeric partial derivative (central difference)
    double PartialDeriv(const Eigen::VectorXd& parameter,int paraIndex,const Eigen::MatrixXd &inValue,int objIndex)
    {
        Eigen::VectorXd para1 = parameter;
        Eigen::VectorXd para2 = parameter;
        para1(paraIndex) -= DERIV_STEP;
        para2(paraIndex) += DERIV_STEP;

        // evaluate the model at the two perturbed parameter vectors
        double obj1 = 0;
        double obj2 = 0;
        for(int i=0;i<parameter.rows();++i)
        {
            // curve model y = a1*x^0 + a2*x^1 + a3*x^2 + ...  (set according to the number of parameters a1, a2, a3, ...)
            obj1 += para1(i) * inValue(objIndex,i);
        }

        for(int i=0;i<parameter.rows();++i)
        {
            // curve model y = a1*x^0 + a2*x^1 + a3*x^2 + ...  (set according to the number of parameters a1, a2, a3, ...)
            obj2 += para2(i) * inValue(objIndex,i);
        }

        return (obj2 - obj1) / (2 * DERIV_STEP);
    }

public:

    Eigen::MatrixXd operator()(const Eigen::VectorXd& parameter,const QList<Eigen::MatrixXd> &otherArgs)
    {
        Eigen::MatrixXd inValue = otherArgs.at(0);
        int rowNum = inValue.rows();
        int paramsCount = parameter.rows();

        Eigen::MatrixXd Jac(rowNum, paramsCount);

        for (int i = 0; i < rowNum; i++)
        {
            for (int j = 0; j < paramsCount; j++)
            {
                Jac(i,j) = PartialDeriv(parameter,j,inValue,i);
            }
        }
        return Jac;
    }
};
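For this particular polynomial model, the residual $r_i = \sum_j a_j x_i^j - y_i$ is linear in the coefficients, so the Jacobian is known in closed form: $\partial r_i/\partial a_j = x_i^j$, which is exactly the entry inValue(i, j). The central-difference functor above is therefore not strictly needed here; a hypothetical analytic variant (my own sketch, not part of the original post) could look like this:

// analytic Jacobian for the polynomial model: r_i = sum_j a_j * x_i^j - y_i,
// so d r_i / d a_j = x_i^j, i.e. the (i, j) entry of the design matrix
class LineFitJacobiAnalytic
{
public:
    Eigen::MatrixXd operator()(const Eigen::VectorXd& parameter,const QList<Eigen::MatrixXd> &otherArgs)
    {
        Q_UNUSED(parameter);
        // otherArgs.at(0) is the design matrix A with A(i, j) = x_i^j,
        // which is already the Jacobian of the residual vector with respect to the coefficients
        return otherArgs.at(0);
    }
};

Either functor can be passed as the JacobiMat argument; the numeric version generalizes to models that are nonlinear in the parameters.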



    /* y = a0 * x^0 + a1 * x^1                             solve for the unknown coefficients a0, a1
     * y = a0 * x^0 + a1 * x^1 + a2 * x^2                  solve for a0, a1, a2
     * y = a0 * x^0 + a1 * x^1 + a2 * x^2 + a3 * x^3       solve for a0, a1, a2, a3
     *
     * Matrix form
     *  _               _ _   _     _ _
     * |1   x1  x1^2 ...| | a0 |   |y1 |
     * |1   x2  x2^2 ...| | a1 |  =|y2 |
     * |1   x3  x3^2 ...| | a2 |   |y3 |
     * |1   x4  x4^2 ...| | .. |   |y4 |
     * |....         ...|  - -     |...|
     *  -              -            - -
     *  Ax = B
     */

    QList<double> coeffL;
    // number of data points
    int rows = listP.count();
    int col = m_maxPower + 1;
    //m_maxPower
    VectorXd vector_x;

    // dynamically sized matrices: rows x col and rows x 1
    MatrixXd matA;
    matA.resize(rows,col);
    VectorXd matB;
    matB.resize(rows,1);

    // build the matrices
    for(int i=0;i<rows;++i)
    {
        //A
        for(int j=0;j<col;++j)
        {
            matA(i,j) = std::pow(listP.at(i).x(),j);
        }
        //B
        matB(i,0) = listP.at(i).y();
    }


    
    // Gauss-Newton selected
    if(m_GN->isChecked())
    {
        int iteCount= m_iterationCount->value();
        // initial parameters set to 1
        VectorXd args = VectorXd::Ones(m_maxPower+1);

        QList<MatrixXd> otherArgs;
        otherArgs.append(matA);
        otherArgs.append(matB);
        vector_x =GaussNewtonAlgorithm(args,otherArgs,LineFitResidualsVector(),LineFitJacobi(),1e-10,iteCount);
    }
    // LM selected
    else if(m_LM->isChecked())
    {
        int iteCount= m_iterationCount->value();
        // initial parameters set to 1
        VectorXd args = VectorXd::Ones(m_maxPower+1);

        QList<MatrixXd> otherArgs;
        otherArgs.append(matA);
        otherArgs.append(matB);
        // note: LevenbergMarquardtAlgorithm takes params by reference, so args holds the optimized coefficients
        LevenbergMarquardtAlgorithm(args,otherArgs,LineFitResidualsVector(),LineFitJacobi(),1e-15,iteCount);

    }
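The snippet above lives inside a Qt widget class (listP, m_maxPower, m_GN, m_LM, and m_iterationCount are member variables), so it is not runnable on its own. As a standalone illustration only, the following sketch fits $y \approx a_0 + a_1 x$ to a few synthetic, noiseless points using the two solvers defined earlier; the main() wrapper and the data values are my own assumptions, not part of the original post:

    #include <Eigen/Dense>
    #include <QList>
    #include <QDebug>

    // assumes GaussNewtonAlgorithm, LevenbergMarquardtAlgorithm,
    // LineFitResidualsVector and LineFitJacobi from above are visible here (same translation unit)
    int main()
    {
        // synthetic data following y = 1 + 2*x
        const int rows = 5;
        const int col  = 2;                       // fit y = a0 + a1*x
        Eigen::MatrixXd matA(rows, col);
        Eigen::VectorXd matB(rows);
        for (int i = 0; i < rows; ++i)
        {
            double x = i;
            matA(i, 0) = 1.0;                     // x^0
            matA(i, 1) = x;                       // x^1
            matB(i)    = 1.0 + 2.0 * x;           // noiseless observations
        }

        QList<Eigen::MatrixXd> otherArgs;
        otherArgs.append(matA);
        otherArgs.append(matB);

        // Gauss-Newton, initial guess: all coefficients equal to 1
        Eigen::VectorXd gnCoeffs = GaussNewtonAlgorithm(Eigen::VectorXd::Ones(col), otherArgs,
                                                        LineFitResidualsVector(), LineFitJacobi(),
                                                        1e-10, 100);
        qDebug() << "GN coefficients:" << gnCoeffs(0) << gnCoeffs(1);   // expect roughly 1 and 2

        // Levenberg-Marquardt (params is taken by reference and updated in place)
        Eigen::VectorXd lmCoeffs = Eigen::VectorXd::Ones(col);
        LevenbergMarquardtAlgorithm(lmCoeffs, otherArgs,
                                    LineFitResidualsVector(), LineFitJacobi(),
                                    1e-12, 100);
        qDebug() << "LM coefficients:" << lmCoeffs(0) << lmCoeffs(1);   // expect roughly 1 and 2
        return 0;
    }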

[figure: curve-fitting result]

Four: Summary

1: Both the Gauss-Newton method and the LM algorithm are least-squares optimization algorithms: given initial parameters, they optimize those parameters further. The LM algorithm is more robust, at the cost of some convergence speed. (On "optimization": in the narrow mathematical sense, an optimization problem has three elements: optimization variables, an objective function, and constraints; subject to the constraints, the variables are adjusted to minimize the objective function. That is the simplest interpretation of an optimization problem.) Newton's method, by contrast, is suited to finding extreme values of a function.
2: Tools: mainly Qt and the Eigen library.
The Eigen library is a library for matrix and algebraic computations.

3: The complete code above has been uploaded to GitHub
4: References:
Hessian matrix and Jacobian matrix
Understanding the Gauss-Newton method
Detailed explanation of the LM algorithm
Detailed explanation of the LM paper


Origin: blog.csdn.net/weixin_43763292/article/details/128060801