Ceres introduction and examples (9) On Derivatives (Numeric derivatives)

The other extreme from using analytic derivatives is using numeric derivatives. The key observation is that the derivative of a function f(x) with respect to x can be written in the limit form:

$$
Df(x) = \lim_{h \rightarrow 0} \frac{f(x + h) - f(x)}{h}
$$

Forward Differences

Of course, a computer cannot evaluate a limit numerically, so instead we choose a small value of h and approximate the derivative as

$$
Df(x) \approx \frac{f(x + h) - f(x)}{h}
$$

The above is the simplest and most basic form of numerical differentiation: the so-called forward difference formula.
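As a quick standalone illustration (not Ceres-specific; the function sin(x) and the step size used here are just for demonstration), the forward difference formula can be applied like this:

#include <cmath>
#include <cstdio>

// Forward-difference approximation of d/dx sin(x) at x = 1.0.
// The exact derivative is cos(1.0); function and step size are illustrative.
int main() {
  const double x = 1.0;
  const double h = 1e-6;
  const double df_forward = (std::sin(x + h) - std::sin(x)) / h;
  std::printf("forward difference: %.12f\n", df_forward);
  std::printf("exact derivative:   %.12f\n", std::cos(x));
  return 0;
}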

So, how do you construct a numerically differentiated version of Rat43Analytic in Ceres Solver? This can be done in two steps:

  • 1. Define a functor that, given the parameter values, computes the residual for a given (x, y).
  • 2. Use NumericDiffCostFunction to construct a CostFunction instance that wraps Rat43CostFunctor.
struct Rat43CostFunctor {
  Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

  bool operator()(const double* parameters, double* residuals) const {
    const double b1 = parameters[0];
    const double b2 = parameters[1];
    const double b3 = parameters[2];
    const double b4 = parameters[3];
    residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
    return true;
  }

  const double x_;
  const double y_;
};
// Compared with the Analytic version, the hand-written Jacobian computation is gone.

CostFunction* cost_function =
  new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(
    new Rat43CostFunctor(x, y));

This is the minimum amount of work required to define a cost function. The only thing the user needs to do is to make sure that the evaluation of the residuals is implemented correctly and efficiently.
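For context, here is a hedged sketch of how such a cost function might be plugged into a ceres::Problem and solved. The data vectors, the helper function name SolveRat43, and the handling of the initial parameter values are illustrative assumptions, not part of the original example:

#include <vector>
#include "ceres/ceres.h"

// Sketch only: fits the Rat43 model to illustrative data using the
// numerically differentiated cost functor defined above.
void SolveRat43(const std::vector<double>& data_x,
                const std::vector<double>& data_y,
                double b[4] /* initial values of b1..b4, updated in place */) {
  ceres::Problem problem;
  for (size_t i = 0; i < data_x.size(); ++i) {
    ceres::CostFunction* cost_function =
        new ceres::NumericDiffCostFunction<Rat43CostFunctor, ceres::FORWARD, 1, 4>(
            new Rat43CostFunctor(data_x[i], data_y[i]));
    // By default the problem takes ownership of cost_function.
    problem.AddResidualBlock(cost_function, nullptr, b);
  }

  ceres::Solver::Options options;
  ceres::Solver::Summary summary;
  ceres::Solve(options, &problem, &summary);
}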

Before going any further, it is instructive to first estimate the error in the forward difference formulation. We do this by considering the Taylor expansion of f around x.
$$
\begin{split}
f(x+h) &= f(x) + h\,Df(x) + \frac{h^2}{2!} D^2f(x) + \frac{h^3}{3!}D^3f(x) + \cdots \\
Df(x) &= \frac{f(x + h) - f(x)}{h} - \left[ \frac{h}{2!}D^2f(x) + \frac{h^2}{3!}D^3f(x) + \cdots \right] \\
Df(x) &= \frac{f(x + h) - f(x)}{h} + O(h)
\end{split}
$$
That is, the error in the forward difference formula is O(h)
PS: In asymptotic error analysis, an error of $O(h^k)$ means that, for h close enough to 0, the error is at most a constant multiple of $h^k$.

Implementation Details

NumericDiffCostFunction implements a generic algorithm to numerically differentiate a given functor. While the actual implementation of NumericDiffCostFunction is complex, the resulting CostFunction looks roughly as follows:

class Rat43NumericDiffForward : public SizedCostFunction<1, 4> {
 public:
  Rat43NumericDiffForward(const Rat43CostFunctor* functor) : functor_(functor) {}
  virtual ~Rat43NumericDiffForward() {}
  virtual bool Evaluate(double const* const* parameters,
                        double* residuals,
                        double** jacobians) const {
    // Evaluate the residual at the current parameter values.
    (*functor_)(parameters[0], residuals);
    if (!jacobians) return true;
    double* jacobian = jacobians[0];
    if (!jacobian) return true;

    // Forward difference: perturb one parameter at a time and re-evaluate.
    const double f = residuals[0];
    double parameters_plus_h[4];
    for (int i = 0; i < 4; ++i) {
      std::copy(parameters[0], parameters[0] + 4, parameters_plus_h);
      const double kRelativeStepSize = 1e-6;
      const double h = std::abs(parameters[0][i]) * kRelativeStepSize;
      parameters_plus_h[i] += h;
      double f_plus;
      (*functor_)(parameters_plus_h, &f_plus);
      jacobian[i] = (f_plus - f) / h;
    }
    return true;
  }

 private:
  std::unique_ptr<const Rat43CostFunctor> functor_;
};

Note the choice of the step size h in the above code: we use a relative step size, kRelativeStepSize = $10^{-6}$, rather than an absolute step size that is the same for all parameters.
This gives a better derivative estimate than an absolute step size, but it is only appropriate when the parameter values are not close to zero. The actual implementation of NumericDiffCostFunction therefore uses more sophisticated step size selection logic, which switches to a fixed step size near zero.
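As a rough sketch of that idea (the constants and the fallback rule below are illustrative, not the actual Ceres logic):

#include <algorithm>
#include <cmath>

// Sketch only: a relative step size with a fixed floor for parameters near zero.
// The real step size selection inside Ceres' NumericDiffCostFunction is more involved.
double ChooseStepSize(double parameter_value) {
  const double kRelativeStepSize = 1e-6;
  const double kMinStepSize = 1e-6;  // fallback when |parameter_value| is close to zero
  return std::max(std::abs(parameter_value) * kRelativeStepSize, kMinStepSize);
}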

Central Differences

The O(h) error in the forward difference formula is okay, but not great. A better approach is to use the central difference formula:

$$
Df(x) \approx \frac{f(x + h) - f(x - h)}{2h}
$$

Note that if the value of f(x) is already known, the forward difference formula requires only one additional evaluation, while the central difference formula requires two, doubling its cost.
So, is the extra evaluation worth it?
To answer this question, we again calculate the approximation error in the central difference formula:
$$
\begin{split}
f(x + h) &= f(x) + h\,Df(x) + \frac{h^2}{2!} D^2f(x) + \frac{h^3}{3!} D^3f(x) + \frac{h^4}{4!} D^4f(x) + \cdots \\
f(x - h) &= f(x) - h\,Df(x) + \frac{h^2}{2!} D^2f(x) - \frac{h^3}{3!} D^3f(x) + \frac{h^4}{4!} D^4f(x) + \cdots \\
Df(x) &= \frac{f(x + h) - f(x - h)}{2h} + \frac{h^2}{3!} D^3f(x) + \frac{h^4}{5!} D^5f(x) + \cdots \\
Df(x) &= \frac{f(x + h) - f(x - h)}{2h} + O(h^2)
\end{split}
$$

The error of the central difference formula is $O(h^2)$, i.e., the error falls off quadratically, whereas the error in the forward difference formula only falls off linearly.

Using central differences instead of forward differences in Ceres Solver is a simple matter of changing a NumericDiffCostFunction template parameter, as follows:

// Converting from forward to central differences only requires changing the
// FORWARD template argument of NumericDiffCostFunction to CENTRAL.
CostFunction* cost_function =
  new NumericDiffCostFunction<Rat43CostFunctor, CENTRAL, 1, 4>(
    new Rat43CostFunctor(x, y));
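Under the hood, this change corresponds roughly to the following variant of the hand-written Evaluate shown earlier. The class Rat43NumericDiffCentral is a hypothetical sketch (the same headers and conventions as Rat43NumericDiffForward above are assumed), not the actual Ceres implementation:

class Rat43NumericDiffCentral : public SizedCostFunction<1, 4> {
 public:
  Rat43NumericDiffCentral(const Rat43CostFunctor* functor) : functor_(functor) {}
  virtual ~Rat43NumericDiffCentral() {}
  virtual bool Evaluate(double const* const* parameters,
                        double* residuals,
                        double** jacobians) const {
    (*functor_)(parameters[0], residuals);
    if (!jacobians) return true;
    double* jacobian = jacobians[0];
    if (!jacobian) return true;

    double plus[4], minus[4];
    for (int i = 0; i < 4; ++i) {
      std::copy(parameters[0], parameters[0] + 4, plus);
      std::copy(parameters[0], parameters[0] + 4, minus);
      const double kRelativeStepSize = 1e-6;
      const double h = std::abs(parameters[0][i]) * kRelativeStepSize;
      plus[i] += h;
      minus[i] -= h;
      // Central difference: two extra functor evaluations per parameter.
      double f_plus, f_minus;
      (*functor_)(plus, &f_plus);
      (*functor_)(minus, &f_minus);
      jacobian[i] = (f_plus - f_minus) / (2.0 * h);
    }
    return true;
  }

 private:
  std::unique_ptr<const Rat43CostFunctor> functor_;
};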

But what do these differences in error mean in practice? To understand this, consider the problem of differentiating the univariate function

$$
f(x) = \frac{e^x}{\sin x - x^2}
$$

at x = 1.0. It is easy to determine that the derivative at this point is Df(1.0) = 140.73773557129658.
PS: Refer to http://www2.edu-edu.com.cn/lesson_crs78/self/j_0022/soft/ch0302.html for derivatives

Using this value as a reference, we can now compute the relative error of the forward and central difference formulas as a function of the absolute step size, and plot them.
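A minimal standalone sketch of the computation behind such a plot (not using Ceres; the reference derivative value is taken from above, and sweeping the step size by powers of ten is an arbitrary choice):

#include <cmath>
#include <cstdio>

// Compare the relative error of forward and central differences for
// f(x) = exp(x) / (sin(x) - x^2) at x = 1.0 over a range of step sizes.
double f(double x) { return std::exp(x) / (std::sin(x) - x * x); }

int main() {
  const double x = 1.0;
  const double reference = 140.73773557129658;  // Df(1.0) from the text
  for (double h = 1e-1; h > 1e-15; h /= 10.0) {
    const double forward = (f(x + h) - f(x)) / h;
    const double central = (f(x + h) - f(x - h)) / (2.0 * h);
    std::printf("h = %.0e  forward rel. err = %.3e  central rel. err = %.3e\n",
                h, std::abs((forward - reference) / reference),
                std::abs((central - reference) / reference));
  }
  return 0;
}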
[Figure: relative error of the forward and central difference formulas as a function of absolute step size]
Reading the chart from right to left, a few things stand out:

  • 1. Both curves have two distinct regions. Starting from a large value of h, the error decreases because the truncation error of the Taylor series dominates, but as h continues to decrease, the error starts to increase again because round-off error begins to dominate the computation. So we cannot get an ever more accurate estimate of Df simply by making h smaller; the finite precision of the arithmetic becomes the limiting factor.
  • 2. The forward difference formula is not a good way to compute derivatives. As the step size decreases, the central difference formula converges much more quickly to an accurate derivative estimate. So unless evaluating f(x) is so expensive that you cannot afford the extra evaluation required by central differences, do not use forward differences.
  • 3. Neither formula works well for a poorly chosen value of h.

Ridders' Method

In both difference methods above, the accuracy is limited by floating-point precision: h cannot be made arbitrarily small. So, can we get a better estimate of Df without requiring h to be so small that we start hitting floating-point precision limits?

One possible approach is to find a method whose error decreases faster than $O(h^2)$.
This can be achieved by applying Richardson extrapolation to the differentiation problem, which is also known as Ridders' method.

Let's review the error in the central difference formula.
$$
\begin{split}
Df(x) &= \frac{f(x + h) - f(x - h)}{2h} + \frac{h^2}{3!} D^3f(x) + \frac{h^4}{5!} D^5f(x) + \cdots \\
&= \frac{f(x + h) - f(x - h)}{2h} + K_2 h^2 + K_4 h^4 + \cdots
\end{split}
$$

The key thing to note here is that the coefficients $K_2, K_4, \dots$ are independent of h; they depend only on x.
Let us define

$$
A(1, m) = \frac{f(x + h/2^{m-1}) - f(x - h/2^{m-1})}{2h/2^{m-1}}.
$$

Then observe that

$$
\begin{split}
Df(x) &= A(1,1) + K_2 h^2 + K_4 h^4 + \cdots \\
Df(x) &= A(1, 2) + K_2 (h/2)^2 + K_4 (h/2)^4 + \cdots
\end{split}
$$

Here we halve the step size to obtain a second central difference estimate of Df(x). Combining the above two estimates, we get:
$$
Df(x) = \frac{4 A(1, 2) - A(1,1)}{4 - 1} + O(h^4)
$$
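To see where this comes from, multiply the second estimate by 4 and subtract the first; the $K_2$ terms cancel exactly:

$$
\begin{split}
4\,Df(x) - Df(x) &= 4A(1,2) - A(1,1) + K_2\left(4(h/2)^2 - h^2\right) + K_4\left(4(h/2)^4 - h^4\right) + \cdots \\
3\,Df(x) &= 4A(1,2) - A(1,1) - \frac{3}{4} K_4 h^4 + \cdots \\
Df(x) &= \frac{4A(1,2) - A(1,1)}{3} + O(h^4)
\end{split}
$$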

This is an approximation of Df(x) with truncation error $O(h^4)$. But we do not have to stop there; we can iterate this process to obtain even more accurate estimates, as follows:
$$
A(n, m) =
\begin{cases}
\dfrac{f(x + h/2^{m-1}) - f(x - h/2^{m-1})}{2h/2^{m-1}} & n = 1 \\[2ex]
\dfrac{4^{n-1} A(n - 1, m + 1) - A(n - 1, m)}{4^{n-1} - 1} & n > 1
\end{cases}
$$

It is easy to show that the approximation error of $A(n, 1)$ is $O(h^{2n})$. To see how the above formula can be used in practice to compute $A(n, 1)$ for increasing n, it is helpful to structure the computation as the following tableau:
$$
\begin{array}{ccccc}
A(1,1) & A(1, 2) & A(1, 3) & A(1, 4) & \cdots\\
       & A(2, 1) & A(2, 2) & A(2, 3) & \cdots\\
       &         & A(3, 1) & A(3, 2) & \cdots\\
       &         &         & A(4, 1) & \cdots \\
       &         &         &         & \ddots
\end{array}
$$

So, to compute $A(n, 1)$ for increasing values of n, we move from left to right, computing one column at a time. Assuming that the dominant cost is the evaluation of f(x), computing a new column of the tableau costs two function evaluations, since computing $A(1, n)$ requires evaluating the central difference formula with step size $2^{1-n}h$.
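A compact sketch of how this tableau can be computed in code (fixed depth, no adaptive error tracking, so it is a simplification of the actual RIDDERS implementation in Ceres; the function name RiddersDerivative is illustrative):

#include <cmath>
#include <functional>
#include <vector>

// Compute A(max_n, 1) from the recursion above. f is any double -> double callable.
double RiddersDerivative(const std::function<double(double)>& f,
                         double x, double h, int max_n) {
  // A[m] holds A(1, m + 1): the central difference with step size h / 2^m.
  std::vector<double> A(max_n);
  for (int m = 0; m < max_n; ++m) {
    const double hm = h / std::pow(2.0, m);
    A[m] = (f(x + hm) - f(x - hm)) / (2.0 * hm);
  }
  // Sweep the tableau in place: after level n, A[m] holds A(n, m + 1).
  double factor = 4.0;  // 4^{n-1}
  for (int n = 2; n <= max_n; ++n) {
    for (int m = 0; m + (n - 1) < max_n; ++m) {
      A[m] = (factor * A[m + 1] - A[m]) / (factor - 1.0);
    }
    factor *= 4.0;
  }
  return A[0];  // A(max_n, 1)
}

Calling this sketch with $f(x) = e^x/(\sin x - x^2)$, x = 1.0, h = 0.01, and max_n = 5 should reproduce, up to round-off, the A(5, 1) entry of the tableau below.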

Applying this method to $f(x) = \frac{e^x}{\sin x - x^2}$, starting with a fairly large step size h = 0.01, we get:

$$
\begin{array}{rrrrr}
141.678097131 & 140.971663667 & 140.796145400 & 140.752333523 & 140.741384778\\
              & 140.736185846 & 140.737639311 & 140.737729564 & 140.737735196\\
              &               & 140.737736209 & 140.737735581 & 140.737735571\\
              &               &               & 140.737735571 & 140.737735571\\
              &               &               &               & 140.737735571
\end{array}
$$

Relative to the reference value Df(1.0) = 140.73773557129658, A(5, 1) has a relative error of about $10^{-13}$.
For comparison, the relative error of the central difference formula at the same step size ($0.01/2^4 = 0.000625$) is several orders of magnitude larger, roughly $10^{-5}$ (this is the error of the A(1, 5) entry in the tableau above).

The above tableau is the basis of Ridders' method for numerical differentiation. The full implementation is an adaptive scheme that tracks its own estimation error and stops automatically when the desired accuracy is reached. Of course, it is more expensive than the forward and central difference formulas, but it is also significantly more robust and accurate.
Using Ridders' method instead of forward or central differences in Ceres is again a simple matter of changing a NumericDiffCostFunction template parameter, as follows:

CostFunction* cost_function =
  new NumericDiffCostFunction<Rat43CostFunctor, RIDDERS, 1, 4>(
    new Rat43CostFunctor(x, y));

The graph below shows the relative error of the three differencing methods as a function of the absolute step size. For Ridders' method, we take the step size corresponding to the evaluation of $A(n, 1)$ to be $2^{1-n}h$.

[Figure: relative error of forward, central, and Ridders' differences as a function of absolute step size]

Using the ten function evaluations needed to compute A(5, 1), Ridders' method approximates Df(1.0) about 1000 times better than the best central difference estimate. To put these numbers in perspective, machine epsilon for double-precision floating point is $\approx 2.22 \times 10^{-16}$.

Going back to Rat43, let us also look at the runtime cost of the various ways of computing numeric derivatives:

| CostFunction            | Time (ns) |
|-------------------------|-----------|
| Rat43Analytic           | 255       |
| Rat43AnalyticOptimized  | 92        |
| Rat43NumericDiffForward | 262       |
| Rat43NumericDiffCentral | 517       |
| Rat43NumericDiffRidders | 3760      |

As expected, central differencing takes roughly twice as long as forward differencing, and Ridders' method is several times more expensive than either, although its accuracy is also significantly better.

Recommendations

Use numeric differentiation when you cannot compute the derivatives analytically or with automatic differentiation. This usually happens when you are calling an external library or function whose analytic form you do not know, or which, even if you did know it, you could not rewrite in a form usable with automatic differentiation.
When using numeric differentiation, prefer central differences. If execution time is not a concern, or if an appropriate static relative step size cannot be determined for the objective function, then Ridders' method is recommended.


Source: blog.csdn.net/wanggao_1990/article/details/129713409