Ceres introduction and examples (8): On Derivatives (Analytic Derivatives)

Consider the problem of fitting the following curve (Rat43):

$$y = \frac{b_1}{(1+e^{b_2-b_3x})^{1/b_4}}$$

That is to say, given some data $\{x_i, y_i\},\ \forall i = 1, \dots, n$, determine the parameters $b_1, b_2, b_3, b_4$ that best fit the data.

The problem we face is to find the $b_1, b_2, b_3, b_4$ that minimize

$$\begin{split}
E(b_1, b_2, b_3, b_4) &= \sum_i f^2(b_1, b_2, b_3, b_4; x_i, y_i)\\
&= \sum_i \left(\frac{b_1}{(1+e^{b_2-b_3x_i})^{1/b_4}} - y_i\right)^2
\end{split}$$

The notion of best fit depends on the choice of an objective function to measure the quality of the fit, which in turn depends on the underlying noisy process that produced the observations. Minimizing the sum of squared differences is the right thing to do when the noise is Gaussian. In this case, the optimal values of the parameters are the maximum likelihood estimates.
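To make this connection concrete, here is the standard one-step argument: under i.i.d. Gaussian noise $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ on the observations $y_i$, maximizing the likelihood is equivalent to minimizing the sum of squared residuals, since taking the negative logarithm turns the product into a sum and drops the constants:

$$\max_{b} \prod_i \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{\left(y_i - \hat{y}(x_i; b)\right)^2}{2\sigma^2}\right) \;\Longleftrightarrow\; \min_{b} \sum_i \left(y_i - \hat{y}(x_i; b)\right)^2$$

where $\hat{y}(x_i; b)$ denotes the model prediction.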

To solve this problem using Ceres Solver, we need to define a CostFunction that computes the residual f and its derivatives with respect to b1, b2, b3 and b4, given x and y. Using basic calculus, we can derive the partial derivatives of f:
$$\begin{split}
D_1 f(b_1, b_2, b_3, b_4; x, y) &= \frac{1}{(1+e^{b_2-b_3x})^{1/b_4}}\\
D_2 f(b_1, b_2, b_3, b_4; x, y) &= \frac{-b_1 e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4+1}}\\
D_3 f(b_1, b_2, b_3, b_4; x, y) &= \frac{b_1 x e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4+1}}\\
D_4 f(b_1, b_2, b_3, b_4; x, y) &= \frac{b_1 \log\left(1+e^{b_2-b_3x}\right)}{b_4^2(1+e^{b_2-b_3x})^{1/b_4}}
\end{split}$$

From these manually computed derivatives, we can now implement the CostFunction:

class Rat43Analytic : public SizedCostFunction<1,4> {
   public:
     Rat43Analytic(const double x, const double y) : x_(x), y_(y) {}
     virtual ~Rat43Analytic() {}
     virtual bool Evaluate(double const* const* parameters,
                           double* residuals,
                           double** jacobians) const {
       const double b1 = parameters[0][0];
       const double b2 = parameters[0][1];
       const double b3 = parameters[0][2];
       const double b4 = parameters[0][3];

       residuals[0] = b1 *  pow(1 + exp(b2 -  b3 * x_), -1.0 / b4) - y_;

       if (!jacobians) return true;
       double* jacobian = jacobians[0];
       if (!jacobian) return true;

       jacobian[0] = pow(1 + exp(b2 - b3 * x_), -1.0 / b4);
       jacobian[1] = -b1 * exp(b2 - b3 * x_) *
                     pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
       jacobian[2] = x_ * b1 * exp(b2 - b3 * x_) *
                     pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
       jacobian[3] = b1 * log(1 + exp(b2 - b3 * x_)) *
                     pow(1 + exp(b2 - b3 * x_), -1.0 / b4) / (b4 * b4);
       return true;
     }

    private:
     const double x_;
     const double y_;
 };

This is tedious code, hard to read and with a lot of redundancy. So in practice we would cache some of the subexpressions to improve its efficiency, which results in the following:

class Rat43AnalyticOptimized : public SizedCostFunction<1,4> {
   public:
     Rat43AnalyticOptimized(const double x, const double y) : x_(x), y_(y) {}
     virtual ~Rat43AnalyticOptimized() {}
     virtual bool Evaluate(double const* const* parameters,
                           double* residuals,
                           double** jacobians) const {
       const double b1 = parameters[0][0];
       const double b2 = parameters[0][1];
       const double b3 = parameters[0][2];
       const double b4 = parameters[0][3];

       const double t1 = exp(b2 -  b3 * x_);
       const double t2 = 1 + t1;
       const double t3 = pow(t2, -1.0 / b4);
       residuals[0] = b1 * t3 - y_;

       if (!jacobians) return true;
       double* jacobian = jacobians[0];
       if (!jacobian) return true;

       const double t4 = pow(t2, -1.0 / b4 - 1);
       jacobian[0] = t3;
       jacobian[1] = -b1 * t1 * t4 / b4;
       jacobian[2] = -x_ * jacobian[1];
       jacobian[3] = b1 * log(t2) * t3 / (b4 * b4);
       return true;
     }

   private:
     const double x_;
     const double y_;
 };
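As a side note, a cost function like this is handed to a ceres::Problem via AddResidualBlock. The following is a minimal, self-contained sketch of that usage; the data values and initial guesses here are hypothetical placeholders, not the actual Rat43 dataset:

#include "ceres/ceres.h"

// Assumes Rat43AnalyticOptimized from above is in scope.
int main() {
  // Hypothetical observations; the real Rat43 data would be substituted here.
  const double xs[] = {1.0, 2.0, 3.0, 4.0};
  const double ys[] = {16.1, 33.3, 65.6, 97.8};

  double b[4] = {100.0, 10.0, 1.0, 1.0};  // hypothetical initial guess

  ceres::Problem problem;
  for (int i = 0; i < 4; ++i) {
    // Each observation contributes one residual block; the problem
    // takes ownership of the heap-allocated cost function.
    problem.AddResidualBlock(new Rat43AnalyticOptimized(xs[i], ys[i]),
                             nullptr,  // no loss function: plain least squares
                             b);
  }

  ceres::Solver::Options options;
  options.minimizer_progress_to_stdout = true;
  ceres::Solver::Summary summary;
  ceres::Solve(options, &problem, &summary);
  return 0;
}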

How do these two implementations differ in performance?

| CostFunction           | Time (ns) |
|------------------------|-----------|
| Rat43Analytic          | 255       |
| Rat43AnalyticOptimized | 92        |

Rat43AnalyticOptimized is 2.8 times faster than Rat43Analytic. This difference in runtime is not uncommon: to get the best performance from analytically computed derivatives, it is often necessary to optimize the code to exploit common subexpressions.
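These timings are machine and compiler dependent. A micro-benchmark along the following lines (a sketch, not the harness used to produce the table above) is enough to reproduce the comparison on your own hardware:

#include <chrono>
#include <cstdio>

// Assumes Rat43Analytic and Rat43AnalyticOptimized from above are in scope.
template <typename CostFunctionT>
double AverageEvaluateNs(const int iterations) {
  double b[4] = {100.0, 10.0, 1.0, 1.0};  // hypothetical parameter values
  double* parameters[] = {b};
  double residual = 0.0;
  double jacobian_row[4];
  double* jacobians[] = {jacobian_row};
  CostFunctionT cost(0.5, 1.0);  // hypothetical (x, y) observation

  double sink = 0.0;  // accumulate results so the loop is not optimized away
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    cost.Evaluate(parameters, &residual, jacobians);
    sink += residual + jacobian_row[0];
  }
  const auto end = std::chrono::steady_clock::now();
  std::printf("(checksum %g)\n", sink);
  return std::chrono::duration<double, std::nano>(end - start).count() /
         iterations;
}

int main() {
  std::printf("Rat43Analytic:          %.0f ns\n",
              AverageEvaluateNs<Rat43Analytic>(1000000));
  std::printf("Rat43AnalyticOptimized: %.0f ns\n",
              AverageEvaluateNs<Rat43AnalyticOptimized>(1000000));
  return 0;
}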

When should analytical derivatives be used?

  1. The expressions are simple, e.g. mostly linear.

  2. A computer algebra system like Maple, Mathematica, or SymPy can be used to symbolically differentiate the objective function and generate the C++ code to compute it.

  3. There is algebraic structure in the formulae that you can exploit to get better performance than automatic differentiation.
     That said, getting maximum performance out of this approach is quite a lot of work. Before going down this path, it is useful to measure how much of the overall solve time is spent evaluating the Jacobian; remember, Amdahl's Law is your friend.

  4. There is no other way to compute the derivative; for example, you want to compute the derivative of a root $z(x, y)$ of the polynomial

     $$a_3(x,y)z^3 + a_2(x,y)z^2 + a_1(x,y)z + a_0(x,y) = 0$$

     with respect to $x$ and $y$. This requires the use of the Inverse Function Theorem (a worked sketch follows this list).

  5. You enjoy the chain rule and doing long algebraic calculations by hand.
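For item 4, the Inverse Function Theorem amounts to implicit differentiation: differentiating the defining identity with respect to $x$ (treating the root $z = z(x, y)$ as an implicit function) and solving for $\partial z / \partial x$ gives

$$\frac{\partial z}{\partial x} = -\,\frac{\dfrac{\partial a_3}{\partial x}z^3 + \dfrac{\partial a_2}{\partial x}z^2 + \dfrac{\partial a_1}{\partial x}z + \dfrac{\partial a_0}{\partial x}}{3a_3 z^2 + 2a_2 z + a_1}$$

which is valid wherever the denominator is nonzero, i.e. wherever $z$ is a simple root; the same formula with $y$ in place of $x$ gives $\partial z / \partial y$.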


Origin: blog.csdn.net/wanggao_1990/article/details/129712740