Ceres Introduction and Examples (10): On Derivatives (Automatic Derivatives)

Now we turn to automatic differentiation: an algorithm that computes exact derivatives quickly, while requiring about the same amount of work from the user as numerical differentiation. The code snippet below implements the CostFunction for Rat43.

struct Rat43CostFunctor {
  Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

  template <typename T>
  bool operator()(const T* parameters, T* residuals) const {
    const T b1 = parameters[0];
    const T b2 = parameters[1];
    const T b3 = parameters[2];
    const T b4 = parameters[3];
    residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
    return true;
  }

 private:
  const double x_;
  const double y_;
};


CostFunction* cost_function =
      new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(
        new Rat43CostFunctor(x, y));
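As a usage sketch (the parameter block b and its initial values are hypothetical, added here for illustration; x and y are one data point), the cost function is handed to the solver like this:

double b[4] = {700.0, 5.0, 0.75, 1.3};  // hypothetical initial guess
ceres::Problem problem;
problem.AddResidualBlock(cost_function, nullptr, b);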

Note that, compared to numerical differentiation, the only difference when defining a functor for automatic differentiation is the signature of operator().
PS: for reference, the code for numerical differentiation:

struct Rat43CostFunctor {
  Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

  bool operator()(const double* parameters, double* residuals) const {
    const double b1 = parameters[0];
    const double b2 = parameters[1];
    const double b3 = parameters[2];
    const double b4 = parameters[3];
    residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
    return true;
  }

  const double x_;
  const double y_;
};

CostFunction* cost_function =
  new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(
    new Rat43CostFunctor(x, y));
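The second template argument selects the finite-difference scheme; the timing table below also includes the CENTRAL and RIDDERS variants. For example, the central-difference version differs only in that argument:

CostFunction* cost_function =
  new NumericDiffCostFunction<Rat43CostFunctor, CENTRAL, 1, 4>(
    new Rat43CostFunctor(x, y));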

In the case of numerical differentiation, it is

bool operator()(const double* parameters, double* residuals) const;

For automatic differentiation, it is a templated function of the form

template <typename T> bool operator()(const T* parameters, T* residuals) const;

What is the impact of this change? The table below compares the time required to compute the residuals and Jacobian of Rat43 with different methods.

CostFunction            | Time (ns)
------------------------|----------
Rat43Analytic           |       255
Rat43AnalyticOptimized  |        92
Rat43NumericDiffForward |       262
Rat43NumericDiffCentral |       517
Rat43NumericDiffRidders |      3760
Rat43AutomaticDiff      |       129

With automatic differentiation (Rat43AutomaticDiff) we get exact derivatives for roughly the same amount of work as writing the numerical differentiation code, and it is only about 40% slower than hand-optimized analytic differentiation (Rat43AnalyticOptimized).

So how does it work? For that, we'll learn about Dual Numbers and Jets.

Dual Numbers & Jets

This subsection and the next one on implementing Jets are not strictly necessary for using automatic differentiation in the Ceres solver. However, understanding how Jets work is very useful when debugging and reasoning about the performance of automatic differentiation.

Dual numbers extend the real numbers, much as complex numbers do: whereas complex numbers augment the reals with an imaginary unit $i$ satisfying $i^2 = -1$, dual numbers introduce an infinitesimal unit $\epsilon$ satisfying $\epsilon^2 = 0$. A dual number $a + v\epsilon$ has two components: the real component $a$ and the infinitesimal component $v$.
PS: for more background on dual numbers, see https://zhuanlan.zhihu.com/p/380140763

Surprisingly, this simple change leads to a convenient method for computing exact derivatives without needing to manipulate complicated symbolic expressions. For example, consider the function

$$f(x) = x^2.$$

Then

$$\begin{split} f(10 + \epsilon) &= (10 + \epsilon)^2 \\ &= 100 + 20\epsilon + \epsilon^2 \\ &= 100 + 20\epsilon. \end{split}$$

Observing the coefficient of $\epsilon$, we find that $Df(10) = 20$. In fact, this generalizes to functions that are not polynomials. Consider an arbitrary differentiable function $f(x)$; we can compute $f(x + \epsilon)$ by considering the Taylor expansion of $f$ around $x$:

$$\begin{split} f(x + \epsilon) &= f(x) + Df(x)\,\epsilon + D^2f(x)\,\frac{\epsilon^2}{2} + D^3f(x)\,\frac{\epsilon^3}{6} + \cdots \\ f(x + \epsilon) &= f(x) + Df(x)\,\epsilon, \end{split}$$

where the second line follows because $\epsilon^2 = 0$.
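For instance, for the non-polynomial function $f(x) = e^x$:

$$f(x + \epsilon) = e^{x}e^{\epsilon} = e^x(1 + \epsilon) = e^x + e^x\epsilon,$$

since every term of the series for $e^\epsilon$ beyond $1 + \epsilon$ contains $\epsilon^2 = 0$. The coefficient of $\epsilon$ recovers $Df(x) = e^x$, as expected.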

A Jet is an $n$-dimensional dual number. It augments the real numbers with $n$ infinitesimal units $\epsilon_i,\ i = 1, \dots, n$, which satisfy $\forall i, j:\ \epsilon_i\epsilon_j = 0$. A Jet then consists of a real part $a$ and an $n$-dimensional infinitesimal part $\mathbf{v}$, i.e.,

$$x = a + \sum_j v_j \epsilon_j.$$

The sum notation gets tedious, so we simply write

$$x = a + \mathbf{v},$$

where the $\epsilon_i$ are implicit. Using the same Taylor series expansion as above, we see that

$$f(a + \mathbf{v}) = f(a) + Df(a)\,\mathbf{v}.$$

Similarly, for a multivariate function $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$, evaluated at $x_i = a_i + \mathbf{v}_i,\ \forall i = 1, \dots, n$, where each $\mathbf{v}_i = e_i$ is the $i$-th standard basis vector, the expansion simplifies to

$$f(x_1, \dots, x_n) = f(a_1, \dots, a_n) + \sum_i D_i f(a_1, \dots, a_n)\,\epsilon_i,$$

and we can extract the coordinates of the Jacobian by examining the coefficients of $\epsilon_i$.
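For example, for $f(x_0, x_1) = x_0 x_1$ evaluated at $x_0 = a_0 + \epsilon_0$, $x_1 = a_1 + \epsilon_1$:

$$f(a_0 + \epsilon_0,\ a_1 + \epsilon_1) = a_0 a_1 + a_1\epsilon_0 + a_0\epsilon_1,$$

since the cross term $\epsilon_0\epsilon_1$ vanishes. The coefficients of $\epsilon_0$ and $\epsilon_1$ give $D_1 f = a_1$ and $D_2 f = a_0$.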

Implementing Jets

To be useful in practice, we need to be able to evaluate an arbitrary function f not only on real numbers but also on dual numbers, and we usually do not evaluate functions via their Taylor expansions. This is where C++ templates and operator overloading come into play. The code snippet below contains a simple implementation of a Jet and some operators/functions to manipulate them.

#include <Eigen/Core>

template<int N> struct Jet {
  Jet() : a(0.0) { v.setZero(); }
  Jet(double value, const Eigen::Matrix<double, 1, N>& dual)
      : a(value), v(dual) {}

  double a;                       // real part
  Eigen::Matrix<double, 1, N> v;  // infinitesimal part
};

template<int N> Jet<N> operator+(const Jet<N>& f, const Jet<N>& g) {
  return Jet<N>(f.a + g.a, f.v + g.v);
}

template<int N> Jet<N> operator-(const Jet<N>& f, const Jet<N>& g) {
  return Jet<N>(f.a - g.a, f.v - g.v);
}

template<int N> Jet<N> operator*(const Jet<N>& f, const Jet<N>& g) {
  return Jet<N>(f.a * g.a, f.a * g.v + f.v * g.a);
}

template<int N> Jet<N> operator/(const Jet<N>& f, const Jet<N>& g) {
  return Jet<N>(f.a / g.a, f.v / g.a - f.a * g.v / (g.a * g.a));
}

template <int N> Jet<N> exp(const Jet<N>& f) {
  return Jet<N>(exp(f.a), exp(f.a) * f.v);
}

// This is a simple implementation for illustration purposes; the
// actual implementation of pow requires careful handling of a number
// of corner cases.
template <int N> Jet<N> pow(const Jet<N>& f, const Jet<N>& g) {
  return Jet<N>(pow(f.a, g.a),
                g.a * pow(f.a, g.a - 1.0) * f.v +
                pow(f.a, g.a) * log(f.a) * g.v);
}
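To see these operators in action, here is a minimal sketch (using the Jet type defined above) that differentiates f(x) = x^2 at x = 10, recovering the value 100 and the derivative 20 from the earlier dual-number calculation:

Jet<1> x;
x.a = 10.0;    // real part: the point of evaluation
x.v[0] = 1.0;  // infinitesimal part: seed dx/dx = 1
Jet<1> y = x * x;
// y.a == 100.0 is f(10); y.v[0] == 20.0 is Df(10).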

With these overloaded functions, we can now call Rat43CostFunctor with an array of Jets instead of doubles. Combining this with properly initialized Jets, we can compute the Jacobian as follows:

class Rat43Automatic : public ceres::SizedCostFunction<1, 4> {
 public:
  Rat43Automatic(const Rat43CostFunctor* functor) : functor_(functor) {}
  virtual ~Rat43Automatic() {}
  virtual bool Evaluate(double const* const* parameters,
                        double* residuals,
                        double** jacobians) const {
    // Just evaluate the residuals if Jacobians are not required.
    if (!jacobians) return (*functor_)(parameters[0], residuals);

    // Initialize the Jets: the real parts take the parameter values,
    // and the i-th Jet's infinitesimal part is the i-th basis vector.
    ceres::Jet<4> jets[4];
    for (int i = 0; i < 4; ++i) {
      jets[i].a = parameters[0][i];
      jets[i].v.setZero();
      jets[i].v[i] = 1.0;
    }

    ceres::Jet<4> result;
    (*functor_)(jets, &result);

    // Copy the values out of the Jet.
    residuals[0] = result.a;
    for (int i = 0; i < 4; ++i) {
      jacobians[0][i] = result.v[i];
    }
    return true;
  }

 private:
  std::unique_ptr<const Rat43CostFunctor> functor_;
};

In fact, this is how AutoDiffCostFunction works.
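As a usage sketch (the parameter values here are hypothetical, chosen only for illustration), the residual and Jacobian can be evaluated directly through the CostFunction interface:

const double parameters[4] = {700.0, 5.0, 0.75, 1.3};  // hypothetical values
const double* parameter_blocks[1] = {parameters};
double residuals[1];
double jacobian_row[4];
double* jacobians[1] = {jacobian_row};
cost_function->Evaluate(parameter_blocks, residuals, jacobians);
// residuals[0] holds the model value minus y;
// jacobian_row holds the derivatives with respect to b1..b4.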

Pitfalls

Automatic differentiation frees the user from the burden of computing and reasoning about Jacobian symbolic expressions, but this freedom comes at a price. For example, consider the following simple functor:

struct Functor {
  template <typename T>
  bool operator()(const T* x, T* residual) const {
    residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1]);
    return true;
  }
};

Looking at the residual computation in this code, one does not foresee any problems. However, if we look at the analytic expressions for the Jacobian:

$$\begin{split} y &= 1 - \sqrt{x_0^2 + x_1^2} \\ D_1 y &= -\frac{x_0}{\sqrt{x_0^2 + x_1^2}},\quad D_2 y = -\frac{x_1}{\sqrt{x_0^2 + x_1^2}}, \end{split}$$

we find that both derivatives are indeterminate at $x_0 = 0,\ x_1 = 0$ (a $0/0$ expression).

There is no single solution to this problem. In some cases one needs to explicitly identify the possible points of indeterminacy and use alternative expressions based on L'Hopital's rule (see, for example, some of the conversion routines in rotation.h); in other cases, the expression may need to be regularized to eliminate these points.
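For illustration, a minimal sketch of the guarding approach (this example is an assumption, not code from Ceres; it relies on the fact that Ceres's Jet type overloads the comparison operators to compare real parts):

struct GuardedFunctor {
  template <typename T>
  bool operator()(const T* x, T* residual) const {
    const T squared_norm = x[0] * x[0] + x[1] * x[1];
    if (squared_norm > T(0.0)) {
      residual[0] = T(1.0) - sqrt(squared_norm);
    } else {
      // At the origin the derivative expression is 0/0; substitute the
      // limiting value of the residual. Its derivative is taken as zero
      // here, which is one arbitrary but finite choice.
      residual[0] = T(1.0);
    }
    return true;
  }
};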

Origin: blog.csdn.net/wanggao_1990/article/details/129724388