Ceres Solver Official Tutorial Study Notes (9): Automatic Derivatives

This post is a translation of the official Ceres Solver tutorial Automatic Derivatives, and also draws on the blog post Ceres-Solver学习笔记(5) by 少年的此间.

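We now turn to automatic differentiation. It is a technique that computes exact derivatives quickly while asking about the same effort from the user as numeric differentiation. The following code fragment implements an automatically differentiated CostFunction for Rat43 (see the previous two posts).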

struct Rat43CostFunctor {
  Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

  template <typename T>
  bool operator()(const T* parameters, T* residuals) const {  // Change 1: operator() is templated on T
    const T b1 = parameters[0];
    const T b2 = parameters[1];
    const T b3 = parameters[2];
    const T b4 = parameters[3];
    residuals[0] = b1 * pow(1.0 + exp(b2 -  b3 * x_), -1.0 / b4) - y_;
    return true;
  }

  private:
    const double x_;
    const double y_;
};


CostFunction* cost_function =
      new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(    // Change 2: wrap the functor in AutoDiffCostFunction
        new Rat43CostFunctor(x, y));

For comparison, here is the corresponding numeric differentiation code.

struct Rat43CostFunctor {
  Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

  bool operator()(const double* parameters, double* residuals) const {
    const double b1 = parameters[0];
    const double b2 = parameters[1];
    const double b3 = parameters[2];
    const double b4 = parameters[3];
    residuals[0] = b1 * pow(1.0 + exp(b2 -  b3 * x_), -1.0 / b4) - y_;
    return true;
  }

  const double x_;
  const double y_;
};

CostFunction* cost_function =
  new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(
    new Rat43CostFunctor(x, y));

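Note that, compared with numeric differentiation, the only difference when defining the functor for automatic differentiation is the signature of operator(). The two signatures are:

// Numeric differentiation
bool operator()(const double* parameters, double* residuals) const;

// Automatic differentiation
template <typename T> bool operator()(const T* parameters, T* residuals) const;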

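So what does this change buy us? The following table compares the time needed to compute the residual and the Jacobian for Rat43 using the various methods.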

CostFunction	Time (ns)
Rat43Analytic	255
Rat43AnalyticOptimized	92
Rat43NumericDiffForward	262
Rat43NumericDiffCentral	517
Rat43NumericDiffRidders	3760
Rat43AutomaticDiff	129

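Automatic differentiation (Rat43AutomaticDiff) gives exact derivatives with about the same amount of code as numeric differentiation, yet it is only $40\%$ slower than the hand-optimized analytic derivatives (129 ns versus 92 ns). To understand how it works, we need to learn about dual numbers and Jets.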

Dual Numbers and Jets

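Reading this section and the next one on implementing Jets is not directly relevant to using automatic differentiation in Ceres Solver. However, knowing the basics of how Jets work is very useful when debugging and reasoning about the performance of automatic differentiation.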

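Dual numbers are an extension of the real numbers, analogous to complex numbers. Whereas complex numbers augment the reals by introducing an imaginary unit $i$ with $i^2 = -1$, dual numbers introduce an infinitesimal unit $\epsilon$ with $\epsilon^2 = 0$ (intuitively, its square is too small to matter). A dual number $a + v\epsilon$ has two components: the real component $a$ and the infinitesimal component $v$. Surprisingly, this simple change leads to a convenient method for computing exact derivatives without manipulating complicated symbolic expressions.
For example, consider the function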

$f(x) = x^2 ,$

Then

$\begin{aligned} f(10 + \epsilon) &= (10 + \epsilon)^2\\ &= 100 + 20 \epsilon + \epsilon^2\\ &= 100 + 20 \epsilon \end{aligned}$

Looking at the coefficient of $\epsilon$, we find that $Df(10) = 20$. Indeed, this generalizes to functions that are not polynomials. Consider an arbitrary differentiable function $f(x)$. We can evaluate $f(x + \epsilon)$ by taking a Taylor expansion near $x$, which gives the infinite series

$\begin{aligned} f(x + \epsilon) &= f(x) + Df(x) \epsilon + D^2f(x) \frac{\epsilon^2}{2} + D^3f(x) \frac{\epsilon^3}{6} + \cdots\\ f(x + \epsilon) &= f(x) + Df(x) \epsilon \end{aligned}$

Here we are using the fact that $\epsilon^2 = 0$, so every term of order two and higher vanishes.
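
As a quick check of the $f(x) = x^2$ example, here is a minimal sketch of a scalar dual number in C++. The names `Dual`, `Square` and `DerivativeOfSquare` are made up for this illustration and are not part of Ceres; only `+` and `*` are overloaded, which is all this particular function needs.

```cpp
#include <cassert>

// A scalar dual number a + v * epsilon, with epsilon^2 = 0.
// Illustrative only; this is not a Ceres type.
struct Dual {
  double a;  // real part
  double v;  // coefficient of epsilon
};

Dual operator+(const Dual& f, const Dual& g) {
  return Dual{f.a + g.a, f.v + g.v};
}

// (f.a + f.v e)(g.a + g.v e) = f.a g.a + (f.a g.v + f.v g.a) e, since e^2 = 0.
Dual operator*(const Dual& f, const Dual& g) {
  return Dual{f.a * g.a, f.a * g.v + f.v * g.a};
}

// Written once as a template, so it accepts both double and Dual.
template <typename T>
T Square(const T& x) {
  return x * x;
}

// Evaluates Square at x + epsilon and returns the coefficient of epsilon.
double DerivativeOfSquare(double x) {
  const Dual y = Square(Dual{x, 1.0});
  assert(y.a == x * x);  // the real part is the ordinary function value
  return y.v;
}
```

Calling `DerivativeOfSquare(10.0)` returns 20, the same coefficient of $\epsilon$ obtained by hand above.
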
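A Jet is an $n$-dimensional dual number. We define $n$ infinitesimal units $\epsilon_i,\ i=1,...,n$ with the property that $\forall i, j\ :\epsilon_i\epsilon_j = 0$. A Jet then consists of a real part $a$ and an $n$-dimensional infinitesimal part $\mathbf{v}$:

$x = a + \sum_j v_{j} \epsilon_j$

Writing out the summation gets tedious, so we abbreviate this as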

$x = a + \mathbf{v}.$

Then, using the same Taylor series expansion as above, we see that:

$f(a + \mathbf{v}) = f(a) + Df(a) \mathbf{v}.$

The same holds for a multivariate function $f:\mathbb{R}^{n}\rightarrow \mathbb{R}^m$. Evaluating it at $x_i = a_i + \mathbf{v}_i,\ \forall i = 1,...,n$ gives:

$f(x_1,..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \mathbf{v}_i$

If each infinitesimal part is chosen as $\mathbf{v}_i = e_i$, the $i^{\text{th}}$ standard basis vector, the expression above simplifies to

$f(x_1,..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \epsilon_i$

and we can extract the entries of the Jacobian by reading off the coefficients of $\epsilon_i$.
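
As a small worked example (not taken from the original tutorial), take $f(x_1, x_2) = x_1^2 x_2$ and evaluate it at $x_1 = a_1 + \epsilon_1$, $x_2 = a_2 + \epsilon_2$, dropping every product $\epsilon_i \epsilon_j$:

$\begin{aligned} f(a_1 + \epsilon_1, a_2 + \epsilon_2) &= (a_1 + \epsilon_1)^2 (a_2 + \epsilon_2)\\ &= (a_1^2 + 2 a_1 \epsilon_1)(a_2 + \epsilon_2)\\ &= a_1^2 a_2 + 2 a_1 a_2 \epsilon_1 + a_1^2 \epsilon_2 \end{aligned}$

The coefficients give $D_1 f = 2 a_1 a_2$ and $D_2 f = a_1^2$, which agrees with differentiating by hand.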

Implementing Jets

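For the above to work in practice, we need to be able to evaluate an arbitrary function $f$ not only on real numbers but also on dual numbers. However, functions are not usually evaluated through their Taylor expansions. This is where C++ templates and operator overloading come in. The following code fragment is a simple implementation of a Jet together with some operators and functions that act on it.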

#include <cmath>
#include <Eigen/Core>

template<int N> struct Jet {
  Jet() : a(0.0) { v.setZero(); }
  Jet(double a_in, const Eigen::Matrix<double, 1, N>& v_in) : a(a_in), v(v_in) {}

  double a;                       // real part
  Eigen::Matrix<double, 1, N> v;  // infinitesimal part
};

template<int N> Jet<N> operator+(const Jet<N>& f, const Jet<N>& g) {
  return Jet<N>(f.a + g.a, f.v + g.v);
}

template<int N> Jet<N> operator-(const Jet<N>& f, const Jet<N>& g) {
  return Jet<N>(f.a - g.a, f.v - g.v);
}

template<int N> Jet<N> operator*(const Jet<N>& f, const Jet<N>& g) {
  return Jet<N>(f.a * g.a, f.a * g.v + f.v * g.a);
}

template<int N> Jet<N> operator/(const Jet<N>& f, const Jet<N>& g) {
  return Jet<N>(f.a / g.a, f.v / g.a - f.a * g.v / (g.a * g.a));
}

template <int N> Jet<N> exp(const Jet<N>& f) {
  return Jet<N>(std::exp(f.a), std::exp(f.a) * f.v);
}

// This is a simple implementation for illustration purposes, the
// actual implementation of pow requires careful handling of a number
// of corner cases.
template <int N>  Jet<N> pow(const Jet<N>& f, const Jet<N>& g) {
  return Jet<N>(std::pow(f.a, g.a),
                g.a * std::pow(f.a, g.a - 1.0) * f.v +
                std::pow(f.a, g.a) * std::log(f.a) * g.v);
}
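
To see where the multiplication and division rules come from, write $f = a + \mathbf{u}$ and $g = b + \mathbf{v}$ (notation chosen here for brevity) and drop all products of infinitesimals:

$\begin{aligned} f g &= (a + \mathbf{u})(b + \mathbf{v}) = ab + a\mathbf{v} + b\mathbf{u}\\ \frac{f}{g} &= \frac{(a + \mathbf{u})(b - \mathbf{v})}{(b + \mathbf{v})(b - \mathbf{v})} = \frac{ab - a\mathbf{v} + b\mathbf{u}}{b^2} = \frac{a}{b} + \frac{\mathbf{u}}{b} - \frac{a\mathbf{v}}{b^2} \end{aligned}$

These are the real and infinitesimal parts returned by operator* and operator/ above, namely f.a * g.v + f.v * g.a and f.v / g.a - f.a * g.v / (g.a * g.a).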

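With these overloaded functions in hand, we can now call Rat43CostFunctor (see Ceres Solver Official Tutorial Study Notes (8): Numeric Derivatives) with an array of Jets instead of doubles. Combining that with appropriately initialized Jets lets us compute the Jacobian as follows: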

class Rat43Automatic : public ceres::SizedCostFunction<1,4> {
 public:
  Rat43Automatic(const Rat43CostFunctor* functor) : functor_(functor) {}
  virtual ~Rat43Automatic() {}
  virtual bool Evaluate(double const* const* parameters,
                        double* residuals,
                        double** jacobians) const {
    // Just evaluate the residuals if Jacobians are not required.
    if (!jacobians) return (*functor_)(parameters[0], residuals);

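    // Initialize the Jets, one per parameter. Each Jet holds the parameter
    // value and a unit infinitesimal in its own direction.
    ceres::Jet<double, 4> jets[4];
    for (int i = 0; i < 4; ++i) {
      jets[i].a = parameters[0][i];
      jets[i].v.setZero();
      jets[i].v[i] = 1.0;
    }

    ceres::Jet<double, 4> result;
    (*functor_)(jets, &result);

    // Copy the values out of the Jet: the real part is the residual and the
    // coefficients of the infinitesimal units form the Jacobian row.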
    residuals[0] = result.a;
    for (int i = 0; i < 4; ++i) {
      jacobians[0][i] = result.v[i];
    }
    return true;
  }

 private:
  std::unique_ptr<const Rat43CostFunctor> functor_;
};

This is essentially how AutoDiffCostFunction works.
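
To try the idea without Ceres or Eigen, the following is a self-contained sketch that evaluates the Rat43 residual and its 1x4 Jacobian with a tiny Jet built on std::array. The helper names `Combine` and `EvaluateRat43` are hypothetical, the `pow` overload assumes a positive base, and none of this is the actual ceres::Jet implementation.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Minimal Jet with N infinitesimal components. Illustrative only.
template <int N>
struct Jet {
  double a = 0.0;             // real part
  std::array<double, N> v{};  // coefficients of epsilon_1 .. epsilon_N

  Jet() = default;
  Jet(double value) : a(value) {}  // constants have a zero infinitesimal part

  // Builds a Jet with the given real part and v[i] = df * f.v[i] + dg * g.v[i].
  static Jet Combine(double real, double df, const Jet& f, double dg, const Jet& g) {
    Jet r(real);
    for (int i = 0; i < N; ++i) r.v[i] = df * f.v[i] + dg * g.v[i];
    return r;
  }

  friend Jet operator+(const Jet& f, const Jet& g) { return Combine(f.a + g.a, 1.0, f, 1.0, g); }
  friend Jet operator-(const Jet& f, const Jet& g) { return Combine(f.a - g.a, 1.0, f, -1.0, g); }
  friend Jet operator*(const Jet& f, const Jet& g) { return Combine(f.a * g.a, g.a, f, f.a, g); }
  friend Jet operator/(const Jet& f, const Jet& g) {
    return Combine(f.a / g.a, 1.0 / g.a, f, -f.a / (g.a * g.a), g);
  }
  friend Jet exp(const Jet& f) {
    const double e = std::exp(f.a);
    return Combine(e, e, f, 0.0, f);
  }
  // Naive pow: assumes f.a > 0 so that log(f.a) is finite.
  friend Jet pow(const Jet& f, const Jet& g) {
    const double p = std::pow(f.a, g.a);
    return Combine(p, g.a * std::pow(f.a, g.a - 1.0), f, p * std::log(f.a), g);
  }
};

struct Rat43CostFunctor {
  Rat43CostFunctor(double x, double y) : x_(x), y_(y) {}

  template <typename T>
  bool operator()(const T* parameters, T* residuals) const {
    using std::exp;  // lets the same code work for T = double
    using std::pow;
    const T b1 = parameters[0];
    const T b2 = parameters[1];
    const T b3 = parameters[2];
    const T b4 = parameters[3];
    residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
    return true;
  }

  double x_;
  double y_;
};

// Evaluates the residual and its Jacobian row at the four given parameters.
void EvaluateRat43(const Rat43CostFunctor& functor, const double* parameters,
                   double* residual, double* jacobian) {
  Jet<4> jets[4];
  for (int i = 0; i < 4; ++i) {
    jets[i].a = parameters[i];
    jets[i].v[i] = 1.0;  // unit infinitesimal in direction i
  }
  Jet<4> result;
  functor(jets, &result);
  *residual = result.a;
  for (int i = 0; i < 4; ++i) jacobian[i] = result.v[i];
}
```

A simple sanity check is to compare the Jacobian returned by `EvaluateRat43` with central finite differences of the same functor evaluated on plain doubles; the two should agree up to the finite-difference error.
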

Pitfalls

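Automatic differentiation frees the user from computing and reasoning about symbolic expressions for the Jacobians, but this convenience comes at a cost. For example, consider the following simple functor: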

struct Functor {
  template <typename T> bool operator()(const T* x, T* residual) const {
    residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1]);
    return true;
  }
};

Looking at the code that computes the residual, one does not foresee any problems. However, if we look at the analytical expressions for the Jacobian:

$\begin{split} y &= 1 - \sqrt{x_0^2 + x_1^2}\\ D_1y &= -\frac{x_0}{\sqrt{x_0^2 + x_1^2}},\ D_2y = -\frac{x_1}{\sqrt{x_0^2 + x_1^2}}\end{split}$

we find that it is an indeterminate form at $x_0 = 0, x_1 = 0$.

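There is no single solution to this problem. In some cases one needs to reason explicitly about the points where the indeterminacy may occur and use alternate expressions derived with L'Hopital's rule (see, for example, some of the conversion routines in rotation.h). In other cases, one may need to regularize the expressions to eliminate these points.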
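
The failure at the origin can be reproduced with a small two-component Jet. The sketch below is illustrative only: `Jet2`, `EvaluateAt`, `RegularizedFunctor` and the constant `kEps` are assumptions made for this example, not Ceres code, and the regularized variant changes the function slightly, so whether it is acceptable depends on the problem.

```cpp
#include <cassert>
#include <cmath>

// Minimal two-component jet; illustrative only, not ceres::Jet.
struct Jet2 {
  double a;     // real part
  double v[2];  // coefficients of epsilon_1 and epsilon_2
  Jet2(double a_in = 0.0, double v0 = 0.0, double v1 = 0.0) : a(a_in), v{v0, v1} {}
};

Jet2 operator+(const Jet2& f, const Jet2& g) {
  return {f.a + g.a, f.v[0] + g.v[0], f.v[1] + g.v[1]};
}
Jet2 operator-(const Jet2& f, const Jet2& g) {
  return {f.a - g.a, f.v[0] - g.v[0], f.v[1] - g.v[1]};
}
Jet2 operator*(const Jet2& f, const Jet2& g) {
  return {f.a * g.a, f.a * g.v[0] + f.v[0] * g.a, f.a * g.v[1] + f.v[1] * g.a};
}
// d sqrt(f) = df / (2 sqrt(f)); this is 0 / 0 = NaN when f.a == 0 and f.v == 0.
Jet2 sqrt(const Jet2& f) {
  const double s = std::sqrt(f.a);
  return {s, f.v[0] / (2.0 * s), f.v[1] / (2.0 * s)};
}

// The functor from the text.
struct Functor {
  template <typename T> bool operator()(const T* x, T* residual) const {
    residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1]);
    return true;
  }
};

// Hypothetical regularized variant: kEps keeps the argument of sqrt positive.
struct RegularizedFunctor {
  template <typename T> bool operator()(const T* x, T* residual) const {
    const double kEps = 1e-12;  // assumed value, chosen only for illustration
    residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1] + kEps);
    return true;
  }
};

// Evaluates a functor at (x0, x1) with unit infinitesimals on each input.
template <typename F>
Jet2 EvaluateAt(const F& functor, double x0, double x1) {
  const Jet2 x[2] = {Jet2(x0, 1.0, 0.0), Jet2(x1, 0.0, 1.0)};
  Jet2 result;
  functor(x, &result);
  return result;
}
```

At the origin, `EvaluateAt(Functor(), 0.0, 0.0)` returns a residual of 1 but NaN in both derivative slots, while the regularized variant returns zero derivatives at the price of shifting the residual by about 1e-6. Away from the origin the plain functor behaves as expected.
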