Ceres introduction and examples (7) On Derivatives (Spivak Notation)

On Derivatives

The Ceres Solver, like all gradient-based optimization algorithms, relies on being able to evaluate the objective function and its derivatives at any point in its domain. Indeed, defining the objective function and its Jacobian is the main task the user performs when solving an optimization problem with Ceres, and computing the Jacobian correctly and efficiently is the key to good performance.

Users can choose among the following three differentiation methods:

  • 1. Automatic Derivatives: Ceres uses C++ templates and operator overloading to compute exact derivatives automatically.
  • 2. Numeric Derivatives: Ceres approximates derivatives using finite differences.
  • 3. Analytic Derivatives: the user derives the derivatives by hand, or with tools such as Maple or Mathematica, and implements them in a CostFunction.

Which of these three approaches to use (alone or in combination) depends on the situation and the trade-offs the user is willing to make. Unfortunately, numerical optimization textbooks rarely discuss these issues in detail, leaving users to their own devices.

The purpose of this article is to fill this gap and describe each of these three methods in detail in the context of the Ceres Solver so users can make an informed choice.
For the impatient, here is some high-level advice:

  • 1. Use Automatic Derivatives.
  • 2. In some cases it may be worth using Analytic Derivatives.
  • 3. Avoid Numeric Derivatives. Use them as a measure of last resort, mostly to interface with external libraries.

Spivak Notation

To simplify reading and understanding, derivatives are written in Spivak notation. For a univariate function $f$, $f(a)$ is its value at $a$, $Df$ is its first derivative, and $Df(a)$ is its first derivative evaluated at $a$, i.e.

$$Df(a) = \left. \frac{d}{dx} f(x) \right|_{x = a}$$

$D^k f(a)$ denotes the $k$-th derivative of $f$ at $a$. For a bivariate function $g(x, y)$, $D_1 g$ and $D_2 g$ denote the two partial derivatives of $g$, namely

$$D_1 g = \frac{\partial}{\partial x} g(x, y) \quad \text{and} \quad D_2 g = \frac{\partial}{\partial y} g(x, y).$$

$Dg$ denotes the Jacobian of $g$:

$$Dg = \begin{bmatrix} D_1 g & D_2 g \end{bmatrix}$$

More generally, for a multivariate function $g : \mathbb{R}^n \longrightarrow \mathbb{R}^m$, $Dg$ denotes the $m \times n$ Jacobian matrix, and $D_i g$ is the partial derivative of $g$ with respect to its $i$-th argument, i.e. the $i$-th column of $Dg$.

Finally, $D^2_1 g$ and $D_1 D_2 g$ denote higher-order partial derivatives.
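As a concrete illustration of the notation (this worked example is not from the original text), take $g(x, y) = x^2 y$:

```latex
g(x, y) = x^2 y, \qquad
D_1 g = 2xy, \qquad
D_2 g = x^2, \\
Dg = \begin{bmatrix} 2xy & x^2 \end{bmatrix}, \qquad
D_1^2 g = 2y, \qquad
D_1 D_2 g = 2x.
```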


Origin blog.csdn.net/wanggao_1990/article/details/129712087