[Tutorial] Learning Deep Declarative Networks (DDN) [Theory + Hands-On Code] [Chapter 1 - DDN Background and Motivation]


Preface

This series of notes was written while following the official tutorial video released by the Deep Declarative Networks development team on YouTube. If you can access YouTube, watching the original video will deepen your understanding. The link is: https://www.youtube.com/watch?v=fnJIj906qoA

All formulas and theory in this series are based on the following paper:
Gould, S., Hartley, R., Campbell, D.: Deep Declarative Networks: A New Hope. Tech. rep., Australian National University (arXiv:1909.04866), Sep 2019,
and its supplementary material published at ECCV 2020.


Introduction

Traditional deep learning architectures involve a combination of simple and well-defined feed-forward processing functions (inner products, convolutions, element-wise nonlinear transformations, and pooling operations). Over the past few years, researchers have been exploring deep learning models with embedded differentiable optimization problems, and these models have recently been applied to solve problems in computer vision and other areas of machine learning.

[Figure: a network with an embedded optimization problem as one of its layers]

A network that embeds such an optimization problem as one of its layers is called a deep declarative network (DDN). Importantly, even without prior knowledge of the algorithm used to solve the optimization problem, the gradient of the optimization problem's solution with respect to the input and parameters can be computed, enabling efficient backpropagation and end-to-end learning.

Topics

This tutorial will introduce DDN and its variants. We will discuss the theory behind differentiable optimization, specifically the problems that need to be overcome when developing such models and applying these models to computer vision problems. Additionally, this tutorial will provide practice in designing and implementing custom declarative nodes. Tutorial topics include:

  • Declarative end-to-end learnable processing nodes.
  • Differentiable convex optimization problems.
  • Declarative nodes for computer vision applications.
  • Implementation techniques and considerations.

1. Neural Network Basics

In the first chapter, we will understand the motivation and implementation details of DDN through three key ideas.

1.1 Neural network model (the first key idea)

First, we start with an ordinary learning model, i.e., a deep neural network. We can regard it as a data-flow graph, which defines how data exist in the network and how they are processed during forward propagation in order to produce predictions or estimates.
[Figure: a neural network viewed as a data-flow graph]

where $f$ is the function computed by each node and $\theta$ is a learnable parameter of that node.
The output $y$ is defined as a composition of these functions, as in the following formula:

$$y = f_n(\cdots f_2(f_1(x; \theta_1); \theta_2) \cdots; \theta_n)$$

Each function takes its input from its parent node(s) in the graph, and the graph also defines how the error signal (the gradient) propagates, so that backpropagation can update the parameter $\theta$ of every node.
The figure below shows the forward and backward propagation of an ordinary neural network:
[Figure: forward and backward passes of a standard neural network]
Through the chain rule we can update the parameters of the entire network. This is basic neural network knowledge and will not be repeated here.
From this figure we can see that the forward and backward passes of an ordinary neural network are handled as separate computations, and through automatic differentiation we can easily carry out the gradient computation and the update of every node's parameters.
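To make this concrete, here is a minimal sketch of that workflow (my own example, not from the original video; it assumes PyTorch is available): we only write the forward computation of a single node, and automatic differentiation produces the backward pass for us.

```python
import torch

# A single "imperative" node: an explicit forward function y = f(x; theta).
x = torch.tensor([1.0, 2.0, 3.0])
theta = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)

# Forward pass: a closed-form computation written by the user.
y = torch.tanh(theta * x).sum()

# Backward pass: autograd replays the recorded graph with the chain rule,
# so dy/dtheta never has to be derived by hand.
y.backward()
print(theta.grad)
```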

1.2 Neural Network Example (Second Key Idea)

Note: the Babylonian algorithm used in this example does not actually require backpropagation; the presenter is only using it as an illustration.

Take the forward pass of Newton's iteration (the Babylonian square-root algorithm) as an example:

$$y_{t+1} = \frac{1}{2}\left(y_t + \frac{x}{y_t}\right)$$

We can see that in the forward pass $y$ is updated iteratively, and the final result $y_T$ is output. For this iterative process, the chain rule gives the backpropagation gradient formula:

[Formula from the video: chain-rule expansion of the gradient through the iterations]
I don't think the partial-derivative-of-$x$ part of the formula given here is quite right, but it does not affect the overall understanding; after all, it is only an example. (Of course, my understanding may be off; discussion is welcome.)
For the backward pass, since the quantity being computed is $y = \sqrt{x}$, backpropagation uses

$$\frac{dy}{dx} = \frac{1}{2\sqrt{x}} = \frac{1}{2\,y_T}$$

This gives us the function for the backward part, and in this way we obtain formulas for the whole process: an iterative forward function that outputs $y_T \approx \sqrt{x}$, and a closed-form backward function that reuses the cached forward result $y_T$.
Note that the process demonstrated here in the original video is not a neural network; it simply illustrates this style of computation, a closed-form forward function combined with the chain rule, so that readers can better appreciate the differentiable optimization problems introduced later, which have no closed-form forward implementation.
To summarize: when we build neural networks we normally use automatic differentiation to derive backpropagation from the forward pass. This works very well for ordinary networks, but it is worth remembering that automatic differentiation is not mandatory; it is equally feasible to implement the forward pass and the backward pass separately.
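As an illustration of that last point, the sketch below (my own PyTorch example; the class name `BabylonianSqrt` is made up) implements the Babylonian iteration as a custom autograd function: the forward pass runs the iteration above, while the backward pass is written by hand from $dy/dx = 1/(2\sqrt{x})$ instead of being obtained by differentiating through the loop.

```python
import torch

class BabylonianSqrt(torch.autograd.Function):
    """Square root via the Babylonian (Newton) iteration, with a hand-written
    backward pass instead of differentiating through the iterations."""

    @staticmethod
    def forward(ctx, x, num_iters=10):
        y = torch.ones_like(x)            # initial guess y_0 = 1
        for _ in range(num_iters):
            y = 0.5 * (y + x / y)         # y_{t+1} = (y_t + x / y_t) / 2
        ctx.save_for_backward(y)          # cache the forward result
        return y

    @staticmethod
    def backward(ctx, grad_output):
        (y,) = ctx.saved_tensors
        # Since y ~= sqrt(x), we have dy/dx = 1 / (2 sqrt(x)) = 1 / (2 y).
        return grad_output / (2.0 * y), None

x = torch.tensor([2.0, 9.0], requires_grad=True)
y = BabylonianSqrt.apply(x)
y.sum().backward()
print(y, x.grad)   # approximately sqrt(x) and 1 / (2 sqrt(x))
```

Decoupling the two passes like this is exactly what declarative nodes will require later, because their forward pass has no closed form that automatic differentiation could trace through.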

1.3 Implicit functions in deep learning models (the third key idea)

In the previous example the nodes of the neural network had clear function definitions: we can compute the output for a given input from an explicit mathematical expression, and such expressions are usually easy to differentiate. Here, however, we study another kind of function, the implicit function:
[Figure: an example of an implicitly defined function]
Although the input and output of such a function satisfy a certain relationship, the output is not given directly as an explicit function of the input.
The example shown in the figure above has one input variable $x$ and two output variables $y_1$ and $y_2$.
Here the relationship between $y$ and $x$ can be represented by an implicit function:

$$\psi(x, y) = (x - y_1^2 + y_2^2)^2 + (y_1^2 + y_2^2 - 1)^2$$

For the equation

$$\psi(x, y) = 0$$

we need a way to find the corresponding $y$; in the second half of the tutorial we will explain some methods for solving such implicit equations.
We can now write the solution space for the example just presented.
[Formula: $\psi$ written as two parts, the input-output link highlighted in blue and the constraint on $y$ in yellow]
We can think of this formula as a combination of a term linking the input $x$ and the output $y$ (the blue part, $x - y_1^2 + y_2^2$) and a constraint on $y$ (the yellow part, $y_1^2 + y_2^2 - 1$).
The following figure shows the zero level set of this example:
[Figure: zero level set of $\psi(x, y)$]
The figure shows how the quantities relate to each other. Even without an explicit functional expression, we can still use the implicit function theorem to compute the gradient of $y$ with respect to $x$. For every $x$ in the range $(-1, 1)$ there are exactly four solutions, and for $x = \pm 1$ there are exactly two solutions (those where $y_1$ or $y_2$ equals 0).
The implicit function theorem answers how to compute the value of $dy/dx$ when $\psi(x, y) = 0$. Differentiating this identity with respect to $x$ via partial derivatives gives

$$\frac{\partial \psi}{\partial x} + \frac{\partial \psi}{\partial y}\frac{dy}{dx} = 0$$

from which we can find the value of $dy/dx$:

$$\frac{dy}{dx} = -\left(\frac{\partial \psi}{\partial y}\right)^{-1}\frac{\partial \psi}{\partial x}$$

This formula is the implicit function theorem, which we will use later.
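To check the theorem numerically on this example, the sketch below (my own code, using NumPy) treats the two conditions inside $\psi$ as a system $h(x, y) = 0$ of two equations, applies $dy/dx = -(\partial h/\partial y)^{-1}\,\partial h/\partial x$, and compares the result with finite differences of one solution branch.

```python
import numpy as np

# Treat the two conditions inside psi as a system h(x, y) = 0:
#   h1 = x - y1^2 + y2^2 = 0   (link between input and output)
#   h2 = y1^2 + y2^2 - 1 = 0   (constraint on y)
def solve_y(x):
    # One of the four solutions for x in (-1, 1): the all-positive branch.
    return np.array([np.sqrt((1 + x) / 2), np.sqrt((1 - x) / 2)])

def ift_grad(x, y):
    y1, y2 = y
    dh_dy = np.array([[-2 * y1, 2 * y2],
                      [ 2 * y1, 2 * y2]])    # partial h / partial y
    dh_dx = np.array([1.0, 0.0])             # partial h / partial x
    return -np.linalg.solve(dh_dy, dh_dx)    # dy/dx = -(dh/dy)^{-1} dh/dx

x = 0.3
y = solve_y(x)
eps = 1e-6
fd = (solve_y(x + eps) - solve_y(x - eps)) / (2 * eps)   # finite differences
print(ift_grad(x, y))   # analytic: [ 1/(4*y1), -1/(4*y2) ]
print(fd)               # should match closely
```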

2. Basic knowledge of DDN

As explained in the previous part, traditional neural network architectures are generally combinations of simple explicit functions, whereas researchers are now exploring deep learning models with embedded differentiable optimization problems. An optimization problem can still act as a well-defined function in the forward pass. Take the example just discussed:

[Formula: the example posed as a constrained optimization problem]

Under the constraint $\|y\|^2 = 1$, minimizing over $y$ corresponds, for $x$ in $(-1, 1)$, to finding the $y$ that satisfies $(x - y_1^2 + y_2^2)^2 + (y_1^2 + y_2^2 - 1)^2 = 0$.

Next, we introduce the basic components of a DDN: imperative nodes and declarative nodes.

2.1 Composition of DDN

2.1.1 Imperative & Declarative Nodes

Imperative node
[Figure: an imperative node]
An imperative node is a neural network node whose forward pass is an explicit function $\tilde{f}$; the output of such a node is

$$y = \tilde{f}(x; \theta)$$

where $x$ is the input and $\theta$ is the parameter to be learned by the node.

Declarative node
[Figure: a declarative node]
In a declarative node, the relationship between input and output is instead defined as the solution of an optimization problem:

$$y \in \underset{u \in C}{\operatorname{argmin}}\; f(x, u)$$

where $f$ is the objective function and $C$ is the constraint set.

2.1.2 DDN Example - Pooling

Let's take average pooling as an example: it maps $\{x_i \in \mathbb{R}^m \mid i = 1, \ldots, n\} \to \mathbb{R}^m$.
Average pooling simply computes the mean of the inputs $x_i$. We can formulate this problem in two different ways.

The first way, imperative:

$$y = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The second way, declarative:

$$y = \underset{u \in \mathbb{R}^m}{\operatorname{argmin}}\; \sum_{i=1}^{n} \|u - x_i\|^2$$

Here the output is the vector $u$ whose total squared distance to the input vectors $x_i$ is minimal.

The advantage of thinking about average pooling this way is that it lets us replace the squared $\ell_2$ distance with a robust penalty, turning the ordinary average into a robust average. It also shows that, while some operations can only be expressed declaratively, every imperative node can be expressed as a declarative node.
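To see this numerically, the sketch below (my own illustration using NumPy and SciPy; the sample values and the robustness parameter `delta` are arbitrary) solves the declarative formulation with a generic optimizer and then swaps the squared distance for a pseudo-Huber penalty to obtain a robust average.

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 1.2, 0.9, 1.1, 10.0])   # the last point is an outlier

# Imperative pooling: y = mean(x).
y_imperative = x.mean()

# Declarative pooling: y = argmin_u sum_i phi(u - x_i), solved numerically.
def declarative_pool(x, phi):
    res = minimize(lambda u: sum(phi(u[0] - xi) for xi in x), x0=[0.0])
    return res.x[0]

# With a squared penalty the declarative solution recovers the mean ...
y_square = declarative_pool(x, lambda r: r ** 2)

# ... while a pseudo-Huber penalty gives a robust average.
delta = 1.0
pseudo_huber = lambda r: delta ** 2 * (np.sqrt(1 + (r / delta) ** 2) - 1)
y_robust = declarative_pool(x, pseudo_huber)

# The mean is dragged to 2.84 by the outlier; the robust average
# stays much closer to the inliers around 1.
print(y_imperative, y_square, y_robust)
```

Only the penalty function changed; the declarative description of the node ("return the point closest to the inputs") stayed the same.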

Declarative and imperative nodes can exist in the same network:
[Figure: a network containing both imperative and declarative nodes]

Then the question arises: if we solve an optimization problem in the forward pass, how do we obtain the gradient needed to backpropagate through the declarative layer?
[Figure: backpropagating through a declarative layer]
This is the central problem in deep declarative networks.
Now that we know all the ingredients needed, optimality conditions in implicit-function form and the implicit function theorem, we can write an equation for the derivative of a particular solution with respect to the input. For an input $x \in \mathbb{R}^n$ we can define the solution set $Y(x) : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ as follows:

$$Y(x) = \underset{u}{\operatorname{argmin}}\; \{\, f(x, u) : h(x, u) = 0_p,\; g(x, u) \leq 0_q \,\}$$

where $f(x, u)$ is the objective function and $h(x, u) = 0_p$, $g(x, u) \leq 0_q$ are the constraints.

A note from the author of these notes:
In the expression above, $0_p$ and $0_q$ denote zero vectors of dimension $p$ and $q$; the subscripts indicate how many equality constraints and how many inequality constraints the problem has ($h$ maps into $\mathbb{R}^p$ and $g$ into $\mathbb{R}^q$).

Then for any $y \in Y(x)$ we have:

$$\frac{dy}{dx} = H^{-1}A^T(AH^{-1}A^T)^{-1}(AH^{-1}B - C) - H^{-1}B$$

This expression involves first and second partial derivatives of the objective and constraint functions, collected into the matrices $A$, $B$, $C$, and $H$. The proof of this result is a direct application of the implicit function theorem to the KKT optimality conditions.

2.1.3 Proof (no constraints)

[Figure: proof for the unconstrained case]

The proof of the theorem in the unconstrained case is given below. Assume $y$ is a solution of the optimization problem over the unconstrained variable $u$; at such a solution the derivative of $f$ with respect to $y$ must be zero:

$$y \in \underset{u}{\operatorname{argmin}}\; f(x, u) \;\Rightarrow\; \frac{\partial f(x, y)}{\partial y} = 0$$

We define this optimality condition as an implicit function $\psi(x, y) \triangleq \frac{\partial f(x, y)}{\partial y}$, and applying the implicit function theorem we get:

$$\frac{dy}{dx} = -\left(\frac{\partial^2 f}{\partial y^2}\right)^{-1}\frac{\partial^2 f}{\partial x\, \partial y}$$

Here $-\left(\frac{\partial^2 f}{\partial y^2}\right)^{-1}\frac{\partial^2 f}{\partial x\, \partial y}$ is exactly the $-H^{-1}B$ term mentioned earlier; the constraint-related terms vanish when there are no constraints.
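The unconstrained formula is easy to verify numerically. The sketch below (my own example; the objective $f(x, u) = (u - x)^4 + u^2$ is an arbitrary toy choice) solves the forward problem by gradient descent, evaluates the second derivatives with PyTorch autograd, and compares the resulting $dy/dx$ with finite differences of the solver.

```python
import torch

# A toy unconstrained objective f(x, u); the specific form is arbitrary.
def f(x, u):
    return (u - x) ** 4 + u ** 2

def solve(x, num_iters=200):
    # Forward pass: minimize f over u by plain gradient descent.
    u = torch.zeros_like(x)
    for _ in range(num_iters):
        u = u - 0.05 * (4 * (u - x) ** 3 + 2 * u)   # u <- u - step * df/du
    return u

x = torch.tensor(1.5, dtype=torch.float64, requires_grad=True)
y = solve(x.detach()).requires_grad_(True)

# First-order optimality condition psi(x, y) = df/dy, then its derivatives.
psi = torch.autograd.grad(f(x, y), y, create_graph=True)[0]
f_yy = torch.autograd.grad(psi, y, retain_graph=True)[0]   # d^2 f / dy^2
f_xy = torch.autograd.grad(psi, x)[0]                      # d^2 f / dx dy

dydx = -f_xy / f_yy                                        # implicit function theorem
eps = 1e-6
fd = (solve(torch.tensor(1.5 + eps, dtype=torch.float64))
      - solve(torch.tensor(1.5 - eps, dtype=torch.float64))) / (2 * eps)
print(dydx.item(), fd.item())   # the two values should agree closely
```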

Summary

At this point we know what a declarative layer needs. To sum up:

Forward pass

  • A method for solving the optimization problem.

Backward pass

  • The optimality conditions (the objective function and constraints).
  • (The cached result from the forward pass.)
  • No knowledge of how the forward problem was solved is required; the gradient is computed directly from the optimality conditions.
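Putting these pieces together, here is a minimal end-to-end sketch of a declarative layer (my own illustration in PyTorch, not the official DDN library API; the class name `DeclarativePool` and the penalty parameter `delta` are made up). The forward pass solves the robust pooling problem by gradient descent; the backward pass applies the unconstrained implicit-function-theorem formula, using only the objective, the input, and the cached solution, without knowing how the forward solver worked.

```python
import torch

def objective(x, u, delta=1.0):
    # Robust pooling objective: a sum of pseudo-Huber penalties.
    r = u - x
    return (delta ** 2 * ((1 + (r / delta) ** 2).sqrt() - 1)).sum()

class DeclarativePool(torch.autograd.Function):
    """A declarative node: forward solves argmin_u f(x, u); backward applies
    dy/dx = -(d^2 f/dy^2)^{-1} d^2 f/(dx dy) from the implicit function theorem."""

    @staticmethod
    def forward(ctx, x):
        u = x.detach().mean()                  # start from the ordinary average
        for _ in range(200):                   # forward pass: gradient descent on f
            with torch.enable_grad():
                u_ = u.detach().requires_grad_(True)
                g, = torch.autograd.grad(objective(x.detach(), u_), u_)
            u = u.detach() - 0.1 * g
        ctx.save_for_backward(x.detach(), u.detach())
        return u.detach()

    @staticmethod
    def backward(ctx, grad_output):
        x, y = ctx.saved_tensors
        with torch.enable_grad():
            x = x.detach().requires_grad_(True)
            y = y.detach().requires_grad_(True)
            psi = torch.autograd.grad(objective(x, y), y, create_graph=True)[0]
            f_yy = torch.autograd.grad(psi, y, retain_graph=True)[0]
            f_xy = torch.autograd.grad(psi, x)[0]
        dydx = -f_xy / f_yy                    # implicit function theorem
        return grad_output * dydx

x = torch.tensor([1.0, 1.2, 0.9, 10.0], requires_grad=True)
y = DeclarativePool.apply(x)
y.backward()
print(y.item(), x.grad)   # robust average and its gradient w.r.t. each input
```

Note that the backward pass matches the checklist above: it uses the optimality condition of the objective and the cached forward solution, but never re-runs or inspects the solver.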

In the next few chapters I will give detailed explanations based on the official tutorial examples provided by the DDN authors, so stay tuned. (Concepts I have not yet fully understood will be filled in later once they become clear to me; discussion and corrections are welcome.)
