Investigating Bi-Level Optimization for Learning and Vision from a Unified Perspective: A Survey and Beyond

Summary

The article first unifies the representation of complex learning and vision problems, such as hyperparameter optimization, multi-task learning, and meta-learning, from the perspective of bi-level optimization. It then establishes a single-level reformulation based on the best response, along with a unified algorithmic framework for gradient-based bi-level optimization methods. Finally, it discusses the potential of this framework and future research directions.

The BLO solution idea: KKT reformulation (MPEC)

Replace the LL subproblem with its KKT conditions and minimize over the original variables x and y together with the introduced multipliers.
The resulting problem is called a Mathematical Program with Equilibrium Constraints (MPEC). There are two ways to solve it. The first, the nonlinear programming approach, rewrites the complementarity constraints as nonlinear inequalities, which allows the use of powerful numerical nonlinear programming solvers; the second, the combinatorial approach, deals directly with the combinatorial nature of the disjunctive constraints.
Using the MPEC approach on a BLO problem raises two issues: first, in theory, if the LL subproblem admits multiple multipliers, the MPEC is not equivalent to the original BLO; second, when solving the problem numerically, the introduced auxiliary multiplier variables limit efficiency.
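To make the reformulation concrete, here is a minimal sketch (the constrained LL form below is an assumption for illustration; the survey states the idea more generally). Suppose the LL subproblem is $\min_{y} f(x,y)$ s.t. $g(x,y)\le 0$, with multipliers $\lambda$. Replacing it by its KKT system yields the MPEC

$$
\min_{x,\,y,\,\lambda}\; F(x,y)
\quad \text{s.t.} \quad
\nabla_y f(x,y) + \nabla_y g(x,y)^{\top}\lambda = 0,\;\;
g(x,y) \le 0,\;\;
\lambda \ge 0,\;\;
\lambda^{\top} g(x,y) = 0.
$$

The complementarity condition $\lambda^{\top}g(x,y)=0$ is exactly the disjunctive constraint the two strategies treat differently: nonlinear programming methods relax it into smooth inequalities (e.g., $\lambda^{\top}g(x,y)\le\epsilon$), while combinatorial methods branch over which constraints are active.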

Bi-level optimization

Bi-level optimization (BLO) consists of two nested optimization tasks, where one is embedded in the other as a constraint. The inner (nested) and outer optimization tasks are referred to as the lower-level (LL) and upper-level (UL) subproblems, respectively.
The LL subproblem is formulated as the following parametric optimization task:

$$\min\limits_{y\in\mathcal{Y}} f(x,y) \qquad (\text{parameterized by } x)$$

The standard BLO problem can then be formally expressed as

$$\min\limits_{x\in\mathcal{X}} F(x,y), \quad \text{s.t.}\;\; y\in\mathcal{S}(x),$$

where $\mathcal{S}(x)=\mathop{\arg\min}\limits_{y\in\mathcal{Y}} f(x,y)$ denotes the solution set of the LL subproblem.
In game-theoretic terms, the UL and LL subproblems are referred to as the "leader" and the "follower", respectively. The leader first chooses a decision x, and the follower then observes x and responds with a decision y.
Thus the follower's decision may depend on the leader's; likewise, the leader must satisfy constraints that depend on the follower's decision.
It is worth noting that, as shown in Figure 1, for each (or some) fixed value of the UL decision variable x, the LL subproblem may have multiple solutions. When the LL solution is not unique, it is difficult for the leader to predict which point in $\mathcal{S}(x)$ the follower will choose.

Figure 1 BLO problem
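A minimal example (ours, not from the survey) of this non-uniqueness: take $f(x,y) = x\,y^{2}$ with $\mathcal{Y}=[-1,1]$. For $x>0$ the follower's best response is the singleton $\mathcal{S}(x)=\{0\}$, but at $x=0$ every feasible $y$ is optimal, so $\mathcal{S}(0)=[-1,1]$ and the leader cannot anticipate the follower's choice. Ruling out such cases is exactly what the lower-level singleton (LLS) assumption, discussed below, does.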

BLO for practical problems

hyperparameter optimization

Hyperparameter Optimization (HO) refers to the problem of identifying the optimal set of hyperparameters that cannot be learned using training data alone. It is the most direct application of BLO in the field of learning and vision.
Specifically, the UL objective $F(x,y;D_{\text{val}})$ aims to minimize the validation loss with respect to the hyperparameters (such as weight decay), while the LL objective $f(x,y;D_{\text{tr}})$ outputs a learning algorithm by minimizing the training loss with respect to the model parameters (such as weights and biases). As shown in Figure 2, the full dataset $D$ is divided into training and validation sets (i.e., $D = D_{\text{tr}} \cup D_{\text{val}}$), illustrating how the HO task is modeled from the BLO perspective. Inspired by this nested optimization, most HO applications can be formulated as BLO problems with a two-level structure: the UL subproblem optimizes the hyperparameters x, while the LL subproblem (w.r.t. the weight parameters y) aims to find the learning algorithm $g_y(\cdot)$.

Figure 2
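To illustrate the nested structure, here is a hedged sketch in PyTorch (a toy setup of our own, not the survey's code): the hyperparameter is a scalar weight decay (UL variable x), the LL subproblem is approximated by a fixed number of unrolled gradient steps on the training loss, and the validation loss is backpropagated through the unrolled steps to update the hyperparameter. The data, step sizes, and step counts are arbitrary illustrative choices.

```python
# Hedged sketch (our toy setup, not the survey's code): HO as BLO via
# unrolled differentiation. UL variable x = log weight decay; LL variable
# y = linear-model weights.
import torch

torch.manual_seed(0)
X_tr, y_tr = torch.randn(50, 5), torch.randn(50)    # training split D_tr
X_val, y_val = torch.randn(20, 5), torch.randn(20)  # validation split D_val

log_wd = torch.zeros(1, requires_grad=True)         # UL variable x
ul_opt = torch.optim.SGD([log_wd], lr=0.05)

for outer_step in range(100):
    w = torch.zeros(5, requires_grad=True)          # LL variable y
    for _ in range(20):  # LL subproblem: unrolled gradient descent on f(x, y; D_tr)
        f = ((X_tr @ w - y_tr) ** 2).mean() + log_wd.exp() * (w ** 2).sum()
        g, = torch.autograd.grad(f, w, create_graph=True)  # keep graph for hypergrad
        w = w - 0.1 * g
    F_val = ((X_val @ w - y_val) ** 2).mean()  # UL subproblem: F(x, y; D_val)
    ul_opt.zero_grad()
    F_val.backward()     # reverse-mode hypergradient through the unrolled steps
    ul_opt.step()

print("learned weight decay:", log_wd.exp().item())
```

Differentiating through the unrolled inner loop is the explicit, AD-based strategy in the taxonomy discussed later; an implicit alternative is sketched after Figure 3.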

Multi-task and Meta-Learning

Meta-learning means "learning to learn": the goal is to endow a model with the ability to learn how to learn, so that it can quickly pick up new tasks on the basis of knowledge it has already acquired.
According to the dependencies between meta-parameters and network parameters, current meta-learning based methods can be roughly divided into two groups, namely, meta-feature learning and meta-initialization learning.
Meta-initialization learning aims to learn meta-information shared across multiple tasks through the network initialization. From the BLO perspective, the network parameters and their initialization (based on multi-task information) are represented by the LL and UL subproblems, respectively.
Meta-feature learning methods first separate the network architecture into a meta-feature extraction part and task-specific parts, and then develop a hierarchical learning process. In these tasks, the meta-feature part and the task-specific parts can therefore be modeled by the UL and LL subproblems, respectively.
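For the meta-initialization case, a minimal MAML-style sketch (our toy construction, not the survey's algorithm) makes the UL/LL split explicit: the shared initialization is the UL variable, and the task-adapted weights obtained by a gradient step from that initialization are the LL variable.

```python
# Hedged MAML-style sketch: UL variable = shared initialization,
# LL variable = per-task weights adapted from it. Toy linear-regression tasks.
import torch

torch.manual_seed(0)
meta_init = torch.zeros(5, requires_grad=True)        # UL variable x
meta_opt = torch.optim.SGD([meta_init], lr=0.1)

def loss(w, A, b):
    return ((A @ w - b) ** 2).mean()

for meta_step in range(200):
    meta_opt.zero_grad()
    for _ in range(4):                                  # mini-batch of tasks
        A_s, b_s = torch.randn(10, 5), torch.randn(10)  # support set (LL data)
        A_q, b_q = torch.randn(10, 5), torch.randn(10)  # query set (UL data)
        # LL subproblem: one adaptation step from the shared initialization
        g, = torch.autograd.grad(loss(meta_init, A_s, b_s),
                                 meta_init, create_graph=True)
        w_task = meta_init - 0.05 * g                   # LL variable y
        # UL subproblem: post-adaptation loss, backpropagated to meta_init
        loss(w_task, A_q, b_q).backward()
    meta_opt.step()
```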

Neural Architecture Search

Neural Architecture Search (NAS) aims to automate the process of selecting the best neural network architecture.
The most commonly used approach today is the gradient-based differentiable NAS method. Given an appropriate search space, gradient-based differentiable NAS methods help derive optimal architectures for different vision and learning tasks. From the BLO perspective, the UL objective w.r.t. the architecture weights (e.g., blocks/units) can be parameterized by x, and the LL objective w.r.t. the model weights can be parameterized by y. The entire search process can therefore be expressed in the BLO paradigm, where the UL objective is given by $F(x,y;D_{\text{val}})$ on the validation dataset and the LL objective is given by $f(x,y;D_{\text{tr}})$ on the training dataset.
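A first-order DARTS-style sketch (our toy construction under the BLO split above; real NAS uses cell-based search spaces) alternates the two updates: model weights y on the training loss, architecture weights x on the validation loss. The candidate operations, data, and learning rates below are illustrative assumptions.

```python
# Hedged first-order DARTS-style sketch (toy): UL = architecture weights alpha,
# LL = model weights; updates alternate between training and validation losses.
import torch

torch.manual_seed(0)
X_tr, y_tr = torch.randn(64, 8), torch.randn(64, 1)
X_val, y_val = torch.randn(32, 8), torch.randn(32, 1)

ops = [torch.nn.Linear(8, 1), torch.nn.Linear(8, 1, bias=False)]  # candidate ops
alpha = torch.zeros(2, requires_grad=True)        # UL variable x
w_opt = torch.optim.SGD([p for op in ops for p in op.parameters()], lr=0.05)
a_opt = torch.optim.Adam([alpha], lr=0.01)

def mixed_out(X):
    # continuous relaxation: softmax-weighted mixture of candidate operations
    weights = torch.softmax(alpha, dim=0)
    return sum(wi * op(X) for wi, op in zip(weights, ops))

for step in range(200):
    # LL update: model weights on the training loss (alpha held fixed)
    w_opt.zero_grad()
    ((mixed_out(X_tr) - y_tr) ** 2).mean().backward()
    w_opt.step()
    # UL update: architecture weights on the validation loss (first-order approx.)
    a_opt.zero_grad()
    ((mixed_out(X_val) - y_val) ** 2).mean().backward()
    a_opt.step()

print("architecture weights:", torch.softmax(alpha, dim=0).detach())
```

The first-order approximation here treats the model weights as fixed when updating alpha, trading hypergradient accuracy for speed.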

adversarial learning

Adversarial learning (AL) is currently considered one of the most important learning tasks, and it has been applied in various domains such as image generation, adversarial attack, and face verification.
Most AL methods cast the unsupervised learning problem as a two-player game between two adversaries: a generator that produces samples from a distribution and a discriminator that classifies samples as real or fake. Under the BLO paradigm, the generator and the discriminator correspond to the UL and LL variables, respectively.
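Written out for concreteness (the standard GAN objective, with the BLO roles as stated above), the generator parameters x are the UL variable and the discriminator parameters y are the LL variable that best-responds to x:

$$
\min_{x}\; F\big(x, y^{*}(x)\big)
\quad \text{s.t.} \quad
y^{*}(x) \in \arg\max_{y}\;
\mathbb{E}_{u\sim p_{\text{data}}}\big[\log D_{y}(u)\big]
+ \mathbb{E}_{z\sim p_{z}}\big[\log\big(1 - D_{y}(G_{x}(z))\big)\big],
$$

where the UL objective F is the generator's loss evaluated against the discriminator's best response (in the pure minimax case, the negative of the LL objective).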

Mainstream BLO models

As shown in Figure 3, existing methods can be divided into two categories, namely those with (w/) and without (w/o) the lower-level singleton (LLS) assumption. Methods that solve BLO under the LLS assumption can be further divided into two families: explicit gradient-based (EGBR) and implicit gradient-based (IGBR) methods. EGBR methods can be realized with different automatic differentiation (AD) techniques (shown as dashed rectangles). Recent studies have also proposed two algorithms for solving BLO without the LLS assumption, which introduce bi-level gradient aggregation or a value-function-based interior-point method, respectively, to compute the indirect gradients.

Figure 3
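To contrast the two branches of the taxonomy: EGBR differentiates through an unrolled LL solver (as in the earlier PyTorch sketches), while IGBR, under the LLS assumption, differentiates the LL stationarity condition $\nabla_y f(x, y^{*}(x)) = 0$ via the implicit function theorem. This is a standard result, stated here for orientation:

$$
\nabla_x \varphi(x) \;=\; \nabla_x F \;-\; \nabla^{2}_{xy} f\,\big(\nabla^{2}_{yy} f\big)^{-1}\nabla_y F,
\qquad \varphi(x) := F\big(x, y^{*}(x)\big),
$$

with all terms evaluated at $y = y^{*}(x)$. In practice the inverse-Hessian-vector product is approximated (e.g., by conjugate gradient or a Neumann series) rather than formed explicitly.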

The optimization process of the existing mainstream gradient-based BLO method is shown in Figure 4.
Alt

Figure 4
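Distilled into code, the loop in Figure 4 reduces to the following hedged sketch (our summary, not the survey's exact pseudocode): approximate the LL best response with K inner gradient steps, then take one UL step on the resulting objective.

```python
# Hedged sketch of the generic gradient-based BLO loop: K unrolled LL steps
# approximate the best response, then the UL loss is backpropagated through
# them (EGBR style); an implicit solve could replace the unrolling (IGBR style).
import torch

def blo_step(x, x_opt, F, f, y_dim, K=10, inner_lr=0.1):
    y = torch.zeros(y_dim, requires_grad=True)         # LL variable
    for _ in range(K):                                 # approximate y_K(x)
        g, = torch.autograd.grad(f(x, y), y, create_graph=True)
        y = y - inner_lr * g
    x_opt.zero_grad()
    F(x, y).backward()        # direct + indirect gradient w.r.t. the UL variable
    x_opt.step()

# Toy usage: f = ||y - x||^2 (best response y* = x), F penalizes distance to 1.
x = torch.ones(3, requires_grad=True)
opt = torch.optim.SGD([x], lr=0.05)
for _ in range(50):
    blo_step(x, opt,
             F=lambda x, y: ((y - 1) ** 2).sum() + 0.1 * (x ** 2).sum(),
             f=lambda x, y: ((y - x) ** 2).sum(), y_dim=3)
```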


Source: blog.csdn.net/dawnyi_yang/article/details/125292257