【NeurIPS 2023】《Hypernetwork-based Meta-Learning for Low-Rank Physics-Informed Neural Networks》

Prerequisite knowledge

1) Differential operators, differential equations
\quad Reference: "A powerful tool for linear differential equations: the differential operator method"; "Differential equations (1): basic concepts and classification".
\quad Let's first take a look at what a function is. As shown in the figure below, a number goes in, is transformed by some kind of processing, and another number comes out. The number that goes in is the independent variable, the number that comes out is the dependent variable, and the processing in the middle is the corresponding rule. The mathematical expression is y = f(x): this is a function.
[Figure: a number x goes into a box (the rule f) and the number y = f(x) comes out]
\quadSo now let’s change it. What if what goes in is a function and what comes out is also a function? As shown below:
[Figure: a function f goes into a box (an operator) and another function g comes out]
\quad The function f becomes the function g after some operation: an operator can be understood as an operation that transforms a function. A function goes in, undergoes some processing, and becomes another function.
\quad So what is a differential operator? The processing that turns a function y into its derivative y' is a differential operator. If we denote it by D, it acts as shown below.
[Figure: D y = y', i.e., the operator D maps y to its derivative]
\quadDifferential operators have some good properties and are often used to solve differential equations.
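\quad For example (a standard illustration, not taken from the referenced posts), writing D for differentiation with respect to x, a linear constant-coefficient ODE can be written and factored in operator form:

```latex
% D denotes d/dx, so D^2 y = y''
Dy = \frac{dy}{dx}, \qquad
y'' - 3y' + 2y = 0
\;\Longleftrightarrow\; (D^2 - 3D + 2)\,y = 0
\;\Longleftrightarrow\; (D-1)(D-2)\,y = 0
```

\quad Factoring the operator immediately suggests the two basic solutions e^x and e^{2x}, which is why the operator method is convenient for linear equations with constant coefficients.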

2) Differential equations, ordinary differential equations, partial differential equations
\quad A differential equation is a mathematical equation that describes the relationship between a function and its derivatives, that is, an equation containing an unknown function and its derivatives.
\quad An ordinary differential equation (ODE) is a differential equation whose unknown function has only one independent variable. If the unknown function in a differential equation has two or more independent variables, the differential equation is a partial differential equation (PDE).
\quad In the formulas below, (1), (2), and (3) are ordinary differential equations because they contain only one independent variable x; example (4) contains two independent variables t and x, so it is a partial differential equation.
[Figure: example equations (1)-(4)]
\quad A particular solution is a single specific solution that satisfies a differential equation; the general solution is the whole family of solutions that satisfy it. Some differential equations have infinitely many solutions, some have no solution, and some have only finitely many solutions.
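\quad A minimal illustrative example (not one of the equations in the missing figure): for the equation y' = y,

```latex
y' = y \;\Longrightarrow\; y(x) = C e^{x} \quad \text{(general solution, one for each constant } C\text{)};
\qquad y(0) = 1 \;\Longrightarrow\; y(x) = e^{x} \quad \text{(a particular solution)}
```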

\quad The order of a differential equation is the order of the highest derivative that appears in it; for example, y'' + y = 0 is a second-order equation.

\quad Initial value problem and boundary value problem
\quad When additional conditions are attached to a differential equation, the problem is an initial value problem if the unknown function and its derivatives in those conditions are all specified at the same value of the independent variable, and a boundary value problem if they are specified at different values of the independent variable.

\quadFor example, example (5) is an initial value problem; example (6) is a boundary value problem.
[Figure: examples (5) and (6)]
\quadThe solution y(x) of an initial value problem or boundary value problem must not only satisfy the differential equation, but also satisfy all additional conditions.
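\quad Since examples (5) and (6) are only in the missing figure, here is a typical illustrative pair instead: in the initial value problem all conditions are imposed at the same point x = 0, while in the boundary value problem they are imposed at two different points.

```latex
\text{IVP:}\quad y'' + y = 0,\;\; y(0) = 1,\;\; y'(0) = 0
\qquad\qquad
\text{BVP:}\quad y'' + y = 0,\;\; y(0) = 0,\;\; y(\pi/2) = 1
```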

3) The relationship between partial differential equations and ordinary differential equations:

\quad The difference:

  • ODEs (Ordinary Differential Equations):
    • Describes the derivative of a variable with respect to a single independent variable.
    • Typically used to describe systems involving only one independent variable (usually time).
  • PDEs (Partial Differential Equations):
    • Describes the partial derivatives of a multivariable function with respect to two or more independent variables.
    • Typically used to describe systems involving multiple independent variables (e.g., spatial coordinates and time).

\quad The relation:

  • ODEs are a special case of PDEs:
    • When a function in a PDE depends only on one variable, the PDE degenerates into an ODE.
  • Mutual conversion:
    • Through an appropriate choice of variables, certain PDEs can be converted into ODEs and vice versa (see the example after this list).
  • Application areas:
    • ODEs are more commonly used to describe dynamical systems, ecological models, etc., where only one independent variable is involved (usually time).
    • PDEs are more commonly used to describe fluctuations, heat transfer, fluid dynamics, etc., where the behavior of a system involves multiple independent variables.
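\quad A standard illustration of such a conversion is separation of variables for the one-dimensional heat equation: substituting u(x,t) = X(x)T(t) splits the single PDE into two ODEs, one in t and one in x.

```latex
u_t = \alpha\, u_{xx}, \quad u(x,t) = X(x)\,T(t)
\;\Longrightarrow\;
\frac{T'(t)}{\alpha\,T(t)} = \frac{X''(x)}{X(x)} = -\lambda
\;\Longrightarrow\;
T' + \alpha\lambda\,T = 0, \quad X'' + \lambda\,X = 0
```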

\quadOverall, ODEs and PDEs are mathematical tools used to model and understand the evolution of systems. The choice of which one to use usually depends on the nature of the problem and the characteristics of the system.

4) The difference between parameterized partial differential equations (PPDEs) and partial differential equations (PDEs)

\quad Difference in form:

  • PDEs: describe the relationship between an unknown function and its partial derivatives with respect to two or more independent variables.
  • PPDEs: in addition to the independent variables and the unknown function, the equation also contains parameters, which can be constants, functions, or other variables.

\quad Difference in application:

  • PDEs: Mainly used to describe changes in space and time, involving the relationship between multiple independent variables, such as heat conduction, fluid mechanics, etc.
  • PPDEs: More common for simulating phenomena with parameter changes. These parameters may represent material properties, boundary conditions, initial conditions, etc., and the response of the system can be studied by adjusting the parameters.

\quad Difference in solution:

  • PDEs: Solving PDEs usually involves finding the unknown function u such that the equation holds, either analytically or numerically.
  • PPDEs: When solving PPDEs, in addition to finding u, you also need to determine the values of the parameters. This may require the use of methods such as optimization or fitting.

Personal understanding: a PPDE is like a template for a whole family of PDEs, in the way that the concept "human face" is a template: each choice of parameter values gives one concrete PDE, just as each individual has one concrete face.
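\quad To make this "family of PDEs" picture concrete, here is one illustrative parameterized PDE (a 1D convection equation; the parameter symbol β is my own choice for the example, not necessarily the notation used later):

```latex
\frac{\partial u}{\partial t} + \beta\,\frac{\partial u}{\partial x} = 0, \qquad \mu = \beta
```

\quad Every value of β gives a different concrete PDE instance, so "solving the PPDE" means being able to produce the solution u for any requested β; this is exactly the many-query setting discussed in the paper summary below.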

5) The relationship between PDE and deep learning
\quad There are some interesting relationships between PDEs (partial differential equations) and deep learning, especially in scientific computing, physical modeling, and engineering. Here are some important aspects of this relationship:

  • Physical modeling and data-driven approaches: PDEs are often used to describe the behavior of physical systems. In some cases, we may not be able to find an analytical solution to the system, or the analytical solution may be very complex. Deep learning provides a data-driven approach for approximating the solutions of these equations by learning from large amounts of experimental or simulated data, without having to solve the equations explicitly.
  • Physics-based constraints: In deep learning, especially in physics problems, the physics knowledge contained in PDEs can be exploited as additional constraints. This can be done by embedding the information of PDEs into the architecture or loss function of the neural network, so that the network can better adapt to the laws of physics. This type of method is sometimes called "Physics-informed neural networks" (PINNs).
  • Data assimilation: In some scientific applications, we may have some observational data, but the complete model of the physical system is unknown. Deep learning can be used to fuse observational data and physical equations to estimate the state or parameters of a system.
  • Solving high-dimensional and complex problems: Some PDEs describe problems that may involve high-dimensional spaces or complex geometric structures. Traditional numerical methods can become prohibitively expensive or difficult in these cases. Deep learning methods are flexible to a certain degree, can learn in high-dimensional spaces, and can adapt to more complex data structures.
  • Nonlinear modeling: Deep learning is a powerful nonlinear modeling tool, and many natural phenomena can be modeled using nonlinear PDEs. Deep learning methods can handle these nonlinear relationships more naturally.

Overall, PDEs and deep learning can complement each other, and their combination may provide a powerful tool, especially when dealing with problems that are complex, high-dimensional, or lack analytical solutions.

6) Physics-informed neural networks (PINNs)
\quad "Physics-informed neural networks" is a method that combines physical models and neural networks to solve problems in science and engineering. The core idea of this method is to embed known physical equations into the training process of the neural network to improve the neural network's ability to understand and predict system behavior.
\quad In traditional data-driven machine learning methods, neural networks usually learn models by training on large amounts of observation data. However, in many scientific and engineering problems, we usually already know some basic physical laws about the behavior of the system, such as equations or constraints. PINN starts from this observation and attempts to integrate this prior physical knowledge into neural networks.
\quad The basic idea of PINN is to build a neural network that not only learns the mapping relationship from input to output, but also satisfies known physical equations. By embedding physical equations into the loss function of the neural network, the training process not only requires the network to adapt to the observation data, but also requires the network to satisfy the physical equations. This helps improve the generalization ability of the model, especially with limited data, because the physical equations provide additional constraints on the system's behavior.
\quadThis method can usually make better use of known physical laws and improve the accuracy and interpretability of the model when it comes to physical problems, such as fluid mechanics, heat conduction, etc.

7) PINN and AI
If we want to use a neural network to approximate the solution of a PDE, the only things available to constrain the network are the equation itself and the initial/boundary conditions, which means the effort has to go into the loss function; after all, the loss function determines the direction in which the neural network is trained.

The loss function of PINNs is divided into two parts: one for the initial and boundary conditions, and one for the equation itself. We also need to select interior points for training and constraint. The second type of training input therefore consists of the coordinates of interior points; because interior points have no labels (i.e., no ground-truth values) and are subject only to the equation, only the equation loss can be computed for them.

Using the initial and boundary conditions as constraints can be regarded as supervised learning on the labeled data in the training set (can these points be regarded as the initial/boundary points?). To summarize, the design idea of this loss function is to select a certain number of coordinate points in the domain as training inputs to constrain the neural network: points on the boundary (initial conditions can also be regarded as a boundary in time) are constrained by both the equation and the labels, while interior points are constrained only by the equation.

The most interesting part of PINNs is the equation-constraint term in the loss function: much like some NeRF methods introduce a geometric loss in addition to the MSE loss, here a constraint term is designed based on physical laws.
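\quad A minimal PyTorch-style sketch of this loss design (illustrative only: the network size, the sampling of points, and the choice of PDE, here the 1D convection equation u_t + beta*u_x = 0, are placeholder assumptions rather than the paper's setup):

```python
import torch
import torch.nn as nn

class PINN(nn.Module):
    """Coordinate MLP: input (x, t), output the approximate solution u(x, t)."""
    def __init__(self, width=64, depth=4):
        super().__init__()
        layers, in_dim = [], 2
        for _ in range(depth):
            layers += [nn.Linear(in_dim, width), nn.Tanh()]
            in_dim = width
        layers.append(nn.Linear(width, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

def pinn_loss(model, x_f, t_f, x_b, t_b, u_b, beta=1.0):
    """Two-part loss: equation residual on unlabeled interior points (x_f, t_f)
    plus data-matching loss on labeled initial/boundary points (x_b, t_b, u_b)."""
    # Interior (collocation) points: no labels, constrained only by the equation.
    x_f = x_f.clone().requires_grad_(True)
    t_f = t_f.clone().requires_grad_(True)
    u = model(x_f, t_f)
    u_t = torch.autograd.grad(u, t_f, torch.ones_like(u), create_graph=True)[0]
    u_x = torch.autograd.grad(u, x_f, torch.ones_like(u), create_graph=True)[0]
    loss_pde = ((u_t + beta * u_x) ** 2).mean()   # residual of u_t + beta*u_x = 0

    # Initial/boundary points: supervised directly by their labels u_b.
    loss_data = ((model(x_b, t_b) - u_b) ** 2).mean()
    return loss_pde + loss_data

# Usage sketch: sample collocation/boundary points, then optimize as usual.
# model = PINN(); opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = pinn_loss(model, x_f, t_f, x_b, t_b, u_b); loss.backward(); opt.step()
```

\quad The two terms mirror the split described above: the labeled initial/boundary points contribute a data loss, while the unlabeled interior points contribute only the equation-residual loss.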

8) Operator learning
\quad The goal of operator learning is to solve a whole class of problems and obtain better generalization. Ideally, a well-trained model should remain applicable when the parameters, the initial/boundary conditions, or the forcing terms are changed. In other words, operator learning learns a family of PDEs.


Author affiliations: Yonsei University, Arizona State University

1) Motivation

\quad Physics-informed neural networks (PINNs) are a special class of coordinate-based MLPs, also known as implicit neural representations (INRs), used to numerically approximate solutions of partial differential equations (PDEs). That is, a PINN takes a spatiotemporal coordinate (x, t) as input, predicts the PDE solution at that coordinate, and is trained by minimizing the (implicit) PDE residual loss together with data-matching losses for the initial and boundary conditions.
\quadPINNs have the same weaknesses as coordinate-based MLPs (or INRs). For a new data instance (a new PDE for PINNs or a new image for INRs), a neural network needs to be retrained. Therefore, using PINNs to solve PDEs (especially in parameterized PDEs) is often computationally expensive. This burden precludes the application of PINNs to important scenarios involving many queries, since these scenarios require parameterized PDE models to be simulated thousands of times (e.g., design optimization, uncertainty propagation), i.e., many PDE solutions are required.
\quad To address the above problems, this article proposes: 1) a low-rank structured neural network architecture for PINNs, called low-rank PINNs (LR-PINNs); 2) an efficient rank-revealing training algorithm that adaptively adjusts the rank of LR-PINNs for different PDE inputs; 3) a two-stage procedure (offline training / online testing) for handling many-query scenarios.
\quad This research was inspired by observations from the study of numerical PDE solvers: numerical solutions of parametric PDEs can often be approximated in low-rank matrix or tensor formats to reduce computational/memory requirements (could the initialization weights in meta-learning also take this form?). Specifically, the proposed method adopts the computational form used in reduced-order modeling (ROM), which is the main approach for solving parametric PDEs.

2) Method

\quad LR-PINNs are PINNs whose hidden fully-connected layers have low-rank weight matrices. Denoting each such intermediate layer as LR-FC, the l-th hidden layer can be defined as follows:
[Equation: definition of the l-th LR-FC layer, whose weight is factorized through U, Σ, V (image missing)]
\quad Here, U and V are full column-rank matrices whose columns form sets of orthogonal basis vectors.

\quad The challenge: representing hidden-layer weights in a factorized form is, by itself, nothing new, and has been actively studied in many areas of deep learning such as NLP. This article differs from previous approaches by attempting to reveal the rank of the intermediate layers as training proceeds, which poses unique challenges. These challenges can be summarized by a few research questions:
\quad 1) Should we make all parameters learnable? (i.e., U, V, ∑);
\quad 2) How do we determine the rank of each layer separately and adapt it to different μ?
\quad 3) Can we exploit the low-rank structure to avoid expensive and repetitive PINN training for every new µ instance?
\quadBelow, we will address these issues by proposing a new neural network structure.
\quad The overall structure is shown in the figure below. The PDE parameter μ plays the role that an image plays for an INR: μ is fed into the hypernetwork, which generates the weights of the intermediate MLP layers. (Previous methods of this kind usually assume that a pre-trained model already exists and approximate its weights by running a truncated SVD algorithm.) The lower branch is the LR-PINN and the upper branch is the hypernetwork.
[Figure: overall architecture, with the hypernetwork (upper branch) taking μ as input and the LR-PINN (lower branch) taking the coordinates (x, t)]
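\quad A rough sketch, under my own assumptions, of how the two branches could be wired together in code (the layer widths, the small MLP used as the hypernetwork, and the ReLU that zeroes out diagonal entries are illustrative choices, not the paper's exact implementation):

```python
import torch
import torch.nn as nn

class LowRankFC(nn.Module):
    """Hidden layer whose weight is factorized as U @ diag(s) @ V.T with rank r << width.
    U and V hold the basis vectors; the diagonal entries s are supplied from outside,
    here by a hypernetwork conditioned on the PDE parameter mu."""
    def __init__(self, width=64, rank=16):
        super().__init__()
        self.U = nn.Parameter(torch.randn(width, rank) / width ** 0.5)
        self.V = nn.Parameter(torch.randn(width, rank) / width ** 0.5)
        self.bias = nn.Parameter(torch.zeros(width))

    def forward(self, h, s):
        # h: (batch, width), s: (rank,) diagonal entries for this layer
        return torch.tanh(((h @ self.V) * s) @ self.U.T + self.bias)

class HyperLRPINN(nn.Module):
    """Upper branch: a hypernetwork mapping mu to the diagonal entries of every
    LR-FC layer. Lower branch: the coordinate MLP (LR-PINN) acting on (x, t)."""
    def __init__(self, width=64, rank=16, depth=3, mu_dim=1):
        super().__init__()
        self.embed = nn.Linear(2, width)                    # (x, t) -> first hidden state
        self.hidden = nn.ModuleList([LowRankFC(width, rank) for _ in range(depth)])
        self.out = nn.Linear(width, 1)
        self.hyper = nn.Sequential(                         # mu -> concatenated diagonals
            nn.Linear(mu_dim, 64), nn.ReLU(),
            nn.Linear(64, depth * rank), nn.ReLU())         # ReLU lets entries reach exactly zero
        self.rank = rank

    def forward(self, x, t, mu):
        diags = self.hyper(mu).split(self.rank, dim=-1)     # one diagonal vector per hidden layer
        h = torch.tanh(self.embed(torch.cat([x, t], dim=-1)))
        for layer, s in zip(self.hidden, diags):
            h = layer(h, s)
        return self.out(h)

# Usage sketch (x and t are column tensors of coordinates, mu the PDE parameter):
# u_pred = HyperLRPINN()(x, t, torch.tensor([1.0]))
```

\quad Under this sketch, the rank-revealing behavior corresponds to the hypernetwork driving some diagonal entries to zero, so the effective rank can differ from layer to layer and from one μ to another, consistent with the observation below that each hidden layer ends up with a different rank structure.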
\quad Training of the above network is divided into two stages. In the first stage, the basis vectors and the hypernetwork are learned. In the second stage, fine-tuning is performed for a specific set of test PDE parameters. The set of model parameters trained in each stage is shown below.
[Figure: the sets of model parameters trained in each of the two stages]

3) Experimental part

\quad The figure below depicts the training loss and test error of meta-learning with MAML, Reptile, and the proposed method. Optimization-based meta-learning algorithms learn a single set of meta-initial weights shared across tasks, so they can only learn to perform well "on average" over randomly sampled training tasks; the hypernetwork-based approach instead produces weights adapted to each task.
[Figure: training loss and test error for MAML, Reptile, and the proposed method]
Rank structure: The following table lists the number of trainable model parameters for several methods. In this paper's model, each hidden layer ends up with a different rank structure, and the total number of parameters is the smallest. (The approach may therefore also be interesting as a small-model method.)
[Table: trainable model parameters of the compared methods]
\quadAnswers to three questions:

\quad Ablation on fixed vs. learnable basis vectors: whether {U, V} is trainable. The authors found that keeping {U, V} fixed gives roughly an order of magnitude better prediction accuracy and convergence speed than making {U, V} trainable. (Evidence that fixing some parameters can be beneficial.)
[Figure: ablation results for fixed vs. learnable {U, V}]

4) Related work

\quad Meta-learning of PINNs and INRs. HyperPINN also uses a hypernetwork to generate model parameters, but it can only generate full-rank weights and cannot handle parameterized partial differential equations. In the INR field, obtaining initial weights for INRs via meta-learning methods such as MAML and Reptile was explored in "Learned initializations for optimizing coordinate-based neural representations", while "Meta-learning sparse implicit neural representations" and "Meta-learning sparse compression networks" explore sparse representations of INRs.

\quad Low-rank formats in neural networks. In natural language processing, the models used (e.g., BERT) usually have hundreds of millions of parameters, so improving computational efficiency at inference time is a pressing issue; approximating layers with low-rank factors via truncated SVD has been explored for this purpose. Low-rank layer modeling has also been studied for MLPs ("Initialization and regularization of factorized neural layers") and for convolutional architectures ("Speeding up convolutional neural networks with low-rank expansions"). Low-rank formats have not really been studied in the context of PINNs or INRs; the closest work is SVD-PINNs, which represents hidden layers in a decomposed form but always keeps them full-rank.

The difference between this paper and previous work is that the intermediate-layer parameters are modeled in a low-rank form.


Origin blog.csdn.net/DUDUDUTU/article/details/134374146