Deep Learning to Solve Differential Equations, Series 4: A PINN Method Based on an Adaptive Activation Function - the Burgers Equation Inverse Problem

This post introduces the physics-informed neural network (PINN) approach to solving differential equations. First, the basic PINN method is reviewed; then a PINN framework with an adaptive activation function, implemented in PyTorch, is used to solve the inverse problem of the one-dimensional, time-dependent Burgers equation.
Physics-Informed Neural Networks (PINN): Introduction and Related Papers
Deep Learning to Solve Differential Equations Series 1: PINN Solution Framework (Poisson 1D)
Deep Learning to Solve Differential Equations Series 2: PINN for the Burgers Equation Forward Problem
Deep Learning to Solve Differential Equations Series 3: PINN for the Burgers Equation Inverse Problem
Deep Learning to Solve Differential Equations Series 4: PINN with an Adaptive Activation Function for the Burgers Equation Inverse Problem
Deep Learning to Solve Differential Equations Series 5: PINN for the Forward and Inverse Problems of the Navier-Stokes Equations

1. Introduction to PINN

As a powerful information-processing tool, neural networks have been widely applied in computer vision, biomedicine, and oil and gas engineering, driving technological change in many fields. Deep neural networks have a very strong learning ability: they can not only discover physical laws from data but also solve partial differential equations. In recent years, solving partial differential equations with deep learning has become a new research hotspot. The physics-informed neural network (PINN) is an application of scientific machine learning in the traditional numerical field; it can be used for a variety of problems involving partial differential equations (PDEs), including equation solving, parameter inversion, model discovery, and control and optimization.

2. PINN method

The main idea of PINN is shown in Figure 1: first construct a neural network whose output $\hat{u}$ serves as a surrogate model for the PDE solution, then encode the PDE information as constraints into the loss function used to train the network. The loss function consists of four parts: the partial differential equation residual loss (PDE loss), the boundary condition loss (BC loss), the initial condition loss (IC loss), and the observed-data loss (Data loss).

Figure 1: Schematic diagram of PINN

Specifically, consider the following PDE problem, where the solution $u(\mathbf{x})$ is defined on $\Omega \subset \mathbb{R}^{d}$ with $\mathbf{x}=\left(x_{1}, \ldots, x_{d}\right)$:

$$f\left(\mathbf{x} ; \frac{\partial u}{\partial x_{1}}, \ldots, \frac{\partial u}{\partial x_{d}} ; \frac{\partial^{2} u}{\partial x_{1} \partial x_{1}}, \ldots, \frac{\partial^{2} u}{\partial x_{1} \partial x_{d}}\right)=0, \quad \mathbf{x} \in \Omega,$$

subject to the boundary condition

$$\mathcal{B}(u, \mathbf{x})=0 \quad \text{on} \quad \partial \Omega.$$
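
For instance, for the one-dimensional Burgers equation treated later in this post (with $\mathbf{x}=(x,t)$ and viscosity $v$), the residual operator takes the form

$$f = u_{t}+u\,u_{x}-v\,u_{xx}=0, \quad (x,t) \in \Omega.$$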

The PINN solution process mainly includes:

  • The first step is to define a fully connected neural network with $D$ layers:

$$N_{\Theta}:=L_D \circ \sigma \circ L_{D-1} \circ \sigma \circ \cdots \circ \sigma \circ L_1,$$

    where

$$\begin{aligned} L_1(x) &:=W_1 x+b_1, \quad W_1 \in \mathbb{R}^{d_1 \times d},\; b_1 \in \mathbb{R}^{d_1}, \\ L_i(x) &:=W_i x+b_i, \quad W_i \in \mathbb{R}^{d_i \times d_{i-1}},\; b_i \in \mathbb{R}^{d_i}, \quad \forall i=2,3, \cdots, D-1, \\ L_D(x) &:=W_D x+b_D, \quad W_D \in \mathbb{R}^{N \times d_{D-1}},\; b_D \in \mathbb{R}^N, \end{aligned}$$

    $\sigma$ is the activation function, and $W$ and $b$ are the weight and bias parameters.
  • The second step: to measure the discrepancy between the neural network output $\hat{u}$ and the constraints, define the loss function

$$\mathcal{L}(\boldsymbol{\theta})=w_{f}\, \mathcal{L}_{PDE}(\boldsymbol{\theta}; \mathcal{T}_{f})+w_{i}\, \mathcal{L}_{IC}(\boldsymbol{\theta}; \mathcal{T}_{i})+w_{b}\, \mathcal{L}_{BC}(\boldsymbol{\theta}; \mathcal{T}_{b})+w_{d}\, \mathcal{L}_{Data}(\boldsymbol{\theta}; \mathcal{T}_{data}),$$

    where

$$\begin{aligned} \mathcal{L}_{PDE}(\boldsymbol{\theta}; \mathcal{T}_{f}) &=\frac{1}{\left|\mathcal{T}_{f}\right|} \sum_{\mathbf{x} \in \mathcal{T}_{f}}\left\|f\left(\mathbf{x}; \frac{\partial \hat{u}}{\partial x_{1}}, \ldots, \frac{\partial \hat{u}}{\partial x_{d}}; \frac{\partial^{2} \hat{u}}{\partial x_{1} \partial x_{1}}, \ldots, \frac{\partial^{2} \hat{u}}{\partial x_{1} \partial x_{d}}\right)\right\|_{2}^{2}, \\ \mathcal{L}_{IC}(\boldsymbol{\theta}; \mathcal{T}_{i}) &=\frac{1}{\left|\mathcal{T}_{i}\right|} \sum_{\mathbf{x} \in \mathcal{T}_{i}}\|\hat{u}(\mathbf{x})-u(\mathbf{x})\|_{2}^{2}, \\ \mathcal{L}_{BC}(\boldsymbol{\theta}; \mathcal{T}_{b}) &=\frac{1}{\left|\mathcal{T}_{b}\right|} \sum_{\mathbf{x} \in \mathcal{T}_{b}}\|\mathcal{B}(\hat{u}, \mathbf{x})\|_{2}^{2}, \\ \mathcal{L}_{Data}(\boldsymbol{\theta}; \mathcal{T}_{data}) &=\frac{1}{\left|\mathcal{T}_{data}\right|} \sum_{\mathbf{x} \in \mathcal{T}_{data}}\|\hat{u}(\mathbf{x})-u(\mathbf{x})\|_{2}^{2}. \end{aligned}$$

    Here $w_{f}$, $w_{i}$, $w_{b}$, and $w_{d}$ are weights, and $\mathcal{T}_{f}$, $\mathcal{T}_{i}$, $\mathcal{T}_{b}$, and $\mathcal{T}_{data}$ denote the sets of residual points sampled from the PDE domain, the initial condition, the boundary condition, and the observed data, respectively. In particular, $\mathcal{T}_{f} \subset \Omega$ is a predefined set of collocation points used to measure how well the network output $\hat{u}$ satisfies the PDE.
  • Finally, a gradient-based optimization algorithm is used to minimize the loss function until network parameters $\theta^{*}$ that meet the required prediction accuracy are found. (A minimal PyTorch sketch of these steps is given after this list.)
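
The following is a minimal PyTorch sketch of these steps; the network width and depth, the loss weights, and the helper `pde_residual` are illustrative assumptions rather than the exact configuration used in this series.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Fully connected network N_Theta = L_D ∘ σ ∘ ... ∘ σ ∘ L_1, with tanh as σ."""
    def __init__(self, layers=(2, 20, 20, 20, 1)):
        super().__init__()
        self.linears = nn.ModuleList(
            [nn.Linear(layers[i], layers[i + 1]) for i in range(len(layers) - 1)]
        )

    def forward(self, x):
        for linear in self.linears[:-1]:
            x = torch.tanh(linear(x))
        return self.linears[-1](x)

def pinn_loss(model, pde_residual, x_f, x_i, u_i, x_b, u_b,
              w_f=1.0, w_i=1.0, w_b=1.0):
    """Composite loss L = w_f*L_PDE + w_i*L_IC + w_b*L_BC (a Data term is added the same way)."""
    loss_pde = (pde_residual(model, x_f) ** 2).mean()  # PDE residual on collocation points T_f
    loss_ic = ((model(x_i) - u_i) ** 2).mean()         # initial-condition mismatch on T_i
    loss_bc = ((model(x_b) - u_b) ** 2).mean()         # boundary-condition mismatch on T_b
    return w_f * loss_pde + w_i * loss_ic + w_b * loss_bc
```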

It is worth noting the case of inverse problems, in which some parameters of the equation are unknown. If only the PDE and the boundary conditions are known while the PDE parameters are not, the inverse problem is under-determined, so additional information is required, such as observed values of $u$ at some points. In this case, the PINN method treats the unknown equation parameters as trainable variables and adds them to the optimizer, and the loss function includes the Data loss term.
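
For example, in PyTorch the unknown coefficient can be registered as an extra trainable tensor and passed to the same optimizer as the network weights. This is only a sketch: the variable name `nu`, the initial guess, and the Adam learning rate are assumptions.

```python
import torch

# Unknown PDE coefficient (e.g. the viscosity v in the Burgers equation),
# registered as a trainable tensor; the initial guess 0.0 is arbitrary.
nu = torch.tensor(0.0, requires_grad=True)

model = MLP()  # the network from the sketch above
optimizer = torch.optim.Adam(list(model.parameters()) + [nu], lr=1e-3)
```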

3. PINN based on adaptive activation function

In 2019, Jagtap et al. at Brown University proposed a PINN based on an adaptive activation function. Specifically, a trainable parameter is introduced into the activation function; since this dynamically changes the topology of the loss function being optimized, the parameter is adjusted during training so that the network achieves its best performance. Compared with a traditional PINN using a fixed activation function, the adaptive-activation PINN has better learning ability, which significantly improves convergence speed and solution accuracy, especially in the early stage of training.

  • A neural network with a fixed activation function is written as

$$\begin{aligned} &\mathcal{L}_k\left(x^{k-1}\right):=w^{k} x^{k-1}+b^{k}, \\ &u_{\Theta}(x)=\left(\mathcal{L}_k \circ \sigma \circ \mathcal{L}_{k-1} \circ \ldots \circ \sigma \circ \mathcal{L}_1\right)(x), \end{aligned}$$

    where $\sigma$ is a fixed activation function.
  • The adaptive-parameter neural network inserts a trainable parameter $a$ before the output of each layer passes through the activation function:

$$\begin{aligned} &\sigma\left(a\, \mathcal{L}_k\left(x^{k-1}\right)\right), \\ &a^{*}=\underset{a \in \mathbb{R}^{+} \backslash\{0\}}{\arg \min }\; J(a), \end{aligned}$$

    where the trainable parameter $a$ is added to the neural network optimizer and optimized together with the network weights during training (a possible PyTorch realization is sketched below).
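
One possible PyTorch realization of this idea follows: a single global slope parameter `a` is inserted before each hidden activation and optimized jointly with the weights. The initialization `a = 1.0` and the use of one global parameter are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class AdaptiveMLP(nn.Module):
    """Fully connected network with a trainable slope a inside each activation: sigma(a * L_k(x))."""
    def __init__(self, layers=(2, 20, 20, 20, 1)):
        super().__init__()
        self.linears = nn.ModuleList(
            [nn.Linear(layers[i], layers[i + 1]) for i in range(len(layers) - 1)]
        )
        # Trainable activation-slope parameter (initialized to 1.0, an assumed choice);
        # being an nn.Parameter, it is updated by the optimizer together with the weights.
        self.a = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        for linear in self.linears[:-1]:
            x = torch.tanh(self.a * linear(x))
        return self.linears[-1](x)
```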

Jagtap A D, Kawaguchi K, Karniadakis G E. Adaptive activation functions accelerate convergence in deep and physics-informed neural networks[J]. Journal of Computational Physics, 2020, 404: 109136.

4. Problem definition: the inverse problem

$$\begin{aligned} u_t+u u_x &=v\, u_{xx}, \quad x \in[-1,1],\; t>0, \\ u(x, 0) &=-\sin (\pi x), \\ u(-1, t) &=u(1, t)=0. \end{aligned}$$

Here the parameter $v$ is unknown; its true value lies in $[0, 0.1/\pi]$. The reference numerical solution is obtained via the Hopf-Cole transformation, as shown in Figure 2.
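
For reference, the Hopf-Cole transformation (a standard result, summarized here rather than taken from the original post) substitutes

$$u(x, t)=-2 v\,\frac{\phi_{x}(x, t)}{\phi(x, t)},$$

which turns the nonlinear Burgers equation into the linear heat equation $\phi_{t}=v\,\phi_{xx}$, from which the reference solution can be evaluated.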
Task requirements:

  • The boundary conditions and the form of the differential equation are known, but the parameter in the equation is unknown; the task is to solve for $u$ and for the equation parameter (a sketch of the corresponding PDE residual is given after this list).
  • This is a typical inverse problem: an inversion of the equation parameter by optimization.
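
Below is a minimal sketch of how the PDE residual for this inverse problem can be formed with automatic differentiation in PyTorch, assuming `model` maps inputs $(x, t)$ to $u$ and `nu` is the trainable coefficient registered earlier; names and shapes are illustrative.

```python
import torch

def burgers_residual(model, nu, x, t):
    """PDE residual f = u_t + u*u_x - nu*u_xx of the Burgers equation; x, t have shape (N, 1)."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = model(torch.cat([x, t], dim=1))

    ones = torch.ones_like(u)
    u_t = torch.autograd.grad(u, t, grad_outputs=ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, grad_outputs=ones, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, grad_outputs=torch.ones_like(u_x), create_graph=True)[0]

    return u_t + u * u_x - nu * u_xx
```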

Figure 2: Burger Numerical Solution

5. Result display

The training process and the evolution of the inferred parameter are shown in Figure 3. It can be clearly seen that in the early stage of training, the PINN with the adaptive activation function reduces the loss faster and converges to the exact parameter value.


Figure 3: Variation diagram of training process problem parameters and training error

The prediction results during training are shown in Figures 4-6.


Figure 4: Prediction Error Plot


Figure 5: Prediction graph


Figure 6: Predicted results at different times

Source: blog.csdn.net/weixin_45521594/article/details/127781628