[Paper] Reinforcement-learning-based adaptive neural network control of underwater robots with control input nonlinearities

Summary

This paper studies trajectory tracking for a fully actuated autonomous underwater vehicle moving in the horizontal plane. The control design accounts for external disturbances, control input nonlinearities, and model uncertainty. Based on the dynamic model in the discrete-time domain, two neural networks (a critic neural network and an action neural network) are integrated into the adaptive control design. The critic neural network evaluates the long-term performance of the designed controller at the current time step, and the action neural network compensates for the unknown dynamics. To eliminate the effect of the nonlinearity in the control input of the underwater robot, a compensation term is also designed in the adaptive control. Rigorous theoretical analysis establishes the stability and performance of the control law. In addition, extensive numerical simulation results verify the robustness and effectiveness of the control method.

Index Terms: adaptive control, autonomous underwater vehicle, neural network, trajectory tracking.

Introduction

Underwater robots, including autonomous underwater vehicles (AUVs), remotely operated vehicles (ROVs), and underwater gliders, are now widely used in a variety of underwater tasks [1]–[5]. AUVs are also used in scientific research on oceans, seabeds, and lakes. Precise motion control is essential when an underwater robot performs such tasks. However, this is challenging because the vehicle dynamics are nonlinear and coupled, with time-varying hydrodynamic coefficients, and the problem therefore requires further study.

Underwater robots usually move in three-dimensional space with 6 degrees of freedom, and their planar motion and diving motion are dynamically coupled. In most studies, the underwater robot model is decoupled, which makes it possible to apply a variety of control methods. Several methods have been proposed for trajectory tracking of an underwater robot in three-dimensional space, especially for planar motion or diving. The nonlinear underwater vehicle model is usually linearized first, and the controller is then designed based on the linear model [8], [9]. On the basis of the decoupled model, Wen [6] analyzed the diving control of the underwater robot and used a differentiator to improve noise attenuation, thereby realizing active disturbance rejection control. By decoupling the depth and heading motions, a fuzzy PD depth controller is designed in [10]. In addition, in [8], an output feedback control for an underwater robot moving in a vertical plane is proposed by converting the path tracking error into a Serret-Frenet frame and linearizing the error dynamics.

For the planar motion control of underwater robots, [7] proposed a nonlinear control for both fully actuated and underactuated configurations, and analyzed in detail the influence of the sideslip angle of the underwater robot. In addition, a tilting thruster structure is proposed in [3], and selective switching control is designed for two decoupled three-degree-of-freedom subsystems. In [11], both a current-induced vessel model and a general vehicle model are considered, where the former accounts for the main current loads. A nonlinear Luenberger observer and a controller for the underwater robot are then designed using cascaded system theory. These results show that model-based controllers perform better than traditional PD control. In such designs, however, the model dynamics used in the controller must be corrected when they deviate from the actual dynamics.

Optimal control based on the dynamic model of underwater robots has also been studied in [12]–[14]. In [12], an optimal control is designed to control the trajectory of the underwater robot at the kinematic level, with the cost function defined as the kinetic energy cost. An appropriate Hamiltonian is then constructed according to the maximum principle, and the optimal solution is obtained. For a non-affine underwater vehicle model, a nonlinear suboptimal control method is proposed, and a state-dependent Riccati equation controller is applied to the point-to-point tracking of the NPS II underwater vehicle [13]. In [14], the original robust control problem is transformed into an optimal control problem by including the uncertainty bound in the cost function, and an indirect robust depth control is then proposed.

The hydrodynamic parameters of underwater robots are usually obtained through computational fluid dynamics or towing experiments. However, because the environment and the vehicle state change during underwater missions, the parameters obtained in this way are not constant [15]. External disturbances and model parameter uncertainty should therefore be considered when designing a suitable controller [16]–[23]. To handle model parameter uncertainty, PID parameter tuning based on Mamdani fuzzy rules was used in [24], with the control design decoupled into heading and depth channels. A discrete time-delay control method was proposed in [25], which directly estimates the dynamics of the underwater vehicle and compensates for model uncertainty through time-delay estimation.

The velocity of an underwater robot can be measured by a Doppler velocity log (DVL), whose data update rate is usually low. To improve the robustness of DVL-equipped underwater robots against unmodeled dynamics and external disturbances, integral sliding mode control is introduced in [26]. A new method to compensate for bounded external disturbances and model uncertainty is given in [27], which uses an integral of the sign of the error control structure and establishes semi-global asymptotic tracking through Lyapunov stability analysis. In [28], sliding mode control and backstepping are combined to design a trajectory tracking controller for an underwater robot subject to parameter uncertainty and external disturbances.

To deal with external disturbances, a disturbance force measurement method is introduced in [2] to measure the force and torque acting on the underwater vehicle; feedforward control is then applied to the vehicle based on the predicted response of the dynamic model. Disturbance observers are another main method for compensating for unknown external disturbances [11], [20], [29]–[32]. In [20], a nonlinear observer is used to estimate the low-frequency motion and the wave-frequency motion of the underwater robot, and a nonlinear tracking control is designed for underwater robot motion under shallow-water wave disturbances. For the control of near-space vehicles, a sliding mode tracking control based on a disturbance observer is applied in [32]. In addition, an adaptive tracking control using a disturbance observer is designed in [33] for a fully actuated surface vessel.

Owing to their function approximation capabilities, neural networks and fuzzy approximators, together with the control algorithms built on them, have been studied extensively to compensate for environmental disturbances and the model uncertainty of underwater robots [34]–[41]. In [35], neural network approximation is used to compensate for unknown model parameters and external disturbances caused by ocean currents and waves, achieving uniform ultimate boundedness of the tracking errors. In [36], neural networks are used to handle the model uncertainty of underwater robots, and dynamic surface control is also used in the control design. In [38], the nonlinear uncertainty of the underwater vehicle dynamics is approximated by a two-layer neural network. For diving control of underwater robots, [42] proposed an adaptive control method based on a stable neural network. Neural network adaptive control for multiple unmanned surface vessels was proposed in [43], where a local observer estimates the unmeasured states. In [44], a radial basis function neural network was used to derive an adaptive controller for a system affected by external disturbance and unknown lag. The recent work [45] considers a discrete-time nonlinear system with non-pure feedback subject to an input dead zone. To compensate for the dead zone, an adaptive compensation term and an n-step-ahead predictor are constructed by transforming the original system.
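The function approximation property mentioned above can be illustrated with a small radial basis function network trained online. This is only a minimal sketch under stated assumptions: the target function `unknown`, the centers, the width, the learning rate, and the plain gradient update are all hypothetical choices for illustration, not the adaptive laws used in the cited works.

```python
import numpy as np

def rbf_features(x, centers, width):
    """Gaussian radial basis functions evaluated at a scalar input x."""
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

def fit_rbf_online(f, centers, width=0.3, lr=0.3, steps=5000, seed=0):
    """Adapt the output weights sample by sample with a gradient (LMS) update."""
    rng = np.random.default_rng(seed)
    w = np.zeros_like(centers)
    for _ in range(steps):
        x = rng.uniform(-2.0, 2.0)
        phi = rbf_features(x, centers, width)
        err = f(x) - w @ phi          # approximation error at this sample
        w += lr * err * phi           # move the weights along the error gradient
    return w

# Hypothetical stand-in for an unknown nonlinearity (assumption).
unknown = lambda x: np.sin(2.0 * x) + 0.3 * x

centers = np.linspace(-2.0, 2.0, 15)
w = fit_rbf_online(unknown, centers)

xs = np.linspace(-1.8, 1.8, 50)
approx = np.array([w @ rbf_features(x, centers, 0.3) for x in xs])
rmse_untrained = np.sqrt(np.mean(unknown(xs) ** 2))      # error with zero weights
rmse_trained = np.sqrt(np.mean((approx - unknown(xs)) ** 2))
```

Comparing `rmse_trained` with `rmse_untrained` gives a rough indication of how much of the unknown function the network has captured on the test grid.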

The actual control system of an underwater robot is usually implemented digitally on an embedded computer through a sampler. A continuous-time controller therefore has to be converted into a discrete-time version [46]. By using the discrete-time model directly, we develop a trajectory tracking control in the presence of external disturbances, model parameter uncertainties, and nonlinear control input. Many methods exist for handling input nonlinearities such as input dead zone and saturation [47]–[52]. Based on the backstepping method and Lyapunov analysis, an adaptive trajectory tracking controller is designed in [51] to overcome model parameter uncertainty, and a saturation function is used to handle actuator saturation. To prevent violation of velocity constraints, a robust adaptive controller for underwater robots was proposed in [48], with barrier Lyapunov functions used in the Lyapunov synthesis. In [52], a new dynamic surface control (DSC) method was proposed for a pure-feedback system with unknown input dead zone; the use of DSC significantly reduces the complexity. For multi-input multi-output nonlinear systems with unknown dead zones and control directions, a new adaptive control method based on neural networks has also been proposed.

In addition, reinforcement learning has been studied and applied in many fields, such as machine learning and artificial intelligence [53]–[55]. Reinforcement learning was first investigated from the perspective of computer science in [53]. In [54], the "keepers" of a soccer team are trained to learn when to hold or pass the ball. In [55], a method based on deep Q-learning was proposed that successfully solves more than 20 simulated tasks with continuous control spaces.
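Reinforcement learning methods of the kind cited above rely on estimating a long-term cost, which is the role the abstract assigns to the critic network. The sketch below shows only that idea on a toy problem. Everything in it is an assumption made for illustration: a scalar tracking error that decays as e(k+1) = a e(k), a quadratic stage cost, a single-weight critic, and a temporal-difference update. It is not the critic design of the paper.

```python
import numpy as np

def train_critic(a=0.8, gamma=0.9, lr=0.5, episodes=500, steps=30, seed=0):
    """Learn w so that w * e**2 approximates the discounted sum of future costs.

    Toy setting (assumption): error dynamics e(k+1) = a * e(k), stage cost e**2.
    """
    rng = np.random.default_rng(seed)
    w = 0.0
    for _ in range(episodes):
        e = rng.uniform(-1.0, 1.0)               # random initial tracking error
        for _ in range(steps):
            e_next = a * e
            cost = e ** 2
            # Temporal-difference error: stage cost plus discounted next estimate,
            # minus the current estimate of the long-term cost.
            td_error = cost + gamma * w * e_next ** 2 - w * e ** 2
            w += lr * td_error * e ** 2           # semi-gradient TD(0) update
            e = e_next
    return w

w_learned = train_critic()
# Closed-form long-term cost weight for this toy problem: sum of (gamma * a**2)**i.
w_exact = 1.0 / (1.0 - 0.9 * 0.8 ** 2)
```

In this toy case the learned weight can be checked against the closed-form value `w_exact`, which is what makes it a convenient sanity test before moving to a real network.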
In this paper, inspired by the work of [45], [56], and [57], we propose a reinforcement learning technique that uses two neural networks to achieve optimal trajectory tracking of underwater robots. The unknown nonlinearities and disturbances are approximated by an action neural network, while the long-term tracking performance is approximated by a critic neural network. Adaptive compensation for the nonlinear control input is also considered. Preliminary results of this paper were given in [58]; they are extended here by considering not only the dead zone and saturation of the actuator, but also the nonlinear relationship between the nominal force/torque and the actual force/torque. Moreover, a nonlinear compensation strategy is proposed, which is discussed later.
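The actuator dead zone and saturation mentioned above can be pictured with a simple static map. The following is a minimal sketch assuming a symmetric dead zone of known width, unit slope, and a symmetric saturation limit; the names `dead_zone` and `limit` and their values are hypothetical. The paper handles these effects adaptively, so the fixed inverse shown here is only an illustration of what a compensation term has to undo.

```python
import numpy as np

def actuator_nonlinearity(tau_cmd, dead_zone=0.5, limit=5.0):
    """Map a commanded force/torque to the one actually produced."""
    tau_cmd = np.asarray(tau_cmd, dtype=float)
    # Dead zone: commands smaller than dead_zone in magnitude give no output,
    # larger commands are shifted toward zero by dead_zone.
    shifted = np.sign(tau_cmd) * np.maximum(np.abs(tau_cmd) - dead_zone, 0.0)
    # Saturation: the output magnitude cannot exceed limit.
    return np.clip(shifted, -limit, limit)

def compensate_dead_zone(tau_desired, dead_zone=0.5):
    """Pre-shift the command so the dead zone is cancelled (known width assumed)."""
    tau_desired = np.asarray(tau_desired, dtype=float)
    return tau_desired + np.sign(tau_desired) * dead_zone

# A desired torque of 2.0 is recovered once the command is pre-shifted.
recovered = actuator_nonlinearity(compensate_dead_zone(2.0))
```

Saturation cannot be inverted in the same way: once the desired value exceeds `limit`, no command can produce it.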
The rest of this article is organized as follows. Section 2 introduces the nonlinear model of the underwater robot. Section 3 presents the adaptive neural network control design. Sections 4 and 5 present the simulation study and the conclusions, respectively.

Equations of motion

As mentioned in the first section, underwater robots usually move in three-dimensional space with 6 degrees of freedom, which leads to coupled dynamics between their planar and diving motions. To facilitate the control design, the model is usually decoupled, and the designed control is then verified using the coupled nonlinear dynamics. We consider the planar motion of an underwater robot with 3 degrees of freedom, as shown in Figure 1. Let (x, y) denote the position of the underwater robot and ψ its yaw angle in the inertial frame, collected as η = (x, y, ψ), and let u, v, and r denote the surge, sway, and yaw velocities in the body-fixed frame, collected as ν = (u, v, r). In addition, let M denote the inertia matrix of the underwater robot, C(ν) the Coriolis and centripetal matrix, and D(ν) the damping matrix. The force and moment produced by gravity and buoyancy are denoted by g(η). Considering unknown external disturbances and model parameter uncertainty, the dynamics of the underwater vehicle can be given as follows:
[Equation image missing from the source]
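The equation itself is not available in this post. For orientation only, a model in the form commonly used for marine vehicles, written with the symbols defined above, would read as follows. The rotation matrix J(ψ), the control input τ, and the disturbance d are assumed notation, and the way uncertainty and input nonlinearity enter the paper's own equation may differ.

```latex
\begin{aligned}
\dot{\eta} &= J(\psi)\,\nu, \qquad
J(\psi) =
\begin{bmatrix}
\cos\psi & -\sin\psi & 0\\
\sin\psi & \cos\psi & 0\\
0 & 0 & 1
\end{bmatrix},\\
M\dot{\nu} + C(\nu)\,\nu + D(\nu)\,\nu + g(\eta) &= \tau + d,
\end{aligned}
\qquad \eta = [x,\ y,\ \psi]^{T},\quad \nu = [u,\ v,\ r]^{T}.
```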

(The remainder of the paper is omitted here.)

(The control design in this article is mainly for the three-degree-of-freedom model. Because the model used here is fully actuated, the control strategy can be readily extended to 6 degrees of freedom.)
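Since the design works with a discrete-time version of the three-degree-of-freedom model, it may help to see what one sampled step of such a model can look like. This is a rough sketch under several assumptions: forward-Euler discretization (the source does not state which discretization is used), constant matrices `M` and `D` with hypothetical values, and the Coriolis, restoring, and disturbance terms left out for brevity.

```python
import numpy as np

def rotation(psi):
    """Rotation from the body-fixed frame to the inertial frame for yaw angle psi."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def euler_step(eta, nu, tau, M, D, dt):
    """One forward-Euler step of a simplified planar model.

    eta = [x, y, psi] (inertial frame), nu = [u, v, r] (body frame).
    Coriolis, restoring and disturbance terms are omitted (assumption).
    """
    nu_dot = np.linalg.solve(M, tau - D @ nu)     # M * nu_dot = tau - D * nu
    eta_next = eta + dt * rotation(eta[2]) @ nu   # kinematics
    nu_next = nu + dt * nu_dot                    # dynamics
    return eta_next, nu_next

# Hypothetical parameters for illustration only.
M = np.diag([30.0, 40.0, 5.0])
D = np.diag([10.0, 15.0, 2.0])
eta0 = np.array([0.0, 0.0, np.pi / 2])            # heading along the inertial y axis
nu0 = np.array([1.0, 0.0, 0.0])                   # pure surge at 1 m/s
tau0 = D @ nu0                                    # thrust that exactly balances damping
eta1, nu1 = euler_step(eta0, nu0, tau0, M, D, dt=0.1)
```

With the thrust balancing the damping, the velocity stays constant and the vehicle advances along its heading, which is a quick way to check the sign conventions.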

Conclusion

This paper proposes an adaptive trajectory tracking control law for a fully actuated underwater vehicle based on neural network approximation in the discrete-time domain. Reinforcement learning algorithms based on neural networks are used to handle unknown disturbances, parameter uncertainties, and control input nonlinearities. The controller embeds two neural networks: the critic neural network evaluates the long-term performance of the controller at the current time step, and the action neural network compensates for the unknown dynamics. The robustness and effectiveness of the method are demonstrated through rigorous theoretical analysis and extensive simulation studies. Future work will apply the proposed control to real systems.

Origin blog.csdn.net/wangyifan123456zz/article/details/109231360