Control of a Quadrotor with Reinforcement Learning

Goal

Control a quadrotor with a neural network trained using reinforcement learning.
The policy network is a function that maps the state directly to rotor thrusts.

Related Work

Guided Policy Search with an MPC Controller
That work uses a policy that maps raw sensor data to rotor velocities.

Contribution

Proposes a deterministic on-policy method that uses zero-bias, zero-variance samples.
Uses a small number of high-quality samples, so training places only a small burden on the neural network.

Network Structure
input: {orientation (rotation matrix, 9 values), position (3), angular velocity (3), linear velocity (3)} --> 18-dimensional state vector
output: 4-dimensional action vector (one value per rotor)
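
The input and output listed above can be sketched in a few lines of plain Python. This is only an illustration of the 18-to-4 mapping, not the network from the paper: the hidden width (64), the tanh activations, the two hidden layers, the random initialisation, and the helper names build_state, init_layer, dense and policy are all assumptions made for this example.

```python
import math
import random

STATE_DIM = 18   # 9 (rotation matrix) + 3 (position) + 3 (angular vel.) + 3 (linear vel.)
ACTION_DIM = 4   # one command per rotor
HIDDEN = 64      # assumed hidden width, not taken from the notes


def build_state(rotation, position, angular_velocity, linear_velocity):
    """Flatten the 3x3 rotation matrix row by row and append the three 3-vectors."""
    state = [value for row in rotation for value in row]
    state += list(position) + list(angular_velocity) + list(linear_velocity)
    if len(state) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} values, got {len(state)}")
    return state


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, inputs, activation=None):
    """Fully connected layer: weights times inputs plus bias, then optional activation."""
    weights, biases = layer
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    return [activation(v) for v in outputs] if activation else outputs


def policy(layers, state):
    """Deterministic policy: tanh hidden layers followed by a linear output layer."""
    hidden = state
    for layer in layers[:-1]:
        hidden = dense(layer, hidden, math.tanh)
    return dense(layers[-1], hidden)


rng = random.Random(0)
layers = [
    init_layer(STATE_DIM, HIDDEN, rng),
    init_layer(HIDDEN, HIDDEN, rng),
    init_layer(HIDDEN, ACTION_DIM, rng),
]

# Example state: identity orientation, zero position and zero velocities.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
state = build_state(identity, [0.0] * 3, [0.0] * 3, [0.0] * 3)
action = policy(layers, state)
print(len(state), len(action))  # 18 4
```

Because the policy is deterministic, calling policy(layers, state) twice with the same weights returns the same four values. The outputs here are untrained and carry no physical meaning.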

Exploration Strategy
TRPO (Trust Region Policy Optimization)

Reposted from blog.csdn.net/weixin_42018112/article/details/88350713