Part Three: Reinforcement Learning, Starting from the Control Problem

Author: Zen and the Art of Computer Programming

1 Introduction

Reinforcement Learning (RL) is a field of machine learning concerned with training an agent to perform a task in an environment by maximizing the reward signals that the environment returns. Through continual trial and error, the agent learns to turn feedback from the environment into actions, so that it achieves its goal as fully as possible. A key feature of RL is that the agent interacts with a continuous, dynamic system: at each step it must decide which action to take in the current state, and it receives feedback from the environment immediately. That feedback is used to judge how good the current strategy is, to improve it, and eventually to find an optimal behavior policy.
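The interaction loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's code: `Corridor` is a hypothetical toy environment (positions 0 to 4, start at 0, reward 1 for reaching position 4), and `run_episode`, `random_policy`, and `always_right` are illustrative names. In each step the agent picks an action for the current state, the environment immediately returns the next state and a reward, and the loop repeats until the task ends.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..n-1, start at 0, goal at n-1.
    Action 0 moves left, action 1 moves right; reaching the goal pays reward 1."""

    def __init__(self, n=5):
        self.n = n
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.n - 1)  # walls at both ends
        done = self.pos == self.n - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


def run_episode(env, policy, max_steps=100):
    """One episode of agent-environment interaction; returns (total_reward, steps)."""
    state = env.reset()
    total_reward = 0.0
    for t in range(1, max_steps + 1):
        action = policy(state)                   # agent decides in the current state
        state, reward, done = env.step(action)   # environment feeds back immediately
        total_reward += reward
        if done:
            return total_reward, t
    return total_reward, max_steps


rng = random.Random(0)


def random_policy(state):
    """Pure trial and error: ignore the state and pick a direction at random."""
    return rng.choice([0, 1])


def always_right(state):
    """A hand-written better strategy for this particular corridor."""
    return 1


print(run_episode(Corridor(), always_right))
print(run_episode(Corridor(), random_policy))
```

`run_episode(Corridor(), always_right)` returns `(1.0, 4)`: the goal is reached in four steps. The random policy reaches it only by chance and never in fewer than four steps; turning that kind of trial-and-error experience into a better policy is what the learning methods discussed later are for.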

In the RL research community, the control problem has received wide attention. The control problem asks how an agent, starting from a given state, should act in order to reach its goal. To solve it, the agent needs a controller that applies appropriate control signals (actions) to the environment, based on its own estimate of the state and on knowledge learned from experience, so that the desired goal is achieved.
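One common way to write the control problem formally is shown below. The notation is an assumption, not defined in the original: $r_{t+1}$ is the reward received after the action at step $t$, $\gamma \in [0, 1]$ is a discount factor ($\gamma = 1$ only for tasks that always end after finitely many steps), and the controller is a policy $\pi$ that maps states to actions. The goal is the policy with the largest expected return:

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1},
\qquad
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ G_0 \right]
```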

Starting from the control problem, this article explains the basic terms and concepts of RL, mainly the Markov decision process (MDP), states, actions, rewards, the state value function, and the Bellman equation. It then introduces two main families of reinforcement learning algorithms, Monte Carlo methods and temporal difference methods, and gives their concrete steps. Finally, …
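To make these terms concrete, here is a minimal Python sketch, not the author's code, that estimates the state value function of one fixed policy in three ways. It assumes a hypothetical 5-state random walk (states 1 to 5, terminal states 0 and 6, start in state 3, move left or right with probability 0.5, reward 1 only on reaching state 6, no discounting), a small MDP often used to compare these methods. `bellman_evaluation` repeatedly applies the Bellman expectation equation V(s) = Σ p(s'|s)[r + γV(s')] using the known transition probabilities; `monte_carlo` (the first-visit variant) averages complete observed returns; `td0` updates after every single step with V(s) ← V(s) + α[r + γV(s') − V(s)], using an assumed step size α = 0.05. All function names and parameters are illustrative.

```python
import random

N = 5          # non-terminal states 1..N; states 0 and N+1 are terminal
START = 3      # every episode starts in the middle state
GAMMA = 1.0    # no discounting (assumption for this episodic example)


def step(state, rng):
    """Random policy: move left or right with probability 0.5 each.
    Returns (next_state, reward, done); reward is 1 only at state N+1."""
    nxt = state + rng.choice([-1, 1])
    reward = 1.0 if nxt == N + 1 else 0.0
    return nxt, reward, nxt in (0, N + 1)


def bellman_evaluation(tol=1e-10):
    """Model-based: sweep the Bellman expectation equation until V stops changing."""
    v = [0.0] * (N + 2)                      # terminal values stay 0
    while True:
        delta = 0.0
        for s in range(1, N + 1):
            new = 0.0
            for nxt in (s - 1, s + 1):       # each successor has probability 0.5
                r = 1.0 if nxt == N + 1 else 0.0
                new += 0.5 * (r + GAMMA * v[nxt])
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            return v


def monte_carlo(episodes, rng):
    """First-visit Monte Carlo: average the full return seen after the first visit to s."""
    total = [0.0] * (N + 2)
    count = [0] * (N + 2)
    for _ in range(episodes):
        trajectory, s, done = [], START, False
        while not done:                      # play the whole episode first
            nxt, r, done = step(s, rng)
            trajectory.append((s, r))
            s = nxt
        g, first_return = 0.0, {}
        for s, r in reversed(trajectory):    # accumulate returns backwards
            g = r + GAMMA * g
            first_return[s] = g              # earliest visit is written last
        for s, g in first_return.items():
            total[s] += g
            count[s] += 1
    return [total[s] / count[s] if count[s] else 0.0 for s in range(N + 2)]


def td0(episodes, rng, alpha=0.05):
    """TD(0): after every step, move V(s) toward the bootstrapped target r + gamma*V(s')."""
    v = [0.0] * (N + 2)
    for _ in range(episodes):
        s, done = START, False
        while not done:
            nxt, r, done = step(s, rng)
            v[s] += alpha * (r + GAMMA * v[nxt] - v[s])   # v[terminal] is always 0
            s = nxt
    return v


rng = random.Random(42)
true_v = bellman_evaluation()
mc_v = monte_carlo(5000, rng)
td_v = td0(5000, rng)
for s in range(1, N + 1):
    print(s, round(true_v[s], 3), round(mc_v[s], 3), round(td_v[s], 3))
```

For this walk the exact values are V(s) = s/6, so the first column comes out as 0.167, 0.333, 0.5, 0.667, 0.833, and the two sampled columns should land near those numbers. The design difference is visible in the loops: Monte Carlo must wait for each episode to finish before it can compute a return, while TD(0) learns online from its own current estimate of the next state's value.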


Source: blog.csdn.net/universsky2015/article/details/132364024