How to get started with reinforcement learning?

Reprinted from: https://www.itcodemonkey.com/article/3646.html

From: QbitAI (WeChat: QbitAI), compiled by Wang Xiaoxin

We know very little about how the brain works, but we do know that it learns by trial and error: we are rewarded for good choices and punished for poor ones, and that is how we adapt to our environment. Today, with enough computing power, we can model this process in software. That is reinforcement learning.

A recent article on the Algorithmia blog gave a detailed introduction to reinforcement learning in five areas: fundamentals, the decision-making process, practical applications, practical challenges, and learning resources. QbitAI has translated the article, and the translation follows:

Fundamentals

Video games offer one of the simplest mental models for understanding Reinforcement Learning (RL); conveniently, they are also one of the areas where RL algorithms are most widely applied. A classic video game contains the following kinds of objects:

  • Agent (the intelligent actor), which can move freely; it corresponds to the player;

  • Actions, taken by the agent, such as moving up or selling an item;

  • Rewards, earned by the agent, such as collecting gold coins or defeating other players;

  • Environment, the map or room the agent is in;

  • State, the agent's current situation, such as occupying a specific tile on the map or standing in a corner of a room;

  • Goal, which is for the agent to collect as much reward as possible.

These objects are the concrete components of reinforcement learning. Once the environment is set up, we can guide the agent state by state, rewarding it when it takes the right action. The process is easier to follow if you understand the Markov Decision Process (https://en.wikipedia.org/wiki/Markov_decision_process).
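The components listed above can be sketched as a minimal agent-environment loop. Everything here (the `GridEnvironment` class, the corridor layout, the reward values) is a hypothetical illustration for this article, not an API from any library:

```python
import random

class GridEnvironment:
    """A hypothetical 1-D corridor: the agent starts at cell 0 and
    earns a reward of +1 when it reaches the rightmost (goal) cell."""

    def __init__(self, size=5):
        self.size = size
        self.state = 0  # current cell index

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: +1 (move right) or -1 (move left), clipped to the corridor
        self.state = max(0, min(self.size - 1, self.state + action))
        reward = 1.0 if self.state == self.size - 1 else 0.0
        done = self.state == self.size - 1
        return self.state, reward, done

# A random agent interacting with the environment until it reaches the goal.
env = GridEnvironment()
state, total_reward, done = env.reset(), 0.0, False
while not done:
    action = random.choice([-1, +1])          # the agent picks an action
    state, reward, done = env.step(action)    # the environment responds with state and reward
    total_reward += reward                    # rewards accumulate toward the goal
print(total_reward)  # → 1.0
```

The six objects all appear here: the loop is the agent, `step` is an action, the returned `reward` is the reward, `GridEnvironment` is the environment, `self.state` is the state, and reaching the goal cell is the goal.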

In the maze in the picture below, there is a mouse:

Imagine you are the mouse: what would you do to collect as many rewards (water drops and cheese) as possible in the maze? In each state, that is, each location in the maze, you work out which steps will get you to a nearby reward. When there are three rewards to the right and one to the left, you go right.

This is how reinforcement learning works. In each state, the agent evaluates every possible action (up, down, left, and right) and chooses the one expected to yield the most reward. After a few steps, the mouse becomes familiar with the maze.
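The mouse's per-state evaluation can be sketched in a few lines. The reward values are illustrative, standing in for "three rewards to the right, one to the left":

```python
# Hypothetical expected reward for each action from the mouse's current cell.
expected_reward = {"up": 0.0, "down": 0.0, "left": 1.0, "right": 3.0}

# Evaluate every possible action and choose the most rewarding one.
best_action = max(expected_reward, key=expected_reward.get)
print(best_action)  # → right
```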

But how do you decide which action will yield the best results?

Decision-making process

Decision making in reinforcement learning is the question of how to get the agent to take the correct action in its environment. Here are two approaches.

Policy Learning

Policy Learning can be understood as a very detailed set of instructions that tells the agent what to do at every step. A policy might read: when you are close to an enemy and the enemy is stronger than you, back away. We can also think of a policy as a function with a single input: the agent's current state. Knowing the right policy in advance is not easy, though; we need to learn this complex function that maps states to actions.
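The "policy as a function of the current state" idea can be sketched as a plain lookup table. The states and actions here are hypothetical examples in the spirit of the enemy rule above:

```python
# A policy maps the agent's current state to an action.
policy = {
    "enemy_near_and_stronger": "retreat",
    "enemy_near_and_weaker": "attack",
    "no_enemy": "explore",
}

def act(state):
    """The policy is a function of one input: the current state."""
    return policy[state]

print(act("enemy_near_and_stronger"))  # → retreat
```

Policy learning replaces this hand-written table with a learned function, which is why neural networks, as universal function approximators, are a natural fit.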

There is some interesting research on using deep learning to learn policies in reinforcement learning settings. Andrej Karpathy built a neural network to teach an agent to play Pong (http://karpathy.github.io/2016/05/31/rl/). This is not surprising, since neural networks can approximate arbitrarily complex functions very well.

[Animation: an agent learning to play Pong]

Q-Learning algorithm

Another way to guide an agent is to give it a framework and let it act on its own based on the current environment, rather than explicitly telling it what to do in each state. Unlike policy learning, the Q-Learning algorithm takes two inputs, a state and an action, and returns a value for each state-action pair. When facing a choice, it computes the expected value of each available action (up, down, left, and right).
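A tabular sketch of the two-input Q-function described above; the states, actions, and values are all hypothetical:

```python
# A Q-function takes two inputs, a state and an action, and returns the
# expected value of taking that action in that state.
Q = {
    ("corner", "up"): 0.2,
    ("corner", "down"): 0.0,
    ("corner", "left"): 0.5,
    ("corner", "right"): 1.4,
}

def choose_action(state, actions=("up", "down", "left", "right")):
    # Facing a choice, evaluate Q(state, a) for every action and take the best.
    return max(actions, key=lambda a: Q[(state, a)])

print(choose_action("corner"))  # → right
```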

The innovation of Q-Learning is that it estimates not only the short-term value of an action taken in the current state, but also the potential future value that action may bring. This is similar to discounted cash flow analysis in corporate finance, which accounts for all potential future cash flows when valuing an asset today. Since a future reward is worth less than the same reward now, Q-Learning uses a discount factor to model this.
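One step of the standard tabular Q-Learning update makes both ideas concrete: the immediate reward, and the discounted value of the best follow-up action. The states, actions, and numbers below are illustrative:

```python
from collections import defaultdict

alpha, gamma = 0.5, 0.9          # learning rate and discount factor
Q = defaultdict(float)           # Q-values, initialized to 0

def q_update(state, action, reward, next_state, actions=("left", "right")):
    # The current estimate is pulled toward: immediate reward plus the
    # discounted value of the best action available in the next state.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

q_update("s0", "right", 1.0, "s1")
print(Q[("s0", "right")])  # → 0.5 (half of the +1 reward, since best_next is still 0)
```

The `gamma * best_next` term is exactly the "discount factor" mentioned above: rewards expected further in the future are shrunk by powers of gamma, just as future cash flows are discounted in finance.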

Policy learning and Q-Learning are the two main methods for guiding agents in reinforcement learning, but researchers have also combined the two using deep learning, and devised other innovative approaches. DeepMind proposed a neural network called the Deep Q-Network (DQN) (https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) to approximate the Q-function, with very good results. Later, they combined Q-Learning with policy learning and proposed a method called A3C (https://arxiv.org/abs/1602.01783).

Combining neural networks with these methods can sound complicated, but remember: all of these training algorithms share one simple goal, to effectively steer the agent through the environment toward maximum reward.

Practical Applications

Although reinforcement learning has been studied for decades, its deployment in business settings is reportedly still limited (https://www.oreilly.com/ideas/practical-applications-of-reinforcement-learning-in-industry). There are many reasons for this, but they share one underlying problem: on some tasks there is still a gap between reinforcement learning's performance and that of the algorithms currently in production use.

Most of the applications of reinforcement learning in the past decade have been in video games. State-of-the-art reinforcement learning algorithms have achieved great results in classic and modern games, beating human players by a large margin in some games.

The image above is from DeepMind's DQN paper. In more than half of the games tested, the paper's agents outperformed the human benchmark, often at twice the human level; in some games, however, the algorithm performed worse than humans.

Reinforcement learning also has successful practical applications in robotics and industrial automation: a robot can be treated as an agent in an environment, and reinforcement learning has proven a viable way to guide it. Notably, Google uses reinforcement learning to reduce the operating costs of its data centers.

Reinforcement learning is also expected to have applications in medicine and education, but most of the current research is still in the laboratory stage.

Practical Challenges

The prospects for reinforcement learning are bright, but the road to practice can be very tortuous.

The first problem is data. Reinforcement learning often needs enormous amounts of training data to reach performance that other algorithms achieve far more efficiently. DeepMind's recently proposed Rainbow DQN algorithm requires 18 million frames of Atari gameplay, roughly 83 hours of play, to train, far longer than a human needs to learn the same game. The same issue shows up in gait-learning tasks.
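A quick sanity check of the figure quoted above, assuming the Atari standard of 60 frames per second:

```python
# 18 million Atari frames at 60 frames per second, converted to hours of play.
frames = 18_000_000
fps = 60                       # Atari runs at 60 Hz
hours = frames / fps / 3600    # frames → seconds → hours
print(round(hours, 1))  # → 83.3
```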

Another practical challenge is domain specificity. Reinforcement learning is a general-purpose paradigm that should, in theory, apply to many different kinds of problems. In practice, though, many of those problems have a domain-specific solution that outperforms reinforcement learning, such as online trajectory optimization for MuJoCo robots. We therefore have to trade off generality against performance.

Finally, the most pressing problem in reinforcement learning right now is designing the reward function. Designers inevitably bring subjective judgment to reward design, and even setting that aside, reinforcement learning can get stuck in local optima during training.

These are some of the main challenges facing reinforcement learning in practice; hopefully, follow-up research will continue to address them.

Learning Resources

Libraries

1. RL-Glue: Provides a standard interface that connects reinforcement learning agents, environments, and experimental procedures, and enables cross-language programming.

Address: http://glue.rl-community.org/wiki/Main_Page

2. Gym: Developed by OpenAI, it is a toolkit for developing reinforcement learning algorithms and performance comparisons. It can train agents to learn many tasks, including walking and playing ping pong games.

Address: https://gym.openai.com/

3. RL4J: A reinforcement learning framework integrated into the deeplearning4j library, released under the Apache 2.0 license.

Address: https://github.com/deeplearning4j/rl4j

4. TensorForce: A TensorFlow library for reinforcement learning.

Address: https://github.com/reinforceio/tensorforce

Papers

1. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Address: https://arxiv.org/abs/1712.01815

This paper, with 13 authors, proposes AlphaZero. The authors generalize the earlier AlphaGo Zero approach into a single AlphaZero algorithm that achieves superhuman performance in multiple challenging domains, again using "tabula rasa" reinforcement learning, meaning that all knowledge is acquired through experience, i.e. learning from scratch. Starting from random play, with no domain knowledge beyond the rules of each game, AlphaZero reached superhuman performance in chess, shogi, and Go within 24 hours, convincingly beating a world-champion program in all three.

2. Deep Reinforcement Learning: An Overview

Address: https://arxiv.org/abs/1701.07274

This paper surveys some of the most recent and exciting work in deep reinforcement learning, focusing on six core elements, six important mechanisms, and twelve applications. It begins with background on machine learning, deep learning, and reinforcement learning, then discusses core topics including the DQN network, policies, rewards, models, planning, and search.

3. Playing Atari with Deep Reinforcement Learning

Address: https://arxiv.org/abs/1312.5602

This is DeepMind's 2013 paper from the NIPS Deep Learning Workshop. It proposes a deep learning method that uses reinforcement learning to learn control policies directly from high-dimensional sensory input. The model is a convolutional neural network trained with a variant of Q-learning: the input is raw pixels and the output is a value function that estimates future rewards. Applied to seven Atari 2600 games without changing the architecture or learning algorithm, the method outperformed previous approaches on six of the games and exceeded human-level performance on three.

4. Human-Level Control Through Deep Reinforcement Learning

Address: https://web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf

This is DeepMind's 2015 Nature paper. Reinforcement learning theory, rooted in psychology and neuroscience studies of animal behavior, explains how agents can optimize their control of an environment. To use reinforcement learning successfully in the complex physical world, an agent faces the difficult task of deriving effective representations of the environment from high-dimensional sensory input and generalizing past experience to new, unseen situations.

Courses

1. Reinforcement Learning (Georgia Tech, CS 8803)

Address: https://www.udacity.com/course/reinforcement-learning-ud600

Official description: If you are interested in machine learning and want to approach it from a theoretical perspective, choose this course. It explores automated decision making from a computer-science perspective through classic papers and recent work, and examines efficient algorithms for single-agent and multi-agent planning and for learning near-optimal decisions from experience. By the end of the course, you will be able to reproduce published results in reinforcement learning.

2. Reinforcement Learning (Stanford, CS234)

Address: http://web.stanford.edu/class/cs234/index.html

Official description: To achieve true artificial intelligence, a system must be able to learn and make good decisions on its own. Reinforcement learning is one such powerful paradigm, applicable to many tasks including robotics, games, consumer modeling, and healthcare. This course is a detailed introduction to reinforcement learning; you will learn about the field's open problems and main methods, including generalization and exploration.

3. Deep Reinforcement Learning (Berkeley, CS 294, Fall 2017)

Address: http://rll.berkeley.edu/deeprlcourse/

Official description: This course assumes some background, including reinforcement learning, numerical optimization, and machine learning. Students unfamiliar with these concepts are encouraged to read the provided references in advance; they are briefly reviewed at the start of the course.

4. Deep Reinforcement Learning in Python (Udemy, advanced)

Address: https://www.udemy.com/deep-reinforcement-learning-in-python/

Official description: This course covers the application of deep learning and neural networks to reinforcement learning. It assumes some background (reinforcement learning fundamentals, Markov decision processes, dynamic programming, Monte Carlo methods, and temporal-difference learning), as well as deep learning fundamentals and programming experience.

Finally, the original address is here: https://blog.algorithmia.com/introduction-to-reinforcement-learning/

We hope this article has given you a good introduction to reinforcement learning.
