Design and Implementation of Model Quantitative Investment Strategy Based on Reinforcement Learning

Author: Zen and the Art of Computer Programming

1. Introduction

In recent years, as applications of artificial intelligence (AI) have spread across industries, machine learning methods have become increasingly popular. Reinforcement Learning (RL) is a machine learning paradigm, closely related to dynamic programming, that trains an agent to complete a series of tasks; many of these tasks are not rigidly specified, so the agent must adapt based on feedback from its environment. RL has been widely applied in areas such as the stock market, robot control, and finance more broadly. Quantitative investment is an important research direction in its own right, and reinforcement-learning-based approaches are a natural fit for it. This article introduces how to design and implement quantitative investment strategies with reinforcement learning from four angles: the classic Monte Carlo method, Deep Reinforcement Learning (DRL), Actor-Critic methods, and the Advantage Actor-Critic (A2C) method.
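The agent-environment feedback loop described above can be sketched with a minimal example. The following two-armed bandit "market" and its payoffs are entirely hypothetical, chosen only to show an agent adjusting its behavior from environment feedback (here via a simple epsilon-greedy rule); it is not the strategy developed later in the article.

```python
import random

def pull_arm(arm, rng):
    """Hypothetical environment: arm 1 pays more on average than arm 0."""
    return rng.gauss(0.1 if arm == 0 else 0.5, 0.1)

def run_bandit(steps=2000, epsilon=0.1, seed=42):
    rng = random.Random(seed)
    values = [0.0, 0.0]  # estimated reward of each arm
    counts = [0, 0]      # how often each arm was pulled
    for _ in range(steps):
        # epsilon-greedy: explore with probability epsilon, else exploit
        if rng.random() < epsilon:
            arm = rng.randrange(2)
        else:
            arm = max(range(2), key=lambda a: values[a])
        reward = pull_arm(arm, rng)  # feedback from the environment
        counts[arm] += 1
        # incremental mean update of the value estimate for this arm
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts

values, counts = run_bandit()
```

After a short exploration phase the agent concentrates its pulls on the better arm, which is the essence of learning from environment feedback rather than from labeled examples.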

2. Related work

Monte Carlo method

The Monte Carlo method is a long-established and widely used numerical technique for simulating samples from probability distributions. The main idea is to use a random number generator to simulate experiments and then compute statistics of the simulated results, thereby estimating the parameters of an unknown distribution. Because stochastic simulation carries sampling error, Monte Carlo methods are generally applied to simpler probability problems, especially those where a large number of independent events can be sampled. The basic process is illustrated in the figure below.

[Figure: Monte Carlo estimation process. π(θ) denotes the unknown target distribution, ω(θ) the sampling distribution, and θ ∼ π the true parameter value.]

We then take as the parameter estimate θ* a weighted combination of the sample estimate θ̂ and its reward r(θ̂), where η ∈ [0,1] is the weight and r(θ̂) is the reward function evaluated at θ̂. This procedure can be used as a variant of the Monte Carlo method, the path index dividend method (Pathwise


Origin blog.csdn.net/universsky2015/article/details/131875153