Albert-Z-Guo/Deep-Reinforcement-Stock-Trading

Deep Reinforcement Stock Trading

This project intends to utilize deep reinforcement learning for portfolio management. The framework structure is inspired by Q-Trader. The agent's reward is the net unrealized profit (i.e. the profit on stock still held in the portfolio and not yet cashed out) evaluated at each action step. For each step of inaction, a negative penalty is added to the portfolio as the missed opportunity cost of investing in "risk-free" Treasury bonds. Many new features and improvements have been made in the training and evaluation pipelines, and all evaluation metrics and visualizations are built from scratch.

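As a rough illustration of this reward scheme, here is a minimal sketch; the function name, signature, and Treasury rate below are assumptions for illustration, not the repository's actual implementation:

def step_reward(prev_portfolio_value, curr_portfolio_value, traded, daily_treasury_rate=0.0001):
    # Reward is the change in (unrealized) portfolio value over this step.
    reward = curr_portfolio_value - prev_portfolio_value
    # Penalize inaction by the return a "risk-free" Treasury holding would have earned.
    if not traded:
        reward -= daily_treasury_rate * prev_portfolio_value
    return reward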

Key assumptions and limitations of the current framework:

  • Transactions have no effect on the market
  • Only supports a single stock type
  • Only 3 basic operations: buy, hold, sell (no short selling or other complex operations)
  • Broker only performs 1 portfolio reallocation operation at the end of each trading day
  • All reallocations can be done at closing price
  • No missing data in price history
  • No transaction costs

The main challenges of the current framework:

  • Implement algorithms from scratch with a thorough understanding of their strengths and weaknesses
  • Build a solid reward mechanism (learning tends to stagnate / often gets stuck in local optima)
  • Make sure the framework is scalable and extensible

Currently, the state is defined as the normalized differences of adjacent daily stock prices over a window of n days, plus [stock_price, balance, num_holding].
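A minimal sketch of how such a state vector could be assembled, assuming prices is a NumPy array of daily closing prices (the helper name and layout are illustrative, not necessarily the repository's actual code):

import numpy as np

def generate_state(prices, t, window_size, balance, num_holding):
    # Requires t >= window_size so that a full window of past prices exists.
    window = prices[t - window_size : t + 1]   # window_size + 1 consecutive prices
    diffs = np.diff(window) / window[:-1]      # normalized adjacent-day differences
    return np.concatenate([diffs, [prices[t], balance, num_holding]])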

In the future, we plan to add other state-of-the-art deep reinforcement learning algorithms, such as proximal policy optimization (PPO), to the framework, and to increase the complexity of the state in each algorithm by building more complex price tensors, etc., using a wider range of deep learning approaches such as convolutional neural networks or attention mechanisms. In addition, we plan to integrate better pipelines for high-quality data sources, e.g. from vendors like Quandl, and for backtesting, e.g. zipline.

Getting Started

To install all libraries/dependencies used in this project, run

pip3 install -r requirement.txt

To train a DDPG agent or a DQN agent to, say, beat the S&P 500 from 2010 to 2015, run the command below (a full example invocation follows the option list):

python3 train.py --model_name=model_name --stock_name=stock_name
  • model_name is the model to use: either DQN or DDPG; default is DQN
  • stock_name is the stock used to train the model; default is ^GSPC_2010-2015, i.e. the S&P 500 from Jan 1, 2010 to Dec 31, 2015
  • window_size is the span of observations (days); default is 10
  • num_episode is the number of episodes used for training; default is 10
  • initial_balance is the initial balance of the portfolio; default is 50000
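For example, assuming all of the options above are exposed as command-line flags, an explicit invocation with the default values would look like:

python3 train.py --model_name=DDPG --stock_name=^GSPC_2010-2015 --window_size=10 --num_episode=10 --initial_balance=50000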

To evaluate a DDPG or DQN agent, run the command below (an example invocation follows the option list):

python3 evaluate.py --model_to_load=model_to_load --stock_name=stock_name
  • model_to_load is the model to load; default is DQN_ep10; alternatives include DDPG_ep10, etc.
  • stock_name is the stock used to evaluate the model; default is ^GSPC_2018, i.e. the S&P 500 from 1/1/2018 to 12/31/2018
  • initial_balance is the initial balance of the portfolio; default is 50000
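For example, assuming a trained model named DDPG_ep10 is present in the saved_models directory:

python3 evaluate.py --model_to_load=DDPG_ep10 --stock_name=^GSPC_2018 --initial_balance=50000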

where stock_name can be referenced in the data directory and model_to_load can be referenced in the saved_models directory.

To visualize training loss and portfolio value volatility history, run:

tensorboard --logdir=logs/model_events

where model_events can be found in the logs directory.

Example Result

Note that the results below are obtained with only 10 episodes of training.


Frequently Asked Questions (FAQ)

  • How is this project different from other price prediction methods such as logistic regression or LSTM?
    • Price prediction methods such as logistic regression produce numerical outputs, which must then be mapped (by some interpretation of the predicted price) to the action space (e.g. buy, sell, hold). Reinforcement learning methods, on the other hand, directly output the agent's actions (see the toy sketch below).
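A toy contrast of the two approaches (illustrative Python only; none of these names come from this repository):

# Price-prediction approach: a numeric forecast must be mapped to an action.
def action_from_prediction(predicted_price, current_price, threshold=0.01):
    change = (predicted_price - current_price) / current_price
    if change > threshold:
        return 'buy'
    if change < -threshold:
        return 'sell'
    return 'hold'

# Reinforcement-learning approach: the policy maps the state to an action directly,
# e.g. action = agent.act(state)  # returns 'buy', 'sell', or 'hold'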

