Deeply strengthened stock trading
This project applies deep reinforcement learning to portfolio management. The framework is inspired by Q-Trader. The agent's reward is the net unrealized profit (i.e., the profit on stock still held in the portfolio and not yet cashed out), evaluated at each action step. For each step of inaction, a negative penalty is applied to the portfolio, representing the missed opportunity to invest in "risk-free" Treasury bonds. Many new features and improvements have been made in the training and evaluation pipelines, and all evaluation metrics and visualizations are built from scratch.
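The reward described above could be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function name, argument names, and the daily Treasury yield value are all assumptions.

```python
def step_reward(balance, num_holding, price, cost_basis, acted,
                treasury_daily_yield=0.000105):
    """Sketch of the reward described above (names and the yield are assumptions).

    Net unrealized profit: current value of shares still held minus what was
    paid for them. Inaction is penalized by the forgone "risk-free" Treasury
    return on the idle cash balance.
    """
    unrealized_profit = num_holding * (price - cost_basis)
    reward = unrealized_profit
    if not acted:  # held still this step: subtract the missed Treasury return
        reward -= treasury_daily_yield * balance
    return reward
```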
Key assumptions and limitations of the current framework:
- Transactions have no effect on the market
- Only supports a single stock type
- Only 3 basic operations: buy, hold, sell (no short selling or other complex operations)
- Broker only performs 1 portfolio reallocation operation at the end of each trading day
- All reallocations can be done at closing price
- No missing data in price history
- No transaction costs
The main challenges of the current framework:
- Implement algorithms from scratch with a thorough understanding of their strengths and weaknesses
- Design robust reward signals (learning tends to stagnate / get stuck in local optima)
- Make sure the framework is scalable and extensible
Currently, the state is defined as the normalized differences of adjacent daily stock prices over the past n days, plus [stock_price, balance, num_holding].
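A sketch of how such a state vector could be constructed is below. The function names, the sigmoid normalization, and the padding behavior at the start of the series are assumptions for illustration, not necessarily what this codebase does.

```python
import numpy as np

def sigmoid(x):
    # Squash price differences into (0, 1) as one possible normalization.
    return 1.0 / (1.0 + np.exp(-x))

def generate_state(prices, t, window_size, balance, num_holding):
    """Sketch of the state described above (names are assumptions).

    Returns window_size - 1 normalized adjacent-price differences,
    concatenated with [stock_price, balance, num_holding].
    """
    if t >= window_size - 1:
        block = prices[t - window_size + 1 : t + 1]
    else:
        # Pad the front of the window with the first available price.
        pad = np.full(window_size - t - 1, prices[0])
        block = np.concatenate([pad, prices[: t + 1]])
    diffs = sigmoid(np.diff(block))
    portfolio = np.array([prices[t], balance, num_holding], dtype=float)
    return np.concatenate([diffs, portfolio])
```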
In the future, we plan to add other state-of-the-art deep reinforcement learning algorithms to the framework, such as Proximal Policy Optimization (PPO), and to increase the complexity of the state in each algorithm by building richer price tensors, etc., using a wider range of deep learning methods such as convolutional neural networks or attention mechanisms. Additionally, we plan to integrate better pipelines for high-quality data sources, e.g. from vendors like Quandl, and for backtesting, e.g. zipline.
Getting Started
To install all libraries/dependencies used in this project, run
pip3 install -r requirement.txt
To train a DDPG or DQN agent to, say, beat the S&P 500 from 2010 to 2015, run
python3 train.py --model_name=model_name --stock_name=stock_name
- `model_name` is the model to use: either `DQN` or `DDPG`; default is `DQN`
- `stock_name` is the stock used to train the model; default is `^GSPC_2010-2015`, i.e., the S&P 500 from 1/1/2010 to 12/31/2015
- `window_size` is the span (days) of the observation window; default is `10`
- `num_episode` is the number of episodes used for training; default is `10`
- `initial_balance` is the initial balance of the portfolio; default is `50000`
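The flags above could be wired up with `argparse` roughly as follows. This is a sketch of how such a CLI is typically defined; the actual `train.py` may differ in details.

```python
import argparse

# Hypothetical reconstruction of the train.py command-line interface.
parser = argparse.ArgumentParser(description="Train a DQN or DDPG trading agent")
parser.add_argument("--model_name", default="DQN", choices=["DQN", "DDPG"],
                    help="model to use")
parser.add_argument("--stock_name", default="^GSPC_2010-2015",
                    help="stock used to train the model")
parser.add_argument("--window_size", type=int, default=10,
                    help="span (days) of the observation window")
parser.add_argument("--num_episode", type=int, default=10,
                    help="number of training episodes")
parser.add_argument("--initial_balance", type=int, default=50000,
                    help="initial portfolio balance")

# Parsing an empty argument list yields the defaults listed above.
args = parser.parse_args([])
```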
To evaluate a DDPG or DQN agent, run
python3 evaluate.py --model_to_load=model_to_load --stock_name=stock_name
- `model_to_load` is the model to load; default is `DQN_ep10`; alternatives are `DDPG_ep10`, etc.
- `stock_name` is the stock used to evaluate the model; default is `^GSPC_2018`, i.e., the S&P 500 from 1/1/2018 to 12/31/2018
- `initial_balance` is the initial balance of the portfolio; default is `50000`
where `stock_name` can be referred to in the `data` directory and `model_to_load` in the `saved_models` directory.
To visualize training loss and portfolio value volatility history, run:
tensorboard --logdir=logs/model_events
where `model_events` can be found in the `logs` directory.
Example Results
Note that the results below are obtained with only 10 episodes of training.
Frequently Asked Questions (FAQ)
- How is this project different from other price prediction methods such as logistic regression or LSTM?
- Price prediction methods such as logistic regression produce numerical outputs that must then be mapped to the action space (e.g. buy, sell, hold) by some interpretation of the predicted price. Reinforcement learning methods, on the other hand, directly output the agent's actions.
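The contrast can be made concrete with a toy sketch: a price predictor needs an extra, hand-chosen rule (here a hypothetical 1% threshold) to turn its forecast into an action, while an RL agent's Q-values already rank the actions directly. All names and thresholds below are illustrative assumptions.

```python
import numpy as np

# Prediction-based approach: a numeric forecast must be interpreted into an action
# via an extra decision rule (the threshold is an arbitrary choice).
def action_from_prediction(predicted_price, current_price, threshold=0.01):
    change = (predicted_price - current_price) / current_price
    if change > threshold:
        return "buy"
    if change < -threshold:
        return "sell"
    return "hold"

# RL approach: the learned Q-values score each action directly; acting greedily
# means simply taking the argmax, with no separate interpretation step.
def action_from_q_values(q_values, actions=("hold", "buy", "sell")):
    return actions[int(np.argmax(q_values))]
```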
References:
- Deep Q-learning with Keras and Gym
- Dual Deep Q-Network
- Play TORCS with Keras and Deep Deterministic Policy Gradient
- A Practical Deep Reinforcement Learning Approach to Stock Trading
- Introduction to Learning to Trade with Reinforcement Learning
- Adversarial Deep Reinforcement Learning in Portfolio Management
- A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem