Python stock trading --- mean reversion

Disclaimer: The information provided in this article is for educational purposes only and should not be considered professional investment advice. It is important to do your own research and exercise caution when making investment decisions. Investing involves risk and any investment decisions you make are entirely your own responsibility.

In this article, you will learn what the mean reversion trading algorithm is and how to implement it in Python.

Three different implementations are explained:

  • basic
  • Z-score
  • statistical arbitrage

What is the Mean Reversion Trading Algorithm?

Mean reversion is a trading approach based on the idea that prices tend to revert to their long-term average. When a stock price deviates significantly from its historical average, the asset may be overbought (price above the average) or oversold (price below it). A trade signal can then be triggered to sell short or to buy the instrument, in the expectation that its price will return to the mean.

In the following, you will see different implementations of the mean reversion algorithm.

Load the dataset:

In the first and second implementations, we will use historical Netflix prices:

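import datetime as dt

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def download_stock_data(ticker, timestamp_start, timestamp_end):
    # Daily prices (including 'Adj Close') from Yahoo Finance as CSV.
    # Yahoo may restrict or change this endpoint; if the request fails,
    # load a CSV with the same 'Date' and 'Adj Close' columns instead.
    url = (f"https://query1.finance.yahoo.com/v7/finance/download/{ticker}"
           f"?period1={timestamp_start}&period2={timestamp_end}"
           "&interval=1d&events=history&includeAdjustedClose=true")
    df = pd.read_csv(url, parse_dates=['Date'])
    return df


datetime_start = dt.datetime(2022, 1, 1, 7, 35, 51)
datetime_end = dt.datetime.today()

# Convert to Unix timestamps:
timestamp_start = int(datetime_start.timestamp())
timestamp_end = int(datetime_end.timestamp())

ticker = 'NFLX'

df = download_stock_data(ticker, timestamp_start, timestamp_end)
df = df.set_index('Date')
df.head()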
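If the Yahoo Finance request fails, or to experiment offline, a synthetic series with the same 'Date' index and 'Adj Close' column can stand in for the downloaded data. This sketch is not part of the original article: `make_synthetic_prices` is a hypothetical helper that simulates a mean-reverting AR(1) price path, so any result obtained with it is illustrative only.

```python
import numpy as np
import pandas as pd


def make_synthetic_prices(n_days=300, start="2022-01-03", seed=0,
                          mean=100.0, speed=0.1, vol=2.0):
    """Hypothetical offline stand-in for download_stock_data.

    Simulates a mean-reverting price path (discrete AR(1) pull toward `mean`)
    and returns it with a 'Date' index and an 'Adj Close' column.
    """
    rng = np.random.default_rng(seed)
    prices = np.empty(n_days)
    prices[0] = mean
    for t in range(1, n_days):
        # Each day the price moves a fraction `speed` back toward `mean`, plus noise
        prices[t] = (prices[t - 1] + speed * (mean - prices[t - 1])
                     + vol * rng.standard_normal())
    dates = pd.bdate_range(start=start, periods=n_days, name="Date")
    return pd.DataFrame({"Adj Close": prices}, index=dates)


df = make_synthetic_prices()
print(df.head())
```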

Implementation N°1: Basic

Proceed as follows:

  • Calculate Netflix's 20-day moving average price
  • Calculate the difference between the price and this moving average
  • If the difference is positive, a sell order is triggered. When the difference is negative, a buy order is triggered.

On the one hand, if the difference is positive, the price is above its 20-day moving average. The asset is considered overbought and is expected to revert (decrease) toward that average, so a sell order is triggered.

On the other hand, if the difference is negative, the asset is considered oversold and is expected to rise back toward its mean, so a buy order is triggered.
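The basic rule can be written as a small function. This is a minimal sketch, not the article's exact code: `basic_signal` is a hypothetical helper, and unlike the article's `np.where(df["diff"] > 0, -1, 1)` it returns 0 (no position) while the moving average is still undefined or when the price sits exactly on it.

```python
import numpy as np
import pandas as pd


def basic_signal(prices: pd.Series, window: int = 20) -> pd.Series:
    """Basic mean-reversion rule: -1 (sell) above the moving average,
    1 (buy) below it, 0 otherwise."""
    ma = prices.rolling(window=window).mean()
    diff = prices - ma
    # Comparisons with NaN are False, so warm-up rows fall through to 0
    return pd.Series(np.where(diff > 0, -1, np.where(diff < 0, 1, 0)),
                     index=prices.index)


prices = pd.Series([10.0, 10.0, 10.0, 13.0, 7.0])
print(basic_signal(prices, window=3).tolist())  # [0, 0, 0, -1, 1]
```

With a 3-day window, day 3 closes at 13 against an average of 11 (sell) and day 4 closes at 7 against an average of 10 (buy).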

Python code

I plotted the price in relation to its 20-day moving average in this graph:

window = 20

# 20-day moving average and the price's distance from it
df["ma_20"] = df["Adj Close"].rolling(window=window).mean()
df["diff"] = df["Adj Close"] - df["ma_20"]
# -1 = sell (price above the MA); 1 = buy otherwise
# (this includes the first rows, where ma_20 is still NaN)
df['signal'] = np.where(df["diff"] > 0, -1, 1)

figs = (8, 4)

df[['Adj Close', "ma_20"]].plot(figsize=figs)
plt.title("Mean Reversion")
plt.show()

df['diff'].plot(figsize=figs)
# Multiply the signal by 20 so it is clearly visible on the same chart
(20 * df['signal']).plot(figsize=figs, linestyle='--')
plt.title("Diff vs Signal")
plt.legend()
plt.show()

(df["Adj Close"] / df["ma_20"]).plot(figsize=figs)
plt.title("Ratio = Close / ma_20")
plt.show()

In this graph, I plot the difference (price minus the 20-day moving average) together with the signals. It shows when buy and sell orders are triggered:

In this graph, I plot the ratio between the price and its moving average. The goal is to see how this ratio oscillates: a value close to 1 means that the price is close to its moving average. We can clearly see a large deviation in April 2022.

Limitation:

As you can see, in April 2022 there was a sharp drop in the stock price that lasted for several months. With the basic implementation, a buy order is triggered as soon as the price falls below its moving average, and buying at that point would have led to large losses in the following days and months. That is why this implementation should be combined with other indicators, or replaced by a different calculation method.
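To make the limitation concrete, the sketch below applies the basic rule to a hypothetical price that falls 1% every day (synthetic data, not Netflix). In a steady decline the price stays below its moving average, so the rule keeps buying and the cumulative return falls day after day.

```python
import numpy as np
import pandas as pd

# Hypothetical steady decline: the price loses 1% every day for 60 days
prices = pd.Series(100 * 0.99 ** np.arange(60))

ma = prices.rolling(window=20).mean()
signal = pd.Series(np.where(prices - ma > 0, -1, 1), index=prices.index)

returns = prices.pct_change()
strategy_returns = signal.shift(1) * returns
valid = ma.notna()
cumulative = (1 + strategy_returns[valid]).cumprod()

# The price is always below its moving average, so every signal is a buy...
print(signal[valid].unique())  # [1]
# ...and each day's -1% return is taken in full
print(cumulative.iloc[-1])
```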

Backtesting strategy:

As noted before, the sharp price drop in April 2022 severely impacted the strategy's performance:

# Backtest the strategy
# Daily returns of the stock
df['returns'] = df['Adj Close'].pct_change()

# Strategy returns: the previous day's signal times today's return
df['strategy_returns'] = df['signal'].shift(1) * df['returns']

# Cumulative returns
df = df.dropna()
df['cumulative_returns'] = (1 + df['strategy_returns']).cumprod()

figs = (8, 4)
# Plot the cumulative returns
df['cumulative_returns'].plot(figsize=figs)
plt.title("Cumulative Returns")
plt.show()
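The `shift(1)` in the backtest matters: a signal computed from day t's closing price can only be traded from that close onward, so it earns day t+1's return. Without the shift, the backtest would use information it could not have had (look-ahead bias). A three-day worked example with made-up prices:

```python
import pandas as pd

# Hypothetical three-day example
prices = pd.Series([100.0, 110.0, 99.0])
signal = pd.Series([1, -1, 1])  # decided at each day's close

returns = prices.pct_change()                 # [NaN, 0.10, -0.10]
strategy_returns = signal.shift(1) * returns  # [NaN, 1 * 0.10, -1 * (-0.10)]
cumulative = (1 + strategy_returns.dropna()).cumprod()

print(cumulative.round(4).tolist())  # [1.1, 1.21]
```

On day 1 the position is long (signal from day 0) and the price rises 10%; on day 2 it is short (signal from day 1) and the price falls 10%. Both days gain 10%, so the cumulative return is 1.1 × 1.1 = 1.21.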

Implementation N°2: z-scores

This implementation can be used in quantitative trading algorithms:

  • Calculate the 20-day moving average price
  • Compute the 20-day standard deviation
  • Calculate the z-score: z-score = (price - 20-day moving average) / 20-day standard deviation

A sell order is triggered if the price crosses above the upper band (20-day moving average + n_std standard deviations), meaning that the instrument is overbought.

If the price falls below the lower band (20-day moving average - n_std standard deviations), the instrument is considered oversold and a buy order is triggered.
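The z-score rule can be sketched as a function. `zscore_signal` is a hypothetical helper that mirrors the article's steps, using pandas' default sample standard deviation (`ddof=1`); the toy prices are made up.

```python
import numpy as np
import pandas as pd


def zscore_signal(prices: pd.Series, window: int = 20,
                  n_std: float = 1.25) -> pd.DataFrame:
    """Z-score mean-reversion rule: 1 = buy, -1 = sell, 0 = hold."""
    ma = prices.rolling(window=window).mean()
    std = prices.rolling(window=window).std()  # sample std (ddof=1), pandas default
    zscore = (prices - ma) / std
    signal = np.where(zscore < -n_std, 1, np.where(zscore > n_std, -1, 0))
    return pd.DataFrame({"zscore": zscore, "signal": signal}, index=prices.index)


prices = pd.Series([10.0, 10.0, 10.0, 10.0, 14.0, 4.0])
print(zscore_signal(prices, window=4, n_std=1.25))
```

On day 4 the window is [10, 10, 10, 14]: mean 11, standard deviation 2, so the z-score is (14 - 11) / 2 = 1.5, above 1.25, giving a sell signal. On day 5 the z-score is about -1.33, giving a buy signal.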

Python code

window = 20

# 20-day moving average
df['ma_20'] = df['Adj Close'].rolling(window=window).mean()

# 20-day rolling standard deviation
df['std_20'] = df['Adj Close'].rolling(window=window).std()

# z-score: number of standard deviations away from the moving average
df['zscore'] = (df['Adj Close'] - df['ma_20']) / df['std_20']

# Buy (1) if the z-score is below -n_std
# Sell (-1) if the z-score is above n_std
# Hold (0) if it is between -n_std and n_std
n_std = 1.25
df['signal'] = np.where(df['zscore'] < -n_std, 1, np.where(df['zscore'] > n_std, -1, 0))

figs = (8, 4)
df['signal'].plot(figsize=figs, linestyle="--")
df['zscore'].plot(figsize=figs)
plt.title("Mean Reversion with Z-score")
plt.legend()
plt.show()

This graph shows the z-score together with the buy and sell signals:

upper_band = df['ma_20'] + n_std * df['std_20']
lower_band = df['ma_20'] - n_std * df['std_20']

figs = (10, 6)
df['Adj Close'].plot(figsize=figs)
df['ma_20'].plot(figsize=figs, linestyle='-.', color="w")
upper_band.plot(linestyle='--', label='upper_band')
lower_band.plot(linestyle=':', label='lower_band')
plt.fill_between(df.index, lower_band, upper_band, alpha=0.3)
plt.title("Upper and Lower Bands")
plt.legend()
plt.show()

This graph clearly shows when the price moves outside the bands. When the price breaks above the upper band, the stock is considered overbought, which is a signal to enter a short position.

When the price falls below the lower band, the stock is considered oversold, which can be treated as a buy signal.
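The bands and the z-score describe the same rule: whenever the rolling standard deviation is positive, price > ma_20 + n_std * std_20 is equivalent to zscore > n_std, and likewise for the lower band. The sketch below checks this on a hypothetical random-walk series (not the Netflix data); it should print True.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Hypothetical random-walk prices
prices = pd.Series(100 + rng.standard_normal(250).cumsum())

window, n_std = 20, 1.25
ma = prices.rolling(window).mean()
std = prices.rolling(window).std()
upper_band = ma + n_std * std
lower_band = ma - n_std * std
zscore = (prices - ma) / std

# Signal from band crossings...
band_signal = np.where(prices < lower_band, 1, np.where(prices > upper_band, -1, 0))
# ...and from z-score thresholds
z_signal = np.where(zscore < -n_std, 1, np.where(zscore > n_std, -1, 0))

print((band_signal == z_signal).all())
```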

Backtesting strategy

# Daily returns of the stock
df['returns'] = df['Adj Close'].pct_change()

# Strategy returns: the previous day's signal times today's return
df['strategy_returns'] = df['signal'].shift(1) * df['returns']

# Cumulative returns
df = df.dropna()
df['cumulative_returns'] = (1 + df['strategy_returns']).cumprod()

# Plot the cumulative returns
df['cumulative_returns'].plot(figsize=figs)
plt.title("Cumulative Returns")
plt.show()

When n_std=1.25, this strategy shows good performance:

Play around with this number to see how it affects overall performance.
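One way to explore n_std is to rerun the backtest for several values. `backtest_zscore` is a hypothetical helper that repeats the article's z-score steps and returns the final cumulative return; the random-walk prices are placeholders for `df["Adj Close"]`, so the printed numbers say nothing about Netflix.

```python
import numpy as np
import pandas as pd


def backtest_zscore(prices: pd.Series, window: int = 20, n_std: float = 1.25) -> float:
    """Final cumulative return of the z-score strategy (1.0 = break-even)."""
    ma = prices.rolling(window).mean()
    std = prices.rolling(window).std()
    zscore = (prices - ma) / std
    signal = pd.Series(np.where(zscore < -n_std, 1, np.where(zscore > n_std, -1, 0)),
                       index=prices.index)
    # Previous day's signal times today's return, as in the article
    strategy_returns = (signal.shift(1) * prices.pct_change()).dropna()
    return float((1 + strategy_returns).prod())


# Hypothetical prices; substitute df["Adj Close"] from the article
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.standard_normal(500).cumsum())

for n in [0.5, 1.0, 1.25, 1.5, 2.0]:
    print(f"n_std={n}: final cumulative return = {backtest_zscore(prices, n_std=n):.3f}")
```

Picking the n_std with the best historical result tends to overfit, so it is worth checking the chosen value on data that was not used to select it.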

Comparison

By requiring the price to deviate from its moving average by at least n_std standard deviations before a buy or sell order is triggered, the strategy's performance becomes much more attractive than that of the basic implementation.

Other applications

The implementation can also be used for high-frequency trading by adjusting calculations to accommodate intraday prices.

  • Intraday prices can be sampled down to seconds, or even milliseconds.
  • Compute the rolling mean and standard deviation over a window measured in seconds
  • If the upper or lower limit is breached, a buy or sell order is triggered.
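For intraday data, pandas supports time-based rolling windows on a sorted DatetimeIndex, so the window can be defined in seconds rather than as a number of rows. A minimal sketch on hypothetical tick prices with irregular timestamps, using a 30-second window:

```python
import numpy as np
import pandas as pd

# Hypothetical tick data: irregular timestamps with one price per trade
ticks = pd.Series(
    [100.0, 100.2, 99.9, 100.1, 101.5, 100.0],
    index=pd.to_datetime([
        "2024-01-02 09:30:00", "2024-01-02 09:30:05", "2024-01-02 09:30:12",
        "2024-01-02 09:30:20", "2024-01-02 09:30:26", "2024-01-02 09:30:41",
    ]),
)

# Time-based window: each statistic uses the ticks in (t - 30s, t],
# however many there are
rolling_mean = ticks.rolling("30s").mean()
rolling_std = ticks.rolling("30s").std()
zscore = (ticks - rolling_mean) / rolling_std

n_std = 1.25
signal = np.where(zscore < -n_std, 1, np.where(zscore > n_std, -1, 0))
print(pd.DataFrame({"price": ticks, "zscore": zscore.round(2), "signal": signal}))
```

The jump to 101.5 at 09:30:26 sits about 1.76 standard deviations above the 30-second mean, so it produces a sell signal; the first tick has no standard deviation yet and stays at 0.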

Implementation N°3: Statistical Arbitrage

In this implementation, we will study the mean reversion of the price difference (the spread) between two stocks:

  • Calculate the price difference between two stocks
  • Calculate the 30-day moving average of the spread
  • Calculate the 30-day moving standard deviation of the spread
  • Calculate the z-score: z-score = (spread - moving average of the spread) / moving standard deviation of the spread
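The pair version only changes the input: the z-score is computed on the spread instead of on a single price. `spread_zscore` and `spread_signal` are hypothetical helpers that mirror the article's pair code (30-day window and n_std = 1.5 by default); the toy prices are made up so that the last spread value stands out.

```python
import numpy as np
import pandas as pd


def spread_zscore(price_a: pd.Series, price_b: pd.Series, window: int = 30) -> pd.Series:
    """Z-score of the spread A - B against its own rolling mean and std."""
    spread = price_a - price_b
    rolling_mean = spread.rolling(window=window).mean()
    rolling_std = spread.rolling(window=window).std()
    return (spread - rolling_mean) / rolling_std


def spread_signal(zscore: pd.Series, n_std: float = 1.5) -> pd.Series:
    """1 = long the spread (buy A, short B); -1 = short the spread; 0 = flat."""
    signal = np.where(zscore < -n_std, 1, np.where(zscore > n_std, -1, 0))
    return pd.Series(signal, index=zscore.index)


# Hypothetical prices: the spread sits at 5, then widens to 9 on the last day
price_a = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 19.0])
price_b = pd.Series([5.0, 6.0, 7.0, 8.0, 9.0, 10.0])

z = spread_zscore(price_a, price_b, window=5)
print(spread_signal(z, n_std=1.5).tolist())  # [0, 0, 0, 0, 0, -1]
```

The last window is [5, 5, 5, 5, 9]: mean 5.8 and standard deviation about 1.79, so the z-score is about 1.79, above 1.5, and the spread is shorted (sell A, buy B).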

Python code

Load the dataset for 2 stocks: Apple and Google:

import datetime as dt

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def download_stock_data(ticker, timestamp_start, timestamp_end):
    # Daily prices (including 'Adj Close') from Yahoo Finance as CSV
    url = (f"https://query1.finance.yahoo.com/v7/finance/download/{ticker}"
           f"?period1={timestamp_start}&period2={timestamp_end}"
           "&interval=1d&events=history&includeAdjustedClose=true")
    df = pd.read_csv(url, parse_dates=['Date'])
    return df


# Determine start and end dates
datetime_start = dt.datetime(2022, 2, 8, 7, 35, 51)
datetime_end = dt.datetime.today()

# Convert to Unix timestamps:
timestamp_start = int(datetime_start.timestamp())
timestamp_end = int(datetime_end.timestamp())

tickers = ['AAPL', 'GOOG']

# One 'Adj Close' column per ticker, aligned on the date index
frames = []
for ticker in tickers:
    df_temp = download_stock_data(ticker, timestamp_start, timestamp_end)[['Date', 'Adj Close']]
    df_temp = df_temp.set_index('Date')
    df_temp.columns = [ticker]
    frames.append(df_temp)
df_global = pd.concat(frames, axis=1)
df_global.head()

Indicator Calculation

# Calculate the spread between the two stocks:
ticker_long = 'AAPL'
ticker_short = 'GOOG'
spread = df_global[ticker_long] - df_global[ticker_short]

window = 30
n_std = 1.5

# Calculate the rolling mean and standard deviation of the spread
rolling_mean = spread.rolling(window=window).mean()
rolling_std = spread.rolling(window=window).std()

# Calculate the z-score (number of standard deviations away from the rolling mean)
zscore = (spread - rolling_mean) / rolling_std

upper_band = rolling_mean + n_std * rolling_std
lower_band = rolling_mean - n_std * rolling_std

Now we plot different indicators to see how the spread behaves against the lower and upper bounds:

figs=(8,4)
plt.figure(figsize = figs)
spread.plot(label='Spread = '+ticker_long+' - '+ ticker_short,linestyle='--')
df_global[ticker_long].plot(label=ticker_long+'_price')
df_global[ticker_short].plot(label=ticker_short+'_price')
plt.title("Spread and Prices of {0} and {1}".format(ticker_long,ticker_short))
plt.legend()
plt.show()

plt.figure(figsize = figs)
upper_band.plot(label='Upper_band')
lower_band.plot(label='Lower_band')
spread.plot(label = 'Spread = '+ticker_long+' - '+ ticker_short,linestyle='--', color='r')
rolling_mean.plot(label = 'ma_30days_spread', linestyle = '-.')
plt.fill_between(df_global.index,lower_band, upper_band, alpha=0.2)
plt.legend()
plt.show()

The spread repeatedly breaks above the upper band or falls below the lower band, giving a signal to short or buy the spread, respectively:

Backtesting strategy

# Enter a long position if the z-score is less than -n_std
# Enter a short position if the z-score is greater than n_std
signal = np.where(zscore < -n_std, 1, np.where(zscore > n_std, -1, 0))
signal = pd.Series(signal, index=df_global.index)

# Calculate the daily returns
returns = df_global[ticker_long].pct_change() - df_global[ticker_short].pct_change()

# Calculate the strategy returns: shift the signal by one day so each day's return uses the previous day's signal
strategy_returns = signal.shift(1) * returns

# Calculate the cumulative returns
cumulative_returns = (1 + strategy_returns).cumprod()

# Plot the cumulative returns
cumulative_returns.plot(figsize = figs)
plt.title("Cumulative Return with n_std={0}".format(n_std))
plt.show()
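In the backtest, `returns` is the daily return of a position that is long one dollar of the first stock and short one dollar of the second (rebalanced daily, ignoring costs and financing). This is a simplification: the signal is based on a price difference, while the returns assume equal dollar amounts in each leg. A worked example with hypothetical prices:

```python
import pandas as pd

# Hypothetical prices for the long leg (A) and short leg (B) over three days
price_a = pd.Series([100.0, 102.0, 96.9])    # +2%, then -5%
price_b = pd.Series([200.0, 201.0, 196.98])  # +0.5%, then -2%
signal = pd.Series([1, -1, 0])               # 1 = long A/short B, -1 = short A/long B

# Return of one dollar long A and one dollar short B
returns = price_a.pct_change() - price_b.pct_change()  # [NaN, 0.015, -0.03]
strategy_returns = signal.shift(1) * returns           # [NaN, 0.015, 0.03]
cumulative = (1 + strategy_returns.dropna()).cumprod()

print(returns.round(4).tolist())
print(cumulative.round(5).tolist())  # [1.015, 1.04545]
```

On day 2 the position is short the spread, and A falls 3 percentage points more than B, so the position gains 3%.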

The cumulative return of the strategy stays positive throughout the period.

By modifying the number of standard deviations (n_std) in the model, you will see the effect on the performance of the strategy. When n_std=1.25, the performance is worse.


Source: blog.csdn.net/qq_41929396/article/details/132578436