Learning quantitative stock trading with Python (1): setting up the basic libraries (background knowledge)

1. Preface

Starting with this article, I am going to write a blog series on quantitative stock trading. It records my learning path toward building a quantitative trading platform and serves as a memo on some key third-party libraries and key concepts.
The quantitative trading framework I want to build only covers financial data acquisition, cleaning and integration, plus the implementation of some quantitative strategies, backtesting and similar functions. It will not implement real programmatic trading of stocks. Any quantitative trading framework requires some knowledge of third-party libraries first; the libraries involved are listed here, and they need to be installed and studied in advance.

1.1, Installing the NumPy library

NumPy (Numerical Python) is an extension library for the Python language. It supports multi-dimensional arrays and matrix operations, and also provides a large library of mathematical functions for working with arrays.
You can get NumPy, pandas and the other basic libraries by installing Anaconda3, or you can install NumPy with the following command:

pip3 install numpy scipy matplotlib -i https://pypi.tuna.tsinghua.edu.cn/simple

By default, pip downloads from overseas servers, which are too slow, so the command above uses the Tsinghua mirror.
To learn NumPy, follow the "Numpy Tutorial" link.

1.2, Installing the pandas library

pandas is Python's core data analysis library. It provides fast, flexible and expressive data structures designed to make working with relational and labeled data simple and intuitive. Its goal is to be an indispensable high-level tool for practical data analysis in Python, and its long-term goal is to become the most powerful and flexible open source data analysis tool available in any language.
Like NumPy, pandas is included when you install Anaconda3. If you need to install it separately, you can use the following commands:

  • Install using conda
conda install pandas

If you want to install a specific version of pandas, use the following command:

conda install pandas=0.20.3

  • Install using pip
pip3 install pandas

1.3, Financial data acquisition

I mainly obtain stock market data from two platforms: tushare and JoinQuant (聚宽). Both can be used for free after registration. Guotai Junan's quantitative trading library is also the jqdata library of the JoinQuant platform; search for it if you are interested. If you have a trading account, you can implement the programmatic trading part yourself. Install tushare with:

pip3 install tushare

Note that tushare has two sets of interface documentation, the old and the new. The new interface has taken over some functions from the old one, so check the documentation when using it (see the new interface document link).
Import the package as follows:

import tushare as ts

Install the JoinQuant data SDK:

pip3 install jqdatasdk

Or the following, which can be faster:

pip3 install jqdatasdk -i https://mirrors.aliyun.com/pypi/simple/

To upgrade:

pip3 install -U jqdatasdk

Import the jqdatasdk package as follows:

import jqdatasdk as jq

1.4, Installing the TA-Lib financial library and documentation links

TA-Lib (Technical Analysis Library) is an advanced library for quantitative finance in Python. It covers more than 150 technical analysis indicators commonly used in stock and futures trading software, such as MACD, RSI, KDJ, momentum indicators and Bollinger Bands. TA-Lib is divided into 10 groups: Overlap Studies, Momentum Indicators, Volume Indicators, Cycle Indicators, Price Transform, Volatility Indicators, Pattern Recognition, Statistic Functions, Math Transform and Math Operators.
Installation and use

Installation: running "pip install talib" in cmd generally reports an error. (The package is published on PyPI as TA-Lib and imported as talib, and building it requires the underlying TA-Lib C library.) The approach that works is to open https://www.lfd.uci.edu/~gohlke/pythonlibs/, scroll down and select TA_Lib-0.4.19-cp38-cp38-win_amd64.whl (64-bit Windows, Python 3.8; choose the package that matches your own system and Python version), save the download to a folder, and then enter the following command in Anaconda Prompt (or Windows cmd):

pip install [full path to the .whl file]
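Choosing the right wheel depends on the Python version and on whether the interpreter is 32-bit or 64-bit. As a rough aid, the following sketch prints both for the interpreter you are running. The helper name interpreter_wheel_tags is made up for this illustration, and it assumes the standard CPython interpreter.

```python
import struct
import sys


def interpreter_wheel_tags():
    """Return the CPython tag (for example 'cp38') and the pointer width in bits."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    bits = struct.calcsize("P") * 8  # size of a C pointer in this interpreter
    return python_tag, bits


python_tag, bits = interpreter_wheel_tags()
print(f"Look for a wheel containing '{python_tag}' built for {bits}-bit Python")
```

On 64-bit Python 3.8 this points to a file name containing cp38 and win_amd64, which matches the wheel named above.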

Installation result (with the TA_Lib-0.4.19-cp38-cp38-win_amd64.whl file placed in C:\Users\ml\Desktop\Python):
[figure: console output of the pip install command]
After installation, start Python and test that the library imports:
[figure: Python session importing talib]
TA-Lib has no Chinese documentation, so two reference links are given here for further study: link one and link two.
The second link is a fairly complete translation of the documentation.

1.5, Installing the Matplotlib library and documentation link

Matplotlib is a plotting library for Python. It can be used together with NumPy and provides an effective open source alternative to MATLAB.

It can be installed through Anaconda3 or with the following command:

pip3 install matplotlib -i https://pypi.tuna.tsinghua.edu.cn/simple

Installing from the mirror address is faster.
See the linked learning documents for a tutorial.
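Before moving on, it can help to confirm which of the libraries from this section are already present. The following is a minimal sketch using only the standard library; the list uses the import names that appear in this article, and the helper name missing_libraries is made up for this illustration.

```python
from importlib.util import find_spec

# Import names used in this series (the import name is not always the pip name).
REQUIRED = ["numpy", "pandas", "matplotlib", "tushare", "jqdatasdk", "talib"]


def missing_libraries(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_libraries()
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

The check only looks the modules up; it does not import them, so it runs quickly and has no side effects.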

2. Analysis of stock technical indicators

2.1, Moving average analysis

A moving average line connects the arithmetic means of a stock's price over a fixed number of trading periods, and reflects the average cost of shareholders over that time. For example, the 5-day moving average adds up the closing prices of the last 5 trading days and divides by 5 to get the 5-day arithmetic mean.
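To make the definition concrete, here is a minimal pure-Python sketch of the calculation. It is only an illustration of the arithmetic, not how talib implements it, and the function name simple_moving_average is made up for this example.

```python
def simple_moving_average(prices, period):
    """Return a list as long as prices: None until `period` values are
    available, then the mean of the most recent `period` prices."""
    if period <= 0:
        raise ValueError("period must be positive")
    result = []
    for i in range(len(prices)):
        if i + 1 < period:
            result.append(None)  # not enough history yet
        else:
            window = prices[i + 1 - period : i + 1]
            result.append(sum(window) / period)
    return result


closes = [1, 2, 3, 4, 5, 6]
print(simple_moving_average(closes, 5))
# [None, None, None, None, 3.0, 4.0]
```

The first period - 1 entries have no value because fewer than period closing prices exist at that point.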

In talib, the moving average family includes: SMA (simple moving average), EMA (exponential moving average), WMA (weighted moving average), DEMA (double exponential moving average), TEMA (triple exponential moving average), TRIMA (triangular moving average), KAMA (Kaufman adaptive moving average), MAMA (MESA adaptive moving average) and T3 (triple exponential moving average T3).
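The difference between these averages is how the prices in the window are weighted. As a rough illustration, the sketch below computes a linearly weighted average and an exponential average in plain Python. It assumes the common textbook definitions (weights 1..n for WMA, smoothing factor 2/(n+1) for EMA, with the EMA seeded by the simple mean of the first n prices), so talib's results may differ in detail; the function names are made up for this example.

```python
def weighted_moving_average(prices, period):
    """Linearly weighted mean of the last `period` prices (newest gets weight `period`)."""
    window = prices[-period:]
    if period <= 0 or len(window) < period:
        raise ValueError("need at least `period` prices")
    weights = range(1, period + 1)
    return sum(w * p for w, p in zip(weights, window)) / sum(weights)


def exponential_moving_average(prices, period):
    """EMA series: None until `period` prices exist, then seeded with their simple mean."""
    if len(prices) < period:
        return [None] * len(prices)
    alpha = 2 / (period + 1)  # assumed smoothing factor
    ema = sum(prices[:period]) / period
    result = [None] * (period - 1) + [ema]
    for price in prices[period:]:
        ema = ema + alpha * (price - ema)  # move a fraction alpha toward the new price
        result.append(ema)
    return result


closes = [1, 2, 3, 4, 5, 6]
print(weighted_moving_average(closes[:5], 5))
print(exponential_moving_average(closes, 5))
```

Both give recent prices more influence than the simple average does, which is why they react faster to a change in trend.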
General function name: MA
Call: ta.MA(close, timeperiod=30, matype=0)
Parameter description:

  • close: the closing prices, as a numpy.ndarray or pandas.Series
  • timeperiod: the averaging window; the default is 30 days, enter 5 for a 5-day average, and so on.
  • matype: the moving average type; the default is SMA. As a number: 0=SMA, 1=EMA, 2=WMA, 3=DEMA, 4=TEMA, 5=TRIMA, 6=KAMA, 7=MAMA, 8=T3. You can also import MA_Type from talib and pass, for example, MA_Type.SMA.
    Each type of moving average also has its own function, named after the type (ta.SMA, ta.EMA, ta.WMA and so on).
Code example 1:
import numpy as np
import talib as ta
from talib import MA_Type
import pandas as pd
close = np.array([1, 2, 3, 4, 5, 6], dtype='f8')
se = pd.Series(close, dtype='f8')
# Method 1: use the generic MA function to compute the 5-day arithmetic average; matype selects the simple moving average
# For an exponential moving average, pass matype=MA_Type.EMA
# The first argument can be either close or se
output = ta.MA(se, timeperiod=5, matype=MA_Type.SMA)
print(output)
# Method 2: call the SMA function directly, passing the window to compute the 5-day average
output = ta.SMA(se, timeperiod=5)
print(output)

Output result:
[figure: both calls print NaN for the first four rows, then 3.0 and 4.0]
The moving average is one of the most commonly used indicators in technical analysis. It is mainly used to confirm, track and judge trends and to prompt buy and sell signals, which helps capture opportunities and avoid risk in one-directional (trending) markets. However, moving averages are generally used together with other technical indicators or fundamentals: when the market is consolidating, buy and sell signals appear frequently and are easily distorted.
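One common way moving averages are turned into buy and sell signals is the crossover rule: a short average crossing above a long average is read as a buy signal, and crossing below as a sell signal. The article does not spell out a rule, so the sketch below is only an assumed illustration in plain Python with made-up prices and function names.

```python
def sma(prices, period):
    """Simple moving average; None where fewer than `period` prices are available."""
    return [
        None if i + 1 < period else sum(prices[i + 1 - period : i + 1]) / period
        for i in range(len(prices))
    ]


def crossover_signals(prices, short_period, long_period):
    """Return (index, 'buy' or 'sell') wherever the short SMA crosses the long SMA."""
    short, long_ = sma(prices, short_period), sma(prices, long_period)
    signals = []
    for i in range(1, len(prices)):
        if None in (short[i - 1], long_[i - 1], short[i], long_[i]):
            continue  # both averages must exist on both days
        prev_diff = short[i - 1] - long_[i - 1]
        diff = short[i] - long_[i]
        if prev_diff <= 0 < diff:
            signals.append((i, "buy"))   # short average moved above the long one
        elif prev_diff >= 0 > diff:
            signals.append((i, "sell"))  # short average moved below the long one
    return signals


prices = [10, 9, 8, 7, 8, 9, 10, 11, 10, 8, 6, 5]  # hypothetical closing prices
print(crossover_signals(prices, 2, 4))
# [(5, 'buy'), (9, 'sell')]
```

In a sideways market the two averages cross back and forth repeatedly, which is the frequent and unreliable signalling described above.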
Code example two:
Calculate the 5-day simple (arithmetic) moving average of the share price of 跨境通 (002640.XSHE) between 2021-01-01 and 2021-01-20, using the JoinQuant platform to obtain the stock data.

import numpy as np
import talib as ta
from talib import MA_Type
import pandas as pd
import jqdatasdk as jq
# Authenticate with the JoinQuant platform
jq.auth('*******', '********')
# Fetch prices from JoinQuant
close = jq.get_price('002640.XSHE',
                     start_date='2021-01-01',
                     end_date='2021-01-20',
                     frequency='1d',
                     fq='pre')
# Method 1: use the generic MA function for the 5-day arithmetic average; matype=0 is equivalent to MA_Type.SMA
close['5MA'] = ta.MA(close['close'], timeperiod=5, matype=0)
close['10MA'] = ta.MA(close['close'], timeperiod=10, matype=MA_Type.SMA)
# The date range is too short to compute the 20-, 30- and 60-day averages
# close['20MA'] = ta.MA(close['close'], timeperiod=20, matype=MA_Type.SMA)
# close['30MA'] = ta.MA(close['close'], timeperiod=30, matype=MA_Type.SMA)
# close['60MA'] = ta.MA(close['close'], timeperiod=60, matype=MA_Type.SMA)
print(close)

Output result:
[figure: printed DataFrame with the 5MA and 10MA columns]
Code example three:
Extend the previous example to produce graphical output

import numpy as np
import talib as ta
from talib import MA_Type
import pandas as pd
import jqdatasdk as jq
import matplotlib as mpl
from matplotlib import pyplot as plt
# Authenticate with the JoinQuant platform
jq.auth('******', '*******')
# Fetch prices from JoinQuant
close = jq.get_price('002640.XSHE',
                     start_date='2020-11-01',
                     end_date='2021-01-20',
                     frequency='1d',
                     fq='pre')

mpl.rcParams['font.sans-serif'] = [
    'SimHei'
]  # display Chinese text correctly; sans-serif is the font family, SimHei is a Hei (sans-serif) typeface

mpl.rcParams['axes.unicode_minus'] = False  # draw minus signs on tick labels with an ASCII hyphen so they render in this font

types = ['SMA', 'EMA', 'WMA', 'DEMA', 'TEMA', 'TRIMA', 'KAMA', 'MAMA', 'T3']
for i in range(len(types)):
    close[types[i]] = ta.MA(close['close'], timeperiod=5, matype=i)
# close.tail()
# Use loc[] to slice rows and columns:
# iloc (index locate) selects by integer position, e.g. df.iloc[10:20, 3:5]
# loc selects by index and column labels, e.g. df.loc['image1':'image10', 'age':'score']
# df['col1'] selects one column; df[['col1', 'col2', 'col3']] selects three columns

# DataFrame.plot draws one curve per column of the data
# figsize=(16, 6) sets the figure size to 16 x 6 inches
close.loc['2020-12-16':, 'SMA':].plot(figsize=(16, 6))
# Get the current axes
ax = plt.gca()
# Hide the right spine by giving it no color
ax.spines['right'].set_color('none')
# Hide the top spine by giving it no color
ax.spines['top'].set_color('none')
# Set the figure title (the text reads "Moving averages of various types for the Shanghai Composite Index")
plt.title('上证指数各种类型移动平均线', fontsize=15)
# Clear the x-axis label
plt.xlabel('')
# plt.ylabel('y axis')  # set the y-axis label text
plt.show()

Output result:

[figure: line chart of the nine moving average types]
In the output, the x axis is not drawn through the origin. The code can be modified as follows.

2.1.1, Get the current axes

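ax = plt.gca()
# Spines are the four border lines around the plot area
# Pick the axis to move; only top, bottom, left and right are available
ax.xaxis.set_ticks_position('bottom')  # we want to move the bottom x axis, so put its ticks on the bottom spine
# set_position accepts three kinds of position; here we use 'data'
# 'data' places the spine in data coordinates; the number is the y value to move it to
ax.spines['bottom'].set_position(('data', 0))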

By default, the coordinate system is surrounded by a frame whose four edges are called spines.
[figure: plot with the border lines highlighted]
The four spines can be hidden through their color: set the color to 'none' to hide one.

ax.spines['right'].set_color('none')

The code above does not draw with matplotlib's pyplot directly; it uses the pandas.DataFrame.plot() method instead.
Drawing with plt:

plt.figure(figsize=(5, 5))
plt.plot()  # draws empty axes only (no data was passed, so the plot is blank)

Introduction to pandas.DataFrame.plot()
The plot method of DataFrame draws one curve for each column of the data. By default, the legend is placed in a suitable position and labeled with the column names, which saves time compared with plotting in matplotlib directly. Data in DataFrame form is also more standardized and convenient for vectorized calculation.
The DataFrame.plot() function:

DataFrame.plot(x=None, y=None, kind='line', ax=None, subplots=False, 
                sharex=None, sharey=False, layout=None, figsize=None, 
                use_index=True, title=None, grid=None, legend=True, 
                style=None, logx=False, logy=False, loglog=False, 
                xticks=None, yticks=None, xlim=None, ylim=None, rot=None, 
                fontsize=None, colormap=None, position=0.5, table=False, yerr=None, 
                xerr=None, stacked=True/False, sort_columns=False, 
                secondary_y=False, mark_right=True, **kwds)

Note: each plot type has a corresponding method.
For example, df.plot(kind='line') is equivalent to df.plot.line().
Parameters:

  • x: label or position, default None. The label or position of the data column used for the x axis.

  • y: label, position or list of labels/positions, default None.

  • kind: str. The type of plot:
    'line': line plot (default)
    'bar': vertical bar plot; a stacked bar chart when stacked is True
    'barh': horizontal bar plot
    'hist': histogram (frequency distribution of values)
    'box': box plot
    'kde': kernel density estimation plot (a density plot, essentially a kernel probability density curve over the histogram)
    'density': same as 'kde'
    'area': area plot (the area enclosed with the x axis). When stacked=True, each column must be all positive or all negative; when stacked=False, there is no such requirement
    'pie': pie chart. Values must be positive, and you need to specify y or subplots=True
    'scatter': scatter plot. x and y must be specified
    'hexbin': hexagonal binning plot. x and y must be specified

  • ax: matplotlib axes object, default None. The matplotlib subplot (axes) to draw on; if not set, the current matplotlib subplot is used. Drawing consists of changing the elements of the figure and its axes (title, labels, points, lines and so on), that is, drawing on the canvas.

  • subplots: boolean, default False. Whether to draw each column in its own subplot.

  • sharex: boolean, default True if ax is None else False. If ax is None the default is True, otherwise False.

  • sharey: boolean, default False. If there are subplots, they share the y-axis ticks and labels.

  • figsize: a tuple (width, height) in inches. The figure size.

  • use_index: boolean, default True. Use the index as the x axis.

  • title: string. The title of the figure.

The above covers only some of the parameters. For the full details of the function, see the linked summary.

2.2, MACD (Moving Average Convergence Divergence)

Function name: MACD
Name: Moving Average Convergence Divergence
Introduction: a technical indicator that uses the convergence and divergence between the short-term (usually 12-day) and long-term (usually 26-day) exponential moving averages of the closing price to judge the timing of buying and selling.
Function call:

macd, macdsignal, macdhist = ta.MACD(close, fastperiod=12, slowperiod=26, signalperiod=9)

macd is the DIFF line, which responds faster.

macdsignal is the DEA line, which responds more slowly.

macdhist is the histogram, drawn as bars above and below the x axis.
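How the three outputs relate to each other can be sketched in plain Python. This is an illustration of the usual definition (DIFF = fast EMA minus slow EMA, DEA = EMA of DIFF, histogram = DIFF minus DEA), not talib's implementation: it assumes each EMA is seeded with the first value, so its early values will not match talib, which leaves the first bars as NaN. Some charting software also doubles the histogram. The function names and prices are made up for this example.

```python
def ema_series(values, period):
    """EMA with smoothing factor 2 / (period + 1), seeded with the first value.
    `values` must not be empty."""
    alpha = 2 / (period + 1)
    result = [values[0]]
    for value in values[1:]:
        result.append(result[-1] + alpha * (value - result[-1]))
    return result


def macd(closes, fast=12, slow=26, signal=9):
    """Return (diff, dea, hist) lists, each as long as closes."""
    fast_ema = ema_series(closes, fast)
    slow_ema = ema_series(closes, slow)
    diff = [f - s for f, s in zip(fast_ema, slow_ema)]  # DIFF line
    dea = ema_series(diff, signal)                      # DEA (signal) line
    hist = [d - e for d, e in zip(diff, dea)]           # histogram bars
    return diff, dea, hist


closes = [10 + 0.5 * day for day in range(40)]  # hypothetical steadily rising prices
diff, dea, hist = macd(closes)
print(round(diff[-1], 4), round(dea[-1], 4), round(hist[-1], 4))
```

With steadily rising prices the fast average stays above the slow one, so DIFF is positive and sits above the slower DEA line, giving positive histogram bars.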
Example one

import numpy as np
import talib as ta
from talib import MA_Type
import pandas as pd
import jqdatasdk as jq
import matplotlib as mpl
from matplotlib import pyplot as plt
# Authenticate with the JoinQuant platform
jq.auth('*******', '********')
# Fetch prices from JoinQuant
close = jq.get_price('601899.XSHG',
                     start_date='2020-10-14',
                     end_date='2021-01-20',
                     frequency='1d',
                     fq='pre')
# macd is the DIFF line, macdsignal is the DEA line, macdhist is the MACD histogram
macd, macdsignal, macdhist = ta.MACD(close["close"],
                                     fastperiod=12,
                                     slowperiod=26,
                                     signalperiod=9)
close['DIFF'] = macd
close['DEA'] = macdsignal
close['hists'] = macdhist
# Plot the lines; the three series are of two kinds, so each is drawn separately
df = close.loc['2020-11-30':'2021-01-21', 'DIFF':]
with pd.plotting.plot_params.use('x_compat', True):  # method 1
    df.DIFF.plot(figsize=(16, 6))
    df.DEA.plot()
    # df.hists.plot.bar(width=0.1)
    df.hists.plot()
print(df)
mpl.rcParams['font.sans-serif'] = [
    'SimHei'
]  # display Chinese text correctly; sans-serif is the font family, SimHei is a Hei (sans-serif) typeface

ax = plt.gca()

ax.spines['right'].set_color('none')

ax.spines['top'].set_color('none')

# The title text reads "MACD moving average convergence divergence"
plt.title('MACD平滑异同移动平均线', fontsize=15)

plt.xlabel('')

plt.show()

Output result:
[figure: line chart of the DIFF, DEA and hists series]

Origin blog.csdn.net/u011930054/article/details/112944314