Python tutorial: building your own quantitative analysis database


Introduction

Financial data is the foundation of quantitative analysis. It includes historical trading data, fundamental data on listed companies, and macro and industry data. As the flow of information keeps expanding, knowing how to acquire, process, and query data becomes ever more important. For anyone tinkering with quantitative trading, a database is hard to do without. The most commonly used open-source (free) databases are MySQL, PostgreSQL, MongoDB, and SQLite (which ships with Python); all of them rank in the top ten of the 2018-2019 DB-Engines list, reflecting their wide use and popularity. Each has its own characteristics and typical application environment, and plenty of learning material is available online. This article briefly introduces how to operate a PostgreSQL database from Python (other databases are similar), using psycopg2 and sqlalchemy to move pandas DataFrames in and out of PostgreSQL, building a personal quantitative analysis database step by step.


 

Installation and use of PostgreSQL

Install PostgreSQL. Go to the official website, choose the version matching your system, download it, and install. During installation you only need to set a password (this article uses "123456"); everything else can be left at the defaults. If anything is unclear, refer to the CSDN article "PostgreSQL installation detailed steps (Windows)". After installation you will find pgAdmin 4 in the installation directory; this is the bundled graphical tool, whose latest version is a web application somewhat reminiscent of Python's Jupyter Notebook, and it can be used to view and manipulate the PostgreSQL database.

Install the psycopg2 and sqlalchemy libraries for Python. psycopg2 is Python's interface to the PostgreSQL database; SQLAlchemy has broader reach, connecting to MySQL, SQLite, and PostgreSQL alike, and it is especially convenient when working with pandas DataFrames. Both libraries are well introduced online, so the details are not repeated here; each can be installed from cmd with pip install xxx.
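Before wiring these libraries to tushare, it helps to see the basic DataFrame round-trip that insert_sql and read_sql rely on. The sketch below uses an in-memory SQLite engine purely so it runs without a database server; to target the PostgreSQL instance installed above, swap the URL for one of the form 'postgresql+psycopg2://postgres:123456@localhost:5432/postgres'.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for PostgreSQL here so the sketch is self-contained
engine = create_engine('sqlite://')

# Write a small DataFrame to a table, then read it back
df = pd.DataFrame({'ts_code': ['000001.SZ', '600000.SH'],
                   'close': [13.5, 11.2]})
df.to_sql('demo', engine, index=False, if_exists='replace')

back = pd.read_sql('demo', engine)
print(len(back))  # 2
```

The same to_sql / read_sql calls are all the article's database plumbing really needs; only the engine URL changes between back ends.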

Examples of applications

First, use tushare to fetch market data for more than 3,000 stocks and, through the psycopg2 and sqlalchemy interfaces, store it in a local PostgreSQL database for further querying and manipulation.

# Import the libraries used later for analysis, visualization, etc.
import tushare as ts
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Display Chinese characters and minus signs correctly in plots
from pylab import mpl
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False
# Set the tushare token
token = 'enter your token'
pro = ts.pro_api(token)

A data-acquisition function; the default date range can be changed as needed.

# If this errors, upgrade tushare to the latest version
def get_data(code, start='20190101', end='20190425'):
    df = ts.pro_bar(ts_code=code, adj='qfq', start_date=start, end_date=end)
    return df

A stock-code function that returns the codes listed as of the latest trading day:

# Get the codes and short names of currently traded stocks
def get_code():
    codes = pro.stock_basic(list_status='L').ts_code.values
    return codes

 

A function for inserting into the PostgreSQL database; try ... except ... pass is used so that occasional bad records do not crash the program.

from sqlalchemy import create_engine
import psycopg2
engine = create_engine('postgresql+psycopg2://postgres:123456@localhost:5432/postgres')
def insert_sql(data, db_name, if_exists='append'):
    # use try...except to keep errors from crashing the run
    try:
        data.to_sql(db_name, engine, index=False, if_exists=if_exists)
        #print(code + ' successfully written to the database')
    except:
        pass

Because the full market dataset is large and slow to download, daily data for the period 20190101-20190425 is downloaded first; it is then updated incrementally afterwards.


# Download data for 20190101-20190425 into the stock_data table
# This step is time-consuming, usually about 25-35 minutes
for code in get_code():
    data = get_data(code)
    insert_sql(data, 'stock_data')
# Read the entire table
df = pd.read_sql('stock_data', engine)
print(len(df))
# output: 270998
# Select the data for stock ts_code = 000001.SZ
df = pd.read_sql("select * from stock_data where ts_code='000001.SZ'", engine)
print(len(df))

Build an update function so data for other time periods can be downloaded and inserted. With data from January 1, 2018 through April 25, 2019, the table already holds over 1.08 million rows.

# Update the data, or download data for other periods
def update_sql(start, end, db_name):
    from datetime import datetime, timedelta
    for code in get_code():
        data = get_data(code, start, end)
        insert_sql(data, db_name)
    print(f'{start}:{end} data has been successfully updated')
# Download data for the period 20180101-20181231
# Run this only once; comment it out afterwards
# The download is fairly slow, about 20-35 minutes
start = '20180101'
end = '20181231'
db_name = 'stock_data'
# Download and store in the database
update_sql(start, end, db_name)
# Read the data back with pandas read_sql
df_all_data = pd.read_sql('stock_data', engine)
print(len(df_all_data))
# output: 1087050
# Inspect the number of distinct codes and trading dates
print(len(df_all_data.ts_code.unique()))
print(len(df_all_data.trade_date.unique()))
# output: 3604; 319
d = df_all_data.trade_date.unique()
print(d.max())
print(d.min())
# 2019-04-25T00:00:00.000000000
# 2018-01-02T00:00:00.000000000
# Get data for the trading day April 25, 2019
pd.read_sql("select * from stock_data where trade_date='2019-04-25'", engine).head()

 


Build data query and visualization functions:
def plot_data(condition, title):
    from pyecharts import Bar
    from sqlalchemy import create_engine
    engine = create_engine('postgresql+psycopg2://postgres:123456@localhost:5432/postgres')
    data = pd.read_sql("select * from stock_data where " + condition, engine)
    count_ = data.groupby('trade_date')['ts_code'].count()
    attr = count_.index
    v1 = count_.values
    bar = Bar(title, title_text_size=15)
    bar.add('', attr, v1, is_splitline_show=False, line_width=2)
    return bar
# Time distribution of stocks priced below 2 yuan
c1 = "close<2"
t1 = "Time distribution of stocks priced below 2 yuan"
plot_data(c1, t1)

 


 

Query the time distribution of stocks whose daily price rose more than 9.5%:

c2="pct_chg>9.5"
t2="Time distribution of stocks rising more than 9.5%"
plot_data(c2,t2)

 


 

 

Query the time distribution of stocks whose daily price fell more than 9.5%:

c3="pct_chg<-9.5"
t3="Time distribution of stocks falling more than 9.5%"
plot_data(c3,t3)

 


 

 

Query and extract data from the database in combination with a stock-selection strategy:

# Screening code
# Get the codes and names of currently traded stocks
def get_new_code(date):
    # Get all currently traded stock codes
    df0 = pro.stock_basic(exchange='', list_status='L')
    df1 = pro.daily_basic(trade_date=date)
    df = pd.merge(df0, df1, on='ts_code')
    # Exclude new and recent IPOs listed after 2017
    df = df[df['list_date'].apply(int).values < 20170101]
    # Exclude *ST stocks
    df = df[-df['name'].apply(lambda x: x.startswith('*ST'))]
    # Exclude stocks with negative trailing P/E
    df = df[df.pe_ttm > 0]
    # Exclude large-cap stocks
    df = df[df.circ_mv < 10**5]
    # Exclude stocks priced above 20 yuan
    #df = df[df.close < 20]
    codes = df.ts_code.values
    return codes
len(get_new_code('20190425'))
# output: 46
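The screen above is just a chain of boolean masks. The sketch below applies the same four filters to a small synthetic DataFrame so the logic can be checked without a tushare token; the codes, names, and values are made up for illustration (circ_mv is in the 10k-yuan units that daily_basic returns).

```python
import pandas as pd

# Synthetic stand-in for the merged stock_basic / daily_basic table
df = pd.DataFrame({
    'ts_code':   ['000001.SZ', '000002.SZ', '000003.SZ', '000004.SZ'],
    'name':      ['PAB', '*ST Foo', 'Bar', 'Baz'],
    'list_date': ['19910403', '20180101', '20150101', '20160101'],
    'pe_ttm':    [8.5, 12.0, -3.0, 20.0],
    'circ_mv':   [2e5, 5e4, 6e4, 7e4],
})

df = df[df['list_date'].astype(int) < 20170101]   # drop post-2017 listings
df = df[~df['name'].str.startswith('*ST')]        # drop *ST stocks
df = df[df.pe_ttm > 0]                            # drop negative trailing P/E
df = df[df.circ_mv < 10**5]                       # drop large caps
print(df.ts_code.tolist())  # ['000004.SZ']
```

Each mask removes exactly one of the synthetic rows, leaving the single stock that passes all four filters.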
import talib as ta
# 20-day moving-average trading strategy
def find_stock(date):
    f_code = []
    for code in get_new_code(date):
        try:
            data = df_all_data.loc[df_all_data.ts_code == code].copy()
            data.index = pd.to_datetime(data.trade_date)
            data = data.sort_index()
            data['ma_20'] = ta.MA(data.close, timeperiod=20)
            if data.iloc[-1]['close'] > data.iloc[-1]['ma_20']:
                f_code.append(code)
        except:
            pass
    return f_code
fs = find_stock('20190305')
print(f'Number of stocks selected: {len(fs)}')
if fs:
    df_find_stocks = pd.DataFrame(fs, columns=['ts_code'])
    # Store the selected stocks in the database; if the table exists, replace it (i.e. update each run)
    insert_sql(df_find_stocks, 'find_stocks', if_exists='replace')
    print('Selected stocks stored in the database')
# output:
# Number of stocks selected: 9
# Selected stocks stored in the database
# View the screened stock pool in the database
codes = pd.read_sql('find_stocks', engine)
codes = codes.values.tolist()
codes = [c[0] for c in codes]
#print(codes)

Further analysis of the screened stocks:

select_data = pd.DataFrame()
for code in codes:
    try:
        df_ = df_all_data[df_all_data.ts_code.values == code]
        df_.index = pd.to_datetime(df_.trade_date)
        df_ = df_.sort_index()
        select_data[code] = df_.close
    except:
        pass
select_data.fillna(method='ffill', inplace=True)
select_data.tail()
ret = select_data.apply(lambda x: x/x.shift(1)-1)
ret = ret.dropna()
ret.tail()
prod_ret = ret.apply(lambda x: (1+x).cumprod())
prod_ret.plot(figsize=(12,5))
plt.xlabel('', fontsize=15)
plt.title('Cumulative net value of the stock pool', size=15)
ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
plt.show()
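The net-value computation in the block above is worth checking on numbers small enough to verify by hand: the daily return is r_t = p_t / p_{t-1} - 1, and the cumulative net value is the running product of (1 + r_t). The prices below are synthetic and chosen so each return is ±10%.

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 9.9, 10.89])
ret = prices / prices.shift(1) - 1     # [NaN, 0.1, -0.1, 0.1]
cum = (1 + ret.dropna()).cumprod()     # running product of gross returns
print(round(cum.iloc[-1], 4))          # 1.089
```

The final cumulative value equals prices.iloc[-1] / prices.iloc[0] (10.89 / 10.0 = 1.089), which is the sanity check: chaining simple returns multiplicatively recovers the overall price ratio.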

 


 

 

# Fetch data from the database by code
def get_data_from_sql(code):
    from sqlalchemy import create_engine
    engine = create_engine('postgresql+psycopg2://postgres:123456@localhost:5432/postgres')
    data = pd.read_sql(f"select * from stock_data where ts_code='{code}'", engine)
    data.index = pd.to_datetime(data.trade_date)
    data = data.sort_index()
    # Compute the 20-day moving average
    data['ma20'] = data.close.rolling(20).mean()
    return data
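The 20-day moving-average rule used throughout (be long when the close is above the MA) reduces to a few lines of pandas. The sketch below illustrates the crossover logic on synthetic prices with a 3-day window so it runs without market data; it is an illustration of the rule, not the author's kline_plot implementation.

```python
import pandas as pd

close = pd.Series([10, 9, 8, 9, 11, 12, 11, 9, 8, 8], dtype=float)
ma = close.rolling(3).mean()        # stands in for the 20-day MA

above = close > ma                  # NaN MA values compare as False
buy = above & ~above.shift(1, fill_value=False)    # close crosses above MA
sell = ~above & above.shift(1, fill_value=False)   # close crosses below MA
print(list(close[buy].index), list(close[sell].index))  # [3] [6]
```

A buy signal fires at bar 3 (first close above the rising MA) and a sell at bar 6 (first close back below it), matching the buy/sell markers described for the candlestick charts below.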

Using the 20-day moving-average trading strategy, build a data query and visualization function kline_plot(); the complete code will be shared on Zhishi Xingqiu (Knowledge Planet). For each selected stock, the daily candlesticks, 20-day moving average, volume, and buy/sell signals are visualized; below, the candlestick charts of stocks 002790.SZ and 300573.SZ serve as examples.

kline_plot('002790.SZ')

 


 

kline_plot('300573.SZ')

 


 

 

Conclusion

 

There is, of course, much more to learn about databases. This article aimed only to give a brief introduction to using Python to move DataFrame data in and out of a PostgreSQL database, building a personal quantitative analysis database step by step. Since the dataset used here is only about a million rows, reading and writing csv or excel files would actually be faster and more intuitive; but as data keeps accumulating, learning to use a database becomes essential for building a sound quantitative analysis system of your own. Note that the stock screens and stock codes mentioned in the text serve only as application examples and do not constitute investment advice.


Origin www.cnblogs.com/cherry-tang/p/10968239.html