Quantitative trading factor data acquisition

Strategy process

To build a multi-factor strategy, factor mining and factor selection are crucial. Let's review the strategy workflow:

Multi-factor strategy process

Factor mining

Factor data processing:

  • Outlier removal (winsorization)
  • Standardization
  • Neutralization
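
The three processing steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not Ricequant's implementation: the function names, the choice of median ± 3 MAD for outlier clipping, the single-variable regression used for neutralization, and the sample numbers are all assumptions made for the example.

```python
import statistics


def winsorize_mad(values, n_mads=3.0):
    """Clip values to median +/- n_mads * MAD (median absolute deviation)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    lower, upper = med - n_mads * mad, med + n_mads * mad
    return [min(max(v, lower), upper) for v in values]


def standardize(values):
    """Convert values to z-scores (assumes the values are not all equal)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def neutralize(values, exposure):
    """Return the residuals of an OLS fit of values on one exposure plus intercept."""
    mean_x = statistics.fmean(exposure)
    mean_y = statistics.fmean(values)
    sxx = sum((x - mean_x) ** 2 for x in exposure)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(exposure, values))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return [y - (alpha + beta * x) for x, y in zip(exposure, values)]


# Made-up cross-section: one factor value and one exposure per stock.
raw_factor = [0.8, 1.1, 0.9, 1.3, 25.0, 1.0]          # 25.0 is an outlier
log_mkt_cap = [22.1, 23.4, 22.8, 24.0, 23.0, 22.5]    # hypothetical exposure

clipped = winsorize_mad(raw_factor)
zscores = standardize(clipped)
neutral = neutralize(zscores, log_mkt_cap)
print(neutral)
```

In practice neutralization usually regresses on several exposures at once (for example industry dummies plus market cap); the single-exposure version here just keeps the idea visible.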

Single factor validity test:

  • Factor IC analysis
  • Factor return analysis
  • Factor direction
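
As a rough illustration of the IC step, the sketch below computes a rank IC, taken here to mean the Spearman rank correlation between factor values and the next period's returns, and reads the factor direction off its sign. The helper names and the sample numbers are made up for the example; the platform may compute IC differently.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(factor_values, next_returns):
    """Spearman rank correlation between factor values and subsequent returns."""
    return pearson(ranks(factor_values), ranks(next_returns))


# Made-up cross-section of five stocks.
factor_values = [0.5, -1.2, 0.3, 1.8, -0.4]
next_returns = [0.01, -0.03, 0.02, 0.05, -0.01]

ic = rank_ic(factor_values, next_returns)
direction = 1 if ic > 0 else -1   # positive IC: higher factor value, higher return
print(ic, direction)
```

A single period's IC is noisy; a validity test would normally look at the mean and stability of the IC series over many periods.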

Multi-factor correlation and combination analysis:

  • Factor correlation
  • Factor synthesis
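
One simple way to picture these two steps: check the pairwise correlation between factors, then combine them into a single score. The sketch below uses an equal-weighted average of z-scores, which is only one of several possible synthesis schemes; the factor names, directions, and values are hypothetical.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def zscore(values):
    """Standardize values to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def combine_equal_weight(factors, directions):
    """Average the z-scores of all factors, flipping the sign by direction."""
    names = list(factors)
    scored = {name: zscore(factors[name]) for name in names}
    n_stocks = len(scored[names[0]])
    return [
        sum(directions[name] * scored[name][i] for name in names) / len(names)
        for i in range(n_stocks)
    ]


# Made-up factor values for four stocks.
factors = {
    "value": [0.2, 0.5, 0.1, 0.9],
    "momentum": [1.0, 3.0, 2.0, 4.0],
}
directions = {"value": 1, "momentum": 1}   # +1: higher is better (assumed)

corr = pearson(factors["value"], factors["momentum"])
composite = combine_equal_weight(factors, directions)
print(corr, composite)
```

When two factors are as highly correlated as in this toy data, they carry overlapping information, so one would usually drop or merge one of them rather than count it twice.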

Backtest

  • Multi-factor stock selection weights
  • Rebalancing period

Platform Introduction

We will use the research platform provided by Ricequant for the rest of this walkthrough.

The Ricequant research platform is built on IPython Notebook and provides a rich set of tools for strategy research.
Website:
https://www.ricequant.com/quant/notebook

Data retrieval functions

Get contract historical data

get_price-Get contract historical data

get_price(order_book_ids, start_date, end_date=None, frequency='1d', fields=None, adjust_type='pre', skip_suspended=False, expect_df=False)

Gets historical data for a specified contract or list of contracts, as daily or minute bars, over a date range that includes both the start and end dates. Currently only the Chinese market is supported. When writing strategies, history_bars is recommended instead.

Parameters:

  • order_book_ids (str or list of str): Contract code. Pass either a single order_book_id or a list of them.
  • start_date (str, datetime.date, datetime.datetime, or pandas Timestamp): Start date; the default is '2013-01-04'. When used in trading, the user must specify it.
  • end_date (str, datetime.date, datetime.datetime, or pandas Timestamp): End date; the default is '2014-01-04'. When used in trading, the default is the day before the strategy's current date.
  • frequency (str): Frequency of the historical data. Day-level and minute-level data are supported, and the default is '1d'. Other frequencies can be chosen freely; for example, '5m' means 5-minute bars.
  • fields (str or list of str): Names of the fields to return.
  • adjust_type (str): Price adjustment mode: 'pre' for pre-adjusted prices, 'post' for post-adjusted prices, 'none' for unadjusted prices, and 'internal' for the data used in backtests. Note that 'internal' data matches the backtest data: only split events are applied to price and volume, and the effect of dividends on the price is not considered, so the price jumps around a dividend.
  • skip_suspended (bool): Whether to skip suspended trading days. The default is False: suspended days are kept and filled with the data from before the suspension. True skips the suspension period. Note that when it is set to True, order_book_ids accepts only a single contract.
  • country (str): Market; the default is the Chinese market ('cn'), which is currently the only one supported.

Returns:

  • One order_book_id and multiple fields: a pandas DataFrame
  • One order_book_id and one field: a pandas Series
  • Multiple order_book_ids and one field: a pandas DataFrame
  • Multiple order_book_ids and multiple fields: a pandas Panel
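
The return-type rules can be hard to keep straight, so here is a small hypothetical helper that encodes them. It is not part of the Ricequant API. It reads the last case as multiple contracts with multiple fields (as in the Panel example further down), and it assumes that leaving fields as None means "all fields" and that a one-element list counts as a single contract.

```python
def expected_return_type(order_book_ids, fields=None):
    """Name the pandas container get_price is documented to return.

    Hypothetical helper for illustration only; it does not call get_price.
    """
    many_ids = not isinstance(order_book_ids, str) and len(order_book_ids) > 1
    # Assumption: fields=None requests every field, i.e. more than one.
    many_fields = fields is None or (
        not isinstance(fields, str) and len(fields) > 1
    )
    if many_ids and many_fields:
        return "Panel"
    if not many_ids and not many_fields:
        return "Series"
    return "DataFrame"


print(expected_return_type("000001.XSHE"))                               # DataFrame
print(expected_return_type("000001.XSHE", fields="close"))               # Series
print(expected_return_type(["000001.XSHE", "000002.XSHE"], "close"))     # DataFrame
print(expected_return_type(["000001.XSHE", "000002.XSHE"]))              # Panel
```

Knowing the shape in advance matters because code that indexes a DataFrame by column will break if a Series or a Panel comes back instead.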

Examples:

Get the historical daily quotes of a single stock (returns a pandas DataFrame):

[In]get_price('000001.XSHE', start_date='2015-04-01', end_date='2015-04-12')
[Out]
open    close    high    low    total_turnover    volume    limit_up    limit_down
2015-04-01    10.7300    10.8249    10.9470    10.5469    2.608977e+09    236637563.0    11.7542    9.6177
2015-04-02    10.9131    10.7164    10.9470    10.5943    2.222671e+09    202440588.0    11.9102    9.7397
2015-04-03    10.6486    10.7503    10.8114    10.5876    2.262844e+09    206631550.0    11.7881    9.6448
2015-04-07    10.9538    11.4015    11.5032    10.9538    4.898119e+09    426308008.0    11.8288    9.6787
2015-04-08    11.4829    12.1543    12.2628    11.2929    5.784459e+09    485517069.0    12.5409    10.2620
2015-04-09    12.1747    12.2086    12.9208    12.0255    5.794632e+09    456921108.0    13.3684    10.9403
2015-04-10    12.2086    13.4294    13.4294    12.1069    6.339649e+09    480990210.0    13.4294    10.9877

Get the historical daily closing prices of a list of stocks (returns a pandas DataFrame):

[In]get_price(['000024.XSHE', '000001.XSHE', '000002.XSHE'], start_date='2015-04-01', end_date='2015-04-12', fields='close')
[Out]
000024.XSHE    000001.XSHE    000002.XSHE
2015-04-01    32.1251    10.8249    12.7398
2015-04-02    31.6400    10.7164    12.6191
2015-04-03    31.6400    10.7503    12.4891
2015-04-07    31.6400    11.4015    12.7398
2015-04-08    31.6400    12.1543    12.8327
2015-04-09    31.6400    12.2086    13.5941
2015-04-10    31.6400    13.4294    13.2969

Get the historical daily quotes of a list of stocks (returns a pandas Panel):

[In]get_price(['000024.XSHE', '000001.XSHE', '000002.XSHE'], start_date='2015-04-01', end_date='2015-04-12')
[Out]
<class 'rqcommons.pandas_patch.HybridDataPanel'>
Dimensions: 8 (items) x 7 (major_axis) x 3 (minor_axis)
Items axis: open to limit_down
Major_axis axis: 2015-04-01 00:00:00 to 2015-04-10 00:00:00
Minor_axis axis: 000024.XSHE to 000002.XSHE

Get a list of trading days

get_trading_dates-Get a list of trading days

get_trading_dates(start_date, end_date, market='cn')

Parameters:

  • start_date (str, datetime.date, datetime.datetime, or pandas Timestamp): Start date
  • end_date (str, datetime.date, datetime.datetime, or pandas Timestamp): End date
  • market (str): Market; the default is the Mainland China market ('cn'). Options: 'cn' for the Mainland China market, 'hk' for the Hong Kong market

Returns:

A list of datetime.date objects, one per trading day.

Example:

[In]
get_trading_dates(start_date='20160505', end_date='20160505')
[Out]
[datetime.date(2016, 5, 5)]
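
Since get_trading_dates returns a plain list of datetime.date objects, it is a natural input for choosing rebalancing dates. The sketch below picks the last trading day of each month. Because get_trading_dates only exists on the platform, the example uses a stand-in that returns every Monday to Friday and ignores holidays; both helper names are made up for this illustration.

```python
import datetime


def weekdays_between(start, end):
    """Stand-in for get_trading_dates: every Mon-Fri, ignoring holidays."""
    days = []
    day = start
    while day <= end:
        if day.weekday() < 5:
            days.append(day)
        day += datetime.timedelta(days=1)
    return days


def month_end_trading_dates(trading_dates):
    """Keep only the last trading day of each calendar month."""
    last_by_month = {}
    for day in sorted(trading_dates):
        # Later days overwrite earlier ones within the same (year, month).
        last_by_month[(day.year, day.month)] = day
    return list(last_by_month.values())


sample_dates = weekdays_between(datetime.date(2016, 4, 1), datetime.date(2016, 6, 30))
rebalance_dates = month_end_trading_dates(sample_dates)
print(rebalance_dates)
```

On the platform, the list passed to month_end_trading_dates would come from get_trading_dates instead of the stand-in, so real market holidays would be respected.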

Query financial data

get_fundamentals-query financial data

get_fundamentals(query, entry_date=None, interval='1d', report_quarter=False,expect_df=False)

Gets a table of historical financial data. More than 400 indicators are currently supported for the Chinese market; see the financial data documentation for details. Only the Chinese market is supported at present. Note that querying financial data for too many stocks at once will slow the system down.

Note: get_fundamentals is being deprecated; please use get_factor to get financial data.

Parameters:

  • query (SQLAlchemy Query object): A SQLAlchemy Query object. Put the indicators to query in query(...) and the data filtering conditions in filter(...). See SQLAlchemy's query documentation to learn more convenient query statements. From a data scientist's point of view, SQLAlchemy is simpler and more powerful to use than raw SQL.
  • entry_date (str, datetime.date, datetime.datetime, or pandas Timestamp): The base date for the query; it should be earlier than the strategy's current date. The default is the day before the strategy's current date.
  • interval (str): How far back to query and at what spacing; the default is '1d'. For example, '5y' goes back 5 years from entry_date (inclusive) and returns data at yearly intervals. Units: 'd' for day, 'm' for month, 'q' for quarter, 'y' for year.
  • report_quarter (bool): Whether to show the reporting period; the default is False (not shown). 'Q1' is the first-quarter report, 'Q2' the semi-annual report, 'Q3' the third-quarter report, and 'Q4' the annual report.
  • expect_df (bool): By default the original Panel data structure is returned. If set to True, a pandas DataFrame is returned.
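
To make the interval format concrete, the sketch below splits a string such as '5y' into a count and a unit. It only illustrates how the documented format reads; the parse_interval helper is hypothetical and says nothing about how get_fundamentals parses the argument internally.

```python
import re

# Unit letters as described for the interval parameter.
UNITS = {"d": "day", "m": "month", "q": "quarter", "y": "year"}


def parse_interval(interval):
    """Split an interval like '5y' into (count, unit name)."""
    match = re.fullmatch(r"(\d+)([dmqy])", interval)
    if match is None:
        raise ValueError("unsupported interval: %r" % (interval,))
    return int(match.group(1)), UNITS[match.group(2)]


print(parse_interval("5y"))   # five yearly observations back from entry_date
print(parse_interval("1d"))   # the default: a single daily observation
```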

Returns:

  • A pandas Panel
  • If the query result is empty, an empty pandas DataFrame
  • If the given interval is 1d, 1m, 1q, or 1y, a pandas DataFrame

Examples:

Get the pe_ratio and revenue indicators from the financial data:

# Use filter to keep only the results whose pe_ratio falls within a given range,
# then take just the top 10 after sorting by revenue in descending order
fundamental_df = get_fundamentals(
    query(
        fundamentals.income_statement.revenue,
        fundamentals.eod_derivative_indicator.pe_ratio
    ).filter(
        fundamentals.eod_derivative_indicator.pe_ratio > 25
    ).filter(
        fundamentals.eod_derivative_indicator.pe_ratio < 30
    ).order_by(
        fundamentals.income_statement.revenue.desc()
    ).limit(
        10
    )
)
context.stocks = fundamental_df.columns.values
update_universe(context.stocks)

Obtain historical financial data of certain specified stocks:

def init(context):
    context.stocks = industry('A01')
    logger.info("industry stocks: " + str(context.stocks))

    # Every table has a stockcode column, which makes it easy to filter the
    # queried data by stock code; here we query revenue and pe_ratio for the
    # 'A01' sector only.
    # Finally, pass entry_date to get the data as of 2015-12-31.
    context.fundamental_df = get_fundamentals(
        query(
            fundamentals.income_statement.revenue,
            fundamentals.eod_derivative_indicator.pe_ratio
        ).filter(
            fundamentals.eod_derivative_indicator.pe_ratio > 5
        ).filter(
            fundamentals.eod_derivative_indicator.pe_ratio < 300
        ).filter(
            fundamentals.income_statement.stockcode.in_(context.stocks)
        ),
        entry_date='20151231'
    )
    logger.info(context.fundamental_df)
    update_universe(context.fundamental_df.columns.values)

Origin blog.csdn.net/weixin_46274168/article/details/114958524