Multi-factor strategy process
Strategy process
Factor mining and factor selection are crucial when building a multi-factor strategy. Here is the flow chart of the strategy again:
Multi-factor strategy process
Factor mining
Factor data processing:
- Outlier removal (winsorization)
- Standardization
- Neutralization
Single factor validity test:
- Factor IC analysis
- Factor return analysis
- Factor direction
Multi-factor correlation and combination analysis:
- Factor correlation
- Factor synthesis
Backtest
- Multi-factor stock selection weight
- Adjustment cycle
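The three factor data processing steps in the flow above (outlier removal, standardization, neutralization) can be sketched roughly as follows. This is only a minimal illustration using NumPy, not RiceQuant's implementation: the median absolute deviation (MAD) clipping rule, the 3-MAD threshold, the use of log market cap as the neutralization exposure, and every function and variable name are assumptions chosen for the example, and the input numbers are made up.

```python
import numpy as np


def winsorize_mad(values, n_mad=3.0):
    """Clip values lying more than n_mad scaled MADs away from the median."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    mad = 1.4826 * np.median(np.abs(values - median))
    return np.clip(values, median - n_mad * mad, median + n_mad * mad)


def standardize(values):
    """Z-score: subtract the cross-sectional mean, divide by the population std."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def neutralize(values, exposures):
    """Return the residuals of an OLS regression of values on exposures plus an intercept."""
    values = np.asarray(values, dtype=float)
    exposures = np.asarray(exposures, dtype=float)
    if exposures.ndim == 1:
        exposures = exposures.reshape(-1, 1)
    design = np.column_stack([np.ones(len(values)), exposures])
    coefficients, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coefficients


# Hypothetical one-day cross-section of eight stocks (made-up numbers).
raw_factor = np.array([0.8, 1.1, 0.9, 1.3, 25.0, 1.0, 0.7, 1.2])
log_market_cap = np.array([22.1, 23.4, 22.8, 24.0, 23.0, 22.5, 21.9, 23.7])

clipped = winsorize_mad(raw_factor)                # the 25.0 outlier is pulled in
zscores = standardize(clipped)                     # mean 0, std 1
processed = neutralize(zscores, log_market_cap)    # uncorrelated with size

print(processed)
```

The order matters in this sketch: clipping first keeps a single extreme value from dominating the mean and standard deviation used in the z-score, and neutralizing last leaves a factor whose remaining variation is not explained by the chosen exposure.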
Platform Introduction
The explanations that follow use the research platform provided by RiceQuant.
The platform is built on IPython Notebook and provides a rich set of tools for strategy research.
Website:
https://www.ricequant.com/quant/notebook
Data retrieval functions
Get contract historical data
get_price - Get contract historical data
get_price(order_book_ids, start_date, end_date=None, frequency='1d', fields=None, adjust_type='pre', skip_suspended=False, expect_df=False)
Gets historical data for a given contract or list of contracts, inclusive of the start and end dates, as daily or minute bars. Currently only the Chinese market is supported. When writing strategies, history_bars is recommended instead.
Parameters:
Parameter | Type | Description |
---|---|---|
order_book_ids | str OR str list | Contract code(s); pass a single order_book_id or a list of order_book_ids |
start_date | str, datetime.date, datetime.datetime, pandas Timestamp | Start date; the default is '2013-01-04'. When used for trading, it must be specified by the user |
end_date | str, datetime.date, datetime.datetime, pandas Timestamp | End date; the default is '2014-01-04'. When used for trading, the default is the day before the strategy's current date |
frequency | str | Frequency of the historical data. Day-level and minute-level data are supported, and the default is '1d'. Other frequencies can be chosen freely; for example, '5m' means 5-minute bars |
fields | str OR str list | Name(s) of the fields to return |
adjust_type | str | Price adjustment type: 'pre' for forward-adjusted prices, 'post' for backward-adjusted prices, 'none' for unadjusted prices, 'internal' for the data used in backtests. Note that 'internal' data matches what the backtest uses: prices and volumes are adjusted only for split events, and the effect of dividends on the stock price is not considered, so the price jumps around a dividend date |
skip_suspended | bool | Whether to skip suspension data. The default is False: suspended days are not skipped and are filled with the data from before the suspension. True skips the suspension period. Note that when set to True, only a single contract may be passed as order_book_ids |
country | str | Defaults to the Chinese market ('cn'); currently only the Chinese market is supported |
Returns:
- One order_book_id and multiple fields: a pandas DataFrame
- One order_book_id and one field: a pandas Series
- Multiple order_book_ids and one field: a pandas DataFrame
- Multiple order_book_ids and multiple fields: a pandas Panel
Examples:
Get the historical daily quotes of a single stock (returns a pandas DataFrame):
[In]get_price('000001.XSHE', start_date='2015-04-01', end_date='2015-04-12')
[Out]
open close high low total_turnover volume limit_up limit_down
2015-04-01 10.7300 10.8249 10.9470 10.5469 2.608977e+09 236637563.0 11.7542 9.6177
2015-04-02 10.9131 10.7164 10.9470 10.5943 2.222671e+09 202440588.0 11.9102 9.7397
2015-04-03 10.6486 10.7503 10.8114 10.5876 2.262844e+09 206631550.0 11.7881 9.6448
2015-04-07 10.9538 11.4015 11.5032 10.9538 4.898119e+09 426308008.0 11.8288 9.6787
2015-04-08 11.4829 12.1543 12.2628 11.2929 5.784459e+09 485517069.0 12.5409 10.2620
2015-04-09 12.1747 12.2086 12.9208 12.0255 5.794632e+09 456921108.0 13.3684 10.9403
2015-04-10 12.2086 13.4294 13.4294 12.1069 6.339649e+09 480990210.0 13.4294 10.9877
Get the historical daily closing prices for a list of stocks (returns a pandas DataFrame):
[In]get_price(['000024.XSHE', '000001.XSHE', '000002.XSHE'], start_date='2015-04-01', end_date='2015-04-12', fields='close')
[Out]
000024.XSHE 000001.XSHE 000002.XSHE
2015-04-01 32.1251 10.8249 12.7398
2015-04-02 31.6400 10.7164 12.6191
2015-04-03 31.6400 10.7503 12.4891
2015-04-07 31.6400 11.4015 12.7398
2015-04-08 31.6400 12.1543 12.8327
2015-04-09 31.6400 12.2086 13.5941
2015-04-10 31.6400 13.4294 13.2969
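A closing-price table like the one above is a typical starting point for return calculations, such as the factor return (yield) analysis step in the strategy flow. As a rough sketch, assuming pandas is available, the prices shown in the output can be rebuilt by hand and turned into daily returns with DataFrame.pct_change. On the research platform the frame would come straight from get_price; the variable names here are illustrative only.

```python
import pandas as pd

# Closing prices copied from the get_price output shown above.
close = pd.DataFrame(
    {
        "000024.XSHE": [32.1251, 31.6400, 31.6400, 31.6400, 31.6400, 31.6400, 31.6400],
        "000001.XSHE": [10.8249, 10.7164, 10.7503, 11.4015, 12.1543, 12.2086, 13.4294],
        "000002.XSHE": [12.7398, 12.6191, 12.4891, 12.7398, 12.8327, 13.5941, 13.2969],
    },
    index=pd.to_datetime(
        ["2015-04-01", "2015-04-02", "2015-04-03", "2015-04-07",
         "2015-04-08", "2015-04-09", "2015-04-10"]
    ),
)

# Simple daily return: close[t] / close[t-1] - 1. The first row has no
# previous day, so it is NaN and is dropped.
daily_returns = close.pct_change().dropna()

# A price that never changes produces zero returns on every day.
flat_days = (daily_returns == 0).sum()

print(daily_returns.round(4))
print(flat_days)
```

The constant 31.6400 for 000024.XSHE yields zero returns from 2015-04-03 onward. That pattern is what forward-filled suspension data (skip_suspended=False) would look like, although the source output does not state that the stock was suspended. Such rows are worth checking before they feed into any factor test.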
Get the historical daily quotes for a list of stocks (returns a pandas Panel):
[In]get_price(['000024.XSHE', '000001.XSHE', '000002.XSHE'], start_date='2015-04-01', end_date='2015-04-12')
[Out]
<class 'rqcommons.pandas_patch.HybridDataPanel'>
Dimensions: 8 (items) x 7 (major_axis) x 3 (minor_axis)
Items axis: open to limit_down
Major_axis axis: 2015-04-01 00:00:00 to 2015-04-10 00:00:00
Minor_axis axis: 000024.XSHE to 000002.XSHE
Get a list of trading days
get_trading_dates - Get a list of trading days
get_trading_dates(start_date, end_date, market='cn')
Parameters:
Parameter | Type | Description |
---|---|---|
start_date | str, datetime.date, datetime.datetime, pandas Timestamp | Start date |
end_date | str, datetime.date, datetime.datetime, pandas Timestamp | End date |
market | str | Defaults to the Mainland China market ('cn'). Options: 'cn' for the Mainland China market, 'hk' for the Hong Kong market |
Returns:
A list of datetime.date objects: the trading dates in the range
Example:
[In]
get_trading_dates(start_date='20160505', end_date='20160505')
[Out]
[datetime.date(2016, 5, 5)]
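Because the result is a plain list of datetime.date objects, it can be used to derive the rebalancing dates for the adjustment cycle mentioned in the strategy flow. The sketch below picks the first trading day of each month. The helper names are hypothetical, and since get_trading_dates is only available on the platform, a weekday-only calendar that ignores public holidays stands in for its output.

```python
import datetime


def first_trading_day_of_each_month(trading_dates):
    """Return the earliest date of every (year, month) found in trading_dates."""
    rebalance_dates = []
    seen_months = set()
    for day in sorted(trading_dates):
        month_key = (day.year, day.month)
        if month_key not in seen_months:
            seen_months.add(month_key)
            rebalance_dates.append(day)
    return rebalance_dates


def weekdays_between(start, end):
    """Stand-in calendar (assumption): every Monday-Friday, holidays ignored."""
    days = []
    current = start
    while current <= end:
        if current.weekday() < 5:
            days.append(current)
        current += datetime.timedelta(days=1)
    return days


# On the platform this list would come from get_trading_dates(start, end).
trading_dates = weekdays_between(datetime.date(2016, 4, 25), datetime.date(2016, 6, 10))
rebalance_dates = first_trading_day_of_each_month(trading_dates)
print(rebalance_dates)
```

With a real trading calendar the selected dates may differ from this stand-in, since exchange holidays are removed from the list before the first day of each month is picked.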
Query financial data
get_fundamentals - Query financial data
get_fundamentals(query, entry_date=None, interval='1d', report_quarter=False, expect_df=False)
Gets a table of historical financial data. It currently supports more than 400 indicators for the Chinese market, which is the only market supported; see the financial data documentation for details. Note that querying the financial data of too many stocks at once will make the system run slowly.
Note: get_fundamentals is being deprecated; please use get_factor to get financial data.
Parameters:
Parameter | Type | Description |
---|---|---|
query | SQLAlchemyQueryObject | A SQLAlchemy Query object. List the indicators to query inside query(), and put the data filtering conditions inside filter(). See SQLAlchemy's query documentation to learn more convenient query statements. From a data scientist's point of view, SQLAlchemy is simpler and more powerful to use than raw SQL |
entry_date | str, datetime.date, datetime.datetime, pandas Timestamp | The base date for the financial data query; it should be earlier than the strategy's current date. The default is the day before the strategy's current date |
interval | str | The look-back interval for the query; the default is '1d'. For example, '5y' goes back 5 years from entry_date (inclusive) and returns data at yearly intervals. Units: 'd' for day, 'm' for month, 'q' for quarter, 'y' for year |
report_quarter | bool | Whether to show the reporting period; the default is False (not shown). 'Q1' is the first-quarter report, 'Q2' the semi-annual report, 'Q3' the third-quarter report, 'Q4' the annual report |
expect_df | bool | By default the original Panel data structure is returned. If set to True, a pandas DataFrame is returned |
Returns:
- A pandas DataPanel
- If the query result is empty, an empty pandas DataFrame
- If the given interval is 1d, 1m, 1q, or 1y, a pandas DataFrame
Examples:
Get the pe_ratio and revenue indicators in the financial data:
# Use filter to keep only the results whose pe_ratio falls within a given range,
# then keep only the top 10 after sorting by revenue in descending order
fundamental_df = get_fundamentals(
    query(
        fundamentals.income_statement.revenue, fundamentals.eod_derivative_indicator.pe_ratio
    ).filter(
        fundamentals.eod_derivative_indicator.pe_ratio > 25
    ).filter(
        fundamentals.eod_derivative_indicator.pe_ratio < 30
    ).order_by(
        fundamentals.income_statement.revenue.desc()
    ).limit(
        10
    )
)
context.stocks = fundamental_df.columns.values
update_universe(context.stocks)
Get historical financial data for a specified set of stocks:
def init(context):
    context.stocks = industry('A01')
    logger.info("industry stocks: " + str(context.stocks))
    # Every table has a stockcode column, which makes it easy to filter the queried
    # data by stock code; here only revenue and pe_ratio of the 'A01' sector are queried.
    # Finally, the entry_date parameter is passed to get the data as of 2015-12-31.
    context.fundamental_df = get_fundamentals(
        query(
            fundamentals.income_statement.revenue, fundamentals.eod_derivative_indicator.pe_ratio
        ).filter(
            fundamentals.eod_derivative_indicator.pe_ratio > 5
        ).filter(
            fundamentals.eod_derivative_indicator.pe_ratio < 300
        ).filter(
            fundamentals.income_statement.stockcode.in_(context.stocks)
        ), entry_date='20151231'
    )
    logger.info(context.fundamental_df)
    update_universe(context.fundamental_df.columns.values)