How to obtain the required data for quantitative investment research?

Data is the foundation of analysis and decision-making, so how quickly and how completely it can be obtained matters a great deal.

Nuggets Quantitative provides two main ways to acquire data: subscribing to receive real-time data as it is pushed, and calling query interfaces directly to fetch historical data.

The steps for each of the two methods are described below.

Retrieve data

1. Obtain high-frequency market data through subscription

Steps

1. Define the initialization function init and call the subscribe function in it to subscribe to data;

2. Implement the on_bar function to process each data push;

3. Run the strategy.

Description

Subscribe to the required data in advance. Whenever the subscribed data is updated, it is pushed to the corresponding event function, which can also read a time-series sliding window of the data in the specified format. For example:

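# Step 1: subscription function (parameter specification)
subscribe(symbol list, data frequency, data series length)

# Step 2: event function that receives the data, e.g. on_bar (global variable, pushed data set)
on_event(global variable, pushed data set):
    print(pushed data set)
    print(global variable)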
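
To make the sliding-window behaviour concrete, here is a minimal offline sketch. It does not use gm.api and is not how the platform is implemented: SlidingWindowFeed, add_handler and push are hypothetical names, and the prices are made up. It only illustrates that a subscription with a given count keeps the most recent bars and hands both the new bar and the current window to the handler.

```python
from collections import deque


class SlidingWindowFeed:
    """Toy stand-in for a subscription: keeps the last `count` bars per symbol."""

    def __init__(self):
        self._windows = {}    # symbol -> bounded deque of bars
        self._handlers = []   # callbacks invoked on every push

    def subscribe(self, symbol, count):
        # A bounded deque drops the oldest bar automatically once it is full.
        self._windows[symbol] = deque(maxlen=count)

    def add_handler(self, handler):
        self._handlers.append(handler)

    def push(self, symbol, bar):
        # Bars for symbols that were never subscribed are ignored.
        if symbol not in self._windows:
            return
        self._windows[symbol].append(bar)
        for handler in self._handlers:
            handler(bar, list(self._windows[symbol]))


feed = SlidingWindowFeed()
feed.subscribe("SZSE.000001", count=3)

received = []   # (latest close, window length) seen by the handler
last_window = []


def handle_bar(bar, window):
    received.append((bar["close"], len(window)))
    last_window[:] = [b["close"] for b in window]


feed.add_handler(handle_bar)

for close in [10.0, 10.2, 10.1, 10.4]:   # made-up closing prices
    feed.push("SZSE.000001", {"symbol": "SZSE.000001", "close": close})
```

After four pushes with count=3, the handler has been called four times and the window holds only the three most recent bars.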

Example

# coding=utf-8
from __future__ import print_function, absolute_import
from gm.api import *


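# Subscribe to 1-minute bars for Ping An Bank with a window of 10 bars, so the mean close can be computed
# Define the initialization function and declare the required data by subscribing to it
def init(context):
    # Subscribe to the data
    subscribe(symbols='SZSE.000001', frequency='60s', count=10)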


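# Receive bar data events in on_bar; the mean can be computed here from the sliding window
def on_bar(context, bars):
    # Print the bar data just received
    print(bars)

    # context.data returns the cached sliding window, which can be used to compute indicators
    # Note: count in context.data must be less than or equal to count in subscribe
    data = context.data(symbols='SZSE.000001', frequency='60s', count=10, fields='close')
    print(data)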

if __name__ == '__main__':
    '''
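    strategy_id: strategy ID, generated by the system
    filename: file name; keep it identical to the name of this file
    mode: MODE_LIVE for live mode, MODE_BACKTEST for backtest mode
    token: ID bound to the computer; generate it under System Settings - Key Management
    backtest_start_time: backtest start time
    backtest_end_time: backtest end time
    backtest_adjust: stock price adjustment; ADJUST_NONE (unadjusted), ADJUST_PREV (forward-adjusted), ADJUST_POST (backward-adjusted)
    backtest_initial_cash: initial cash for the backtest
    backtest_commission_ratio: commission ratio for the backtest
    backtest_slippage_ratio: slippage ratio for the backtest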
    '''
    run(strategy_id='strategy_id',
        filename='main.py',
        mode=MODE_BACKTEST,
        token='token_id',
        backtest_start_time='2020-04-01 09:00:00',
        backtest_end_time='2020-05-31 15:00:00',
        backtest_adjust=ADJUST_NONE,
        backtest_initial_cash=10000000,
        backtest_commission_ratio=0.0001,
        backtest_slippage_ratio=0.0001)

Save results

The sliding window of subscribed data is cached in context.data. To extract it, call the context.data() interface, either from a custom function such as algo() or from an on_xxx() event-driven function. The call has the form:

data = context.data(symbol, frequency, count, fields)
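
As a sketch of what can be done with the extracted window, the snippet below averages the most recent closes. It assumes the close prices have already been pulled out of the window into a plain Python list; mean_close is a hypothetical helper and the prices are made up. The length check mirrors the rule that the count requested from context.data must not exceed the count given to subscribe.

```python
def mean_close(closes, count):
    """Average the most recent `count` closes from a cached window."""
    if count <= 0:
        raise ValueError("count must be positive")
    if count > len(closes):
        # Same idea as the platform rule: you cannot read more bars than were cached.
        raise ValueError("requested count exceeds the cached window length")
    recent = closes[-count:]
    return sum(recent) / len(recent)


window_closes = [13.10, 13.20, 13.00, 13.30, 13.40]   # made-up close prices
avg_last_3 = mean_close(window_closes, 3)
print(round(avg_last_3, 4))
```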

2. Obtain data through query interfaces

Steps

1. Call set_token to set the user token. If the token is incorrect, function calls will raise an exception;

2. Call a data query function to query the data directly.
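
Because a wrong token makes the query raise, it can help to wrap one-shot queries so a failure is reported rather than stopping a research script. The sketch below runs offline: run_query is a hypothetical helper, fake_history and set_fake_token are stand-ins written for this example (they are not part of gm.api), and the SDK's actual exception type is not assumed, which is why the wrapper catches Exception broadly.

```python
def run_query(query, **params):
    """Call a one-shot query and return (result, error_message)."""
    try:
        return query(**params), None
    except Exception as exc:   # the SDK's exact exception type is not assumed here
        return None, str(exc)


# Hypothetical stand-ins so the sketch runs without the SDK or a terminal.
_current_token = None


def set_fake_token(token):
    global _current_token
    _current_token = token


def fake_history(symbol, frequency):
    if _current_token != "valid-token":
        raise RuntimeError("invalid token")
    return [{"symbol": symbol, "frequency": frequency, "close": 1.0}]


set_fake_token("wrong-token")
failed_result, failed_error = run_query(fake_history, symbol="SHSE.000300", frequency="1d")

set_fake_token("valid-token")
ok_result, ok_error = run_query(fake_history, symbol="SHSE.000300", frequency="1d")
```

With the real SDK, the same wrapper could be passed an actual query function in place of fake_history.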

Description

Data is obtained from the return value of the interface call and is returned only once per call. For example:

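# returned data = query function(parameter specification)

# Query historical market data: fetch historical data for a given period
history(symbol, frequency, start time, end time, whether to adjust prices)

# Query fundamentals: fetch historical data for a given period
get_fundamentals(table name, field names, symbols, start date, end date)

# Query constituents: fetch the constituent stocks of an index
get_constituents(index code)

# Query business data: fetch the list of trading dates
get_trading_dates(exchange, start time, end time)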

Example

# coding=utf-8
from __future__ import print_function, absolute_import
from gm.api import *


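# The Nuggets terminal must be running; interface queries are made over network requests
# Set the token; existing token IDs can be viewed under User - Key Management
set_token('your token_id')


# Query a market snapshot
current_data = current(symbols='SZSE.000001')


# Query historical market data and return it as a DataFrame (df=True)
history_data = history(symbol='SHSE.000300', frequency='1d', start_time='2010-07-28', end_time='2017-07-30', df=True)


# Query financial data: read several fields from the stock trading derivative indicator table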
get_fundamentals(table='trading_derivative_indicator', symbols='SHSE.600000, SZSE.000001', start_date='2017-01-01', end_date='2017-01-01', fields='TCLOSE,PETTMNPAAEI')
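
The example above passes several symbols and fields as comma-separated strings. When the symbols come from a Python list, a small helper can build that string. This is a sketch only: join_symbols is a hypothetical helper, the EXCHANGE.CODE check is inferred from the symbols shown in this document, and stripping whitespace is a cautious choice rather than a documented requirement.

```python
def join_symbols(symbols):
    """Build a comma-separated symbols string such as 'SHSE.600000,SZSE.000001'."""
    cleaned = []
    for symbol in symbols:
        symbol = symbol.strip()
        exchange, sep, code = symbol.partition(".")
        # Every symbol in this document looks like EXCHANGE.CODE, so check for that shape.
        if not (sep and exchange and code):
            raise ValueError("expected EXCHANGE.CODE, got %r" % symbol)
        cleaned.append(symbol)
    return ",".join(cleaned)


symbols_arg = join_symbols(["SHSE.600000", " SZSE.000001"])
fields_arg = ",".join(["TCLOSE", "PETTMNPAAEI"])
print(symbols_arg)
print(fields_arg)
```

The resulting strings can then be passed as the symbols and fields arguments of a query such as get_fundamentals.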

Supplement: Use Jupyter Notebook to extract data for research

Notes

Jupyter Notebook is a tool bundled with Anaconda. After installing Anaconda, open Jupyter Notebook to retrieve data.

Before starting, confirm the following three points:

The Nuggets terminal is running
The Python interpreter used by Jupyter Notebook has the gm package installed (the previous document explains how to download the SDK)
The token ID has been set

Steps

1. Initialize: run the code required before data can be retrieved

from __future__ import print_function, absolute_import, unicode_literals
from gm.api import *
# With the terminal running, set the token
set_token('your token')

2. Once the setup is complete, extract the data.