Tick data research

      I often hear about tick data, and it is used in backtesting as well, but I had never really worked with tick data myself. Tick data is said to be full of pitfalls, so I decided to take a look on my own. The first step is to take a clean set of tick data and build bars from it, which should reveal some of the details. After that, I receive tick data through CTP myself to see whether CTP has any pitfalls.

      Here is the clean tick data from Wind.

      The data above was exported from Wind and looks quite normal; at any rate, there are two records per second, as expected. After all, what the exchange gives us is not true tick-by-tick data but snapshots, that is, 500-millisecond slices. All market data software is in fact built on this tick data.

      Tick data contains other fields too, such as ask and bid, but the most important ones are last_price and volume. last_price can be understood as the traded price at the moment of the slice. As for volume, let's look at its curve:

      So the volume in tick data is cumulative volume, and the trading day starts at 9 p.m., when the night session opens. For instruments without a night session, it starts at nine o'clock the next morning.
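
Because volume is cumulative, the volume traded in each slice has to be recovered by differencing consecutive readings. The following is a minimal sketch of that idea, not the author's code. The function name and the sample numbers are made up for illustration. It assumes that the series starts at the beginning of a trading day and that any drop in the cumulative value marks the start of a new trading day.

```python
def per_tick_volume(cumulative):
    """Convert cumulative volume readings into per-slice traded volume.

    Assumption: a drop in the cumulative value means the counter was
    reset for a new trading day, so that slice's volume is the reading itself.
    """
    volumes = []
    previous = 0
    for current in cumulative:
        if current < previous:  # counter was reset: new trading day
            previous = 0
        volumes.append(current - previous)
        previous = current
    return volumes


# Invented cumulative readings; the drop from 60 to 5 marks a new trading day.
sample = [10, 25, 25, 60, 5, 12]
print(per_tick_volume(sample))  # [10, 15, 0, 35, 5, 7]
```

A slice with a per-tick volume of 0 simply means nothing traded in that 500 ms window.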

      So how does tick data become minute data? In other words, how do ticks become a bar?

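# encoding=utf-8
import pandas as pd
from matplotlib import pyplot as plt
# matplotlib.finance is no longer shipped with recent matplotlib releases;
# the standalone mpl_finance package provides the same candlestick_ohlc function.
import mpl_finance as mpf
from matplotlib.dates import date2num

# Tick data indexed by timestamp; only the 'last' (last price) column is used below.
tick_df = pd.read_hdf('rb_tick.h5')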


class mBar(object):
    def __init__(self):
        """Hold the OHLC prices and timestamp of a single one-minute bar."""
        self.open = None
        self.close = None
        self.high = None
        self.low = None
        self.datetime = None

bar = None
m_bar_list = list()

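for datetime, last in tick_df[['last']].iterrows():
    new_minute_flag = False

    if bar is None:  # first pass through the loop
        bar = mBar()
        new_minute_flag = True
    elif bar.datetime.replace(second=0, microsecond=0) != datetime.replace(second=0, microsecond=0):
        # Compare the timestamps truncated to the minute, so that ticks which are
        # hours apart but share the same minute number do not end up in one bar.
        bar.datetime = bar.datetime.replace(second=0, microsecond=0)  # set seconds and microseconds to 0
        m_bar_list.append(bar)
        # start a new one-minute bar
        bar = mBar()
        new_minute_flag = True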


    if new_minute_flag:
        bar.open, bar.high, bar.low = last['last'], last['last'], last['last']
    else:
        bar.high, bar.low = max(bar.high, last['last']), min(bar.low, last['last'])

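    bar.close = last['last']
    bar.datetime = datetime

# A bar is only appended when the next minute begins, so flush the final bar here.
if bar is not None:
    bar.datetime = bar.datetime.replace(second=0, microsecond=0)
    m_bar_list.append(bar)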

# One row per bar, with columns in the order candlestick_ohlc expects.
pk_df = pd.DataFrame(
    [(b.datetime, b.open, b.high, b.low, b.close) for b in m_bar_list],
    columns=['datetime', 'open', 'high', 'low', 'close'])

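# date2num returns days; multiply by 1440 to get minutes so adjacent one-minute candles do not overlap
pk_df['datetime'] = pk_df['datetime'].apply(lambda x: date2num(x) * 1440)
fig, ax = plt.subplots(facecolor=(0, 0.3, 0.5), figsize=(12, 8))

# Rising bars are drawn in red, falling bars in green; candle width is 0.7.
# .values replaces DataFrame.as_matrix(), which newer pandas versions no longer provide.
mpf.candlestick_ohlc(ax, pk_df.iloc[:100].values, width=0.7, colorup='r', colordown='green')
plt.grid(True)
plt.show()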

Let's compare our bars with the bars generated by Wind.

From Wind:

      I checked, and they are exactly the same. One thing to note here, though, is the timestamp of the minute bar. In the program above, a bar's timestamp is the start time of its minute, i.e. the 14:31 bar covers 14:31:00 to 14:31:59. Some software, however, uses the end time as the bar's timestamp.
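
When comparing bars from two sources that use different conventions, the timestamps need to be shifted by one bar length before they line up. A minimal sketch, assuming one-minute bars; the helper names are hypothetical and not from any particular library:

```python
from datetime import datetime, timedelta

ONE_MINUTE = timedelta(minutes=1)


def to_end_stamp(start_stamp):
    """Relabel a bar stamped with its start time using its end time."""
    return start_stamp + ONE_MINUTE


def to_start_stamp(end_stamp):
    """Relabel a bar stamped with its end time using its start time."""
    return end_stamp - ONE_MINUTE


# The bar covering 14:31:00 to 14:31:59 is labeled 14:31 by the program above
# and 14:32 by software that uses the end-time convention.
bar_start = datetime(2019, 3, 15, 14, 31)
print(to_end_stamp(bar_start))  # 2019-03-15 14:32:00
```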

      In theory, with this problem solved, we can focus on how to obtain higher-quality tick data. In practice, our tick data arrives in real time, so its quality is usually determined by two factors. One is how quickly our callback processes each tick: if the handler is slow, there will obviously be serious problems. The other is the real-time load on the CTP front-end server: if the server is under pressure, data is easily lost.
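
One common way to keep the callback fast is to do nothing in it except hand the tick to a queue, and to let a separate thread do the slow work. The sketch below only illustrates that pattern with the standard library. `on_tick` is a hypothetical callback name, not the actual CTP or vnpy API, and the "processing" step is a stand-in.

```python
import queue
import threading

tick_queue = queue.Queue()
processed = []


def on_tick(tick):
    """Hypothetical market data callback: enqueue only, no slow work here."""
    tick_queue.put(tick)


def worker():
    """Consume ticks on a separate thread so the callback returns immediately."""
    while True:
        tick = tick_queue.get()
        if tick is None:  # sentinel value: stop the worker
            break
        processed.append(tick)  # stand-in for bar building or storage


thread = threading.Thread(target=worker)
thread.start()
for price in (3763.0, 3764.0, 3763.0):
    on_tick({"last": price})
tick_queue.put(None)  # ask the worker to stop
thread.join()
print(len(processed))  # 3
```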

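I then spent a few days receiving data through the CTP interface wrapped by vnpy. On the whole, perhaps because the network is good and I am located in Lujiazui, Shanghai's financial center, there was no packet loss. I checked, and there were consistently two slices per second.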

2019-03-15 14:59:49.500000,3763.0,3709870,3764.0,0.0,0.0,0.0,0.0,40,0,0,0,0,3763.0,0.0,0.0,0.0,0.0,371,0,0,0,0
2019-03-15 14:59:50.500000,3764.0,3710314,3764.0,0.0,0.0,0.0,0.0,58,0,0,0,0,3763.0,0.0,0.0,0.0,0.0,284,0,0,0,0
2019-03-15 14:59:51.500000,3763.0,3711114,3765.0,0.0,0.0,0.0,0.0,738,0,0,0,0,3764.0,0.0,0.0,0.0,0.0,3,0,0,0,0
2019-03-15 14:59:52,3764.0,3711880,3763.0,0.0,0.0,0.0,0.0,1,0,0,0,0,3762.0,0.0,0.0,0.0,0.0,50,0,0,0,0
2019-03-15 14:59:52.500000,3764.0,3712004,3764.0,0.0,0.0,0.0,0.0,12,0,0,0,0,3763.0,0.0,0.0,0.0,0.0,2,0,0,0,0
2019-03-15 14:59:53.500000,3762.0,3712846,3763.0,0.0,0.0,0.0,0.0,20,0,0,0,0,3762.0,0.0,0.0,0.0,0.0,6,0,0,0,0
2019-03-15 14:59:54.500000,3764.0,3713058,3765.0,0.0,0.0,0.0,0.0,739,0,0,0,0,3764.0,0.0,0.0,0.0,0.0,55,0,0,0,0
2019-03-15 14:59:55.500000,3761.0,3713670,3762.0,0.0,0.0,0.0,0.0,2,0,0,0,0,3761.0,0.0,0.0,0.0,0.0,356,0,0,0,0
2019-03-15 14:59:56.500000,3763.0,3713982,3763.0,0.0,0.0,0.0,0.0,55,0,0,0,0,3762.0,0.0,0.0,0.0,0.0,94,0,0,0,0
2019-03-15 14:59:57.500000,3761.0,3714472,3762.0,0.0,0.0,0.0,0.0,4,0,0,0,0,3761.0,0.0,0.0,0.0,0.0,327,0,0,0,0
2019-03-15 14:59:58.500000,3763.0,3714668,3763.0,0.0,0.0,0.0,0.0,6,0,0,0,0,3762.0,0.0,0.0,0.0,0.0,5,0,0,0,0
2019-03-15 14:59:59,3762.0,3714990,3763.0,0.0,0.0,0.0,0.0,1,0,0,0,0,3762.0,0.0,0.0,0.0,0.0,1,0,0,0,0
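
A check of the spacing between slices can be sketched by comparing consecutive timestamps against the expected 500 ms interval. This is only an illustration: the function name is hypothetical and the timestamps are invented, not taken from the recorded data. A flagged pair is a candidate for inspection, not proof that a slice was lost.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(milliseconds=500)):
    """Return (previous, current) pairs whose spacing exceeds the expected interval."""
    gaps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        if current - previous > expected:
            gaps.append((previous, current))
    return gaps


# Invented timestamps for illustration: the 14:00:01.000 slice is missing.
stamps = [
    datetime(2019, 3, 15, 14, 0, 0),
    datetime(2019, 3, 15, 14, 0, 0, 500000),
    datetime(2019, 3, 15, 14, 0, 1, 500000),
]
print(len(find_gaps(stamps)))  # 1
```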

However, some rather strange things also showed up along the way, for example:

2019-03-15 14:59:59,3762.0,3714990,3763.0,0.0,0.0,0.0,0.0,1,0,0,0,0,3762.0,0.0,0.0,0.0,0.0,1,0,0,0,0
2019-03-15 15:00:00.500000,3763.0,3715090,3764.0,0.0,0.0,0.0,0.0,140,0,0,0,0,3763.0,0.0,0.0,0.0,0.0,12,0,0,0,0
2019-03-15 15:00:01.500000,3763.0,3715090,3764.0,0.0,0.0,0.0,0.0,140,0,0,0,0,3763.0,0.0,0.0,0.0,0.0,12,0,0,0,0
2019-03-15 15:18:12,3763.0,3715090,3764.0,0.0,0.0,0.0,0.0,140,0,0,0,0,3763.0,0.0,0.0,0.0,0.0,12,0,0,0,0
2019-03-15 18:11:30.500000,3763.0,0,1.7976931348623157e+308,0.0,0.0,0.0,0.0,0,0,0,0,0,1.7976931348623157e+308,0.0,0.0,0.0,0.0,0,0,0,0,0
2019-03-15 19:48:09,3763.0,0,1.7976931348623157e+308,0.0,0.0,0.0,0.0,0,0,0,0,0,1.7976931348623157e+308,0.0,0.0,0.0,0.0,0,0,0,0,0
2019-03-15 20:59:01,3758.0,4626,3759.0,0.0,0.0,0.0,0.0,129,0,0,0,0,3758.0,0.0,0.0,0.0,0.0,32,0,0,0,0
2019-03-15 21:00:02,3760.0,8248,3760.0,0.0,0.0,0.0,0.0,255,0,0,0,0,3759.0,0.0,0.0,0.0,0.0,601,0,0,0,0

Note that tick data also appeared outside trading hours, at 15:18, 18:11 and so on, and there is even a record at 20:59:01.

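These records are odd and all have to be filtered out in live trading. Usually we can set a start and end time for accepting CTP data, but a record like the one at 20:59 is actually hard to separate out, so we probably also need to check whether the first tick meets the time requirement.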
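
A filter along these lines can be sketched as follows. The session windows are assumptions for an instrument with a night session and should be checked against the exchange calendar. The off-hours records in the sample above carry 1.7976931348623157e+308 in some price fields, which equals the largest finite double (`sys.float_info.max`); treating it as a "no valid price" placeholder is also an assumption. All function names are hypothetical.

```python
import sys
from datetime import datetime, time

# Assumed trading sessions (illustrative values, not taken from the source).
SESSIONS = [
    (time(9, 0), time(10, 15)),
    (time(10, 30), time(11, 30)),
    (time(13, 30), time(15, 0)),
    (time(21, 0), time(23, 0)),
]


def in_session(ts):
    """Return True if the timestamp falls inside one of the assumed sessions."""
    t = ts.time()
    return any(start <= t < end for start, end in SESSIONS)


def is_valid_price(price):
    """Reject non-positive prices and the float-max placeholder."""
    return 0 < price < sys.float_info.max


def keep_tick(ts, price):
    """Keep a tick only if it is inside a session and carries a usable price."""
    return in_session(ts) and is_valid_price(price)


print(keep_tick(datetime(2019, 3, 15, 14, 59, 59), 3762.0))  # True
print(keep_tick(datetime(2019, 3, 15, 20, 59, 1), 3758.0))   # False
```

With a strict window like this, the 20:59:01 record is dropped simply because it falls before 21:00, so no special handling of the first tick is needed.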

 

 


Origin blog.csdn.net/qtlyx/article/details/88559876