[Repost] Stock time series data processing and moving averages

1 What is a time series

A time series is a sequence of data points arranged in time order. The interval between points is usually constant (for example 1 second, 5 minutes, 12 hours, 7 days, or 1 year), so a time series can be analyzed and processed as discrete-time data.

For example, a line chart in a monitoring system shows how the number of requests and the response time change over time.

2 Pandas time type

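  • pd.to_datetime(): converts data to the pandas time type, e.g. Timestamp('2018-03-02 00:00:00')
import pandas as pd
# pd.to_datetime converts time data into the pandas time type (Timestamp)
# 1. Pass in a time string; several formats are accepted, e.g. "2018-01-01", "01/02/2018"
pd.to_datetime("01/02/2017")  # Timestamp('2017-01-02 00:00:00'): read as month/day/year by default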

What happens if we pass in multiple points in time?

3 Pandas time series types

  • 1 Converting to a time series type
from datetime import datetime
import numpy as np
import pandas as pd

# Pass in a list of time strings
date1 = ["2017-01-01", "2017-02-01", "2017-03-01"]
pd.to_datetime(date1)

# Or pass in a list of datetime objects
date2 = [datetime(2018, 3, 1), datetime(2018, 3, 2), datetime(2018, 3, 3), datetime(2018, 3, 4), datetime(2018, 3, 5)]
date2 = pd.to_datetime(date2)

# If the list contains a missing value
date3 = [datetime(2018, 3, 1), datetime(2018, 3, 2), np.nan, datetime(2018, 3, 4), datetime(2018, 3, 5)]
date3 = pd.to_datetime(date3)
# the missing value becomes NaT (Not a Time)
DatetimeIndex(['2018-03-01', '2018-03-02', 'NaT', '2018-03-04', '2018-03-05'], dtype='datetime64[ns]', freq=None)

2 Pandas time series type: DatetimeIndex

# Converting a list of dates returns a DatetimeIndex
pd.to_datetime(date2)
DatetimeIndex(['2018-03-01', '2018-03-02', '2018-03-03', '2018-03-04',
               '2018-03-05'],
              dtype='datetime64[ns]', freq=None)

pd.to_datetime(date2).values
 
array(['2018-03-01T00:00:00.000000000', '2018-03-02T00:00:00.000000000',
       '2018-03-03T00:00:00.000000000', '2018-03-04T00:00:00.000000000',
       '2018-03-05T00:00:00.000000000'], dtype='datetime64[ns]')

We can also convert directly with the DatetimeIndex constructor:

  • 3 Conversion through pd.DatetimeIndex
pd.DatetimeIndex(date1)

Once the data is a time series type, we can use it as an index to organize and select data.

4 The basic pandas time series structure

# The most basic pandas time series structure: a Series indexed by time
>>>series_date = pd.Series(3.0, index=pd.to_datetime(date1))
Returns:
2017-01-01    3.0
2017-02-01    3.0
2017-03-01    3.0
dtype: float64

# Note: the two calls below convert the Series *values* (3.0), not its index,
# so 3.0 is read as 3 nanoseconds after 1970-01-01
>>>pd.to_datetime(series_date)
2017-01-01   1970-01-01 00:00:00.000000003
2017-02-01   1970-01-01 00:00:00.000000003
2017-03-01   1970-01-01 00:00:00.000000003
dtype: datetime64[ns]

>>>pd.DatetimeIndex(series_date)
DatetimeIndex(['1970-01-01 00:00:00.000000003',
               '1970-01-01 00:00:00.000000003',
               '1970-01-01 00:00:00.000000003'],
              dtype='datetime64[ns]', freq=None)

For a Series to behave as a pandas time series, its index must be a DatetimeIndex.

  • Attributes of DatetimeIndex
    • year, month, weekday, day, hour, ...
time = pd.to_datetime(date1)  # any DatetimeIndex works here
time.year
time.month
time.weekday
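Because the index is a DatetimeIndex, its attributes and date strings can be used directly to select data. A minimal sketch; the series below and its values are made up for illustration:

```python
import pandas as pd

# Hypothetical daily series covering 2018-03-01 to 2018-03-14
idx = pd.date_range("2018-03-01", periods=14, freq="D")
s = pd.Series(range(14), index=idx)

# DatetimeIndex attributes return one value per timestamp
print(s.index.year[:3])
print(s.index.weekday[:3])   # Monday=0 ... Sunday=6

# Keep only weekdays by filtering on the weekday attribute
weekdays_only = s[s.index.weekday < 5]

# Partial string indexing: select a date range by label (both ends included)
first_week = s["2018-03-01":"2018-03-07"]

print(len(weekdays_only), len(first_week))
```

The boolean filter works on any attribute (month, day, hour, ...), while label slicing with date strings is usually the simplest way to cut out a period.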

5 Generating a time series with a specified frequency in pandas

pandas.date_range(start=None, end=None, periods=None, freq='D', tz=None, normalize=False, name=None, inclusive='both', **kwargs)

(pandas versions before 1.4 used closed=None instead of inclusive.)

Returns a fixed-frequency DatetimeIndex, with calendar day as the default frequency.

start: start time

end: end time

periods: number of time points to generate

freq: frequency string, e.g. D (calendar day), h (hour; H before pandas 2.2), QE (quarter end; Q before pandas 2.2)

tz: time zone

import pandas as pd

# Generate specified time series
# 1. From 2017-01-02 to 2017-12-30, one point per day, weekends included
pd.date_range("2017-01-02", "2017-12-30", freq="D")

# 2. Same range, one point per business day (weekends skipped); suitable for daily financial data
pd.date_range("2017-01-02", "2017-12-30", freq="B")

# 3. Only the start date and the number of points are known: 504 business days from "2016-01-01"
pd.date_range("2016-01-01", periods=504, freq="B")

# 4. Hourly time series ('h'; pandas versions before 2.2 write 'H')
pd.date_range("2017-01-02", "2017-12-30", freq='h')

# 5. Every 3 hours
pd.date_range("2017-01-02", "2017-12-30", freq='3h')

# 6. Every 1 hour 30 minutes
pd.date_range("2017-01-02", "2017-12-30", freq='1h30min')

# 7. Last business day of each month ('BME'; pandas versions before 2.2 write 'BM')
pd.date_range("2017-01-02", "2017-12-30", freq='BME')

# 8. A given weekday of a given week in each month (here: the third Friday)
pd.date_range("2017-01-02", "2017-12-30", freq='WOM-3FRI')
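To see what "B" changes compared with "D", a quick check counts the two ranges from the first two calls above and confirms that no business-day timestamp falls on a weekend. Note that "B" only skips Saturdays and Sundays; it knows nothing about exchange holidays.

```python
import pandas as pd

# 2017-01-02 is a Monday and 2017-12-30 is a Saturday
daily = pd.date_range("2017-01-02", "2017-12-30", freq="D")
business = pd.date_range("2017-01-02", "2017-12-30", freq="B")

print(len(daily), len(business))
# Every timestamp produced with freq="B" falls on Monday (0) to Friday (4)
print(business.weekday.max())
```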

6 What is time series analysis

Time series data has its own analysis methods. Stock prices are themselves a time series, so we use stock data to practice time series analysis.

Time series analysis emphasizes continuous observation and computation over a given period of time, extracting relevant features, and analyzing how they change.

Time series analysis mainly includes deterministic change analysis.

Deterministic change analysis: moving average method, moving variance and standard deviation, moving correlation coefficient.

7 Moving average method

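1 Moving window

Moving windows are mainly used to transform time series arrays: a window of fixed length slides along the series, and a function is applied to the data inside each window. Functions of this kind are collectively referred to as moving window functions.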
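In newer pandas versions, moving window functions are exposed through the .rolling() method. A small sketch with made-up numbers applies several window functions over the same 3-period window; the first two results are NaN because the window is not yet full.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# A window of 3 slides along the series; each output uses the current value and the two before it
window_sum = s.rolling(window=3).sum()
window_max = s.rolling(window=3).max()
# Any function of the window can be plugged in with apply (raw=True passes a NumPy array)
window_range = s.rolling(window=3).apply(lambda w: w.max() - w.min(), raw=True)

print(pd.DataFrame({"sum": window_sum, "max": window_max, "range": window_range}))
```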

2 Moving average

There are many ways to summarize the data in a window; the most commonly used is the moving average.

  • Moving Average (MA) divides the sum of the closing prices over a certain period by the number of periods.

3 Classification of moving averages

By calculation period, moving averages are divided into short-term (5 days), medium-term (20 days), and long-term (60 days, 120 days). These boundaries are conventions, not fixed limits.

By algorithm, moving averages are divided into simple (arithmetic), weighted, and exponential moving averages.

Note: these methods weight the prices inside the window differently, so their results differ.

 

1) Simple moving average

 

Simple Moving Average (SMA), also known as "arithmetic moving average", refers to the averaging of closing prices during a specific period.

 

For example, the 5-day simple moving average is SMA = (C1 + C2 + C3 + C4 + C5) / 5, where C1 to C5 are the closing prices of the last five days.
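The formula can be checked against pandas with a handful of made-up closing prices; the sketch computes the 5-day SMA with .rolling() and by hand for the fifth day.

```python
import pandas as pd

# Hypothetical closing prices for six consecutive trading days
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# 5-day SMA via a rolling window
sma5 = close.rolling(window=5).mean()

# The value for the fifth day, computed by hand: (C1 + C2 + C3 + C4 + C5) / 5
manual = (10.0 + 11.0 + 12.0 + 13.0 + 14.0) / 5

print(sma5.tolist())   # the first four entries are NaN: the window is not full yet
print(manual)          # 12.0
```

Each new day drops the oldest price and adds the newest, so the sixth value is (11 + 12 + 13 + 14 + 15) / 5 = 13.0.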

Case: Calculating the moving average of stock data

Get the stock data and draw a candlestick chart:

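import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import date2num
# candlestick_ochl comes from the mpl_finance package (formerly matplotlib.finance, now deprecated)
from mpl_finance import candlestick_ochl

# Load the daily (K-line) stock data; assumes the first CSV column holds the trading dates
stock_day = pd.read_csv("./stock_day/stock_day.csv", index_col=0)
stock_day.index = pd.to_datetime(stock_day.index)
stock_day = stock_day.sort_index()   # oldest date first
# candlestick_ochl expects matplotlib date numbers in the first column
stock_day['date'] = date2num(stock_day.index)
arr = stock_day[['date', 'open', 'close', 'high', 'low']]
values = arr.values[:200]
# Draw the candlestick chart (red for rising days, green for falling days)
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(20, 8), dpi=80)
candlestick_ochl(axes, values, width=0.2, colorup='r', colordown='g')
axes.xaxis_date()
plt.show()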

2) Calculate the moving average

 

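pandas.rolling_mean(arg, window, min_periods=None, freq=None, center=False, how=None, **kwargs) was the old moving-mean function; it has been removed from newer pandas versions. The current equivalent is:

Series.rolling(window, min_periods=None, center=False, ...).mean()

Parameters:

The Series or DataFrame that .rolling() is called on plays the role of the old arg parameter.

window: calculation period (number of observations in each window)

min_periods: minimum number of observations needed to produce a value (defaults to the window size)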

# Average the daily closing prices directly: simple moving average (SMA)
# Add short-, medium-, and long-term moving averages
close_200 = stock_day["close"][:200]
for window in [5, 10, 20, 30, 60, 120]:
    close_200.rolling(window=window).mean().plot(label=f"SMA{window}")
plt.legend()
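Each moving average line starts with window - 1 missing values, which is why the 120-day line begins later than the 5-day line on the plot. The min_periods parameter controls this; a sketch with made-up prices:

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# Default: min_periods equals the window size, so the first 4 values are NaN
strict = close.rolling(window=5).mean()
# min_periods=1: average whatever is available until the window fills up
lenient = close.rolling(window=5, min_periods=1).mean()

print(pd.DataFrame({"strict": strict, "lenient": lenient}))
```

Keeping the default is usually safer for indicators, because the early "lenient" values average fewer days than the label suggests.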

3) Weighted Moving Average (WMA)

The weighted moving average (WMA) averages prices over a certain period, with weights set according to the length of the moving average: the more recent the closing price, the larger its weight, on the premise that recent prices have a greater impact on market conditions.

Because the WMA gives more weight to recent prices, it reflects a market reversal sooner than the other averages. Still, we will not use weighting lightly, because the weight it places on recent prices is too large.
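pandas does not offer a ready-made method for a linearly weighted moving average, so the sketch below builds one with .rolling().apply(). It assumes the common linear weighting 1, 2, ..., N with the newest price weighted N; the helper name weighted_moving_average and this weighting scheme are illustrative assumptions, not the source's own method. The prices are made up.

```python
import numpy as np
import pandas as pd


def weighted_moving_average(prices: pd.Series, window: int) -> pd.Series:
    """Hypothetical helper: linear weights 1, 2, ..., window, newest price weighted most."""
    weights = np.arange(1, window + 1, dtype=float)
    # raw=True passes each window as a NumPy array ordered oldest -> newest
    return prices.rolling(window=window).apply(
        lambda w: np.dot(w, weights) / weights.sum(), raw=True
    )


close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
wma3 = weighted_moving_average(close, 3)
sma3 = close.rolling(window=3).mean()
print(pd.DataFrame({"close": close, "wma3": wma3, "sma3": sma3}))
```

In this rising series the WMA sits closer to the latest price than the SMA (for the third day: 68 / 6 versus 11), which is the "reacts sooner" behavior described above.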

 

4) Exponential smoothed moving average (EWMA)

 

EWMA was developed to address a weakness of moving averages, which are regarded as lagging indicators: once the price moves away from the moving average, the average does not respond immediately. EWMA reduces this lag.

The pandas API:

Series.ewm(com=None, span=None, ...).mean()   (older pandas versions: pd.ewma(arg, com=None, span=None))

  • Exponentially weighted moving average
  • span: the span N, which sets the smoothing factor alpha = 2 / (N + 1)
# Plot exponentially weighted moving averages
stock_day['close'][:200].ewm(span=10).mean().plot()
stock_day['close'][:200].ewm(span=30).mean().plot()
stock_day['close'][:200].ewm(span=60).mean().plot()
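The link between span and the weighting can be made concrete. With adjust=False, pandas uses the recursive form y_0 = x_0, y_t = (1 - alpha) * y_(t-1) + alpha * x_t with alpha = 2 / (span + 1); the default adjust=True instead normalizes by the sum of the weights, which differs slightly at the start of the series. A sketch with made-up values:

```python
import pandas as pd

close = pd.Series([1.0, 2.0, 3.0])
span = 3
alpha = 2 / (span + 1)   # span=3 -> alpha=0.5

# adjust=False: recursive exponential smoothing
ewm_recursive = close.ewm(span=span, adjust=False).mean()

# The same recursion written out by hand
values = close.tolist()
manual = [values[0]]
for x in values[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)

print(ewm_recursive.tolist())
print(manual)                  # [1.0, 1.5, 2.25]
```

A larger span gives a smaller alpha, so older prices fade more slowly and the line is smoother.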

8 Moving variance and standard deviation

  • Variance and standard deviation: reflect how stable the series is within a given period

# Closing-price variance and standard deviation over a 10-day window
stock_day['close'][:200].rolling(window=10).var().plot()
stock_day['close'][:200].rolling(window=10).std().plot()
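As a quick sanity check on what these two lines plot, the moving standard deviation is the square root of the moving variance. Both use the sample formula (ddof=1) by default. The values below are made up:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

rolling_var = s.rolling(window=2).var()   # sample variance, ddof=1 by default
rolling_std = s.rolling(window=2).std()

# The moving standard deviation equals the square root of the moving variance
print(np.allclose(rolling_std.dropna(), np.sqrt(rolling_var.dropna())))
print(rolling_var)
```

Because the standard deviation is in the same units as the price, it is usually the easier of the two to read on a chart.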

9 Pairwise scatter plots of several indicators

  • pd.plotting.scatter_matrix(frame, figsize=None)   (formerly pd.scatter_matrix)
    • frame: DataFrame

 

# Columns as named in this stock_day dataset
frame = stock_day[['open', 'volume', 'ma20', 'p_change', 'turnover']]
pd.plotting.scatter_matrix(frame, figsize=(20, 8))

From the plot we can see a clear linear relationship between volume and turnover, because turnover is defined as volume divided by the total number of shares issued.

 

Strongly correlated indicators can be found with plots like this or with correlation analysis, which will be covered in detail in the machine learning and quantitative trading material.

 

Correlation coefficient: this will be introduced later; for now we only need to know that it measures the relationship between two series.
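The volume/turnover observation can already be checked numerically. Since turnover is volume divided by a share count, the two series are proportional and their correlation is 1; the same holds for the moving correlation coefficient listed under deterministic change analysis. The volumes and the share count below are assumptions for illustration:

```python
import pandas as pd

# Hypothetical volumes and an assumed share count; turnover follows the definition above
volume = pd.Series([120.0, 150.0, 90.0, 200.0, 170.0, 130.0])
shares_issued = 10_000.0
turnover = volume / shares_issued

# Correlation coefficient over the whole series (Pearson by default)
print(volume.corr(turnover))

# Moving correlation coefficient over a 3-period window
rolling_corr = volume.rolling(window=3).corr(turnover)
print(rolling_corr)
```

On real data the values will drift below 1 if the share count changes over time or the turnover column is rounded.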

 

10 Case: Local storage of moving average data

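# Compute simple and exponential moving averages of the close and store them as new columns
ma_list = [5, 20, 60]
for ma in ma_list:
    stock_day['MA' + str(ma)] = stock_day['close'].rolling(window=ma).mean()
for ma in ma_list:
    stock_day['EMA' + str(ma)] = stock_day['close'].ewm(span=ma).mean()

# Save the DataFrame (including the new columns) locally
stock_day.to_csv("EWMA.csv")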
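When the saved file is read back, the dates come back as plain strings unless we ask pandas to parse them. A minimal round-trip sketch, using a small made-up table and an in-memory buffer in place of "EWMA.csv":

```python
import io

import pandas as pd

# Hypothetical small price table with a DatetimeIndex
df = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0, 13.0, 14.0]},
    index=pd.date_range("2018-03-01", periods=5, freq="B", name="date"),
)
df["MA3"] = df["close"].rolling(window=3).mean()
df["EMA3"] = df["close"].ewm(span=3).mean()

# Write to an in-memory buffer (a file path such as "EWMA.csv" works the same way)
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the date column as the index; parse_dates=True turns it back into a DatetimeIndex
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)
print(type(restored.index).__name__)
print(restored.head())
```

The NaN values at the start of the MA column are written as empty fields and read back as NaN.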

11 The role of moving averages

Moving averages are a basic building block of technical analysis, and many technical indicator strategies are derived from them. A simple strategy based on moving averages will be introduced later.

 


Origin blog.csdn.net/weixin_52071682/article/details/115217420