Three-minute introduction to quantitative trading (4): Statistical analysis of market data

Hello, I'm Maodou. This series uses the leanest code and examples to get you started with quantitative trading quickly, with practical content only. If you want to learn quantitative trading but don't know where to start, read on!

Previous posts:

Three-minute introduction to quantitative trading (1): Acquisition of market data & drawing candlestick charts

Three-minute introduction to quantitative trading (2): Introduction to

Three-minute introduction to quantitative trading (3): Calculate the rate of return

This installment covers simple statistical analysis once the data has been obtained. Maodou updates this series every weekend, so bookmark it for easy reference.

1. Counting up and down days

Let's take Cambridge Technology as an example and count its up and down days this year.

First import the relevant packages and authenticate the Tushare Pro account.

import tushare as ts
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

pro = ts.pro_api('your token')

Fetch Cambridge Technology's daily market data for the year to date. If you are unsure how to call the data interface, see my earlier post.

df = pro.daily(ts_code='603083.SH', start_date='20230101', end_date='20230512')
df.head()

It returns the following:

1. Count the number of up and down days this year

print('Total days: {}'.format(len(df)))
print('Up days: {}'.format(len(df[df['close']>df['pre_close']])))
print('Down days: {}'.format(len(df[df['close']<df['pre_close']])))

It prints as follows:
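As a cross-check, here is a minimal sketch of the same count on a small made-up DataFrame, so it runs without a Tushare token. The `toy` frame and its values are hypothetical. It also counts flat days, which explains why up days plus down days may not add up to the total.

```python
import pandas as pd

# Made-up data standing in for the Tushare result (not real quotes)
toy = pd.DataFrame({
    'trade_date': ['20230106', '20230105', '20230104', '20230103'],
    'close':      [10.5, 10.0, 10.0, 9.8],
    'pre_close':  [10.0, 10.0, 9.8, 10.1],
})

# Summing a boolean Series counts the rows where the condition is True
up_days = int((toy['close'] > toy['pre_close']).sum())
down_days = int((toy['close'] < toy['pre_close']).sum())
flat_days = int((toy['close'] == toy['pre_close']).sum())

print('Total days: {}'.format(len(toy)))
print('Up: {}, down: {}, flat: {}'.format(up_days, down_days, flat_days))
```

Summing the boolean mask avoids building a filtered copy of the DataFrame just to take its length.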

2. Count the bullish (close above open) and bearish (close below open) candle days this year

print('Total days: {}'.format(len(df)))
print('Bullish candle days: {}'.format(len(df[df['close']>df['open']])))
print('Bearish candle days: {}'.format(len(df[df['close']<df['open']])))

It prints as follows:

3. Count the days, and list the dates, on which the stock rose by 5% or more this year

df1=df[((df['close']-df['pre_close'])/df['pre_close'])>=0.05]
print(len(df1))
print(list(df1.trade_date))

It prints as follows:
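The same filter can be sketched by storing the daily return in its own column first, which makes the threshold easier to change and inspect. The `toy` data and the `ret` column name below are made up for illustration.

```python
import pandas as pd

# Made-up data standing in for the Tushare result (not real quotes)
toy = pd.DataFrame({
    'trade_date': ['20230106', '20230105', '20230104', '20230103'],
    'close':      [11.0, 10.2, 10.0, 9.4],
    'pre_close':  [10.2, 10.0, 9.4, 9.5],
})

# Daily return relative to the previous close, as a fraction (0.05 means 5%)
toy['ret'] = toy['close'] / toy['pre_close'] - 1

threshold = 0.05
big_up = toy[toy['ret'] >= threshold]

print(len(big_up))
print(list(big_up['trade_date']))
```

Keeping `ret` as a column also lets you reuse it for other thresholds, for example days that fell by 5% or more.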

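2. Gain statistics over a period

1. Cumulative gain over the period

# Assumes df is ordered newest first, as pro.daily appears to return it,
# so iloc[0] is the latest trading day and iloc[-1] the earliest
rate1=(df.iloc[0].close-df.iloc[-1].close)/df.iloc[-1].close
print('Cumulative gain over the period: {}'.format(rate1))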
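Because this calculation depends on row order, a slightly more defensive sketch sorts by date first and then compares the earliest and latest closes. The `toy` frame is made up for illustration.

```python
import pandas as pd

# Made-up data, listed newest first
toy = pd.DataFrame({
    'trade_date': ['20230105', '20230104', '20230103'],
    'close':      [12.0, 11.0, 10.0],
})

# YYYYMMDD strings sort chronologically, so this puts the oldest row first
ordered = toy.sort_values('trade_date')
first_close = ordered['close'].iloc[0]
last_close = ordered['close'].iloc[-1]

cum_return = (last_close - first_close) / first_close
print('Cumulative gain over the period: {:.2%}'.format(cum_return))
```

With the explicit sort, the result stays the same whichever order the rows arrive in.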

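2. Maximum gain over the period

# Only meaningful for stocks that rose over the period (the low must come before the high)
rate2=(max(df.high)-min(df.low))/min(df.low)
print('Maximum gain over the period: {}'.format(rate2))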
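The formula above pairs the highest high with the lowest low regardless of which came first. For a stock that peaked early and then fell, one possible alternative is a running-minimum sketch that only measures gains from a low to a later high. The `toy` data and variable names are hypothetical.

```python
import pandas as pd

# Made-up data: the stock peaks on the first day, then drops and partly recovers
toy = pd.DataFrame({
    'trade_date': ['20230106', '20230105', '20230104', '20230103'],
    'high':       [10.5, 9.5, 10.0, 12.0],
    'low':        [9.8, 8.0, 9.0, 11.0],
})

ordered = toy.sort_values('trade_date').reset_index(drop=True)

# Lowest low seen on any earlier day; shift(1) excludes the same day,
# because daily bars do not say whether the high or the low came first
prior_low = ordered['low'].cummin().shift(1)
run_up = (ordered['high'] - prior_low) / prior_low
max_run_up = run_up.max()

# The simple formula, for comparison
naive = (toy['high'].max() - toy['low'].min()) / toy['low'].min()

print('Max run-up (low before high): {:.4f}'.format(max_run_up))
print('Simple max/min formula: {:.4f}'.format(naive))
```

On this toy data the simple formula reports a larger figure because the high occurred before the low, which is the case the "only for rising stocks" comment warns about.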

3. Cumulative gain in a single month

Taking April as an example, first filter out the data in April:

df2=df[df.trade_date.str.contains('202304')]
df2.head()

It returns the following:

Then apply the same method as above:

rate3=(df2.iloc[0].close-df2.iloc[-1].close)/df2.iloc[-1].close
print('Cumulative gain in April: {}'.format(rate3))

4. Maximum gain in a single month

# Only meaningful if the stock rose during the month (the low must come before the high)
rate4=(max(df2.high)-min(df2.low))/min(df2.low)
print('Maximum gain in April: {}'.format(rate4))
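To get the cumulative gain for every month at once instead of filtering one month at a time, one option is to group by the first six characters of `trade_date`. This is only a sketch on made-up data, and the `monthly` frame and its column names are hypothetical.

```python
import pandas as pd

# Made-up closes spanning two months
toy = pd.DataFrame({
    'trade_date': ['20230403', '20230404', '20230428', '20230504', '20230505'],
    'close':      [10.0, 10.5, 11.0, 11.0, 12.1],
})

ordered = toy.sort_values('trade_date')      # oldest first
month = ordered['trade_date'].str[:6]        # 'YYYYMM' key for each row

# First and last close within each month, then the gain between them
monthly = ordered.groupby(month)['close'].agg(['first', 'last'])
monthly['gain'] = (monthly['last'] - monthly['first']) / monthly['first']

print(monthly)
```

As in the single-month code above, the gain is measured from the first close of the month, so the move on the first trading day itself is not included.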

3. Estimating the probability of a rise

1. Probability of an up day

p=len(df[df['change']>0])/len(df)

2. Estimate the probability that exactly 3 of the next 5 days are up days

In n independent Bernoulli trials, let p be the probability that event A (a rise) occurs in each trial, and let X be the number of times A occurs across the n trials. Then the random variable X follows a binomial distribution, X ~ B(n, p). Note that this treats each day's move as independent with a constant probability p, which is a simplifying assumption.

# The binom object in scipy.stats represents the binomial distribution
# pmf(3, 5, p) is the probability of exactly 3 up days out of 5
prob=stats.binom.pmf(3,5,p)
print(prob)

It prints as follows:
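To see what the binomial probability computes, the mass function P(X = k) = C(n, k) * p^k * (1 - p)^(n - k) can be written out by hand with the standard library. This is a sketch only: the helper `binom_pmf` and the value p = 0.55 are made up for illustration, not taken from the data above.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.55  # hypothetical up-day probability, for illustration only

# Exactly 3 up days out of 5
exactly_3 = binom_pmf(3, 5, p)

# At least 3 up days out of 5: add the cases k = 3, 4, 5
at_least_3 = sum(binom_pmf(k, 5, p) for k in range(3, 6))

print('Exactly 3 of 5: {:.4f}'.format(exactly_3))
print('At least 3 of 5: {:.4f}'.format(at_least_3))
```

If the question is really "3 or more up days", the second figure is the one to use. It is always at least as large as the first.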

That's all for today. Maodou updates this series every weekend and will continue to share the live results of the Whirlwind Charge quantitative strategy every trading day. Likes and follows are welcome.

Backtest: Whirlwind Charge Strategy Description

Live trading: April strategy data disclosure & frequently asked questions


Origin blog.csdn.net/weixin_37475278/article/details/130824543