Do stock prices follow a random walk? Let Python tell you

1. What is a random walk

Phenomena in the world fall roughly into two categories: deterministic phenomena and random phenomena. A deterministic phenomenon, such as the sun rising in the east and setting in the west every day, always produces the same result under the same conditions, so its outcome can be predicted in advance. A random phenomenon is different: any single trial has an uncertain outcome. A coin toss, for example, may land heads or tails. Yet when the trial is repeated many times under the same conditions, a regularity emerges: as the number of tosses grows, the frequency of heads (or tails) gets closer and closer to 50%.
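The claim that the frequency of heads approaches 50% can be checked with a quick simulation. This is a minimal sketch using Python's built-in random module; the function name heads_frequency and the fixed seed are illustrative choices, not from the article.

```python
import random


def heads_frequency(n_tosses, seed=0):
    """Toss a fair coin n_tosses times and return the fraction of heads."""
    rng = random.Random(seed)  # seeded generator so each run is reproducible
    heads = sum(rng.randint(0, 1) for _ in range(n_tosses))  # 1 = heads, 0 = tails
    return heads / n_tosses


for n in (10, 100, 1_000, 100_000):
    print(f"{n:>7} tosses: heads frequency = {heads_frequency(n):.4f}")
```

Small runs can stray far from 0.5; as the number of tosses grows, the frequency settles close to 0.5, even though each individual toss remains unpredictable.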

Clearly, the rise and fall of stocks is a random phenomenon, since no one can know for certain where a stock will go tomorrow. To reason about uncertainty, we attach to each random outcome a number between 0 and 1 that measures how likely it is: its probability. Probability lets us make rational judgments about unknown results, and this is exactly the essence of quantitative trading: finding, from historical data, trading strategies with a high probability of profit.

The pattern behind stock market fluctuations has long been a notoriously hard problem. Many theories have been proposed, and the random walk theory is one of the most representative; it describes the probabilities of stock prices rising and falling. As early as 1900, Louis Bachelier (1870-1946), a doctoral student in Paris, studied the movements of the Paris stock market, hoping to describe price changes with mathematical tools. In his thesis "The Theory of Speculation" (Théorie de la spéculation), he argued that daily price changes are fundamentally unpredictable: like Brownian motion, they follow a random walk with no rule to exploit. If someone buys a stock and sells it immediately afterwards, the chances of gaining and of losing are equal.

The theory explains the random walk as follows. Information about a stock that flows into the market is public, and thousands of professionals analyze it in detail and trade the stock long and short accordingly. The current price therefore already reflects supply and demand as well as intrinsic value: it is a fair price established by professional analysis, and later market prices fluctuate around it. Those fluctuations are caused by new information, such as economic and political news, acquisitions and mergers, and interest rate hikes or cuts. Such news arrives unpredictably, prompting professionals to re-evaluate the stock and buy or sell, which moves the price again. In this view, stock prices have no memory: past, present and future price changes are unrelated, and it is futile to look for a rule in past price movements that predicts where the market will go next. The random walk theory is therefore a direct challenge to technical (chart) analysis. The theory is still being tested by time, but if it holds, chart-reading experts would have nothing left to offer.
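The "no memory" claim has a simple statistical counterpart: in a random walk, each day's change is independent of the previous one, so the correlation between consecutive changes should be close to zero. This is a minimal sketch on simulated +1/-1 steps (synthetic data, not real prices); the helper name lag1_autocorr is illustrative.

```python
import numpy as np


def lag1_autocorr(x):
    """Correlation between each value and the value that follows it."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]


rng = np.random.default_rng(0)
steps = rng.choice([-1, 1], size=100_000)  # simulated daily moves: up or down with equal probability
print("lag-1 autocorrelation of steps:", lag1_autocorr(steps))
```

Applied to the price levels (the running sum of the steps) instead of the changes, the same function returns a value near 1, because each price starts from the previous one; the "no memory" claim concerns the changes.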

Many experts and scholars have studied the random walk argument. Burton Malkiel's book "A Random Walk Down Wall Street" gives an example: the author had his students toss coins to build a hypothetical stock price chart. The stock started at $50, and each day's closing price was determined by the toss of a fair coin: heads meant the stock closed 0.5% higher than the previous day, tails meant it closed 0.5% lower. The resulting chart looked strikingly like a real stock chart, complete with a "head and shoulders" pattern and even apparent cycles.
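The coin-toss experiment is easy to reproduce. This is a minimal sketch following the rules as described above: start at $50, then move up 0.5% on heads and down 0.5% on tails each day. The function name, the number of days and the seed are illustrative assumptions.

```python
import random


def coin_toss_prices(days=250, start=50.0, move=0.005, seed=None):
    """Simulate closing prices driven only by fair coin tosses."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(days):
        heads = rng.random() < 0.5                 # fair coin
        factor = 1 + move if heads else 1 - move   # +0.5% on heads, -0.5% on tails
        prices.append(prices[-1] * factor)         # new close relative to the previous close
    return prices


prices = coin_toss_prices(seed=2024)
print(f"first close: {prices[0]:.2f}, last close: {prices[-1]:.2f}")
```

Plotting prices, for example with matplotlib.pyplot.plot(prices), gives a chart that can resemble a real stock chart even though nothing but chance drives it.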

So, are stock prices really unpredictable? Life is short, so let's use Python to explore.

2. Generating random numbers in Python

Both Python's built-in random module and the random module of the third-party library NumPy can generate the random numbers needed for a random walk, through random.randint() and numpy.random.randint() respectively. NumPy stores data mainly in ndarray, its N-dimensional array object. As the core of NumPy, ndarray supports vectorized arithmetic and is fast and memory-efficient when processing large multi-dimensional arrays.

Let's first look at ndarray's efficiency advantage by comparing how long numpy.random.randint() takes to generate an array of 1,000,000 random integers with how long random.randint() takes to build an equivalent Python list. The code is as follows:

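```python
import random
from timeit import timeit

import numpy as np


def list_test():
    # Build a Python list of 1,000,000 random 0/1 values, one call at a time
    walk = []
    for _ in range(1000000):
        walk.append(random.randint(0, 1))  # random.randint includes both ends: 0 or 1


def ndarray_test():
    # Generate the same number of 0/1 values in one vectorized call
    np.random.randint(0, 2, size=1000000)  # high is exclusive: 0 or 1


t1 = timeit(list_test, number=1)
t2 = timeit(ndarray_test, number=1)

print("list:{}".format(t1))     # author's run: list:1.3908312620000003
print("ndarray:{}".format(t2))  # author's run: ndarray:0.009495778999999871
```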
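The two calls in the benchmark use different upper bounds on purpose: random.randint(a, b) includes b, while numpy.random.randint(low, high) excludes high. A quick check, sketched below, that both produce only 0 and 1:

```python
import random

import numpy as np

py_values = {random.randint(0, 1) for _ in range(1000)}       # random.randint: upper bound included
np_values = set(np.random.randint(0, 2, size=1000).tolist())  # numpy.random.randint: upper bound excluded
print(py_values, np_values)  # {0, 1} {0, 1}

# With high=1 the only possible value is 0
print(np.random.randint(0, 1, size=5))  # [0 0 0 0 0]
```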

The efficiency advantage of NumPy's random module is obvious: in this run it was more than 100 times faster than Python's built-in random module. We therefore use numpy.random.randint() to generate random numbers here. Let's look at its signature and basic usage:

numpy.random.randint(low, high=None, size=None, dtype=int)

  • Returns random integers from the half-open interval [low, high): low is included, high is not

  • Parameters: low is the smallest value that can be drawn, high is the exclusive upper bound, size is the output shape (a single integer is returned when size is None), and dtype is the data type of the result, which defaults to int

  • When high is omitted, the numbers are drawn from [0, low)


```python
import numpy as np

# high omitted: integers in [0, 1), so every value is 0
print("np.random.randint:\n {}".format(np.random.randint(1, size=5)))
# np.random.randint:
#  [0 0 0 0 0]

# One random integer in [1, 5)
print("np.random.randint:\n {}".format(np.random.randint(1, 5)))
# example output (random):
# np.random.randint:
#  2

# A 2x2 array of integers in [-5, 5)
print("np.random.randint:\n {}".format(np.random.randint(-5, 5, size=(2, 2))))
# example output (random):
# np.random.randint:
#  [[-5 -3]
#  [ 2 -3]]
```


  • Note: the numbers generated by random modules are pseudo-random: a deterministic algorithm produces them from an initial value called the seed. This does not get in the way of our experiments.
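Because the numbers are pseudo-random, fixing the seed makes an experiment repeatable, which helps when comparing results. This is a minimal sketch with numpy.random.seed(), which controls the legacy functions such as numpy.random.randint() used in this article; the seed value 42 is arbitrary.

```python
import numpy as np

np.random.seed(42)                          # fix the seed of NumPy's legacy global generator
first = np.random.randint(0, 2, size=10)

np.random.seed(42)                          # reset to the same seed
second = np.random.randint(0, 2, size=10)

print(first)
print(second)
print("identical:", np.array_equal(first, second))  # identical: True
```

Newer NumPy code often creates an independent generator instead, with np.random.default_rng(seed), which avoids sharing global state.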

3. Finding the pattern in random walks

Some studies claim that daily stock price changes are as unpredictable as a drunkard's walk. Next, we use Python to simulate the drunkard's random walk.

Suppose a drunk man starts walking aimlessly from under a street light; at each step he may move forward, move backward or turn. After a certain time, where will he be? To keep things simple, we reduce his movement to one dimension: he can only step forward or backward along a straight line, at random.

We use numpy.random.randint(), introduced above, to generate 2000 random numbers (0 or 1) that decide the direction of each step. The code is as follows:

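```python
import numpy as np

# 2000 random directions: 1 = step forward, 0 = step backward
draws = np.random.randint(0, 2, size=2000)
print(f'random walk direction is {draws}')
# example output (random): random walk direction is [1 0 1 ... 0 1 0]
```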
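The article calls a helper random_walk() to turn such draws into a trajectory but does not show its body. A minimal sketch, assuming each 0/1 draw maps to a step of -1 or +1 and the position is the running sum of the steps (so a 2000-step walk yields 2000 positions). The name matches the article's, but this implementation and the extra seed parameter are assumptions.

```python
import numpy as np


def random_walk(walk_steps, seed=None):
    """Hypothetical sketch: positions of a 1-D walk with +1/-1 steps."""
    rng = np.random.default_rng(seed)            # seed is an assumed extra parameter, for reproducibility
    draws = rng.integers(0, 2, size=walk_steps)  # 0 or 1, like np.random.randint(0, 2, ...)
    steps = np.where(draws > 0, 1, -1)           # 1 -> step forward, 0 -> step backward
    return steps.cumsum()                        # position after each step


walk_path = random_walk(2000, seed=42)
print("end:", walk_path[-1])
print("farthest forward:", walk_path.max(), "at index", walk_path.argmax())
print("farthest backward:", walk_path.min(), "at index", walk_path.argmin())
```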

Then we use matplotlib.pyplot.plot() to draw the trajectory of the drunkard taking 2000 random steps from position 0, as shown in Figure 1-1:

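In this run, the walk ends at position 32 after 2000 steps; the farthest forward position, 64, is reached at step 1595, and the farthest backward position, -25, at step 142. The trajectory computation is wrapped in a function random_walk(); the plotting code is shown below: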

```python
import matplotlib.pyplot as plt
import numpy as np

# random_walk(n) is the author's helper (its body is not shown in the article);
# it is assumed to return a NumPy array of n positions.


def draw_random_walk():
    walk_steps = 2000
    walk_path = random_walk(walk_steps)

    # Statistics of the walk: end point, farthest forward and farthest backward positions
    start_x, start_y = 0, 0
    end_x, end_y = walk_steps - 1, walk_path[-1]

    max_x, max_y = walk_path.argmax(), walk_path.max()
    min_x, min_y = walk_path.argmin(), walk_path.min()

    # One x value per step index (0..1999), so annotated indices line up with the curve
    # (np.linspace(0, 2000, num=2000) would space the points by 2000/1999 instead of 1)
    x = np.arange(walk_steps)

    # Draw the walk's trajectory
    plt.plot(x, walk_path, color='b', linewidth=1, label='walk step')

    # Annotate start, end, maximum and minimum points: (label, x, y, text offset in points)
    annotations = [
        ('start', start_x, start_y, (+50, +20)),
        ('end', end_x, end_y, (-50, +20)),
        ('max', max_x, max_y, (-20, +20)),
        ('min', min_x, min_y, (-20, +20)),
    ]
    for name, px, py, offset in annotations:
        plt.annotate(
            '{}:({},{})'.format(name, px, py),
            xy=(px, py),
            xycoords='data',
            xytext=offset,
            textcoords='offset points',
            fontsize=8,
            bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
            arrowprops=dict(arrowstyle='->', connectionstyle="arc3,rad=.2"),
        )

    plt.legend(loc='best')
    plt.xlabel('Steps')
    plt.ylabel('Position')
    plt.title("Simulated random walk")
    plt.show()
```

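Because each of the drunkard's steps is completely random, his exact final position cannot be predicted, just as daily stock price changes cannot. Quantitative trading, however, approaches the problem statistically. We simulate 1000 random walks, each taking 2000 steps from position 0, and plot all the trajectories, as shown in Figure 1-2: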

[Figure 1-2: 1000 simulated random walks of 2000 steps each]

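From a statistical point of view, however, the probability distribution of the drunkard's final position can be calculated. The figure gives an intuitive picture of how the walks develop: each faint blue line is one simulation, the horizontal axis is the number of steps taken, and the vertical axis is the position relative to the starting point. The darker the blue, the more likely the drunkard is to be at that position after the corresponding number of steps. The range of positions he may reach keeps widening, yet positions farther from the starting point become less and less likely.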
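The article does not show the code for Figure 1-2. A minimal sketch of how the 1000 x 2000 trajectories could be generated in one vectorized step, assuming the same +1/-1 steps as above; the names and the seed are illustrative. Each row is one walk, and plotting every walk with a low alpha, for example plt.plot(walks.T, color='b', alpha=0.01), gives overlapping faint lines of the kind described above.

```python
import numpy as np

n_walks, n_steps = 1000, 2000
rng = np.random.default_rng(0)

# One row per simulated walk: each step is -1 or +1 with equal probability
steps = rng.choice([-1, 1], size=(n_walks, n_steps))
walks = steps.cumsum(axis=1)  # position of every walk after each step

final = walks[:, -1]
print("shape:", walks.shape)  # shape: (1000, 2000)
print("mean final position:", final.mean())
print("share of walks ending within 45 steps of the start:", np.mean(np.abs(final) <= 45))
```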

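This brings the normal distribution to mind. The normal distribution, also called the Gaussian distribution, is a probability distribution of a continuous random variable. It was first introduced in 1733 by the French-born mathematician Abraham de Moivre, but because the German mathematician Gauss applied it early on in astronomical research, it is also named after him.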

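The normal distribution describes how probable the different outcomes of an event are. Its probability density curve is low at both ends, high in the middle and symmetric, forming a bell shape, much like our simulated random walks. To check this, we use the hist() function of matplotlib.pyplot to draw a histogram of the random-walk positions, as shown in Figure 1-3: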

[Figure 1-3: histogram of the random-walk positions]

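The figure shows that the drunkard's positions are approximately normally distributed. The normal distribution matters a great deal in practice: many phenomena in nature, human society and psychology follow it exactly or approximately, such as people's abilities, physical traits like height and weight, students' grades, and social attitudes and behavior. Here lies the wonder of mathematics: it turns the unpredictable into the predictable. No single walk can be forecast, yet the distribution of all of them can.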
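The resemblance can be quantified. For a walk of n independent steps of -1 or +1, the final position has mean 0 and variance n (each step contributes variance 1), and by the central limit theorem it is approximately normally distributed; for n = 2000 the standard deviation is about √2000 ≈ 44.7. A minimal sketch comparing simulated final positions with that prediction (the number of walks and the seed are arbitrary choices):

```python
import math

import numpy as np

n_walks, n_steps = 2000, 2000
rng = np.random.default_rng(1)

# Final position of each walk = sum of n_steps independent +1/-1 steps
final = rng.choice([-1, 1], size=(n_walks, n_steps)).sum(axis=1)

predicted_std = math.sqrt(n_steps)  # sqrt(2000) is about 44.72
print("simulated mean:", final.mean(), "(predicted: 0)")
print("simulated std:", final.std(), f"(predicted: {predicted_std:.2f})")

# For a normal distribution, about 68% of values lie within one standard deviation of the mean
within_one_sd = np.mean(np.abs(final) <= predicted_std)
print("share within one std:", within_one_sd)
```

Passing final to plt.hist(final, bins=50) draws a histogram of the same bell shape.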

4. Summary

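Financial markets have always been full of randomness, but the essence of quantitative trading is to use mathematics to estimate the underlying probability distribution as accurately as possible and so cope with uncertainty. Edward Thorp, a founding figure of quantitative trading, used the idea behind this random walk model to estimate the probability distribution of the underlying stock's price on the day a warrant could be exercised. From that he judged whether the warrant's current price was too high or too low, and then used the Kelly formula to size his trades. Taking the simulated random walk as a starting point, this article has shared some methods for analyzing financial data with Python, in the hope that readers find them useful.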
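The Kelly formula mentioned above sizes a bet from the win probability and the payoff. In its simplest form, for a bet that wins b times the stake with probability p and loses the stake with probability q = 1 - p, the fraction of capital to wager is f* = p - q / b. A minimal sketch of that textbook formula follows; it is not Thorp's warrant model, and the function name and example numbers are illustrative.

```python
def kelly_fraction(p, b):
    """Kelly fraction for a bet that wins b times the stake with probability p."""
    if not 0 <= p <= 1 or b <= 0:
        raise ValueError("p must be in [0, 1] and b must be positive")
    q = 1 - p
    return p - q / b  # a negative value means the bet has a negative edge: do not bet


print(kelly_fraction(0.5, 1))  # 0.0: a fair coin at even money has no edge
print(kelly_fraction(0.6, 1))  # about 0.2: stake roughly 20% of capital
```

A pure random walk with no edge, like the fair coin above, gives a Kelly fraction of zero, which is the formula's way of saying there is nothing to exploit.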



Origin blog.51cto.com/15060462/2678209