[Finance] Using cosine similarity to find which historical period the current macro environment most resembles

Similar macro environments tend to produce similar behavior across many economic activities. For example, stock market performance is largely driven by macro factors such as monetary conditions and the degree of economic prosperity. If we can establish that the current macroeconomic environment resembles a particular period in history, we can draw on that period's experience to judge how the things we care about, such as the trend of stock indexes, might develop.

Table of contents

1. What is cosine similarity

2. Computing macro-environment similarity in Python


1. What is cosine similarity

Cosine similarity measures how similar two vectors are by taking the cosine of the angle between them. The cosine of 0° is 1, and the cosine of any other angle between 0° and 180° is less than 1, with a minimum of -1. The cosine of the angle therefore indicates whether two vectors point in roughly the same direction: it is 1 when they point in the same direction, 0 when the angle between them is 90°, and -1 when they point in exactly opposite directions. The result depends only on the directions of the vectors, not on their lengths. Cosine similarity is usually applied in positive spaces (all components non-negative), where its value lies between 0 and 1. Note that these bounds hold for vector spaces of any dimension, and cosine similarity is most commonly used in high-dimensional positive spaces.

 

The cosine between two vectors can be found using the Euclidean dot product formula:

a \cdot b = \left\| a \right\| \left\| b \right\| \cos\theta

Given two attribute vectors A and B, the cosine similarity cos(θ) is given by their dot product and their lengths as follows:

\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\left\| A \right\| \left\| B \right\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}

Here A_i and B_i denote the components of vectors A and B, respectively.

The resulting similarity ranges from -1 to 1: -1 means the two vectors point in exactly opposite directions, 1 means they point in exactly the same direction, 0 usually means they are orthogonal (uncorrelated), and values in between indicate intermediate degrees of similarity or dissimilarity.
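As a quick check of the formula, here is a minimal sketch in NumPy. The helper name `cosine_sim` and the sample vectors are made up for illustration; they are not part of the original code.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two 1-D vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Dot product divided by the product of the Euclidean lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0])
b = np.array([2.0, 1.0])

same = cosine_sim(a, 3 * a)              # same direction, different length
orthogonal = cosine_sim([1, 0], [0, 1])  # 90 degrees apart
opposite = cosine_sim(a, -a)             # exactly opposite directions
mixed = cosine_sim(a, b)                 # (1*2 + 2*1) / (sqrt(5) * sqrt(5)) = 0.8

print(same, orthogonal, opposite, mixed)
```

Scaling `a` by 3 does not change the result, which is the length independence described above. The formula is undefined for a zero vector, because its length is 0; this sketch does not guard against that case.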

2. Computing macro-environment similarity in Python

The file "宏观数据.xlsx" ("macro data") read by the code contains the following indicators: CPI, PPI, GDP, the ten-year treasury bond yield to maturity, and the new increment of social financing. (The data file is not available yet.)
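Indicators like these can sit on very different numeric scales, which is presumably why the code that follows applies min-max normalization before comparing periods. The sketch below shows what that step does on a small made-up array (the numbers are invented, not real macro data): each column is mapped to the range 0 to 1 via (x - min) / (max - min).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up values: rows are periods, columns are two indicators
# on very different scales.
X = np.array([[2.0, 3000.0],
              [3.0, 1000.0],
              [4.0, 2000.0]])

# Manual min-max scaling, column by column.
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# The same transformation using scikit-learn.
scaled = MinMaxScaler().fit_transform(X)

print(manual)
print(scaled)
```

One consequence worth keeping in mind: after this scaling every component is non-negative, so the cosine similarities computed afterwards fall between 0 and 1 rather than between -1 and 1.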

import pandas as pd
import numpy as np
import matplotlib.pyplot as mp
import seaborn
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# Macro data
data_macro = pd.read_excel("宏观数据.xlsx")
# Indicator columns: every column except the date column ('日期' means "date")
data_macro_columns = list(data_macro.columns)
data_macro_columns.remove('日期')
# Normalization
scaler = MinMaxScaler()
scaler = scaler.fit(data_macro.loc[:, data_macro_columns])  # essentially records max(x) and min(x) of each column
result = scaler.transform(data_macro.loc[:, data_macro_columns])
# Compute the cosine similarity between every pair of periods (rows)
data_macro_cosine_similarity = cosine_similarity(result)
# Visualization
names = []
for date in data_macro['日期']:
    names.append(date.date())
# Set rcParams before plotting so that they apply to the figure being created
mp.rcParams['figure.figsize'] = (30, 20)  # figure size
mp.rcParams['font.sans-serif'] = ['SimHei']  # SimHei font
seaborn.heatmap(data_macro_cosine_similarity, center=0, annot=True,
                xticklabels=names, yticklabels=names, cmap='YlGnBu')
# Tick label font size
mp.xticks(fontsize=12)
mp.yticks(fontsize=12, wrap=True)
# Axis label font size ('...' is a placeholder for the label text)
ax = mp.gca()
ax.set_xlabel(..., fontsize=12)
ax.set_ylabel(..., fontsize=12, wrap=True)
# Legend font size
#ax.legend(..., fontsize=20)
mp.show()

Output: (heatmap of the pairwise cosine similarity between periods; figure not reproduced here)

The periods whose macro data are most similar to June 30, 2022, that is, the second quarter of this year (the darker the color in the last row of the figure, the more similar), are the second quarter of 2012 through the second quarter of 2013, the second through fourth quarters of 2015, and the second and third quarters of 2016.
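Reading the last row of a large heatmap by eye is error-prone, so the same comparison can also be done by sorting that row. The following is a minimal sketch under stated assumptions: the data are random numbers standing in for the unavailable Excel file, and the quarter labels and column names (`ind_a`, `ind_b`, `ind_c`) are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (random numbers, NOT real macro statistics).
rng = np.random.default_rng(0)
quarters = [f"{year}Q{q}" for year in (2020, 2021) for q in range(1, 5)]
frame = pd.DataFrame(rng.normal(size=(8, 3)),
                     index=quarters,
                     columns=["ind_a", "ind_b", "ind_c"])

# Same pipeline as the article's code: min-max scaling, then pairwise cosine similarity.
scaled = MinMaxScaler().fit_transform(frame)
sim = pd.DataFrame(cosine_similarity(scaled), index=quarters, columns=quarters)

# Take the row for the most recent period, drop its similarity to itself,
# and sort the remaining periods from most to least similar.
latest = quarters[-1]
ranking = sim.loc[latest].drop(latest).sort_values(ascending=False)
top3 = ranking.head(3)
print(top3)
```

With the real data, `frame` would be replaced by the indicator columns indexed by date, and `top3` would list the historical periods closest to the latest one.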

Origin blog.csdn.net/standingflower/article/details/125669260