Spearman correlation analysis, with complete Python code examples

  • Spearman correlation analysis

Introduction

Spearman correlation analysis measures the strength of a monotonic relationship between two variables, whether or not that relationship is linear, and it makes no assumptions about the distribution of the data. The basic idea is to rank the values of two (or more) variables and calculate the correlation between the ranks (the Spearman correlation coefficient). The coefficient ranges from -1 to 1: a value of -1 indicates a perfectly decreasing monotonic relationship, a value of 1 indicates a perfectly increasing one, and a value of 0 indicates no monotonic association between the two variables.
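The rank-then-correlate idea can be sketched in a few lines. The data here is made up for illustration (it is not from the examples in this post): y = x**3 is monotonic but not linear, so the two coefficients differ.

```python
import pandas as pd

# Illustrative data: y = x**3 increases with x, but not along a straight line.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
df["y"] = df["x"] ** 3

# Step 1: replace every value by its rank within its column
# (tied values would receive their average rank).
ranks = df.rank()

# Step 2: the Pearson correlation of the ranks is the Spearman coefficient.
rho_from_ranks = ranks.corr(method="pearson").loc["x", "y"]

# The built-in Spearman option should agree with the two-step calculation.
rho_builtin = df.corr(method="spearman").loc["x", "y"]

# Pearson on the raw values stays below 1 because the relationship is not linear.
r_raw = df.corr(method="pearson").loc["x", "y"]

print(rho_from_ranks, rho_builtin, r_raw)
```

Both Spearman values come out as 1 (up to floating-point rounding), while the Pearson coefficient of the raw values is roughly 0.94.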

Basic usage

Code example

import pandas as pd
import seaborn as sns

# Construct the data
data = {
    'x': [1, 3, 5, 7, 9],
    'y': [10, 8, 6, 4, 2]
}
df = pd.DataFrame(data)

# Spearman correlation analysis
corr = df.corr(method='spearman')
print(corr)

# Plot a heatmap of the correlation coefficients
sns.heatmap(corr, annot=True, cmap="YlGnBu")

Parameter Description

method: the correlation method to use; set it to 'spearman' here. pandas also accepts 'pearson' (the default) and 'kendall'.

Function return value

df.corr(method='spearman') returns a DataFrame holding the Spearman correlation coefficient matrix. sns.heatmap(corr, annot=True, cmap="YlGnBu") then draws a heat map of that matrix, where:

  • corr: the correlation coefficient matrix, here the Spearman correlation coefficients.
  • annot: whether to display the value in each square.
  • cmap: the colormap.

Introduce multiple variables

  • Spearman correlation analysis is usually used to analyze the relationship between two variables, but it can also be applied to several variables by calculating the coefficient for every pair. To do this, use the corr function in pandas to calculate the Spearman correlation coefficient matrix between the variables, then plot the matrix as a heatmap with the heatmap function in seaborn:

import pandas as pd
import seaborn as sns

# Construct the data
data = {
    'x1': [1, 3, 5, 7, 9],
    'x2': [10, 8, 6, 4, 2],
    'x3': [9, -7, 5.4, -3, 1],
    'x4': [2, 4, 6, 8, 10]
}
df = pd.DataFrame(data)

# Spearman correlation analysis
corr = df.corr(method='spearman')
print(corr)

# Plot a heatmap of the correlation coefficients
sns.heatmap(corr, annot=True, cmap="YlGnBu")

In the sample code above, we constructed a data set containing 4 variables ('x1', 'x2', 'x3' and 'x4'), used the Spearman correlation analysis method to calculate the Spearman correlation coefficient matrix between these variables, and drew a heat map of the correlation coefficients.

[Figure: resulting heatmap]
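To see where one entry of that matrix comes from, here is a minimal sketch that recomputes the x1/x3 coefficient by hand. It uses the textbook shortcut 1 - 6·Σd² / (n(n² - 1)), where d is the difference between the two ranks of each observation; the shortcut is only valid when there are no tied values, which holds for this data. The helper name spearman_no_ties is made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "x1": [1, 3, 5, 7, 9],
    "x2": [10, 8, 6, 4, 2],
    "x3": [9, -7, 5.4, -3, 1],
    "x4": [2, 4, 6, 8, 10],
})

def spearman_no_ties(a, b):
    """Spearman coefficient via 1 - 6*sum(d^2) / (n*(n^2 - 1)); only valid without ties."""
    d = a.rank() - b.rank()          # rank difference for each observation
    n = len(a)
    return 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

# Ranks of x1 are 1..5 and ranks of x3 are 5, 1, 4, 2, 3, so sum(d^2) = 26.
rho_x1_x3 = spearman_no_ties(df["x1"], df["x3"])
rho_pandas = df.corr(method="spearman").loc["x1", "x3"]

print(rho_x1_x3, rho_pandas)
```

Both values are approximately -0.3 (1 - 156/120). By the same calculation, x1 and x2 give -1, and x1 and x4 give 1.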

Save the generated heat map

  • To save the generated heatmap to a file, use the savefig() function of the Matplotlib library:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Construct the data (you can use your own data here)
data = {
    'x1': [1, 3, 5, 7, 9],
    'x2': [10, 8, 6, 4, 2],
    'x3': [9, -7, 5.4, -3, 1],
    'x4': [2, 4, 6, 8, 10]
}
df = pd.DataFrame(data)

# Spearman correlation analysis
corr = df.corr(method='spearman')
print(corr)

# Plot a heatmap of the correlation coefficients
sns.heatmap(corr, annot=True, cmap="YlGnBu")

# Save the heatmap
plt.savefig('heatmap.png', dpi=300, bbox_inches='tight')

In this sample code, the Matplotlib savefig() function is used to save the heatmap. The first parameter of this function is the file name, optionally with a path. In this example, we specified "heatmap.png" as the file name, so the heatmap is saved in the current working directory. The dpi parameter sets the resolution in dots per inch; by default the figure's own dpi is used, which is normally 100, and it can be set to 300 to obtain a higher resolution. The bbox_inches='tight' parameter fits the saved area tightly around the figure contents, which trims excess whitespace and keeps labels from being cut off; it can be adjusted as needed.

Read data from Excel

To get the data of each variable from an Excel table, you can use the Pandas library in Python. Pandas can read data in Excel tables and convert it into Pandas DataFrame objects for data analysis and visualization.

Sample code:

import pandas as pd
import seaborn as sns

# Read the data from the Excel sheet
df = pd.read_excel('data.xlsx')

# Spearman correlation analysis
corr = df.corr(method='spearman')
print(corr)

# Plot a heatmap of the correlation coefficients
sns.heatmap(corr, annot=True, cmap="YlGnBu")

Here, the read_excel() function reads the Excel file, and its first argument specifies the name and path of the file. By default, it reads the first sheet in the file and converts it to a Pandas DataFrame object. If the Excel file contains multiple sheets, use the sheet_name argument to choose which sheet is read as a DataFrame.
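Worksheets often hold text columns (names, labels) next to the numeric variables. The following is a minimal sketch of one way to handle that, not part of the original example. The sheet name "Sheet2", the helper name spearman_numeric and the column names are all hypothetical, and an in-memory DataFrame stands in for the workbook so the snippet runs without a file.

```python
import pandas as pd

def spearman_numeric(df):
    """Spearman matrix over the numeric columns only (illustrative helper)."""
    numeric = df.select_dtypes(include="number")
    return numeric.corr(method="spearman")

# With a real workbook you would load the sheet first, for example:
# df = pd.read_excel("data.xlsx", sheet_name="Sheet2")  # "Sheet2" is a hypothetical sheet name

# Stand-in for the sheet contents: one text column and two numeric columns.
df = pd.DataFrame({
    "sample": ["a", "b", "c", "d"],
    "height": [1.2, 2.4, 3.1, 4.8],
    "weight": [10, 30, 20, 40],
})

corr = spearman_numeric(df)
print(corr)
```

Selecting the numeric columns explicitly keeps the text column out of the calculation, so the resulting matrix contains only height and weight.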

Assuming that each column of data in your Excel file represents a variable, you can use the following code to separate each column of the Excel table into its own Series object, stored in a dictionary:

# Separate the columns into a dictionary of Series objects
var_dict = {}
for column in df:
    var_dict[column] = df[column]

# Print the data of each variable
for key, value in var_dict.items():
    print(key, value.tolist())

df[column] returns a Pandas Series object containing the data from one column of the Excel table. Each Series is stored in the Python dictionary var_dict, and the tolist() method converts it to a Python list so that the data of each variable can be printed.
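If only the plain lists are needed, the loop can be replaced by a single call. This is a small sketch using the DataFrame.to_dict method with orient="list"; the shortened data and the name var_lists are chosen here for illustration.

```python
import pandas as pd

# Shortened stand-in for the data read from Excel.
df = pd.DataFrame({"x1": [1, 3, 5], "x2": [10, 8, 6]})

# Map every column name to a plain Python list of its values.
var_lists = df.to_dict(orient="list")

for name, values in var_lists.items():
    print(name, values)
```

The dictionary-of-Series version above is still the better choice when you want to keep the index or call Series methods on each variable.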

Then the sample code above can be applied to the variables extracted from each column: use Spearman correlation analysis to calculate the relationships between them, and finally generate a correlation coefficient heat map.

More complex heatmaps
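
As one possible direction, and only a sketch, the heatmap below hides the redundant upper triangle of the symmetric matrix and pins the color scale to the full -1 to 1 range, so that colors are comparable between plots. The styling choices (the "coolwarm" colormap, figure size, title, colorbar label) are assumptions made for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "x1": [1, 3, 5, 7, 9],
    "x2": [10, 8, 6, 4, 2],
    "x3": [9, -7, 5.4, -3, 1],
    "x4": [2, 4, 6, 8, 10],
})
corr = df.corr(method="spearman")

# Boolean mask: True hides a cell. Hide everything strictly above the diagonal,
# because those cells mirror the lower triangle.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(
    corr,
    mask=mask,
    annot=True,
    fmt=".2f",            # two decimals in each cell
    cmap="coolwarm",      # diverging colormap suits values centered on 0
    vmin=-1, vmax=1, center=0,
    square=True,
    linewidths=0.5,
    cbar_kws={"label": "Spearman coefficient"},
    ax=ax,
)
ax.set_title("Spearman correlation matrix")
fig.tight_layout()
```

Call plt.show() to display the figure, or fig.savefig('heatmap.png', dpi=300, bbox_inches='tight') to write it to a file as described earlier.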

Origin blog.csdn.net/weixin_67016521/article/details/129863814