Pandas correlation analysis

DataFrame.corr(method='pearson', min_periods=1)

Parameters:

method: one of {'pearson', 'kendall', 'spearman'}; the default is 'pearson'

               pearson: the Pearson correlation coefficient measures how closely two variables follow a straight line, i.e. it quantifies linear correlation; for non-linear relationships it understates the strength of the association.

               kendall: Kendall's tau, a rank correlation coefficient suited to ordinal (ordered categorical) variables; it compares the ordering of pairs of observations and does not assume a normal distribution.

               spearman: Spearman's rank correlation coefficient, i.e. the Pearson coefficient computed on the ranks of the data; it measures monotonic (not necessarily linear) relationships and does not assume a normal distribution.

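min_periods: the minimum number of observations required for each pair of columns to give a valid result (default 1); pairs with fewer non-missing observations get NaN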
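The three method descriptions can be checked against their textbook definitions. The sketch below is a minimal illustration, not pandas' own implementation: the helper names `pearson_r`, `spearman_rho` and `kendall_tau_a` are hypothetical, the data is a small 10-point version of the quadratic used in the experiment, and the Kendall helper computes tau-a, which coincides with the usual tau-b only when there are no ties.

```python
import numpy as np
import pandas as pd


def pearson_r(a, b):
    """Pearson: covariance divided by the product of standard deviations."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da = a - a.mean()
    db = b - b.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))


def spearman_rho(a, b):
    """Spearman: Pearson correlation of the ranks (ties get their average rank)."""
    ra = pd.Series(a).rank().to_numpy()
    rb = pd.Series(b).rank().to_numpy()
    return pearson_r(ra, rb)


def kendall_tau_a(a, b):
    """Kendall tau-a: (concordant - discordant) / number of pairs; assumes no ties."""
    n = len(a)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 if the pair is ordered the same way in a and b, -1 if opposite
            score += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
    return float(score) / (n * (n - 1) / 2)


x = list(range(10))
y = [2 * v ** 2 + 4 for v in x]   # quadratic, increasing for x >= 0
df = pd.DataFrame({'x': x, 'y': y})

# Hand-rolled values next to what pandas reports
print(pearson_r(x, y), df.corr().loc['x', 'y'])
print(spearman_rho(x, y), df.corr(method='spearman').loc['x', 'y'])
print(kendall_tau_a(x, y))
```

Because every pair of points is ordered the same way in `x` and `y`, both rank-based helpers return 1, while the Pearson helper agrees with pandas on a value below 1.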

Return value: a DataFrame containing the pairwise correlation coefficients between the columns (the correlation matrix).
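As a sketch of what `corr` returns, the toy frame below uses made-up values; the column names `a`, `b`, `c` and the variable names `loose` and `strict` are arbitrary. Column `c` is mostly missing, so raising `min_periods` turns its pairings into NaN, while the result stays a square, symmetric DataFrame labelled by column name on both axes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0, 5.0],
    'b': [2.0, 4.0, 6.0, 8.0, 10.0],          # exactly linear in 'a'
    'c': [1.0, np.nan, np.nan, np.nan, 2.0],  # only two non-missing values
})

loose = df.corr()                 # default min_periods=1
strict = df.corr(min_periods=3)   # require at least 3 overlapping observations

# Square matrix, indexed and columned by the original column names
print(loose)
print(loose.loc['a', 'b'])        # a single coefficient
print(strict.loc['a', 'c'])       # NaN: 'a' and 'c' overlap in only 2 rows
```

Note that with the default `min_periods=1`, `a` and `c` still get a coefficient from just two overlapping rows, which is why a stricter threshold can be worth setting on sparse data.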

To see how the three methods differ, we run the following experiment:

from pandas import DataFrame
import pandas as pd

x = [a for a in range(100)]

# Construct a quadratic function: a nonlinear relationship between x and y
def y_x(x):
    return 2 * x ** 2 + 4

y = [y_x(i) for i in x]

data = DataFrame({'x': x, 'y': y})

# Look at the first few rows of the data
data.head()
Out[34]: 
   x   y
0  0   4
1  1   6
2  2  12
3  3  22
4  4  36
 
data.corr()
Out[35]: 
          x         y
x  1.000000  0.967736
y  0.967736  1.000000
 
data.corr(method='spearman')
Out[36]: 
     x    y
x  1.0  1.0
y  1.0  1.0
 
data.corr(method='kendall')
Out[37]: 
     x    y
x  1.0  1.0
y  1.0  1.0

Because y is a strictly increasing function of x over the range used here (x is never negative), the rank-based Spearman and Kendall coefficients are exactly 1. The Pearson coefficient, by contrast, only measures linear correlation, so for this quadratic relationship it comes out below 1.
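The conclusion can be probed a little further. In the sketch below, the extra columns `y_linear` and `y_bowl` are illustrative additions, not part of the original experiment. Pearson reaches 1 only for the linear column, Spearman reaches 1 for both increasing columns, and both coefficients stay near zero for the U-shaped `y_bowl`, because Pearson looks for a straight line and Spearman for a consistently increasing or decreasing trend.

```python
import pandas as pd

x = list(range(100))
data = pd.DataFrame({
    'x': x,
    'y_linear': [3 * v + 1 for v in x],    # linear in x
    'y': [2 * v ** 2 + 4 for v in x],      # the quadratic from the experiment
    'y_bowl': [(v - 50) ** 2 for v in x],  # falls, then rises: not monotonic
})

# Correlation of every column with x, under each method
pearson = data.corr()['x']
spearman = data.corr(method='spearman')['x']
print(pd.DataFrame({'pearson': pearson, 'spearman': spearman}))
```

So a coefficient of 1 from Spearman or Kendall signals a monotonic relationship, not merely that y is "some function" of x.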

From: https://blog.csdn.net/walking_visitor/article/details/85128461


Origin www.cnblogs.com/zhangzhixing/p/12742968.html