DataFrame.corr(method='pearson', min_periods=1)
Parameter Description:
method: optional values are {'pearson', 'kendall', 'spearman'}
pearson: the Pearson correlation coefficient measures how closely two data sets fall on a straight line. It is designed for linear relationships, so it understates the strength of a non-linear relationship.
kendall: Kendall's tau, a rank correlation coefficient for ordinal (ordered) variables. It compares the ordering of pairs of observations and does not assume a normal distribution.
spearman: Spearman's rank correlation coefficient, suitable for non-linear but monotonic relationships and for data that are not normally distributed.
min_periods: the minimum number of observations required for each pair of columns to produce a valid result
Return value: a DataFrame holding the correlation coefficients between each pair of columns.
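As a minimal sketch of how min_periods behaves, the snippet below builds two columns that share only three complete observations. The column names a and b and the values are made up for illustration; they are not from the experiment in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column b is missing two of its five values,
# so a and b have only three rows where both are present.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, np.nan, np.nan, 10.0],
})

# Default min_periods=1: the three complete pairs are enough.
loose = df.corr(min_periods=1)

# Require at least four complete pairs: a vs b no longer qualifies.
strict = df.corr(min_periods=4)

print(loose.loc["a", "b"])
print(strict.loc["a", "b"])
```

With the default setting the a/b entry is computed from the three complete rows; with min_periods=4 the same entry is NaN because too few observations overlap.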
To see how the methods differ, consider the following experiment:
from pandas import DataFrame
import pandas as pd

x = [a for a in range(100)]

# Construct a quadratic function so that x and y have a non-linear relationship
def y_x(x):
    return 2 * x ** 2 + 4

y = [y_x(i) for i in x]
data = DataFrame({'x': x, 'y': y})

# Inspect the structure of the data
data.head()
Out[34]:
   x   y
0  0   4
1  1   6
2  2  12
3  3  22
4  4  36

data.corr()
Out[35]:
          x         y
x  1.000000  0.967736
y  0.967736  1.000000

data.corr(method='spearman')
Out[36]:
     x    y
x  1.0  1.0
y  1.0  1.0

data.corr(method='kendall')
Out[37]:
     x    y
x  1.0  1.0
y  1.0  1.0
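Because y is constructed from x by a function that is strictly increasing on this range, x and y are ordered identically, and the spearman and kendall coefficients are both 1. The pearson coefficient, however, comes out at about 0.97: it measures only linear association, so it understates the dependence in non-linear data.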
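One way to see where the difference comes from is to note that the Spearman coefficient is the Pearson coefficient computed on ranks. The sketch below rebuilds the same x and y as in the experiment and compares the two. The variable names and the extra x_sq column are introduced here for illustration only.

```python
import pandas as pd

# Same construction as the experiment: y = 2 * x**2 + 4 for x in 0..99
x = list(range(100))
y = [2 * v ** 2 + 4 for v in x]
data = pd.DataFrame({"x": x, "y": y})

# Pearson on the raw values: below 1 because the relationship is curved
pearson_raw = data.corr().loc["x", "y"]

# Pearson on the ranks: the ranks of x and y are identical, so this is 1
pearson_on_ranks = data.rank().corr().loc["x", "y"]

# Spearman as computed by pandas: should agree with Pearson on the ranks
spearman = data.corr(method="spearman").loc["x", "y"]

# Pearson after linearizing: y is an exact linear function of x**2
linearized = data.assign(x_sq=data["x"] ** 2)
pearson_linearized = linearized.corr().loc["x_sq", "y"]

print(pearson_raw, pearson_on_ranks, spearman, pearson_linearized)
```

The last value suggests a practical option: when the form of the relationship is known, transforming the variable first lets the Pearson coefficient reflect the full dependence.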
From: https://blog.csdn.net/walking_visitor/article/details/85128461