ValueError: The internally computed table of expected frequencies has a zero element at (1,).
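Problem:
The data contains a level with zero frequency.
Suppose the variable has five levels: 0, 1, 2, 3, 4.
If level 0 has a count of 0 in both the training set and the test set, the following error occurs: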
import numpy as np
from scipy.stats import chi2_contingency
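# Column 0 sums to zero, so its expected frequencies are zero
d = np.array([[0,7,1], [0,21,2]])
# A single zero cell is fine as long as every row and column total is nonzero:
# d = np.array([[0,8], [8,15]])
print(chi2_contingency(d))  # raises ValueError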
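To see why this table fails, it can help to look at the expected frequencies directly. The sketch below is only an illustration: it uses `expected_freq` from `scipy.stats.contingency` and also recomputes the same values by hand as row total times column total divided by the grand total. The variable names are chosen here for the example.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

d = np.array([[0, 7, 1], [0, 21, 2]])

# Expected count for cell (i, j) = row_total[i] * col_total[j] / grand_total
row_totals = d.sum(axis=1, keepdims=True)
col_totals = d.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / d.sum()

# Same table as computed by SciPy
expected = expected_freq(d)

# Positions where the expected frequency is zero
zero_cells = np.argwhere(expected == 0)

print(expected)
print(zero_cells)
```

Because the first column totals zero, every expected value in that column is zero, and the chi-square statistic would require dividing by it.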
Solution:
Drop the levels that have zero frequency in the current data.
import numpy as np
from scipy.stats import chi2_contingency
# d = np.array([[0,7,1], [0,21,2]])  # original table with the empty level
d = np.array([[7,1], [21,2]])  # empty level (column 0) removed
print(chi2_contingency(d))
(0.0, 1.0, 1, array([[ 7.22580645, 0.77419355], [20.77419355, 2.22580645]]))
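Rather than editing the table by hand, the empty levels can be filtered out programmatically. This is a minimal sketch; `drop_empty_levels` is a hypothetical helper written for this example, not a SciPy function.

```python
import numpy as np
from scipy.stats import chi2_contingency


def drop_empty_levels(table):
    """Hypothetical helper: remove rows and columns whose totals are zero."""
    table = np.asarray(table)
    table = table[table.sum(axis=1) > 0, :]   # keep rows with at least one count
    table = table[:, table.sum(axis=0) > 0]   # keep columns with at least one count
    return table


d = np.array([[0, 7, 1], [0, 21, 2]])
cleaned = drop_empty_levels(d)
print(cleaned)

# The result supports tuple unpacking in both old and new SciPy versions
chi2, p, dof, expected = chi2_contingency(cleaned)
print(chi2, p, dof)
```

The cleaned table is the same `[[7, 1], [21, 2]]` used above, so the test result should match.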
Alternatively, apply add-one smoothing to the whole table (add 1 to every cell):
import numpy as np
from scipy.stats import chi2_contingency
# Original table [[0,7,1], [0,21,2]] with 1 added to every cell
d = np.array([[1,8,2], [1,22,3]])
# d = np.array([[7,1], [21,2]])
print(chi2_contingency(d))
(0.7805361305361306, 0.6768754033897442, 2, array([[ 0.59459459, 8.91891892, 1.48648649], [ 1.40540541, 21.08108108, 3.51351351]]))
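The smoothed table can also be derived from the original counts instead of being typed in. A small sketch, assuming the same example table:

```python
import numpy as np
from scipy.stats import chi2_contingency

d = np.array([[0, 7, 1], [0, 21, 2]])

# Add-one smoothing: every cell gets +1, so no row or column total is zero
smoothed = d + 1

chi2, p, dof, expected = chi2_contingency(smoothed)
print(smoothed)
print(chi2, p, dof)
```

Note that the two fixes do not give the same answer on this example: dropping the empty level keeps a 2x2 table with 1 degree of freedom, while smoothing keeps all three columns with 2 degrees of freedom and changes every count, which matters more when the sample is small.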
Full error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-179-70f4198dddcb> in <module>
5 for x in categorical:
6
----> 7 pvalue = chi2_contingency(features_X_test.loc[features_X_test.label==1][x], features_X_test.loc[features_X_test.label==0][x])[1]
8 print(x)
9 print(pvalue)
D:\anaconda\lib\site-packages\scipy\stats\contingency.py in chi2_contingency(observed, correction, lambda_)
275 zeropos = list(zip(*np.nonzero(expected == 0)))[0]
276 raise ValueError("The internally computed table of expected "
--> 277 "frequencies has a zero element at %s." % (zeropos,))
278
279 # The degrees of freedom
ValueError: The internally computed table of expected frequencies has a zero element at (1,).
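One more thing worth checking in the traceback: the call passes two Series as separate positional arguments, so the second one lands in the `correction` parameter and the first is treated as the observed table. `chi2_contingency` expects a single table of counts. The sketch below shows one way to build such a table with `pd.crosstab`; the DataFrame `df` and its column `x` are made-up stand-ins for `features_X_test` and one of its categorical columns.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical stand-in for features_X_test with a label and one categorical column
df = pd.DataFrame({
    "label": [1, 1, 1, 0, 0, 0, 0, 0],
    "x":     ["a", "b", "a", "a", "b", "b", "a", "b"],
})

# Rows: label values, columns: levels of x, cells: counts
table = pd.crosstab(df["label"], df["x"])

# Safeguard: drop any row or column whose total is zero
table = table.loc[table.sum(axis=1) > 0, table.sum(axis=0) > 0]

chi2, p, dof, expected = chi2_contingency(table.to_numpy())
print(table)
print(p)
```

Inside the original loop, the same pattern would be applied once per categorical column, with the p-value taken from the second element of the result.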