Undersampling with the imblearn package: Name 'RandomUnderSampler' is not defined

Name ‘RandomUnderSampler’ is not defined

When the class proportions in a classification dataset are imbalanced, imblearn can be used to undersample the data.

# Undersampling
from imblearn.under_sampling import RandomUnderSampler
RandomUnderSampler.fit_sample(x,y)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-140-7583303d45ee> in <module>
      1 # Undersampling
      2 from imblearn.under_sampling import RandomUnderSampler
----> 3 RandomUnderSampler.fit_sample(x,y)

TypeError: fit_resample() missing 1 required positional argument: 'y'

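Calling RandomUnderSampler.fit_sample(x,y) directly raises the error above. The cause is that RandomUnderSampler was used without parentheses, so the method is called on the class itself rather than on an instance: x is taken as self, y is taken as X, and nothing is left to fill y. This is an easy mistake for beginners to make, and the same applies to the estimators in sklearn: create the object first, then call its methods. The correct code is: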

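from imblearn.under_sampling import RandomUnderSampler
# Instantiate the sampler first (note the parentheses)
model_RandomUnderSample=RandomUnderSampler()
# fit_sample was removed in imbalanced-learn 0.8; fit_resample is the current name
x_,y_=model_RandomUnderSample.fit_resample(x,y)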
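
To see why the message complains about a missing y even though two arguments were passed, here is a minimal sketch in plain Python. ToySampler is a hypothetical stand-in written for this illustration, not part of imblearn; it only mirrors the fit_resample(self, X, y) signature.

```python
class ToySampler:
    """Hypothetical stand-in for a sampler with a fit_resample(X, y) method."""

    def fit_resample(self, X, y):
        # A real sampler would resample here; this toy returns its input.
        return X, y


x, y = [[1], [2]], [0, 1]

# Called on the class: x is bound to `self`, y is bound to `X`,
# so the parameter `y` receives nothing and Python raises TypeError.
try:
    ToySampler.fit_resample(x, y)
except TypeError as exc:
    message = str(exc)
    print(message)

# Called on an instance: `self` is supplied automatically.
x_res, y_res = ToySampler().fit_resample(x, y)
```

The only difference between the failing and the working call is the pair of parentheses that creates the instance.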

Now let's look at how the data changes before and after undersampling:

x.shape,y.shape,y['status'].value_counts(normalize=True)
# Before undersampling: the classes are imbalanced at roughly 5.4:1 (84.3% vs 15.7%), with about 193,000 rows
((192836, 22),
 (192836, 1),
 89    0.843064
 78    0.156936
 Name: status, dtype: float64)
x_.shape,y_.shape,y_['status'].value_counts(normalize=True)
# After undersampling: the classes are balanced 1:1, and the data shrinks to about 60,000 rows
((60526, 22),
 (60526, 1),
 89    0.5
 78    0.5
 Name: status, dtype: float64)
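
The numbers above fit the usual rule for random undersampling: keep every minority row and randomly keep the same number of rows from each larger class. The sketch below illustrates that rule in plain Python on toy data. The function random_undersample and the toy lists are assumptions made for this example; it is not imblearn's implementation, only an approximation of what RandomUnderSampler appears to do with its default settings.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Toy random undersampling: shrink every class to the minority size."""
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    # Keep n_min randomly chosen rows per class, without replacement.
    kept = sorted(i for idxs in by_class.values() for i in rng.sample(idxs, n_min))
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data reusing the article's labels: 89 is the majority, 78 the minority.
y = [89] * 84 + [78] * 16
X = [[i] for i in range(len(y))]

X_res, y_res = random_undersample(X, y)
counts = Counter(y_res)
print(len(y_res), counts)  # 32 rows, 16 per class

# Same arithmetic on the article's sizes: twice the minority count.
n_minority = round(192836 * 0.156936)
print(n_minority, 2 * n_minority)
```

Twice the minority count gives 60526, which matches the shape reported after undersampling.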


Reposted from blog.csdn.net/zxxxlh123/article/details/108852770