The following snippet comes from a bank card fraud detection project:
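from sklearn.cross_validation import KFold  # old module, removed in scikit-learn 0.20
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
def printing_Kfold_scores(x_train_data, y_train_data, c_param):
    # c_param is defined outside this excerpt; it is taken as an argument here
    recall_accs = []
    # Old signature KFold(n, n_folds, shuffle); iterating it yields (train, test) index arrays
    fold = KFold(len(y_train_data), 5, shuffle=False)
    for iteration, indices in enumerate(fold, start=1):
        lr = LogisticRegression(C=c_param, penalty='l1')  # default solver was liblinear in these old versions
        lr.fit(x_train_data.iloc[indices[0], :], y_train_data.iloc[indices[0], :].values.ravel())
        y_pred_undersample = lr.predict(x_train_data.iloc[indices[1], :].values)
        recall_acc = recall_score(y_train_data.iloc[indices[1], :].values, y_pred_undersample)
        recall_accs.append(recall_acc)
    return recall_accs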
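The code above works on its own, but it depends on the scikit-learn version: the sklearn.cross_validation module was deprecated in 0.18 and removed in 0.20, so on newer versions running it fails with: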
ModuleNotFoundError: No module named 'sklearn.cross_validation'
One reader therefore changed from sklearn.cross_validation import KFold to from sklearn.model_selection import KFold, ran the code again, and hit a new error:
TypeError: __init__() got multiple values for argument 'shuffle'
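This message is Python's generic complaint when one parameter receives a value both positionally and by keyword. A minimal stdlib sketch reproduces it with kfold_like, a hypothetical stand-in (not scikit-learn code) whose parameter order mirrors KFold(n_splits=5, shuffle=False, random_state=None) in the releases where shuffle could still be passed positionally; n_samples is likewise a made-up value:

```python
# Hypothetical stand-in mirroring the parameter order of
# sklearn.model_selection.KFold in releases where shuffle was still positional.
def kfold_like(n_splits=5, shuffle=False, random_state=None):
    return n_splits, shuffle, random_state

n_samples = 1000  # hypothetical stand-in for len(y_train_data)

message = None
try:
    # Old-style call: n_samples lands in n_splits, 5 lands in shuffle,
    # and shuffle=False then tries to set shuffle a second time.
    kfold_like(n_samples, 5, shuffle=False)
except TypeError as exc:
    message = str(exc)

print(message)  # kfold_like() got multiple values for argument 'shuffle'
```

Calling it the new way, kfold_like(5, shuffle=False), binds each parameter once and raises nothing.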
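Why? The two modules provide different KFold classes with different constructors, and the old call does not fit the new one. In sklearn.model_selection (in the releases that produce this message) the first two positional parameters are n_splits and shuffle, so KFold(len(y_train_data), 5, shuffle=False) puts the sample count into n_splits, puts 5 into shuffle, and then shuffle=False tries to set shuffle a second time. Newer releases make shuffle keyword-only, so the exact wording of the TypeError can differ. With from sklearn.cross_validation import KFold: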
KFold(n, 5, shuffle=False)  # n is the total number of samples; three arguments are passed
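With from sklearn.model_selection import KFold, n is no longer passed; the splitter is built from the number of folds alone, and the train/test indices come from fold.split(X) instead of from iterating the object:
fold = KFold(5, shuffle=False)  # no n needed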
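A small sketch of the sklearn.model_selection API on toy data (the array X is made up for illustration): the splitter only knows the number of folds, split(X) yields index arrays, and the old three-argument call is rejected with a TypeError whose wording depends on the scikit-learn version:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 toy samples with 2 features each

kf = KFold(n_splits=5, shuffle=False)  # only the number of folds, no sample count
print(kf.get_n_splits(X))  # 5

# split() yields (train_indices, test_indices) pairs; collect the test folds
folds = [test_idx.tolist() for train_idx, test_idx in kf.split(X)]
print(folds)  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

# The old-style call (sample count first) is rejected by the new class.
raised = False
try:
    KFold(len(X), 5, shuffle=False)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
print(raised)  # True
```

With shuffle=False the test folds are consecutive blocks of rows, so data sorted by label can yield folds without any positive samples; shuffle=True with a fixed random_state avoids that.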
The corrected code:
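from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
def printing_Kfold_scores(x_train_data, y_train_data, c_param):
    # c_param is defined outside this excerpt; it is taken as an argument here
    fold = KFold(5, shuffle=False)  # number of folds only, no sample count
    recall_accs = []
    # split() yields (train_indices, test_indices) pairs
    for iteration, indices in enumerate(fold.split(x_train_data), start=1):
        # penalty='l1' needs solver='liblinear' (or 'saga'): the default solver became lbfgs in 0.22
        lr = LogisticRegression(C=c_param, penalty='l1', solver='liblinear')
        lr.fit(x_train_data.iloc[indices[0], :], y_train_data.iloc[indices[0], :].values.ravel())
        y_pred_undersample = lr.predict(x_train_data.iloc[indices[1], :].values)
        recall_acc = recall_score(y_train_data.iloc[indices[1], :].values, y_pred_undersample)
        recall_accs.append(recall_acc)
    return recall_accs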
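If only the per-fold recall is needed, scikit-learn's cross_val_score can probably replace the hand-written loop (a swapped-in shortcut, not the original project's code). The sketch below assumes synthetic imbalanced data from make_classification instead of the fraud dataset and a hypothetical c_param = 0.1, and compares the shortcut against an equivalent manual loop:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import KFold, cross_val_score

# Synthetic, imbalanced stand-in for the fraud data (roughly 10% positives).
X, y = make_classification(n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0)

c_param = 0.1  # hypothetical value; the original project defines c_param elsewhere
model = LogisticRegression(C=c_param, penalty="l1", solver="liblinear", random_state=0)
cv = KFold(n_splits=5, shuffle=False)

# One recall value per fold; cross_val_score fits a fresh clone of model on each fold.
scores = cross_val_score(model, X, y, cv=cv, scoring="recall")

# The same computation written as a manual loop, for comparison.
manual = []
for train_idx, test_idx in cv.split(X):
    model.fit(X[train_idx], y[train_idx])
    manual.append(recall_score(y[test_idx], model.predict(X[test_idx])))

print(scores.round(3))
print(np.allclose(scores, manual))
```

The manual loop is still useful when you need more than one metric per fold or want to inspect the fitted models.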
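So the module KFold is imported from determines which constructor arguments it accepts and how its folds are iterated; keep this in mind when upgrading scikit-learn.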