Handling Oversampling and Undersampling

Other 2021-01-30 11:27:29

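1. Oversampling with SMOTE
When oversampling with SMOTE, split the data into training and validation sets first, then oversample only the training set. If you oversample before splitting, synthetic points derived from validation rows end up in the training set, the validation score becomes far too optimistic, and severe overfitting goes unnoticed.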
https://beckernick.github.io/oversampling-modeling/
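To see why the order matters, here is a minimal sketch. It uses plain random duplication as a stand-in for SMOTE (an assumption made to keep the example short; SMOTE's interpolated points leak information in a similar, less literal way). It counts how many original minority rows are represented on both sides of the split under each ordering. All names in it are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 minority rows, identified by their original row id.
minority_ids = np.arange(20)

def split(ids, val_fraction=0.15):
    """Shuffle ids and split them into (train, validation)."""
    shuffled = rng.permutation(ids)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def oversample(ids, copies=5):
    """Stand-in for SMOTE: repeat every row `copies` times."""
    return np.repeat(ids, copies)

# Wrong order: oversample, then split. Copies of one row land on both sides.
train_a, val_a = split(oversample(minority_ids))
leak_oversample_first = len(set(train_a) & set(val_a))

# Right order: split, then oversample the training part only.
train_b, val_b = split(minority_ids)
train_b = oversample(train_b)
leak_split_first = len(set(train_b) & set(val_b))

print('rows shared by train and validation, oversample first:', leak_oversample_first)
print('rows shared by train and validation, split first:', leak_split_first)
```

With the split done first, the shared count is zero by construction, because no validation row is ever used to generate training data.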
Usage:

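from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# Split first, so the validation set contains only real samples.
X_train, X_val, y_train, y_val = train_test_split(train_df[predictors], train_df[target], test_size=0.15, random_state=1234)

# Current imbalanced-learn releases use sampling_strategy instead of ratio and
# fit_resample instead of fit_sample; SMOTE no longer accepts kind or m_neighbors.
# A fixed random_state keeps the resampling reproducible.
oversampler = SMOTE(sampling_strategy='auto', random_state=1234, k_neighbors=5)
os_X_train, os_y_train = oversampler.fit_resample(X_train, y_train)

print('Resampled dataset shape {}'.format(Counter(os_y_train)))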
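For intuition about what the oversampler does with k_neighbors, the following is a rough NumPy sketch of the core SMOTE idea: each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbors. It is not the imbalanced-learn implementation; the function name smote_sketch and its parameters are made up for illustration.

```python
import numpy as np

def smote_sketch(X_minority, n_synthetic, k_neighbors=5, seed=0):
    """Create synthetic minority samples by interpolating between neighbors."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    k = min(k_neighbors, n - 1)

    # Pairwise Euclidean distances within the minority class.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base point and one of its k nearest neighbors.
    base = rng.integers(0, n, size=n_synthetic)
    chosen = neighbors[base, rng.integers(0, k, size=n_synthetic)]

    # Place each synthetic point at a random position on the connecting segment.
    gap = rng.random((n_synthetic, 1))
    return X_minority[base] + gap * (X_minority[chosen] - X_minority[base])

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_min, n_synthetic=6, k_neighbors=2)
print(synthetic.shape)  # (6, 2)
```

Because every synthetic point is a convex combination of two real minority points, it always stays inside the region the minority class already occupies.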

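Note: older imbalanced-learn releases return the resampled data as NumPy arrays, so the feature names of the original pandas.DataFrame are lost. In that case the validation data cannot be passed to the model as a DataFrame either; pass X_val.values so the training and validation inputs match. Newer releases return a DataFrame when given one, which keeps the names.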
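If the resampled features do come back as a bare array, one option is to re-attach the original column names. A small sketch follows; the frame and the resampled array are hypothetical stand-ins, not output from a real SMOTE run.

```python
import numpy as np
import pandas as pd

# Hypothetical training frame standing in for X_train.
X_train = pd.DataFrame({'age': [25, 40, 31], 'income': [3.2, 5.1, 4.0]})

# Stand-in for a resampled result that came back as a bare array
# (one extra synthetic row appended).
os_X_train = np.vstack([X_train.to_numpy(), [[33.0, 4.4]]])

# Re-attach the original column names, in the original order.
os_X_train_df = pd.DataFrame(os_X_train, columns=X_train.columns)
print(list(os_X_train_df.columns))
```

This only works when the resampler leaves the column order unchanged, which is the case for row-wise resampling like SMOTE.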

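import numpy as np
from xgboost import XGBClassifier

# Since XGBoost 1.6, eval_metric and early_stopping_rounds are constructor
# arguments; passing them to fit() was removed in 2.0.
# n_jobs and random_state replace the older nthread and seed.
model = XGBClassifier(
    learning_rate=0.1,
    n_estimators=1000,
    max_depth=5,
    min_child_weight=1,
    gamma=0,
    subsample=0.8,
    colsample_bytree=0.8,
    objective='binary:logistic',
    n_jobs=-1,
    scale_pos_weight=1,
    random_state=27,
    eval_metric='auc',
    early_stopping_rounds=3,
)

# Pass plain NumPy arrays on both sides so feature names cannot mismatch.
model.fit(
    np.asarray(os_X_train),
    os_y_train,
    eval_set=[(X_val.values, y_val)],
    verbose=True,
)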

https://www.kaggle.com/ktattan/recall-97-with-smote-random-forest-tsne
2. Undersampling, also called downsampling

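import pandas as pd

def down_sample(df):
    """
    Undersample: keep every row with acc_now_delinq == 1 and a random
    10% of the rows with acc_now_delinq == 0.
    """
    df1 = df[df['acc_now_delinq'] == 1]
    df2 = df[df['acc_now_delinq'] == 0]
    df3 = df2.sample(frac=0.1)
    return pd.concat([df1, df3], ignore_index=True)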
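The function above hard-codes the label column and the 10% fraction. A more general variant might look like the sketch below, which assumes a binary label and samples the majority class down to a chosen multiple of the minority size. The name down_sample_to_ratio and its parameters are hypothetical.

```python
import pandas as pd

def down_sample_to_ratio(df, label_col, majority_per_minority=1.0, random_state=0):
    """Keep all minority rows and sample the majority down to a target ratio."""
    counts = df[label_col].value_counts()
    minority_label = counts.idxmin()
    minority = df[df[label_col] == minority_label]
    majority = df[df[label_col] != minority_label]

    # Never request more majority rows than actually exist.
    n_keep = min(len(majority), int(len(minority) * majority_per_minority))
    sampled = majority.sample(n=n_keep, random_state=random_state)

    # Shuffle so the classes are not stored in two contiguous blocks.
    combined = pd.concat([minority, sampled], ignore_index=True)
    return combined.sample(frac=1.0, random_state=random_state).reset_index(drop=True)

df = pd.DataFrame({'acc_now_delinq': [1] * 5 + [0] * 95, 'x': range(100)})
balanced = down_sample_to_ratio(df, 'acc_now_delinq', majority_per_minority=2.0)
print(balanced['acc_now_delinq'].value_counts().to_dict())
```

As with SMOTE, this should be applied to the training split only, so the validation set keeps the real class distribution.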

https://blog.csdn.net/u010412858/article/details/80151516


Reposted from blog.csdn.net/yuekangwei/article/details/111450825
