Machine Learning Notes - First Experience with the LightGBM Gradient Boosting Framework

1. Overview of the LightGBM framework

        GBDT (Gradient Boosting Decision Tree) is an enduring model in machine learning. Its main idea is to train weak learners (decision trees) iteratively and combine them into a strong model; it trains well and is relatively resistant to overfitting. GBDT is widely used in industry for tasks such as multi-class classification, click-through-rate prediction, and search ranking, and it is also a formidable weapon in data mining competitions: reportedly, more than half of the winning solutions in Kaggle competitions have been based on GBDT.

        LightGBM (Light Gradient Boosting Machine) is a framework that implements the GBDT algorithm using tree-based learning. It has the following advantages:

        1. Faster training speed and higher efficiency.

        2. Lower memory usage.

        3. Better accuracy.

        4. Support for parallel, distributed, and GPU learning.

        5. Capable of handling large-scale data.

        Comparative experiments on public datasets show that LightGBM outperforms existing boosting frameworks in both efficiency and accuracy, while significantly reducing memory consumption. Moreover, distributed learning experiments show that LightGBM can achieve near-linear speedup when training across multiple machines under specific settings.
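        Before the full example below, a minimal sketch of the native training API gives a feel for the library (the data and parameter values here are illustrative assumptions, not tuned settings):

import lightgbm as lgb
import numpy as np

# Toy data; in practice X and y come from your dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

# Wrap the training data in LightGBM's Dataset container
dtrain = lgb.Dataset(X, label=y)

# Core parameters for a binary classification task
params = {
    'objective': 'binary',
    'metric': 'auc',
    'num_leaves': 31,
    'learning_rate': 0.1,
}

# Train 100 boosting rounds with the native API
booster = lgb.train(params, dtrain, num_boost_round=100)
preds = booster.predict(X)  # predicted probabilities of the positive class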

        Documentation

Welcome to LightGBM's documentation!
https://lightgbm.readthedocs.io/en/latest/

        GitHub repository

GitHub - microsoft/LightGBM: A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://github.com/Microsoft/LightGBM

        Quick installation

pip install lightgbm
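        After installation, a quick sanity check that the package imports (a standard Python environment is assumed):

import lightgbm
print(lightgbm.__version__)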

        Related paper

LightGBM: A Highly Efficient Gradient Boosting Decision Tree (NeurIPS 2017)
https://proceedings.neurips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf

2. Simple example

        The example is based on the Kaggle Tabular Playground Series - Feb 2022 competition. For background on the competition, see:

Machine Learning Notes - Kaggle Tabular Playground Feb 2022, Part One. Officially it is a competition for beginners, but you can still see how experienced competitors think and handle the data, which is genuinely instructive.
Tabular Playground Series - Feb 2022 | Kaggle: https://www.kaggle.com/c/tabular-playground-series-feb-2022/overview
https://blog.csdn.net/bashendixie5/article/details/123034023

        Training code

import lightgbm as lgb
import pandas as pd
import pickle

print("LGB test")
clf = lgb.LGBMClassifier(
        boosting_type='gbdt', num_leaves=55, reg_alpha=0.0, reg_lambda=1,
        max_depth=15, n_estimators=6000, objective='binary',
        subsample=0.8, colsample_bytree=0.8, subsample_freq=1,
        learning_rate=0.06, min_child_weight=1, random_state=20, n_jobs=-1
    )

X = pd.read_csv('data/train_data.csv')
label = pd.read_csv('data/train_label.csv')
y = label.target

# Note: log_evaluation only prints metrics when an eval_set is passed to fit()
clf.fit(X, y, callbacks=[lgb.log_evaluation(period=1, show_stdv=True)])
#pre=clf.predict(testdata)

# Save the trained model with pickle
with open('lightgbm_v2.model', 'wb') as f:
    f.write(pickle.dumps(clf))
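        With n_estimators=6000 and no validation data, the fit above runs every round blindly. A minimal sketch of adding a validation split with early stopping, reusing clf, X, and y from the training code (the split ratio and stopping_rounds value are illustrative assumptions):

from sklearn.model_selection import train_test_split

# Hold out 20% of the training data for validation
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=20)

clf.fit(
    X_tr, y_tr,
    eval_set=[(X_val, y_val)],
    eval_metric='auc',
    callbacks=[
        lgb.early_stopping(stopping_rounds=100),  # stop when validation AUC stalls
        lgb.log_evaluation(period=50),            # print metrics every 50 rounds
    ],
)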

        Test code

print("这是lightgbm")
f2 = open('lightgbm_v2.model', 'rb')
s2 = f2.read()
model1 = pickle.loads(s2)
test_X = pd.read_csv('data/test.csv')

predictions = model1.predict(test_X)
preds = []
for pred in predictions:
    preds.append(week_day_dict[pred])

res = pd.DataFrame()
res['target'] = preds
res.to_csv("predict_lightgbm_v2.csv")
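        Note that Kaggle submissions normally need the row identifier alongside the prediction; a minimal sketch, assuming the test CSV carries a row_id column (the column name is an assumption about the data):

# Build the submission with the id column; index=False avoids a stray index column
sub = pd.DataFrame({'row_id': test_X['row_id'], 'target': preds})
sub.to_csv('predict_lightgbm_v2.csv', index=False)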

        The data was used without any preprocessing; after training, the submission scored 0.95169 on Kaggle. The score is mediocre at best, and the hyperparameters need tuning.
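        One common next step is a small grid search over the key hyperparameters; a minimal sketch with scikit-learn, reusing X and y from the training code (the parameter grid is an illustrative assumption):

from sklearn.model_selection import GridSearchCV
import lightgbm as lgb

# Search a few key knobs; widen the grid as compute allows
param_grid = {
    'num_leaves': [31, 55, 127],
    'learning_rate': [0.03, 0.06, 0.1],
    'n_estimators': [500, 1000],
}
search = GridSearchCV(
    lgb.LGBMClassifier(objective='binary', random_state=20, n_jobs=-1),
    param_grid, cv=3, scoring='accuracy',
)
search.fit(X, y)
print(search.best_params_, search.best_score_)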
