PyCaret 3.0 Feature Preview

Click the link below to run the example with one click; no installation or environment setup is required:

PyCaret 3.0 Feature Preview: https://www.heywhale.com/mw/project/62d378a2de3942457a52c905

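Introduction to PyCaret

PyCaret is an open-source, low-code machine learning library for Python. It replaces hundreds of lines of code with just a few, automates the machine learning workflow, and serves as an end-to-end machine learning and model management tool.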

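User guide

Official user guide: Welcome to PyCaret - PyCaret Official

Supported features

PyCaret's modules cover data preprocessing, model training, hyperparameter search, model interpretability, model selection, and experiment log querying.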

Cheat Sheet

Installing the PyCaret 3.0 preview release

!pip install llvmlite==0.38.1 -i https://pypi.tuna.tsinghua.edu.cn/simple --ignore-installed
!pip install pycaret==3.0.0rc3 -i https://pypi.tuna.tsinghua.edu.cn/simple #--ignore-installed

Checking the installation

import pycaret
print(pycaret.__version__)
> 3.0.0.rc3

PyCaret built-in datasets

from pycaret.datasets import get_data
all_datasets = get_data('index')
all_datasets

>

|  | Dataset | Data Types | Default Task | Target Variable 1 | Target Variable 2 | # Instances | # Attributes | Missing Values |
| 0 | anomaly | Multivariate | Anomaly Detection | None | None | 1000 | 10 | N |
| 1 | france | Multivariate | Association Rule Mining | InvoiceNo | Description | 8557 | 8 | N |
| 2 | germany | Multivariate | Association Rule Mining | InvoiceNo | Description | 9495 | 8 | N |
| 3 | bank | Multivariate | Classification (Binary) | deposit | None | 45211 | 17 | N |
| 4 | blood | Multivariate | Classification (Binary) | Class | None | 748 | 5 | N |
| 5 | cancer | Multivariate | Classification (Binary) | Class | None | 683 | 10 | N |
| 6 | credit | Multivariate | Classification (Binary) | default | None | 24000 | 24 | N |
| 7 | diabetes | Multivariate | Classification (Binary) | Class variable | None | 768 | 9 | N |
| 8 | electrical_grid | Multivariate | Classification (Binary) | stabf | None | 10000 | 14 | N |
| 9 | employee | Multivariate | Classification (Binary) | left | None | 14999 | 10 | N |
| 10 | heart | Multivariate | Classification (Binary) | DEATH | None | 200 | 16 | N |
| 11 | heart_disease | Multivariate | Classification (Binary) | Disease | None | 270 | 14 | N |
| 12 | hepatitis | Multivariate | Classification (Binary) | Class | None | 154 | 32 | Y |
| 13 | income | Multivariate | Classification (Binary) | income >50K | None | 32561 | 14 | Y |
| 14 | juice | Multivariate | Classification (Binary) | Purchase | None | 1070 | 15 | N |
| 15 | nba | Multivariate | Classification (Binary) | TARGET_5Yrs | None | 1340 | 21 | N |
| 16 | wine | Multivariate | Classification (Binary) | type | None | 6498 | 13 | N |
| 17 | telescope | Multivariate | Classification (Binary) | Class | None | 19020 | 11 | N |
| 18 | titanic | Multivariate | Classification (Binary) | Survived | None | 891 | 11 | Y |
| 19 | us_presidential_election_results | Multivariate | Classification (Binary) | party_winner | None | 497 | 7 | N |
| 20 | glass | Multivariate | Classification (Multiclass) | Type | None | 214 | 10 | N |
| 21 | iris | Multivariate | Classification (Multiclass) | species | None | 150 | 5 | N |
| 22 | poker | Multivariate | Classification (Multiclass) | CLASS | None | 100000 | 11 | N |
| 23 | questions | Multivariate | Classification (Multiclass) | Next_Question | None | 499 | 4 | N |
| 24 | satellite | Multivariate | Classification (Multiclass) | Class | None | 6435 | 37 | N |
| 25 | CTG | Multivariate | Classification (Multiclass) | NSP | None | 2129 | 40 | Y |
| 26 | asia_gdp | Multivariate | Clustering | None | None | 40 | 11 | N |
| 27 | elections | Multivariate | Clustering | None | None | 3195 | 54 | Y |
| 28 | facebook | Multivariate | Clustering | None | None | 7050 | 12 | N |
| 29 | ipl | Multivariate | Clustering | None | None | 153 | 25 | N |
| 30 | jewellery | Multivariate | Clustering | None | None | 505 | 4 | N |
| 31 | mice | Multivariate | Clustering | None | None | 1080 | 82 | Y |
| 32 | migration | Multivariate | Clustering | None | None | 233 | 12 | N |
| 33 | perfume | Multivariate | Clustering | None | None | 20 | 29 | N |
| 34 | pokemon | Multivariate | Clustering | None | None | 800 | 13 | Y |
| 35 | population | Multivariate | Clustering | None | None | 255 | 56 | Y |
| 36 | public_health | Multivariate | Clustering | None | None | 224 | 21 | N |
| 37 | seeds | Multivariate | Clustering | None | None | 210 | 7 | N |
| 38 | wholesale | Multivariate | Clustering | None | None | 440 | 8 | N |
| 39 | tweets | Text | NLP | tweet | None | 8594 | 2 | N |
| 40 | amazon | Text | NLP / Classification | reviewText | None | 20000 | 2 | N |
| 41 | kiva | Text | NLP / Classification | en | None | 6818 | 7 | N |
| 42 | spx | Text | NLP / Regression | text | None | 874 | 4 | N |
| 43 | wikipedia | Text | NLP / Classification | Text | None | 500 | 3 | N |
| 44 | automobile | Multivariate | Regression | price | None | 202 | 26 | Y |
| 45 | bike | Multivariate | Regression | cnt | None | 17379 | 15 | N |
| 46 | boston | Multivariate | Regression | medv | None | 506 | 14 | N |
| 47 | concrete | Multivariate | Regression | strength | None | 1030 | 9 | N |
| 48 | diamond | Multivariate | Regression | Price | None | 6000 | 8 | N |
| 49 | energy | Multivariate | Regression | Heating Load | Cooling Load | 768 | 10 | N |
| 50 | forest | Multivariate | Regression | area | None | 517 | 13 | N |
| 51 | gold | Multivariate | Regression | Gold_T+22 | None | 2558 | 121 | N |
| 52 | house | Multivariate | Regression | SalePrice | None | 1461 | 81 | Y |
| 53 | insurance | Multivariate | Regression | charges | None | 1338 | 7 | N |
| 54 | parkinsons | Multivariate | Regression | PPE | None | 5875 | 22 | N |
| 55 | traffic | Multivariate | Regression | traffic_volume | None | 48204 | 8 | N |

PyCaret time series forecasting

Importing the modules

# Import PyCaret's built-in dataset loader
from pycaret.datasets import get_data
# Import PyCaret's time series forecasting module (new in 3.x)
from pycaret.time_series import *

import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

Loading the data

# Use the built-in airline dataset
data = get_data('airline')
data

Period
1949-01    112.0
1949-02    118.0
1949-03    132.0
1949-04    129.0
1949-05    121.0
Freq: M, Name: Number of airline passengers, dtype: float64

>

Period
1949-01    112.0
1949-02    118.0
1949-03    132.0
1949-04    129.0
1949-05    121.0
           ...  
1960-08    606.0
1960-09    508.0
1960-10    461.0
1960-11    390.0
1960-12    432.0
Freq: M, Name: Number of airline passengers, Length: 144, dtype: float64

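Exploratory data analysis

# Plot the time series
data.plot()

>

# Statistical tests. check_stats() reads the active experiment, so run setup() (next section) first.
check_stats()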

>

|  | Test | Test Name | Data | Property | Setting | Value |
| 0 | Summary | Statistics | Transformed | Length |  | 144.0 |
| 1 | Summary | Statistics | Transformed | # Missing Values |  | 0.0 |
| 2 | Summary | Statistics | Transformed | Mean |  | 280.298611 |
| 3 | Summary | Statistics | Transformed | Median |  | 265.5 |
| 4 | Summary | Statistics | Transformed | Standard Deviation |  | 119.966317 |
| 5 | Summary | Statistics | Transformed | Variance |  | 14391.917201 |
| 6 | Summary | Statistics | Transformed | Kurtosis |  | -0.364942 |
| 7 | Summary | Statistics | Transformed | Skewness |  | 0.58316 |
| 8 | Summary | Statistics | Transformed | # Distinct Values |  | 118.0 |
| 9 | White Noise | Ljung-Box | Transformed | Test Statictic | {'alpha': 0.05, 'K': 24} | 1606.083817 |
| 10 | White Noise | Ljung-Box | Transformed | Test Statictic | {'alpha': 0.05, 'K': 48} | 1933.155822 |
| 11 | White Noise | Ljung-Box | Transformed | p-value | {'alpha': 0.05, 'K': 24} | 0.0 |
| 12 | White Noise | Ljung-Box | Transformed | p-value | {'alpha': 0.05, 'K': 48} | 0.0 |
| 13 | White Noise | Ljung-Box | Transformed | White Noise | {'alpha': 0.05, 'K': 24} | False |
| 14 | White Noise | Ljung-Box | Transformed | White Noise | {'alpha': 0.05, 'K': 48} | False |
| 15 | Stationarity | ADF | Transformed | Stationarity | {'alpha': 0.05} | False |
| 16 | Stationarity | ADF | Transformed | p-value | {'alpha': 0.05} | 0.99188 |
| 17 | Stationarity | ADF | Transformed | Test Statistic | {'alpha': 0.05} | 0.815369 |
| 18 | Stationarity | ADF | Transformed | Critical Value 1% | {'alpha': 0.05} | -3.481682 |
| 19 | Stationarity | ADF | Transformed | Critical Value 5% | {'alpha': 0.05} | -2.884042 |
| 20 | Stationarity | ADF | Transformed | Critical Value 10% | {'alpha': 0.05} | -2.57877 |
| 21 | Stationarity | KPSS | Transformed | Trend Stationarity | {'alpha': 0.05} | True |
| 22 | Stationarity | KPSS | Transformed | p-value | {'alpha': 0.05} | 0.1 |
| 23 | Stationarity | KPSS | Transformed | Test Statistic | {'alpha': 0.05} | 0.09615 |
| 24 | Stationarity | KPSS | Transformed | Critical Value 10% | {'alpha': 0.05} | 0.119 |
| 25 | Stationarity | KPSS | Transformed | Critical Value 5% | {'alpha': 0.05} | 0.146 |
| 26 | Stationarity | KPSS | Transformed | Critical Value 2.5% | {'alpha': 0.05} | 0.176 |
| 27 | Stationarity | KPSS | Transformed | Critical Value 1% | {'alpha': 0.05} | 0.216 |
| 28 | Normality | Shapiro | Transformed | Normality | {'alpha': 0.05} | False |
| 29 | Normality | Shapiro | Transformed | p-value | {'alpha': 0.05} | 0.000068 |
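The Ljung-Box rows in the output above test whether the series behaves like white noise. As a rough illustration of what that statistic measures, here is a minimal pure-Python sketch of the textbook formula Q = n(n+2) * sum(rho_k^2 / (n-k)) over lags k = 1..K. The helper names `autocorrelation` and `ljung_box_q` and the toy linear-trend series are assumptions made for this example; they are not part of PyCaret, and PyCaret's own computation may differ in detail.

```python
from typing import Sequence


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of x at the given lag (hypothetical helper)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(x: Sequence[float], max_lag: int) -> float:
    """Ljung-Box Q statistic over lags 1..max_lag (textbook formula)."""
    n = len(x)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(x) - 1")
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Toy data (not the airline series): a pure linear trend of 144 points.
trend = [float(t) for t in range(144)]
q_trend = ljung_box_q(trend, max_lag=24)

# Under the white-noise null hypothesis, Q is approximately chi-square with
# 24 degrees of freedom; the 5% critical value is about 36.4.
print(q_trend > 36.415)
```

A strongly trending series gives a Q value far above the critical value, which is the same kind of evidence that leads check_stats() to report White Noise = False for the airline data.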

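Finding the best model

# Initialize the experiment (forecast horizon of 12 months, fixed random seed)
s = setup(data, fh = 12, session_id = 123)
# Compare all available models
best = compare_models()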

|  | Description | Value |
| 0 | session_id | 123 |
| 1 | Target | Number of airline passengers |
| 2 | Approach | Univariate |
| 3 | Exogenous Variables | Not Present |
| 4 | Original data shape | (144, 1) |
| 5 | Transformed data shape | (144, 1) |
| 6 | Transformed train set shape | (132, 1) |
| 7 | Transformed test set shape | (12, 1) |
| 8 | Rows with missing values | 0.0% |
| 9 | Fold Generator | ExpandingWindowSplitter |
| 10 | Fold Number | 3 |
| 11 | Enforce Prediction Interval | False |
| 12 | Seasonal Period(s) Tested | 12 |
| 13 | Seasonality Present | True |
| 14 | Seasonalities Detected | [12] |
| 15 | Primary Seasonality | 12 |
| 16 | Target Strictly Positive | True |
| 17 | Target White Noise | No |
| 18 | Recommended d | 1 |
| 19 | Recommended Seasonal D | 1 |
| 20 | Preprocess | False |
| 21 | CPU Jobs | -1 |
| 22 | Use GPU | False |
| 23 | Log Experiment | False |
| 24 | Experiment Name | ts-default-name |
| 25 | USI | 98d2 |

|  | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | TT (Sec) |
| exp_smooth | Exponential Smoothing | 0.5716 | 0.5997 | 16.7767 | 19.7954 | 0.0422 | 0.0427 | 0.8954 | 0.0400 |
| ets | ETS | 0.5931 | 0.6212 | 17.4172 | 20.5108 | 0.0440 | 0.0445 | 0.8882 | 0.0767 |
| et_cds_dt | Extra Trees w/ Cond. Deseasonalize & Detrending | 0.6602 | 0.7288 | 19.4653 | 24.1050 | 0.0484 | 0.0484 | 0.8459 | 0.1167 |
| huber_cds_dt | Huber w/ Cond. Deseasonalize & Detrending | 0.6813 | 0.7866 | 20.0334 | 25.9670 | 0.0491 | 0.0499 | 0.8113 | 0.0267 |
| arima | ARIMA | 0.6830 | 0.6735 | 20.0069 | 22.2199 | 0.0501 | 0.0507 | 0.8677 | 0.0900 |
| lr_cds_dt | Linear w/ Cond. Deseasonalize & Detrending | 0.7004 | 0.7702 | 20.6084 | 25.4401 | 0.0509 | 0.0514 | 0.8215 | 0.0267 |
| ridge_cds_dt | Ridge w/ Cond. Deseasonalize & Detrending | 0.7004 | 0.7703 | 20.6086 | 25.4405 | 0.0509 | 0.0514 | 0.8215 | 0.0233 |
| lar_cds_dt | Least Angular Regressor w/ Cond. Deseasonalize & Detrending | 0.7004 | 0.7702 | 20.6084 | 25.4401 | 0.0509 | 0.0514 | 0.8215 | 0.0233 |
| en_cds_dt | Elastic Net w/ Cond. Deseasonalize & Detrending | 0.7029 | 0.7732 | 20.6816 | 25.5362 | 0.0511 | 0.0516 | 0.8201 | 0.0267 |
| lasso_cds_dt | Lasso w/ Cond. Deseasonalize & Detrending | 0.7048 | 0.7751 | 20.7373 | 25.6005 | 0.0512 | 0.0517 | 0.8193 | 0.0200 |
| catboost_cds_dt | CatBoost Regressor w/ Cond. Deseasonalize & Detrending | 0.7106 | 0.8146 | 20.9112 | 26.8907 | 0.0505 | 0.0509 | 0.8085 | 0.9433 |
| br_cds_dt | Bayesian Ridge w/ Cond. Deseasonalize & Detrending | 0.7112 | 0.7837 | 20.9213 | 25.8795 | 0.0515 | 0.0521 | 0.8144 | 0.0233 |
| knn_cds_dt | K Neighbors w/ Cond. Deseasonalize & Detrending | 0.7162 | 0.8157 | 21.1613 | 26.9700 | 0.0521 | 0.0529 | 0.7811 | 0.0300 |
| auto_arima | Auto ARIMA | 0.7181 | 0.7114 | 21.0297 | 23.4661 | 0.0525 | 0.0531 | 0.8509 | 1.6967 |
| gbr_cds_dt | Gradient Boosting w/ Cond. Deseasonalize & Detrending | 0.7938 | 0.9310 | 23.3723 | 30.7344 | 0.0569 | 0.0576 | 0.7417 | 0.0367 |
| xgboost_cds_dt | Extreme Gradient Boosting w/ Cond. Deseasonalize & Detrending | 0.8155 | 0.9591 | 24.0738 | 31.6950 | 0.0582 | 0.0592 | 0.7118 | 152.0800 |
| lightgbm_cds_dt | Light Gradient Boosting w/ Cond. Deseasonalize & Detrending | 0.8156 | 0.9117 | 24.0002 | 30.0956 | 0.0575 | 0.0587 | 0.7561 | 86.6467 |
| rf_cds_dt | Random Forest w/ Cond. Deseasonalize & Detrending | 0.8327 | 0.9465 | 24.5290 | 31.2635 | 0.0600 | 0.0606 | 0.7360 | 0.1400 |
| ada_cds_dt | AdaBoost w/ Cond. Deseasonalize & Detrending | 0.8825 | 1.0292 | 25.9471 | 33.9304 | 0.0619 | 0.0637 | 0.6725 | 0.0500 |
| llar_cds_dt | Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending | 0.9670 | 1.1915 | 28.4499 | 39.3303 | 0.0665 | 0.0693 | 0.5738 | 0.0233 |
| theta | Theta Forecaster | 0.9729 | 1.0306 | 28.3192 | 33.8639 | 0.0670 | 0.0700 | 0.6710 | 0.0167 |
| omp_cds_dt | Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending | 1.0090 | 1.2370 | 29.6294 | 40.8121 | 0.0685 | 0.0718 | 0.5462 | 0.0200 |
| dt_cds_dt | Decision Tree w/ Cond. Deseasonalize & Detrending | 1.0429 | 1.2226 | 30.4800 | 40.1912 | 0.0726 | 0.0753 | 0.5362 | 0.0333 |
| snaive | Seasonal Naive Forecaster | 1.1479 | 1.0945 | 33.3611 | 35.9139 | 0.0832 | 0.0879 | 0.6072 | 0.0133 |
| par_cds_dt | Passive Aggressive w/ Cond. Deseasonalize & Detrending | 1.2472 | 1.3081 | 36.7727 | 43.3215 | 0.0935 | 0.0961 | 0.4968 | 0.0233 |
| polytrend | Polynomial Trend Forecaster | 1.6523 | 1.9202 | 48.6301 | 63.4299 | 0.1170 | 0.1216 | -0.0784 | 0.0100 |
| croston | Croston | 1.9311 | 2.3517 | 56.6180 | 77.5856 | 0.1295 | 0.1439 | -0.6281 | 0.0100 |
| naive | Naive Forecaster | 2.3599 | 2.7612 | 69.0278 | 91.0322 | 0.1569 | 0.1792 | -1.2216 | 0.6467 |
| grand_means | Grand Means Forecaster | 5.5306 | 5.2596 | 162.4117 | 173.6492 | 0.4000 | 0.5075 | -7.0462 | 0.5100 |

# Inspect the best model
best

>

ExponentialSmoothing(seasonal='mul', sp=12, trend='add')

# Train the top-ranked model on its own and show its cross-validation scores
exp_smooth = create_model('exp_smooth')
print(exp_smooth)

>

|  | cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
| 0 | 1956-12 | 0.4985 | 0.5735 | 14.5584 | 18.7730 | 0.0366 | 0.0376 | 0.8853 |
| 1 | 1957-12 | 0.5088 | 0.5368 | 15.5548 | 18.2243 | 0.0420 | 0.0411 | 0.9130 |
| 2 | 1958-12 | 0.7075 | 0.6888 | 20.2167 | 22.3888 | 0.0479 | 0.0494 | 0.8879 |
| Mean | NaT | 0.5716 | 0.5997 | 16.7767 | 19.7954 | 0.0422 | 0.0427 | 0.8954 |
| SD | NaT | 0.0962 | 0.0647 | 2.4663 | 1.8475 | 0.0046 | 0.0049 | 0.0125 |

ExponentialSmoothing(seasonal='mul', sp=12, trend='add')
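The three cutoffs in the cross-validation output (1956-12, 1957-12, 1958-12) are consistent with an expanding-window scheme over the 132-month training set, with 3 folds and a 12-month horizon as reported by setup(). The sketch below reproduces those cutoffs with plain index arithmetic. The function names `expanding_window_folds` and `month_label` are hypothetical, and the rule "initial window = train length minus folds times horizon" is an assumption that happens to match the output; it is not taken from PyCaret's source.

```python
from typing import List, Tuple


def expanding_window_folds(
    n_train: int, fh: int, n_folds: int
) -> List[Tuple[range, range]]:
    """Return (train_indices, validation_indices) for each expanding fold."""
    # Assumption: the first training window leaves room for n_folds
    # validation windows of length fh at the end of the training set.
    initial_window = n_train - n_folds * fh
    if initial_window <= 0:
        raise ValueError("not enough data for the requested folds")
    folds = []
    for k in range(n_folds):
        train_end = initial_window + k * fh  # training window grows by fh
        folds.append((range(0, train_end), range(train_end, train_end + fh)))
    return folds


def month_label(index: int, start_year: int = 1949) -> str:
    """Map a 0-based monthly index to 'YYYY-MM', with index 0 = January."""
    year, month = divmod(index, 12)
    return f"{start_year + year}-{month + 1:02d}"


# 132 training months (1949-01 to 1959-12), fh=12, 3 folds, as in setup().
folds = expanding_window_folds(n_train=132, fh=12, n_folds=3)
cutoffs = [month_label(train[-1]) for train, _ in folds]
print(cutoffs)  # ['1956-12', '1957-12', '1958-12']
```

Each fold trains on everything up to its cutoff and validates on the following 12 months, so later folds see more history than earlier ones.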
# Predict on the hold-out test set and evaluate the forecast
pred = predict_model(exp_smooth)

>

|  | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
| 0 | Exponential Smoothing | 0.3383 | 0.4576 | 10.3023 | 15.8096 | 0.0221 | 0.0216 | 0.9549 |
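MASE is the first metric in these tables, so a small worked example may help in reading it. The sketch below uses the commonly cited definition: the forecast's mean absolute error divided by the in-sample mean absolute error of a (seasonal) naive forecast. The function names and the toy numbers are made up for illustration, and it is an assumption that PyCaret's reported MASE follows exactly this scaling.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mase(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    y_train: Sequence[float],
    sp: int = 1,
) -> float:
    """Mean absolute scaled error with a seasonal-naive scale of period sp."""
    if len(y_train) <= sp:
        raise ValueError("y_train must be longer than the seasonal period")
    # In-sample error of predicting each point with the value sp steps earlier.
    naive_errors = [abs(y_train[t] - y_train[t - sp]) for t in range(sp, len(y_train))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("naive forecast is perfect in-sample; MASE is undefined")
    return mean_absolute_error(y_true, y_pred) / scale


# Toy numbers (not the airline data).
y_train = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
y_true = [22.0, 24.0]
y_pred = [21.0, 25.0]

print(mase(y_true, y_pred, y_train, sp=1))  # MAE 1.0 / scale 2.0 = 0.5
print(mase(y_true, y_pred, y_train, sp=2))  # MAE 1.0 / scale 4.0 = 0.25
```

Under this definition, a value below 1 means the forecast errs less, on average, than the naive baseline did on the training data, which is one way to read the 0.3383 reported for the hold-out set.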


Reposted from blog.csdn.net/qazwsxpy/article/details/125845162