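Ensemble learning: Bagging, random forests, Boosting, GBDT
5.1 XGBoost algorithm principles
XGBoost (Extreme Gradient Boosting, literally "extreme gradient boosted trees") is a flagship ensemble learning method; many winning solutions in Kaggle data mining competitions have used it.
XGBoost performs at or near the top on a wide range of regression and classification problems. This section introduces the principles of the algorithm in some detail.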
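1 How an optimal model is built
As covered earlier, the usual way to build an optimal model is to minimize a loss function over the training data.
Writing the loss as L and the N training samples as (x_i, y_i), this is:
$$\min_{f \in F} \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i)) \qquad (1.1)$$
where F is the hypothesis space.
The hypothesis space is the exhaustive set of all hypotheses that could satisfy the target, given the known attributes and the values they can take.
Equation (1.1) is called empirical risk minimization. Models trained this way tend to be complex, and when the training set is small they overfit easily.
To keep model complexity down, the following form is often used instead:
$$\min_{f \in F} \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i)) + J(f) \qquad (2.1)$$
where J(f) is the complexity of the model.
Equation (2.1) is called structural risk minimization. Models chosen this way tend to predict well on both the training data and unseen test data.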
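Applications:
- Growing and pruning a decision tree correspond to empirical risk minimization and structural risk minimization, respectively.
- XGBoost builds its decision trees by structural risk minimization, as described in detail later.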
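2 Deriving the XGBoost objective function
2.1 Defining the objective function
The objective function here is the loss function; the optimal model is built by minimizing it.
As shown above, the loss function should include a regularization term that represents model complexity. Because an XGBoost model consists of multiple CART trees, the objective function of the model is: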
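In the standard XGBoost formulation, an objective of this kind is usually written as below. The symbols are assumptions of this sketch rather than notation taken from the text above: n is the number of training samples, K the number of trees, f_k the k-th CART tree, \hat{y}_i the model's prediction for sample i, and \Omega the complexity of a single tree.

$$
\mathrm{Obj} = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \hat{y}_i = \sum_{k=1}^{K} f_k(x_i)
$$

The first sum is the training loss; the second is the regularization term, which penalizes the complexity of every tree in the model.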
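2.2 Introduction to CART trees
2.3 Defining tree complexity
2.3.1 Complexity of each tree
An XGBoost model consists of multiple CART trees. The complexity of each tree is defined as: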
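A common definition, and the one used in the XGBoost paper, is sketched below. The symbols are assumptions of this sketch: T is the number of leaves in the tree, w_j is the score of the j-th leaf, and \gamma and \lambda are penalty coefficients (they correspond to the gamma and lambda parameters of the library).

$$
\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2
$$

For instance, a hypothetical tree with three leaves scoring 2, 0.1 and -1 would have \Omega = 3\gamma + \frac{1}{2}\lambda(4 + 0.01 + 1) = 3\gamma + 2.505\lambda.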
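2.3.2 A tree complexity example
Suppose we want to predict how much each member of a family likes video games. Younger people are more likely to enjoy video games than older people, and males more than females, so we first split on age to separate children from adults, then split on sex, and assign each person a score for how much they like video games, as shown in the figure below:
This produces two trees, tree1 and tree2. As with GBDT earlier, the final prediction is the sum of the outputs of the two trees, so:
- The boy's predicted score is the sum of the scores of the leaves he falls into in the two trees: 2 + 0.9 = 2.9.
- The grandfather's predicted score is computed the same way: -1 + (-0.9) = -1.9.
The details are shown in the figure below: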
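The additive prediction in this example can be sketched in a few lines of Python. The dictionaries below are hypothetical stand-ins for the two trees: they only record the leaf score each person receives, using the scores quoted above, and do not model the split rules themselves.

```python
# Hypothetical stand-ins for tree1 and tree2: each maps a person to the
# score of the leaf that person falls into (values taken from the example).
tree1 = {"boy": 2.0, "grandpa": -1.0}
tree2 = {"boy": 0.9, "grandpa": -0.9}


def predict(person, trees):
    """Sum the leaf scores that every tree assigns to this person."""
    return sum(tree[person] for tree in trees)


for person in ("boy", "grandpa"):
    # Round for display only, to hide floating-point noise.
    print(person, round(predict(person, [tree1, tree2]), 1))
```

Adding a third tree would only mean appending another dictionary to the list, which is the sense in which a boosted model is an additive ensemble.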
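2.4 Deriving the objective function
3 Building regression trees in XGBoost
3.1 Computing split nodes
In practice, when building the t-th tree, XGBoost splits tree nodes greedily:
Starting from a tree of depth 0: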
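- Try to split every leaf node in the tree.
- Each split turns one leaf node into a left and a right child leaf, and the samples in the original leaf are distributed to the two children according to the node's decision rule.
- After each new split, check whether it brings a gain to the loss function. The gain is defined as follows: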
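In the standard XGBoost derivation the gain of a split is usually written as below. The symbols are assumptions of this sketch: G_L and H_L are the sums of the first and second derivatives of the loss over the samples in the left child, G_R and H_R are the same sums for the right child, \lambda is the L2 penalty on leaf scores, and \gamma is the penalty for adding a leaf.

$$
\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma
$$

The bracket compares the scores of the two children with the score of the unsplit leaf, and \gamma is the price paid for the extra leaf.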
If Gain > 0, that is, the objective function decreases after splitting into two leaf nodes, the split is accepted as a candidate.
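The greedy check described above can be sketched in Python as follows. This is a minimal illustration on one feature, not the library's implementation: the function names are hypothetical, the gain uses the standard expression 0.5 * [G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda) - (G_L+G_R)^2/(H_L+H_R+lambda)] - gamma, and the toy gradients assume squared-error loss with an initial prediction of 0 (so g = -y and h = 1).

```python
def leaf_score(G, H, lam):
    """Score term G^2 / (H + lambda) contributed by one leaf."""
    return G * G / (H + lam)


def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of splitting one leaf into a left and a right child."""
    G_L, H_L = sum(g_left), sum(h_left)
    G_R, H_R = sum(g_right), sum(h_right)
    parent = leaf_score(G_L + G_R, H_L + H_R, lam)
    children = leaf_score(G_L, H_L, lam) + leaf_score(G_R, H_R, lam)
    return 0.5 * (children - parent) - gamma


def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Scan all thresholds on one feature; return (threshold, gain).

    Returns (None, 0.0) when no split has a positive gain.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = (None, 0.0)
    for k in range(1, len(order)):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # no threshold can separate equal values
        left, right = order[:k], order[k:]
        gain = split_gain([g[i] for i in left], [h[i] for i in left],
                          [g[i] for i in right], [h[i] for i in right],
                          lam, gamma)
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best


# Toy data: squared-error loss, initial prediction 0, so g = -y and h = 1.
x = [1, 2, 3, 10, 11, 12]
y = [1, 1, 1, 5, 5, 5]
g = [-v for v in y]
h = [1.0] * len(y)
print(best_split(x, g, h))
```

Raising gamma makes every split more expensive; with a large enough value no candidate has a positive gain and the leaf is left unsplit.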
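If splitting continues like this, when does it stop?
3.2 Stopping conditions for splitting
Case 1: The scoring function derived in the previous section measures how good a tree structure is, so it can be used to choose the best split point. First enumerate all candidate split points of the sample features, then split at each candidate; the criterion for judging a split is: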
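4 Differences between XGBoost and GBDT
- Difference 1:
  - XGBoost takes tree complexity into account while growing each CART tree.
  - GBDT does not; it considers tree complexity only in the pruning step.
- Difference 2:
  - XGBoost fits a second-order expansion of the previous round's loss function, whereas GBDT fits a first-order expansion. XGBoost therefore approximates the loss more accurately and typically needs fewer iterations to reach the same training result.
- Difference 3:
  - Both XGBoost and GBDT improve the model through successive iterations, but XGBoost can use multiple threads when searching for the best split point, which greatly speeds up training.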
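To make the second-order idea in Difference 2 more concrete, the sketch below computes the first derivative (gradient) and second derivative (hessian) of two common losses with respect to the current prediction. The function names are hypothetical, and the leaf-weight formula w = -G / (H + lambda) is the standard result of the second-order derivation, stated here as an assumption because the derivation itself is not reproduced in this section.

```python
import math


def squared_error_grad_hess(y, pred):
    """Derivatives of 0.5 * (pred - y)^2 with respect to pred."""
    return pred - y, 1.0


def logistic_grad_hess(y, margin):
    """Derivatives of the log loss with respect to the raw margin."""
    p = 1.0 / (1.0 + math.exp(-margin))
    return p - y, p * (1.0 - p)


def optimal_leaf_weight(grads, hessians, lam=1.0):
    """Leaf score that minimizes the second-order approximation."""
    return -sum(grads) / (sum(hessians) + lam)


# Two positive samples whose current margin is 0 (predicted probability 0.5).
pairs = [logistic_grad_hess(1, 0.0), logistic_grad_hess(1, 0.0)]
grads = [g for g, _ in pairs]
hessians = [h for _, h in pairs]
print(grads, hessians, optimal_leaf_weight(grads, hessians))
```

A first-order method would use only the gradients; the hessians tell the booster how strongly the loss curves, which is what lets it choose a leaf score directly instead of relying on a fixed step.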
5 Summary
5.2 XGBoost API overview
1 Installing xgboost:
Official documentation: https://xgboost.readthedocs.io/en/latest/
pip3 install xgboost
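2 XGBoost parameters
Although XGBoost is known as a powerful tool for Kaggle competitions, training a good model still requires passing suitable values to its parameters.
XGBoost exposes many parameters, which fall into three groups: general parameters, booster parameters, and learning task parameters.
- General parameters: control the overall behavior;
- Booster parameters: depend on the chosen booster and control the booster (tree or regression) at each step;
- Learning task parameters: control the training objective and how it is evaluated.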
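2.1 General parameters
- booster [default=gbtree]
  - Selects which booster to use: gbtree, gblinear, or dart.
  - gbtree and dart use tree-based models (dart mainly adds dropout), while gblinear uses linear functions.
- silent [default=0]
  - 0 prints running messages; 1 enables silent mode with no output. Recent XGBoost releases deprecate this parameter in favor of verbosity.
- nthread [default=maximum number of threads available]
  - Number of parallel threads used to run XGBoost. The value should be <= the number of CPU cores on the system; if it is not set, the algorithm detects the core count and uses all of them.
The following two parameters do not need to be set; the defaults are fine.
- num_pbuffer [set automatically by XGBoost; no need for the user to set it]
  - Size of the prediction buffer, normally set to the number of training instances. The buffer stores the prediction results of the last boosting step.
- num_feature [set automatically by XGBoost; no need for the user to set it]
  - Feature dimension used in boosting, set to the maximum feature dimension.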
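2.2 Booster parameters
2.2.1 Parameters for Tree Booster
- eta [default=0.3, alias: learning_rate]
  - Step size shrinkage applied in each update to prevent overfitting.
  - After each boosting step the weights of the new features can be obtained directly; eta shrinks these weights to make the boosting process more conservative.
  - Range: [0,1]
- gamma [default=0, alias: min_split_loss] (minimum loss reduction for a split)
  - A node is split only if the split reduces the loss function.
  - gamma specifies the minimum loss reduction required to split a node. The larger the value, the more conservative the algorithm. The appropriate value depends closely on the loss function, so it needs tuning.
  - Range: [0,∞]
- max_depth [default=6]
  - Maximum depth of a tree, also used to control overfitting. The larger max_depth is, the more specific and local the patterns the model learns. 0 means no limit.
  - Range: [0,∞]
- min_child_weight [default=1]
  - Minimum sum of instance weights required in a child (leaf) node.
  - Larger values keep the model from learning patterns specific to a few local samples, but a value that is too high leads to underfitting. Tune this parameter with cross-validation.
  - Range: [0,∞]
- subsample [default=1]
  - Fraction of the training samples randomly drawn for each tree.
  - Lower values make the algorithm more conservative and help prevent overfitting, but a value that is too small may cause underfitting.
  - Typical values: 0.5-1. A value of 0.5 means half of the training data is randomly sampled for each tree.
  - Range: (0,1]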
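- colsample_bytree [default=1]
  - Fraction of columns (each column is a feature) randomly sampled for each tree.
  - Typical values: 0.5-1
  - Range: (0,1]
- colsample_bylevel [default=1]
  - Fraction of columns sampled for each level of the tree.
  - I rarely use this parameter myself, because subsample and colsample_bytree serve a similar purpose, but it may be worth exploring if you are interested.
  - Range: (0,1]
- lambda [default=1, alias: reg_lambda]
  - L2 regularization term on the weights (similar to Ridge regression).
  - This parameter controls the regularization part of XGBoost. Many data scientists rarely touch it, but it can still be useful for reducing overfitting.
- alpha [default=0, alias: reg_alpha]
  - L1 regularization term on the weights (similar to Lasso regression). It can be useful with very high-dimensional data, where it makes the algorithm run faster.
- scale_pos_weight [default=1]
  - When the classes are highly imbalanced, setting this to a positive value can help the algorithm converge faster. A common choice is the ratio of the number of negative samples to the number of positive samples.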
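The suggested value for scale_pos_weight is simple to compute from the labels. The helper below is a hypothetical illustration, assuming binary labels encoded as 0 (negative) and 1 (positive); it is not part of the xgboost API.

```python
from collections import Counter


def suggested_scale_pos_weight(labels):
    """Ratio of negative samples to positive samples (labels are 0 or 1)."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive samples in labels")
    return counts[0] / counts[1]


# Toy imbalanced label vector: 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
print(suggested_scale_pos_weight(labels))
```

The result would then be passed as scale_pos_weight when configuring the model; treat it as a starting point for tuning rather than a fixed rule.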
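2.2.2 Parameters for Linear Booster
The linear booster is rarely used.
- lambda [default=0, alias: reg_lambda]
  - L2 regularization penalty coefficient; increasing it makes the model more conservative.
- alpha [default=0, alias: reg_alpha]
  - L1 regularization penalty coefficient; increasing it makes the model more conservative.
- lambda_bias [default=0, alias: reg_lambda_bias]
  - L2 regularization on the bias term (there is no L1 regularization on the bias because it is not important).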
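The general and tree booster parameters described so far are typically collected in a plain dictionary and handed to the native training API. The sketch below shows one possible configuration; the values are illustrative rather than recommended, the synthetic data is made up for the example, and the training step only runs if numpy and xgboost are installed.

```python
# Illustrative parameter values only; tune them for a real problem.
params = {
    "booster": "gbtree",
    "eta": 0.3,
    "gamma": 0,
    "max_depth": 6,
    "min_child_weight": 1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "lambda": 1,
    "alpha": 0,
    "objective": "binary:logistic",
    "eval_metric": "logloss",
    "seed": 0,
}

try:
    import numpy as np
    import xgboost as xgb
except ImportError:  # keep the sketch importable without the libraries
    xgb = None

if xgb is not None:
    # Synthetic binary classification data, made up for this example.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    dtrain = xgb.DMatrix(X, label=y)
    booster = xgb.train(params, dtrain, num_boost_round=20)
    preds = booster.predict(dtrain)  # probabilities for binary:logistic
    print(preds[:5])
```

With the scikit-learn wrapper used later in this chapter, the same settings would be passed as keyword arguments to XGBClassifier instead of a dictionary.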
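2.3 Learning task parameters
- objective [default=reg:linear]
  - "reg:linear": linear regression (renamed reg:squarederror in recent XGBoost releases)
  - "reg:logistic": logistic regression
  - "binary:logistic": logistic regression for binary classification; outputs probabilities
  - "multi:softmax": multiclass classification using softmax; returns the predicted class (not probabilities). In this case you also need to set num_class (the number of classes)
  - "multi:softprob": same as multi:softmax, but returns the probability of each sample belonging to each class.
- eval_metric [default=chosen according to the objective]
  The available options include:
  - "rmse": root mean squared error
  - "mae": mean absolute error
  - "logloss": negative log-likelihood
  - "error": binary classification error rate.
    - Computed as the number of misclassified cases divided by the total number of cases. For prediction, instances with a predicted value above 0.5 are treated as positive and the rest as negative.
  - "error@t": same as error, with the decision threshold set through t
  - "merror": multiclass classification error rate, computed as (wrong cases)/(all cases)
  - "mlogloss": multiclass log loss
  - "auc": area under the curve
- seed [default=0]
  - Random number seed
  - Setting it makes results that depend on randomness reproducible; it can also be used when tuning parameters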
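Several of the evaluation metrics listed above are easy to compute by hand, which helps when checking what a reported number means. The functions below are a plain-Python sketch with hypothetical names, written from the definitions given in the list; they are not the library's own implementations.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def logloss(y_true, prob, eps=1e-15):
    """Negative log-likelihood for binary labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def error_rate(y_true, prob, threshold=0.5):
    """Share of misclassified cases; prob > threshold counts as positive."""
    wrong = sum(int(p > threshold) != t for t, p in zip(y_true, prob))
    return wrong / len(y_true)


y_true = [1, 0, 1, 0]
prob = [0.9, 0.2, 0.4, 0.6]
print(error_rate(y_true, prob))       # "error"
print(error_rate(y_true, prob, 0.3))  # "error@0.3"
print(logloss(y_true, prob))
```

Lowering the threshold from 0.5 to 0.3 rescues the third sample (probability 0.4, true label 1) without changing the others, which is why the two error rates differ on this toy data.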
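5.3 XGBoost case study
1 Background
This case uses the same dataset as the earlier decision tree case.
The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 of the 2224 passengers and crew. The tragedy shocked the international community and led to better safety regulations for ships. One reason the wreck cost so many lives was that there were not enough lifeboats for the passengers and crew. Although surviving the sinking involved some luck, some groups were more likely to survive than others, such as women, children, and the upper class. In this case, we ask you to analyze which kinds of people were likely to survive. In particular, we ask you to apply machine learning tools to predict which passengers survived the tragedy.
The features in the extracted dataset include ticket, survival, passenger class, age, port of embarkation, home.dest, room, boat, and sex.
Data: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt
Observations from the data:
- 1 The passenger class (pclass: 1, 2, 3) is a proxy for socio-economic status.
- 2 The age column has missing values.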
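2 Steps
- 1. Load the data
- 2. Basic data processing
  - 2.1 Select the features and the target
  - 2.2 Handle missing values
  - 2.3 Split the dataset
- 3. Feature engineering (dictionary feature extraction)
- 4. Machine learning (xgboost)
- 5. Model evaluation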
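3 Implementation
- Import the required modules
import pandas as pd
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
- 1. Load the data
# 1. Load the data
titan = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt")
- 2. Basic data processing
  - 2.1 Select the features and the target
# Copy the selected columns so later edits do not act on a view of titan
x = titan[["pclass", "age", "sex"]].copy()
y = titan["survived"]
  - 2.2 Handle missing values
# Missing values must be handled; fill missing ages with the mean age.
# Categorical features are converted later by dictionary feature extraction.
x["age"] = x["age"].fillna(x["age"].mean())
  - 2.3 Split the dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22)
- 3. Feature engineering (dictionary feature extraction)
The features contain categorical values, so they need one-hot encoding (DictVectorizer).
The DataFrame features must first be converted to a list of dicts with x.to_dict(orient="records").
# Convert x to dict records with x.to_dict(orient="records")
# [{"pclass": "1st", "age": 29.00, "sex": "female"}, {}]
transfer = DictVectorizer(sparse=False)
x_train = transfer.fit_transform(x_train.to_dict(orient="records"))
# Reuse the vocabulary fitted on the training set; do not refit on the test set
x_test = transfer.transform(x_test.to_dict(orient="records"))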
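To see what dictionary feature extraction does, and why the test set should be transformed with the vocabulary learned from the training set, here is a plain-Python imitation of the idea. The two functions are hypothetical helpers written for this illustration, not scikit-learn's implementation, and the three records are made up.

```python
def feature_name(key, value):
    """String values become one-hot columns named key=value; numbers keep key."""
    return f"{key}={value}" if isinstance(value, str) else key


def fit_vocabulary(records):
    """Collect the sorted list of column names seen in the training records."""
    return sorted({feature_name(k, v) for rec in records for k, v in rec.items()})


def transform(records, vocabulary):
    """Encode records as rows over a fixed vocabulary; unseen names are ignored."""
    index = {name: j for j, name in enumerate(vocabulary)}
    rows = []
    for rec in records:
        row = [0.0] * len(vocabulary)
        for key, value in rec.items():
            name = feature_name(key, value)
            if name in index:
                row[index[name]] = 1.0 if isinstance(value, str) else float(value)
        rows.append(row)
    return rows


train = [{"pclass": "1st", "age": 29.0, "sex": "female"},
         {"pclass": "3rd", "age": 45.0, "sex": "male"}]
test = [{"pclass": "2nd", "age": 30.0, "sex": "male"}]

vocabulary = fit_vocabulary(train)
print(vocabulary)
print(transform(train, vocabulary))
print(transform(test, vocabulary))
```

Because the test rows are encoded against the training vocabulary, both matrices have the same columns in the same order. Refitting on the test set could produce a different set or order of columns, and the trained model would then read the wrong features.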
- 4. Train and evaluate the xgboost model
# Initial training
from xgboost import XGBClassifier
xg = XGBClassifier()
xg.fit(x_train, y_train)
xg.score(x_test, y_test)
# Tune max_depth (start at 1, because max_depth=0 means no depth limit)
depth_range = range(1, 11)
score = []
for i in depth_range:
    xg = XGBClassifier(eta=1, gamma=0, max_depth=i)
    xg.fit(x_train, y_train)
    s = xg.score(x_test, y_test)
    print(s)
    score.append(s)
# Visualize the results
import matplotlib.pyplot as plt
plt.plot(depth_range, score)
plt.show()
In [1]:
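# 1. Load the data
# 2. Basic data processing
# 2.1 Select the features and the target
# 2.2 Handle missing values
# 2.3 Split the dataset
# 3. Feature engineering (dictionary feature extraction)
# 4. Machine learning (xgboost)
# 5. Model evaluation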
In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_graphviz
In [3]:
# 1. Load the data
titan = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt")
In [4]:
titan
Out[4]:
| | row.names | pclass | survived | name | age | embarked | home.dest | room | ticket | boat | sex |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1st | 1 | Allen, Miss Elisabeth Walton | 29.0000 | Southampton | St Louis, MO | B-5 | 24160 L221 | 2 | female |
1 | 2 | 1st | 0 | Allison, Miss Helen Loraine | 2.0000 | Southampton | Montreal, PQ / Chesterville, ON | C26 | NaN | NaN | female |
2 | 3 | 1st | 0 | Allison, Mr Hudson Joshua Creighton | 30.0000 | Southampton | Montreal, PQ / Chesterville, ON | C26 | NaN | (135) | male |
3 | 4 | 1st | 0 | Allison, Mrs Hudson J.C. (Bessie Waldo Daniels) | 25.0000 | Southampton | Montreal, PQ / Chesterville, ON | C26 | NaN | NaN | female |
4 | 5 | 1st | 1 | Allison, Master Hudson Trevor | 0.9167 | Southampton | Montreal, PQ / Chesterville, ON | C22 | NaN | 11 | male |
5 | 6 | 1st | 1 | Anderson, Mr Harry | 47.0000 | Southampton | New York, NY | E-12 | NaN | 3 | male |
6 | 7 | 1st | 1 | Andrews, Miss Kornelia Theodosia | 63.0000 | Southampton | Hudson, NY | D-7 | 13502 L77 | 10 | female |
7 | 8 | 1st | 0 | Andrews, Mr Thomas, jr | 39.0000 | Southampton | Belfast, NI | A-36 | NaN | NaN | male |
8 | 9 | 1st | 1 | Appleton, Mrs Edward Dale (Charlotte Lamson) | 58.0000 | Southampton | Bayside, Queens, NY | C-101 | NaN | 2 | female |
9 | 10 | 1st | 0 | Artagaveytia, Mr Ramon | 71.0000 | Cherbourg | Montevideo, Uruguay | NaN | NaN | (22) | male |
10 | 11 | 1st | 0 | Astor, Colonel John Jacob | 47.0000 | Cherbourg | New York, NY | NaN | 17754 L224 10s 6d | (124) | male |
11 | 12 | 1st | 1 | Astor, Mrs John Jacob (Madeleine Talmadge Force) | 19.0000 | Cherbourg | New York, NY | NaN | 17754 L224 10s 6d | 4 | female |
12 | 13 | 1st | 1 | Aubert, Mrs Leontine Pauline | NaN | Cherbourg | Paris, France | B-35 | 17477 L69 6s | 9 | female |
13 | 14 | 1st | 1 | Barkworth, Mr Algernon H. | NaN | Southampton | Hessle, Yorks | A-23 | NaN | B | male |
14 | 15 | 1st | 0 | Baumann, Mr John D. | NaN | Southampton | New York, NY | NaN | NaN | NaN | male |
15 | 16 | 1st | 1 | Baxter, Mrs James (Helene DeLaudeniere Chaput) | 50.0000 | Cherbourg | Montreal, PQ | B-58/60 | NaN | 6 | female |
16 | 17 | 1st | 0 | Baxter, Mr Quigg Edmond | 24.0000 | Cherbourg | Montreal, PQ | B-58/60 | NaN | NaN | male |
17 | 18 | 1st | 0 | Beattie, Mr Thomson | 36.0000 | Cherbourg | Winnipeg, MN | C-6 | NaN | NaN | male |
18 | 19 | 1st | 1 | Beckwith, Mr Richard Leonard | 37.0000 | Southampton | New York, NY | D-35 | NaN | 5 | male |
19 | 20 | 1st | 1 | Beckwith, Mrs Richard Leonard (Sallie Monypeny) | 47.0000 | Southampton | New York, NY | D-35 | NaN | 5 | female |
20 | 21 | 1st | 1 | Behr, Mr Karl Howell | 26.0000 | Cherbourg | New York, NY | C-148 | NaN | 5 | male |
21 | 22 | 1st | 0 | Birnbaum, Mr Jakob | 25.0000 | Cherbourg | San Francisco, CA | NaN | NaN | (148) | male |
22 | 23 | 1st | 1 | Bishop, Mr Dickinson H. | 25.0000 | Cherbourg | Dowagiac, MI | B-49 | NaN | 7 | male |
23 | 24 | 1st | 1 | Bishop, Mrs Dickinson H. (Helen Walton) | 19.0000 | Cherbourg | Dowagiac, MI | B-49 | NaN | 7 | female |
24 | 25 | 1st | 1 | Bjornstrm-Steffansson, Mr Mauritz Hakan | 28.0000 | Southampton | Stockholm, Sweden / Washington, DC | NaN | D | male | |
25 | 26 | 1st | 0 | Blackwell, Mr Stephen Weart | 45.0000 | Southampton | Trenton, NJ | NaN | NaN | (241) | male |
26 | 27 | 1st | 1 | Blank, Mr Henry | 39.0000 | Cherbourg | Glen Ridge, NJ | A-31 | NaN | 7 | male |
27 | 28 | 1st | 1 | Bonnell, Miss Caroline | 30.0000 | Southampton | Youngstown, OH | C-7 | NaN | 8 | female |
28 | 29 | 1st | 1 | Bonnell, Miss Elizabeth | 58.0000 | Southampton | Birkdale, England Cleveland, Ohio | C-103 | NaN | 8 | female |
29 | 30 | 1st | 0 | Borebank, Mr John James | NaN | Southampton | London / Winnipeg, MB | D-21/2 | NaN | NaN | male |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1283 | 1284 | 3rd | 0 | Vestrom, Miss Hulda Amanda Adolfina | NaN | NaN | NaN | NaN | NaN | NaN | female |
1284 | 1285 | 3rd | 0 | Vonk, Mr Jenko | NaN | NaN | NaN | NaN | NaN | NaN | male |
1285 | 1286 | 3rd | 0 | Ware, Mr Frederick | NaN | NaN | NaN | NaN | NaN | NaN | male |
1286 | 1287 | 3rd | 0 | Warren, Mr Charles William | NaN | NaN | NaN | NaN | NaN | NaN | male |
1287 | 1288 | 3rd | 0 | Wazli, Mr Yousif | NaN | NaN | NaN | NaN | NaN | NaN | male |
1288 | 1289 | 3rd | 0 | Webber, Mr James | NaN | NaN | NaN | NaN | NaN | NaN | male |
1289 | 1290 | 3rd | 1 | Wennerstrom, Mr August Edvard | NaN | NaN | NaN | NaN | NaN | NaN | male |
1290 | 1291 | 3rd | 0 | Wenzel, Mr Linhart | NaN | NaN | NaN | NaN | NaN | NaN | male |
1291 | 1292 | 3rd | 0 | Widegren, Mr Charles Peter | NaN | NaN | NaN | NaN | NaN | NaN | male |
1292 | 1293 | 3rd | 0 | Wiklund, Mr Jacob Alfred | NaN | NaN | NaN | NaN | NaN | NaN | male |
1293 | 1294 | 3rd | 1 | Wilkes, Mrs Ellen | NaN | NaN | NaN | NaN | NaN | NaN | female |
1294 | 1295 | 3rd | 0 | Willer, Mr Aaron | NaN | NaN | NaN | NaN | NaN | NaN | male |
1295 | 1296 | 3rd | 0 | Willey, Mr Edward | NaN | NaN | NaN | NaN | NaN | NaN | male |
1296 | 1297 | 3rd | 0 | Williams, Mr Howard Hugh | NaN | NaN | NaN | NaN | NaN | NaN | male |
1297 | 1298 | 3rd | 0 | Williams, Mr Leslie | NaN | NaN | NaN | NaN | NaN | NaN | male |
1298 | 1299 | 3rd | 0 | Windelov, Mr Einar | NaN | NaN | NaN | NaN | NaN | NaN | male |
1299 | 1300 | 3rd | 0 | Wirz, Mr Albert | NaN | NaN | NaN | NaN | NaN | NaN | male |
1300 | 1301 | 3rd | 0 | Wiseman, Mr Phillippe | NaN | NaN | NaN | NaN | NaN | NaN | male |
1301 | 1302 | 3rd | 0 | Wittevrongel, Mr Camiel | NaN | NaN | NaN | NaN | NaN | NaN | male |
1302 | 1303 | 3rd | 1 | Yalsevac, Mr Ivan | NaN | NaN | NaN | NaN | NaN | NaN | male |
1303 | 1304 | 3rd | 0 | Yasbeck, Mr Antoni | NaN | NaN | NaN | NaN | NaN | NaN | male |
1304 | 1305 | 3rd | 1 | Yasbeck, Mrs Antoni | NaN | NaN | NaN | NaN | NaN | NaN | female |
1305 | 1306 | 3rd | 0 | Youssef, Mr Gerios | NaN | NaN | NaN | NaN | NaN | NaN | male |
1306 | 1307 | 3rd | 0 | Zabour, Miss Hileni | NaN | NaN | NaN | NaN | NaN | NaN | female |
1307 | 1308 | 3rd | 0 | Zabour, Miss Tamini | NaN | NaN | NaN | NaN | NaN | NaN | female |
1308 | 1309 | 3rd | 0 | Zakarian, Mr Artun | NaN | NaN | NaN | NaN | NaN | NaN | male |
1309 | 1310 | 3rd | 0 | Zakarian, Mr Maprieder | NaN | NaN | NaN | NaN | NaN | NaN | male |
1310 | 1311 | 3rd | 0 | Zenn, Mr Philip | NaN | NaN | NaN | NaN | NaN | NaN | male |
1311 | 1312 | 3rd | 0 | Zievens, Rene | NaN | NaN | NaN | NaN | NaN | NaN | female |
1312 | 1313 | 3rd | 0 | Zimmerman, Leo | NaN | NaN | NaN | NaN | NaN | NaN | male |
1313 rows × 11 columns
In [5]:
titan.describe()
Out[5]:
| | row.names | survived | age |
---|---|---|---|
count | 1313.000000 | 1313.000000 | 633.000000 |
mean | 657.000000 | 0.341965 | 31.194181 |
std | 379.174762 | 0.474549 | 14.747525 |
min | 1.000000 | 0.000000 | 0.166700 |
25% | 329.000000 | 0.000000 | 21.000000 |
50% | 657.000000 | 0.000000 | 30.000000 |
75% | 985.000000 | 1.000000 | 41.000000 |
max | 1313.000000 | 1.000000 | 71.000000 |
In [6]:
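# 2. Basic data processing
# 2.1 Select the features and the target
# Copy the selected columns so later edits do not act on a view of titan
x = titan[["pclass", "age", "sex"]].copy()
y = titan["survived"]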
In [7]:
x.head()
Out[7]:
| | pclass | age | sex |
---|---|---|---|
0 | 1st | 29.0000 | female |
1 | 1st | 2.0000 | female |
2 | 1st | 30.0000 | male |
3 | 1st | 25.0000 | female |
4 | 1st | 0.9167 | male |
In [8]:
y.head()
Out[8]:
0 1
1 0
2 0
3 0
4 1
Name: survived, dtype: int64
In [9]:
# 2.2 Handle missing values: fill missing ages with the mean age
x["age"] = x["age"].fillna(titan["age"].mean())
In [10]:
x.head()
Out[10]:
| | pclass | age | sex |
---|---|---|---|
0 | 1st | 29.0000 | female |
1 | 1st | 2.0000 | female |
2 | 1st | 30.0000 | male |
3 | 1st | 25.0000 | female |
4 | 1st | 0.9167 | male |
In [11]:
# 2.3 Split the dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22, test_size=0.2)
In [12]:
# 3. Feature engineering (dictionary feature extraction)
In [13]:
x_train.head()
Out[13]:
| | pclass | age | sex |
---|---|---|---|
649 | 3rd | 45.000000 | female |
1078 | 3rd | 31.194181 | male |
59 | 1st | 31.194181 | female |
201 | 1st | 18.000000 | male |
61 | 1st | 31.194181 | female |
In [14]:
x_train = x_train.to_dict(orient="records")
x_test = x_test.to_dict(orient="records")
In [15]:
x_train
Out[15]:
[{'pclass': '3rd', 'age': 45.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 18.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 6.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 27.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 4.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 13.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 30.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 22.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 49.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 62.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 32.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 64.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 55.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
{'pclass': '1st', 'age': 6.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 10.0, 'sex': 'female'},
{'pclass': '1st', 'age': 53.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
{'pclass': '1st', 'age': 19.0, 'sex': 'male'},
{'pclass': '1st', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 17.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 21.0, 'sex': 'female'},
{'pclass': '1st', 'age': 25.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '1st', 'age': 21.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 48.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 46.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 29.0, 'sex': 'female'},
{'pclass': '1st', 'age': 35.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 38.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
{'pclass': '1st', 'age': 16.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
{'pclass': '1st', 'age': 33.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 33.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 52.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 35.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 45.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 52.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 20.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 32.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 34.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 33.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 45.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 43.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 59.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 47.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 38.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 51.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 36.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 6.0, 'sex': 'female'},
{'pclass': '1st', 'age': 58.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 4.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 35.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 12.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 19.0, 'sex': 'male'},
{'pclass': '1st', 'age': 64.0, 'sex': 'male'},
{'pclass': '1st', 'age': 27.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
{'pclass': '1st', 'age': 48.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 50.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 34.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 21.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 44.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
{'pclass': '1st', 'age': 39.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 42.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 69.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 2.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
{'pclass': '1st', 'age': 47.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 42.0, 'sex': 'male'},
{'pclass': '1st', 'age': 21.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 48.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 45.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 39.0, 'sex': 'male'},
{'pclass': '1st', 'age': 14.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 54.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 36.0, 'sex': 'female'},
{'pclass': '1st', 'age': 47.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 0.8333, 'sex': 'male'},
{'pclass': '1st', 'age': 53.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 24.0, 'sex': 'female'},
{'pclass': '1st', 'age': 37.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 22.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 29.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '1st', 'age': 55.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 49.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 24.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 54.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 38.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 42.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 52.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 8.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 57.0, 'sex': 'male'},
{'pclass': '1st', 'age': 22.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 45.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 28.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 19.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 24.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 38.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 36.0, 'sex': 'female'},
{'pclass': '1st', 'age': 55.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 25.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '1st', 'age': 29.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 39.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 49.0, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 40.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 6.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 17.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 41.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 61.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 17.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 3.0, 'sex': 'male'},
{'pclass': '1st', 'age': 24.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 41.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 42.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 48.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 50.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 40.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 23.0, 'sex': 'female'},
{'pclass': '1st', 'age': 34.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 39.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 25.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 26.0, 'sex': 'female'},
{'pclass': '1st', 'age': 57.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 39.0, 'sex': 'male'},
{'pclass': '1st', 'age': 35.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 41.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 67.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 11.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 20.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 50.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 36.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 48.0, 'sex': 'female'},
{'pclass': '1st', 'age': 59.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'female'},
{'pclass': '1st', 'age': 49.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 33.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 46.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 52.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 19.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 43.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'male'},
{'pclass': '1st', 'age': 51.0, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 3.0, 'sex': 'male'},
{'pclass': '1st', 'age': 48.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 48.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 16.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 44.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 37.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 32.0, 'sex': 'male'},
{'pclass': '1st', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 40.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 65.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 37.0, 'sex': 'female'},
{'pclass': '1st', 'age': 52.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 0.8333, 'sex': 'male'},
{'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 27.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 41.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 56.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 40.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 9.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 28.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 48.0, 'sex': 'female'},
{'pclass': '1st', 'age': 36.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 1.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 2.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
{'pclass': '1st', 'age': 29.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 38.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 0.9167, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 39.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 14.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 30.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
{'pclass': '1st', 'age': 60.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
{'pclass': '1st', 'age': 48.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
{'pclass': '1st', 'age': 30.0, 'sex': 'male'},
{'pclass': '1st', 'age': 46.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
{'pclass': '1st', 'age': 27.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 61.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 39.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 0.1667, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 15.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
{'pclass': '1st', 'age': 17.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 42.0, 'sex': 'female'},
{'pclass': '1st', 'age': 20.0, 'sex': 'female'},
{'pclass': '1st', 'age': 62.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 49.0, 'sex': 'male'},
{'pclass': '1st', 'age': 23.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 33.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 70.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 37.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 54.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 51.0, 'sex': 'female'},
{'pclass': '1st', 'age': 21.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 64.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 29.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
{'pclass': '1st', 'age': 50.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 59.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 49.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 38.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 48.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 54.0, 'sex': 'female'},
{'pclass': '1st', 'age': 19.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 3.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 22.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 34.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 28.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 15.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 40.0, 'sex': 'female'},
{'pclass': '1st', 'age': 46.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 8.0, 'sex': 'female'},
{'pclass': '1st', 'age': 63.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 43.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 38.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 1.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.0, 'sex': 'male'},
{'pclass': '1st', 'age': 42.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 38.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 17.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 40.0, 'sex': 'male'},
{'pclass': '1st', 'age': 4.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 29.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 57.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 40.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 47.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 37.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 42.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 5.0, 'sex': 'female'},
{'pclass': '1st', 'age': 21.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 9.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 41.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 36.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
{'pclass': '1st', 'age': 39.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 35.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 45.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 50.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 56.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
{'pclass': '1st', 'age': 24.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 52.0, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'male'},
{'pclass': '1st', 'age': 11.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 26.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 49.0, 'sex': 'male'},
{'pclass': '1st', 'age': 18.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 35.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 32.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 45.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 26.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 21.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'male'},
{'pclass': '1st', 'age': 27.0, 'sex': 'male'},
{'pclass': '1st', 'age': 24.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 18.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 56.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 64.0, 'sex': 'male'},
{'pclass': '1st', 'age': 46.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
{'pclass': '1st', 'age': 46.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 29.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
{'pclass': '1st', 'age': 33.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 0.8333, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 58.0, 'sex': 'female'},
{'pclass': '1st', 'age': 60.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 44.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 71.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 13.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 58.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 4.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 33.0, 'sex': 'female'},
{'pclass': '1st', 'age': 33.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 48.0, 'sex': 'male'},
{'pclass': '1st', 'age': 28.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 55.0, 'sex': 'female'},
{'pclass': '1st', 'age': 54.0, 'sex': 'male'},
{'pclass': '1st', 'age': 71.0, 'sex': 'male'},
{'pclass': '1st', 'age': 47.0, 'sex': 'female'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
{'pclass': '1st', 'age': 23.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
{'pclass': '1st', 'age': 54.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 17.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 6.0, 'sex': 'male'},
{'pclass': '1st', 'age': 45.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 55.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '1st', 'age': 65.0, 'sex': 'male'},
{'pclass': '1st', 'age': 27.0, 'sex': 'male'},
{'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 22.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '2nd', 'age': 7.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 39.0, 'sex': 'female'},
{'pclass': '1st', 'age': 19.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
{'pclass': '1st', 'age': 56.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 38.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'female'},
{'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 42.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '2nd', 'age': 16.0, 'sex': 'male'},
{'pclass': '2nd', 'age': 42.0, 'sex': 'male'},
{'pclass': '3rd', 'age': 2.0, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
{'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
{'pclass': '1st', 'age': 36.0, 'sex': 'male'},
...]
In [16]:
transfer = DictVectorizer()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)  # reuse the vocabulary fitted on the training set
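The vectorizer step can be illustrated with a minimal sketch. The three training rows and one test row below are made-up stand-ins for the passenger dictionaries, and the `*_demo` names are hypothetical. Fitting only on the training rows and then calling `transform` on the test rows keeps both matrices in the same column layout.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical miniature version of the passenger dictionaries.
train_rows = [
    {"pclass": "1st", "age": 36.0, "sex": "male"},
    {"pclass": "2nd", "age": 21.0, "sex": "female"},
    {"pclass": "3rd", "age": 20.0, "sex": "male"},
]
test_rows = [{"pclass": "3rd", "age": 18.0, "sex": "female"}]

vec = DictVectorizer(sparse=False)
x_train_demo = vec.fit_transform(train_rows)  # learns the feature columns
x_test_demo = vec.transform(test_rows)        # reuses the learned columns

print(vec.feature_names_)
print(x_test_demo)
```

String-valued fields are one-hot encoded as `name=value` columns, while numeric fields such as `age` are passed through unchanged.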
In [21]:
# 4. XGBoost model training
# 4.1 Initial model training
from xgboost import XGBClassifier
xg = XGBClassifier()
xg.fit(x_train, y_train)
Out[21]:
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0,
learning_rate=0.1, max_delta_step=0, max_depth=3,
min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
nthread=None, objective='binary:logistic', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=None, subsample=1, verbosity=1)
In [22]:
xg.score(x_test, y_test)
Out[22]:
0.7832699619771863
In [23]:
# 4.2 Tune max_depth
depth_range = range(10)
score = []
for i in depth_range:
    xg = XGBClassifier(eta=1, gamma=0, max_depth=i)
    xg.fit(x_train, y_train)
    s = xg.score(x_test, y_test)
    print(s)
    score.append(s)
0.6311787072243346
0.7908745247148289
0.7870722433460076
0.7832699619771863
0.7870722433460076
0.7908745247148289
0.7908745247148289
0.7946768060836502
0.7908745247148289
0.7946768060836502
In [25]:
# 4.3 Visualize the tuning results
import matplotlib.pyplot as plt
plt.plot(depth_range, score)
plt.show()
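Besides reading the best depth off the plot, it can be picked programmatically. The sketch below simply re-enters the ten accuracy values printed by the loop above; `best_idx` and `best_depth` are illustrative names.

```python
import numpy as np

depth_range = range(10)
# Accuracy values copied from the loop output above, one per max_depth.
score = [
    0.6311787072243346, 0.7908745247148289, 0.7870722433460076,
    0.7832699619771863, 0.7870722433460076, 0.7908745247148289,
    0.7908745247148289, 0.7946768060836502, 0.7908745247148289,
    0.7946768060836502,
]

best_idx = int(np.argmax(score))   # argmax returns the first maximum on ties
best_depth = depth_range[best_idx]
print(best_depth, score[best_idx])  # prints 7 0.7946768060836502
```

Depths 7 and 9 tie here, and the gap to the next-best depths is about 0.0038, so a single held-out split like this one gives only a rough ranking.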
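5.4 Otto case study -- Otto Group Product Classification Challenge (XGBoost implementation)
1 Background
The Otto Group is one of the world's largest e-commerce companies, with subsidiaries in more than 20 countries. It sells millions of products around the world every day, so classifying its products sensibly according to their characteristics is very important.
In practice, however, staff found that many identical products had been assigned to different categories. This case study asks you to classify the Otto Group's products correctly and to make the classification as accurate as possible.
Link: https://www.kaggle.com/c/otto-group-product-classification-challenge/overview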
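2 Approach
- 1. Data loading
- 2. Basic data processing
  - 2.1 Take a subset of the data
  - 2.2 Convert the label values to numbers
  - 2.3 Split the data (using StratifiedShuffleSplit)
  - 2.4 Standardize the data
  - 2.5 Reduce dimensionality with PCA
- 3. Model training
  - 3.1 Baseline model training
  - 3.2 Model tuning
    - 3.2.1 Parameters to tune:
      - n_estimators,
      - max_depth,
      - min_child_weight,
      - subsample,
      - colsample_bytree,
      - eta
    - 3.2.2 Settle on the final parameters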
3 Partial code implementation
- 2. Basic data processing
  - 2.1 Take a subset of the data
  - 2.2 Convert the label values to numbers
  - 2.3 Split the data (using StratifiedShuffleSplit)
# Split the dataset with StratifiedShuffleSplit
from sklearn.model_selection import StratifiedShuffleSplit

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
for train_index, test_index in sss.split(X_resampled.values, y_resampled):
    print(len(train_index))
    print(len(test_index))
    x_train = X_resampled.values[train_index]
    x_val = X_resampled.values[test_index]
    y_train = y_resampled[train_index]
    y_val = y_resampled[test_index]
# Plot the class distribution of the validation split
import seaborn as sns
sns.countplot(x=y_val)
plt.show()
  - 2.4 Standardize the data
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_val_scaled = scaler.transform(x_val)
  - 2.5 Reduce dimensionality with PCA
print(x_train_scaled.shape)
# (13888, 93)
from sklearn.decomposition import PCA

pca = PCA(n_components=0.9)
x_train_pca = pca.fit_transform(x_train_scaled)
x_val_pca = pca.transform(x_val_scaled)
print(x_train_pca.shape, x_val_pca.shape)
# (13888, 65) (3473, 65)
The output above shows that 65 principal components are enough to retain 90% of the variance in the features.
# Plot the cumulative explained variance
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variance ratio")
plt.show()
- 3. Model training
  - 3.1 Baseline model training
from xgboost import XGBClassifier

xgb = XGBClassifier()
xgb.fit(x_train_pca, y_train)
# Output class probabilities instead of hard labels, which lowers the log loss
y_pre_proba = xgb.predict_proba(x_val_pca)
# Evaluate the model with log loss
from sklearn.metrics import log_loss
log_loss(y_val, y_pre_proba, eps=1e-15, normalize=True)
# Inspect the parameters the model was built with
xgb.get_params()
  - 3.2 Model tuning
    - 3.2.1 Parameters to tune:
      - n_estimators,
scores_ne = []
n_estimators = [100,200,400,450,500,550,600,700]
for nes in n_estimators:
    print("n_estimators:", nes)
    xgb = XGBClassifier(max_depth=3,
                        learning_rate=0.1,
                        n_estimators=nes,
                        objective="multi:softprob",
                        n_jobs=-1,
                        nthread=4,
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        seed=42)
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_ne.append(score)
    print("Log loss on the validation data: {}".format(score))
# Plot how the log loss changes
plt.plot(n_estimators, scores_ne, "o-")
plt.ylabel("log_loss")
plt.xlabel("n_estimators")
print("Best n_estimators: {}".format(n_estimators[np.argmin(scores_ne)]))
      - max_depth,
scores_md = []
max_depths = [1,3,5,6,7]
for md in max_depths:  # changed
    xgb = XGBClassifier(max_depth=md,  # changed
                        learning_rate=0.1,
                        n_estimators=n_estimators[np.argmin(scores_ne)],  # changed
                        objective="multi:softprob",
                        n_jobs=-1,
                        nthread=4,
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        seed=42)
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_md.append(score)  # changed
    print("Log loss on the validation data: {}".format(score))
# Plot how the log loss changes
plt.plot(max_depths, scores_md, "o-")  # changed
plt.ylabel("log_loss")
plt.xlabel("max_depths")  # changed
print("Best max_depth: {}".format(max_depths[np.argmin(scores_md)]))  # changed
      - min_child_weight,
        - tune it following the same pattern as above
      - subsample,
      - colsample_bytree,
      - eta
    - 3.2.2 Settle on the final parameters
xgb = XGBClassifier(learning_rate=0.1,
                    n_estimators=550,
                    max_depth=3,
                    min_child_weight=3,
                    subsample=0.7,
                    colsample_bytree=0.7,
                    nthread=4,
                    seed=42,
                    objective='multi:softprob')
xgb.fit(x_train_scaled, y_train)
y_pre = xgb.predict_proba(x_val_scaled)
print("Log loss on the validation data: {}".format(log_loss(y_val, y_pre, eps=1e-15, normalize=True)))
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Data loading
In [2]:
data = pd.read_csv("./data/otto/train.csv")
In [3]:
data.head()
Out[3]:
 | id | feat_1 | feat_2 | feat_3 | feat_4 | feat_5 | feat_6 | feat_7 | feat_8 | feat_9 | ... | feat_85 | feat_86 | feat_87 | feat_88 | feat_89 | feat_90 | feat_91 | feat_92 | feat_93 | target |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Class_1 |
1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Class_1 |
2 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Class_1 |
3 | 4 | 1 | 0 | 0 | 1 | 6 | 1 | 5 | 0 | 0 | ... | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | Class_1 |
4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Class_1 |
5 rows × 95 columns
In [4]:
data.shape
Out[4]:
(61878, 95)
In [5]:
data.describe()
Out[5]:
 | id | feat_1 | feat_2 | feat_3 | feat_4 | feat_5 | feat_6 | feat_7 | feat_8 | feat_9 | ... | feat_84 | feat_85 | feat_86 | feat_87 | feat_88 | feat_89 | feat_90 | feat_91 | feat_92 | feat_93 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 61878.000000 | 61878.00000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | ... | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 | 61878.000000 |
mean | 30939.500000 | 0.38668 | 0.263066 | 0.901467 | 0.779081 | 0.071043 | 0.025696 | 0.193704 | 0.662433 | 1.011296 | ... | 0.070752 | 0.532306 | 1.128576 | 0.393549 | 0.874915 | 0.457772 | 0.812421 | 0.264941 | 0.380119 | 0.126135 |
std | 17862.784315 | 1.52533 | 1.252073 | 2.934818 | 2.788005 | 0.438902 | 0.215333 | 1.030102 | 2.255770 | 3.474822 | ... | 1.151460 | 1.900438 | 2.681554 | 1.575455 | 2.115466 | 1.527385 | 4.597804 | 2.045646 | 0.982385 | 1.201720 |
min | 1.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 15470.250000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
50% | 30939.500000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
75% | 46408.750000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
max | 61878.000000 | 61.00000 | 51.000000 | 64.000000 | 70.000000 | 19.000000 | 10.000000 | 38.000000 | 76.000000 | 43.000000 | ... | 76.000000 | 55.000000 | 65.000000 | 67.000000 | 30.000000 | 61.000000 | 130.000000 | 52.000000 | 19.000000 | 87.000000 |
8 rows × 94 columns
In [6]:
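# Plot the class distribution of the data
import seaborn as sns
sns.countplot(x=data.target)
plt.show()
The plot above shows that the classes are imbalanced, so this has to be handled later.
Basic data processing
The data has already been anonymized, so no further special processing is needed.
Take a subset of the data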
In [7]:
new1_data = data[:10000]
new1_data.shape
Out[7]:
(10000, 95)
In [8]:
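# Plot the class distribution of the subset
import seaborn as sns
sns.countplot(x=new1_data.target)
plt.show()
Slicing the data this way is not viable, so we use random undersampling to obtain a suitable subset instead.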
In [9]:
# Obtain the data by random undersampling
# First separate the feature values from the label values
y = data["target"]
x = data.drop(["id", "target"], axis=1)
In [10]:
x.head()
Out[10]:
 | feat_1 | feat_2 | feat_3 | feat_4 | feat_5 | feat_6 | feat_7 | feat_8 | feat_9 | feat_10 | ... | feat_84 | feat_85 | feat_86 | feat_87 | feat_88 | feat_89 | feat_90 | feat_91 | feat_92 | feat_93 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 1 | 0 | 0 | 1 | 6 | 1 | 5 | 0 | 0 | 1 | ... | 22 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
5 rows × 93 columns
In [11]:
y.head()
Out[11]:
0 Class_1
1 Class_1
2 Class_1
3 Class_1
4 Class_1
Name: target, dtype: object
In [12]:
# Undersample the data
from imblearn.under_sampling import RandomUnderSampler
rus = RandomUnderSampler(random_state=0)
X_resampled, y_resampled = rus.fit_resample(x, y)
In [13]:
x.shape, y.shape
Out[13]:
((61878, 93), (61878,))
In [14]:
X_resampled.shape, y_resampled.shape
Out[14]:
((17361, 93), (17361,))
In [15]:
# Plot the class distribution after undersampling
import seaborn as sns
sns.countplot(x=y_resampled)
plt.show()
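To make the undersampling step concrete, here is a hand-rolled NumPy sketch of the same idea: keep as many rows of every class as the rarest class has. It is an illustration, not the imblearn implementation, and `random_undersample` and the toy arrays are hypothetical names. The resampled size above fits this picture, since 17361 = 9 × 1929.

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Keep n_min randomly chosen rows per class, n_min = size of the rarest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]

# Toy data: three classes with 5, 3 and 8 rows.
X_demo = np.arange(32).reshape(16, 2)
y_demo = np.array([0] * 5 + [1] * 3 + [2] * 8)

X_bal, y_bal = random_undersample(X_demo, y_demo)
print(np.bincount(y_bal))  # prints [3 3 3]
```

The price of balancing this way is that rows from the larger classes are discarded, which is why the dataset shrinks from 61878 to 17361 rows.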
Convert the label values to numbers
In [16]:
y_resampled.head()
Out[16]:
0 Class_1
1 Class_1
2 Class_1
3 Class_1
4 Class_1
Name: target, dtype: object
In [17]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_resampled = le.fit_transform(y_resampled)
In [18]:
y_resampled
Out[18]:
array([0, 0, 0, ..., 8, 8, 8])
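A small sketch of what the encoder does, using a few made-up label strings in the same `Class_n` format (`le_demo` and `codes` are illustrative names). The sorted unique labels are stored in `classes_`, and `inverse_transform` maps integer codes back to the original strings, which is handy when reporting predictions.

```python
from sklearn.preprocessing import LabelEncoder

le_demo = LabelEncoder()
codes = le_demo.fit_transform(["Class_2", "Class_1", "Class_9", "Class_1"])

print(list(le_demo.classes_))                # ['Class_1', 'Class_2', 'Class_9']
print(codes.tolist())                        # [1, 0, 2, 0]
print(list(le_demo.inverse_transform([2])))  # ['Class_9']
```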
Split the data
In [19]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.2)
In [20]:
x_train.shape, y_train.shape
Out[20]:
((13888, 93), (13888,))
In [21]:
x_test.shape, y_test.shape
Out[21]:
((3473, 93), (3473,))
In [22]:
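# 1. Data loading
# 2. Basic data processing
# 2.1 Take a subset of the data
# 2.2 Convert the label values to numbers
# 2.3 Split the data (using StratifiedShuffleSplit)
# 2.4 Standardize the data
# 2.5 Reduce dimensionality with PCA
# 3. Model training
# 3.1 Baseline model training
# 3.2 Model tuning
# 3.2.1 Parameters to tune:
# n_estimators,
# max_depth,
# min_child_weight,
# subsample,
# colsample_bytree,
# eta
# 3.2.2 Settle on the final parameters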
In [23]:
# Plot the class distribution of the test split
import seaborn as sns
sns.countplot(x=y_test)
plt.show()
In [28]:
# Split the data with StratifiedShuffleSplit
from sklearn.model_selection import StratifiedShuffleSplit
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
for train_index, test_index in sss.split(X_resampled.values, y_resampled):
    print(len(train_index))
    print(len(test_index))
    x_train = X_resampled.values[train_index]
    x_val = X_resampled.values[test_index]
    y_train = y_resampled[train_index]
    y_val = y_resampled[test_index]
13888
3473
In [29]:
print(x_train.shape, x_val.shape)
(13888, 93) (3473, 93)
In [30]:
# Plot the class distribution of the validation split
import seaborn as sns
sns.countplot(x=y_val)
plt.show()
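The effect of stratification can be checked on a toy label vector. This is a sketch with made-up data (`X_demo`, `y_demo` are hypothetical): with class sizes 50/30/20 and `test_size=0.2`, both parts keep the 5:3:2 ratio.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy labels: 50 / 30 / 20 samples for classes 0 / 1 / 2.
y_demo = np.repeat([0, 1, 2], [50, 30, 20])
X_demo = np.arange(100).reshape(-1, 1)

sss_demo = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
# With n_splits=1, next() retrieves the single split without a for loop.
train_idx, val_idx = next(sss_demo.split(X_demo, y_demo))

print(np.bincount(y_demo[train_idx]))  # prints [40 24 16]
print(np.bincount(y_demo[val_idx]))    # prints [10  6  4]
```

A plain `train_test_split` without `stratify=` gives no such guarantee, which matters most when some classes are small.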
Standardize the data
In [31]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_val_scaled = scaler.transform(x_val)
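As a quick check of what `fit` followed by `transform` computes, the sketch below standardizes a tiny made-up column by hand. The point is that the validation data is scaled with the mean and standard deviation learned from the training data only (`train_demo`, `val_demo` and `manual` are illustrative names).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train_demo = np.array([[1.0], [2.0], [3.0]])
val_demo = np.array([[4.0]])

scaler_demo = StandardScaler().fit(train_demo)

# StandardScaler uses the population standard deviation (ddof=0),
# which is also NumPy's default for np.std.
manual = (val_demo - train_demo.mean(axis=0)) / train_demo.std(axis=0)

print(scaler_demo.transform(val_demo), manual)
```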
PCA dimensionality reduction
In [33]:
x_train_scaled.shape
Out[33]:
(13888, 93)
In [34]:
from sklearn.decomposition import PCA
pca = PCA(n_components=0.9)
x_train_pca = pca.fit_transform(x_train_scaled)
x_val_pca = pca.transform(x_val_scaled)
In [35]:
print(x_train_pca.shape, x_val_pca.shape)
(13888, 65) (3473, 65)
In [37]:
# Plot how much variance is retained as components are added
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variance ratio")
plt.show()
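Passing a float such as `n_components=0.9` asks PCA to keep the smallest number of components whose cumulative explained variance ratio exceeds that fraction. The sketch below reproduces the selection by hand on synthetic data; the data, the column scales and the `*_demo`-style names are assumptions made only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data: 10 columns with deliberately different variances.
scales = np.array([10, 5, 3, 1, 1, 1, 0.5, 0.5, 0.1, 0.1])
X_demo = rng.normal(size=(200, 10)) * scales

# Fit all components, then find where the cumulative ratio passes 0.9.
pca_full = PCA(svd_solver="full").fit(X_demo)
cum = np.cumsum(pca_full.explained_variance_ratio_)
k = int(np.searchsorted(cum, 0.9, side="right") + 1)

# Let PCA make the same choice itself.
pca_90 = PCA(n_components=0.9, svd_solver="full").fit(X_demo)
print(k, pca_90.n_components_)
```

In the notebook the same rule is what reduces the 93 standardized features to 65 components.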
Model training
Baseline model training
In [38]:
from xgboost import XGBClassifier
xgb = XGBClassifier()
xgb.fit(x_train_pca, y_train)
Out[38]:
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0,
learning_rate=0.1, max_delta_step=0, max_depth=3,
min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
nthread=None, objective='multi:softprob', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=None, subsample=1, verbosity=1)
In [39]:
# Output predictions; they must be class probabilities
y_pre_proba = xgb.predict_proba(x_val_pca)
In [40]:
y_pre_proba
Out[40]:
array([[0.4893983 , 0.00375719, 0.00225278, ..., 0.06179977, 0.17131925,
0.03980364],
[0.14336601, 0.01110009, 0.01018962, ..., 0.00691424, 0.02062171,
0.7525783 ],
[0.00834821, 0.14602502, 0.65013766, ..., 0.01385602, 0.00602207,
0.00240582],
...,
[0.09568001, 0.00293341, 0.00582061, ..., 0.1031019 , 0.7587154 ,
0.02730099],
[0.40236628, 0.12317444, 0.03567632, ..., 0.18818544, 0.13276173,
0.07105519],
[0.00473167, 0.01536749, 0.02546864, ..., 0.00882399, 0.88531935,
0.00384397]], dtype=float32)
In [42]:
# Evaluate with log loss
from sklearn.metrics import log_loss
log_loss(y_val, y_pre_proba, eps=1e-15, normalize=True)
Out[42]:
0.7845457684689274
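For reference, multi-class log loss is the mean negative log of the probability the model assigned to the true class of each sample. The sketch below computes it by hand for three made-up predictions and compares the result with `log_loss`; the arrays are hypothetical and unrelated to the Otto data.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true_demo = np.array([0, 2, 1])
proba_demo = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])

# Pick the probability assigned to the true class in every row.
p_true = proba_demo[np.arange(len(y_true_demo)), y_true_demo]
manual_loss = -np.mean(np.log(p_true))

sk_loss = log_loss(y_true_demo, proba_demo, labels=[0, 1, 2])
print(manual_loss, sk_loss)
```

Because only the true-class probability enters the formula, confident wrong predictions are punished heavily, which is why the metric needs `predict_proba` output rather than hard labels.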
In [43]:
xgb.get_params
Out[43]:
<bound method XGBModel.get_params of XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0,
learning_rate=0.1, max_delta_step=0, max_depth=3,
min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
nthread=None, objective='multi:softprob', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=None, subsample=1, verbosity=1)>
Model tuning
Find the best n_estimators
In [44]:
scores_ne = []
n_estimators = [100, 200, 300, 400, 500, 550, 600, 700]
In [49]:
for nes in n_estimators:
    print("n_estimators:", nes)
    xgb = XGBClassifier(max_depth=3,
                        learning_rate=0.1,
                        n_estimators=nes,
                        objective="multi:softprob",
                        n_jobs=-1,
                        nthread=4,
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        seed=42)
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_ne.append(score)
    print("Log loss for this run: {}".format(score))
n_estimators: 100
Log loss for this run: 0.7845457684689274
n_estimators: 200
Log loss for this run: 0.7163659085830947
n_estimators: 300
Log loss for this run: 0.6933389946023942
n_estimators: 400
Log loss for this run: 0.68119252278615
n_estimators: 500
Log loss for this run: 0.67700775120196
n_estimators: 550
Log loss for this run: 0.6756911007299885
n_estimators: 600
Log loss for this run: 0.6757532660164814
n_estimators: 700
Log loss for this run: 0.6778721089881976
In [50]:
# Plot the log loss for each candidate value
plt.plot(n_estimators, scores_ne, "o-")
plt.xlabel("n_estimators")
plt.ylabel("log_loss")
plt.show()
print("Best n_estimators: {}".format(n_estimators[np.argmin(scores_ne)]))
Best n_estimators: 550
Find the best max_depth
In [63]:
scores_md = []
max_depths = [1,3,5,6,7]
In [64]:
for md in max_depths:
    print("max_depth:", md)
    xgb = XGBClassifier(max_depth=md,
                        learning_rate=0.1,
                        n_estimators=n_estimators[np.argmin(scores_ne)],
                        objective="multi:softprob",
                        n_jobs=-1,
                        nthread=4,
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        seed=42)
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_md.append(score)
    print("Log loss for this run: {}".format(score))
max_depth: 1
Log loss for this run: 0.8186777106711784
max_depth: 3
Log loss for this run: 0.6756911007299885
max_depth: 5
Log loss for this run: 0.730323661087053
max_depth: 6
Log loss for this run: 0.7693314501840949
max_depth: 7
Log loss for this run: 0.7889236364892144
In [67]:
# Plot the log loss for each candidate value
plt.plot(max_depths, scores_md, "o-")
plt.xlabel("max_depths")
plt.ylabel("log_loss")
plt.show()
print("Best max_depth: {}".format(max_depths[np.argmin(scores_md)]))
Best max_depth: 3
Following the same pattern, tune the remaining parameters:
min_child_weight,
subsample,
colsample_bytree,
eta
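The repeated "loop over candidates, fit, score with log loss, take the argmin" pattern can be wrapped in a small helper. The sketch below is an assumption-laden illustration: `sweep` and `make_model` are hypothetical names, the data is synthetic, and scikit-learn's `GradientBoostingClassifier` stands in for `XGBClassifier` so that the example runs without xgboost installed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

def sweep(make_model, values, x_tr, y_tr, x_va, y_va):
    """Fit one model per candidate value and return (best_value, scores)."""
    scores = []
    for v in values:
        model = make_model(v)
        model.fit(x_tr, y_tr)
        scores.append(log_loss(y_va, model.predict_proba(x_va)))
    return values[int(np.argmin(scores))], scores

# Synthetic 3-class data standing in for the PCA-reduced Otto features.
X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     n_informative=5, n_classes=3,
                                     random_state=0)
x_tr, x_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0)

candidates = [0.5, 0.7, 1.0]
best_subsample, subsample_scores = sweep(
    lambda v: GradientBoostingClassifier(subsample=v, n_estimators=20,
                                         random_state=0),
    candidates, x_tr, y_tr, x_va, y_va)
print(best_subsample, subsample_scores)
```

With the notebook's data, the factory would instead build an `XGBClassifier` with the parameter under study set to `v` and the previously tuned values fixed. Tuning one parameter at a time this way is cheap but ignores interactions between parameters.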
In [69]:
xgb = XGBClassifier(learning_rate=0.1,
                    n_estimators=550,
                    max_depth=3,
                    min_child_weight=3,
                    subsample=0.7,
                    colsample_bytree=0.7,
                    nthread=4,
                    seed=42,
                    objective='multi:softprob')
xgb.fit(x_train_scaled, y_train)
y_pre = xgb.predict_proba(x_val_scaled)
print("Log loss on the validation data: {}".format(log_loss(y_val, y_pre, eps=1e-15, normalize=True)))
Log loss on the validation data: 0.5944022517380477