Ensemble Learning: XGBoost

日萌社



5.1 XGBoost Algorithm Principles

XGBoost (eXtreme Gradient Boosting), i.e. extreme gradient-boosted trees, is a flagship ensemble learning method, and many winning solutions in Kaggle data-mining competitions have used it.

XGBoost performs at or near the top on most regression and classification problems. This section explains its algorithmic principles in some detail.

1 How to Build the Optimal Model

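As we have seen, the general way to build an optimal model is to minimize the loss function on the training data.

Denoting the loss by L, this reads:

$$\min_{f\in F}\frac{1}{n}\sum_{i=1}^{n}L\big(y_i, f(x_i)\big) \tag{1.1}$$

where F is the hypothesis space and n is the number of training samples.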

Given the known attributes and their possible values, the hypothesis space is the exhaustive set of all hypotheses that could satisfy the target.

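Equation (1.1) is called empirical risk minimization (ERM). Models trained this way tend to be highly complex, and when the training set is small they easily overfit.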

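Therefore, to reduce model complexity, the following form is commonly used instead:

$$\min_{f\in F}\frac{1}{n}\sum_{i=1}^{n}L\big(y_i, f(x_i)\big) + \lambda J(f)$$

The first term is the average training loss from (1.1); J(f) is the complexity of the model, and λ ≥ 0 is a coefficient that trades off goodness of fit against complexity.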

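This formulation with a complexity term is called structural risk minimization (SRM). Models obtained by SRM usually predict well on both the training data and unseen test data.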

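Applications:

  • Growing a decision tree corresponds to empirical risk minimization, while pruning it corresponds to structural risk minimization;
  • XGBoost grows its decision trees by structural risk minimization, as detailed later.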
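To make the difference between the two criteria concrete, here is a minimal sketch with made-up numbers: two hypothetical models, a simple one and a complex one. J(f) is assumed to be the number of leaves and λ = 0.05 is an arbitrary choice. ERM prefers the model with the lower training loss, while SRM can prefer the simpler one once complexity is charged.

```python
# Minimal sketch: empirical vs. structural risk for two hypothetical models.
# All numbers are made up; J(f) is assumed to be the number of leaves.

def empirical_risk(losses):
    """(1/n) * sum of per-sample losses L(y_i, f(x_i))."""
    return sum(losses) / len(losses)


def structural_risk(losses, complexity, lam):
    """Empirical risk plus the complexity penalty lam * J(f)."""
    return empirical_risk(losses) + lam * complexity


# Per-sample training losses: the complex model fits the training data better.
simple_losses = [0.30, 0.20, 0.40, 0.10]   # J(f) = 2
complex_losses = [0.05, 0.00, 0.10, 0.05]  # J(f) = 8
lam = 0.05

erm = {"simple": empirical_risk(simple_losses),
       "complex": empirical_risk(complex_losses)}
srm = {"simple": structural_risk(simple_losses, 2, lam),
       "complex": structural_risk(complex_losses, 8, lam)}

print("ERM picks:", min(erm, key=erm.get))  # lower empirical risk wins
print("SRM picks:", min(srm, key=srm.get))  # lower structural risk wins
```

Here ERM picks the complex model (0.05 vs 0.25), whereas SRM picks the simple one (0.35 vs 0.45).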

2 Deriving the XGBoost Objective Function

2.1 Determining the Objective Function

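The objective function is the quantity we minimize to build the optimal model.

As discussed above, the loss should be augmented with a regularization term representing model complexity. An XGBoost model consists of K CART trees whose outputs are added together, $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$, so its objective function is:

$$Obj = \sum_{i=1}^{n} L\big(y_i, \hat{y}_i\big) + \sum_{k=1}^{K}\Omega(f_k)$$

where n is the number of training samples and Ω(f_k) is the complexity of the k-th tree.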

2.2 An Introduction to CART Trees

2.3 Defining Tree Complexity

2.3.1 Defining the Complexity of Each Tree

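An XGBoost model contains multiple CART trees. The complexity of each tree is defined as:

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^2$$

where T is the number of leaves, w_j is the score (weight) of leaf j, and γ and λ are regularization coefficients.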
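The complexity term commonly used by XGBoost is Ω(f) = γT + ½λΣw_j², with T the number of leaves and w_j the leaf scores. A minimal sketch computing it for a hypothetical tree shows that both more leaves and larger leaf scores increase the penalty; the parameter names gamma and lam are chosen here to mirror xgboost's gamma and lambda.

```python
# Sketch of the per-tree complexity Omega(f) = gamma * T + 0.5 * lam * sum_j w_j^2.
# T = number of leaves, w_j = leaf scores; gamma and lam mirror xgboost's
# gamma and lambda parameters. The leaf scores below are hypothetical.

def tree_complexity(leaf_weights, gamma, lam):
    T = len(leaf_weights)                   # number of leaves
    l2 = sum(w * w for w in leaf_weights)   # sum of squared leaf scores
    return gamma * T + 0.5 * lam * l2


# A hypothetical 3-leaf tree with scores 2, 0.1 and -1
print(tree_complexity([2.0, 0.1, -1.0], gamma=1.0, lam=1.0))
```

With γ = λ = 1 this tree costs 3 (for the leaves) plus 0.5 × 5.01 (for the scores), i.e. about 5.505.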

2.3.2 An Example of Tree Complexity

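Suppose we want to predict how much each member of a family likes video games. Young people are more likely than old people to like video games, and males are more likely than females, so we first split people into children and adults by age, then split by sex, and give each person a score for how much they like video games, as shown in the figure below: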

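This yields two trees, tree1 and tree2. As in GBDT, the outputs of the two trees are summed to obtain the final prediction, so:

  • The little boy's score is the sum of the scores of the leaves he falls into in the two trees: 2 + 0.9 = 2.9.
  • The grandpa's score is obtained the same way: -1 + (-0.9) = -1.9.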
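The additive prediction can be sketched in a few lines of Python. Only the leaf scores of the boy (2 and 0.9) and the grandpa (-1 and -0.9) come from the example; the split rules (age < 15, sex) and the remaining leaf score 0.1 are hypothetical stand-ins for the trees in the figure.

```python
# Sketch of the two-tree ensemble from the example. Only the leaf scores of the
# boy (2 and 0.9) and the grandpa (-1 and -0.9) come from the text; the split
# rules (age < 15, sex) and the remaining leaf score 0.1 are hypothetical.

def tree1(person):
    if person["age"] < 15:                      # child
        return 2.0 if person["male"] else 0.1
    return -1.0                                 # adult


def tree2(person):
    return 0.9 if person["age"] < 15 else -0.9


def predict(person, trees=(tree1, tree2)):
    # Additive model: the final score is the sum of all trees' outputs
    return sum(tree(person) for tree in trees)


boy = {"age": 8, "male": True}
grandpa = {"age": 70, "male": True}
print(predict(boy), predict(grandpa))
```

Adding a third tree would simply mean appending another function to trees; each new tree only has to correct what the previous ones got wrong.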

The details are shown in the figure below:

2.4 Deriving the Objective Function
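The following is a sketch of the standard second-order derivation, assuming the tree complexity Ω(f) = γT + ½λΣw_j² and a twice-differentiable loss L over n training samples. At iteration t the prediction is $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$, so the objective for the new tree f_t is

$$Obj^{(t)} = \sum_{i=1}^{n} L\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) + \text{constant}$$

A second-order Taylor expansion around $\hat{y}_i^{(t-1)}$, with $g_i = \partial_{\hat{y}^{(t-1)}} L\big(y_i, \hat{y}_i^{(t-1)}\big)$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} L\big(y_i, \hat{y}_i^{(t-1)}\big)$, gives

$$Obj^{(t)} \approx \sum_{i=1}^{n}\Big[L\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\Big] + \Omega(f_t) + \text{constant}$$

Dropping the constants, writing $f_t(x) = w_{q(x)}$ (q maps a sample to its leaf), and grouping samples by leaf with $I_j = \{i \mid q(x_i) = j\}$, $G_j = \sum_{i\in I_j} g_i$ and $H_j = \sum_{i\in I_j} h_i$:

$$\tilde{Obj}^{(t)} = \sum_{j=1}^{T}\Big[G_j w_j + \frac{1}{2}(H_j + \lambda) w_j^2\Big] + \gamma T$$

Each bracket is a quadratic in $w_j$, so the optimal leaf weight and the resulting objective (the scoring function of a tree structure) are

$$w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad \tilde{Obj}^* = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j + \lambda} + \gamma T$$

The smaller $\tilde{Obj}^*$ is, the better the tree structure.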

3 Building Regression Trees in XGBoost

3.1 Computing Node Splits

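In practice, when building the t-th tree, XGBoost splits nodes greedily.

Starting from a tree of depth 0:

  • Try to split every leaf node of the tree;

  • Each split turns one leaf into a left and a right child leaf, and the samples in the original leaf are distributed between the two children according to that node's split rule;

  • After each new split, check whether it reduces the loss. The gain is defined as:

$$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$$

where G_L, H_L and G_R, H_R are the sums of the first and second derivatives of the loss over the samples in the left and right child, and λ and γ are the regularization coefficients of the tree complexity.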

If Gain > 0, i.e. splitting into two leaves lowers the objective, the split is considered worth keeping.
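The gain compares the objective of the parent leaf with that of its two children. A minimal sketch, assuming the standard XGBoost form Gain = ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) − (G_L+G_R)²/(H_L+H_R+λ)] − γ; the gradient statistics in the usage lines are hypothetical. Note how γ acts as a threshold that the loss reduction has to beat.

```python
# Sketch of the split gain
#   Gain = 1/2 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - (G_L+G_R)^2/(H_L+H_R+lam)] - gamma
# G_L, H_L (G_R, H_R): sums of first/second loss derivatives in the left (right) child.

def leaf_term(G, H, lam):
    """G^2 / (H + lam): twice the amount by which one leaf lowers the objective."""
    return G * G / (H + lam)


def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    return 0.5 * (leaf_term(G_L, H_L, lam)
                  + leaf_term(G_R, H_R, lam)
                  - leaf_term(G_L + G_R, H_L + H_R, lam)) - gamma


# Hypothetical gradient statistics: the children pull in opposite directions
print(split_gain(-4.0, 3.0, 4.0, 3.0))             # positive: worth splitting
print(split_gain(-4.0, 3.0, 4.0, 3.0, gamma=5.0))  # negative: split rejected
```

The same statistics give a gain of 4.0 without γ but −1.0 with γ = 5, so raising γ makes tree growth more conservative.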

So when does this splitting stop?

3.2 Stopping Conditions

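Case 1: The scoring function derived in the previous section measures how good a tree structure is, so it can be used to choose the best split point. First enumerate all candidate split points of the sample features, split at each one, and judge each split by its gain, i.e. the Gain defined in Section 3.1; a split is kept only if its gain is positive.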
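Enumerating every candidate split point in this way is the exact greedy algorithm. A minimal sketch for a single feature, assuming squared loss L = ½(y − ŷ)² so that g_i = ŷ_i − y_i and h_i = 1; the data in the usage line is hypothetical. XGBoost itself runs this search over every feature and also offers approximate and histogram-based variants.

```python
# Sketch of exact greedy split finding on one feature.
# Assumes squared loss L = 0.5 * (y - y_hat)^2, so g_i = y_hat_i - y_i and h_i = 1.

def best_split(xs, ys, preds, lam=1.0, gamma=0.0):
    g = [p - y for p, y in zip(preds, ys)]
    h = [1.0] * len(ys)
    order = sorted(range(len(xs)), key=lambda i: xs[i])  # sort by feature value
    G, H = sum(g), sum(h)
    G_L = H_L = 0.0
    best_gain, best_thr = 0.0, None  # only splits with positive gain are accepted
    for k in range(len(order) - 1):
        i, j = order[k], order[k + 1]
        G_L += g[i]  # move sample i into the left child
        H_L += h[i]
        if xs[i] == xs[j]:
            continue  # cannot separate equal feature values
        G_R, H_R = G - G_L, H - H_L
        gain = 0.5 * (G_L ** 2 / (H_L + lam) + G_R ** 2 / (H_R + lam)
                      - G ** 2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_thr = gain, (xs[i] + xs[j]) / 2
    return best_thr, best_gain


# Hypothetical data: one feature, binary-looking targets, constant prediction 0.5
print(best_split([1, 2, 3, 4], [0, 0, 1, 1], [0.5] * 4))
```

On this toy data the best threshold falls between 2 and 3; with a large enough γ no split has positive gain and the function returns (None, 0.0), which is exactly the "do not split" outcome.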

4 Differences Between XGBoost and GBDT

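  • Difference 1:
    • XGBoost takes tree complexity into account while generating each CART tree;
    • GBDT does not; it accounts for tree complexity only in the pruning step.
  • Difference 2:
    • XGBoost fits a second-order Taylor expansion of the previous round's loss, whereas GBDT uses only the first-order expansion (the gradient). As a result, XGBoost is usually more accurate and needs fewer iterations to reach the same training quality.
  • Difference 3:
    • Both XGBoost and GBDT improve the model iteratively, but XGBoost can search for the best split point with multiple threads, which greatly speeds up training.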
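The second difference is easiest to see on a concrete loss. A minimal sketch for the logistic loss (an illustrative choice) computes the gradient g and hessian h for the samples in one hypothetical leaf, then compares the leaf value obtained by averaging the negative gradients with XGBoost's Newton-style value −G/(H+λ). This is the idealized textbook contrast; some GBDT implementations (for example Friedman's TreeBoost) also apply a Newton-style step per leaf after the tree is grown.

```python
import math

# Sketch: first- vs second-order statistics for the logistic loss
#   L(y, s) = -[y*log(p) + (1-y)*log(1-p)],  p = sigmoid(s), s = raw score
# Its derivatives w.r.t. s are g = p - y and h = p * (1 - p).

def grad_hess_logistic(y, s):
    p = 1.0 / (1.0 + math.exp(-s))
    return p - y, p * (1.0 - p)


ys = [1, 1, 0, 1]         # hypothetical labels falling into one leaf
scores = [0.0] * 4        # current raw scores (p = 0.5 for every sample)
stats = [grad_hess_logistic(y, s) for y, s in zip(ys, scores)]
G = sum(g for g, _ in stats)
H = sum(h for _, h in stats)
lam = 1.0

# First-order view: fit the negative gradients; the leaf value is their mean
first_order_leaf = -G / len(ys)
# Second-order view (XGBoost): Newton step using the hessians, w* = -G / (H + lam)
second_order_leaf = -G / (H + lam)
print(first_order_leaf, second_order_leaf)
```

The hessian p(1 − p) rescales the step according to the curvature of the loss, which is why a second-order leaf value can move further (or less far) than the plain gradient average.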

5 Summary


5.2 XGBoost API Overview

1 Installing xgboost:

Official documentation: https://xgboost.readthedocs.io/en/latest/

pip3 install xgboost

2 xgboost Parameters

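Although xgboost is often called a secret weapon for Kaggle competitions, training a good model still requires giving its parameters appropriate values.

xgboost exposes many parameters, which fall into three categories: general parameters, booster parameters and learning task parameters.

  • General parameters: control the overall behavior;
  • Booster parameters: depend on the chosen booster type and control the booster (tree or linear model) used at each step;
  • Learning task parameters: control the learning objective and how it is evaluated.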

2.1 General Parameters

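  1. booster [default=gbtree]

    • Which booster to use: gbtree, gblinear or dart.
    • gbtree and dart use tree-based models (dart additionally applies Dropout), while gblinear uses linear functions.
  2. silent [default=0]

    • 0 prints running messages; 1 enables silent mode with no output. (Newer releases deprecate silent in favor of verbosity.)
  3. nthread [default=maximum number of available threads]

    • Number of threads used to run xgboost in parallel. It should not exceed the number of CPU cores; if it is not set, xgboost detects the cores and uses all of them.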

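The following two parameters do not need to be set; just keep the defaults.

  1. num_pbuffer [set automatically by xgboost; no need to set it]

    • Size of the prediction buffer, normally the number of training instances. The buffer stores the prediction results of the last boosting step.
  2. num_feature [set automatically by xgboost; no need to set it]

    • Feature dimension used in boosting, set to the maximum dimension of the features.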

2.2 Booster Parameters

2.2.1 Parameters for Tree Booster

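  1. eta [default=0.3, alias: learning_rate]

    • Step-size shrinkage applied to each update to prevent overfitting.

    • After each boosting step the weights of the new features can be obtained directly, and eta shrinks these weights to make the boosting process more conservative and robust.

    • Range: [0,1]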
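  2. gamma [default=0, alias: min_split_loss] (minimum loss reduction for a split)

    • A node is split only if the split decreases the loss.
    • gamma specifies the minimum loss reduction required to split a node. The larger it is, the more conservative the algorithm. Its appropriate value depends heavily on the loss function, so it needs tuning.

    • Range: [0,∞]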

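  3. max_depth [default=6]

    • Maximum depth of a tree, also used to avoid overfitting. The larger max_depth is, the more specific and local the patterns the model learns. 0 means no limit.
    • Range: [0,∞]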
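  4. min_child_weight [default=1]

    • Minimum sum of instance weights (hessian) required in a child node.
    • Larger values keep the model from learning locally peculiar samples, but values that are too high cause underfitting. Tune it with cross-validation (CV).
    • Range: [0,∞]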
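  5. subsample [default=1]

    • Fraction of training samples randomly drawn for each tree.
    • Smaller values make the algorithm more conservative and help prevent overfitting, but values that are too small may cause underfitting.

    • Typical values: 0.5-1. A value of 0.5 means half of the training data is randomly sampled before growing each tree, which helps prevent overfitting.

    • Range: (0,1]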
  6. colsample_bytree [default=1]

    • Fraction of columns (each column is one feature) randomly sampled for each tree.
    • Typical values: 0.5-1
    • Range: (0,1]
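  7. colsample_bylevel [default=1]

    • Fraction of columns sampled for each split at each level of the tree.
    • I rarely use this parameter myself, because subsample and colsample_bytree serve a similar purpose, but it may be worth exploring if you are interested.
    • Range: (0,1]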
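  8. lambda [default=1, alias: reg_lambda]

    • L2 regularization term on the weights (similar to Ridge regression).
    • It controls the regularization part of XGBoost. Although many data scientists rarely use it, it can be quite useful for reducing overfitting.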
  9. alpha [default=0, alias: reg_alpha]

    • L1 regularization term on the weights (similar to Lasso regression). It can be used with very high-dimensional data to make the algorithm faster.
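  10. scale_pos_weight [default=1]

    • Controls the balance between positive and negative sample weights. When the classes are highly imbalanced, setting it to a positive value can help the algorithm converge faster. A typical value is the number of negative samples divided by the number of positive samples.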
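As a minimal sketch, the snippet below computes the typical scale_pos_weight value (negatives / positives) from a hypothetical label list and gathers the tree-booster parameters above into a dict. The values are illustrative, not tuned recommendations. A dict with these names is the form accepted by xgboost's native xgboost.train(params, dtrain); the scikit-learn wrapper XGBClassifier uses the aliases learning_rate, reg_lambda and reg_alpha instead.

```python
# Hypothetical, imbalanced binary labels: 8 negatives, 2 positives
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

n_neg = sum(1 for label in y if label == 0)
n_pos = sum(1 for label in y if label == 1)

# Illustrative tree-booster parameters (not tuned values)
params = {
    "booster": "gbtree",
    "eta": 0.1,                          # alias: learning_rate
    "gamma": 0.0,                        # alias: min_split_loss
    "max_depth": 6,
    "min_child_weight": 1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "lambda": 1.0,                       # alias: reg_lambda (L2)
    "alpha": 0.0,                        # alias: reg_alpha (L1)
    "scale_pos_weight": n_neg / n_pos,   # typical choice: negatives / positives
}
print(params["scale_pos_weight"])
```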

2.2.2 Parameters for Linear Booster

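The linear booster is rarely used.

  1. lambda [default=0, alias: reg_lambda]

    • L2 regularization coefficient; larger values make the model more conservative.
  2. alpha [default=0, alias: reg_alpha]

    • L1 regularization coefficient; larger values make the model more conservative.
  3. lambda_bias [default=0, alias: reg_lambda_bias]

    • L2 regularization on the bias (there is no L1 term on the bias because it is not important).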

2.3 Learning Task Parameters

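  1. objective [default=reg:linear]

    1. "reg:linear": linear regression (deprecated in newer releases in favor of "reg:squarederror", which is now the default)
    2. "reg:logistic": logistic regression
    3. "binary:logistic": logistic regression for binary classification; outputs probabilities
    4. "multi:softmax": multiclass classification with softmax; returns the predicted class (not probabilities). In this case you also need to set num_class (the number of classes).
    5. "multi:softprob": same as multi:softmax, but returns the probability of each data point belonging to each class.
  2. eval_metric [default chosen according to the objective]

    The available options are:

    1. "rmse": root mean squared error
    2. "mae": mean absolute error
    3. "logloss": negative log-likelihood
    4. "error": binary classification error rate.
      • Computed as the number of misclassified cases divided by the total number of cases. Predictions greater than 0.5 are treated as positive, the rest as negative.
    5. "error@t": same as error, but with the decision threshold set by 't'
    6. "merror": multiclass error rate, computed as (wrong cases)/(all cases)
    7. "mlogloss": multiclass log loss
    8. "auc": area under the ROC curve
  3. seed [default=0]

    • Random number seed.
    • Setting it makes results reproducible, which is also useful when tuning parameters.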
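The definitions of error and error@t are easy to misread, so here is a minimal sketch of the computation as described above: a prediction counts as positive when it is greater than the threshold t, with t = 0.5 for plain error. It re-implements the formula in plain Python for illustration and is not xgboost's own code; the labels and probabilities are hypothetical.

```python
# Sketch of the binary "error" / "error@t" metrics:
# predictions > t count as positive; metric = (#wrong) / (#all).

def binary_error(y_true, y_prob, t=0.5):
    wrong = sum(1 for y, p in zip(y_true, y_prob) if (p > t) != (y == 1))
    return wrong / len(y_true)


y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.6, 0.4, 0.2]   # hypothetical predicted probabilities

print(binary_error(y_true, y_prob))         # "error"     (threshold 0.5)
print(binary_error(y_true, y_prob, t=0.3))  # "error@0.3" (threshold 0.3)
```

Lowering the threshold to 0.3 turns the 0.4 prediction into a correct positive, so the error drops from 2/4 to 1/4.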

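5.3 XGBoost Case Study

1 Background

This case is the same one used earlier in the decision tree section.

The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 of the 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships. One reason the shipwreck caused such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving, some groups of people were more likely to survive than others, such as women, children and the upper class. In this case, we ask you to analyze what sorts of people were likely to survive. In particular, we ask you to apply machine learning tools to predict which passengers survived the tragedy.

Case: https://www.kaggle.com/c/titanic/overview

The features in the extracted dataset include the ticket, whether the passenger survived (survived), the passenger class (pclass), age, home/destination (home.dest), room, boat, sex, and more.

Data: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt

Observations from the data:

  • 1 pclass is the passenger class (1, 2, 3), a proxy for socio-economic status.
  • 2 The age column has missing values.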

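2 Steps

  • 1. Load the data
  • 2. Basic data processing
    • 2.1 Determine the features and the target
    • 2.2 Handle missing values
    • 2.3 Split the dataset
  • 3. Feature engineering (dictionary feature extraction)
  • 4. Machine learning (xgboost)
  • 5. Model evaluation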

3 Code Implementation

  • Import the required modules
import pandas as pd
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
  • 1. Load the data
# 1. Load the data
titan = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt")
  • 2. Basic data processing

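    • 2.1 Determine the features and the target
    x = titan[["pclass", "age", "sex"]].copy()  # copy so that later edits do not touch titan
    y = titan["survived"]
    
    • 2.2 Handle missing values
    # age has missing values: fill them with the mean age
    x["age"] = x["age"].fillna(x["age"].mean())
    
    • 2.3 Split the dataset
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22)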
    
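  • 3. Feature engineering (dictionary feature extraction)

The features contain categorical symbols, so they need one-hot encoding (DictVectorizer).

x.to_dict(orient="records") converts the feature DataFrame into a list of dicts, one per row.

# Convert x into a list of dicts with x.to_dict(orient="records")
# [{"pclass": "1st", "age": 29.00, "sex": "female"}, {}]

transfer = DictVectorizer(sparse=False)

# Fit on the training set only; reuse the fitted columns for the test set
x_train = transfer.fit_transform(x_train.to_dict(orient="records"))
x_test = transfer.transform(x_test.to_dict(orient="records"))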
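To see what DictVectorizer(sparse=False) does with records like the ones above, here is a pure-Python imitation. It is an illustrative assumption, not scikit-learn's implementation: string values become one-hot columns named feature=value, numbers pass through unchanged, and columns are sorted by name. The two records are hypothetical.

```python
# Pure-Python imitation of DictVectorizer(sparse=False), for illustration only.

def dict_one_hot(records):
    # Collect column names: "key=value" for strings, "key" for numbers
    names = set()
    for r in records:
        for k, v in r.items():
            names.add(f"{k}={v}" if isinstance(v, str) else k)
    names = sorted(names)
    index = {n: i for i, n in enumerate(names)}

    # Build one dense row per record
    rows = []
    for r in records:
        row = [0.0] * len(names)
        for k, v in r.items():
            if isinstance(v, str):
                row[index[f"{k}={v}"]] = 1.0   # one-hot indicator
            else:
                row[index[k]] = float(v)       # numeric value passed through
        rows.append(row)
    return names, rows


records = [{"pclass": "1st", "age": 29.0, "sex": "female"},
           {"pclass": "3rd", "age": 2.0, "sex": "male"}]
names, rows = dict_one_hot(records)
print(names)
print(rows)
```

Because the columns are determined by the records seen during fitting, the vectorizer should be fitted on the training set only and then reused via transform on the test set, so that both sets share the same columns in the same order.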

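  • 4. XGBoost model training and evaluation
# Initial training with default parameters
from xgboost import XGBClassifier
xg = XGBClassifier()

xg.fit(x_train, y_train)

print(xg.score(x_test, y_test))  # accuracy on the test set
# Tune max_depth
# Start at 1: max_depth=0 means "no limit", and the exact tree method requires a non-zero value
depth_range = range(1, 10)
score = []
for i in depth_range:
    # learning_rate is the scikit-learn name for eta
    xg = XGBClassifier(learning_rate=1, gamma=0, max_depth=i)
    xg.fit(x_train, y_train)
    s = xg.score(x_test, y_test)
    print(s)
    score.append(s)
# Visualize the results
import matplotlib.pyplot as plt

plt.plot(depth_range, score)
plt.xlabel("max_depth")
plt.ylabel("test accuracy")
plt.show()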


In [1]:

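# 1. Load the data
# 2. Basic data processing
# 2.1 Determine the features and the target
# 2.2 Handle missing values
# 2.3 Split the dataset
# 3. Feature engineering (dictionary feature extraction)
# 4. Machine learning (xgboost)
# 5. Model evaluation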

In [2]:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_graphviz

In [3]:

# 1. Load the data
titan = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt")

In [4]:

titan

Out[4]:

  row.names pclass survived name age embarked home.dest room ticket boat sex
0 1 1st 1 Allen, Miss Elisabeth Walton 29.0000 Southampton St Louis, MO B-5 24160 L221 2 female
1 2 1st 0 Allison, Miss Helen Loraine 2.0000 Southampton Montreal, PQ / Chesterville, ON C26 NaN NaN female
2 3 1st 0 Allison, Mr Hudson Joshua Creighton 30.0000 Southampton Montreal, PQ / Chesterville, ON C26 NaN (135) male
3 4 1st 0 Allison, Mrs Hudson J.C. (Bessie Waldo Daniels) 25.0000 Southampton Montreal, PQ / Chesterville, ON C26 NaN NaN female
4 5 1st 1 Allison, Master Hudson Trevor 0.9167 Southampton Montreal, PQ / Chesterville, ON C22 NaN 11 male
5 6 1st 1 Anderson, Mr Harry 47.0000 Southampton New York, NY E-12 NaN 3 male
6 7 1st 1 Andrews, Miss Kornelia Theodosia 63.0000 Southampton Hudson, NY D-7 13502 L77 10 female
7 8 1st 0 Andrews, Mr Thomas, jr 39.0000 Southampton Belfast, NI A-36 NaN NaN male
8 9 1st 1 Appleton, Mrs Edward Dale (Charlotte Lamson) 58.0000 Southampton Bayside, Queens, NY C-101 NaN 2 female
9 10 1st 0 Artagaveytia, Mr Ramon 71.0000 Cherbourg Montevideo, Uruguay NaN NaN (22) male
10 11 1st 0 Astor, Colonel John Jacob 47.0000 Cherbourg New York, NY NaN 17754 L224 10s 6d (124) male
11 12 1st 1 Astor, Mrs John Jacob (Madeleine Talmadge Force) 19.0000 Cherbourg New York, NY NaN 17754 L224 10s 6d 4 female
12 13 1st 1 Aubert, Mrs Leontine Pauline NaN Cherbourg Paris, France B-35 17477 L69 6s 9 female
13 14 1st 1 Barkworth, Mr Algernon H. NaN Southampton Hessle, Yorks A-23 NaN B male
14 15 1st 0 Baumann, Mr John D. NaN Southampton New York, NY NaN NaN NaN male
15 16 1st 1 Baxter, Mrs James (Helene DeLaudeniere Chaput) 50.0000 Cherbourg Montreal, PQ B-58/60 NaN 6 female
16 17 1st 0 Baxter, Mr Quigg Edmond 24.0000 Cherbourg Montreal, PQ B-58/60 NaN NaN male
17 18 1st 0 Beattie, Mr Thomson 36.0000 Cherbourg Winnipeg, MN C-6 NaN NaN male
18 19 1st 1 Beckwith, Mr Richard Leonard 37.0000 Southampton New York, NY D-35 NaN 5 male
19 20 1st 1 Beckwith, Mrs Richard Leonard (Sallie Monypeny) 47.0000 Southampton New York, NY D-35 NaN 5 female
20 21 1st 1 Behr, Mr Karl Howell 26.0000 Cherbourg New York, NY C-148 NaN 5 male
21 22 1st 0 Birnbaum, Mr Jakob 25.0000 Cherbourg San Francisco, CA NaN NaN (148) male
22 23 1st 1 Bishop, Mr Dickinson H. 25.0000 Cherbourg Dowagiac, MI B-49 NaN 7 male
23 24 1st 1 Bishop, Mrs Dickinson H. (Helen Walton) 19.0000 Cherbourg Dowagiac, MI B-49 NaN 7 female
24 25 1st 1 Bjornstrm-Steffansson, Mr Mauritz Hakan 28.0000 Southampton Stockholm, Sweden / Washington, DC NaN   D male
25 26 1st 0 Blackwell, Mr Stephen Weart 45.0000 Southampton Trenton, NJ NaN NaN (241) male
26 27 1st 1 Blank, Mr Henry 39.0000 Cherbourg Glen Ridge, NJ A-31 NaN 7 male
27 28 1st 1 Bonnell, Miss Caroline 30.0000 Southampton Youngstown, OH C-7 NaN 8 female
28 29 1st 1 Bonnell, Miss Elizabeth 58.0000 Southampton Birkdale, England Cleveland, Ohio C-103 NaN 8 female
29 30 1st 0 Borebank, Mr John James NaN Southampton London / Winnipeg, MB D-21/2 NaN NaN male
... ... ... ... ... ... ... ... ... ... ... ...
1283 1284 3rd 0 Vestrom, Miss Hulda Amanda Adolfina NaN NaN NaN NaN NaN NaN female
1284 1285 3rd 0 Vonk, Mr Jenko NaN NaN NaN NaN NaN NaN male
1285 1286 3rd 0 Ware, Mr Frederick NaN NaN NaN NaN NaN NaN male
1286 1287 3rd 0 Warren, Mr Charles William NaN NaN NaN NaN NaN NaN male
1287 1288 3rd 0 Wazli, Mr Yousif NaN NaN NaN NaN NaN NaN male
1288 1289 3rd 0 Webber, Mr James NaN NaN NaN NaN NaN NaN male
1289 1290 3rd 1 Wennerstrom, Mr August Edvard NaN NaN NaN NaN NaN NaN male
1290 1291 3rd 0 Wenzel, Mr Linhart NaN NaN NaN NaN NaN NaN male
1291 1292 3rd 0 Widegren, Mr Charles Peter NaN NaN NaN NaN NaN NaN male
1292 1293 3rd 0 Wiklund, Mr Jacob Alfred NaN NaN NaN NaN NaN NaN male
1293 1294 3rd 1 Wilkes, Mrs Ellen NaN NaN NaN NaN NaN NaN female
1294 1295 3rd 0 Willer, Mr Aaron NaN NaN NaN NaN NaN NaN male
1295 1296 3rd 0 Willey, Mr Edward NaN NaN NaN NaN NaN NaN male
1296 1297 3rd 0 Williams, Mr Howard Hugh NaN NaN NaN NaN NaN NaN male
1297 1298 3rd 0 Williams, Mr Leslie NaN NaN NaN NaN NaN NaN male
1298 1299 3rd 0 Windelov, Mr Einar NaN NaN NaN NaN NaN NaN male
1299 1300 3rd 0 Wirz, Mr Albert NaN NaN NaN NaN NaN NaN male
1300 1301 3rd 0 Wiseman, Mr Phillippe NaN NaN NaN NaN NaN NaN male
1301 1302 3rd 0 Wittevrongel, Mr Camiel NaN NaN NaN NaN NaN NaN male
1302 1303 3rd 1 Yalsevac, Mr Ivan NaN NaN NaN NaN NaN NaN male
1303 1304 3rd 0 Yasbeck, Mr Antoni NaN NaN NaN NaN NaN NaN male
1304 1305 3rd 1 Yasbeck, Mrs Antoni NaN NaN NaN NaN NaN NaN female
1305 1306 3rd 0 Youssef, Mr Gerios NaN NaN NaN NaN NaN NaN male
1306 1307 3rd 0 Zabour, Miss Hileni NaN NaN NaN NaN NaN NaN female
1307 1308 3rd 0 Zabour, Miss Tamini NaN NaN NaN NaN NaN NaN female
1308 1309 3rd 0 Zakarian, Mr Artun NaN NaN NaN NaN NaN NaN male
1309 1310 3rd 0 Zakarian, Mr Maprieder NaN NaN NaN NaN NaN NaN male
1310 1311 3rd 0 Zenn, Mr Philip NaN NaN NaN NaN NaN NaN male
1311 1312 3rd 0 Zievens, Rene NaN NaN NaN NaN NaN NaN female
1312 1313 3rd 0 Zimmerman, Leo NaN NaN NaN NaN NaN NaN male

1313 rows × 11 columns

In [5]:

titan.describe()

Out[5]:

  row.names survived age
count 1313.000000 1313.000000 633.000000
mean 657.000000 0.341965 31.194181
std 379.174762 0.474549 14.747525
min 1.000000 0.000000 0.166700
25% 329.000000 0.000000 21.000000
50% 657.000000 0.000000 30.000000
75% 985.000000 1.000000 41.000000
max 1313.000000 1.000000 71.000000

In [6]:

# 2. Basic data processing
# 2.1 Determine the features and the target
x = titan[["pclass", "age", "sex"]]
y = titan["survived"]

In [7]:

x.head()

Out[7]:

  pclass age sex
0 1st 29.0000 female
1 1st 2.0000 female
2 1st 30.0000 male
3 1st 25.0000 female
4 1st 0.9167 male

In [8]:

y.head()

Out[8]:

0    1
1    0
2    0
3    0
4    1
Name: survived, dtype: int64

In [9]:

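# 2.2 Handle missing values: fill missing ages with the mean age
x = x.copy()  # avoid pandas' SettingWithCopyWarning when modifying a column of a slice
x['age'] = x['age'].fillna(titan['age'].mean())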

In [10]:

x.head()

Out[10]:

  pclass age sex
0 1st 29.0000 female
1 1st 2.0000 female
2 1st 30.0000 male
3 1st 25.0000 female
4 1st 0.9167 male

In [11]:

# 2.3 Split the dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22, test_size=0.2)

In [12]:

# 3. Feature engineering (dictionary feature extraction)

In [13]:

x_train.head()

Out[13]:

  pclass age sex
649 3rd 45.000000 female
1078 3rd 31.194181 male
59 1st 31.194181 female
201 1st 18.000000 male
61 1st 31.194181 female

In [14]:

x_train = x_train.to_dict(orient="records")
x_test = x_test.to_dict(orient="records")

In [15]:

x_train

Out[15]:

[{'pclass': '3rd', 'age': 45.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 18.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 6.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 27.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 4.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 13.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 62.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 64.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 6.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 10.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 53.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 19.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 17.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 25.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 21.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 48.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 29.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 35.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 38.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 16.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 33.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 52.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 45.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 52.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 45.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 43.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 59.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 47.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 38.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 51.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 36.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 6.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 58.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 4.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 12.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 64.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 27.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 50.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 44.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 69.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 2.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 47.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 42.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 21.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 48.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 45.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 39.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 14.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 54.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 47.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 0.8333, 'sex': 'male'},
 {'pclass': '1st', 'age': 53.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 37.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 22.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 29.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 49.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 24.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 54.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 38.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 52.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 8.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 57.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 22.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 45.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 28.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 19.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 24.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 38.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 29.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 49.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 40.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 6.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 17.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 41.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 61.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 3.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 41.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 42.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 40.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 34.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 39.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 26.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 57.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 39.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 35.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 41.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 67.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 11.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 50.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 59.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 33.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 46.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 52.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 19.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 43.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 51.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 3.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 48.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 16.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 44.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 37.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 65.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 37.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 52.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 0.8333, 'sex': 'male'},
 {'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 27.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 41.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 56.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 28.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 48.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 1.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 2.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 29.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 38.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 0.9167, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 14.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 60.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 27.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 61.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 0.1667, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 15.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 17.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 42.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 20.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 62.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 23.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 33.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 70.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 37.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 54.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 51.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 21.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 64.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 29.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 59.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 38.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 54.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 19.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 3.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 28.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 15.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 40.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 8.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 63.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 43.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 38.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 1.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 38.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 17.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 40.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 4.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 29.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 57.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 40.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 47.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 37.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 5.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 21.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 41.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 35.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 45.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 50.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 56.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 52.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 11.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 26.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 35.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 45.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 27.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 24.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 18.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 56.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 64.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 29.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 33.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 0.8333, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 58.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 60.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 44.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 71.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 13.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 58.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 4.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 33.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 33.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 28.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 54.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 71.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 47.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 23.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 54.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 6.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 65.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 27.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 7.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 19.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 56.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 38.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 42.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 16.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 2.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 ...]

In [16]:

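from sklearn.feature_extraction import DictVectorizer

transfer = DictVectorizer()

x_train = transfer.fit_transform(x_train)
# Reuse the vocabulary learned on the training set; refitting on the test set could change the columns
x_test = transfer.transform(x_test)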

In [21]:

# 4. XGBoost model training
# 4.1 Initial model training
from xgboost import XGBClassifier

xg = XGBClassifier()

xg.fit(x_train, y_train)

Out[21]:

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [22]:

xg.score(x_test, y_test)

Out[22]:

0.7832699619771863

In [23]:

# 4.2 Tune max_depth

depth_range = range(10)
score = []

for i in depth_range:
    xg = XGBClassifier(eta=1, gamma=0, max_depth=i)
    xg.fit(x_train, y_train)
    
    s = xg.score(x_test, y_test)
    
    print(s)
    score.append(s)

0.6311787072243346
0.7908745247148289
0.7870722433460076
0.7832699619771863
0.7870722433460076
0.7908745247148289
0.7908745247148289
0.7946768060836502
0.7908745247148289
0.7946768060836502

In [25]:

# 4.3 Visualize the tuning results
import matplotlib.pyplot as plt

plt.plot(depth_range, score)
plt.xlabel("max_depth")
plt.ylabel("accuracy")
plt.show()


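5.4 Otto case study: Otto Group Product Classification Challenge (XGBoost implementation)

1 Background

The Otto Group is one of the world's largest e-commerce companies, with subsidiaries in more than 20 countries. It sells millions of products worldwide every day, so classifying those products sensibly according to their characteristics is very important.

In practice, however, staff found that many identical products had been assigned to different categories. The task in this case study is to classify Otto Group's products correctly, with the highest possible accuracy.

Link: https://www.kaggle.com/c/otto-group-product-classification-challenge/overview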

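2 Approach

  • 1. Data acquisition

  • 2. Basic data processing

    • 2.1 Take a subset of the data
    • 2.2 Convert label values to numbers
    • 2.3 Split the data (using StratifiedShuffleSplit)
    • 2.4 Standardize the data
    • 2.5 PCA dimensionality reduction
  • 3. Model training

    • 3.1 Baseline model training
    • 3.2 Model tuning
      • 3.2.1 Parameters to tune:
        • n_estimators,
        • max_depth,
        • min_child_weight,
        • subsample,
        • colsample_bytree,
        • eta
      • 3.2.2 Determine the final best parameters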

3 Partial code implementation

  • 2. Basic data processing

    • 2.1 Take a subset of the data

    • 2.2 Convert label values to numbers

    • 2.3 Split the data (using StratifiedShuffleSplit)

      # Split the data with StratifiedShuffleSplit
      import matplotlib.pyplot as plt
      import seaborn as sns
      from sklearn.model_selection import StratifiedShuffleSplit
      
      sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
      for train_index, test_index in sss.split(X_resampled.values, y_resampled):
          print(len(train_index))
          print(len(test_index))
      
          x_train = X_resampled.values[train_index]
          x_val = X_resampled.values[test_index]
      
          y_train = y_resampled[train_index]
          y_val = y_resampled[test_index]
      
      # Visualize the class distribution of the validation split
      sns.countplot(x=y_val)
      
      plt.show()
      
    • 2.4 Standardize the data

      from sklearn.preprocessing import StandardScaler
      
      scaler = StandardScaler()
      scaler.fit(x_train)
      
      x_train_scaled = scaler.transform(x_train)
      x_val_scaled = scaler.transform(x_val)
      
    • 2.5 PCA dimensionality reduction

      print(x_train_scaled.shape)
      # (13888, 93)
      
      from sklearn.decomposition import PCA
      
      # Keep as many components as needed to explain 90% of the variance
      pca = PCA(n_components=0.9)
      x_train_pca = pca.fit_transform(x_train_scaled)
      x_val_pca = pca.transform(x_val_scaled)
      
      print(x_train_pca.shape, x_val_pca.shape)
      # (13888, 65) (3473, 65)
      

      The output shows that 65 principal components are enough to retain 90% of the variance in the original 93 features.

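      # Visualize the cumulative explained variance
      import numpy as np
      import matplotlib.pyplot as plt
      
      plt.plot(np.cumsum(pca.explained_variance_ratio_))
      
      plt.xlabel("Number of components")
      plt.ylabel("Cumulative explained variance ratio")
      
      plt.show()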

  • 3. Model training

    • 3.1 Baseline model training

      from xgboost import XGBClassifier
      from sklearn.metrics import log_loss
      
      xgb = XGBClassifier()
      xgb.fit(x_train_pca, y_train)
      
      # Predict class probabilities rather than hard labels: log loss is computed from probabilities
      y_pre_proba = xgb.predict_proba(x_val_pca)
      
      # Evaluate the model with log loss
      log_loss(y_val, y_pre_proba, eps=1e-15, normalize=True)
      
      # Inspect the parameters the model was trained with
      xgb.get_params()
      
  • 3.2 Model tuning

    • 3.2.1 Parameters to tune:

      • n_estimators,

        scores_ne = []
        n_estimators = [100,200,400,450,500,550,600,700]
        
        for nes in n_estimators:
            print("n_estimators:", nes)
            xgb = XGBClassifier(max_depth=3, 
                                learning_rate=0.1, 
                                n_estimators=nes, 
                                objective="multi:softprob", 
                                n_jobs=-1, 
                                nthread=4, 
                                min_child_weight=1, 
                                subsample=1, 
                                colsample_bytree=1,
                                seed=42)
        
            xgb.fit(x_train_pca, y_train)
            y_pre = xgb.predict_proba(x_val_pca)
            score = log_loss(y_val, y_pre)
            scores_ne.append(score)
            print("Validation log loss: {}".format(score))
        
        # Visualize how log loss changes with n_estimators
        plt.plot(n_estimators, scores_ne, "o-")
        
        plt.ylabel("log_loss")
        plt.xlabel("n_estimators")
        print("Best n_estimators: {}".format(n_estimators[np.argmin(scores_ne)]))
        

      • max_depth,

        scores_md = []
        max_depths = [1,3,5,6,7]
        
        for md in max_depths:  # changed
            xgb = XGBClassifier(max_depth=md,  # changed
                                learning_rate=0.1, 
                                n_estimators=n_estimators[np.argmin(scores_ne)],  # changed: best value found above
                                objective="multi:softprob", 
                                n_jobs=-1, 
                                nthread=4, 
                                min_child_weight=1, 
                                subsample=1, 
                                colsample_bytree=1,
                                seed=42)
        
            xgb.fit(x_train_pca, y_train)
            y_pre = xgb.predict_proba(x_val_pca)
            score = log_loss(y_val, y_pre)
            scores_md.append(score)  # changed
            print("Validation log loss: {}".format(score))
        
        # Visualize how log loss changes with max_depth
        plt.plot(max_depths, scores_md, "o-")  # changed
        
        plt.ylabel("log_loss")
        plt.xlabel("max_depths")  # changed
        print("Best max_depth: {}".format(max_depths[np.argmin(scores_md)]))  # changed
        
      • min_child_weight,

        • Tune following the same pattern as above
      • subsample,

      • colsample_bytree,

      • eta

    • 3.2.2 Determine the final best parameters

      # Note: the final model is trained on the standardized features (x_train_scaled),
      # not on the PCA-reduced features used during tuning
      xgb = XGBClassifier(learning_rate=0.1, 
                          n_estimators=550, 
                          max_depth=3, 
                          min_child_weight=3, 
                          subsample=0.7, 
                          colsample_bytree=0.7, 
                          nthread=4, 
                          seed=42, 
                          objective='multi:softprob')
      xgb.fit(x_train_scaled, y_train)
      
      y_pre = xgb.predict_proba(x_val_scaled)
      
      print("Validation log loss: {}".format(log_loss(y_val, y_pre, eps=1e-15, normalize=True)))
      

In [1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Data acquisition

In [2]:

data = pd.read_csv("./data/otto/train.csv")

In [3]:

data.head()

Out[3]:

  id feat_1 feat_2 feat_3 feat_4 feat_5 feat_6 feat_7 feat_8 feat_9 ... feat_85 feat_86 feat_87 feat_88 feat_89 feat_90 feat_91 feat_92 feat_93 target
0 1 1 0 0 0 0 0 0 0 0 ... 1 0 0 0 0 0 0 0 0 Class_1
1 2 0 0 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 Class_1
2 3 0 0 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 Class_1
3 4 1 0 0 1 6 1 5 0 0 ... 0 1 2 0 0 0 0 0 0 Class_1
4 5 0 0 0 0 0 0 0 0 0 ... 1 0 0 0 0 1 0 0 0 Class_1

5 rows × 95 columns

In [4]:

data.shape

Out[4]:

(61878, 95)

In [5]:

data.describe()

Out[5]:

  id feat_1 feat_2 feat_3 feat_4 feat_5 feat_6 feat_7 feat_8 feat_9 ... feat_84 feat_85 feat_86 feat_87 feat_88 feat_89 feat_90 feat_91 feat_92 feat_93
count 61878.000000 61878.00000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 ... 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000 61878.000000
mean 30939.500000 0.38668 0.263066 0.901467 0.779081 0.071043 0.025696 0.193704 0.662433 1.011296 ... 0.070752 0.532306 1.128576 0.393549 0.874915 0.457772 0.812421 0.264941 0.380119 0.126135
std 17862.784315 1.52533 1.252073 2.934818 2.788005 0.438902 0.215333 1.030102 2.255770 3.474822 ... 1.151460 1.900438 2.681554 1.575455 2.115466 1.527385 4.597804 2.045646 0.982385 1.201720
min 1.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 15470.250000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
50% 30939.500000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
75% 46408.750000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 ... 0.000000 0.000000 1.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000
max 61878.000000 61.00000 51.000000 64.000000 70.000000 19.000000 10.000000 38.000000 76.000000 43.000000 ... 76.000000 55.000000 65.000000 67.000000 30.000000 61.000000 130.000000 52.000000 19.000000 87.000000

8 rows × 94 columns

In [6]:

# Visualize the class distribution
import seaborn as sns

sns.countplot(x=data.target)

plt.show()

The plot shows that the classes are imbalanced, which has to be dealt with in a later step.

Basic data processing

The data has already been anonymized, so no further special processing is needed.

Take a subset of the data

In [7]:

new1_data = data[:10000]
new1_data.shape

Out[7]:

(10000, 95)

In [8]:

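# Visualize the class distribution of the subset
import seaborn as sns

sns.countplot(x=new1_data.target)

plt.show()

Simply taking the first rows does not work: the rows of train.csv are grouped by class, so the first 10,000 rows cover only a few of the nine classes. Random under-sampling is used instead to obtain the data.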

In [9]:

# Obtain data by random under-sampling
# First separate the features from the labels

y = data["target"]
x = data.drop(["id", "target"], axis=1)

In [10]:

x.head()

Out[10]:

  feat_1 feat_2 feat_3 feat_4 feat_5 feat_6 feat_7 feat_8 feat_9 feat_10 ... feat_84 feat_85 feat_86 feat_87 feat_88 feat_89 feat_90 feat_91 feat_92 feat_93
0 1 0 0 0 0 0 0 0 0 0 ... 0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 1 0 0 1 6 1 5 0 0 1 ... 22 0 1 2 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 1 0 0 0 0 1 0 0 0

5 rows × 93 columns

In [11]:

y.head()

Out[11]:

0    Class_1
1    Class_1
2    Class_1
3    Class_1
4    Class_1
Name: target, dtype: object

In [12]:

# Obtain a balanced dataset by under-sampling
from imblearn.under_sampling import RandomUnderSampler

rus = RandomUnderSampler(random_state=0)

X_resampled, y_resampled = rus.fit_resample(x, y)

In [13]:

x.shape, y.shape

Out[13]:

((61878, 93), (61878,))

In [14]:

X_resampled.shape, y_resampled.shape

Out[14]:

((17361, 93), (17361,))
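With its default settings, RandomUnderSampler shrinks every class to the size of the smallest one, and 17361 = 9 × 1929 is consistent with nine equally sized classes. A minimal numpy sketch of that idea follows; `random_under_sample` is a hypothetical helper, not the imblearn implementation:

```python
import numpy as np

def random_under_sample(x, y, seed=0):
    """Keep the same number of rows for every class (the size of the
    smallest class), drawn at random without replacement. Sketch only."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # keep the original row order
    return x[keep], y[keep]

# Toy imbalanced data: 6 rows of class "A", 3 of "B", 2 of "C"
y = np.array(["A"] * 6 + ["B"] * 3 + ["C"] * 2)
x = np.arange(len(y)).reshape(-1, 1)

x_res, y_res = random_under_sample(x, y)
print(np.unique(y_res, return_counts=True))  # every class now has 2 rows
```

Under-sampling discards data from the larger classes; the trade-off is a smaller training set in exchange for a balanced one.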

In [15]:

# Visualize the class distribution after under-sampling
import seaborn as sns

sns.countplot(x=y_resampled)

plt.show()

Convert label values to numbers

In [16]:

y_resampled.head()

Out[16]:

0    Class_1
1    Class_1
2    Class_1
3    Class_1
4    Class_1
Name: target, dtype: object

In [17]:

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_resampled = le.fit_transform(y_resampled)
 

In [18]:

y_resampled

Out[18]:

array([0, 0, 0, ..., 8, 8, 8])
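LabelEncoder assigns each distinct label an integer according to its position in the sorted list of labels, which is why Class_1 ... Class_9 become 0 ... 8. The same mapping can be sketched with `np.unique(..., return_inverse=True)`; the labels below are made up for illustration:

```python
import numpy as np

labels = np.array(["Class_3", "Class_1", "Class_9", "Class_1"])

# classes: sorted distinct labels; codes: index of each label in `classes`
classes, codes = np.unique(labels, return_inverse=True)
print(classes)  # ['Class_1' 'Class_3' 'Class_9']
print(codes)    # [1 0 2 0]

# Decoding is a lookup, like LabelEncoder.inverse_transform
print(classes[codes])
```

The sort is lexicographic, so a label such as "Class_10" would be ordered before "Class_2"; with nine classes this does not matter here.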

Split the data

In [19]:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.2)

In [20]:

x_train.shape, y_train.shape

Out[20]:

((13888, 93), (13888,))

In [21]:

x_test.shape, y_test.shape

Out[21]:

((3473, 93), (3473,))

In [22]:

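# 1. Data acquisition

# 2. Basic data processing

    # 2.1 Take a subset of the data
    # 2.2 Convert label values to numbers
    # 2.3 Split the data (using StratifiedShuffleSplit)
    # 2.4 Standardize the data
    # 2.5 PCA dimensionality reduction

# 3. Model training
    # 3.1 Baseline model training
    # 3.2 Model tuning
        # 3.2.1 Parameters to tune:
            # n_estimators,
            # max_depth,
            # min_child_weight,
            # subsample,
            # colsample_bytree,
            # eta
        # 3.2.2 Determine the final best parameters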
    

In [23]:

# Visualize the class distribution of the test split
import seaborn as sns

sns.countplot(x=y_test)
plt.show()

In [28]:

# Split the data with StratifiedShuffleSplit so that class proportions are preserved

from sklearn.model_selection import StratifiedShuffleSplit

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)

for train_index, test_index in sss.split(X_resampled.values, y_resampled):
    print(len(train_index))
    print(len(test_index))
    
    x_train = X_resampled.values[train_index]
    x_val = X_resampled.values[test_index]
    
    y_train = y_resampled[train_index]
    y_val = y_resampled[test_index]

13888
3473

In [29]:

print(x_train.shape, x_val.shape)

(13888, 93) (3473, 93)
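A stratified split draws the test rows class by class, so each class keeps roughly the same share in both parts. A minimal numpy sketch of the idea, assuming a simple per-class rounding rule (`stratified_split` is a hypothetical helper; scikit-learn's StratifiedShuffleSplit allocates the per-class counts with its own, more careful rule):

```python
import numpy as np

def stratified_split(y, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) such that each class contributes about
    `test_size` of its rows to the test set. Sketch only."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))  # shuffle this class's rows
        n_test = int(round(test_size * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.sort(train_idx), np.sort(test_idx)

# 100 labels in a 5:3:2 ratio
y = np.repeat([0, 1, 2], [50, 30, 20])
train_idx, test_idx = stratified_split(y)
print(np.bincount(y[test_idx]))   # [10  6  4], still 5:3:2
print(np.bincount(y[train_idx]))  # [40 24 16]
```

A plain random split such as `train_test_split` without `stratify` can drift from these proportions by chance, which matters most for small classes.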

In [30]:

# Visualize the class distribution of the validation split
import seaborn as sns

sns.countplot(x=y_val)
plt.show()

Standardize the data

In [31]:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(x_train)

x_train_scaled = scaler.transform(x_train)
x_val_scaled = scaler.transform(x_val)
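StandardScaler computes each column's mean and standard deviation on the training set only and then applies (x - mean) / std to both sets. This is why `fit` is called on `x_train` alone. A small numpy sketch with made-up numbers:

```python
import numpy as np

train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
val = np.array([[4.0, 40.0]])

mean = train.mean(axis=0)
std = train.std(axis=0)            # population std (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
val_scaled = (val - mean) / std    # validation data reuses the training statistics

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))  # approximately [0 0] and [1 1]
print(val_scaled)                                           # both values equal sqrt(6) ≈ 2.449
```

This sketch does not handle a zero-variance column (it would divide by zero), which StandardScaler guards against.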

PCA dimensionality reduction

In [33]:

x_train_scaled.shape

Out[33]:

(13888, 93)

In [34]:

from sklearn.decomposition import PCA

pca = PCA(n_components=0.9)

x_train_pca = pca.fit_transform(x_train_scaled)
x_val_pca = pca.transform(x_val_scaled)

In [35]:

print(x_train_pca.shape, x_val_pca.shape)

(13888, 65) (3473, 65)

In [37]:

# Visualize how much variance the retained components explain
plt.plot(np.cumsum(pca.explained_variance_ratio_))

plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variance ratio")

plt.show()
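When `n_components` is a float such as 0.9, PCA keeps the smallest number of leading components whose cumulative explained variance ratio exceeds that fraction. That is how 93 features became 65 here. A numpy sketch of the selection rule, using toy ratios and random toy data (`n_components_for` is a hypothetical helper written to mirror that rule):

```python
import numpy as np

def n_components_for(explained_variance_ratio, threshold=0.9):
    """Smallest number of leading components whose cumulative explained
    variance ratio exceeds `threshold`."""
    cumulative = np.cumsum(explained_variance_ratio)
    return int(np.searchsorted(cumulative, threshold, side="right") + 1)

# 0.5 + 0.3 = 0.8 is not enough; adding 0.15 gives 0.95 > 0.9
print(n_components_for([0.5, 0.3, 0.15, 0.05]))  # 3

# The ratios come from the singular values of the centered data
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
s = np.linalg.svd(x - x.mean(axis=0), compute_uv=False)
ratios = s**2 / np.sum(s**2)
print(ratios.round(3), n_components_for(ratios))
```

The curve plotted above is exactly `np.cumsum` of these ratios, so the chosen component count is where the curve first rises above 0.9.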

Model training

Baseline model training

In [38]:

from xgboost import XGBClassifier

xgb = XGBClassifier()
xgb.fit(x_train_pca, y_train)

Out[38]:

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='multi:softprob', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [39]:

# Predict class probabilities; log loss needs probabilities, not hard class labels
y_pre_proba = xgb.predict_proba(x_val_pca)

In [40]:

y_pre_proba

Out[40]:

array([[0.4893983 , 0.00375719, 0.00225278, ..., 0.06179977, 0.17131925,
        0.03980364],
       [0.14336601, 0.01110009, 0.01018962, ..., 0.00691424, 0.02062171,
        0.7525783 ],
       [0.00834821, 0.14602502, 0.65013766, ..., 0.01385602, 0.00602207,
        0.00240582],
       ...,
       [0.09568001, 0.00293341, 0.00582061, ..., 0.1031019 , 0.7587154 ,
        0.02730099],
       [0.40236628, 0.12317444, 0.03567632, ..., 0.18818544, 0.13276173,
        0.07105519],
       [0.00473167, 0.01536749, 0.02546864, ..., 0.00882399, 0.88531935,
        0.00384397]], dtype=float32)

In [42]:

# Evaluate with log loss
from sklearn.metrics import log_loss

log_loss(y_val, y_pre_proba, eps=1e-15, normalize=True)

Out[42]:

0.7845457684689274
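Multi-class log loss is the mean of -log(p), where p is the probability the model assigned to each sample's true class. Probabilities are clipped away from 0 and 1 so the logarithm stays finite. The sketch below is a plain numpy version of that formula (`multiclass_log_loss` is a hypothetical helper, not the scikit-learn source). It also shows why `predict_proba` is used rather than `predict`:

```python
import numpy as np

def multiclass_log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability assigned to each sample's true class.
    Probabilities are clipped to [eps, 1 - eps] so that log() stays finite."""
    proba = np.clip(np.asarray(proba, dtype=float), eps, 1 - eps)
    picked = proba[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(picked)))

y_true = np.array([0, 1])

# Probabilities, as returned by predict_proba
soft = np.array([[0.8, 0.2], [0.4, 0.6]])
print(multiclass_log_loss(y_true, soft))  # -(ln 0.8 + ln 0.6) / 2 ≈ 0.367

# Hard 0/1 "probabilities" built from predicted labels: one confident mistake dominates
hard = np.array([[1.0, 0.0], [1.0, 0.0]])
print(multiclass_log_loss(y_true, hard))  # ≈ 17.27
```

A single wrong hard prediction costs about -ln(1e-15) ≈ 34.5, so hard labels give far worse log loss than calibrated probabilities.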

In [43]:

xgb.get_params

Out[43]:

<bound method XGBModel.get_params of XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='multi:softprob', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)>

Model tuning

Find the best n_estimators

In [44]:

scores_ne = []
n_estimators = [100, 200, 300, 400, 500, 550, 600, 700]

In [49]:

for nes in n_estimators:
    print("n_estimators:", nes)
    xgb = XGBClassifier(max_depth=3,
                        learning_rate=0.1, 
                        n_estimators=nes, 
                        objective="multi:softprob", 
                        n_jobs=-1, 
                        nthread=4, 
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        seed=42)
    
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_ne.append(score)
    
    print("Validation log loss: {}".format(score))

n_estimators: 100
Validation log loss: 0.7845457684689274
n_estimators: 200
Validation log loss: 0.7163659085830947
n_estimators: 300
Validation log loss: 0.6933389946023942
n_estimators: 400
Validation log loss: 0.68119252278615
n_estimators: 500
Validation log loss: 0.67700775120196
n_estimators: 550
Validation log loss: 0.6756911007299885
n_estimators: 600
Validation log loss: 0.6757532660164814
n_estimators: 700
Validation log loss: 0.6778721089881976

In [50]:

# Plot the log loss for each value
plt.plot(n_estimators, scores_ne, "o-")

plt.xlabel("n_estimators")
plt.ylabel("log_loss")
plt.show()

print("Best n_estimators: {}".format(n_estimators[np.argmin(scores_ne)]))

Best n_estimators: 550

Find the best max_depth

In [63]:

scores_md = []
max_depths = [1,3,5,6,7]

In [64]:

for md in max_depths:
    print("max_depth:", md)
    xgb = XGBClassifier(max_depth=md,
                        learning_rate=0.1, 
                        n_estimators=n_estimators[np.argmin(scores_ne)], 
                        objective="multi:softprob", 
                        n_jobs=-1, 
                        nthread=4, 
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        seed=42)
    
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_md.append(score)
    
    print("Validation log loss: {}".format(score))

max_depth: 1
Validation log loss: 0.8186777106711784
max_depth: 3
Validation log loss: 0.6756911007299885
max_depth: 5
Validation log loss: 0.730323661087053
max_depth: 6
Validation log loss: 0.7693314501840949
max_depth: 7
Validation log loss: 0.7889236364892144

In [67]:

# Plot the log loss for each value
plt.plot(max_depths, scores_md, "o-")

plt.xlabel("max_depths")
plt.ylabel("log_loss")
plt.show()

print("Best max_depth: {}".format(max_depths[np.argmin(scores_md)]))

Best max_depth: 3

Following the same pattern, tune the remaining parameters:

min_child_weight,

subsample,

colsample_bytree,

eta (exposed as learning_rate in XGBClassifier)
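The pattern used above tunes one parameter at a time. It tries each candidate while the other parameters stay fixed, keeps the value with the lowest validation log loss, and then moves on to the next parameter. A generic sketch of that loop is shown below. `tune_one_param` is a hypothetical helper, and `fake_evaluate` is a made-up stand-in so the example runs without the Otto data. In the notebook, `evaluate` would fit `XGBClassifier(**params)` on `x_train_pca` and return `log_loss` on `x_val_pca`.

```python
def tune_one_param(name, candidates, base_params, evaluate):
    """Try each candidate value for `name` with all other parameters fixed.
    `evaluate(params) -> float` returns a loss (lower is better).
    Returns (best_value, scores)."""
    scores = []
    for value in candidates:
        params = dict(base_params, **{name: value})
        scores.append(evaluate(params))
    best = candidates[scores.index(min(scores))]
    return best, scores

# Hypothetical stand-in for "train a model with these params, return validation log loss"
def fake_evaluate(params):
    return (params["max_depth"] - 3) ** 2 + abs(params["subsample"] - 0.7)

params = {"max_depth": 6, "subsample": 1.0}
search_order = [("max_depth", [1, 3, 5, 6, 7]), ("subsample", [0.5, 0.7, 0.9, 1.0])]
for name, candidates in search_order:
    params[name], _ = tune_one_param(name, candidates, params, fake_evaluate)

print(params)  # {'max_depth': 3, 'subsample': 0.7}
```

This greedy, coordinate-wise search is cheap, but it can miss interactions between parameters. Parameters tuned early are never revisited after later ones change.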

In [69]:

# Final model with the tuned parameters.
# Note: it is trained on the standardized features (x_train_scaled), not on the PCA-reduced ones used during tuning.
xgb = XGBClassifier(learning_rate=0.1, 
                    n_estimators=550, 
                    max_depth=3, 
                    min_child_weight=3, 
                    subsample=0.7, 
                    colsample_bytree=0.7, 
                    nthread=4, 
                    seed=42, 
                    objective='multi:softprob')

xgb.fit(x_train_scaled, y_train)

y_pre = xgb.predict_proba(x_val_scaled)

print("Validation log_loss: {}".format(log_loss(y_val, y_pre, eps=1e-15, normalize=True)))

Validation log_loss: 0.5944022517380477
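In the final model, `subsample=0.7` and `colsample_bytree=0.7` mean that each tree is grown on a random fraction of the training rows and of the feature columns. This adds randomness and tends to reduce overfitting. The sketch below only illustrates that idea: `sample_for_one_tree` is a hypothetical helper, and XGBoost's exact sampling details (for example how the column count is rounded) may differ.

```python
import numpy as np

def sample_for_one_tree(n_rows, n_cols, subsample=0.7, colsample_bytree=0.7, rng=None):
    """Illustration only (not XGBoost's internal code): choose the rows and
    columns that a single boosting round would be allowed to use."""
    if rng is None:
        rng = np.random.default_rng()
    # Each training row is kept independently with probability `subsample`
    rows = np.flatnonzero(rng.random(n_rows) < subsample)
    # A fixed fraction of the feature columns is drawn without replacement
    n_keep = max(1, int(colsample_bytree * n_cols))
    cols = np.sort(rng.choice(n_cols, size=n_keep, replace=False))
    return rows, cols

rng = np.random.default_rng(42)
for tree in range(3):
    rows, cols = sample_for_one_tree(13888, 93, rng=rng)
    print(tree, len(rows), len(cols))  # about 70% of the 13888 rows, 65 of the 93 columns
```

Because a fresh sample is drawn for every round, different trees see different subsets of rows and features.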

Reposted from blog.csdn.net/zimiao552147572/article/details/104658658