Kaggle Hands-On Project 1: Titanic: Machine Learning from Disaster

Other 2018-05-14 18:58:50 Views: 2

Project URL

https://www.kaggle.com/c/titanic

Project overview:

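Besides the passenger ID, the data contains the 10 fields listed in the table below. Of these, survival is the prediction target and the rest are features. The table does not list the Name column, which also appears in the training data.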

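Variable	Definition	Key
survival	Survived or not	0 = No, 1 = Yes
pclass	Ticket class	1 = 1st, 2 = 2nd, 3 = 3rd
sex	Sex
Age	Age in years
sibsp	Number of siblings or spouses aboard
parch	Number of parents or children aboard
ticket	Ticket number
fare	Passenger fare
cabin	Cabin number
embarked	Port of embarkation	C = Cherbourg, Q = Queenstown, S = Southampton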

Loading the data

import pandas as pd
# Forward slashes avoid "\t" being parsed as a tab character in the path
train_df = pd.read_csv("../train.csv")
test_df = pd.read_csv("../test.csv")
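
Windows-style paths such as `..\train.csv` are easy to get wrong in Python string literals, because `\t` is read as a tab character. The following is a minimal sketch of the pitfall and two ways around it. The variable names `broken`, `raw`, and `portable` are illustrative only and do not come from the original post.

```python
from pathlib import Path

# "\t" inside a normal string literal is an escape sequence for TAB,
# so this is NOT the path it appears to be.
broken = "..\train.csv"

# A raw string keeps the backslash as a literal character.
raw = r"..\train.csv"

# pathlib joins path parts with the separator of the current OS.
portable = Path("..") / "train.csv"

print(repr(broken))
print(repr(raw))
print(portable.name)
```

Either `raw` or `portable` can be passed to `pd.read_csv`. Plain forward slashes also work on Windows.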

Checking overall missing values

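The output below shows that, in the training set, the fields with null values are Age, Cabin, and Embarked (Fare is complete, with 891 non-null values). Cabin is the most severely affected: 687 of 891 values are missing, a rate of 77.1%.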

train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
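
The non-null counts printed by `info()` can also be turned into explicit missing counts and rates. The following is a minimal sketch, assuming a hypothetical helper `missing_summary` and a small toy frame `toy_df` with made-up values standing in for `train_df`. Neither name appears in the original post.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing counts and rates (in percent) for columns that have gaps."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "rate_pct": (counts / len(df) * 100).round(1),
    })
    # Keep only columns with at least one missing value, worst first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame with illustrative values only (not the real Titanic data).
toy_df = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 7.92, 53.1],
})

toy_summary = missing_summary(toy_df)
print(toy_summary)
```

Calling `missing_summary(train_df)` on the real training set should list only the columns whose non-null count in the `info()` output is below 891.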

Distribution of the numeric variables

train_df.describe()
       PassengerId    Survived      Pclass         Age       SibSp  \
count   891.000000  891.000000  891.000000  714.000000  891.000000   
mean    446.000000    0.383838    2.308642   29.699118    0.523008   
std     257.353842    0.486592    0.836071   14.526497    1.102743   
min       1.000000    0.000000    1.000000    0.420000    0.000000   
25%     223.500000    0.000000    2.000000   20.125000    0.000000   
50%     446.000000    0.000000    3.000000   28.000000    0.000000   
75%     668.500000    1.000000    3.000000   38.000000    1.000000   
max     891.000000    1.000000    3.000000   80.000000    8.000000   
            Parch        Fare  
count  891.000000  891.000000  
mean     0.381594   32.204208  
std      0.806057   49.693429  
min      0.000000    0.000000  
25%      0.000000    7.910400  
50%      0.000000   14.454200  
75%      0.000000   31.000000  
max      6.000000  512.329200
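
`describe()` summarises every numeric column, including identifiers such as PassengerId and coded categories such as Survived and Pclass, whose means are harder to interpret. The following is a minimal sketch of restricting the summary to continuous columns. Treating Age and Fare as the continuous columns is an assumption made here, and `toy_df` holds made-up values rather than the real data.

```python
import pandas as pd

# Toy frame with illustrative values only (not the real Titanic data).
toy_df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Pclass": [3, 1, 3, 1],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.0, 8.0, 53.75],
})

# Assumption: only Age and Fare are treated as continuous.
continuous_cols = ["Age", "Fare"]

# Summarise just those columns instead of every numeric one.
summary = toy_df[continuous_cols].describe()
print(summary)
```

The same selection applied to `train_df` leaves out the identifier and category columns while keeping the count, mean, standard deviation, and quartiles for the remaining ones.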

Discrete variables
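
One common way to inspect discrete columns is to count how often each category occurs. The following is a minimal sketch, assuming a hypothetical helper `discrete_counts` and a toy frame with made-up values. The choice of Sex, Pclass, and Embarked as the discrete columns is also an assumption and is not taken from the original post.

```python
import pandas as pd

# Toy frame with illustrative values only (not the real Titanic data).
toy_df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male"],
    "Pclass": [3, 1, 3, 1, 3],
    "Embarked": ["S", "C", "S", None, "Q"],
})


def discrete_counts(df, columns):
    """Map each column name to its value counts, keeping missing values as a category."""
    return {col: df[col].value_counts(dropna=False) for col in columns}


counts = discrete_counts(toy_df, ["Sex", "Pclass", "Embarked"])
for col, vc in counts.items():
    print(f"--- {col} ---")
    print(vc)
```

Passing `dropna=False` keeps missing entries visible in the counts, which matters for a column like Embarked that has a few nulls in the training set.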

Reposted from www.cnblogs.com/bethansy/p/9037513.html
