[machine learning] GBDT Implementation: Reading the Source Code

Other 2018-07-31 08:57:04

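The source code I read this time is the code open-sourced by the team "3 Idiots" for the Kaggle Criteo Display Advertising Challenge, a CTR prediction competition. If I remember correctly, they took first place. Searching for "kaggle-2014-criteo" finds it directly, and the git repository is at https://github.com/guestwalk/kaggle-2014-criteo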

Here I only cover the GBDT part; I plan to write a separate post analyzing their overall approach. Only the important points follow:

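(1) The data used to build the trees

The data includes dense features and sparse features. Dense features are ordinary features that carry a feature value. Sparse features store only positions: a position that has a value counts as 1, and a position without one counts as 0. The code reads all of the training data into memory and sorts the dense features by value in ascending order, feature by feature, which makes it easy to choose the split point between the left and right branches.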
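The layout described above can be sketched as follows. This is only an illustration under assumptions, not the repository's actual data structures: the function names `presort_dense` and `invert_sparse`, the toy data, and the choice to index sparse features by feature id are all hypothetical.

```python
from typing import Dict, List, Tuple


def presort_dense(rows: List[List[float]]) -> List[List[Tuple[float, int]]]:
    """For each dense feature, return (value, sample_index) pairs in ascending order."""
    if not rows:
        return []
    columns = []
    for j in range(len(rows[0])):
        column = [(row[j], i) for i, row in enumerate(rows)]
        column.sort()  # ascending by value, ties broken by sample index
        columns.append(column)
    return columns


def invert_sparse(rows: List[List[int]]) -> Dict[int, List[int]]:
    """Map each sparse feature id to the samples where it is present (value 1)."""
    index: Dict[int, List[int]] = {}
    for i, active_ids in enumerate(rows):
        for feature_id in active_ids:
            index.setdefault(feature_id, []).append(i)
    return index


# Toy data: 3 samples, 2 dense features, and a list of active sparse ids per sample.
dense_rows = [[0.5, 3.0], [0.1, 7.0], [0.9, 1.0]]
sparse_rows = [[2, 5], [5], [2, 8]]

sorted_columns = presort_dense(dense_rows)
sparse_index = invert_sparse(sparse_rows)
```

Sorting once up front means every later scan for a split point can walk a column in value order without sorting again.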

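(2) Choosing the best feature to split on

I had not seen this approach before. Textbooks all say to use the Gini index, which may not have been used here for efficiency reasons. I do not know what the method is called, so I will only describe the algorithm. Let the values to fit be [y1, y2, ..., yn], and let S = (y1 + ... + yn)^2 / n. Scan the samples for a split point that yields [y1, y2, ..., ym] and [ym+1, ym+2, ..., yn] such that the post-split score S' = (y1 + ... + ym)^2 / m + (ym+1 + ... + yn)^2 / (n - m) is maximal. Note that the square applies to the sum on each side, not to each sample: sum(yi^2) taken over both sides is the same for every split point, so it could not tell them apart. I think the basic idea is that if the samples are separated correctly, the fitted values to the left and right of the split point have opposite signs, so S' is largest there, while at any other split point values of opposite sign cancel within one side and the score is smaller. Maximizing S' is equivalent to minimizing the squared error when each side is fitted by its mean, which is the usual regression tree criterion.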
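A minimal sketch of such a split search on one feature, assuming the score of each side is the square of the sum of its targets divided by its size (the usual squared-error criterion for regression trees). The function name `best_split_sorted` and the toy targets are hypothetical, not taken from the repository.

```python
from typing import List, Tuple


def best_split_sorted(targets: List[float]) -> Tuple[int, float]:
    """Find the best split of targets that are already ordered by feature value.

    Returns (m, score): the first m targets go left, the rest go right.
    m == 0 means that no split beats leaving the node unsplit.
    """
    n = len(targets)
    if n == 0:
        raise ValueError("targets must not be empty")
    total = sum(targets)
    best_m, best_score = 0, total * total / n  # score of the unsplit node
    left_sum = 0.0
    for m in range(1, n):
        left_sum += targets[m - 1]
        right_sum = total - left_sum
        score = left_sum * left_sum / m + right_sum * right_sum / (n - m)
        if score > best_score:
            best_m, best_score = m, score
    return best_m, best_score


# Targets with opposite signs on the two sides of the ideal split point.
targets = [-1.0, -1.0, 1.0, 1.0]
m, score = best_split_sorted(targets)
```

With these targets the scores for m = 1, 2, 3 are 4/3, 4 and 4/3, so the scan picks m = 2, the point where the sign changes.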

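(3) Dense features

Dense features follow the approach above: at a given node, search over the samples that belong to that node for the best feature and the best split point.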
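The per-node search over dense features might look like the sketch below. It assumes pre-sorted (value, sample_index) columns and the squared-sum score; the name `best_dense_split`, the convention that samples with a value below the threshold go left, and the toy data are all assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple


def best_dense_split(
    sorted_columns: List[List[Tuple[float, int]]],
    targets: List[float],
    node_samples: Set[int],
) -> Optional[Tuple[int, float, float]]:
    """Return (feature, threshold, score) for the best split of this node, or None.

    Samples whose value is below the threshold go left (an assumed convention).
    """
    n = len(node_samples)
    if n == 0:
        return None
    total = sum(targets[i] for i in node_samples)
    best = None
    best_score = total * total / n  # score of the unsplit node
    for feature, column in enumerate(sorted_columns):
        left_sum, left_count = 0.0, 0
        previous_value = None
        for value, i in column:
            if i not in node_samples:
                continue  # sample belongs to another node
            # A split is only possible between two distinct feature values.
            if left_count > 0 and value != previous_value:
                right_sum = total - left_sum
                right_count = n - left_count
                score = left_sum ** 2 / left_count + right_sum ** 2 / right_count
                if score > best_score:
                    best_score = score
                    best = (feature, value, score)
            left_sum += targets[i]
            left_count += 1
            previous_value = value
    return best


# Feature 0 separates the targets cleanly; feature 1 does not.
rows = [[0.1, 5.0], [0.2, 1.0], [0.8, 6.0], [0.9, 2.0]]
targets = [-1.0, -1.0, 1.0, 1.0]
sorted_columns = [
    sorted((row[j], i) for i, row in enumerate(rows)) for j in range(2)
]
result = best_dense_split(sorted_columns, targets, {0, 1, 2, 3})
```

Because each column is already sorted, one pass with running sums evaluates every candidate split point of a feature.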

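(4) Sparse features

Since a sparse feature only takes the values 0 and 1, the only possible split puts the 0s on one side and the 1s on the other. The split is scored with the same selection criterion described above.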
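For a sparse feature there is exactly one candidate split, so the search reduces to scoring each feature once. The sketch below assumes a mapping from feature id to the samples where the feature is present and the squared-sum score; `best_sparse_split` and the toy data are hypothetical names and values.

```python
from typing import Dict, List, Optional, Set, Tuple


def best_sparse_split(
    sparse_index: Dict[int, List[int]],
    targets: List[float],
    node_samples: Set[int],
) -> Optional[Tuple[int, float]]:
    """Return (feature_id, score) for the best 0/1 split of this node, or None."""
    n = len(node_samples)
    if n == 0:
        return None
    total = sum(targets[i] for i in node_samples)
    best = None
    best_score = total * total / n  # score of the unsplit node
    for feature_id, present in sparse_index.items():
        ones = [i for i in present if i in node_samples]
        ones_count = len(ones)
        if ones_count == 0 or ones_count == n:
            continue  # the feature does not divide this node
        ones_sum = sum(targets[i] for i in ones)
        zeros_sum = total - ones_sum
        score = ones_sum ** 2 / ones_count + zeros_sum ** 2 / (n - ones_count)
        if score > best_score:
            best_score = score
            best = (feature_id, score)
    return best


# Feature 5 is present exactly in the two samples with negative targets.
sparse_index = {2: [0, 2], 5: [0, 1], 8: [2]}
targets = [-1.0, -1.0, 1.0, 1.0]
result = best_sparse_split(sparse_index, targets, {0, 1, 2, 3})
```

Only the samples where the feature is present need to be visited; the sum for the 0 side follows from the node total.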

(5) Because both the depth of the trees and the number of trees are limited, no pruning is performed.

Reposted from www.cnblogs.com/lidouer/p/9393914.html