From relational databases to distributed machine learning: revealing ten years of Tencent Big Data's development history

Over the past 10 years, big data technologies have dramatically changed the way enterprises store, process, and analyze data. Today, big data technology is maturing and covers a rich set of areas, including computing, storage, data warehousing, data integration, visualization, NoSQL, OLAP analysis, and machine learning. In the future, big data engine technology will continue to extend toward containers, big data machine learning, data lakes, and other areas.

Recently, the first stop of the Tencent Big Data Technology Salon, the Angel session, was held in Shenzhen. The Tencent Big Data team gave a detailed account of Tencent Big Data's ten-year development and presented a full picture of Angel, Tencent's third-generation full-stack machine learning platform, including its technical capabilities in large-scale model training, deep learning, and graph computing. The team also shared in-depth application cases from scenarios such as WeChat Pay, advertising performance, and WeBank.

At the event, Liu Yuhong, head of Tencent Big Data, officially launched the "Spark Program". He said: "Over ten years of development, Tencent Big Data has kept pursuing technological innovation. Our clusters have grown from 30 machines to more than 35,000. In 2016 we broke four world records in the Sort Benchmark, known as the Olympics of computing, with world-leading performance. Tencent Big Data's development has benefited from open source. In keeping with the open spirit of sharing, today we are launching the 'Spark Program' for technology sharing, and we hope it will help nurture a prosperous big data ecosystem."

Tencent Big Data's development "trilogy"

As an explorer at the forefront of the big data field, Tencent Big Data started in 2009 and has gone through three stages: offline computing, real-time computing, and machine learning. Along the way it has accumulated a wealth of practical experience.

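According to Liu Yuhong, before 2009 Tencent relied mainly on traditional relational databases. From 2009 onward, the services offered by traditional single-machine databases could no longer keep up with the explosive growth of Tencent's business in terms of scalability and cost-effectiveness. In response, Tencent Big Data moved to a distributed architecture: building on the open-source Hadoop ecosystem, it constructed Tencent's first-generation big data platform and an offline computing platform, with scale as the main focus. This marked the first stage of Tencent Big Data. Over three years, Tencent completed a full migration from relational databases to its self-built big data platform, and by 2012 the size of a single Tencent Big Data cluster had passed 4,400 machines.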

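In 2012, the mobile internet took off. To meet the demand for timely, fast statistics on business data, Tencent Big Data moved from Hadoop to the Spark and Storm ecosystems. Building on open-source technology, it rewrote components to fit Tencent's own needs, explored stream computing and second-level data collection systems, and built an enterprise-grade real-time data analysis system. This was the second stage of Tencent Big Data's development.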

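From 2015 to the present, Tencent Big Data has been in its third stage. As data mining and data applications deepened, Tencent Big Data iterated on itself once more and in 2016 launched Angel, its self-developed machine learning platform. Angel targets complex computing scenarios, can train on large-scale data, and supports AI applications such as content recommendation and ad recommendation. It was developed jointly by Tencent and Peking University, combining the high availability required by industry with the innovation of academia. It not only supports Tencent's own business needs but is also a milestone for the industry.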

Born from massive business workloads, focused on graph computing scenarios


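As a third-generation high-performance computing platform oriented toward machine learning, Tencent Angel has distinctive strengths in training high-dimensional models on sparse data, and it excels in areas related to recommendation models and graph network models. The mainstream large-scale graph computing systems in industry today include Facebook's BigGraph, PowerGraph, and Databricks' Spark GraphX, but not all of these systems support all three major classes of algorithms: graph mining, graph representation learning, and graph neural networks.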

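According to Xiao Pin, head of Tencent Angel development, Angel grew out of Tencent's massive business scenarios and is a machine learning platform for very large sample sets and very high dimensions. In terms of performance, Angel outperforms existing graph computing systems: it can support traditional graph mining algorithms on graphs with billions of nodes and hundreds of billions of edges, and graph neural network algorithms on graphs with tens of billions of edges. It can run on multi-task clusters as well as in public cloud environments, has an efficient fault-tolerance and recovery mechanism, and makes it easier to support new algorithms. Angel also provides good support for graph mining, graph representation, and graph neural network algorithms, giving it graph learning capabilities.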
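
For readers unfamiliar with the term, PageRank is one well-known example of a "traditional graph mining algorithm". The sketch below is a minimal single-machine version in plain Python, included only to illustrate the kind of computation involved. It is not Angel's API or implementation, and the function name, parameters, and the tiny edge list are all made up for this illustration. A system at the scale described above would distribute this work across a cluster rather than loop over edges on one machine.

```python
def pagerank(edges, num_nodes, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (src, dst) edges.

    Illustrative only: a hypothetical helper, not part of any library.
    """
    # Count outgoing edges per node so rank can be split among them.
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1

    # Start from a uniform distribution over nodes.
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in range(num_nodes) if out_degree[n] == 0)
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        new_rank = [base] * num_nodes
        # Each edge passes an equal share of its source's rank to its target.
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


# A made-up four-node graph: 0 -> 1 -> 2 -> 0, plus 3 -> 0.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
ranks = pagerank(edges, num_nodes=4)
```

Each iteration touches every edge once, which is why the number of edges, not just the number of nodes, determines how hard such algorithms are to scale.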

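Today, Angel is widely used in Tencent products such as QQ, WeChat Pay, Tencent Ads, and Tencent Video, and is fully open to industry partners such as WeBank. It is broadly applicable to graph computing business scenarios such as intelligent recommendation and financial risk assessment.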

Going open source and actively contributing to the community


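In 2017, just one year after its release, Tencent Angel was officially open-sourced. In August 2018, Tencent donated Angel to the LF AI Foundation, the Linux Foundation's organization focused on artificial intelligence. Backed by the foundation's mature operations, the fully upgraded Angel engages closely with the international open-source community and aims to make machine learning technology easier to pick up, study, and apply.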

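"Since Angel joined the LF AI Foundation for incubation in 2018, it has been run on an open-source community model and has grown very quickly. It has added many new features, such as feature engineering and automated machine learning, and gained more than 2,000 stars on GitHub," said Yang Xuan, Greater China director at Linux Foundation APAC. "Angel is one of the most active projects under the LF AI Foundation, and I believe it will soon join the ranks of LF AI's top-level projects."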

Angel currently has more than 5,300 stars and more than 1,300 forks on GitHub, with 39 code contributors who have submitted more than 2,336 commits.

Looking ahead: deep integration of big data, AI, and the cloud


Tencent Angel's evolution in deep learning and computing power also fits the direction in which the big data industry is developing. According to Liu Yuhong, Tencent Big Data will continue to invest in four main directions: data lakes, unified stream and batch processing (integrating batch computing and stream computing), AI + big data, and cloud + big data.

Liu Yuhong said: "AI, cloud computing, and big data are inseparable. Angel's growth from a big data platform into a full-stack machine learning platform also confirms the future direction of the industry. We will deeply integrate Tencent Big Data's capabilities and technology with AI and the cloud, so that the value of big data can be further realized and we can better help partners and customers."

Tencent Big Data's Spark Program is reportedly a technology-sharing initiative launched by Tencent for big data enthusiasts. Drawing on Tencent Big Data's ten years of R&D and operating experience, it opens up the technology Tencent has accumulated in the big data field over the past decade through the open-source community, online courses, offline salons, technology summits, and other formats. The Angel session is the first offline event of the Spark Program.

Origin blog.51cto.com/14579587/2443716