Tencent's Big Data Overall Architecture Diagram, Now Public!


Guide: As one of the largest Internet companies in China, Tencent runs businesses that touch nearly every aspect of users' daily lives. Faced with such an enormous volume of business data, if the data cannot be professionally processed, stored, managed, and used in an efficient and orderly way, and therefore cannot deliver its due value, then data assets turn into data garbage and become a burden on society and enterprises.

As part of Tencent's underlying infrastructure, the big data platform must handle tens of millions of offline data tasks and tens of trillions of real-time computations every day to meet the demand for hundreds of millions of daily data analyses and computations.

This article mainly introduces the construction concept and overall architecture of Tencent Big Data.

01

The Philosophy Behind Tencent Big Data

When the project was launched, we debated intensely whether to build our own system or adopt open source: "To be, or not to be: that is the question." Business demand was urgent at the time. In the first half of 2009, Qzone launched "Happy Farm," which entered a period of explosive growth. Colleagues on the business side smiled at the almost vertical growth curve, but looking at the same curve, we could not laugh. We were searching for the answer to one question: how could we quickly build a new data warehouse that kept up with the computing demands of such rapid business growth?

In 2008 and 2009, open source was not yet popular in China, and many programmers held the prejudice that using open source required no real technical skill. Almost every programmer dreams of single-handedly building a top-notch system and becoming famous in the software industry, in China and worldwide. However, after taking stock of the business requirements and weighing them against our team's capabilities and the manpower available at the time, we found that building such a system ourselves would be as hard as reaching the sky. Developing a new-generation data warehouse entirely in-house was a mountain too steep to climb.

With that road blocked, we turned to open source. In fact, open source brings many benefits: a rich community and ecosystem, and a large number of code contributors. Adopting an open source system means drawing on resources from around the world and the collective wisdom of programmers everywhere, and open source projects let you quickly build a platform tailored to business needs.

But open source was not easy for us either. First, the technology stack was different. Our existing stack, built for billing systems, was C/C++, whereas the open source big data ecosystem is almost entirely Java-based, so we had to learn it from scratch. Fortunately, the language gap was not hard to overcome: we learned as we went, recruited developers with big data experience, and gradually got things moving. Second, the big data ecosystem is enormous, and no single project met enterprise-level requirements out of the box; each one needed extensive optimization before it satisfied our usability needs.

From its first toddling steps to the present, Tencent Big Data has developed over more than ten years and gone through three generations of technological evolution. The first generation followed a "take what works" approach: open source systems could be used immediately, but some of them, such as HDFS (Hadoop Distributed File System) and Hive, could not meet our performance and functional requirements, so we customized their core modules. The second generation was a stage of limited in-house development: drawing on existing designs, we developed some core platforms ourselves, rebuilt the real-time data collection system, and rewrote the underlying real-time computing engine, Storm, in Java. The third generation is a stage of fully in-house development: its core platform, the high-performance distributed machine learning platform Angel, was jointly developed by Tencent, Peking University, and other universities, and we own its complete intellectual property rights.

We have always been beneficiaries of open source, from Hadoop to Spark to Storm, and our development is inseparable from the community. When we were small, we relied on the open source community; as we grew, we actively gave back. As early as 2014, we open sourced Tencent's own version of Hive, which became popular for its compatibility with Oracle syntax. Angel, our third-generation high-performance distributed machine learning platform, was open sourced in 2017 and donated to the Linux Foundation in 2018. In 2019, we open sourced four major platforms at once: the real-time data collection platform TubeMQ (donated to the Apache Software Foundation), the resource management platform TKEStack, the distributed database TBase, and Kona JDK, Tencent's version of OpenJDK. Dozens of our engineers serve as PMC members and committers on open source projects, and an even larger number of contributors submit code to the community every day.

Technical collaboration through open source also attracts talent. A good project draws many excellent developers, which helps build a strong technical ecosystem and drives technological progress. That is why we chose open source.

Coming from open source, giving back to open source, and sticking with open source: this is one of our core technical philosophies. The other is that everything we build must serve the business.

We firmly believe that technology is worthless if the business cannot use it. Our self-developed Angel project began because no machine learning platform in the open source community at that time met our business needs. We built it ourselves because it was valuable to the business, not because it was technically challenging or because we wanted to prove how good our technology was. Since it was open sourced in 2017, Angel has been used by more than 100 companies and organizations, including Huawei, Xiaomi, OPPO, Sina Weibo, and Pinduoduo, delivering value well beyond Tencent.

02

Overall Architecture of Tencent Big Data

As mentioned earlier, over more than ten years of development, Tencent Big Data has gone through three generations of technological evolution, as shown in Figure 1.


▲Figure 1 Tencent’s three generations of big data technology evolution

From 2009 to 2011, the first-generation architecture mainly handled offline computing tasks, as shown in Figure 2.

TDW (Tencent distributed Data Warehouse) was built mainly on Hadoop, and we optimized it in two areas. First, we scaled up the clusters: we enhanced cluster scalability, optimized scheduling performance, strengthened disaster recovery, and reduced the storage footprint through differentiated storage. Second, we used the surrounding ecosystem to lower the barrier to entry: we built supporting scheduling and development platforms, added compatibility with Oracle syntax, and integrated a PostgreSQL database to improve analysis performance on small data volumes. In summary, the first-generation platform mainly met the needs of offline computing, and its main technical challenge was continuously expanding and optimizing cluster scale: a single cluster grew from dozens of nodes to hundreds, and then to several thousand.


▲Figure 2 Architecture of the first generation offline computing platform

From 2012 to 2014, the second-generation architecture, while continuing to handle offline computing, extended the platform to support real-time computing, as shown in Figure 3.


▲Figure 3 Architecture of the second-generation real-time computing platform

Based on the first-generation offline computing platform, we built the second-generation real-time computing platform by integrating Storm and Spark. The main evolution is as follows.

1) Integrated Spark, which delivers higher offline computing performance than Hadoop.

2) Introduced Storm to support stream processing tasks with second- or millisecond-level latency.

3) Built the real-time collection system TDBank, moving data collection from day-level (T+1) to second-level latency.

4) For resource and task scheduling, the platform supports mixed deployment of offline and online workloads and task containerization, and resource management covers CPU, memory, network, and I/O. This made the platform lighter, more agile, and more flexible, greatly improved utilization, and reduced costs.
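To make "resource management across CPU, memory, network and I/O" more concrete, here is a minimal sketch, not Tencent's actual scheduler: a hypothetical first-fit placer that assigns a task to a node only if the node has enough free capacity in every dimension. The names `Node`, `fits` and `place`, and the resource units, are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The four resource dimensions named in the text. Units are assumptions
# for illustration (e.g. cores, GiB, Gbit/s, an abstract I/O share).
DIMENSIONS = ("cpu", "memory", "network", "io")


@dataclass
class Node:
    """Hypothetical cluster node tracking its remaining free capacity per dimension."""
    name: str
    free: dict = field(default_factory=dict)


def fits(request: dict, free: dict) -> bool:
    """True if the request fits in every dimension; missing keys count as 0."""
    return all(request.get(d, 0) <= free.get(d, 0) for d in DIMENSIONS)


def place(request: dict, nodes: List[Node]) -> Optional[str]:
    """First-fit placement: reserve resources on the first node that fits, else None."""
    for node in nodes:
        if fits(request, node.free):
            # Deduct the request from the chosen node in every dimension.
            for d in DIMENSIONS:
                node.free[d] = node.free.get(d, 0) - request.get(d, 0)
            return node.name
    # No node can host the task without oversubscribing some dimension.
    return None


if __name__ == "__main__":
    nodes = [
        Node("small", {"cpu": 4, "memory": 8, "network": 1, "io": 1}),
        Node("large", {"cpu": 16, "memory": 64, "network": 10, "io": 10}),
    ]
    # "small" has enough CPU and memory but not enough network bandwidth,
    # so this network-heavy task lands on "large".
    print(place({"cpu": 2, "memory": 4, "network": 5}, nodes))
    # A task with no network or I/O demand fits on "small".
    print(place({"cpu": 2, "memory": 4}, nodes))
```

The point of checking every dimension is that a node with idle CPU can still be a bad fit if its network or disk is saturated; a production scheduler for mixed offline and online workloads would additionally handle priorities, isolation, and preemption, which this sketch leaves out.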

From 2015 to 2019, the third-generation architecture began supporting AI scenarios such as machine learning and deep learning in addition to general-purpose big data computing, and big data and AI were gradually integrated at the platform level, as shown in Figure 4.


▲Figure 4 The third-generation machine learning computing platform

Building on the second-generation real-time computing platform, we independently developed the machine learning platform Angel and built the third-generation machine learning computing ecosystem around it. The main evolution is as follows.

1) Together with Peking University, we independently developed a high-performance distributed machine learning platform. It supports models with one billion to ten billion dimensions, data parallelism and model parallelism, and online training. Beyond traditional machine learning, it also supports deep learning, graph computing, and more, giving it full-stack AI capabilities. It offers a friendly programming interface and a rich algorithm library, provides a one-stop development and runtime environment at the upper layer, and supports a variety of popular industry computing frameworks. Angel was fully open sourced in June 2017 and donated to the Linux Foundation in 2018. On December 20, 2019, it officially graduated from the LF AI Foundation (Linux Foundation Artificial Intelligence Foundation), the Linux Foundation's top foundation for AI, becoming the first open source project from China to graduate from LF AI. This indicates that Angel has been recognized by technical experts worldwide and has become one of the world's top AI open source projects.

2) At the resource management level, in addition to CPUs, the platform supports heterogeneous devices such as GPUs and FPGAs. We were the first in China to implement GPU virtualization, and the technology is relatively advanced (see our paper "GaiaGPU: Sharing GPUs in Container Clouds," published at IEEE ISPA 2018).

3) Big data is tightly integrated with databases. Using PGXZ, a PostgreSQL-based distributed database (later renamed TBase and open sourced in 2019) that supports HTAP (Hybrid Transactional and Analytical Processing), TDW can better support OLTP (Online Transaction Processing) workloads.

As of 2019, Tencent Big Data had been evolving for ten years, and it continues to evolve. We are exploring the path to next-generation computing platforms, including unified batch and stream processing and cloud-native big data; we are experimenting with combining AI, big data, and cloud computing, as well as combining software and hardware; and we are researching cutting-edge technologies such as data lakes and privacy-preserving computing. Big data, artificial intelligence, and cloud computing are becoming the infrastructure that supports business development, and the next generation will be even more exciting.

This article is excerpted from "The Way of Tencent Big Data Construction" (ISBN: 9787111710769).


Recommended: Officially produced by Tencent! Tencent's big data construction methodology, disclosed for the first time, distilled from ten years of honing Tencent's big data platform and putting "Tech for Good" into practice.


Source: blog.csdn.net/hzbooks/article/details/126434389