The Goose Factory's open source pioneer: more than 30 trillion computations a day, and an effort to break the data wall

Open source, open source, open source.

This was the most visible change at Tencent in 2019.

The most recent milestone: Angel, Tencent's first open source AI project, completed its evolution to version 3.0, won recognition from technical experts around the world, and graduated from the LF AI open source foundation, becoming one of the AI industry's top-level open source projects.

It is the first project from China to earn this recognition, and the news drew widespread praise. Yet it is only a footnote to Tencent's open source achievements over the past year.

Over the past year, Tencent's open source momentum has grown ever stronger. As of December, it had released more than 92 open source projects to the outside world, covering every BG (Business Group) and lines of business including WeChat, Tencent Cloud, big data, games, AI, and security. Together they have earned more than 270,000 stars and a solid reputation.

Tencent has become one of the world's major open source vendors, and Tencent Big Data has contributed no small part to that.

As the technical backbone of Tencent's businesses, Tencent Big Data is open-sourcing its core components one after another, pushing Tencent toward becoming the most comprehensive open source vendor in China's big data field.

Jiang Jie, general manager of Tencent's Data Platform Department, general manager of the AMS platform, and general manager of Smart Retail strategic cooperation, said the team will keep pushing until the entire big data platform is open source.

Why pursue open source so "radically"? How is it carried out in the business itself? And what is the logic behind it?

Jiang Jie's summary of the past year offers answers worth considering.

Jiang Jie, general manager of Tencent's Data Platform Department and of the AMS platform

Because Tencent Big Data is the company's open source pioneer, its answer also conveys Tencent's overall open source strategy and logic. To see how open source works at the Goose Factory, it is worth looking at Tencent Big Data.

The confidence behind open source: more than 30 trillion computations a day

In 2019, the tenth year since its founding, Tencent's big data platform has grown from nothing into a key support for the entire group's business:

It runs 15 million analysis tasks a day, performs 30 trillion real-time computations, and takes in up to 35 trillion data records daily. Its distributed machine learning platform, built on Tencent Cloud, can train on data with 1 trillion dimensions.

How is this possible? Through strong technical capability. According to Tencent, after 10 years of development the big data platform has established a "big data + AI" dual-engine technical architecture and sits in the industry's first echelon.

In particular, Angel, one of the core projects of Tencent's third-generation computing platform, can support trillion-dimension data as of version 3.0. It is also compatible with the Spark, PyTorch, and TensorFlow ecosystems, which further lowers the barrier to use and improves extensibility and compatibility.

A few words are enough to sketch the system, but building it was anything but simple.

"Throughout the process you run into network card bottlenecks, memory bottlenecks, and even data loss," Jiang Jie said. "The system was honed slowly, by stepping into pit after pit, through painful lessons and a pile of failures."

Jiang Jie explained that this holds even for a company of Tencent's size. Opening up its capabilities and technology has also meant facing many challenges.

The reason is simple: if other people fall into a pit while using what you built, will they still trust you? "We want to be leaders, but not martyrs," Jiang Jie said.

So what is to be done? He sums it up in one phrase: "value-driven."

A platform is not developed behind closed doors. It follows the development of the business, with business value driving the evolution of the data platform. Throughout, technology depends on business growth and in turn feeds back into business development.

This is the path Tencent Big Data has followed over 10 years.

From adopting open source, to in-house development, to open-sourcing: Tencent Big Data reaches a turning point

Since 2009, Tencent's big data platform has gone through three stages: offline computing, real-time computing, and machine learning.

In the first stage, the team built an offline computing platform on the open source Hadoop stack, with scale as the main focus. For the business, it mainly replaced the traditional data warehouse, and reporting was the primary workload.

This stage lasted three years and completed a full migration from relational databases to a self-built big data platform.

By around 2012, however, the mobile Internet was heating up, further enriching user data and user profile features.

E-commerce product recommendation and news recommendation algorithms placed higher demands on the data platform. The T+1 reports of the first stage were clearly no longer enough; monitoring now had to be real-time, at the level of hours, minutes, and seconds.

So the platform moved from the original Hadoop toward Spark and Storm, absorbing open source technology and rewriting it to fit Tencent's own needs, in order to support real-time reporting, real-time data access, and real-time monitoring. The team also began exploring stream computing and second-level collection systems, with the goal of building an enterprise-grade real-time data analysis system.

This stage also lasted three years. Jiang Jie said that by the end of it, Tencent Big Data's capabilities ranked in the first echelon in China.

By 2015, data volumes had grown further and the number of user feature dimensions had multiplied, and the advertising recommendation system began to hit bottlenecks. The big data platform entered its third stage: building a machine learning platform to support the data mining needs of Tencent's businesses.

In 2016 it launched Angel, a self-developed machine learning platform that specializes in complex computing scenarios. Angel can train on large-scale data and supports AI recommendation scenarios such as content recommendation and advertising, which established the "big data + AI" dual-engine technical architecture.

Over this whole process, Tencent Big Data improved cluster scalability, with scheduler performance reaching 150 times that of the native scheduler. In 2016, Tencent broke four world records in the Sort Benchmark, a sign that its computing power had reached a world-class level.

From the business and back to the business: this is the closed loop of technology iteration on Tencent's data platform.

Benefiting from open source and giving back to open source: this cycle between Tencent Big Data, and Tencent as a whole, and the technology community is one of the forces that keeps driving its open source work.

Working to break the data wall: open source iterates toward a fourth-generation big data platform

2019 was also the first year of the upgrade to Tencent's fourth-generation big data platform.

Jiang Jie said Tencent is researching batch-stream unification and ABC (AI, big data, cloud) integration, along with data lakes and federated learning, as directions for the next-generation big data platform.

This iteration, too, is driven by business value: as data coverage widens, data security and privacy protection have become new problems.

With the spread of the Internet of Things, artificial intelligence, and cloud computing, the platform needs hybrid deployment, cross-domain data sharing, and edge computing capabilities.

Behind this lies the big data industry's biggest obstacle: the data wall.

"Sharing your data means exposing your back to others, and nobody wants to do that. This is the biggest problem," Jiang Jie said.

The circumstances make this understandable. Over the past year, data leaks swept through every sector. Finance, insurance, education, health care, technology, and government were not spared, and some leaks reached a scale of more than a billion.

Meanwhile, since Europe issued the General Data Protection Regulation, the whole industry has placed growing importance on data protection.

"When data cannot be shared but still needs to be, federated learning is one direction. We hope that through hybrid deployment, computation drift, and federated learning as a whole, we can build a strict security control system and break the data wall."

With the business value clear, Tencent's big data platform has already begun to act. Jiang Jie said federated learning will be put into real-world scenarios next year, and the related research results will be open-sourced in step.
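The idea of learning together without sharing data can be illustrated with a small example. The sketch below uses federated averaging, a widely used federated learning scheme; the article does not say which algorithm Tencent uses, so this is an assumption made purely for illustration. The function names, the two parties, and their data are all hypothetical. Each party fits the model y = w * x on its own private data, and only the model weight is exchanged.

```python
from typing import List, Tuple

# Hypothetical setup: each party privately holds (x, y) pairs and fits y = w * x.
Dataset = List[Tuple[float, float]]


def local_update(w: float, data: Dataset, lr: float = 0.01, steps: int = 20) -> float:
    """Run a few gradient-descent steps on one party's private data."""
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(w: float, parties: List[Dataset], rounds: int = 50) -> float:
    """Federated averaging: only weights leave each party, never raw data."""
    total = sum(len(d) for d in parties)
    for _ in range(rounds):
        # Each party trains locally, starting from the shared weight.
        local_ws = [local_update(w, d) for d in parties]
        # The coordinator averages the results, weighted by local data size.
        w = sum(lw * len(d) for lw, d in zip(local_ws, parties)) / total
    return w


# Two illustrative parties whose data both follow y = 2x.
party_a = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
party_b = [(4.0, 8.0), (5.0, 10.0)]
w = federated_average(0.0, [party_a, party_b])
print(round(w, 4))
```

Neither party ever sees the other's records, yet both end up with a model fitted to the combined data. A production system would add protections this sketch omits, such as encrypting or aggregating the exchanged weights securely.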

The Goose Factory is busy open-sourcing, with the big data platform in the lead

So why does Tencent open-source? Tencent Big Data's 2019 offers part of the answer:

First, Tencent Big Data benefited from open source projects in its early development. From Hadoop in the first stage to Spark in the second, open source projects provided help.

Second, as it developed, Tencent Big Data's technical strength grew quickly. It now has technology worth sharing with more people, so that the community need not fall into the same pits or reinvent the wheel.

That is the view at the concrete, operational level. Raise the vantage point to Tencent as a whole, or even to the entire industry, and answers emerge from other angles too.

Open source collaboration is one of Tencent's most important current technology strategies.

For Tencent, internal open source collaboration is in essence a matter of sorting out and connecting its most common underlying technologies and capabilities. On one hand it reduces reinventing the wheel; on the other it raises the company's R&D effectiveness and operational efficiency.

Building on this internal collaboration, Tencent is opening up more and more heavyweight technology, continually improving its open source governance, and building a developer ecosystem.

In 2019, Tencent Big Data led a collaborative team, or Oteam, to build a big data project called "Sky." The project unifies the data-related systems of Tencent's six business groups, with the aim of creating a company-wide big data platform on a unified technology stack.

Jiang Jie said: "Good open source technology often needs a strong company behind it, with real economic strength and healthy business growth. Tencent has strong business support, which lets us invest in researching and developing the best technology and stay at the forefront of the industry."

Today, Tencent's internal collaborative projects span many technical fields and have been validated by massive numbers of users. Tencent is contributing a steady stream of high-quality open source projects to the open source community.

In August this year, Tencent's Ma Huateng spoke publicly about open source for the first time, further showing Tencent's attitude toward it:

Tencent hopes to put more effort into scientific research and has made "tech for good" part of the company's new mission and vision. Through open source, both internal and external, we will actively take part in building a "global technology community."

Of course, for Tencent, open source is also a matter of strategy. It reflects not only the "tech for good" vision but also considerations about the company's industrial Internet layout.

Valuable open source projects will attract more users into the Tencent ecosystem and promote wider use of machine learning and artificial intelligence.

Tencent's open source work is also closely integrated with Tencent Cloud, providing developers with more convenient basic services, tools, and open source projects.

Today Tencent offers IaaS capabilities such as networking, storage, and databases; PaaS capabilities such as big data and machine learning; and upper-layer SaaS capabilities such as image, voice, NLP, and BI, all opened up through Tencent Cloud.

Small contests are fought over individual matters; big ones are fought over momentum. Tencent's open strategy, begun in 2010, has grown steadily more mature as 2020 approaches, and Tencent's horizons keep widening.

Origin blog.csdn.net/Tencent_TEG/article/details/103849650