The core methodology for systematically building a data middle platform

Foreword

Hello everyone, I am Wang Laoshi. I have been busy with work recently and have not updated my blog for a while. Taking the opportunity of an exchange with the experts here on CSDN, I pushed myself to set aside time to write. Today I am sharing with all of you the content I presented at that exchange meeting.

Getting to know the data middle platform

Speaking of the middle platform, the concept was first promoted by Alibaba in 2015 with its "big front office, small middle platform" strategy. The inspiration came from the Finnish game company Supercell, which has produced a string of hit games such as Clash of Clans and Clash Royale. Supercell typically organizes 5-7 people into an independent development team. By consolidating the common game assets and algorithms produced across its projects, and accumulating a disciplined set of R&D tools and frameworks along the way, it built a strong middle platform. With that platform, a small team can be supported to develop a new game in a short period of time, and if the market response is poor, the game can be cut quickly, reducing the cost of trial and error.

Before the middle platform existed, business was generally supported by a front office and a back office. The front office is the interactive system that faces users directly, such as Douyin or WeChat. The back office is the management platform for the enterprise's internal functions and the systems that carry its core management capabilities, such as CRM and ERP.
The front office is user-facing and needs to respond to user needs quickly, innovating and iterating rapidly. The back office is internal to the enterprise; to support more and more front-office business, it has to keep being built out, and its systems keep expanding. Back-office systems therefore need to be stable and cannot be changed at will.
Therefore, as an enterprise develops and its business keeps growing, its organizational structure, hierarchy, and systems keep expanding. Each business unit ends up holding its own hilltop: departmental walls, business walls, and data walls appear, departments close themselves off from one another, resource utilization stays low, and capabilities that could have been shared end up being rebuilt as reinvented wheels.

Rather than calling the middle platform an architecture, it is better to say that, as a company's business keeps developing and expanding, the middle platform is a solution for dealing with a bloated organizational structure and for consolidating resources for the next stage of development, so that the company can quickly replicate capabilities and respond to changes in the market.
After talking about the middle platform for so long, I believe everyone now has some sense of what it is. So what is a data middle platform?

The data middle platform slowly entered the field of vision of data practitioners around 2018 and grew more and more popular by 2020. Before discussing it, let's look at how it differs from data warehouses, data lakes, and big data platforms.

In the 1990s, to make operational analysis and decision-making easier, companies began to integrate data from their various operational systems. Because data analysis needs to aggregate data across multiple dimensions, retain historical data, and run large range queries, traditional databases could no longer satisfy these analytical scenarios, and the data warehouse emerged. A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data that supports enterprise management and decision-making.

Later, as the Internet era arrived, data began to grow explosively and its types began to diversify. Internet disruptors represented by Google and Amazon were the first to explore this space. Google published its three well-known papers (on GFS, MapReduce, and BigTable), laying the foundation for data technology in the big data era: a new, unified way of storing and computing over massive heterogeneous data for analysis. With the appearance of Hadoop in 2005, big data technology became widespread, and the later emergence of the data lake was a sign of Hadoop's growing maturity.

As big data technology became widespread, attention turned to the efficiency of data development, and big data platforms began to emerge. Following the typical usage process and scenarios of big data, they adopt a pipeline style of development covering data acquisition, data cleaning, data development, data modeling, data testing, and data operation and maintenance.
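As a rough illustration of this pipeline style of development, the sketch below wires the stages named above into an Apache Airflow 2.x DAG. Airflow is only one possible choice of scheduler, and the stage functions and names here are hypothetical placeholders, not any particular platform's implementation.

```python
# A minimal sketch of the pipeline development mode, assuming Apache Airflow 2.x.
# The stage callables are placeholders that mirror the stages named in the text.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def acquire():
    """Data acquisition: pull raw data from source systems."""


def clean():
    """Data cleaning: deduplicate, normalize, handle missing values."""


def develop_and_model():
    """Data development and modeling: build derived, analysis-ready tables."""


def test_data():
    """Data testing: run quality checks against the produced tables."""


with DAG(
    dag_id="example_data_pipeline",      # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",          # O&M side: scheduled, monitorable runs
    catchup=False,
) as dag:
    t_acquire = PythonOperator(task_id="acquire", python_callable=acquire)
    t_clean = PythonOperator(task_id="clean", python_callable=clean)
    t_model = PythonOperator(task_id="develop_and_model",
                             python_callable=develop_and_model)
    t_test = PythonOperator(task_id="test_data", python_callable=test_data)

    # Stages run in the order described: acquire -> clean -> develop/model -> test.
    t_acquire >> t_clean >> t_model >> t_test
```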

Around 2016, as the Internet developed further, data demands and application scenarios multiplied, and data and operations became inseparable. To meet the business's data development needs, the chimney-style (siloed) development of the early years left the data of different product lines fragmented. When people actually used the data, they found that different systems within the same business line produced different results for the same indicator. This exposed a big set of problems: severe data fragmentation, large amounts of repeated computation, low R&D efficiency, wasted storage, difficulty connecting data across systems, and difficulty integrating enterprise data assets when upper management wanted visibility and control. More and more often, the boss started asking about the value of data. In that year, Alibaba took the lead in raising the banner of the data middle platform.

The core of the data middle platform is to avoid repeated computation of data, improve the ability to share data, and empower data applications through data services. The speed of building data applications is no longer limited by the speed of data development, and intermediate data is no longer hard to share or left to pile up unused. By incubating more data applications on top of the middle platform, the data is made to generate value.

The core methodology of the data middle platform

OneData and OneService

In 2016, Alibaba raised the banner of the data middle platform and proposed OneData and OneService as the core methodology for building one. How should we understand these terms?

These methodologies do not have precise formal definitions; in practice, what matters more is the kind of problems they solve. In the process of building a middle platform, I understand them like this:

OneData: all data is processed only once, and data indicators come from a single, unified source.
It mainly covers the following core points:
(figure: the core elements of OneData)
Among them is formulating corresponding plans and processes for data delivery, in order to improve the quality of data delivery.
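To make the "processed once, single source" idea concrete, here is a minimal sketch of a unified indicator registry in Python. The structure, the GMV example, and the table and team names are hypothetical illustrations of the principle, not part of any specific OneData implementation.

```python
# A minimal, assumed sketch of a unified indicator (metric) registry:
# every indicator is defined exactly once, with one authoritative source
# table and one calculation rule, and every consumer looks it up here.
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    name: str          # business name of the indicator
    source_table: str  # the single authoritative table it is computed from
    calculation: str   # the single agreed calculation rule
    owner: str         # team responsible for the definition


# Hypothetical registry entry: "gmv" is defined once and reused everywhere.
INDICATOR_REGISTRY = {
    "gmv": Indicator(
        name="gmv",
        source_table="dwd_trade_order_detail",   # assumed table name
        calculation="SUM(order_amount) WHERE order_status = 'paid'",
        owner="trade-data-team",
    ),
}


def get_indicator(name: str) -> Indicator:
    """All applications resolve an indicator through the registry,
    so the same name can never map to two different definitions."""
    return INDICATOR_REGISTRY[name]


if __name__ == "__main__":
    print(get_indicator("gmv"))
```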
OneService: all data is exposed through a unified service layer that provides a single data outlet, and data in the data middle platform should be accessed through APIs.

Data services should have the following capabilities:
(figure: capabilities of the data service layer)
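As one way to picture the "data accessed through APIs" idea, here is a minimal sketch of a unified data service endpoint. FastAPI is merely an assumed choice of web framework, and the endpoint path, indicator names, and values are hypothetical.

```python
# A minimal, assumed sketch of a OneService-style unified data API,
# using FastAPI as an example framework. Applications never read the
# warehouse tables directly; they call this shared service instead.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="data-service")  # hypothetical service name

# In a real service this would query the warehouse or a cache layer;
# here it is a stub so the sketch stays self-contained.
_FAKE_METRIC_STORE = {
    ("gmv", "2024-01-01"): 123456.78,
}


@app.get("/metrics/{name}")
def read_metric(name: str, date: str):
    """Return one indicator value for one date through a single, shared API."""
    value = _FAKE_METRIC_STORE.get((name, date))
    if value is None:
        raise HTTPException(status_code=404, detail="metric not found")
    return {"metric": name, "date": date, "value": value}
```

The point is only that every consumer goes through the same API (for example, served with `uvicorn service:app`, assuming the file is named service.py) instead of reaching into tables ad hoc.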
We can look at the panorama of Alibaba's data middle platform capabilities. In terms of capability division, it builds a OneData system over the data of each business unit and supports upper-level data applications through OneService, the unified external service middleware.
(figure: panorama of Alibaba's data middle platform capabilities)
Guided by the methodology above, an enterprise should start from its own current problems, distill the capabilities and direction its data middle platform needs, and work toward unified output and unified governance.

Technical Support

At the same time, methodology alone is not enough; good tools and technology are also needed as support. At present, our main infrastructure is as follows:
(figure: overview of our main data infrastructure)
The data processing link is as follows:
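As a rough illustration of what such a processing link can look like, the sketch below chains the common ODS → DWD → DWS → ADS warehouse layers; the layer functions and field names are assumptions for illustration, not the specific link used here.

```python
# A rough, assumed sketch of a layered data processing link.
# ODS -> DWD -> DWS -> ADS is a common warehouse layering; the
# transformation functions below are illustrative stand-ins.


def load_ods(raw_records: list[dict]) -> list[dict]:
    """ODS: land raw records from source systems largely as-is."""
    return list(raw_records)


def build_dwd(ods_rows: list[dict]) -> list[dict]:
    """DWD: cleanse and standardize detail-level records."""
    return [r for r in ods_rows if r.get("order_id") is not None]


def build_dws(dwd_rows: list[dict]) -> dict:
    """DWS: aggregate details into subject-level summaries."""
    total = sum(r.get("amount", 0) for r in dwd_rows)
    return {"order_count": len(dwd_rows), "total_amount": total}


def build_ads(dws_summary: dict) -> dict:
    """ADS: shape the summary into what an application or report consumes."""
    return {"daily_gmv": dws_summary["total_amount"]}


if __name__ == "__main__":
    raw = [{"order_id": 1, "amount": 10.0}, {"order_id": None, "amount": 5.0}]
    print(build_ads(build_dws(build_dwd(load_ods(raw)))))
```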

Organization

Anything involving middle-platform construction is inseparable from adjustments to the organizational structure. If the company has not done data work before and is starting from zero, the historical burden is relatively light. If the company is large and already has multiple data silos and scattered small data warehouses, then a dedicated, cross-cutting team is needed to drive the construction of the data middle platform's capabilities. Because the data middle platform provides data sharing across departments, the department responsible for building it must be independent of any business line, and the person it reports to must be a senior company leader, so that the project can run steadily.

Overview of data middle platform capability building

In practice, how a data middle platform is built differs with each company's situation. What matters more is to clarify what problems the current data has, prioritize the most painful ones, abstract out the shared capabilities to build up the platform, and govern the data accordingly, so that data can be brought in, managed well, governed well, and kept visible, controllable, and shareable, thereby increasing its value.

Big data system construction panorama:
(figure: panorama of big data system construction)
In fact, every capability in the data middle platform is a topic of its own, such as metadata management, data quality, data cost, and data governance. I will stop here for now. I hope everyone has gained a general understanding of the data middle platform today; if there is a chance later, I can walk through each of the platform's capabilities in detail.

Origin blog.csdn.net/b379685397/article/details/126685859