How to build an enterprise big data platform?

Before the term "big data" appeared, we typically used relational databases such as SQL Server, MySQL, and Oracle to process and analyze day-to-day data. Terabyte-scale processing is already at the limit of what these traditional databases can handle; faced with petabyte- and exabyte-scale data volumes, they are basically powerless.

 

  It was not until 2005, when the Hadoop project emerged and provided the basic capabilities for big data, that building a technical platform for fast, reliable analysis of unstructured and complex data became a reality. From then on, "big data" became a high-frequency buzzword in Internet information technology.

 

 

 

  1. What is big data, and what are the characteristics of big data

 

 

 

  Whether or not we are big data professionals, in this information age we need to understand some big data concepts. From small shops to entire countries, everyone is talking about big data, but not many people really understand what big data is.

 

 

 

  For the concept of big data, we quote the description of McKinsey, a world-renowned consulting company:

 

 

 

  What is big data?

 

 

 

  McKinsey's definition: "A data collection so large that acquiring, storing, managing, and analyzing it goes far beyond the capabilities of traditional database software tools, and which has the 4V characteristics."

 

 

 

  What is 4V?

 

 

 

  Volume: massive scale

  Velocity: rapid flow

  Variety: diverse types

  Value: low value density

 

 

 

  2. How to formulate an enterprise's big data strategy

 

 

 

  Strategy guides our work. Only with the right strategy can tactics be executed effectively; if the strategy is wrong, every tactic amounts to nothing. Here is a summary of six big data strategies:

 

 

 

  1. Decision-making strategy

 

 

 

  First understand the company's background: whether it is a private, state-owned, or listed company, how large it is, how many employees it has, and whether big data is merely icing on the cake or has already delivered concrete value. When deciding whether the company should undertake a big data project, when to do so, and how much to invest, these issues need to be thought through clearly.

 

 

 

  2. Timing strategy

 

 

 

  This concerns when the company should start investing in big data construction.

 

 

 

  3. Talent strategy

 

 

 

  4. Selection strategy

 

 

 

  Whether to build your own IDC data center, build your own private cloud, or choose a public cloud platform such as Alibaba Cloud or Tencent Cloud.

 

 

 

  5. Platform strategy

 

 

 

  This is the question of whether to build the platform first or implement an application first.

 

 

 

  One principle: the closer something is to the money, the sooner it should be done.

 

 

 

  6. Management strategy

 

 

 

  This covers the question of whether data is a renewable resource, as well as how to collect, store, and use data, data security, and the protection of user privacy.

 

 

 

  3. How companies build big data platforms

 

 

 

  Whether from the perspective of helping companies with marketing and improving efficiency, or from the perspective of saving corporate costs, big data has great value. Done well, big data can drive rapid growth of the company's business. To realize this value and make big data genuinely contribute to the enterprise, we must first accumulate data by collecting day-to-day business data and user behavior data. As noted earlier, some data is a renewable resource, but more of it is non-renewable. This requires us to manage our data assets and to build a data platform responsible for data collection, standardization, computation, storage, application, presentation, and so on.

 

 

 

  1. The big data platform is composed of three platforms plus one service

 

 

 

  (1) Tool platform, including:

  - Operation and maintenance platform

  - Data collection platform

  (2) Big data basic platform (big data warehouse)

  (3) Big data portal, including:

  - Big data analysis platform

  - Product application platform

  (4) Service

 

 

 

  The operation and maintenance platform is mainly responsible for job scheduling, task monitoring, metadata management, permission management, and so on for the big data platform. It is mainly composed of the systems shown in the figure. The second is the data collection platform, which is mainly responsible for collecting data into the big data warehouse platform. Enterprise big data comes mainly from three kinds of sources: business systems, log collection systems, and external data sources. Each of these includes several channels, as shown in the figure.
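
The three kinds of sources described above can be sketched as a small collection step that tags every record with where it came from. This is only an illustration, not the platform's actual design: the `Source` class, the `collect` function, and the channel names (`order_db`, `web_clicks`, `partner_feed`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# The three source categories named in the text.
SOURCE_TYPES = ("business_system", "log_collection", "external")


@dataclass
class Source:
    name: str                            # hypothetical channel name
    source_type: str                     # one of SOURCE_TYPES
    fetch: Callable[[], Iterable[dict]]  # returns raw records from the channel


def collect(sources: List[Source]) -> List[dict]:
    """Pull records from every source and tag each one with its origin."""
    collected = []
    for source in sources:
        if source.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source type: {source.source_type}")
        for record in source.fetch():
            collected.append({
                "source_name": source.name,
                "source_type": source.source_type,
                "payload": record,
            })
    return collected


# Stand-in fetch functions; real channels would read databases, logs, or feeds.
sources = [
    Source("order_db", "business_system", lambda: [{"order_id": 1, "amount": 30.0}]),
    Source("web_clicks", "log_collection", lambda: [{"user_id": 7, "page": "/home"}]),
    Source("partner_feed", "external", lambda: [{"city": "Hangzhou", "index": 0.8}]),
]
records = collect(sources)
```

Tagging each record with its source type at collection time keeps the origin available to later stages, whatever storage the warehouse platform actually uses.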

 

 

 

  The big data basic platform, traditionally also called the big data warehouse platform, is the core of the whole big data platform.

 

 

 

  Next is the big data portal, an integrated platform that brings together data results and includes a big data analysis platform and a big data application platform. As the window onto the entire big data platform, the portal displays all data research results, which makes it much easier for staff across the company's functions to use the data.

 

 

 

  User service: the people who use the data mainly include managers, analysts, operations staff, product managers, and technical engineers, along with related parties in which the company has invested and the company's external data services, which we provide through APIs.

 

 

 

  2. How to build a big data basic platform

 

 

 

  The big data basic platform is the core of the entire big data platform. It is where enterprise big data is processed, computed, and stored. Data from various sources that was originally messy is stored and processed in a standardized way, according to defined standards, once it enters the basic platform. The big data basic platform has three core technical points: the first is the theme model, the second is the hierarchical model, and the third is the computation model. Each is briefly introduced below.

 

 

 

  (1) Theme model

 

 

 

  For details of the theme model, see the attached chart.

 

 

 

  Points to note when designing the theme model:

  - A large theme can have several sub-themes.

  - There should be no overlap between themes; data with the same characteristics should be placed in the same theme.

  - The themes should provide full coverage: they should cover all of the enterprise's business and be able to support all application and analysis needs.

  In other words, the theme model should have:

  (A) Completeness

  (B) Independence between themes

  (C) Hierarchy
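
The design points above (completeness, independence between themes, hierarchy) can be checked mechanically once themes and table assignments are written down. The following is a minimal sketch under assumed data: the theme names, table names, and the `check_theme_model` function are hypothetical and only illustrate the three checks.

```python
from typing import Dict, List, Set

# Hypothetical theme hierarchy: each large theme lists its sub-themes.
THEMES: Dict[str, List[str]] = {
    "member": ["registration", "profile"],
    "trade": ["order", "payment"],
}

# Hypothetical assignment of business tables to "theme.sub_theme" paths.
ASSIGNMENTS: Dict[str, List[str]] = {
    "member_register_log": ["member.registration"],
    "member_info": ["member.profile"],
    "order_detail": ["trade.order"],
    "payment_flow": ["trade.payment"],
}


def valid_paths(themes: Dict[str, List[str]]) -> Set[str]:
    """All 'theme.sub_theme' paths defined by the hierarchy."""
    return {f"{theme}.{sub}" for theme, subs in themes.items() for sub in subs}


def check_theme_model(themes, assignments, business_tables) -> List[str]:
    """Return a list of violations; an empty list means all checks pass."""
    problems = []
    paths = valid_paths(themes)
    for table in sorted(business_tables):
        assigned = assignments.get(table, [])
        if not assigned:
            # Completeness: every business table must belong to some theme.
            problems.append(f"completeness: {table} is not covered by any theme")
        elif len(assigned) > 1:
            # Independence: a table must not sit in more than one theme.
            problems.append(f"independence: {table} appears in {len(assigned)} themes")
        for path in assigned:
            # Hierarchy: every assignment must point at a defined sub-theme.
            if path not in paths:
                problems.append(f"hierarchy: {table} uses unknown theme path {path}")
    return problems


problems = check_theme_model(THEMES, ASSIGNMENTS, set(ASSIGNMENTS))
```

Running the check against the full list of business tables, rather than only the assigned ones, is what makes the completeness test meaningful.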

 

 

 

  (2) Hierarchical model

 

 

 

  The hierarchical model usually consists of four layers, as shown below:

 

 

 

  (A) ODL layer (operational data layer)

 

 

 

  Its function is to store the data extracted from the business systems. The data structures and the logical relationships among the data are basically consistent with those of the business systems. Some fields are fixed (solidified) at this layer, such as member registration and registration time, and some basic data cleaning is performed, such as filtering dirty data and some processing of dimensions. The final output is incremental data.

 

 

 

  (B) BDL layer (basic data layer)

 

 

 

  The main function of this layer is to integrate data according to the division into subject areas and to provide a unified basic data platform. At this layer we complete functions such as data cleaning and defined classification.

 

 

 

  (C) IDL layer (interface data layer)

 

 

 

  This layer is application-oriented: the unified application interface access platform and the unified customer view are both implemented here. Its focus is computing associations between data across subject areas. In practice, two types of models are involved. One type makes data easier to retrieve: we create denormalized theme models, the wide-table models we often see. The other type supports fast query and analysis: a more normalized multi-dimensional analysis model, which is composed of multiple dimension tables.
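
The difference between the two model types can be illustrated with a small in-memory example. This is a sketch under assumptions, not the article's schema: it uses Python's standard `sqlite3` module, a hypothetical fact table `fact_order` with two dimension tables for the multi-dimensional model, and a hypothetical `wide_order` table built by joining them ahead of time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized multi-dimensional model: one fact table plus dimension tables.
    CREATE TABLE dim_member (member_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_order (
        order_id INTEGER PRIMARY KEY,
        member_id INTEGER REFERENCES dim_member(member_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount REAL
    );
    INSERT INTO dim_member VALUES (1, 'Beijing'), (2, 'Shanghai');
    INSERT INTO dim_product VALUES (10, 'books'), (11, 'toys');
    INSERT INTO fact_order VALUES
        (100, 1, 10, 30.0), (101, 1, 11, 20.0), (102, 2, 10, 50.0);

    -- Denormalized wide table: the joins are done once, ahead of time.
    CREATE TABLE wide_order AS
    SELECT o.order_id, o.amount, m.city, p.category
    FROM fact_order AS o
    JOIN dim_member AS m ON m.member_id = o.member_id
    JOIN dim_product AS p ON p.product_id = o.product_id;
""")

# Multi-dimensional query: aggregate the fact table by a dimension attribute.
by_city = conn.execute("""
    SELECT m.city, SUM(o.amount)
    FROM fact_order AS o
    JOIN dim_member AS m ON m.member_id = o.member_id
    GROUP BY m.city
    ORDER BY m.city
""").fetchall()

# Wide-table query: a similar question answered without any join.
by_category = conn.execute(
    "SELECT category, SUM(amount) FROM wide_order GROUP BY category ORDER BY category"
).fetchall()
```

The wide table trades storage and refresh cost for simpler reads, while the dimensional model keeps each attribute in one place and lets the same facts be sliced along any dimension.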

 

 

 

  (D) ADL layer (application data layer)

 

 

 

  This layer provides differentiated data services to meet the needs of business parties. Here we can fulfil requirements for reports, data mining, product applications, and so on.

 

 

 

  In the traditional database era, the ADL layer was mainly implemented on RAC (Oracle Real Application Clusters). In the era of big data, we usually use HBase for data storage at this layer.

 

 

 

  In our work, in order to reduce the complexity of maintaining the big data platform, we usually compress the four layers into three. We usually merge the ODL layer and the BDL layer, so that what was originally implemented in two layers is implemented in a single layer, as shown in the following figure:
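
The flow through the layers, and the effect of merging ODL and BDL, can be sketched with plain functions. This is a toy illustration under assumed data: the row fields, the cleaning rule (drop rows without a member ID), and the function names are hypothetical stand-ins for real batch jobs.

```python
from collections import defaultdict

# Hypothetical raw rows as extracted from a business system.
RAW_ROWS = [
    {"member_id": 1, "theme": "trade", "amount": "30.0"},
    {"member_id": 1, "theme": "trade", "amount": "20.0"},
    {"member_id": None, "theme": "trade", "amount": "99.0"},  # dirty: no member
    {"member_id": 2, "theme": "member", "amount": "0"},
]


def odl(rows):
    """Operational data layer: keep the source structure, drop dirty rows."""
    return [r for r in rows if r["member_id"] is not None]


def bdl(rows):
    """Basic data layer: integrate rows by theme and normalize types."""
    by_theme = defaultdict(list)
    for r in rows:
        by_theme[r["theme"]].append(
            {"member_id": r["member_id"], "amount": float(r["amount"])}
        )
    return dict(by_theme)


def idl(themes):
    """Interface data layer: combine themes into one per-member view."""
    view = defaultdict(lambda: {"total_amount": 0.0, "themes": set()})
    for theme, rows in themes.items():
        for r in rows:
            view[r["member_id"]]["total_amount"] += r["amount"]
            view[r["member_id"]]["themes"].add(theme)
    return dict(view)


def adl(view):
    """Application data layer: shape the data for one consumer, here a report."""
    return sorted((member, v["total_amount"]) for member, v in view.items())


def odl_bdl_merged(rows):
    """Three-layer variant: cleaning and integration done in a single step."""
    return bdl(odl(rows))


four_layer_report = adl(idl(bdl(odl(RAW_ROWS))))
three_layer_report = adl(idl(odl_bdl_merged(RAW_ROWS)))
```

Both variants produce the same report; merging the first two layers only removes one intermediate hand-off, which is the simplification the text describes.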

 

 

 

  3. How to build a big data portal

 

 

 

  The enterprise big data portal is an integrated platform for enterprise applications. As the window onto enterprise data services, the portal displays the data research results, among other things, which makes it much easier for our business staff to use the data.

 

 

 

  The enterprise big data portal includes:

  - Product application platform: mainly precision marketing, personalized recommendation, and similar applications.

  - Big data analysis platform: responsible for the visual display of business data, intelligent reports, ad hoc data retrieval and analysis, and multi-dimensional data analysis models such as user profiles, monitoring of key business indicators, and monitoring of data mining models.

Origin blog.csdn.net/wx_15323880413/article/details/108557980