How to build an enterprise real-time data middle platform from scratch?

I have said many times that the compute engines behind a data middle platform include Hadoop, MPP databases, and stream processing engines, but the data carried by each of the three is not the same in nature.

Everyone is now rushing to find scenarios for big data. In fact, if you take any offline marketing scenario inside a traditional enterprise and add a real-time dimension to it, you may well create a new point of value. This is the low-hanging fruit of data enablement for large traditional enterprises.

In reality, though, the threshold for real-time application scenarios is rather high. If the business cannot implement a real-time scenario quickly with simple SQL, and instead needs a 3-6 month project to build a real-time application, a great deal of exploration and innovation simply never happens.

Many companies have had the three big data platform components (Hadoop, MPP, and stream processing) for many years. So why have real-time applications not flourished the way ordinary data access has?

Because stream processing engines such as IBM Streams have a real development threshold. A few years ago, for example, our data team did not have a single stream processing developer.

For a large enterprise to make good use of real-time data, it needs to build a real-time data middle platform, so that developers of real-time applications only have to write simple SQL.


So how do you build a real-time data middle platform?

The following is a logical architecture for a real-time data middle platform, for your reference. The most critical part is the real-time model layer.

[Figure: logical architecture of the real-time data middle platform]

1, real-time access: Different types of data require different access methods. Flume + Kafka is now the standard, alongside other techniques for files and for databases, such as DSG. A telecom operator, for example, has B-domain data such as orders and calls, O-domain data such as location and Internet usage, and other types of real-time data.

2, computing framework: Only one is listed here. Compared with the traditional Lambda architecture, the Kappa architecture is built on integrated real-time/offline development. Developers only face one framework, so development, testing, and operations are all comparatively easier, and the approach makes full use of the Flink computing framework's strengths: streaming execution, high throughput, millisecond-level response, and unified batch and stream processing.

For example, the stream computing component provides real-time data slices, the batch component provides the offline data model (resident in memory), and the two kinds of data are joined within the stream to combine stream and batch processing.
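The stream-to-offline join described above can be sketched roughly as follows. This is plain Python rather than Flink code, and the table contents, field names, and the function `join_stream_with_offline` are all hypothetical, chosen only to illustrate the idea of enriching a real-time slice from an in-memory offline model.

```python
from typing import Dict, Iterable, Iterator

# Offline data model, assumed to be loaded by the batch side and kept
# resident in memory. Keys and fields are hypothetical.
OFFLINE_PROFILES: Dict[str, dict] = {
    "u1": {"plan": "4G-basic", "home_city": "Hangzhou"},
    "u2": {"plan": "5G-plus", "home_city": "Ningbo"},
}


def join_stream_with_offline(
    events: Iterable[dict], profiles: Dict[str, dict]
) -> Iterator[dict]:
    """Enrich each real-time event with its matching offline profile."""
    for event in events:
        profile = profiles.get(event["user_id"])
        if profile is None:
            continue  # inner join: skip events with no offline record
        yield {**event, **profile}


# A small real-time data slice (hypothetical location events).
stream_slice = [
    {"user_id": "u1", "cell_id": "c100", "ts": 1000},
    {"user_id": "u9", "cell_id": "c200", "ts": 1005},
    {"user_id": "u2", "cell_id": "c100", "ts": 1010},
]

enriched = list(join_stream_with_offline(stream_slice, OFFLINE_PROFILES))
for row in enriched:
    print(row)
```

In a real deployment the lookup table would be refreshed by the batch job and the loop would be an operator inside the streaming engine. The sketch only shows the shape of the join.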

3, real-time model: As with data warehouse models, a real-time model must first and foremost be business-oriented. An operator, for example, has a whole series of real-time scenarios: data traffic operations, service reminders, competitive response, new customer acquisition and number activation, driving traffic to service halls and stores, voice consumption, operational assessment, real-time care, real-time early warning, real-time insight, real-time recommendation, and so on. You always have to extract the common data model elements from your own real-time business.

Take real-time acquisition marketing aimed at migrant workers' phone numbers. A likely trigger is a user who roams into a transportation hub and stays for 10 minutes or more, at which point marketing is launched. Here "location dwell time" is a common element that the real-time model may be able to reuse.
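As a rough illustration of the "location dwell time" element, the sketch below fires a trigger the first time a user's continuous stay inside a transportation hub reaches 10 minutes. The event layout `(user_id, cell_id, ts_seconds)`, the cell names, and the function `dwell_triggers` are assumptions made for the example, not part of any described system.

```python
from typing import Dict, Iterable, List, Set, Tuple

DWELL_THRESHOLD_SECONDS = 10 * 60  # "10 minutes or more" from the text

Event = Tuple[str, str, int]  # (user_id, cell_id, ts_seconds), hypothetical


def dwell_triggers(
    events: Iterable[Event],
    hub_cells: Set[str],
    threshold: int = DWELL_THRESHOLD_SECONDS,
) -> List[Tuple[str, int]]:
    """Return (user_id, ts) the first time each continuous hub stay
    reaches the threshold. Events must be ordered by timestamp."""
    entered_at: Dict[str, int] = {}  # user -> start of the current hub stay
    fired: Set[str] = set()          # users already triggered for this stay
    triggers: List[Tuple[str, int]] = []
    for user_id, cell_id, ts in events:
        if cell_id in hub_cells:
            start = entered_at.setdefault(user_id, ts)
            if user_id not in fired and ts - start >= threshold:
                fired.add(user_id)
                triggers.append((user_id, ts))
        else:
            # Leaving the hub resets the stay for that user.
            entered_at.pop(user_id, None)
            fired.discard(user_id)
    return triggers


hub = {"hub_a", "hub_b"}
sample = [
    ("u1", "hub_a", 0),
    ("u2", "hub_a", 60),
    ("u1", "hub_a", 300),
    ("u2", "street", 400),  # u2 leaves, so the stay is reset
    ("u1", "hub_b", 600),   # still inside the hub: 600 s reached
    ("u2", "hub_a", 700),
    ("u1", "hub_a", 900),   # already fired for this stay
]
result = dwell_triggers(sample, hub)
print(result)
```

Because the dwell logic lives in one place, any scenario that needs "stayed at location X for N minutes" could reuse it with a different cell set and threshold.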

Vertically, the real-time model can be divided into two layers, DWD and DW. The DWD layer standardizes field names and filters fields across all types of real-time data, which makes standardized data management easier. The DW layer is divided here into three categories: dynamic models, event models, and time series models. Each suits different scenarios and needs a matching storage format.

  • Dynamic model: summary statistics over real-time data, suited to real-time analysis of statistical indicators such as real-time business transaction volume. It can be stored in Kafka and HBase.
  • Event model: abstracts real-time data into a series of business events, for example recording a location change event from the user's location trajectory log, which can then trigger LBS location marketing. The following is a typical location event model design, which can be stored in MQ and Redis:
[Figure: typical location event model design]

You can also design a sliding window model, for example one that keeps the per-minute location for the latest hour:

[Figure: sliding window location model]

  • Time series model: mainly stores the user's online spatio-temporal location information so that various business scenarios can be computed quickly as needed, for example making it easy to calculate very long dwell times. It can be stored in HBase or a TSDB (time series database):
[Figure: time series model design]
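To make the event model and the sliding window model above more concrete, here is a minimal in-memory sketch. It keeps each user's per-minute location for the latest hour and emits a location change event whenever the cell changes. The class `LocationWindow`, the event fields, and the use of a Python `deque` in place of Redis or MQ are all assumptions for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

WINDOW_MINUTES = 60  # "latest one hour" of per-minute locations


class LocationWindow:
    """Per-user sliding window of (minute, cell_id) samples."""

    def __init__(self, window_minutes: int = WINDOW_MINUTES) -> None:
        self.window_minutes = window_minutes
        self._windows: Dict[str, Deque[Tuple[int, str]]] = {}

    def update(self, user_id: str, minute: int, cell_id: str) -> Optional[dict]:
        """Record one sample; return a location change event if the cell changed."""
        window = self._windows.setdefault(user_id, deque())
        previous = window[-1][1] if window else None
        window.append((minute, cell_id))
        # Evict samples that have slid out of the window.
        while window and window[0][0] <= minute - self.window_minutes:
            window.popleft()
        if previous is not None and previous != cell_id:
            return {
                "user_id": user_id,
                "minute": minute,
                "from_cell": previous,
                "to_cell": cell_id,
            }
        return None

    def snapshot(self, user_id: str) -> List[Tuple[int, str]]:
        """Return the samples currently held for a user, oldest first."""
        return list(self._windows.get(user_id, ()))


# Small demo with a 3-minute window so eviction is visible.
w = LocationWindow(window_minutes=3)
events = [w.update("u1", m, c) for m, c in [(0, "A"), (1, "A"), (2, "B"), (3, "B")]]
print(events)
print(w.snapshot("u1"))
```

The change events are what a location marketing trigger would consume, while the window snapshot is the kind of record a key-value store could hold per user.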

 

4, real-time services

Having real-time models is not enough. The data middle platform also needs to provide graphical, workflow-based, composable data development tools in order to truly reduce the cost of developing with real-time data. However, because offline and real-time data processing rely on different technologies, development and management of the two types of data are mostly hosted on different platforms.

For example, our offline data models used to be managed through the DACP platform, while real-time data lived outside the DACP platform, usually as part of the application itself. The application had to consume and process raw data in the stream processing engine by writing special-purpose scripts. This approach not only has a high threshold but also wastes resources badly: each real-time application is effectively a silo of stream data.

From the application's point of view, what the business actually needs is a unified data development and management platform. Offline and real-time data should be managed as unified objects that can be orchestrated together, joined together, and so on, so that simple SQL can produce customized output of whatever data an application needs, providing efficient real-time/offline data services to the outside.
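The idea of joining real-time and offline data with one simple SQL statement can be illustrated with the sketch below. It uses Python's built-in `sqlite3` purely as a stand-in for a unified platform; the table names, columns, and sample rows are hypothetical, and a real platform would expose streaming and batch tables through its own SQL layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables managed side by side: one fed by the offline model,
# one fed by the real-time model (both hypothetical).
conn.executescript(
    """
    CREATE TABLE offline_user_profile (
        user_id TEXT PRIMARY KEY,
        segment TEXT
    );
    CREATE TABLE realtime_location_event (
        user_id TEXT,
        cell_id TEXT,
        dwell_minutes INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO offline_user_profile VALUES (?, ?)",
    [("u1", "student"), ("u2", "commuter")],
)
conn.executemany(
    "INSERT INTO realtime_location_event VALUES (?, ?, ?)",
    [("u1", "campus_gate", 12), ("u2", "campus_gate", 3), ("u3", "campus_gate", 20)],
)

# One SQL statement mixes real-time and offline data.
rows = conn.execute(
    """
    SELECT e.user_id, p.segment, e.dwell_minutes
    FROM realtime_location_event AS e
    JOIN offline_user_profile AS p ON p.user_id = e.user_id
    WHERE e.dwell_minutes >= 10
    ORDER BY e.user_id
    """
).fetchall()
print(rows)
conn.close()
```

The point of the sketch is the developer experience: the application author writes a join and a filter, and does not touch the stream processing engine directly.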


5, real-time applications

If the data middle platform can support fast orchestration of real-time data, we estimate that the development, testing, and deployment cycle for a real-time data scenario will drop from 0.5 months to 1-2 days, which is a very large gain in efficiency.

For operators, which have plenty of real-time data and rich enough scenarios, the need to build a real-time data middle platform is very strong.

I remember that three years ago, when we began doing real-time campus marketing, we always had to start planning and building the real-time application 3-6 months in advance. Yet the requirements changed every year, so the application had to be rebuilt from scratch each time, and no knowledge was retained.

As big data operations deepen both internally and externally, we see more and more demand. You may be surprised to find that demand often grows as technical capability strengthens. Very often, technology is the primary productive force. Many of our managers who are responsible for product delivery and operations should understand this deeply.

Ever since then, I have been thinking about whether we could build a true real-time data middle platform, one that lets us create large numbers of real-time applications quickly and efficiently and raises big data management and application to a new level. Now we have finally set out on this road.


Origin www.cnblogs.com/laoA188/p/11361935.html