Real-time processing of streaming big data: technology, platform and application

Editor's note: Chen Chun is a computer applications expert, a professor at the School of Computer Science and Technology, Zhejiang University, and an academician of the Chinese Academy of Engineering. He was among the first experts selected for the "Cross-Century Excellent Talents Training Program" of the State Education Commission and is a winner of the 3rd China Youth Science and Technology Award. He is currently the director of the National Train Intelligent Engineering Technology Research Center and a member of the Discipline Evaluation Group of the Academic Degrees Committee of the State Council. Professor Chen Chun has long been engaged in cutting-edge research in the field of computer applications and has published more than 160 papers in well-known international academic journals and conferences. He has won one second prize of the National Technology Invention Award, two second prizes and one third prize of the National Science and Technology Progress Award, and six first prizes in provincial and ministerial science and technology awards.

 

CNCC 2016 opened today in Taiyuan, Shanxi Province. At the opening session, Professor Chen Chun of Zhejiang University, CCF Fellow and academician of the Chinese Academy of Engineering, gave a report entitled "Real-time Processing Technology, Platform and Application of Streaming Big Data". The following is a summary of the report.

 

CNCC 2016

 

Streaming Big Data

Big data can be divided into two kinds: batch big data and streaming big data.

 

For example, if we regard data as a reservoir, the water already in the reservoir is batch big data, and the water flowing in is streaming big data.
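The reservoir analogy can be made concrete with a small sketch. The Python below is only an illustration under assumed names (`batch_mean`, `StreamingMean`, and the sample readings are hypothetical, not from the report): the batch function needs the whole stored dataset, while the streaming version updates its answer as each value flows in.

```python
from typing import List


def batch_mean(stored: List[float]) -> float:
    """Batch style: the data already sits in the 'reservoir'."""
    return sum(stored) / len(stored)


class StreamingMean:
    """Streaming style: update the result as each value flows in."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Incremental mean: earlier values do not need to be kept.
        self.mean += (value - self.mean) / self.count
        return self.mean


readings = [4.0, 8.0, 6.0, 2.0]  # hypothetical sample data

stream = StreamingMean()
running = [stream.update(x) for x in readings]  # one answer per arrival
```

After the last value arrives, the streaming result matches what the batch computation gives over the same data, but an answer was available at every step along the way.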

Over the past 10 years, starting from the traditional "troika", an enormous ecosystem of 60 or 70 related components has formed. The key point is that attention turned to streaming big data, that is, data arriving as a flow, only from 2012 onward. Before that, all big data algorithms and systems were built for batch big data; dedicated work on streaming big data dates only from 2012.

 

For the processing of data streams, there are two main application scenarios:

 

One is the Internet; the other is the mobile Internet.

 

Personalized services on the Internet and the mobile Internet have high real-time requirements in order to keep improving the user experience, and a prompt response is generally required. Sensor data from the Internet is analyzed intelligently to support business decision-making. Big data applications can be divided into after-the-fact risk analysis and tracing and, more importantly, analysis and processing while an event is still in progress.

 

One approach is a clustered, distributed solution, but its real-time response is relatively slow.

 

The other is in-memory computing on streaming big data, but the scale of data it can handle in real time is limited.

However, there are four main problems in big data processing technology:

 

1. Computing based on distributed memory. There may be many computers, each with multiple CPUs. A task has to be spread across them and computed in memory on all of them at the same time, together with distributed storage.

2. High-performance analysis of massive historical data. When water flows into the reservoir in real time, you must process not only the incoming flow but also the data that has already accumulated in the database. Every time the time window moves, the result has to be recalculated.

3. Complex incremental computation over massive data. After the data stream comes in, how do we compute over the stream data together with the historical data? With streaming big data this has to be computed continuously, and the solution can start from an incremental approach.

4. Using models to solve practical problems. As Teacher Zhang said, models such as statistical models and rule-based models can be combined well. The analytical models should therefore be separated from the implementation of the processing, so that different models can be computed for different problems.

 

These are the four most important issues. Our current research result is the Stream Cube real-time computing technology. It brings together data time windows, computed indicators and, at its core, incremental computation; it addresses the performance of distributed storage and combines it better with memory-based computing.
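To make the idea of a time window, an indicator, and incremental computation concrete, here is a minimal sketch in Python. It is not Stream Cube code: the class `WindowedSum`, the 60-second window, and the sample events are all assumptions for illustration, and the sketch assumes events arrive in timestamp order. The point it shows is that the indicator is updated by adding the new event and subtracting expired ones, rather than by re-summing the whole window.

```python
from collections import deque
from typing import Deque, Tuple


class WindowedSum:
    """Sum and count of values seen in the last `window` seconds.

    Hypothetical helper; assumes timestamps never go backwards.
    """

    def __init__(self, window: float) -> None:
        self.window = window
        self.events: Deque[Tuple[float, float]] = deque()  # (timestamp, value)
        self.total = 0.0

    def add(self, timestamp: float, value: float) -> None:
        # Incremental step 1: fold the new event into the running total.
        self.events.append((timestamp, value))
        self.total += value
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Incremental step 2: subtract events that have left the window
        # instead of re-summing everything that remains.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_value = self.events.popleft()
            self.total -= old_value

    @property
    def count(self) -> int:
        return len(self.events)


indicator = WindowedSum(window=60.0)  # assumed 60-second window
indicator.add(0.0, 100.0)
indicator.add(30.0, 50.0)
indicator.add(70.0, 25.0)  # the event at t=0 has now left the window
```

Each update touches only the arriving event and the events that expire, so the cost per event stays small even when the window holds a large amount of data.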

 

Stream Cube

Now let's introduce the real-time processing platform for streaming big data. This platform is not just the Stream Cube computing engine. Around the engine, parts are drawn from the more than 60 components of the big data ecosystem, and distributed storage, databases, and a cloud processing platform for big data are added to form the platform. It is in fact a very powerful system.

 

Next comes the application side: the framework of a Stream Cube application.

 

The red lines are all the computational and statistical indicators present in streaming big data. On the left is the analysis and processing model, which can be guided by mathematical models. So when you bring a problem to solve, say you want to learn to play Go, you can pass in records of Go games, and the computation can be performed on this platform on behalf of an external application system.

 

It can be applied in many ways. Alongside the original business system, the Stream Cube real-time platform can run as a parallel system for real-time detection, performing real-time analysis using domain knowledge and models.

 

Here are a few specific cases:

 

Financial risk control and anti-fraud

In electronic payment today, apart from Ant Financial and WeChat Pay, which built their own risk control, basically all systems are based on Stream Cube.

Anti-crawler system

The application prospects are very broad: it can be applied in finance, telecommunications, transportation, public security, customs, and the Internet.

 

Experience

Real-time processing of streaming data

Real-time analysis of streaming data must have rules and models. Combining complex analysis and computation with real-time response, if done well, will definitely accelerate the application of big data in various industries.
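As a rough illustration of keeping rules and models separate from the computation of indicators, the Python sketch below applies one rule-based model and one statistical model to the same set of indicators. Everything named here is hypothetical (the classes `RuleModel` and `ZScoreModel`, the indicator names `tx_count_1m` and `amount_1m`, the thresholds, and the history values); it is a sketch of the general idea, not the platform's actual interface.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Sequence

Indicators = Dict[str, float]  # indicator name -> current value


@dataclass
class RuleModel:
    """Rule-based model: flag when an indicator exceeds a fixed limit."""
    indicator: str
    limit: float

    def flags(self, ind: Indicators) -> bool:
        return ind[self.indicator] > self.limit


@dataclass
class ZScoreModel:
    """Statistical model: flag values far from the historical mean."""
    indicator: str
    history: List[float]
    threshold: float = 3.0

    def flags(self, ind: Indicators) -> bool:
        mu = mean(self.history)
        sigma = pstdev(self.history)
        if sigma == 0:
            return False  # no spread in history, nothing to compare against
        return abs(ind[self.indicator] - mu) / sigma > self.threshold


def evaluate(ind: Indicators, models: Sequence) -> bool:
    """Flag the event if any plugged-in model flags it."""
    return any(model.flags(ind) for model in models)


# Hypothetical models over assumed one-minute indicators.
models = [
    RuleModel(indicator="tx_count_1m", limit=5),
    ZScoreModel(indicator="amount_1m", history=[10.0, 12.0, 11.0, 9.0, 8.0]),
]

normal = evaluate({"tx_count_1m": 2, "amount_1m": 11.0}, models)
unusual_amount = evaluate({"tx_count_1m": 3, "amount_1m": 40.0}, models)
too_many = evaluate({"tx_count_1m": 9, "amount_1m": 10.0}, models)
```

Because the models only read indicator values, a different problem can swap in a different list of models without changing how the indicators themselves are computed.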

 

Big Data

Today people hold big data or sell data, and analyzing and comparing the data in different ways after the fact is very important.

But the most important thing for applications now is to combine data from different spaces for streaming data analysis. This requires a platform on which all data (Internet, mobile Internet and Internet+) can be tried out and improved.

 
