Streaming computing

1. Timeliness of data

In daily work, we usually store data in a table first and then process and analyze the data in that table. Because the data has to be stored before it can be analyzed, timeliness becomes a concern: how far can the stored data lag behind the present?

If we are dealing with year-level data, such as population analysis or macroeconomic analysis, it doesn't matter if the latest data is a week or two late, or even a month or two late.

If we are dealing with day-level data, such as user preference analysis for major websites or retail supply and sales analysis, a delay of a few days is generally acceptable, that is, a T+N update (data for day T becomes available N days later).

If we are dealing with hour-level data, the timeliness requirements are higher still. For example, financial risk control involves the safety of funds, so hour-level data is required.

So is there anything more demanding? Of course there is. Take risk monitoring: a website must have a real-time monitoring system so that immediate measures can be taken once an attack occurs. During Double Eleven or anniversary sales, all major e-commerce platforms face severe traffic tests and must monitor their systems in real time. In addition, real-time personalized recommendation on websites and in search engines also places extremely high demands on real-time performance.

In these scenarios, the traditional data processing flow (collect the data first, put it in the DB, then take it out for analysis) cannot meet such high real-time requirements. This is where the "streaming computing" approach comes in.

2. Streaming computing and batch computing

The traditional flow just described (collect the data, put it in the DB, take it out for analysis) is called batch computing. As the name suggests, the data is stored up and then computed in batches.

Streaming computing, as its name suggests, performs real-time computation on data streams. It is not simply faster batch computing; it is a completely different way of thinking about processing.
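
To make the contrast concrete, here is a minimal sketch in plain Python. It is not taken from any particular streaming framework; the function names and the sample response times are assumptions made up for illustration. The batch version needs the whole table before it can produce a result, while the streaming version keeps only a small running state and yields an up-to-date result after every record.

```python
from typing import Iterable, Iterator, List


def batch_average(table: List[float]) -> float:
    """Batch style: the data is already stored; compute over the whole table at once."""
    return sum(table) / len(table)


def streaming_average(stream: Iterable[float]) -> Iterator[float]:
    """Streaming style: update the result as each record arrives."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        # A current result is available immediately after every record.
        yield total / count


# Hypothetical response times in milliseconds, for illustration only.
events = [120.0, 80.0, 100.0, 140.0]

# Batch: store everything first, then analyze.
table = list(events)
batch_result = batch_average(table)

# Streaming: consume records one by one; only count and total are kept.
running_results = list(streaming_average(iter(events)))

print(batch_result)
print(running_results)
```

Once the last record has been consumed, both approaches agree on the final value. The difference is that the streaming version already had a usable answer after the first record, without waiting for the data to be stored.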

The principle is introduced below by comparison with batch computing:

(1) Accumulate data slowly, as in batch computing

Origin blog.csdn.net/luolan_hust/article/details/113726529