A demo of real-time pv/uv calculation with Flink

This article is shared by Deng Xiaoyong (Jingxing), a senior technical expert at Alibaba. Using a demo, it shows how to compute pv/uv in real time with Flink. The content covers the following parts:

  1. The app pv/uv calculation scenario
  2. Implementation options (from Flink 1.11)
  3. DDL
  4. DML
  5. Practice

First, let's look at a relatively simple pv/uv scenario. Take a typical app as an example: the overall business architecture has several entrances, including a user entrance, an author entrance, and an operations entrance. Operators go in through the operations entrance to view system metrics such as the app's pv/uv.

Before getting into how to compute pv/uv in real time, it helps to understand the 10 fields of the user-behavior table and what they mean. Through these fields you can see that every operation a user performs in the app leaves a corresponding record in the database; taken together, these records form the stream of the user's operations on the app.

So how do we calculate pv/uv in real time? There are two options.

Option 1: synchronize the MySQL change data to Kafka and compute on it in real time. Flink was designed around the duality of streams and tables, and since Flink 1.11 it can process changelog data from Kafka, including updates and deletes. The processed results are written to Alibaba Cloud Hologres, where users can conveniently query and analyze them.
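
A minimal sketch of Option 1's Kafka leg, assuming Flink 1.11+; the table and field names below are illustrative, not taken from the article. The debezium-json format is what lets Flink interpret MySQL change records (inserts, updates, deletes) as a changelog:

```sql
-- Kafka changelog source (sketch): reads MySQL change records that
-- Debezium has published to Kafka. Names and addresses are placeholders.
CREATE TABLE user_behavior_kafka (
  user_id    BIGINT,
  item_id    BIGINT,
  behavior   STRING,
  event_time TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_behavior',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'pv_uv_demo',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'debezium-json'  -- interpret inserts/updates/deletes as a changelog
);
```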

Option 2: compared with Option 1, the only difference is that the Kafka hop is removed. Since Flink 1.11, Flink can connect directly to MySQL through Debezium-based CDC and then run the same real-time computation.

Both options work, so how do you choose? It mainly depends on the business. If the change logs need to be kept around, displayed, or consumed by other downstream jobs, they should be stored in Kafka, and Option 1 fits; if the logs never need to be replayed and have no other downstream consumers, Option 2 is the better choice.

Actual demonstration

We choose Option 2 (the MySQL-CDC source table approach) for the demonstration.

First, open the Realtime Compute for Apache Flink console and click the SQL editor on the left; then declare the 10 fields mentioned above with a CREATE TABLE statement. This defines the data source.
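
A minimal sketch of such a source DDL, assuming the mysql-cdc connector. The figure listing the actual 10 fields is not reproduced here, so the columns and connection parameters below are placeholders:

```sql
-- MySQL-CDC source (sketch): columns and connection values are placeholders.
CREATE TABLE user_behavior_source (
  id         BIGINT,
  user_id    BIGINT,
  behavior   STRING,
  event_time TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = '<mysql-host>',
  'port' = '3306',
  'username' = '<user>',
  'password' = '<password>',
  'database-name' = '<database>',
  'table-name' = 'user_behavior'
);
```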

After defining the source, the next step is to build the target table. Here we define a table named blackhole_pv_uv: a sink with no actual storage behind it that serves purely as a debugging target. The idea is to get the logic running end to end first and wire up the real target afterwards; blackhole simply absorbs the output, so source and computation problems can be sorted out before anything else.
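
A sketch of that debugging sink, assuming the four output columns used later in the demo:

```sql
-- Debugging sink (sketch): the blackhole connector discards every row it
-- receives, so the pipeline can be validated without real storage.
CREATE TABLE blackhole_pv_uv (
  currenttime TIMESTAMP(3),
  event_hour  STRING,
  pv          BIGINT,
  uv          BIGINT
) WITH (
  'connector' = 'blackhole'
);
```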

Both tables are now registered in the Flink catalog, which completes the preparation of the upstream and downstream tables for the real-time pv/uv calculation. If a table needs adjusting later, that can also be done with DDL statements.
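
For example, a connector option can be changed in place; the option and value here are illustrative:

```sql
-- Adjust an existing table definition via DDL (illustrative option/value).
ALTER TABLE user_behavior_source SET ('server-time-zone' = 'Asia/Shanghai');
```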

With the tables prepared, how do we express the calculation we actually want in real time? Let's demonstrate it in the simplest possible way.

First, write the data into blackhole, computing the 4 output fields, such as currenttime and event_hour.

With a query along the lines sketched below, you can compute when the data came in, the pv/uv values of the data, and so on.
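
The article's exact query is in a figure that is not reproduced here; a minimal sketch under the names assumed above would count events per hour for pv and distinct users for uv:

```sql
-- Per-hour pv/uv over the CDC source (sketch). Flink keeps the counts
-- updated as new change records arrive.
INSERT INTO blackhole_pv_uv
SELECT
  CURRENT_TIMESTAMP                        AS currenttime,  -- when this result was computed
  DATE_FORMAT(event_time, 'yyyy-MM-dd HH') AS event_hour,
  COUNT(*)                                 AS pv,
  COUNT(DISTINCT user_id)                  AS uv
FROM user_behavior_source
GROUP BY DATE_FORMAT(event_time, 'yyyy-MM-dd HH');
```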

Now run the job you just wrote: click Create SQL Job, and once creation is complete, click Start.

After startup, you can open the Flink UI to check the running status; at the bottom you can see that 8 records have been processed.

Back in the database, you can see the corresponding 8 rows as well.

How do we write the actual results into Hologres (holo)?
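
A sketch of the Hologres sink table, with placeholder endpoint and credentials; the connector options follow Alibaba Cloud's hologres connector and should be checked against your environment:

```sql
-- Hologres sink (sketch): the schema mirrors blackhole_pv_uv;
-- connection values are placeholders.
CREATE TABLE holo_pv_uv (
  currenttime TIMESTAMP(3),
  event_hour  STRING,
  pv          BIGINT,
  uv          BIGINT
) WITH (
  'connector' = 'hologres',
  'endpoint' = '<holo-endpoint>',
  'username' = '<access-id>',
  'password' = '<access-key>',
  'dbname' = '<database>',
  'tablename' = 'pv_uv'
);
```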

The core logic is the same as in the real-time calculation above. The only difference is that the result of the calculation must be written both to holo_pv_uv and to blackhole, that is, the same result is output twice. This situation comes up often in stream computing; there are even cases where different business logic or different calculation results within the same job have to be written to different targets.

Open the SQL editor of the Realtime Compute for Apache Flink console, create a temporary view in the editor, and write its data to both blackhole and holo.

To achieve this, the inserts are wrapped in the BEGIN STATEMENT SET; ... END; syntax. This defines a single piece of calculation logic, so that the tasks between BEGIN and END run together in the same job.
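
Putting it together, a sketch of the two-sink job under the names assumed earlier: the temporary view holds the shared computation, and the statement set fans its result out to both sinks within one job.

```sql
-- Shared computation as a temporary view (names assumed, as above).
CREATE TEMPORARY VIEW pv_uv_view AS
SELECT
  CURRENT_TIMESTAMP                        AS currenttime,
  DATE_FORMAT(event_time, 'yyyy-MM-dd HH') AS event_hour,
  COUNT(*)                                 AS pv,
  COUNT(DISTINCT user_id)                  AS uv
FROM user_behavior_source
GROUP BY DATE_FORMAT(event_time, 'yyyy-MM-dd HH');

-- Everything between BEGIN and END is compiled into a single job.
BEGIN STATEMENT SET;
INSERT INTO holo_pv_uv      SELECT * FROM pv_uv_view;
INSERT INTO blackhole_pv_uv SELECT * FROM pv_uv_view;
END;
```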

Then, after deploying, creating the job, and starting it, you can see that the calculation logic runs successfully.

Author: Deng Xiaoyong (Jingxing), Senior Technical Expert at Alibaba

Original link

This article is the original content of Alibaba Cloud and may not be reproduced without permission.
