Real-time computing large-screen display based on Kafka and Flink in a Hadoop or Docker environment

Chapter 1 Overall Requirements

1.1. Project background

A stock trading institution has launched an online trading platform. The platform has nearly 10 million registered users and accepts transaction requests submitted every day by branch users from all over the country. In view of the company's growth and platform management requirements, it plans to commission the development of an online real-time big data system that can monitor stock trading data in real time and display key performance figures.

1.2. Data source

In order to provide a more realistic testing environment, the company's technical department has commissioned relevant personnel to design a stock trading data simulator, which can simulate and generate information about customers placing orders on the platform. The data will be automatically stored in a text file in a designated folder.

The simulator allows you to adjust the number of processes and simulate different levels of concurrency to fully test the performance of the system. The specific field descriptions of the data are detailed in the table below:

1.3. Requirements

Using real-time computing technology and appropriate data access methods, build a big data dashboard for real-time stock trading that implements the following functions:

(1) Mature open-source dashboard components may be used (some, such as Alibaba's DataV platform, require a license), or a local display platform may be developed independently; the interface must refresh once per second;

(2) The interface should be visually appealing and present concise information;

(3) The information displayed should at least include the following:

a) The order processing speed, in orders per second;

b) The total transaction amount and number of transactions accumulated over the past minute and over the current day;

c) The cumulative buy and sell volumes over the past minute and the current day;

d) The top 10 stocks by transaction amount over the past minute and cumulatively over the day;

e) The top 10 trading platforms by cumulative trading volume over the past minute and the current day;

f) The cumulative number of ordering customers nationwide, broken down by province and visualized on a map;

g) The distribution of trading volume across different stock types;

h) [Optional] Early warning of explosive growth in the trading volume of a single stock;

(4) Statistical errors (data loss, miscounts) should not exceed 1%, and experiments should be designed to measure the error rate;

(5) The displayed data should lag by no more than 30 seconds, and the latest timestamp of the acquired data should be shown on each refresh;

(6) Test the maximum load capacity of the system, i.e., the maximum number of orders the system can handle per second;

(7) Special functions may be added according to business scenarios and display needs.

Chapter 2 Case Analysis

Drawing on real-time computing technologies, this article formulates two candidate solutions to meet the project requirements.

2.1. Option 1

The architecture of Option 1 is shown in Figure 1. Kafka reads the data generated by the stock data simulator directly; Storm then serves as the stream computing platform and writes the statistical results into a MySQL database. DataV reads the data from the MySQL cloud database and displays it on the big screen.

Figure 1 Architecture of Option 1

The advantages of this option are:

  1.  Kafka is a high-throughput, low-latency distributed message queue that can quickly process and deliver large amounts of real-time data. By writing the data generated by the stock data simulator directly into Kafka, real-time data stream processing can be achieved, ensuring the timeliness of the data.
  2.  Storm is a distributed, fault-tolerant real-time computing system that supports fast and reliable processing of large-scale data streams. With Storm as the stream computing platform, real-time statistics and calculations can be performed on the stock data read from Kafka, providing instant data analysis (a minimal topology sketch follows this list).
  3.  With an Alibaba Cloud MySQL database, DataV can connect directly, with no additional integration work.
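The original figure is not reproduced here, so below is a minimal sketch of what an Option 1 topology could look like, using the storm-kafka-client spout and Storm's LocalCluster for local testing. The broker address, topic name, and the placeholder bolt are assumptions for illustration, not the article's actual code.

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class StockTopologySketch {

    /** Placeholder bolt: the real system would aggregate statistics here
     *  and write them to MySQL; this stub only prints each record. */
    public static class PrintBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            // The storm-kafka-client spout emits the record payload in the "value" field
            System.out.println(tuple.getStringByField("value"));
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }

    public static void main(String[] args) throws Exception {
        // Spout reading simulator records from Kafka (broker and topic name assumed)
        KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("localhost:9092", "stock-orders").build();

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig));
        builder.setBolt("stats-bolt", new PrintBolt()).shuffleGrouping("kafka-spout");

        try (LocalCluster cluster = new LocalCluster()) {
            cluster.submitTopology("stock-stats", new Config(), builder.createTopology());
            Thread.sleep(60_000); // run locally for one minute, then shut down
        }
    }
}
```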

2.2. Option 2

Figure 2 Architecture of Option 2

The architecture of Option 2 is shown in Figure 2. Kafka reads the data generated by the stock data simulator directly; Flink then serves as the stream computing platform and writes the statistical results into a MySQL database through Flink's JDBC connector. DataV reads the data from the MySQL cloud database and displays it on the big screen.

The advantages of this option are:

  1.  Flink has relatively complete support for window operations and provides built-in window aggregation functions for computing statistics (a windowing sketch follows this list);
  2.  Flink provides an integrated JDBC connector for writing results to the MySQL database;
  3.  Flink offers an event-driven stream processing model that achieves millisecond-level latency with high throughput, making it well suited to processing real-time data streams.
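To illustrate point 1, here is a minimal sketch (not the article's actual code) of a one-minute tumbling window that sums transaction amounts per stock. The topic name, broker address, and CSV field positions are assumptions.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class MinuteAmountJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", "stock-stats");

        env.addSource(new FlinkKafkaConsumer<>("stock-orders", new SimpleStringSchema(), props))
            // Assumed CSV layout: field 1 = stock code, field 5 = transaction amount
            .map(line -> {
                String[] f = line.split(",");
                return Tuple2.of(f[1], Double.parseDouble(f[5]));
            })
            .returns(Types.TUPLE(Types.STRING, Types.DOUBLE))
            .keyBy(t -> t.f0, Types.STRING)
            // One-minute tumbling window, matching the "past minute" requirements
            .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
            .sum(1) // sum the amount field per stock per window
            .print();

        env.execute("per-minute transaction amount per stock");
    }
}
```

The same pattern extends to the daily totals, top-10 rankings, and per-province counts by changing the key, window size, and aggregation function.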
Chapter 3 Overall Plan

The overall architecture of the solution is shown in Figure 3. It consists of five major components: the data source, message middleware, the stream computing system, real-time data storage, and real-time data applications.

The stock data simulator continuously generates stock data, and Kafka serves as the message middleware. The Kafka producer reads the data generated by the simulator in sequence, polling the simulator's output file once per second and sending the newly produced records to the Kafka consumers. Flink is chosen as the stream computing system: it consumes the data produced to Kafka and uses multiple threads to save the computed results into an Alibaba Cloud MySQL database. Finally, DataV connects directly to the Alibaba Cloud MySQL database.

Figure 3 Overall architecture of the solution

Chapter 4 Module Implementation

4.1. Data collection

Since the stock data simulator saves data to CSV files in real time, this article chooses to open a connection to those files in the Kafka producer, read the latest rows in sequence, and send each record to a Kafka topic (a sketch follows the figure captions below).

Figure 4 Producer configuration information

Figure 5 Looping to read each CSV file
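Since Figures 4 and 5 are not reproduced, here is a hedged sketch of a producer along those lines: it tails a simulator output file once per second and forwards each new line to Kafka. The file name, topic name, and broker address are assumptions.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CsvKafkaProducer {
    public static void main(String[] args) throws Exception {
        // Producer configuration, analogous to Figure 4 (values are assumptions)
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             BufferedReader reader = new BufferedReader(new FileReader("orders.csv"))) {
            while (true) {
                // Forward every line the simulator has appended since the last pass
                String line;
                while ((line = reader.readLine()) != null) {
                    producer.send(new ProducerRecord<>("stock-orders", line));
                }
                producer.flush();
                Thread.sleep(1000); // poll the file once per second, per the overall plan
            }
        }
    }
}
```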

4.2. Data distribution and subscription

After the Kafka producer publishes the data from the stock data simulator, Flink acts as the consumer of the content produced to Kafka. The configuration of the Flink consumer is shown in Figure 6; real-time calculations are then performed on the consumed data, and Flink's sink writes the results out, as shown in Figure 7 and sketched below.

Figure 6 Flink consumer configuration
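Figures 6 and 7 are likewise not reproduced, so the following is a minimal sketch, under the same assumptions as above, of a Flink job that consumes the Kafka topic and writes records to MySQL through the flink-connector-jdbc sink. The table schema, credentials, and the cloud database endpoint are placeholders; a real job would insert the computed statistics rather than raw records.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaToMysqlJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Consumer configuration, analogous to Figure 6 (values are assumptions)
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "stock-dashboard");

        env.addSource(new FlinkKafkaConsumer<>("stock-orders", new SimpleStringSchema(), props))
            .addSink(JdbcSink.sink(
                "INSERT INTO orders_raw (record) VALUES (?)", // assumed table and column
                (ps, record) -> ps.setString(1, record),
                JdbcExecutionOptions.builder()
                        .withBatchSize(100)
                        .withBatchIntervalMs(1000) // flush at least once per second
                        .build(),
                new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                        .withUrl("jdbc:mysql://<rds-host>:3306/stock") // assumed cloud endpoint
                        .withDriverName("com.mysql.cj.jdbc.Driver")
                        .withUsername("user")
                        .withPassword("password")
                        .build()));

        env.execute("kafka-to-mysql");
    }
}
```

Batching in the JDBC sink keeps the write pressure on MySQL low while still meeting the dashboard's one-second refresh interval.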


Source: blog.csdn.net/qq_63042830/article/details/135091224