XL-LightHouse compared with Flink and ClickHouse for streaming big data statistics

A Flink task can process only one or a few data streams in parallel, while an XL-LightHouse task can process tens of thousands or even hundreds of thousands of data streams in parallel.

A Flink task can implement only one or a few data indicators, while a single XL-LightHouse task can support tens of thousands of data indicators at once.

1. XL-LightHouse:

  •  1. There is no need to use Flink, Spark, ClickHouse, or bloated and cumbersome Redis-based solutions to produce statistics;
  •  2. You no longer need to be worn down by data statistics requests that do little for your personal growth. It frees you from trivial, repetitive statistics work so that you can focus on matters that are more valuable to personal improvement and business development;
  •  3. It makes fine-grained monitoring indicators easy to implement, and is a good helper for monitoring service operation status and troubleshooting business data fluctuations and indicator anomalies;
  •  4. It cultivates data thinking, helps you establish a data indicator system for the work you are engaged in, quantify work output, become a professional and rigorous practitioner, and create greater personal value.

2. Streaming statistics is one form of streaming computation

Streaming statistics come down to a small set of operations: Count, Sum, Bitcount (count distinct), Max, Min, Avg, Seq (time series data), Dimens (dimension division), and Limit (topN/lastN).
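To make the operation list concrete, here is a minimal sketch in plain Python of Count, Sum, Max, Min and Avg kept as running aggregates over tumbling time windows. It does not use XL-LightHouse, Flink or ClickHouse APIs; the names `WindowStats`, `aggregate` and the sample events are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def window_start(ts, size):
    """Align a Unix timestamp (seconds) to the start of its tumbling window."""
    return ts - ts % size


class WindowStats:
    """Running aggregates for one window; raw events are not retained."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.max = None
        self.min = None

    def add(self, value):
        self.count += 1
        self.total += value
        self.max = value if self.max is None else max(self.max, value)
        self.min = value if self.min is None else min(self.min, value)

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0


def aggregate(events, size=300):
    """Fold (timestamp, value) events into per-window statistics."""
    windows = defaultdict(WindowStats)
    for ts, value in events:
        windows[window_start(ts, size)].add(value)
    return dict(windows)


# Hypothetical events: (Unix timestamp in seconds, numeric value)
events = [(0, 10.0), (120, 30.0), (299, 20.0), (300, 5.0)]
result = aggregate(events)
```

Each operation only needs a small fixed-size state per window, which is why the list above can be handled with a handful of generic operators rather than a custom program per indicator.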

3. Flink has flaws when used for streaming statistics.

3-1. Low resource utilization rate

Flink's low resource utilization can be viewed from two perspectives: the topology of the running cluster, and the characteristics of Flink task execution.

3-2. Low computing performance

3-3. High integration costs

(1) Flink is aimed at professional big data R&D personnel, so implementing a large number of statistical indicators incurs substantial development cost.
(2) Because Flink's built-in support for streaming statistics is incomplete, developers in many scenarios have to make task-specific optimizations based on the data volume of the statistical task, the granularity of the statistical period, and data skew. As a result, when Flink is used to implement functionally similar statistics, the implementations can differ completely because of differences in data volume and statistical period.

3-4. High operation and maintenance costs and high computing resource costs

Compared with XL-LightHouse, Flink's operation and maintenance costs are higher, which shows in two respects:
(1) To meet the same streaming statistics requirements, the Flink cluster has to be significantly larger than the XL-LightHouse cluster, which increases operation and maintenance costs.
(2) Since a Flink cluster is oriented to professional R&D personnel, its operation involves both the cluster maintainers and the developers of the Flink tasks. Version upgrades, cluster expansion, routine maintenance, data migration and similar operations require communicating with the developers in advance and reaching agreement. Many operations, such as version upgrades, also require the related tasks to be upgraded and modified. If the cluster is large, involves many developers, and runs many related tasks, this process inevitably incurs high maintenance costs.

4. ClickHouse has flaws when used for streaming statistics.

  • Characteristics of ClickHouse applicable scenarios
    (1) One or a few application scenarios, each with a massive amount of data;
    (2) Business scenarios have a large number of dimension fields, sometimes more than ten or even dozens, and the dimensions can be combined at will for multi-dimensional ad hoc queries;
    (3) Business scenarios require detailed queries;
    (4) There may be requirements for join queries between different data sources;

  • Disadvantages of ClickHouse
    (1) Since each query requires traversing massive amounts of data, concurrency support is limited;
    (2) Since the system stores massive amounts of detailed data, the cluster size is large, the structure is complex, and the maintenance cost is high;
    (3) Each query must traverse the data and perform statistical calculations in real time, which requires a large amount of memory and CPU resources;
    (4) Data ingestion needs to be optimized at various levels, so the threshold for use is high and the system is intended for professional big data R&D personnel;
    (5) Integration cost, maintenance cost, server cost and the usage threshold are all high, which is unfriendly to small and medium-sized enterprises;
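The cost difference behind items (1) and (3) can be illustrated with a toy comparison between aggregating at query time and aggregating at write time. This is a plain Python sketch, not a model of ClickHouse internals and not a benchmark; the event data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical detail rows: (minute, province)
events = [(0, "A"), (0, "B"), (1, "A"), (1, "A"), (2, "B")]


def count_by_scan(rows, province):
    """Query-time aggregation: every call walks all stored detail rows."""
    return sum(1 for _, p in rows if p == province)


# Write-time aggregation: update a counter once per incoming event.
pre_aggregated = Counter()
for _, province in events:
    pre_aggregated[province] += 1


def count_by_lookup(province):
    """Query is a single key lookup, independent of the number of events."""
    return pre_aggregated[province]
```

Both functions return the same numbers, but the scan does work proportional to the stored detail data on every query, while the lookup does that work once at ingestion. The trade-off is that the pre-aggregated form can only answer the statistics that were defined in advance.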

5. Features of XL-LightHouse

(1) Supports highly concurrent queries of statistical results.

(2) Detail-level queries are not supported. If you need them, you have to use other tools.

6. Application scenario statistics

Clicks:
1. Every 5 minutes_Clicks
2. Every 5 minutes_Each ICON_Clicks
3. Every hour_Clicks
4. Every hour_Each ICON_Clicks
5. Every day_Total clicks
6. Every day_Each Tab_Total clicks
7. Every day_Each ICON_Total clicks
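The click indicators above differ only in time granularity and dimension, so a single incoming event can update all of them. A minimal sketch in plain Python follows, assuming a hypothetical event shape of (Unix timestamp, icon id) and UTC-aligned windows; it does not reflect how XL-LightHouse defines or stores statistics.

```python
from collections import Counter
from datetime import datetime, timezone


def bucket(ts, seconds):
    """Floor a Unix timestamp to its window start and format it in UTC."""
    start = ts - ts % seconds
    return datetime.fromtimestamp(start, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")


counters = Counter()


def on_click(ts, icon):
    """Update every click statistic that one event contributes to."""
    for label, size in (("5min", 300), ("hour", 3600), ("day", 86400)):
        window = bucket(ts, size)
        counters[(label, window, "all")] += 1   # total clicks in the window
        counters[(label, window, icon)] += 1    # clicks per ICON in the window


# Hypothetical click events: (Unix timestamp in seconds, icon id)
for ts, icon in [(0, "home"), (200, "cart"), (400, "home"), (4000, "home")]:
    on_click(ts, icon)
```

Adding another indicator, such as a per-Tab count, would mean adding one more key to the loop rather than writing a new job, which is the kind of batch definition of indicators the article argues for.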

Click UV:
1. Every 5 minutes_Click UV
2. Every hour_Click UV
3. Every hour_Each ICON_Click UV
4. Every day_Total click UV
5. Every day_Each ICON_Total click UV
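UV is a distinct count, so the daily figure cannot be obtained by adding up the hourly figures. One common way to handle this is a bitmap per window, merged with bitwise OR. The sketch below uses a Python integer as the bitmap; this is an assumption about a general technique suggested by the operation name Bitcount, not a description of XL-LightHouse's implementation, and all names are hypothetical.

```python
user_index = {}  # user id -> bit position, shared across all windows


def bit_for(user_id):
    """Return a bitmask with the single bit assigned to this user."""
    pos = user_index.setdefault(user_id, len(user_index))
    return 1 << pos


def popcount(bitmap):
    """Number of set bits, i.e. number of distinct users in the bitmap."""
    return bin(bitmap).count("1")


hourly = {}  # hour -> bitmap of users who clicked in that hour


def on_click(hour, user_id):
    hourly[hour] = hourly.get(hour, 0) | bit_for(user_id)


# Hypothetical click events: (hour of day, user id)
for hour, user in [(9, "u1"), (9, "u2"), (9, "u1"), (10, "u2"), (10, "u3")]:
    on_click(hour, user)

hourly_uv = {h: popcount(b) for h, b in hourly.items()}

# Daily UV: OR the hourly bitmaps together, then count bits.
daily_bitmap = 0
for b in hourly.values():
    daily_bitmap |= b
daily_uv = popcount(daily_bitmap)
```

Here the hourly UVs are 2 and 2, while the daily UV is 3 because user u2 appears in both hours.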

Statistics on successful payment orders

Order volume:
1. Every 10 minutes_Order volume
2. Every 10 minutes_Each merchant_Order volume
3. Every 10 minutes_Each province_Order volume
4. Every 10 minutes_Each city_Order volume
5. Every hour_Order volume
6. Every day_Order volume
7. Every day_Each merchant_Order volume
8. Every day_Each province_Order volume
9. Every day_Each city_Order volume
10. Every day_Each price range_Order volume
11. Every day_Each application scenario_Order volume

Transaction amount:
1. Every 10 minutes_Transaction amount
2. Every 10 minutes_Each merchant_Transaction amount top100
3. Every 10 minutes_Each province_Transaction amount
4. Every 10 minutes_Each city_Transaction amount
5. Every hour_Transaction amount
6. Every hour_Each merchant_Transaction amount
7. Every day_Transaction amount
8. Every day_Each merchant_Transaction amount
9. Every day_Each province_Transaction amount
10. Every day_Each city_Transaction amount
11. Every day_Each application scenario_Transaction amount
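The transaction amount indicators combine a Sum per dimension with a Limit (topN), as in the per-merchant top100 item. A minimal sketch in plain Python, assuming a hypothetical payment event of (Unix timestamp, merchant id, amount); the function names are illustrative and not part of any library mentioned in this article.

```python
import heapq
from collections import defaultdict

WINDOW = 600  # 10 minutes, in seconds

# window start -> merchant -> summed transaction amount
amounts = defaultdict(lambda: defaultdict(float))


def on_payment(ts, merchant, amount):
    """Add one successful payment to its 10-minute window."""
    amounts[ts - ts % WINDOW][merchant] += amount


def top_merchants(window_start, n):
    """Return the n merchants with the largest amount in the given window."""
    per_merchant = amounts.get(window_start, {})
    return heapq.nlargest(n, per_merchant.items(), key=lambda kv: kv[1])


# Hypothetical payment events: (Unix timestamp, merchant id, amount)
payments = [(10, "m1", 50.0), (20, "m2", 80.0), (30, "m1", 40.0),
            (40, "m3", 10.0), (700, "m3", 99.0)]
for ts, merchant, amount in payments:
    on_payment(ts, merchant, amount)

top2 = top_merchants(0, 2)
```

Replacing the merchant key with province, city or application scenario would give the other Sum indicators in the list without changing the structure.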

Number of users placing orders:
1. Every 10 minutes_Number of users placing orders
2. Every 10 minutes_Each merchant_Number of users placing orders
3. Every 10 minutes_Each province_Number of users placing orders
4. Every 10 minutes_Each city_Number of users placing orders
5. Every hour_Number of users placing orders
6. Every day_Number of users placing orders
7. Every day_Each merchant_Number of users placing orders
8. Every day_Each province_Number of users placing orders
9. Every day_Each city_Number of users placing orders
10. Every day_Each price range_Number of users placing orders
11. Every day_Each application scenario_Number of users placing orders

Project address:

https://github.com/xl-xueling/xl-lighthouse

https://github.com/xl-xueling/xl-lighthouse.git

https://gitee.com/mirrors/XL-LightHouse.git

Reference documentation:

1. Project introduction
2. Git address
3. Communication community
4. Project design
5. One-click deployment
6. Use of XL-Formula
7. Web service operation instructions
8. Hello World
9. Applicable scenarios
10. Copyright statement
11. Use feedback
12. Dependent components

Origin blog.csdn.net/ejinxian/article/details/132775981