Kafka+Storm+HDFS Integration Practice: Building a Big Data Real-Time Analysis and Processing System

In many application scenarios built on the Hadoop platform, we need both offline and real-time analysis of data. For offline analysis, Hive makes statistical work straightforward, but Hive cannot satisfy real-time requirements. For real-time scenarios we can use Storm, a real-time processing system that provides an easily programmable computing model for real-time applications. To unify offline and real-time computing, we generally want a single collection point for the data that feeds both: the data is gathered once as input, and the resulting stream is then handled by the real-time system and the offline analysis system respectively. A natural approach is to connect the data source (for example, logs collected with Flume) directly to a message middleware such as Kafka. In other words, we integrate Flume+Kafka, with Flume acting as the message producer that publishes the collected data (log data, business request data, and so on) to Kafka (a minimal producer sketch is given below). Then, as the message consumer, a Storm Topology subscribes to the data and handles the following two demand scenarios in the Storm cluster:

- directly use Storm's Topology for real-time analysis and processing of the data;
- integrate Storm+HDFS, so that after the messages are processed they are written to HDFS for offline analysis and processing.

For real-time processing, you only need to develop a Topology that implements the business logic. Below, we work through the installation and configuration of Kafka and Storm, and then the integration of Kafka+Storm, Storm+HDFS, and finally Kafka+Storm+HDFS (a combined topology sketch is given in the next section), to satisfy the requirements above.
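
To make the producer side concrete, here is a minimal sketch of publishing messages to Kafka with the Kafka Java client, standing in for Flume's Kafka sink. The class name LogProducer, the broker address localhost:9092, and the topic name log-topic are illustrative assumptions, not values from the original setup.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // In the real pipeline, Flume's Kafka sink publishes the collected
            // log lines; here we hand-publish a few sample records instead.
            for (int i = 0; i < 10; i++) {
                producer.send(new ProducerRecord<>("log-topic", "sample log line " + i));
            }
        }
    }
}
```

In the actual deployment, this producer role is played by Flume itself, so no custom producer code is needed; the sketch only illustrates what arrives on the topic that Storm will consume.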

Kafka+Storm+HDFS integration practice
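
Putting the pieces together, the following is a sketch of a complete Kafka+Storm+HDFS topology: a KafkaSpout consumes the log topic, a pass-through bolt extracts the message payload, and an HdfsBolt from the storm-hdfs module appends the results to files on HDFS. It assumes the storm-kafka-client and storm-hdfs modules are on the classpath; the broker address, topic name, NameNode URL, output path, and the rotation/sync thresholds are all illustrative assumptions.

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class KafkaStormHdfsTopology {

    // Pass-through bolt: extracts the raw message from each KafkaSpout tuple.
    // Real parsing/analysis logic would replace the body of execute().
    public static class LogParseBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            // "value" is the default output field of storm-kafka-client's KafkaSpout
            collector.emit(new Values(input.getStringByField("value")));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("message"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Spout: consume the assumed topic from the assumed broker.
        KafkaSpoutConfig<String, String> spoutConf =
                KafkaSpoutConfig.builder("localhost:9092", "log-topic")
                        .setProp("group.id", "storm-consumer")
                        .build();

        // Sink: append tuples to files under /storm on HDFS,
        // syncing every 1000 tuples and rotating files at 5 MB.
        HdfsBolt hdfsBolt = new HdfsBolt()
                .withFsUrl("hdfs://localhost:8020") // assumed NameNode address
                .withFileNameFormat(new DefaultFileNameFormat().withPath("/storm/"))
                .withRecordFormat(new DelimitedRecordFormat().withFieldDelimiter("|"))
                .withRotationPolicy(new FileSizeRotationPolicy(5.0f, Units.MB))
                .withSyncPolicy(new CountSyncPolicy(1000));

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConf), 1);
        builder.setBolt("parse-bolt", new LogParseBolt(), 2).shuffleGrouping("kafka-spout");
        builder.setBolt("hdfs-bolt", hdfsBolt, 1).shuffleGrouping("parse-bolt");

        StormSubmitter.submitTopology("kafka-storm-hdfs", new Config(), builder.createTopology());
    }
}
```

Submitting with StormSubmitter assumes a running Storm cluster; for local testing, a LocalCluster can be used instead. Dropping the HdfsBolt from this wiring yields the pure real-time Kafka+Storm scenario, so the same skeleton covers both demand scenarios described above.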
