Big data stream processing and real-time analysis: in-depth understanding of stream processing computing models and principles

Author: Zen and the Art of Computer Programming

1 Introduction

1.1 Overview of big data stream processing

Big data is now widely used in enterprises, finance, government, and other fields. Data volumes grow day by day and have an increasing influence on business and decision-making, so processing, analyzing, mining, and predicting from big data efficiently and quickly has become an important topic. Stream processing frameworks such as Apache Flink, Storm, and Spark Streaming have received widespread attention. Among them, Apache Flink is one of the best known. It is an open source distributed stream processing framework that provides high-throughput, low-latency real-time computation, and it can run on Apache Hadoop YARN and read from HDFS without depending on Hadoop.

However, developing stream processing applications on Flink still has its inconveniences, such as complex development, operating efficiency that is hard to tune, and difficult debugging. Stream processing alone also does not always deliver data completeness, consistency, and timeliness at the same time, so stream processing systems are often combined with batch processing systems and other techniques to improve processing efficiency and get closer to true real-time results. Data lake applications built on batch processing systems have the opposite problem: they cannot meet real-time needs, because batch systems cannot process event stream data as it arrives, while real-time computing relies on the low latency and high throughput of stream processing systems. A balance between the two therefore has to be found in big data, and especially in real-time computing.

This article introduces stream processing computing models and principles, and explains their advantages, limitations, and applicable scenarios, as well as the design ideas and key technologies of stream processing systems. It aims to answer the following questions: What is stream processing? What are its features and advantages? Why use it? What are the key components of a stream processing system? How is a stream processing computation carried out? Related technologies covered in this article include dataflow programming languages, message queues, microservice architecture, and container orchestration tools.
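As a rough illustration of the difference between batch and stream computation described above, the following minimal Python sketch counts events per key in two ways: once over the complete data set, and once incrementally with fixed-size (tumbling) time windows. It is not taken from Flink or any other framework. The event shape `(timestamp, key)`, the function names `batch_count` and `stream_count`, and the 10-second window are all assumptions made for this example, and the sketch assumes events arrive in time order.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical event shape for this sketch: (event time in seconds, key).
Event = Tuple[int, str]


def batch_count(events: Iterable[Event]) -> Dict[str, int]:
    """Batch style: read the whole data set, then return one final result."""
    counts: Dict[str, int] = defaultdict(int)
    for _, key in events:
        counts[key] += 1
    return dict(counts)


def stream_count(
    events: Iterable[Event], window_size: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Stream style: emit one result per tumbling window as events arrive.

    Assumes timestamps are non-decreasing (no late or out-of-order events).
    """
    current_window: Optional[int] = None
    counts: Dict[str, int] = defaultdict(int)
    for ts, key in events:
        window_start = ts - ts % window_size
        # An event from a later window closes the current one.
        if current_window is not None and window_start != current_window:
            yield current_window, dict(counts)
            counts = defaultdict(int)
        current_window = window_start
        counts[key] += 1
    # Flush the last open window once the input ends.
    if current_window is not None:
        yield current_window, dict(counts)


events = [(0, "click"), (3, "view"), (7, "click"),
          (12, "click"), (14, "view"), (21, "view")]

print(batch_count(events))
for window_start, window_counts in stream_count(events, window_size=10):
    print(window_start, window_counts)
```

The batch function can only answer after it has seen all the input, while the streaming function produces a partial answer each time a window closes, which is where the low latency comes from. Real stream processors also have to handle out-of-order events, state recovery, and distribution across machines, none of which this sketch attempts.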

1.2 Structure of this article

This article mainly includes the following three parts:

Chapter One

Origin blog.csdn.net/universsky2015/article/details/132914109