Flink’s theoretical basis, usage, architectural design and future development direction

Author: Zen and the Art of Computer Programming

1. Introduction

Apache Flink is an open source distributed computing framework that supports data analysis in application scenarios such as stream processing, batch processing, machine learning, and graph processing. It entered the Apache Incubator in April 2014 and became an Apache top-level project in December 2014. Its architecture and features have been developing rapidly. I believe that, with the growth of cloud computing and big data, Flink will become one of the more noteworthy computing engines after Hadoop MapReduce and Storm. When Flink was first released, however, many companies and developers dismissed it as flashy but impractical, and some even claimed that it was just an improved version of Kafka or Storm. This article elaborates on Flink’s theoretical basis, usage, architectural design, and future development direction.

2. Explanation of basic concepts and terms

  1. Definition and Introduction
  • What is Flink?
    Flink is an open source distributed computing framework developed under the Apache Software Foundation. It provides a distributed runtime that supports data analysis in application scenarios such as stream processing, batch processing, machine learning, and graph processing. It grew out of the Stratosphere research project, started at TU Berlin and partner universities, and was donated to the Apache Software Foundation in 2014. Apache Flink has since gathered tens of thousands of stars on GitHub and has also received widespread attention in China.
  • Flink architecture and features
    Flink’s runtime consists of two main components: the JobManager and the TaskManager. The JobManager is the coordinator: it receives jobs submitted by users, schedules their tasks, and assigns them to the nodes for execution. A TaskManager is a separate worker process running on each node; it receives and executes the tasks assigned by the JobManager. On top of this runtime, Flink provides a rich set of APIs for features such as window computation, state management, stream processing, batch processing, and machine learning.
    The main features of Flink are as follows:
  • Supports high throughput and low latency event-driven stream processing: based on
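
The division of labor between the JobManager and the TaskManagers described above can be illustrated with a small toy model. This is a minimal plain-Python sketch, not Flink’s API and not its real scheduling algorithm: the class names only mirror the two components, and `slots`, `submit`, `run_all`, and `make_task` are hypothetical names chosen for illustration. It assumes each worker can hold a fixed number of tasks and that the coordinator simply picks the worker with the most free capacity.

```python
from dataclasses import dataclass, field


@dataclass
class TaskManager:
    """Toy stand-in for a worker process; `slots` caps how many tasks it holds."""
    name: str
    slots: int  # hypothetical capacity, for illustration only
    tasks: list = field(default_factory=list)

    def has_free_slot(self) -> bool:
        return len(self.tasks) < self.slots

    def run_all(self) -> list:
        # Execute each assigned task (a plain callable here) and collect the results.
        return [task() for task in self.tasks]


class JobManager:
    """Toy coordinator: accepts a job (a list of tasks) and assigns tasks to workers."""

    def __init__(self, task_managers):
        self.task_managers = task_managers

    def submit(self, tasks) -> None:
        for task in tasks:
            # Assumption: choose the worker with the most free slots.
            # Flink's real scheduler is considerably more involved.
            target = max(self.task_managers, key=lambda tm: tm.slots - len(tm.tasks))
            if not target.has_free_slot():
                raise RuntimeError("no free slots left in the cluster")
            target.tasks.append(task)


def make_task(chunk):
    # Each task counts the words in one partition of the input.
    return lambda: sum(len(line.split()) for line in chunk)


workers = [TaskManager("tm-1", slots=2), TaskManager("tm-2", slots=2)]
job_manager = JobManager(workers)

partitions = [["a b c"], ["d e"], ["f"]]
job_manager.submit([make_task(p) for p in partitions])

# Each worker runs its own tasks; the partial counts are then combined.
results = [r for tm in workers for r in tm.run_all()]
total_words = sum(results)
print(total_words)
```

The point of the sketch is only the separation of concerns: the coordinator decides where work goes, and the workers execute it independently. In a real cluster the workers are separate processes on different machines, and the coordinator also handles concerns that this toy model leaves out.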

Origin blog.csdn.net/universsky2015/article/details/132114554