Flink Unified Stream and Batch Computing (4): Flink Functional Modules

Table of contents

Flink Functional Architecture

Flink input and output


Flink Functional Architecture

Flink is a distributed computing engine with a layered architecture. The implementation of each layer depends on the services provided by the lower layer, while providing abstract interfaces and services for the upper layer to use.

Flink's architecture can be divided into four layers: the Deploy (deployment) layer, the Core layer, the API layer, and the Library layer.

  1. Deployment layer: covers how Flink is deployed. Flink supports several deployment modes: local, cluster (Standalone/YARN), and cloud (GCE/EC2).

You can start a single JVM to run Flink in local mode, run it as a Standalone cluster, or run Flink on YARN, where Flink applications are submitted directly to YARN. Flink can also run on GCE (Google Compute Engine) and EC2 (Amazon Elastic Compute Cloud).

  2. Core layer: provides the core implementation behind all Flink computation, such as distributed stream processing, the mapping from JobGraph to ExecutionGraph, and scheduling. It supplies the basic services that the upper-layer APIs build on.

The Core layer is also called the Runtime. Two sets of core APIs are built on top of it: the DataStream API (stream processing) and the DataSet API (batch processing).
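The JobGraph-to-ExecutionGraph mapping mentioned for the Core layer can be pictured as expanding each logical operator into as many parallel subtasks as its parallelism. The sketch below is a toy model in plain Python, not Flink code: `JobVertex`, `expand`, and the `"forward"` / `"all_to_all"` pattern names are illustrative assumptions chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobVertex:
    """One logical operator in the toy job graph (hypothetical type)."""
    name: str
    parallelism: int


def expand(vertices, edges):
    """Expand logical vertices and edges into parallel subtasks and channels.

    edges is a list of (upstream_name, downstream_name, pattern), where
    pattern is "forward" (subtask i feeds subtask i) or "all_to_all"
    (every upstream subtask feeds every downstream subtask).
    A subtask is represented as (vertex_name, subtask_index).
    """
    by_name = {v.name: v for v in vertices}
    subtasks = [(v.name, i) for v in vertices for i in range(v.parallelism)]
    channels = []
    for up, down, pattern in edges:
        p_up, p_down = by_name[up].parallelism, by_name[down].parallelism
        if pattern == "forward":
            if p_up != p_down:
                raise ValueError("forward edges need equal parallelism")
            channels += [((up, i), (down, i)) for i in range(p_up)]
        elif pattern == "all_to_all":
            channels += [((up, i), (down, j))
                         for i in range(p_up) for j in range(p_down)]
        else:
            raise ValueError(f"unknown pattern: {pattern}")
    return subtasks, channels


job = [JobVertex("source", 2), JobVertex("map", 2), JobVertex("sink", 1)]
links = [("source", "map", "forward"), ("map", "sink", "all_to_all")]
subtasks, channels = expand(job, links)
print(len(subtasks), "subtasks,", len(channels), "channels")
```

Three logical operators become five subtasks here, which is the kind of physical plan a scheduler can then place on workers.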

Stateful stream processing: the lowest level of abstraction offers only stateful data streams. It is exposed in the DataStream API through process functions. With it, users can freely process events from one or more streams while relying on consistent, fault-tolerant state. Users can also register event-time and processing-time callbacks to implement complex computation logic.
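To make the combination of per-key state and event-time callbacks concrete, here is a minimal sketch in plain Python. It is not the Flink API: `ToyKeyedProcessor`, `process_element`, and `advance_watermark` are hypothetical names, and fault tolerance is left out entirely. Each event updates state for its key and registers a timer; the timer fires once the watermark reaches the end of the event's window.

```python
import heapq


class ToyKeyedProcessor:
    """Counts events per key and window; emits a count when its timer fires."""

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self.counts = {}   # state: (fire_time, key) -> running count
        self.timers = []   # min-heap of (fire_time, key)
        self.output = []   # emitted (key, window_end, count) records

    def process_element(self, key, timestamp_ms):
        # The timer fires at the end of the window containing this event.
        fire_at = (timestamp_ms // self.window_ms + 1) * self.window_ms
        slot = (fire_at, key)
        if slot not in self.counts:
            heapq.heappush(self.timers, slot)  # register the timer once
        self.counts[slot] = self.counts.get(slot, 0) + 1

    def advance_watermark(self, watermark_ms):
        # Fire every timer whose time is at or before the watermark.
        while self.timers and self.timers[0][0] <= watermark_ms:
            fire_at, key = heapq.heappop(self.timers)
            count = self.counts.pop((fire_at, key))  # emit, then clear state
            self.output.append((key, fire_at, count))


p = ToyKeyedProcessor(window_ms=1000)
for key, ts in [("a", 100), ("b", 200), ("a", 900), ("a", 1200)]:
    p.process_element(key, ts)
p.advance_watermark(1000)
print(p.output)
```

The event at timestamp 1200 stays in state until a later watermark arrives, which is why the timers are kept separate from the arrival of elements.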

  3. API layer: provides the APIs for stream processing over unbounded streams and for batch processing. Stream processing corresponds to the DataStream API, and batch processing corresponds to the DataSet API.
  4. Library layer: also called the "application framework layer". It consists of frameworks built on top of the API layer for specific kinds of applications. Following the split of the API layer, it falls into two categories: stream processing and batch processing. On the stream side it supports CEP (Complex Event Processing) and SQL-like operations (Table-based relational operations); on the batch side it supports FlinkML (machine learning library) and Gelly (graph processing).

SQL can run on both the DataStream API and the DataSet API.
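One way to picture a relational query running on both kinds of input is a grouped count: on bounded data it returns a single final result, while on a stream it keeps emitting updated rows. The sketch below is plain Python under that assumption, not Flink SQL; the function names and the `clicks` data are made up for illustration.

```python
from collections import Counter


def count_per_user_batch(rows):
    """Bounded input: read everything, then return one final result."""
    return dict(Counter(user for user, _ in rows))


def count_per_user_stream(rows):
    """Unbounded input: emit an updated (user, count) row for every event."""
    counts = Counter()
    for user, _ in rows:
        counts[user] += 1
        yield user, counts[user]


clicks = [("alice", "/home"), ("bob", "/cart"), ("alice", "/pay")]
final_batch = count_per_user_batch(clicks)
updates = list(count_per_user_stream(iter(clicks)))
print(final_batch)
print(updates)
```

Keeping only the latest update per user from the streaming version gives the same answer as the batch version, which is the sense in which one query serves both modes.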

Flink input and output

Flink is best suited to low-latency data processing scenarios: highly concurrent pipelines that process data reliably with millisecond-level latency.

As a member of the big data ecosystem, Flink can be combined with other components in that ecosystem. Broadly, these integrations fall into two groups: input and output.

In the following figure, the blocks on the left and right sides list the connectors. A green background marks stream processing scenarios, and a blue background marks batch processing scenarios.

Input connectors (left side)

Stream processing: Kafka (message queue), Amazon Kinesis (real-time data streaming service), RabbitMQ (message queue), Apache NiFi (data pipeline), Twitter (API)

Batch processing: HDFS (distributed file system), HBase (distributed column-oriented database), Amazon S3 (file system), MapR FS (file system), Alluxio (memory-based distributed file system)

Output connectors (right side)

Stream processing: Kafka (message queue), Amazon Kinesis (real-time data streaming service), RabbitMQ (message queue), Apache NiFi (data pipeline), Cassandra (NoSQL database), Elasticsearch (full-text search), HDFS rolling files

Batch processing: HBase (distributed column-oriented database), HDFS (distributed file system)
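The input and output sides described above share one shape: records come in from a source, pass through a transformation, and go out to a sink. Here is a minimal sketch of that shape in plain Python. It does not use any real connector; `run_pipeline`, `list_source`, and `split_words` are hypothetical names, and an in-memory list stands in for systems such as Kafka or HDFS.

```python
def run_pipeline(source, transform, sink):
    """Pull records from the source, transform each one, pass results to the sink."""
    for record in source:
        for result in transform(record):
            sink(result)


def list_source(items):
    """Hypothetical bounded source that replays an in-memory list."""
    yield from items


def split_words(line):
    """Example transform: one input line yields zero or more lowercase words."""
    return line.lower().split()


collected = []  # stands in for an output system
run_pipeline(
    list_source(["Flink reads Kafka", "Flink writes HDFS"]),
    split_words,
    collected.append,
)
print(collected)
```

Because the pipeline only depends on "something iterable" and "something callable", swapping the source or sink does not touch the transformation, which is the point of having connectors on both sides.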

Origin blog.csdn.net/victory0508/article/details/131322662