The Road to Learning Flink (1): Introduction to Flink

One, What Is Flink?

Apache Flink is an open-source computing platform for distributed stream processing and batch processing. It provides support for both types of application, streaming and batch.

Two, Flink Features

1, Existing open-source computing solutions treat stream processing and batch processing as two different types of application: streaming generally needs low latency and Exactly-Once guarantees, while batch processing generally needs high throughput and efficient processing.

2, Flink fully supports stream processing: when a job is viewed as stream processing, its input data stream is unbounded. Batch processing is treated as a special case of stream processing in which the input data stream is defined as bounded.
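The idea that a batch is just a bounded stream can be illustrated with a minimal sketch in plain Python. This is not the Flink API; the operator `running_word_count` and the source `unbounded_source` are hypothetical names introduced only for this example. The same operator is applied to a bounded input and to an unbounded one.

```python
import itertools
from typing import Dict, Iterable, Iterator, Tuple


def running_word_count(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit an updated (word, count) pair for every incoming event."""
    counts: Dict[str, int] = {}
    for word in events:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]


def unbounded_source() -> Iterator[str]:
    """Hypothetical endless source: cycles through three words forever."""
    return itertools.cycle(["flink", "stream", "flink"])


# Bounded input (a "batch"): the stream ends, so a final result exists.
batch_result = dict(running_word_count(["flink", "batch", "flink"]))

# Unbounded input: the same operator runs, but only a prefix can be observed.
stream_prefix = list(itertools.islice(running_word_count(unbounded_source()), 4))
```

With the bounded input the last emitted count per word is the final answer. With the unbounded input there is no final answer, only a continuously updated one, which is why streaming results are usually scoped by windows.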

Features:

1, Stream processing features

Supports high-throughput, low-latency, high-performance stream processing

Supports window operations based on event time

Supports stateful computation with Exactly-Once semantics

Supports highly flexible window operations: time-based, count-based, session-based, and data-driven windows

Supports a continuous streaming model with backpressure

Supports fault tolerance implemented with lightweight distributed snapshots (Snapshot)

Supports iterative computation

Supports automatic program optimization: avoids operations such as Shuffle and sorting in certain cases, and buffers intermediate results where necessary

Flink implements its own memory management inside the JVM
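To make the event-time window feature above more concrete, here is a minimal sketch in plain Python of tumbling (fixed-size, non-overlapping) windows keyed by event time. It is an illustration only, not Flink code: the `Event` tuple layout and the function names are assumptions made for this example, and watermarks and late-data handling are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str, int]  # (event_time_ms, key, value), a hypothetical layout


def window_start(event_time_ms: int, size_ms: int) -> int:
    """Start of the tumbling window that contains the given event time."""
    return event_time_ms - (event_time_ms % size_ms)


def tumbling_sum(events: Iterable[Event], size_ms: int) -> Dict[Tuple[str, int], int]:
    """Sum values per (key, window start), using event time, not arrival order."""
    sums: Dict[Tuple[str, int], int] = defaultdict(int)
    for event_time_ms, key, value in events:
        sums[(key, window_start(event_time_ms, size_ms))] += value
    return dict(sums)


# Events arrive out of order; the second one belongs to an earlier window.
events = [(1200, "a", 1), (800, "a", 2), (1900, "b", 5), (2100, "a", 4)]
result = tumbling_sum(events, size_ms=1000)
```

Because each event is assigned to a window by the timestamp it carries, the event that arrives late with time 800 still lands in the window starting at 0. A real engine additionally has to decide when a window is complete, which this sketch does not attempt.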

 

Three, Flink technology stack

1, From the deployment perspective, Flink supports Local mode, cluster mode (Standalone or YARN), and cloud deployment (GCE, EC2).

2, The Runtime is the main data processing engine. It receives programs in the form of a JobGraph. A JobGraph is a simple parallel dataflow made up of a series of tasks, each of which has inputs and outputs (except sources, which have no input, and sinks, which have no output).

3, The DataStream API and the DataSet API are the programming interfaces for stream and batch applications respectively; when a program is compiled, it is turned into a JobGraph. At compile time, depending on the API, a different optimizer (batch or streaming) produces the execution plan. Depending on the deployment mode, the optimized JobGraph is then submitted to the executors to run.
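The JobGraph described in point 2 can be pictured as a small directed acyclic graph. The following plain-Python sketch is not Flink's internal representation; the task names and the dictionary layout are hypothetical. It finds the sources and sinks and derives an order in which every task comes after the tasks that feed it.

```python
from collections import deque
from typing import Dict, List

# Hypothetical job: task name -> names of the downstream tasks it feeds.
job_graph: Dict[str, List[str]] = {
    "source": ["map"],
    "map": ["keyed_sum"],
    "keyed_sum": ["sink"],
    "sink": [],
}


def input_counts(graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of upstream tasks feeding each task."""
    counts = {task: 0 for task in graph}
    for downstream in graph.values():
        for task in downstream:
            counts[task] += 1
    return counts


def topological_order(graph: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: every task appears after all of its inputs."""
    remaining = input_counts(graph)
    ready = deque(task for task, n in remaining.items() if n == 0)
    order: List[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in graph[task]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(graph):
        raise ValueError("graph contains a cycle")
    return order


sources = [t for t, n in input_counts(job_graph).items() if n == 0]
sinks = [t for t, downstream in job_graph.items() if not downstream]
order = topological_order(job_graph)
```

The order here only expresses data dependencies. In a running streaming job the tasks execute concurrently and exchange records continuously, so this should be read as a picture of the graph's shape rather than of how the Runtime schedules work.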

 

Origin www.cnblogs.com/yfb918/p/11351108.html