A few scenarios that Flink handles well but Spark Streaming does not

Readers often ask me how to choose between Flink and Spark Streaming.

Technology selection is very hard for people new to the business, and quite simple for people who have experience and think things through.

Before making a selection, you need to have the following knowledge in place:

1. In-depth understanding of the framework itself.

2. In-depth understanding of the ecosystem around the framework.

3. In-depth understanding of your own business scenarios.

Take Flink and Spark Streaming as an example. Once you understand the design idea behind each, the choice becomes easy to reason about:

Spark started out as a batch engine and later built stream processing on top of it in the form of micro-batches. Its use cases are clear: a small delay is acceptable, the work is batch-oriented, and throughput takes priority. Spark Streaming also has many contributors and is quite stable.

Flink started out as a stream processor and later built batch processing on top of the streaming model. It is well suited to scenarios with strict real-time requirements. At present it still has some bugs.
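To make the difference concrete, here is a minimal sketch of how the two designs affect latency. It is a toy model in plain Python, not the Spark or Flink API. The function names, the 1000 ms batch interval, and the assumption of zero processing cost are all made up for illustration.

```python
# Toy model only: plain Python, no Spark or Flink API involved.
# Times are integer milliseconds to keep the arithmetic exact.

def micro_batch_latencies(arrival_times_ms, batch_interval_ms):
    """Micro-batch model: an event waits until the batch it falls into closes."""
    latencies = []
    for t in arrival_times_ms:
        # The batch containing t closes at the next multiple of the interval.
        batch_end = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(batch_end - t)
    return latencies


def per_record_latencies(arrival_times_ms, processing_cost_ms=0):
    """Record-at-a-time model: an event is handled as soon as it arrives."""
    return [processing_cost_ms for _ in arrival_times_ms]


arrivals = [100, 400, 1200, 1900, 2500]  # hypothetical arrival times
print(micro_batch_latencies(arrivals, 1000))  # [900, 600, 800, 100, 500]
print(per_record_latencies(arrivals))         # [0, 0, 0, 0, 0]
```

In this model the micro-batch delay is bounded by the batch interval, which is why that design suits jobs that tolerate a little delay in exchange for throughput.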

That may still seem abstract, so here are some specific scenarios that Flink handles well and Spark Streaming handles poorly:

1. Global deduplication and global aggregation, such as distinct, UV counting, and similar business scenarios. Flink is a good fit. With Spark Streaming it is troublesome, because you have to rely on stateful operators or on third-party storage such as Redis or Alluxio.

2. Windows that need to fire several times and output results for the same window more than once. Flink's triggers can do this; with Spark Streaming it is too much trouble.

3. Exactly-once processing. Spark Streaming achieves exactly-once mostly by relying on idempotent output. Flink can implement a transactional sink on top of its distributed checkpoints, that is, a distributed two-phase commit protocol. Of course, Flink can also achieve exactly-once with an idempotent sink.

4. Complete SQL support. With Flink it is easier to support full SQL (DDL, DML, and so on) and then do business development entirely in SQL, as Blink does. Spark Streaming has to turn each micro-batch RDD into a table, and that table is a small temporary one, not a global one.

5. State management. Flink can conveniently use a file-based state backend to manage large state, but this can lead to frequent file-related bugs at the Linux operating system level. Of course, Spark Streaming is flexible too, and using third-party storage such as Alluxio is also very convenient.

6. Miscellaneous points: asynchronous I/O, side outputs, and iterative streams. How much these matter depends heavily on the business.
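Point 3 is the easiest one to show in code. The sketch below illustrates why an idempotent sink gives an exactly-once result even when records are replayed after a failure. It is a toy model in plain Python, not Flink or Spark code. The class names, the `run_with_failure` helper, and the sample records are hypothetical.

```python
# Toy model only: plain Python, no Flink or Spark API involved.

class AppendSink:
    """Non-idempotent sink: every write adds a row, so replays create duplicates."""

    def __init__(self):
        self.rows = []

    def write(self, key, value):
        self.rows.append((key, value))


class UpsertSink:
    """Idempotent sink: writing the same key and value again changes nothing."""

    def __init__(self):
        self.rows = {}

    def write(self, key, value):
        self.rows[key] = value


def run_with_failure(sink, records, checkpoint_at, fail_at):
    """Process records, crash once before index fail_at, then replay.

    checkpoint_at is the index of the first record not covered by the last
    checkpoint, so records[checkpoint_at:fail_at] reach the sink twice.
    """
    # First attempt: everything before the crash has already been written.
    for key, value in records[:fail_at]:
        sink.write(key, value)
    # Recovery: restart from the last checkpoint and process the rest.
    for key, value in records[checkpoint_at:]:
        sink.write(key, value)


records = [("order-1", 10), ("order-2", 20), ("order-3", 30), ("order-4", 40)]

append_sink = AppendSink()
upsert_sink = UpsertSink()
run_with_failure(append_sink, records, checkpoint_at=1, fail_at=3)
run_with_failure(upsert_sink, records, checkpoint_at=1, fail_at=3)

print(len(append_sink.rows))  # 6: order-2 and order-3 were written twice
print(len(upsert_sink.rows))  # 4: the replay overwrote identical rows
```

The transactional approach gets the same end result differently: writes are held back until the checkpoint that covers them completes, so a replay never sees half-finished output.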

I will add more scenarios as I think of them.


Origin www.cnblogs.com/niejingsong/p/11469975.html