Flink source code analysis

Preface

I have been using Flink for batch and stream processing for some time, and I feel that reading only the documentation or the forums is not a good way to debug or optimize. In truth, I have not studied the documentation and forums all that carefully; I just find them not very effective. For a programmer, reading the code is still the simplest and most direct approach.

For example, we built a SQL Engine on top of Flink SQL. With it, non-technical people who understand SQL can implement an Application in SQL instead of having programmers code it directly. On top of that we added a drag-and-drop interface, so that non-technical people who do not know SQL can implement an Application by dragging. The company has very large data sources and a rich set of distribution channels, so we implemented a wide range of TableSources (data sources), TableSinks (data distribution) and UDFs (computation functions) in the SQL Engine. With a few simple drags, users can operate on big data, compute models, publish them and take them live.

But behind the scenes things are not so simple, and data skew happens often. For example, take an inner join between a large data source and a small one. If a large share of the records in the large source (say 50%) use only a handful of join keys, and the Flink optimizer turns the SQL join into a hash join, then no matter how many TaskSlots you assign, that 50% of the data ends up in a single TaskSlot, which runs slowly until it exhausts its resources. In this case it is better to broadcast the small data set to every slot and process the large data set in parallel partitions. However, standard SQL has no way to specify a join hint, and Flink SQL does not support one either, so the only option is to debug Flink and see where a change can solve the problem. In the last chapter we go step by step through the Flink client, the Flink optimizer and the Flink runtime (JobManager, TaskManager), setting breakpoints in the source and following the data through once, to see how the program can be made to broadcast the small data set.
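The skew described above can be illustrated without Flink at all. The following is a minimal sketch in plain Java, not Flink's actual partitioner: the class `SkewDemo`, its method names and the synthetic key distribution (half of the records sharing one hot key) are all assumptions made for illustration. It counts how many records of the large side each slot would receive when records are routed by the hash of the join key, and compares that with an even split, which becomes possible once the small side is broadcast to every slot.

```java
import java.util.Arrays;

// Illustrative simulation only; it does not use or reproduce Flink's runtime.
final class SkewDemo {

    // Hash partitioning: a record goes to the slot selected by its join key,
    // so all records with the same key land in the same slot.
    public static int[] hashPartitionCounts(int[] joinKeys, int slots) {
        int[] counts = new int[slots];
        for (int key : joinKeys) {
            counts[Math.floorMod(Integer.hashCode(key), slots)]++;
        }
        return counts;
    }

    // Broadcast strategy: the small side is copied to every slot, so the large
    // side can be split evenly (round-robin here) regardless of the join key.
    public static int[] roundRobinCounts(int[] joinKeys, int slots) {
        int[] counts = new int[slots];
        for (int i = 0; i < joinKeys.length; i++) {
            counts[i % slots]++;
        }
        return counts;
    }

    // Hypothetical skewed input: the first half of the records share the hot
    // key 0, the second half have distinct keys.
    public static int[] skewedKeys(int n) {
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = (i < n / 2) ? 0 : i;
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = skewedKeys(1000);
        System.out.println("hash partitioned: " + Arrays.toString(hashPartitionCounts(keys, 4)));
        System.out.println("broadcast + even: " + Arrays.toString(roundRobinCounts(keys, 4)));
    }
}
```

With 1000 records and 4 slots, the hash-partitioned run puts 625 records into the slot that owns the hot key and 125 into each of the others, while the even split gives every slot 250. Adding more slots shrinks the small shares but never the hot one, which is why assigning more TaskSlots does not help.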

To make the later material easier to read, let me first outline Flink over several chapters:

Flink source structure

Flink architecture

Flink DAG graph and data stream

Flink Cluster Environment

Debug Flink

Origin www.cnblogs.com/nightbreeze/p/10942536.html