Reference: a look at executor optimization techniques through the Weld paper
The problem to solve:
Current data analysis applications use many libraries, such as NumPy, Pandas, TensorFlow, and Spark.
The interfaces and data structures of these libraries differ, so to improve an application's performance you can only tune each library one at a time, which makes it difficult to optimize the application as a whole.
What is missing is a way to optimize end to end.
Weld sets out to do exactly this. Note that Weld only considers optimizing in-memory execution on a single machine.
The general idea: Weld provides an IR, and each library rewrites its operators in that IR as needed. Execution is lazy, so computation happens only when a result is actually needed.
A library call does not execute anything internally; it returns a fragment of IR instead. Weld has a runtime that collects the IR from all libraries into a combined IR, so which library a fragment came from no longer matters.
Weld also has an optimizer, which optimizes the combined IR end to end and finally generates machine code.
The idea is good,
but it presumes that every library does the work of porting itself to Weld, which is likely to be hard to achieve.
Weld IR
The design of the IR is critical. Like relational algebra for relational databases, it is the cornerstone of the whole system.
The basic elements of the IR are:
Data types
Computations
Operators, here called builders
Some of these are called builders and some are called mergers; there is no essential difference.
All builders support three interfaces:
merge, which adds data to the builder
result, which returns the builder's result; it can be called only once, after which the builder is destroyed
for, which loops over the data in parallel
As you can see, the IR is really very simple.
Here is one such example:
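The three interfaces can be modeled in a few lines of Python. This is only a sketch, not Weld IR syntax: the names Builder, Merger, VecBuilder and for_loop are hypothetical, and the loop here runs sequentially, whereas the notes describe Weld's for as parallel.

```python
class Builder:
    """Hypothetical base class: merge adds data, result finishes the builder."""

    def __init__(self):
        self._done = False

    def _check_open(self):
        if self._done:
            raise RuntimeError("builder was already consumed by result()")

    def result(self):
        # result may be called only once; afterwards the builder is unusable.
        self._check_open()
        self._done = True
        return self._value()


class Merger(Builder):
    """Reduces merged values with a binary function (for example, a sum)."""

    def __init__(self, identity, combine):
        super().__init__()
        self._acc = identity
        self._combine = combine

    def merge(self, value):
        self._check_open()
        self._acc = self._combine(self._acc, value)
        return self

    def _value(self):
        return self._acc


class VecBuilder(Builder):
    """Collects merged values into a list."""

    def __init__(self):
        super().__init__()
        self._items = []

    def merge(self, value):
        self._check_open()
        self._items.append(value)
        return self

    def _value(self):
        return self._items


def for_loop(data, builder, body):
    """Apply body(builder, element) to every element and return the builder."""
    for x in data:
        builder = body(builder, x)
    return builder


total = for_loop([1, 2, 3, 4], Merger(0, lambda a, b: a + b),
                 lambda b, x: b.merge(x)).result()
squares = for_loop([1, 2, 3], VecBuilder(),
                   lambda b, x: b.merge(x * x)).result()
```

The loop (for_loop) knows nothing about what is being built, and the builder knows nothing about how the data is traversed. That separation is what the notes credit for making fusion easy.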
The benefit of Weld IR is that it can express the operators of the libraries mentioned earlier. Fusion comes up here: because the loop is separated from the builder, operators are easier to fuse.
Look at an example:
In the code above, each map contains its own loop, so calling map twice means looping twice.
With Weld IR, a single for is enough, so there is only one loop.
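The difference between two loops and one can be sketched in plain Python. The function names are made up for illustration; this shows the effect of fusion, not how Weld performs it.

```python
def map_twice_unfused(xs, f, g):
    # Each map owns its loop: two passes, and v1 is fully materialized.
    v1 = [f(x) for x in xs]
    return [g(y) for y in v1]


def map_twice_fused(xs, f, g):
    # One pass: every element flows through f and g before the next is read,
    # so the intermediate list is never built.
    out = []
    for x in xs:
        out.append(g(f(x)))
    return out


def inc(x):
    return x + 1


def double(x):
    return x * 2


unfused = map_twice_unfused([1, 2, 3], inc, double)
fused = map_twice_fused([1, 2, 3], inc, double)
```

Both versions return the same list; the fused one touches the input once and allocates no intermediate.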
Weld Runtime
Again, the emphasis is on laziness: a library does not execute its IR itself but submits it through the Runtime API.
The Runtime API and an example of its use are listed below.
Execution really happens only when Evaluate is called.
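The deferred behaviour can be imitated with a small expression graph. This is a minimal sketch and not Weld's Runtime API: LazyValue, its op strings and its evaluate method are all hypothetical names chosen for illustration.

```python
class LazyValue:
    """Hypothetical node in a deferred computation graph (not Weld's API)."""

    def __init__(self, op, deps=(), payload=None):
        self.op = op              # "source", "map" or "sum"
        self.deps = tuple(deps)   # upstream nodes this node reads from
        self.payload = payload    # data for "source", function for "map"

    def map(self, fn):
        # Only records the operation; fn is not called here.
        return LazyValue("map", [self], fn)

    def sum(self):
        return LazyValue("sum", [self])

    def evaluate(self):
        # Walks the recorded graph and performs the work.
        if self.op == "source":
            return self.payload
        inp = self.deps[0].evaluate()
        if self.op == "map":
            return [self.payload(x) for x in inp]
        if self.op == "sum":
            return sum(inp)
        raise ValueError(f"unknown op: {self.op}")


calls = []


def double(x):
    calls.append(x)   # records every real invocation
    return 2 * x


data = LazyValue("source", payload=[1, 2, 3])
expr = data.map(double).sum()   # builds the graph, runs nothing
calls_before = len(calls)
answer = expr.evaluate()        # the work happens here
```

Because the whole graph exists before anything runs, a real system has the chance to optimize across the map and the sum before executing them; this sketch simply interprets the graph as recorded.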
Weld Optimizer
The optimizer looks a lot like a database optimizer.
Its optimizations are divided into rule-based and adaptive ones.
Rule-based optimization
The main one is fusion, probably the most common term in execution optimization.
There are two kinds of fusion. One is pipelining: after pipelining, the intermediate v1 no longer needs to be produced.
The other is horizontal fusion, where the loops share the same input but produce different outputs.
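Horizontal fusion can be illustrated with two reductions over the same input. The sketch below is plain Python with made-up function names, and it assumes a non-empty input so that the two versions are comparable.

```python
def sum_and_max_unfused(xs):
    total = sum(xs)       # first pass over xs
    largest = max(xs)     # second pass over the same xs
    return total, largest


def sum_and_max_fused(xs):
    # One pass feeds both outputs.
    total = 0
    largest = None
    for x in xs:
        total += x
        if largest is None or x > largest:
            largest = x
    return total, largest


values = [3, 1, 4, 1, 5]
unfused_result = sum_and_max_unfused(values)
fused_result = sum_and_max_fused(values)
```

Unlike pipelining, nothing intermediate is eliminated here; the saving is that the shared input is read once instead of twice.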
Another crucial optimization is vectorization.
Adaptive optimization is somewhat similar to CBO (cost-based optimization), but in practice there is not much data available here on which to base decisions.
Two techniques are mentioned: adaptive predication and adaptive data structures.
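One plausible reading of adaptive predication is: measure how selective a filter is on a sample at run time, then pick between a branching loop and a branch-free (predicated) loop. The sketch below follows that reading only. The function names, the sample size and the 0.1/0.9 thresholds are assumptions for illustration and are not taken from the paper, and in Python the two variants do not differ in speed the way compiled code would.

```python
def filtered_sum_branching(xs, pred):
    total = 0
    for x in xs:
        if pred(x):                 # data-dependent branch
            total += x
    return total


def filtered_sum_predicated(xs, pred):
    total = 0
    for x in xs:
        total += x * int(pred(x))   # predicate becomes 0 or 1, no branch on the data
    return total


def adaptive_filtered_sum(xs, pred, sample_size=100, low=0.1, high=0.9):
    sample = xs[:sample_size]
    if not sample:
        return 0
    selectivity = sum(1 for x in sample if pred(x)) / len(sample)
    # Assumption: a branch is cheap when its outcome is predictable, that is,
    # when almost all or almost no elements pass the filter.
    if selectivity < low or selectivity > high:
        return filtered_sum_branching(xs, pred)
    return filtered_sum_predicated(xs, pred)


def is_even(x):
    return x % 2 == 0


numbers = list(range(10))
adaptive_total = adaptive_filtered_sum(numbers, is_even)
```

Whichever variant is chosen, the result is the same; only the shape of the inner loop changes.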