Background
The MapReduce (MR) engine is deprecated in Hive 2. The officially recommended replacement engines are Tez and Spark.
Selection
Tez
Uses a directed acyclic graph (DAG) of tasks. Computes in memory.
Spark
Serves as both a batch and a stream processing engine, which reduces learning costs.
Inconvenient problem
Tez:
When a Hive SQL statement uses union or join operations,
Tez splits the work into tasks, and each small task creates its own subfolder under the table directory.
This causes a very serious problem: if a downstream job reads such a table with Spark or MR instead of Tez,
neither of those engines traverses the subfolders under the table directory, so the query returns 0 rows. It is also difficult to require everyone else to use the same engine.
So we abandoned Tez and chose Spark as our main engine.
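
The subfolder problem can be illustrated with a small local simulation. This is only a sketch, not Hive or Spark code: the table name `demo_table`, the subfolder names `1` and `2`, and the file name `000000_0` are hypothetical stand-ins for whatever Tez actually writes, and the two listing functions merely mimic a reader that does or does not descend into subfolders.

```python
import os
import tempfile


def list_files_flat(table_dir):
    """Mimic a reader that only looks at files directly under the table directory."""
    return sorted(
        name
        for name in os.listdir(table_dir)
        if os.path.isfile(os.path.join(table_dir, name))
    )


def list_files_recursive(table_dir):
    """Mimic a reader that also descends into subfolders."""
    found = []
    for root, _dirs, files in os.walk(table_dir):
        for name in files:
            full_path = os.path.join(root, name)
            found.append(os.path.relpath(full_path, table_dir))
    return sorted(found)


def make_demo_table(base_dir):
    """Create a hypothetical layout: one subfolder per union branch, one data file each."""
    table_dir = os.path.join(base_dir, "demo_table")
    for sub in ("1", "2"):  # assumed subfolder names, for illustration only
        os.makedirs(os.path.join(table_dir, sub))
        with open(os.path.join(table_dir, sub, "000000_0"), "w") as f:
            f.write("row\n")
    return table_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table_dir = make_demo_table(tmp)
        # The flat reader finds no data files, which corresponds to a query returning 0 rows.
        print("flat:", list_files_flat(table_dir))
        # The recursive reader finds the files inside the subfolders.
        print("recursive:", list_files_recursive(table_dir))
```

Hadoop and Hive do expose properties for recursive input listing, such as `mapreduce.input.fileinputformat.input.dir.recursive` and `hive.mapred.supports.subdirectories`, but every reader of the table would have to set them, which is the same coordination problem described above.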