The Road to Big Data: Alibaba's Big Data Practice, Reading Notes, Chapter 13: Computing Management

  • At present, the internal MaxCompute cluster holds more than 2 million tasks, which consume storage and computing resources every day. Reducing resource consumption, improving task execution performance, and bringing task output times forward are the goals pursued by the computing platform and by ETL development engineers;

 

1. System optimization

  • Distributed computing systems such as Hadoop generally estimate resources statically, based on the amount of input data. Map tasks process the input directly, so for ordinary Map tasks the estimate generally meets expectations;

  • For a Reduce task, the input comes from the Map output, but at estimation time the resources can generally only be estimated from the Map task's input. The estimate often differs greatly from the resources actually required. Once a task is stable, its resources can therefore be estimated from its historical executions, that is, by using HBO (History-Based Optimizer);

  • Mention of CBO (Cost-Based Optimizer) first brings Oracle's CBO to mind. Oracle collects statistics on tables, partitions, indexes, and so on, uses them to calculate the cost of each execution method, and then chooses the best one;
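The cost-based idea described above (estimate a cost for each execution method from statistics, then pick the cheapest) can be sketched roughly as follows. This is a toy illustration only, not Oracle's actual cost model: `TableStats`, the two cost functions, and the random-access penalty of 4.0 are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical statistics an optimizer might hold for one table."""
    row_count: int       # total rows in the table
    matching_rows: int   # rows expected to satisfy the predicate
    has_index: bool      # whether a usable index exists


def full_scan_cost(stats: TableStats) -> float:
    # Assumed model: a full scan touches every row once.
    return float(stats.row_count)


def index_lookup_cost(stats: TableStats) -> float:
    # Assumed model: each matching row costs a random access,
    # weighted by an illustrative penalty of 4.0.
    if not stats.has_index:
        return float("inf")  # this plan is not available
    return 4.0 * stats.matching_rows


def choose_plan(stats: TableStats) -> tuple[str, dict[str, float]]:
    """Compute the cost of every candidate plan and return the cheapest."""
    costs = {
        "full_scan": full_scan_cost(stats),
        "index_lookup": index_lookup_cost(stats),
    }
    best = min(costs, key=costs.get)
    return best, costs
```

With these assumed costs, a highly selective predicate on an indexed table favors the index lookup, while a predicate that matches most rows favors the full scan.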

  • HBO

    • HBO allocates resources (memory, CPU, and number of instances) more reasonably, based on the task's execution history;

    • HBO is an optimization of cluster resource allocation, which can be summarized as: task execution history + cluster status information + optimization rules -> better execution configuration;
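The summary "task execution history + cluster status information + optimization rules -> better execution configuration" can be sketched as a small function. This is a minimal sketch under stated assumptions, not MaxCompute's actual HBO: the record fields, the 0.7 target CPU utilization, the 20% memory headroom, the minimum of 3 historical runs, and the "busy cluster" threshold are all hypothetical values picked for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RunRecord:
    """One historical execution of a task (hypothetical fields)."""
    instances: int          # number of instances the run used
    peak_memory_mb: int     # highest per-instance memory observed
    cpu_utilization: float  # average CPU utilization, 0.0 to 1.0


@dataclass(frozen=True)
class ResourcePlan:
    instances: int
    memory_mb: int


def recommend(
    history: list[RunRecord],
    default: ResourcePlan,
    cluster_free_ratio: float,
    min_runs: int = 3,              # assumed: history needed before trusting it
    headroom_pct: int = 20,         # assumed: memory safety margin
    target_cpu: float = 0.7,        # assumed: desired CPU utilization
    busy_threshold: float = 0.2,    # assumed: below this the cluster is "busy"
) -> ResourcePlan:
    """History + cluster status + rules -> execution configuration."""
    # Rule 1: without enough history, fall back to the static default.
    if len(history) < min_runs:
        return default

    # Rule 2: size memory from the observed peak plus headroom
    # (integer arithmetic avoids floating-point rounding surprises).
    peak = max(r.peak_memory_mb for r in history)
    memory_mb = peak * (100 + headroom_pct) // 100

    # Rule 3: scale the instance count toward the target CPU utilization.
    avg_cpu = mean(r.cpu_utilization for r in history)
    last_instances = history[-1].instances
    instances = max(1, round(last_instances * avg_cpu / target_cpu))

    # Rule 4: when the cluster is busy, never grow beyond the last run.
    if cluster_free_ratio < busy_threshold:
        instances = min(instances, last_instances)

    return ResourcePlan(instances=instances, memory_mb=memory_mb)
```

Under these assumptions, a task that ran on 100 instances at 35% CPU would be scaled down to 50 instances, which matches the observation that static defaults over-provision small tasks; a task running hot would be scaled up only when the cluster has spare capacity.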

    • 1. Background

      • MaxCompute's original resource allocation strategy

        • Under the default instance algorithm, small tasks waste resources while large tasks are short of them;

      • HBO's proposal

        • Through data analysis, it is found that there are a large number of periodically scheduled scripts (the physical plan is stable) in the system, and the input of these scripts


Origin blog.csdn.net/u012965373/article/details/105478292