Database System Execution Model

  • The volcano model (also called the iterator model) is an iterator-based query execution model. It abstracts each relational algebra operation into an operator, builds an operator tree for the whole SQL statement, and then calls next() recursively from the root of the tree down to the leaves to pull and process data. Its advantage is simplicity and flexibility: each operator implements its own logic without depending on the details of the other operators. Its drawback is performance: each next() call handles only one tuple, which leads to a large number of function calls and a lot of data movement¹². Many database systems use the volcano model, including SQLite, MongoDB (a document database rather than a relational one), Impala, DB2, SQL Server, Greenplum, PostgreSQL, Oracle, and MySQL.
  • The materialized model is a query execution model in which each operator processes all of its input at once and emits all of its results at once. Its advantage is that it reduces the overhead of function calls and data movement and improves CPU utilization. Its drawback is higher memory usage and latency, because each operator must wait for its upstream operator to finish before it can start, and the full intermediate result has to be held in memory. The materialized model is better suited to OLTP workloads: those queries touch only a small amount of data at a time, so intermediate results stay small and only a few function calls are needed.
  • The hybrid model is a query execution model that combines the advantages of the volcano model and the materialized model. Each operator processes a batch of tuples at a time (rather than one tuple or all of them) and then passes that batch to the downstream operator. The hybrid model is also known as the vectorized or batch model¹². Compared with the volcano model it reduces the overhead of function calls and data movement; compared with the materialized model it reduces memory usage and latency; and because each operator works on a batch, it can use the CPU's vectorized (SIMD) instructions to accelerate computation. The hybrid model is better suited to OLAP queries, which scan large amounts of data and therefore gain the most from the lower per-tuple overhead. Databases such as Presto, Snowflake, SQL Server, and Amazon Redshift support this processing mode. The SQL engine of Spark 2.x also began to support the vectorized execution model.
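
The difference between the three models is easiest to see side by side. The following is a minimal sketch in Python, not the code of any real database: the operator names (Scan, Filter, BatchScan, BatchFilter), the helper functions, and the sample rows are all hypothetical and chosen only for illustration. It runs the same filter query under each model and counts how often the scan operator is called. The batches here are plain Python lists, so the sketch shows only the reduced call count, not actual SIMD execution.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]
Pred = Callable[[Row], bool]


# Volcano model: each next() call returns one tuple, or None at end of input.
class Scan:
    def __init__(self, rows: List[Row]) -> None:
        self._it = iter(rows)
        self.calls = 0  # number of next() calls, to expose per-tuple overhead

    def next(self) -> Optional[Row]:
        self.calls += 1
        return next(self._it, None)


class Filter:
    def __init__(self, child: Scan, pred: Pred) -> None:
        self.child, self.pred = child, pred

    def next(self) -> Optional[Row]:
        # Pull from the child until one tuple satisfies the predicate.
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None


def run_volcano(root: Filter) -> List[Row]:
    out: List[Row] = []
    while (row := root.next()) is not None:
        out.append(row)
    return out


# Materialized model: each operator takes its whole input and returns its
# whole output, so the full intermediate result exists in memory.
def scan_all(rows: List[Row]) -> List[Row]:
    return list(rows)


def filter_all(rows: List[Row], pred: Pred) -> List[Row]:
    return [r for r in rows if pred(r)]


# Hybrid (vectorized) model: each next_batch() call returns up to
# batch_size tuples, or None at end of input.
class BatchScan:
    def __init__(self, rows: List[Row], batch_size: int) -> None:
        self.rows, self.batch_size, self.pos = list(rows), batch_size, 0
        self.calls = 0

    def next_batch(self) -> Optional[List[Row]]:
        self.calls += 1
        if self.pos >= len(self.rows):
            return None
        batch = self.rows[self.pos:self.pos + self.batch_size]
        self.pos += self.batch_size
        return batch


class BatchFilter:
    def __init__(self, child: BatchScan, pred: Pred) -> None:
        self.child, self.pred = child, pred

    def next_batch(self) -> Optional[List[Row]]:
        # Skip batches in which no tuple survives the predicate.
        while (batch := self.child.next_batch()) is not None:
            kept = [r for r in batch if self.pred(r)]
            if kept:
                return kept
        return None


def run_vectorized(root: BatchFilter) -> List[Row]:
    out: List[Row] = []
    while (batch := root.next_batch()) is not None:
        out.extend(batch)
    return out


# Hypothetical sample data: 10 rows, 5 of which pass the filter.
rows = [{"id": i, "price": i * 10} for i in range(10)]


def expensive(r: Row) -> bool:
    return r["price"] >= 50


scan = Scan(rows)
volcano_rows = run_volcano(Filter(scan, expensive))
materialized_rows = filter_all(scan_all(rows), expensive)
batch_scan = BatchScan(rows, batch_size=4)
vectorized_rows = run_vectorized(BatchFilter(batch_scan, expensive))

print(scan.calls, batch_scan.calls)  # prints "11 4"
```

All three variants return the same five rows. In this sketch the volcano scan is called 11 times (once per row plus one end-of-input call), while the batched scan is called only 4 times for the same 10 rows; the materialized version makes a single call per operator but holds the whole intermediate list in memory.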

Origin blog.csdn.net/weixin_47895938/article/details/132688529