(Little Sister Teaches You Database Systems) (IX) Query Execution

Hello everyone, long time no see! Today, in our "Little Sister Teaches You Database Systems" series, we will learn about query execution in database systems: sorting, selection, duplicate elimination, aggregation, and set difference. "Little Sister Teaches You Database Systems": you have never seen such a cool title, right? As the saying goes, "I won't rest until my words astonish." Yes, the title really is that cool.

My little sister Umaru is 18 and the goddess of our campus: excellent grades, good at every sport, gentle and virtuous. But only I know that the radiant Umaru everyone admires turns, at home, into a super homebody in a hamster cloak. She rolls around on the floor, eating, sleeping, and playing games. And it all began one night.

Since then, Umaru often asks me to help with her homework. Today she wants to learn about query execution in database systems. In this tutorial, I will use my conversations with Umaru to talk about sorting, selection, duplicate elimination, aggregation, and set difference.

Sorting operation


Creating runs (Create Runs)

Relation R is divided into $\lceil B(R)/M \rceil$ runs.

Process:

for each group of M blocks of R do
	read these M blocks into M pages of the buffer pool
	sort the tuples in these M pages by the sort key, forming a run
	write the run to a file
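The run-creation phase can be sketched in Python as follows. This is a minimal illustration under stated assumptions. R is modeled as a list of blocks, and each block is a list of tuples. A "file" is just a Python list. The name `create_runs` is hypothetical.

```python
def create_runs(blocks, M, key=lambda t: t):
    """Split R (a list of blocks) into sorted runs of at most M blocks each."""
    runs = []
    for i in range(0, len(blocks), M):
        # Read the next M blocks into the M buffer pages
        chunk = [t for block in blocks[i:i + M] for t in block]
        # Sort the tuples in memory by the sort key to form one run
        chunk.sort(key=key)
        # "Write" the run to a file (here: append it to a list)
        runs.append(chunk)
    return runs

R = [[5, 3], [9, 1], [4, 8], [7, 2], [6, 0]]
print(create_runs(R, M=2))  # [[1, 3, 5, 9], [2, 4, 7, 8], [0, 6]]
```

With B(R) = 5 and M = 2, this produces ⌈5/2⌉ = 3 runs, which matches the formula above.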

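Multiway merge

Merge the tuples of the $\lceil B(R)/M \rceil$ runs.

Process:

read the first block of each run into one page of the buffer pool
repeat
	find the smallest sort key value v among all input buffer pages
	write all tuples whose sort key equals v to the output buffer
	if all tuples in an input buffer page have been merged, read the next page of its run into that page
until all runs have been merged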
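Below is a minimal Python sketch of the merge phase. It assumes each run is a list of non-empty pages, and each page is a sorted list of tuples. It differs from the pseudocode in one way. Rather than scanning every input buffer page for the smallest key, it keeps one entry per run in a min-heap (`heapq`), which is a common substitution. It also emits tuples with equal keys one at a time instead of all at once, which produces the same output order. The name `multiway_merge` is hypothetical.

```python
import heapq

def multiway_merge(runs, key=lambda t: t):
    """Merge sorted runs; each run is a list of pages (assumed non-empty, sorted)."""
    output = []  # stands in for the output buffer
    # One heap entry per run: (sort key, run index, page index, position in page)
    heap = [(key(run[0][0]), r, 0, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)
    while heap:
        # Smallest sort key among all input buffer pages
        _, r, p, i = heapq.heappop(heap)
        output.append(runs[r][p][i])
        i += 1
        if i == len(runs[r][p]):
            # This input page is exhausted: read the next page of the same run
            p, i = p + 1, 0
        if p < len(runs[r]):
            heapq.heappush(heap, (key(runs[r][p][i]), r, p, i))
    return output

runs = [[[1, 3], [5, 9]], [[2, 4], [7, 8]], [[0, 6]]]
print(multiway_merge(runs))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```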

Analysis of Algorithms

The analysis does not count the cost of producing the algorithm's output:

  • The I/O spent writing the output is not included in the algorithm's I/O cost. The output may be fed directly into a subsequent operation as its input, without being written to a file

  • The output buffer is not counted among the available memory pages. If the output is passed directly to a subsequent operation, the output buffer is counted toward that operation's available memory pages

I/O cost: 3B(R)

  • When creating runs, each block of R is read once: B(R) I/Os in total
  • Each run is written to a file: B(R) I/Os in total
  • In the merge phase, each run is scanned once: B(R) I/Os in total

Available memory pages required: $B(R) \le M^2$

  • Each run occupies at most M pages
  • At most M runs are merged

Multi-pass multiway merge sort

If $B(R) > M^2$, a multi-pass multiway external merge sort is required

  • I/O cost: $(2m-1)B(R)$, where m is the number of passes the algorithm makes
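The $(2m-1)B(R)$ formula can be checked with a small calculator. This sketch rests on two assumptions. First, it counts run creation as the first pass. Second, it lets each merge pass combine up to M runs, since the output buffer is not counted. Every pass except the last reads and writes R (2B(R)), and the last pass only reads it (B(R)). The names `num_passes` and `sort_io_cost` are hypothetical.

```python
def num_passes(B, M):
    """Passes needed to sort B blocks with M buffer pages (output buffer not counted)."""
    if B <= M:
        return 1               # R fits in memory: a single pass
    runs = -(-B // M)          # ceil(B / M) runs after the run-creation pass
    passes = 1
    while runs > 1:
        runs = -(-runs // M)   # each merge pass combines up to M runs into one
        passes += 1
    return passes

def sort_io_cost(B, M):
    """I/O cost (2m - 1) * B, ignoring the cost of writing the final output."""
    return (2 * num_passes(B, M) - 1) * B

print(num_passes(10000, 100), sort_io_cost(10000, 100))  # 2 30000
print(num_passes(10001, 100), sort_io_cost(10001, 100))  # 3 50005
```

The second call shows the threshold effect. With m = 2 the formula gives the 3B(R) cost of the two-pass algorithm, and it stays there as long as $B(R) \le M^2$. Once B(R) exceeds $M^2$, a third pass is needed.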

Optimizing multiway merge sort: when all tuples in an input buffer page have been merged, the DBMS must read the next page of that run. The merge is suspended until this I/O completes.

Double buffering (Double Buffering)

Assign several input buffer pages to each run and arrange them in a ring (circular buffer)

  • When all tuples in the current buffer page have been merged, the DBMS immediately starts merging the tuples in the next buffer page

  • Meanwhile, the DBMS reads the next page of the run from the file into a free buffer page
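Double buffering for one run can be sketched in Python with a background thread, as a minimal illustration. `read_page(i)` is a hypothetical callback standing in for the disk read of page i of the run. A ring of two buffers is used: the page being merged and the page being prefetched.

```python
from concurrent.futures import ThreadPoolExecutor

def double_buffered_pages(read_page, num_pages):
    """Yield the pages of one run, reading page i+1 while page i is being merged."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_page, 0) if num_pages > 0 else None
        for i in range(num_pages):
            page = pending.result()  # blocks only if the prefetch has not finished
            if i + 1 < num_pages:
                # Start reading the next page into the free buffer before merging this one
                pending = pool.submit(read_page, i + 1)
            yield page

run_on_disk = [[1, 2], [3, 4], [5]]
for page in double_buffered_pages(lambda i: run_on_disk[i], len(run_on_disk)):
    print(page)  # the merge would consume this page's tuples here
```

In a real DBMS the "thread" would be asynchronous I/O into a pre-allocated buffer frame. The point is only that the merge no longer stalls on every page boundary.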


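Selection operation

B(R) is the number of blocks of R in external storage, which is also the number of pages R occupies in the buffer pool. T(R) is the number of tuples of R, and V(R, K) is the number of distinct values of attribute K in R.
B(R)/V(R, K) is the average number of blocks occupied by the tuples having any one value of K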

Scan-based selection algorithm (Scanning-based Selection)

for each block P of R do
	read P into the buffer pool
	for each tuple t in P do
		if t satisfies the selection condition
		then write t to the output buffer
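The scan-based selection pseudocode maps almost line for line onto Python. This sketch models R as a list of blocks and the selection condition as a predicate function. The name `scan_select` is hypothetical.

```python
def scan_select(blocks, predicate):
    """Scan-based selection: read each block of R once and keep the tuples that qualify."""
    output = []                  # stands in for the output buffer
    for page in blocks:          # read block P into the single buffer page
        for t in page:
            if predicate(t):     # t satisfies the selection condition
                output.append(t)
    return output

R = [[("a", 1), ("b", 5)], [("c", 3), ("d", 7)]]
print(scan_select(R, lambda t: t[1] > 2))  # [('b', 5), ('c', 3), ('d', 7)]
```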

Analysis of Algorithms

I/O cost: B(R) (if R is stored clustered)

  • The tuples of R are stored contiguously in the file
  • Each block of R is read only once

I/O cost: T(R) (if R is not stored clustered)

  • The tuples of R are not stored contiguously in the file
  • In the worst case, every tuple of R is on a different page

Available memory pages required: M ≥ 1

  • At least one page is needed as a buffer for reading each block of R

Hash-based selection algorithm (Hash-based Selection)

Algorithm

  1. Use hash(v) to determine the bucket that holds the result tuples

  2. Search the pages of that bucket for tuples whose key equals v, and write those tuples to the output buffer
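A minimal Python sketch of the two steps follows. It assumes the condition is K = v and that R has already been stored as a hash file on K. That storage is modeled as a list of buckets, each bucket a list of pages. The helpers `build_hash_file` and `hash_select` are hypothetical names, and Python's built-in `hash` stands in for the DBMS hash function.

```python
def build_hash_file(tuples, key, num_buckets, page_size):
    """Hypothetical setup: place each tuple in bucket hash(K) % num_buckets, page by page."""
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        pages = buckets[hash(key(t)) % num_buckets]
        if not pages or len(pages[-1]) == page_size:
            pages.append([])  # start a new page in this bucket
        pages[-1].append(t)
    return buckets

def hash_select(buckets, key, v):
    """Selection K = v: only the pages of bucket hash(v) are read."""
    bucket = buckets[hash(v) % len(buckets)]  # step 1: locate the bucket from hash(v)
    output = []
    for page in bucket:                       # step 2: search that bucket's pages
        output.extend(t for t in page if key(t) == v)
    return output

R = [(1, "x"), (2, "y"), (5, "z"), (1, "w"), (4, "u")]
buckets = build_hash_file(R, key=lambda t: t[0], num_buckets=3, page_size=2)
print(hash_select(buckets, key=lambda t: t[0], v=1))  # [(1, 'x'), (1, 'w')]
```

The bucket for v = 1 also holds (4, "u"), because different keys can collide. That is why step 2 still compares every key against v.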

Analysis of Algorithms

  1. I/O cost ≈ B(R)/V(R, K)
  • Attribute K has V(R, K) distinct values
  • Each bucket has on average B(R)/V(R, K) pages (a very rough estimate)
  2. Available memory pages required: M ≥ 1
  • At least one page is needed as a buffer for reading the bucket

Index-based selection algorithm (Index-based Selection)

Prerequisites for the algorithm

  • The selection condition has the form K = v or l ≤ K ≤ u
  • Relation R has an index on attribute K

Algorithm: use the index to search for the tuples that satisfy the selection condition, and write them to the output buffer
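The sketch below illustrates index-based selection in Python. A sorted key array searched with the standard `bisect` module stands in for the B+ tree leaf level, and positions in R stand in for record ids. `SortedIndex` and `index_select` are hypothetical names, not a real DBMS API.

```python
import bisect

class SortedIndex:
    """Hypothetical stand-in for a B+ tree on K: sorted keys plus record ids."""
    def __init__(self, tuples, key):
        entries = sorted((key(t), rid) for rid, t in enumerate(tuples))
        self.keys = [k for k, _ in entries]
        self.rids = [rid for _, rid in entries]  # positions of the tuples in R

    def range_search(self, lo, hi):
        """Record ids of all tuples with lo <= K <= hi."""
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_right(self.keys, hi)
        return self.rids[start:end]

def index_select(R, index, lo, hi=None):
    """Selection K = v (hi omitted) or lo <= K <= hi via the index."""
    if hi is None:
        hi = lo
    # Each fetch may touch a different page unless the index is clustered
    return [R[rid] for rid in index.range_search(lo, hi)]

R = [("a", 3), ("b", 1), ("c", 3), ("d", 7)]
idx = SortedIndex(R, key=lambda t: t[1])
print(index_select(R, idx, 3))     # [('a', 3), ('c', 3)]
print(index_select(R, idx, 2, 7))  # [('a', 3), ('c', 3), ('d', 7)]
```

The per-tuple fetch `R[rid]` is what makes a non-clustered index cost about T(R)/V(R, K) I/Os in the analysis below. With a clustered index, matching tuples sit on consecutive pages.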

Analysis of Algorithms

  1. I/O cost ≈ B(R)/V(R, K) (if the index is clustered)
  • The result tuples are stored contiguously in the file
  • Attribute K has V(R, K) distinct values
  • The result tuples occupy about B(R)/V(R, K) pages (a very rough estimate)
  2. I/O cost ≈ T(R)/V(R, K) (if the index is non-clustered)
  • There are about T(R)/V(R, K) result tuples (a very rough estimate)
  • The result tuples are not necessarily stored contiguously in the file
  • In the worst case, every result tuple is on a different page
  3. Available memory pages required: M ≥ 1
  • At least one page is needed as a buffer for reading B+ tree nodes

Deduplication operation

Projection algorithm without duplicate elimination

Analysis of Algorithms

  1. I/O cost: B(R) (if R is stored clustered)
  • The tuples of R are stored contiguously in the file
  • Each block of R is read only once
  2. I/O cost: T(R) (if R is not stored clustered)
  • The tuples of R are not stored contiguously in the file
  • In the worst case, every tuple of R is on a different page
  3. Available memory pages required: M ≥ 1
  • At least one page is needed as a buffer for reading each block of R

One-pass duplicate elimination algorithm (One-Pass Duplicate Elimination)

Algorithm

for each block P of R do
	read P into the buffer pool
	for each tuple t in P do
		if t has not been seen before then
			write t to the output buffer
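Here is a minimal Python sketch of one-pass duplicate elimination. A Python `set` plays the role of the in-memory structure that remembers every tuple seen so far. In the DBMS this structure is δ(R) and must fit in M - 1 pages. The name `one_pass_distinct` is hypothetical.

```python
def one_pass_distinct(blocks):
    """One-pass duplicate elimination over R given as a list of blocks."""
    seen = set()   # in-memory copy of delta(R); must fit in the M-1 remaining pages
    output = []    # stands in for the output buffer
    for page in blocks:          # read block P into the buffer pool
        for t in page:
            if t not in seen:    # t has not been seen before
                seen.add(t)
                output.append(t)
    return output

R = [[(1, "a"), (2, "b")], [(1, "a"), (3, "c")], [(2, "b")]]
print(one_pass_distinct(R))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```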


Analysis of Algorithms

The data access pattern is the same as that of the scan-based selection algorithm

  1. I/O cost: B(R) (if R is stored clustered)
  • The tuples of R are stored contiguously in the file
  • Each block of R is read only once
  2. I/O cost: T(R) (if R is not stored clustered)
  • The tuples of R are not stored contiguously in the file
  • In the worst case, every tuple of R is on a different page
  3. Available memory pages required: $B(\delta(R)) \le M-1$
  • The distinct tuples of R, δ(R), must fit in the M-1 available pages (one page is used to read R)

Sort-based duplicate elimination algorithm (Sort-based Duplicate Elimination)

Sort-based duplicate elimination is essentially the same as the multiway merge sort algorithm, with two differences:

  • When creating runs, tuples are sorted on the entire tuple

  • In the merge phase, only one copy of each set of identical tuples is output; all other copies are discarded
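A compact Python sketch of the two changes follows. Runs are sorted on the whole tuple (Python compares tuples field by field). During the merge, a tuple is emitted only if it differs from the last tuple emitted. For brevity it uses the standard `heapq.merge` in place of the explicit buffer-page loop. The name `sort_based_distinct` is hypothetical.

```python
import heapq

def sort_based_distinct(blocks, M):
    """Sort-based duplicate elimination over R given as a list of blocks."""
    # Phase 1: create runs of at most M blocks, sorted on the entire tuple
    runs = [sorted(t for block in blocks[i:i + M] for t in block)
            for i in range(0, len(blocks), M)]
    # Phase 2: multiway merge; identical tuples arrive adjacently, so keep only the first
    output = []
    for t in heapq.merge(*runs):
        if not output or output[-1] != t:
            output.append(t)
    return output

print(sort_based_distinct([[3, 1], [3, 2], [1, 4]], M=2))  # [1, 2, 3, 4]
```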

Analysis of Algorithms

  1. I/O cost: 3B(R)
  • When creating runs, each block of R is read once: B(R) I/Os in total
  • Each run is written to a file: B(R) I/Os in total
  • In the merge phase, each run is scanned once: B(R) I/Os in total
  2. Available memory pages required: $B(R) \le M^2$
  • Each run occupies at most M pages
  • At most M runs are merged

Hash-based duplicate elimination algorithm (Hash-based Duplicate Elimination)

R is first hashed into buckets R_1, ..., R_{M-1}, and then duplicates are eliminated within each bucket.

Why each bucket can be deduplicated separately: duplicate tuples always hash to the same bucket, so the hashing step has already separated tuples that cannot be duplicates of one another.

Deduplicating each bucket R_i and putting the results together gives the deduplicated result of R.
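The two phases can be sketched in Python as follows. This sketch makes simplifying assumptions. R is a flat list of tuples, the buckets are in-memory lists rather than files, and `num_buckets` plays the role of M - 1. Python's `hash` stands in for the DBMS hash function, and `hash_based_distinct` is a hypothetical name.

```python
def hash_based_distinct(tuples, num_buckets):
    """Hash-based duplicate elimination: partition R, then dedup each bucket R_i."""
    # Phase 1: hash R into buckets; identical tuples always land in the same bucket
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        buckets[hash(t) % num_buckets].append(t)
    # Phase 2: one-pass duplicate elimination on each bucket, then union the results
    result = []
    for bucket in buckets:
        seen = set()
        for t in bucket:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result

print(sorted(hash_based_distinct([4, 1, 4, 7, 1, 2], num_buckets=3)))  # [1, 2, 4, 7]
```

Because every copy of a tuple lands in the same bucket, the per-bucket `seen` set can be discarded once that bucket is finished. Only one bucket's distinct tuples need to be in memory at a time.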


Analysis of Algorithms

  1. I/O cost: 3B(R)

  • When hashing R into buckets, each block of R is read once: B(R) I/Os in total

  • Each bucket is written to a file: B(R) I/Os in total

  • Running the one-pass duplicate elimination algorithm on each bucket R_i costs B(R_i) I/Os, or B(R) I/Os over all buckets

  2. Available memory pages required: $B(R) \le (M-1)^2$
  • R is hashed into M-1 buckets (one page is used to read R, and M-1 pages serve as bucket buffers)

  • Each bucket has at most M-1 blocks, so running one-pass duplicate elimination on each bucket satisfies that algorithm's memory requirement

Aggregation operation

Aggregation is executed in essentially the same way as duplicate elimination

  • Method 1: One-pass aggregation algorithm (One-pass Aggregation)

  • Method 2: Sort-based aggregation algorithm (Sort-based Aggregation)

  • Method 3: Hash-based aggregation algorithm (Hash-based Aggregation)
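To show how closely aggregation mirrors duplicate elimination, here is a one-pass aggregation sketch in Python. The "seen" set from one-pass deduplication becomes a dictionary with one running (COUNT, SUM) pair per group. The choice of COUNT and SUM as the aggregates and the name `one_pass_aggregate` are illustrative assumptions.

```python
def one_pass_aggregate(blocks, group_key, value):
    """GROUP BY group_key computing COUNT(*) and SUM(value), one running state per group."""
    acc = {}  # one entry per group: plays the role delta(R) plays in duplicate elimination
    for page in blocks:           # read each block of R once
        for t in page:
            g = group_key(t)
            cnt, total = acc.get(g, (0, 0))
            acc[g] = (cnt + 1, total + value(t))
    return acc

R = [[("cs", 90), ("math", 80)], [("cs", 70), ("math", 100), ("ee", 60)]]
print(one_pass_aggregate(R, lambda t: t[0], lambda t: t[1]))
# {'cs': (2, 160), 'math': (2, 180), 'ee': (1, 60)}
```

The sort-based and hash-based variants change only the first step. Instead of one in-memory table, groups are brought together by sorting or by hashing into buckets, just as in duplicate elimination.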

Set difference operation

One-pass set difference algorithm (One-Pass Set Difference)


Analysis of Algorithms

  1. I/O cost: B(R) + B(S)

  • In the build phase, each block of S is read once: B(S) I/Os in total
  • In the probe phase, each block of R is read once: B(R) I/Os in total

  2. Available memory pages required: $B(S) \le M-1$
  • The in-memory lookup structure built from S occupies B(S) pages; the remaining page is the input buffer for R
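The build and probe phases of R - S can be sketched in Python as follows. It assumes set semantics, with R and S each containing no duplicate tuples, and uses a Python `set` as the in-memory lookup structure. The name `one_pass_set_difference` is hypothetical.

```python
def one_pass_set_difference(R_blocks, S_blocks):
    """R - S with S held in memory (assumes R and S contain no duplicate tuples)."""
    # Build phase: read each block of S once into an in-memory lookup structure
    s_table = set()
    for page in S_blocks:
        s_table.update(page)
    # Probe phase: read each block of R once; keep tuples that do not appear in S
    output = []
    for page in R_blocks:
        for t in page:
            if t not in s_table:
                output.append(t)
    return output

print(one_pass_set_difference([[1, 2], [3, 4]], [[2], [4, 5]]))  # [1, 3]
```

S, not R, is the relation kept in memory. That is why the memory requirement is stated in terms of B(S), while R can be arbitrarily large.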

Summary

Fun is fun and jokes are jokes, but don't joke about studying.

This post introduced five "magic" query operations: sorting, selection, duplicate elimination, aggregation, and set difference. Focus on the principles and grasp the core idea of each operation. For example, duplicate elimination and aggregation are essentially the same, so once you understand duplicate elimination you can work out by analogy how aggregation is executed. Each operation has several algorithms, and their analyses make clear each algorithm's strengths, weaknesses, and the situations it suits.


Origin blog.csdn.net/JAck_chen0309/article/details/105356500