Hello everyone, long time no see! Today, in our series "Learning Database Systems with My Little Sister", we study query execution in database systems: sorting, selection, deduplication, aggregation, and set difference. A database course taught through a little sister: you've never seen such a cool title, right? As the saying goes, "never rest until your words astonish." Yes, the title really is that cool.
My little sister Umaru, 18 years old, is the goddess of the campus: top grades, good at every sport, gentle and virtuous. But only I know that the Umaru everyone admires is, at home, a hamster-cloaked super homebody who rolls around on the floor and does nothing but eat, sleep, and play. It all started one night.
Since then, Umaru often asks me to help with her homework. Today she wants to learn how a database system executes queries, so in this tutorial I'll use our conversations to walk through sorting, selection, deduplication, aggregation, and set difference.
Sorting operation
Creating runs (CreateRuns)
Relation R is divided into runs as follows:
Procedure:
for each group of M blocks of R do
    read these M blocks into M pages of the buffer pool
    sort the tuples in these M pages on the sort key, forming a run
    write the run to a file
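A minimal Python sketch of the run-creation phase, assuming R is simulated in memory as a list of blocks (each block a list of tuples) and that "writing a run to a file" simply means appending a sorted list; `create_runs` and its parameters are hypothetical names for illustration.

```python
from typing import Callable, List, Sequence

Block = List[tuple]  # one disk block, simulated as a list of tuples


def create_runs(blocks: Sequence[Block], M: int,
                key: Callable[[tuple], object]) -> List[List[tuple]]:
    """Phase 1 of external merge sort: cut R into runs of at most M blocks."""
    runs = []
    for start in range(0, len(blocks), M):
        # "Read" the next M blocks into the M buffer-pool pages.
        buffer_pool = [t for block in blocks[start:start + M] for t in block]
        # Sort the tuples in memory on the sort key, forming one run.
        buffer_pool.sort(key=key)
        # "Write" the run to a file (here: append it to a list).
        runs.append(buffer_pool)
    return runs
```

Each run here is a flat list of tuples; a real DBMS would write it back as at most M pages.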
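Multiway merge
Merge the tuples of all runs
Procedure:
read the first block of each run into one page of the buffer pool
repeat
    find the smallest sort-key value v among all input buffer pages
    write all tuples whose sort key equals v to the output buffer
    if all tuples in an input buffer page have been merged, read the next page of its run
until all runs have been merged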
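A Python sketch of the multiway merge procedure, assuming each run is a list of pages (each page a list of tuples) and each run gets exactly one input buffer page; `multiway_merge` is a hypothetical name. It scans all buffers for the minimum exactly as the pseudocode does; a production implementation would more likely keep the buffer heads in a priority queue.

```python
from collections import deque
from typing import Callable, List


def multiway_merge(runs: List[List[List[tuple]]],
                   key: Callable[[tuple], object]) -> List[tuple]:
    """Merge sorted runs; each run is a list of pages, each page a list of tuples."""
    next_page = [1] * len(runs)                              # next page to read per run
    buffers = [deque(run[0]) if run else deque() for run in runs]  # first page of each run
    output = []
    while any(buffers):
        # Find the smallest sort-key value v among all non-empty input buffer pages.
        v = min(key(buf[0]) for buf in buffers if buf)
        for i, buf in enumerate(buffers):
            # Write every tuple whose key equals v; refill a page as soon as it empties.
            while buf and key(buf[0]) == v:
                output.append(buf.popleft())
                if not buf and next_page[i] < len(runs[i]):
                    buf.extend(runs[i][next_page[i]])
                    next_page[i] += 1
    return output
```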
Analysis of Algorithms
The analysis does not count the cost of the algorithm's output:
- I/O incurred when writing the output is not included in the algorithm's I/O cost, because the output may be fed directly into a subsequent operation as input instead of being written to a file
- The output buffer is not counted among the available memory pages. If the output is fed directly into a subsequent operation, the output buffer is counted among that operation's available memory pages
I/O cost: 3B(R)
- When creating runs, each block of R is read once: B(R) I/Os in total
- Each run is written to a file: B(R) I/Os in total
- In the merge phase, each run is scanned once: B(R) I/Os in total
Available memory pages required: B(R) ≤ M^2
- Each run has at most M pages
- At most M runs are merged
Multi-pass multiway merge sort
If B(R) > M^2, a multi-pass multiway external merge sort is required
- I/O cost: (2m-1)B(R), where m is the number of passes the algorithm performs
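A small Python helper that counts the passes m and evaluates the (2m-1)B(R) cost, assuming output I/O is not counted (as above) and that a relation fitting in M pages is sorted in a single pass (m = 1); `external_sort_cost` is a hypothetical name.

```python
def external_sort_cost(B: int, M: int) -> tuple:
    """Return (m, I/O cost) for multiway merge sort of B blocks with M memory pages.

    Output I/O is not counted. Assumes a relation that fits in M pages is
    sorted in one pass (m = 1).
    """
    runs = (B + M - 1) // M      # pass 1 creates ceil(B / M) runs of at most M pages
    m = 1
    while runs > 1:              # each further pass merges up to M runs into one
        runs = (runs + M - 1) // M
        m += 1
    # Every pass but the last reads and writes R (2B); the last one only reads it (B).
    return m, (2 * m - 1) * B
```

With M = 100, a relation of B(R) = 10,000 = M^2 blocks needs two passes (3B(R) = 30,000 I/Os), while a single extra block already forces a third pass (5B(R) = 50,005 I/Os).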
Optimizing multiway merge sort: when all tuples in a run's buffer page have been merged, the DBMS must read the next page of that run, and the merge is suspended until that I/O completes.
Double Buffering
Allocate several buffer pages to each input run and arrange them in a ring (circular buffer)
- When all tuples in the current buffer page have been merged, the DBMS immediately starts merging the tuples in the next buffer page
- Meanwhile, the DBMS reads the run's next page from the file into a free buffer page
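A rough Python analogy of double buffering, not how a DBMS actually manages its buffer pool: a bounded `queue.Queue` stands in for the ring of buffer pages assigned to one run, and a background thread plays the role of asynchronous I/O that fills free pages while the merge consumes filled ones. `prefetching_reader` and `num_buffers` are hypothetical names.

```python
import queue
import threading
from typing import Iterator, List


def prefetching_reader(pages: List[list], num_buffers: int = 2) -> Iterator[list]:
    """Yield one run's pages while a background thread reads ahead into free buffers."""
    ring = queue.Queue(maxsize=num_buffers)  # bounded queue ~ ring of buffer pages
    done = object()                          # end-of-run marker

    def reader() -> None:
        for page in pages:   # simulated disk reads of the run's pages, in order
            ring.put(page)   # blocks while every buffer page is full
        ring.put(done)

    threading.Thread(target=reader, daemon=True).start()
    while True:
        page = ring.get()    # the merge consumes the next filled buffer page
        if page is done:
            return
        yield page
```

The reader thread is a daemon, so if the merge stops consuming early, the blocked thread does not keep the program alive.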
Selection operation
B(R): the number of blocks of R on disk, i.e., the number of pages it occupies in the buffer pool
B(R)/V(R,K): the average number of blocks per distinct value of attribute K
Scanning-based Selection
for each block P of R do
    read P into the buffer pool
    for each tuple t in P do
        if t satisfies the selection condition
            then write t to the output buffer
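The scanning pseudocode as a Python sketch, assuming R is clustered so that each block costs one read; the function also returns the number of block reads, which equals B(R). `scan_select` is a hypothetical name.

```python
from typing import Callable, List, Tuple


def scan_select(blocks: List[List[tuple]],
                predicate: Callable[[tuple], bool]) -> Tuple[List[tuple], int]:
    """Scanning-based selection over a clustered R, using a single buffer page."""
    output, io = [], 0
    for page in blocks:            # read block P into the (single) buffer page
        io += 1
        for t in page:
            if predicate(t):       # t satisfies the selection condition
                output.append(t)   # write t to the output buffer
    return output, io
```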
Analysis of Algorithms
I/O cost: B(R) (if R is stored clustered)
- The tuples of R are stored contiguously in a file
- Each block of R is read only once
I/O cost: T(R) (if R is not stored clustered)
- The tuples of R are not stored contiguously in the file
- In the worst case, every tuple of R is on a different page
Available memory pages required: M ≥ 1
- At least one page is needed as a buffer for reading each block of R
Hash-based Selection
Algorithm
- Use hash(v) to determine the bucket that contains the result tuples
- Search that bucket's pages for tuples whose key equals v, and write them to the output buffer
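A sketch of hash-based selection for K = v, assuming R is already stored as a hash file on K (built here by a hypothetical helper `build_hash_file`) and using Python's built-in `hash` as the hash function; the returned I/O count is the number of pages in the single bucket that is read.

```python
from typing import List, Tuple

Page = List[tuple]


def build_hash_file(rows: List[tuple], key_index: int,
                    n_buckets: int, page_size: int) -> List[List[Page]]:
    """Hypothetical helper: store R as n_buckets buckets of pages, hashed on K."""
    buckets: List[List[Page]] = [[] for _ in range(n_buckets)]
    for t in rows:
        pages = buckets[hash(t[key_index]) % n_buckets]
        if not pages or len(pages[-1]) == page_size:
            pages.append([])             # start a new page in this bucket
        pages[-1].append(t)
    return buckets


def hash_select(buckets: List[List[Page]], key_index: int,
                v) -> Tuple[List[tuple], int]:
    """Selection K = v: locate the bucket by hash(v) and read only its pages."""
    pages = buckets[hash(v) % len(buckets)]
    output = [t for page in pages for t in page if t[key_index] == v]
    return output, len(pages)            # I/O = pages in that one bucket
```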
Analysis of Algorithms
- I/O cost ≈ B(R)/V(R,K)
  - Attribute K has V(R,K) distinct values
  - Each bucket holds on average B(R)/V(R,K) pages (a rough estimate)
- Available memory pages required: M ≥ 1
  - At least one page is needed as a buffer for reading the bucket
Index-based Selection
Prerequisites for the algorithm
- The selection condition has the form K = v or l ≤ K ≤ u
- Relation R has an index on attribute K
Algorithm: use the index to find the tuples that satisfy the selection condition, and write them to the output buffer
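A sketch of index-based range selection l ≤ K ≤ u under a clustered index, assuming R is kept sorted on K and using a sorted key list searched with Python's `bisect` module as a stand-in for the B+ tree. It also counts the pages touched, to illustrate that with clustering only the pages holding the results are read. `index_range_select` and `page_size` are hypothetical names.

```python
import bisect
from typing import List, Tuple


def index_range_select(rows: List[tuple], key_index: int, lo, hi,
                       page_size: int) -> Tuple[List[tuple], int]:
    """Range selection lo <= K <= hi via a clustered index on K.

    rows must already be stored sorted on K (that is what "clustered" means
    here); a sorted key list searched with bisect stands in for the B+ tree.
    """
    keys = [r[key_index] for r in rows]   # stand-in "index" (a real one is built once)
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    result = rows[start:end]
    # Matching tuples are contiguous, so only the pages spanning [start, end) are read.
    pages_read = len({i // page_size for i in range(start, end)})
    return result, pages_read
```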
Analysis of Algorithms
- I/O cost ≈ B(R)/V(R,K) (if the index is clustered)
  - The result tuples are stored contiguously in the file
  - Attribute K has V(R,K) distinct values
  - The result tuples occupy about B(R)/V(R,K) pages (a rough estimate)
- I/O cost ≈ T(R)/V(R,K) (if the index is non-clustered)
  - There are about T(R)/V(R,K) result tuples (a rough estimate)
  - The result tuples are not necessarily stored contiguously in the file
  - In the worst case, every result tuple is on a different page
- Available memory pages required: M ≥ 1
  - At least one page is needed as a buffer for reading B+ tree nodes
Deduplication operation
Projection without deduplication
Analysis of Algorithms
- I/O cost: B(R) (if R is stored clustered)
  - The tuples of R are stored contiguously in a file
  - Each block of R is read only once
- I/O cost: T(R) (if R is not stored clustered)
  - The tuples of R are not stored contiguously in the file
  - In the worst case, every tuple of R is on a different page
- Available memory pages required: M ≥ 1
  - At least one page is needed as a buffer for reading each block of R
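Projection without deduplication needs only a single scan. A minimal sketch with hypothetical names, where `attrs` lists the positions of the projected attributes:

```python
from typing import Iterator, List, Sequence


def project(blocks: List[List[tuple]], attrs: Sequence[int]) -> Iterator[tuple]:
    """Projection without deduplication: one scan, one buffer page, duplicates kept."""
    for page in blocks:                  # each block of R is read once
        for t in page:
            yield tuple(t[i] for i in attrs)
```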
One-Pass Duplicate Elimination
Algorithm
for each block P of R do
    read P into the buffer pool
    for each tuple t in P do
        if t has not been seen before then
            write t to the output buffer
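A sketch of one-pass deduplication, assuming each page holds `page_size` tuples so that M-1 pages can remember (M-1) × page_size distinct tuples; a Python `set` stands in for the in-memory lookup structure, and exceeding that capacity raises an error rather than spilling to disk. The names are hypothetical.

```python
from typing import List


def one_pass_distinct(blocks: List[List[tuple]], M: int,
                      page_size: int) -> List[tuple]:
    """One-pass deduplication: 1 page reads R, M-1 pages hold the distinct tuples seen."""
    capacity = (M - 1) * page_size   # assumed: each page holds page_size tuples
    seen = set()                     # in-memory lookup structure for delta(R)
    output = []
    for page in blocks:              # read block P into the buffer pool
        for t in page:
            if t not in seen:        # t has not been seen before
                if len(seen) == capacity:
                    raise MemoryError("B(delta(R)) exceeds M-1 pages")
                seen.add(t)
                output.append(t)     # write t to the output buffer
    return output
```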
Analysis of Algorithms
The data access pattern is the same as that of scanning-based selection
- I/O cost: B(R) (if R is stored clustered)
  - The tuples of R are stored contiguously in a file
  - Each block of R is read only once
- I/O cost: T(R) (if R is not stored clustered)
  - The tuples of R are not stored contiguously in the file
  - In the worst case, every tuple of R is on a different page
- Available memory pages required: B(δ(R)) ≤ M-1
  - The distinct tuples of R, δ(R), must all be kept in the M-1 available pages (one page is used to read R)
Sort-based Duplicate Elimination
Sort-based deduplication is essentially the same as the multiway merge sort algorithm, with two differences:
- When creating runs, tuples are sorted on the entire tuple
- In the merge phase, only one copy of each identical tuple is output; all other copies are discarded
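A sketch of sort-based duplicate elimination, assuming runs of at most M blocks held in memory as lists. The standard library's `heapq.merge` is swapped in for the explicit minimum-finding loop of the merge phase; because the runs are sorted on the whole tuple, duplicates come out adjacent and only the first copy is kept.

```python
import heapq
from typing import List


def sort_based_distinct(blocks: List[List[tuple]], M: int) -> List[tuple]:
    """Sort-based duplicate elimination over R given as a list of blocks."""
    # Phase 1: runs of at most M blocks, each sorted on the ENTIRE tuple.
    runs = [sorted(t for block in blocks[i:i + M] for t in block)
            for i in range(0, len(blocks), M)]
    # Phase 2: merge the runs; identical tuples arrive adjacent, so keep only the first.
    output: List[tuple] = []
    for t in heapq.merge(*runs):
        if not output or output[-1] != t:
            output.append(t)
    return output
```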
Analysis of Algorithms
- I/O cost: 3B(R)
  - When creating runs, each block of R is read once: B(R) I/Os in total
  - Each run is written to a file: B(R) I/Os in total
  - In the merge phase, each run is scanned once: B(R) I/Os in total
- Available memory pages required: B(R) ≤ M^2
  - Each run has at most M pages
  - At most M runs are merged
Hash-based Duplicate Elimination
Why bucketing works for deduplication: identical tuples always hash to the same bucket, so duplicates can only occur within a bucket; hashing has already separated tuples that land in different buckets.
Deduplicate each bucket Ri separately; putting the per-bucket results together gives the deduplicated result of R.
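A sketch of hash-based duplicate elimination, assuming M-1 buckets (one page reads R, M-1 pages buffer the buckets), M ≥ 2, and Python's built-in `hash` as the hash function; buckets are in-memory lists rather than files, and `hash_based_distinct` is a hypothetical name.

```python
from typing import List


def hash_based_distinct(blocks: List[List[tuple]], M: int) -> List[tuple]:
    """Hash-based duplicate elimination: partition into M-1 buckets, dedupe each."""
    n_buckets = M - 1   # one page reads R, M-1 pages buffer the buckets
    buckets: List[List[tuple]] = [[] for _ in range(n_buckets)]
    # Phase 1: hash every tuple of R into a bucket (each bucket "written to a file").
    for page in blocks:
        for t in page:
            buckets[hash(t) % n_buckets].append(t)
    # Phase 2: one-pass deduplication on each bucket Ri; identical tuples share a bucket.
    output = []
    for bucket in buckets:
        seen = set()
        for t in bucket:
            if t not in seen:
                seen.add(t)
                output.append(t)
    return output
```

The output order depends on the hash values, so comparisons should treat the result as a set.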
Analysis of Algorithms
- I/O cost: 3B(R)
  - When hashing into buckets, each block of R is read once: B(R) I/Os in total
  - Each bucket is written to a file: B(R) I/Os in total
  - Running the one-pass deduplication algorithm on each bucket Ri costs B(Ri) I/Os: B(R) I/Os in total
- Available memory pages required: B(R) ≤ (M-1)^2
  - There are M-1 buckets
  - Each bucket has at most M-1 blocks, so the one-pass deduplication run on each bucket meets its memory requirement
Aggregation operation
Aggregation is essentially the same operation as deduplication, and can be implemented in the same three ways:
- Method 1: one-pass aggregation (One-Pass Aggregation)
- Method 2: sort-based aggregation (Sort-based Aggregation)
- Method 3: hash-based aggregation (Hash-based Aggregation)
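Since aggregation mirrors deduplication, one-pass aggregation can be sketched like one-pass deduplication: the in-memory structure is keyed by the grouping attribute and stores running aggregates instead of only remembering that a tuple was seen. The COUNT/SUM query and the parameter names are hypothetical, and, as with one-pass deduplication, all groups are assumed to fit in memory.

```python
from typing import Dict, List, Tuple


def one_pass_aggregate(blocks: List[List[tuple]], group_index: int,
                       value_index: int) -> Dict[object, Tuple[int, int]]:
    """Roughly: SELECT g, COUNT(*), SUM(v) FROM R GROUP BY g, in a single scan."""
    groups: Dict[object, List[int]] = {}
    for page in blocks:
        for t in page:
            # Like "seen before?" in one-pass dedup, but each group keeps running aggregates.
            acc = groups.setdefault(t[group_index], [0, 0])
            acc[0] += 1                  # COUNT(*)
            acc[1] += t[value_index]     # SUM(v)
    return {g: (count, total) for g, (count, total) in groups.items()}
```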
Set difference operation
One-Pass Set Difference
Algorithm
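A sketch of one-pass set difference R - S with the build/probe structure used in the analysis that follows, assuming set semantics (R and S contain no duplicate tuples) and a Python `set` as the in-memory lookup structure built on S; `one_pass_difference` is a hypothetical name.

```python
from typing import List


def one_pass_difference(r_blocks: List[List[tuple]],
                        s_blocks: List[List[tuple]]) -> List[tuple]:
    """R - S under set semantics: build on S, probe with R."""
    # Build phase: read each block of S once into an in-memory lookup structure.
    s_table = set()
    for page in s_blocks:
        s_table.update(page)
    # Probe phase: read each block of R once; keep tuples that do not appear in S.
    output = []
    for page in r_blocks:
        for t in page:
            if t not in s_table:
                output.append(t)
    return output
```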
Analysis of Algorithms
- I/O cost: B(R) + B(S)
  - In the build phase, each block of S is read once: B(S) I/Os in total
  - In the probe phase, each block of R is read once: B(R) I/Os in total
- Available memory pages required: B(S) ≤ M-1
  - The in-memory lookup structure occupies B(S) pages
Summary
Play is play and jokes are jokes, but never joke about studying.
This installment introduced five "magic" query operations: sorting, selection, deduplication, aggregation, and set difference. Focus on the principles and grasp the essence of each major operation. For example, deduplication and aggregation are essentially the same, so once you understand deduplication you can work out by analogy how aggregation is performed. Each operation has several algorithms, and their cost analyses make clear each algorithm's strengths, weaknesses, and the scenarios it suits.