On the InnoDB index structure

0, Introduction

What are the characteristics of indexes on InnoDB tables, and how are they organized?


1, InnoDB clustered index features

An InnoDB table is an index-organized table, so every InnoDB table has a clustered index.


Rows (row data) are stored in the leaf nodes of the clustered index (except when column overflow occurs; see an earlier article, referred to below as "the earlier article"), and they are stored in the logical order of the clustered index. We say logical order rather than physical order because, inside a leaf data page, the physical order of the rows may differ from their logical order. This is discussed later.


InnoDB chooses the clustered index in the following order:


If a primary key (PRIMARY KEY) is explicitly defined, it is used as the clustered index

Otherwise, the first unique index whose columns are all NOT NULL is used

If neither exists, InnoDB uses the built-in DB_ROW_ID as the clustered index, named GEN_CLUST_INDEX
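The selection order above can be sketched as a small function. This is only an illustration of the three rules, not InnoDB's actual code; the `IndexDef` class and its field names are hypothetical, and the indexes are assumed to be listed in definition order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexDef:
    """Hypothetical, simplified description of one index on a table."""
    name: str
    is_primary: bool = False
    is_unique: bool = False
    all_columns_not_null: bool = False


def choose_clustered_index(indexes: List[IndexDef]) -> str:
    """Return the name of the index that would be used as the clustered index."""
    # Rule 1: an explicitly defined PRIMARY KEY always wins.
    for idx in indexes:
        if idx.is_primary:
            return idx.name
    # Rule 2: otherwise the first unique index whose columns are all NOT NULL.
    for idx in indexes:
        if idx.is_unique and idx.all_columns_not_null:
            return idx.name
    # Rule 3: otherwise the built-in DB_ROW_ID index.
    return "GEN_CLUST_INDEX"


if __name__ == "__main__":
    # A table with only a non-unique secondary index falls back to DB_ROW_ID.
    print(choose_clustered_index([IndexDef("c1")]))
```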

Special note: DB_ROW_ID is 6 bytes long, increases by one on each allocation, and is allocated globally across the whole instance. In other words, if several tables in the current instance use the built-in DB_ROW_ID as their clustered index, then as new rows are inserted into those tables, the DB_ROW_ID values within each table are not consecutive but jump. Like this:


ROW_ID values of table t1: 1, 3, 7, 10

ROW_ID values of table t2: 2, 4, 5, 6, 8, 9
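The jumping values can be reproduced with a toy model in which every table draws from one shared counter. The `RowIdAllocator` class and the insert order are assumptions chosen to match the example above; this is not InnoDB's implementation.

```python
import itertools


class RowIdAllocator:
    """Toy model of a single instance-wide DB_ROW_ID counter."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)

    def next_row_id(self) -> int:
        return next(self._counter)


allocator = RowIdAllocator()
tables = {"t1": [], "t2": []}

# Assumed order in which rows arrive at the two tables.
insert_order = ["t1", "t2", "t1", "t2", "t2", "t2", "t1", "t2", "t2", "t1"]
for table_name in insert_order:
    tables[table_name].append(allocator.next_row_id())

if __name__ == "__main__":
    for table_name, row_ids in tables.items():
        print(table_name, row_ids)
```

Each table sees gaps because the ids it did not receive went to the other table.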

2, InnoDB index structure

InnoDB indexes use a B+ tree structure by default (spatial indexes use an R-tree), and the data is stored in the leaf nodes of the index.


The basic unit of InnoDB storage I/O is the data page, which is 16KB by default. As the earlier article noted, each page by default reserves 1/16 of its space for later updates to variable-length data. Sequential inserts are therefore optimal: they produce the least fragmentation and fill roughly 15/16 of each page. With random writes, page space utilization is roughly between 1/2 and 15/16.
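In bytes, the fractions above work out as follows. This is a rough calculation that assumes the default 16KB page and ignores the page header, trailer and page directory, so real usable space is somewhat smaller.

```python
PAGE_SIZE = 16 * 1024  # default InnoDB page size in bytes

# 1/16 of the page is kept free for later growth of variable-length data.
reserved_bytes = PAGE_SIZE // 16

# Sequential inserts fill roughly 15/16 of the page.
sequential_fill_bytes = PAGE_SIZE - reserved_bytes

# Random writes leave pages somewhere between half full and 15/16 full.
random_fill_low_bytes = PAGE_SIZE // 2

if __name__ == "__main__":
    print("reserved:", reserved_bytes)
    print("sequential fill:", sequential_fill_bytes)
    print("random fill range:", random_fill_low_bytes, "to", sequential_fill_bytes)
```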


When row_format = DYNAMIC | COMPRESSED, the maximum index key length is 3072 bytes; when row_format = REDUNDANT | COMPACT, it is 767 bytes. When the page size is not the default 16KB, the maximum index length changes accordingly.
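Because these limits are in bytes, the number of characters that can be indexed depends on the character set. A minimal sketch, assuming the default 16KB page and the maximum bytes per character of each character set (the function name is hypothetical):

```python
# Byte limits quoted above for a 16KB page.
MAX_KEY_BYTES = {
    "DYNAMIC": 3072,
    "COMPRESSED": 3072,
    "REDUNDANT": 767,
    "COMPACT": 767,
}

# Maximum bytes one character can occupy in each character set.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_indexable_chars(row_format: str, charset: str) -> int:
    """Largest character count whose worst-case byte size fits in the key limit."""
    return MAX_KEY_BYTES[row_format] // MAX_BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    for row_format in ("COMPACT", "DYNAMIC"):
        print(row_format, "utf8mb4:", max_indexable_chars(row_format, "utf8mb4"))
```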


Next, let's verify these basic structural features of InnoDB indexes.


First, create the following test table:


[[email protected]] [innodb]> CREATE TABLE `t1` (

  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,

  `c1` int(10) unsigned NOT NULL DEFAULT '0',

  `c2` varchar(100) NOT NULL,

  `c3` varchar(100) NOT NULL,

  PRIMARY KEY (`id`),

  KEY `c1` (`c1`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Write 10 rows of test data using the following statements:

set @uuid1=uuid(); set @uuid2=uuid();

insert into t1 select 0, round(rand()*1024),

                @uuid1, concat(@uuid1, @uuid2);

Look at the overall structure of table t1:

# View with the innodb_ruby tool

[[email protected]]# innodb_space -s ibdata1 -T innodb/t1 space-indexes

id    name       root   fseg        fseg_id   used    allocated   fill_factor

238   PRIMARY    3      internal    1         1       1           100.00%

238   PRIMARY    3      leaf        2         0       0           0.00%

239   c1         4      internal    3         1       1           100.00%

239   c1         4      leaf        4         0       0           0.0

 

# View with the innblock tool

[[email protected]]# innblock innodb/t1.ibd scan 16

...

===INDEX_ID:238

level0 total block is (1)

block_no:     3,level:   0|*|

===INDEX_ID:239

level0 total block is (1)

block_no:     4,level:   0|*|

It can be seen that:

Index ID   Index type                            Root page no   Index levels
238        Primary key index (clustered index)   3              1
239        Secondary index                       4              1

3, Verifying InnoDB index features

3.1 Feature 1: clustered index leaf nodes store the entire row

Scan page 3 and extract the first physical record:

[[email protected]]# innodb_space -s ibdata1 -T innodb/t1 -p 3 page-dump

...

records:

{:format=>:compact,

 :offset=>127,

 :header=>

  {:next=>263,

   :type=>:conventional,

   :heap_number=>2,

   :n_owned=>0,

   :min_rec=>false,

   :deleted=>false,

   :nulls=>[],

   :lengths=>{"c2"=>36, "c3"=>72},

   :external=>[],

   :length=>7},

 :next=>263,

 :type=>:clustered,

 # First physical record, id = 1

 :key=>[{:name=>"id", :type=>"INT UNSIGNED", :value=>1}],

 :row=>

  [{:name=>"c1", :type=>"INT UNSIGNED", :value=>777},

   {:name=>"c2",

    :type=>"VARCHAR(400)",

    :value=>"a1c1a7c7-bda5-11e9-8476-0050568bba82"},

   {:name=>"c3",

    :type=>"VARCHAR(400)",

    :value=>

     "a1c1a7c7-bda5-11e9-8476-0050568bba82a1c1aec5-bda5-11e9-8476-0050568bba82"}],

 :sys=>

  [{:name=>"DB_TRX_ID", :type=>"TRX_ID", :value=>10950},

   {:name=>"DB_ROLL_PTR",

    :type=>"ROLL_PTR",

    :value=>

     {:is_insert=>true,

      :rseg_id=>119,

      :undo_log=>{:page=>469, :offset=>272}}}],

 :length=>129,

 :transaction_id=>10950,

 :roll_pointer=>

  {:is_insert=>true, :rseg_id=>119, :undo_log=>{:page=>469, :offset=>272}}}

Clearly, the leaf node does store the entire row.


The key of the clustered index tree is the primary key column (id), and the value of a clustered index leaf record consists of the other, non-key columns (c1, c2, c3) plus the hidden columns (DB_TRX_ID, DB_ROLL_PTR).
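The lengths reported in the dump above can be cross-checked by adding up the fields. The sketch below assumes the usual COMPACT-format sizes (4 bytes per INT, 6 bytes for DB_TRX_ID, 7 bytes for DB_ROLL_PTR, a 5-byte fixed record header and one length byte per short variable-length column); the variable names are only for illustration.

```python
# Field sizes in bytes for the first record of t1, taken from the dump
# (:lengths=>{"c2"=>36, "c3"=>72}) plus the fixed-size columns.
field_bytes = {
    "id": 4,            # INT UNSIGNED primary key
    "DB_TRX_ID": 6,     # hidden transaction id column
    "DB_ROLL_PTR": 7,   # hidden rollback pointer column
    "c1": 4,            # INT UNSIGNED
    "c2": 36,           # one UUID string
    "c3": 72,           # two concatenated UUID strings
}
record_length = sum(field_bytes.values())

# Header: 5 fixed bytes plus one length byte each for c2 and c3.
# There is no NULL bitmap because every column is NOT NULL.
variable_length_columns = 2
header_length = 5 + variable_length_columns

if __name__ == "__main__":
    print("record length:", record_length)  # the dump reports :length=>129
    print("header length:", header_length)  # the dump reports header :length=>7
```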


Optimization Tip 1: Try not to store large objects in the row, so that each leaf node can hold more rows, fragmentation is reduced, and buffer pool utilization improves. Also try to avoid column overflow.


3.2 Feature 2: non-leaf nodes of the clustered index store pointers to their child nodes

Continue writing new rows to the test table above until the clustered index tree splits from one level into two.


In an earlier article on when the height of an InnoDB clustered index changes, we calculated that a leaf node of this table can hold at most 111 records, so inserting the 112th record should split the tree from a height of one level to two. Measurement confirms that this is indeed the case.


[[email protected]] [innodb]>select count(*) from t1;

+----------+

| count(*) |

+----------+

|      112 |

+----------+

 

[[email protected]]# innblock innodb/t1.ibd scan 16

...

===INDEX_ID:238

level1 total block is (1)

block_no:     3,level:   1|*|

level0 total block is (2)

block_no:     5,level:   0|*|block_no:     6,level:   0|*|

...

At this point the root node is still page 3, and the leaf nodes are now the two pages 5 and 6. The root node should therefore contain two physical records, each storing a pointer to one of those two pages.


Let's parse page 3 and look at its actual structure:


[[email protected]]# innodb_space -s ibdata1 -T innodb/t1 -p 3 page-dump

...

records:

{:format=>:compact,

 :offset=>125,

 :header=>

  {:next=>138,

   :type=>:node_pointer,

   :heap_number=>2,

   :n_owned=>0,

   :min_rec=>true, # the first record carries the min_rec flag (minimum key)

   :deleted=>false,

   :nulls=>[],

   :lengths=>{},

   :external=>[],

   :length=>5},

 :next=>138,

 :type=>:clustered,

 # First record; only the key value is stored

 :key=>[{:name=>"id", :type=>"INT UNSIGNED", :value=>1}],

 :row=>[],

 :sys=>[],

 :child_page_number=>5, # the value is a pointer to the leaf page, pageno = 5

 :length=>8} # the whole record takes 8 bytes: 4 bytes for the key value plus 4 bytes for the pointer

 

{:format=>:compact,

 :offset=>138,

 :header=>

  {:next=>112,

   :type=>:node_pointer,

   :heap_number=>3,

   :n_owned=>0,

   :min_rec=>false,

   :deleted=>false,

   :nulls=>[],

   :lengths=>{},

   :external=>[],

   :length=>5},

 :next=>112,

 :type=>:clustered,

 # Second record; only the key value is stored

 :key=>[{:name=>"id", :type=>"INT UNSIGNED", :value=>56}],

 :row=>[],

 :sys=>[],

 :child_page_number=>6, # the value is a pointer to the leaf page, pageno = 6

 :length=>8}

Optimization Tip 2: Keep indexed columns as short as possible. The shorter the key, the more records each non-leaf node can hold, which makes the index tree more storage-efficient, slows the growth in tree height, and improves average lookup efficiency.
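A rough estimate shows how key length affects the number of node pointers a non-leaf page can hold. The figures are assumptions for illustration only: about 15/16 of a 16KB page is taken as usable, each node pointer record costs a 5-byte header plus the key plus a 4-byte child page number (13 bytes for a 4-byte key, matching the distance between offsets 125 and 138 in the dump above), and page headers and the page directory are ignored.

```python
PAGE_SIZE = 16 * 1024
USABLE_BYTES = PAGE_SIZE * 15 // 16  # assumption: roughly 15/16 of the page
RECORD_HEADER_BYTES = 5              # fixed COMPACT record header
CHILD_POINTER_BYTES = 4              # child page number


def node_pointers_per_page(key_bytes: int) -> int:
    """Estimate how many node pointer records fit in one non-leaf page."""
    record_bytes = RECORD_HEADER_BYTES + key_bytes + CHILD_POINTER_BYTES
    return USABLE_BYTES // record_bytes


if __name__ == "__main__":
    for key_bytes in (4, 8):  # e.g. INT versus BIGINT primary key
        print(key_bytes, "byte key:", node_pointers_per_page(key_bytes))
```

Under these assumptions a 4-byte key gives noticeably more pointers per page than an 8-byte key, so the tree can address more leaf pages before it needs another level.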


3.3 Feature 3: secondary indexes always also store the primary key column values

A secondary index always also stores the primary key (clustered index) column values. When a secondary index is scanned, the clustered index key can therefore be read directly from the leaf node, and that key is then used to fetch the row from the clustered index when the full row is needed (a lookup back to the table). This feature is also known as Index Extensions (an optimizer feature added in version 5.6; see "Use of Index Extensions").


Furthermore, in the non-leaf nodes of a secondary index, the key of an index record is the values of the columns defined in the index, and the value is the clustered index column values (PKV for short). If the secondary index definition already includes some of the clustered index columns, the value holds only the remaining clustered index columns that the definition does not include.


Create a test table as follows:


CREATE TABLE `t3` (

  `a` int(10) unsigned NOT NULL AUTO_INCREMENT,

  `b` int(10) unsigned NOT NULL DEFAULT '0',

  `c` varchar(20) NOT NULL DEFAULT '',

  `d` varchar(20) NOT NULL DEFAULT '',

  `e` varchar(20) NOT NULL DEFAULT '',

  PRIMARY KEY (`a`,`b`),

  KEY `k1` (`c`,`b`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Insert some random test data:

# Run a shell script to write 500 rows

[[email protected]]# cat insert.sh

#!/bin/bash

. ~/.bash_profile

cd /data/perconad

i=1

max=500

while [ $i -le $max ]

do

 mysql -Smysql.sock -e "insert ignore into t3 select

    rand()*1024, rand()*1024, left(md5(uuid()),20) ,

    left(uuid(),20), left(uuid(),20);" innodb

 i=`expr $i + 1`

done

 

# 498 rows were actually written (two inserts failed because of primary key conflicts)

[[email protected]] [innodb]>select count(*) from t3;

+----------+

| count(*) |

+----------+

|      498 |

+----------+

Parse the data structure:

# Primary key

[root@test1 perconad]# innodb_space -s ibdata1 -T innodb/t2 space-indexes

id    name     root  fseg        fseg_id   used   allocated   fill_factor

245   PRIMARY  3     internal    1         1      1           100.00%

245   PRIMARY  3     leaf        2         5      5           100.00%

246   k1       4     internal    3         1      1           100.00%

246   k1       4     leaf        4         2      2           1

 

[[email protected]]# innodb_space -s ibdata1 -T innodb/t2 -p 4 page-dump

...

records:

{:format=>:compact,

 :offset=>126,

 :header=>

  {:next=>164,

   :type=>:node_pointer,

   :heap_number=>2,

   :n_owned=>0,

   :min_rec=>true,

   :deleted=>false,

   :nulls=>[],

   :lengths=>{"c"=>20},

   :external=>[],

   :length=>6},

 :next=>164,

 :type=>:secondary,

 :key=>

  [{:name=>"c", :type=>"VARCHAR(80)", :value=>"00a5d42dd56632893b5f"},

   {:name=>"b", :type=>"INT UNSIGNED", :value=>323}],

 :row=>

  [{:name=>"a", :type=>"INT UNSIGNED", :value=>310},

   {:name=>"b", :type=>"INT UNSIGNED", :value=>9}],

   # The value parsed here as column b is actually the pointer to the child page, i.e. child_page_number = 9

   # The real value of column b is 323

 :sys=>[],

 :child_page_number=>335544345,

 # This is parsed incorrectly; these bytes are actually the record header of the next record, 6 bytes in total

 :length=>36}

 

{:format=>:compact,

 :offset=>164,

 :header=>

  {:next=>112,

   :type=>:node_pointer,

   :heap_number=>3,

   :n_owned=>0,

   :min_rec=>false,

   :deleted=>false,

   :nulls=>[],

   :lengths=>{"c"=>20},

   :external=>[],

   :length=>6},

 :next=>112,

 :type=>:secondary,

 :key=>

  [{:name=>"c", :type=>"VARCHAR(80)", :value=>"7458824a39892aa77e1a"},

   {:name=>"b", :type=>"INT UNSIGNED", :value=>887}],

 :row=>

  [{:name=>"a", :type=>"INT UNSIGNED", :value=>623},

   {:name=>"b", :type=>"INT UNSIGNED", :value=>10}],

   # Same as above: this is actually child_page_number = 10, not the value of column b

 :sys=>[],

 :child_page_number=>0,

 :length=>36} # 16 bytes data length

Incidentally, secondary indexes do not store DB_TRX_ID and DB_ROLL_PTR (these are stored only in the clustered index).


The innodb_ruby tool does not parse the non-leaf nodes of a secondary index accurately enough, so we open the data file in binary mode to double-check:


# hexdump can also be used for this

[[email protected]]# vim -b path/t3.ibd

...

:%!xxd

 

# Find the part of the file where the secondary index data resides

0010050: 0002 0272 0000 00e1 0000 0002 01b2 0100  ...r............

0010060: 0200 1b69 6e66 696d 756d 0003 000b 0000  ...infimum......

0010070: 7375 7072 656d 756d 1410 0011 0026 3030  supremum.....&00

0010080: 6135 6434 3264 6435 3636 3332 3839 3362  a5d42dd56632893b

0010090: 3566 0000 0143 0000 0136 0000 0009 1400  5f...C...6......

00100a0: 0019 ffcc 3734 3538 3832 3461 3339 3839  ....7458824a3989

00100b0: 3261 6137 3765 3161 0000 0377 0000 026f  2aa77e1a...w...o

00100c0: 0000 000a 0000 0000 0000 0000 0000 0000  ................

 

# Analyzing these bytes against the physical page structure gives the following results

/* First record */

1410 0011 0026, record header, 6 bytes (1 length byte for c, 0x14 = 20, plus the 5-byte fixed header)

3030 6135 6434 3264 6435 3636 3332 3839 3362 3566,c='00a5d42dd56632893b5f',20B

0000 0143, b = 323, 4B

0000 0136, a = 310, 4B

0000 0009, child_pageno=9, 4B

 

/* Second record */

1400 0019 ffcc, record header

3734 3538 3832 3461 3339 3839 3261 6137 3765 3161, c='7458824a39892aa77e1a'

0000 0377, b=887

0000 026f, a=623

0000 000a, child_pageno=10
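The manual decoding above can be repeated with a few lines of code. This sketch assumes the layout just derived for non-leaf records of index k1: 20 bytes of column c, then b, a and the child page number as 4-byte big-endian unsigned integers. The function name is hypothetical and the record header is not parsed.

```python
import struct


def parse_k1_node_pointer(hex_dump: str, c_length: int = 20) -> dict:
    """Decode the body of one non-leaf record of k1 (header bytes excluded)."""
    raw = bytes.fromhex(hex_dump.replace(" ", ""))
    c_value = raw[:c_length].decode("ascii")
    # b, a and the child page number are big-endian 4-byte unsigned integers.
    b_value, a_value, child_page = struct.unpack(">III", raw[c_length:c_length + 12])
    return {"c": c_value, "b": b_value, "a": a_value, "child_page_number": child_page}


first_record = parse_k1_node_pointer(
    "3030 6135 6434 3264 6435 3636 3332 3839 3362 3566"
    " 0000 0143 0000 0136 0000 0009"
)
second_record = parse_k1_node_pointer(
    "3734 3538 3832 3461 3339 3839 3261 6137 3765 3161"
    " 0000 0377 0000 026f 0000 000a"
)

if __name__ == "__main__":
    print(first_record)
    print(second_record)
```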

Working backwards from this, the page-dump output of innodb_ruby above should have looked like the following (only one record is shown here; compare it with the earlier output yourself):


{:format=>:compact,

 :offset=>164,

 :header=>

  {:next=>112,

   :type=>:node_pointer,

   :heap_number=>3,

   :n_owned=>0,

   :min_rec=>false,

   :deleted=>false,

   :nulls=>[],

   :lengths=>{"c"=>20},

   :external=>[],

   :length=>6},

 :next=>112,

 :type=>:secondary,

 :key=>

  [{:name=>"c", :type=>"VARCHAR(80)", :value=>"7458824a39892aa77e1a"},

   {:name=>"b", :type=>"INT UNSIGNED", :value=>887}],

 :row=> [{:name=>"a", :type=>"INT UNSIGNED", :value=>623}],

 :sys=>[],

 :child_page_number=>10,

 :length=>36}

As can be seen, and as stated earlier, the value stored in the non-leaf nodes of a secondary index is indeed the clustered index column value.


Optimization Tip 3: Keep the columns of a secondary index as short as possible. When defining a secondary index, there is no need to explicitly add the clustered index columns (from version 5.6 on).


3.4 Feature 4: when no column is usable as the clustered index, the built-in ROW_ID is used

Create several tables like the following, so that InnoDB chooses the built-in ROW_ID as the clustered index:


[[email protected]] [innodb]> CREATE TABLE `tn1` (

  `c1` int(10) unsigned NOT NULL DEFAULT 0,

  `c2` int(10) unsigned NOT NULL DEFAULT 0

) ENGINE=InnoDB;

Write data to the tables in turn:

insert into tn1 select 1,1;

insert into tn2 select 1,1;

insert into tn3 select 1,1;

insert into tn1 select 2,2;

insert into tn2 select 2,2;

insert into tn3 select 2,2;

View the data in tables tn1 to tn3 (innodb_ruby does not parse this accurately, so hexdump is used for the analysis instead):


tn1

000c060: 0200 1a69 6e66 696d 756d 0003 000b 0000  ...infimum......

000c070: 7375 7072 656d 756d 0000 1000 2000 0000  supremum.... ...

000c080: 0003 1200 0000 003d f6aa 0000 01d9 0110  .......=........

000c090: 0000 0001 0000 0001 0000 18ff d300 0000  ................

000c0a0: 0003 1500 0000 003d f9ad 0000 01da 0110  .......=........

000c0b0: 0000 0002 0000 0002 0000 0000 0000 0000  ................

 

tn2

000c060: 0200 1a69 6e66 696d 756d 0003 000b 0000  ...infimum......

000c070: 7375 7072 656d 756d 0000 1000 2000 0000  supremum.... ...

000c080: 0003 1300 0000 003d f7ab 0000 0122 0110  .......=....."..

000c090: 0000 0001 0000 0001 0000 18ff d300 0000  ................

000c0a0: 0003 1600 0000 003d feb0 0000 01db 0110  .......=........

000c0b0: 0000 0002 0000 0002 0000 0000 0000 0000  ................

 

tn3

000c060: 0200 1a69 6e66 696d 756d 0003 000b 0000  ...infimum......

000c070: 7375 7072 656d 756d 0000 1000 2000 0000  supremum.... ...

000c080: 0003 1400 0000 003d f8ac 0000 0123 0110  .......=.....#..

000c090: 0000 0001 0000 0001 0000 18ff d300 0000  ................

000c0a0: 0003 1700 0000 003e 03b3 0000 012a 0110  .......>.....*..

000c0b0: 0000 0002 0000 0002 0000 0000 0000 0000  ................

The values that represent DB_ROW_ID are:

tn1

0003 12 => (1,1)

0003 15 => (2,2)

 

tn2

0003 13 => (1,1)

0003 16 => (2,2)

 

tn3

0003 14 => (1,1)

0003 17 => (2,2)
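Decoding these values as 6-byte big-endian integers makes the interleaving easy to see. The sketch below pads each value shown above with the three leading zero bytes that precede it in the hexdump; the variable names are only for illustration.

```python
# The 6-byte DB_ROW_ID of each row, per table, as it appears in the hexdumps
# (three leading zero bytes followed by the three bytes listed above).
row_id_hex = {
    "tn1": ["000000000312", "000000000315"],
    "tn2": ["000000000313", "000000000316"],
    "tn3": ["000000000314", "000000000317"],
}

row_ids = {
    table: [int.from_bytes(bytes.fromhex(value), "big") for value in values]
    for table, values in row_id_hex.items()
}

# Merging all tables gives one unbroken sequence, which is what a single
# shared counter would produce.
all_ids = sorted(row_id for ids in row_ids.values() for row_id in ids)

if __name__ == "__main__":
    for table, ids in row_ids.items():
        print(table, ids)
    print("merged:", all_ids)
```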

Clearly, the built-in auto-incrementing DB_ROW_ID is indeed allocated from a single sequence shared by the whole instance, rather than each table having its own DB_ROW_ID sequence.


We can imagine that if many tables in an instance use this DB_ROW_ID, concurrent requests will inevitably contend for it and wait. In a master-slave replication environment it may also cause severe replication lag, because of the way the slave scans for rows when replaying the relay log. For details, see how the slave looks up rows and the slave_rows_search_algorithms parameter.


Optimization Tip 4: Explicitly define a usable clustered index / primary key yourself. Do not let InnoDB fall back on the built-in DB_ROW_ID as the clustered index, so as to avoid potential performance loss.


This article is already rather long, so the analysis stops here for now and will be continued later.


4, Summary

Finally, a few suggestions for tables that use the InnoDB engine.

Every table should have an explicit primary key, preferably an auto-increment integer with no business meaning

For both primary key and secondary indexes, choose the smallest column data types possible

When defining a secondary index, there is no need to explicitly add the primary key columns (for MySQL 5.6 and later)

Keep rows as short as possible; fixed-length columns are even better (rather than variable-length types such as VARCHAR)

The tests above were run on Percona Server 5.7.22:

# The MySQL version is Percona Server 5.7.22-22, compiled from source code that I downloaded

[[email protected]#] mysql -Smysql.sock innodb

...

Server version: 5.7.22-22-log Source distribution

...

[[email protected]]> \s

...

Server version:     5.7.22-22-log Source distribution

Enjoy MySQL :)



Origin blog.51cto.com/14510351/2439945