Mapping from relational model to Key-Value model

Reference: https://book.tidb.io/session1/chapter3/tidb-computing.html

If we simply understand the relational model as Tables and SQL statements, the problem becomes how to store a Table on a KV structure and how to run SQL statements on top of that KV structure.

For a Table, the data to be stored includes three parts:

  1. table meta information
  2. rows in the Table
  3. index data

operational requirements

For the Insert statement, the Row needs to be written into KV, and the corresponding index data needs to be created.

For the Update statement, the Row needs to be updated along with its index data (if necessary).

For the Delete statement, the index entries need to be deleted along with the Row.

For the Select statement, it must first be possible to read a single row of data simply and quickly, so each Row needs an ID (explicit or implicit). Secondly, multiple consecutive rows of data may be read, for example Select * from user;. Finally, there is the need to read data through an index, which may be a point lookup or a range query.

underlying storage

Assume there is a globally ordered distributed Key-Value engine. To quickly obtain a row of data, if we can construct one or several keys that locate this row, we can use the Seek method provided by the KV storage engine to quickly find it. Likewise, for scanning an entire table, if the table can be mapped to a Key Range, scanning from StartKey to EndKey retrieves all of the table's data. Operations on index data follow the same idea.
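The two operations the text relies on, Seek and range scan, can be sketched with a toy in-memory ordered KV store (a minimal, hypothetical stand-in for TiKV, using only the standard library):

```python
import bisect

class OrderedKV:
    """A toy globally ordered Key-Value store (illustration only, not TiKV)."""

    def __init__(self):
        self._keys = []   # keys kept in sorted order
        self._data = {}   # key -> value

    def put(self, key: bytes, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def seek(self, key: bytes):
        """Return the first (k, v) pair with k >= key, or None if past the end."""
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys):
            return None
        k = self._keys[i]
        return k, self._data[k]

    def scan(self, start_key: bytes, end_key: bytes):
        """Yield all pairs in the left-closed, right-open range [start_key, end_key)."""
        i = bisect.bisect_left(self._keys, start_key)
        while i < len(self._keys) and self._keys[i] < end_key:
            yield self._keys[i], self._data[self._keys[i]]
            i += 1
```

With such an engine, reading one row is a `seek` on its constructed key, and a full table scan is a `scan` over the table's Key Range.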

TiDB

table data

To map each column data in a row into a (Key, Value) key-value pair, you need to consider how to construct the Key.

  1. In order to ensure that the data in the same table are kept together for easy search, TiDB assigns a table ID to each table, denoted by TableID. The table ID is an integer, unique across the cluster.
  2. TiDB will assign a row ID to each row of data in the table, denoted by RowID. The row ID is also an integer, unique within the table. For the row ID, TiDB has made a small optimization. If a table has an integer primary key, TiDB will use the value of the primary key as the row ID of this row of data.

Each row of data is encoded into (Key, Value) key-value pairs according to the following rules:

Key:   tablePrefix{TableID}_recordPrefixSep{RowID}
Value: [col1, col2, col3, col4]

Where tablePrefix and recordPrefixSep are specific string constants used to distinguish this data from other data in the Key space.
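A minimal sketch of this row-key encoding. The literal prefixes `b"t"` and `b"_r"` stand in for tablePrefix and recordPrefixSep, and the fixed-width big-endian integer encoding is a simplification of TiDB's real memory-comparable encoding; both are assumptions for illustration:

```python
def encode_row_key(table_id: int, row_id: int) -> bytes:
    # Fixed-width big-endian integers compare byte-wise in the same order
    # as the integers themselves, so keys stay sorted by (TableID, RowID).
    return (b"t" + table_id.to_bytes(8, "big")
            + b"_r" + row_id.to_bytes(8, "big"))
```

The fixed-width big-endian choice is what preserves the "comparison relationship remains unchanged after encoding" property: all rows of one table share a prefix, and within a table keys sort by RowID.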

index data

TiDB assigns an index ID to each index in the table, denoted by IndexID.

For the primary key and unique indexes, we need to quickly locate the corresponding RowID from the key value. Therefore, they are encoded into (Key, Value) key-value pairs according to the following rules:

Key:   tablePrefix{tableID}_indexPrefixSep{indexID}_indexedColumnsValue
Value: RowID

For ordinary secondary indexes that do not need to satisfy unique constraints, one key value may correspond to multiple rows, and we need to query the corresponding RowID according to the key value range. Therefore, it is encoded into (Key, Value) key-value pairs according to the following rules:

Key:   tablePrefix{TableID}_indexPrefixSep{IndexID}_indexedColumnsValue_{RowID}
Value: null
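Both index encodings can be sketched as follows. As before, `b"t"` and `b"_i"` are placeholders for tablePrefix and indexPrefixSep, and `indexed_value` is assumed to already be in a memory-comparable byte encoding (a simplification of what TiDB actually does):

```python
def encode_index_key(table_id: int, index_id: int, indexed_value: bytes) -> bytes:
    """Key for a primary key / unique index entry; the Value stores the RowID."""
    return (b"t" + table_id.to_bytes(8, "big")
            + b"_i" + index_id.to_bytes(8, "big")
            + b"_" + indexed_value)

def encode_secondary_index_key(table_id: int, index_id: int,
                               indexed_value: bytes, row_id: int) -> bytes:
    """Key for a non-unique index entry.

    The RowID is appended to make the key unique even when many rows share
    the same indexed value; the Value is null.
    """
    return (encode_index_key(table_id, index_id, indexed_value)
            + b"_" + row_id.to_bytes(8, "big"))
```

Because a non-unique entry's key starts with the same `table/index/value` prefix, all rows with a given indexed value sit in one contiguous key range and can be found with a range scan.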

Regardless of the Key encoding scheme for table data or index data, all rows in a table have the same Key prefix, and all data in an index also have the same prefix. Such data with the same prefix are arranged together in the key space of TiKV. Therefore, as long as the encoding scheme of the suffix part is carefully designed to ensure that the comparison relationship between before and after encoding remains unchanged, the table data or index data can be stored in TiKV in an orderly manner.

As a concrete example, suppose we have the following table:

CREATE TABLE User (
    ID int,
    Name varchar(20),
    Role varchar(20),
    Age int,
    PRIMARY KEY (ID),
    KEY idxAge (Age)
);
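Under the encoding rules above, and with hypothetical IDs (TableID 10 for User, IndexID 1 for idxAge — both invented for illustration), a single row produces the following key-value pairs; keys are shown in readable t{TableID}_... notation rather than real encoded bytes:

```python
# One row of the User table:
row = {"ID": 1, "Name": "TiDB", "Role": "SQL Layer", "Age": 10}

# ID is an integer primary key, so TiDB uses it directly as the RowID.
# Row record:  Key = t{10}_r{1},  Value = remaining columns
record_key, record_value = "t10_r1", ["TiDB", "SQL Layer", 10]

# idxAge is a non-unique secondary index: the Age value (10) goes into the
# key, the RowID (1) is appended for uniqueness, and the Value is null.
index_key, index_value = "t10_i1_10_1", None
```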

After the TiDB SQL layer completes the SQL parsing, it will convert the SQL execution plan into an actual call to the TiKV API.

TiDB SQL layer

TiDB's SQL layer, tidb-server, is similar to Google's F1. It is responsible for translating SQL into Key-Value operations, forwarding them to the shared distributed Key-Value storage layer TiKV, assembling the results returned by TiKV, and returning the query results to the client. The nodes at this layer are stateless: they do not store data themselves and are completely peer-to-peer.

The basic idea is to map the SQL query to KV queries, obtain the corresponding data through the KV interface, and then perform the various computations.

For example, for a statement such as select count(*) from user where name = "TiDB", we need to read all the data in the table, check whether the name field is "TiDB", and if so, count this row. The specific process is:

  1. Construct a Key Range: all RowIDs in the table fall within the range [0, MaxInt64). Using 0 and MaxInt64 together with the Key encoding rules for row data, we can construct a left-closed, right-open interval [StartKey, EndKey).
  2. Scan the Key Range: read the data in TiKV according to the Key Range constructed above.
  3. Filter the data: for each row read, evaluate the expression name = "TiDB"; if it is true, return the row upwards, otherwise discard it.
  4. Calculate Count(*): accumulate each row that meets the condition into the Count(*) result.

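The naive process above can be sketched as a loop running entirely in the SQL layer:

```python
def naive_count(kv_pairs, predicate):
    """Fetch every row from storage, then filter and count at the SQL layer."""
    count = 0
    for key, row in kv_pairs:      # step 2: scan the Key Range row by row
        if predicate(row):         # step 3: evaluate name = "TiDB" per row
            count += 1             # step 4: accumulate Count(*)
    return count
```

Every row crosses the network before being filtered, which is exactly the weakness the next section describes.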
shortcomings:

  1. When scanning data, each row must be read from TiKV through a KV operation, with at least one RPC's overhead; if a lot of data needs to be scanned, the total overhead becomes very large.
  2. Not all rows are useful: rows that do not meet the condition need not be read at all.
  3. Even the values of the qualifying rows are meaningless for this query; all that is actually needed is the number of rows.

Resolution: computation needs to be pushed as close as possible to the storage nodes, to avoid a large number of RPC calls.

  1. Push the predicate conditions in SQL down to the storage nodes for evaluation, returning only valid rows and avoiding meaningless network transfers.
  2. The aggregation function Count(*) can also be pushed down to the storage nodes for pre-aggregation, so that each node returns only its count result.
  3. The SQL layer then sums up the count results from each node.
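The pushdown version can be sketched as follows, with each storage node (represented here as a list of rows per region) filtering and pre-aggregating locally:

```python
def storage_side_count(region_rows, predicate):
    # Runs on each storage node: filter locally and pre-aggregate,
    # returning a single integer instead of full rows.
    return sum(1 for row in region_rows if predicate(row))

def sql_layer_count(regions, predicate):
    # The SQL layer only sums up the partial counts from each node.
    return sum(storage_side_count(rows, predicate) for rows in regions)
```

Only one small number per node crosses the network, instead of every scanned row.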

Origin: blog.csdn.net/qq_47865838/article/details/128515146