Hundred Battles of C++ (Database 1)

 

the three normal forms

First normal form: a relational schema R satisfies the first normal form (1NF) when every attribute of R is atomic, i.e., cannot be decomposed into more basic data units. Satisfying 1NF is the minimum requirement for normalizing a relational schema.

If a relational schema R satisfies the first normal form, and every non-key attribute of R is fully functionally dependent on each candidate key of R, then R satisfies the second normal form (2NF). 2NF removes partial functional dependencies.

In the third normal form (3NF), every data element in the table must not only be uniquely identified by the primary key, but non-key attributes must also be independent of one another, with no functional dependencies among them. 3NF removes transitive functional dependencies.

Describing left join, right join, inner join, and full outer join in terms of set operations (intersection, union, and difference)


A left join returns all rows from the left table together with the matching rows from the right table; where there is no match, the right table's columns are filled with NULL.

A full outer join returns all rows from both the left and right tables. If a row in one table has no matching row in the other, the columns from the other side are filled with NULL.

A right join returns all rows from the right table together with the matching rows from the left table; where there is no match, the left table's columns are filled with NULL.

An inner join returns only the rows that match in both the left and right tables.

UNION merges two result sets that have the same number of columns and compatible column types, removing duplicate rows (UNION ALL keeps them).
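These semantics can be checked with a small sketch. SQLite (via Python's `sqlite3`, used here only for convenience) behaves the same way for these joins; the tables `l` and `r` are invented, and because older SQLite builds lack FULL/RIGHT JOIN, the full outer join is emulated with two left joins and a UNION:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE l(id INTEGER, name TEXT);
CREATE TABLE r(id INTEGER, score INTEGER);
INSERT INTO l VALUES (1,'a'),(2,'b'),(3,'c');
INSERT INTO r VALUES (2,20),(3,30),(4,40);
""")

# INNER JOIN: only rows with a match on both sides
inner = cur.execute(
    "SELECT l.id, name, score FROM l JOIN r ON l.id = r.id ORDER BY l.id"
).fetchall()

# LEFT JOIN: every row of l; missing right-side columns become NULL (None)
left = cur.execute(
    "SELECT l.id, name, score FROM l LEFT JOIN r ON l.id = r.id ORDER BY l.id"
).fetchall()

# FULL OUTER JOIN emulated as left-join UNION swapped left-join
# (older SQLite versions have no FULL/RIGHT JOIN)
full = cur.execute("""
    SELECT l.id AS id, name, score FROM l LEFT JOIN r ON l.id = r.id
    UNION
    SELECT r.id AS id, name, score FROM r LEFT JOIN l ON l.id = r.id
    ORDER BY id
""").fetchall()

print(inner)  # [(2, 'b', 20), (3, 'c', 30)]
print(left)   # [(1, 'a', None), (2, 'b', 20), (3, 'c', 30)]
print(full)   # [(1, 'a', None), (2, 'b', 20), (3, 'c', 30), (4, None, 40)]
```

A right join is just the emulated query's second SELECT on its own: swap which table is on the left of the LEFT JOIN.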

mysql index data structure


binary search tree

In every node, keys in the left subtree are smaller than the node's key, and keys in the right subtree are larger.

Every node has at most two children, and each node stores exactly one key.

If the numbers of nodes in the left and right subtrees of every non-leaf node stay roughly equal (the tree is balanced), search performance approaches that of binary search. Its advantage over binary search on a contiguous block of memory is that structural changes (inserting and deleting nodes) do not require moving large blocks of data; they often cost only constant overhead.

The "chunkier" the binary search tree, i.e., the more fully each level is stuffed (every parent has two children), the more efficient the search. When every level is full, search efficiency is highest, O(log n). When the tree degenerates into a singly linked list, search efficiency is lowest, O(n).
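The contrast between the "chunky" and degenerate shapes can be seen in a few lines (an illustrative sketch, unrelated to how MySQL stores pages; `Node`, `insert`, and `height` are made-up names):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    # Standard BST insertion: smaller keys go left, larger go right.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def height(root):
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

# Inserting 1..7 in sorted order degenerates the tree into a "linked list".
worst = None
for k in range(1, 8):
    worst = insert(worst, k)

# Inserting the same keys median-first keeps the tree "chunky" and shallow.
best = None
for k in [4, 2, 6, 1, 3, 5, 7]:
    best = insert(best, k)

print(height(worst))  # 7 levels -> search cost O(n)
print(height(best))   # 3 levels -> search cost O(log n)
```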

balanced binary tree

"Balanced" on top of a binary search tree.

The height difference between the left and right subtrees of any node is at most 1.

When inserting or removing a node, one or more left or right rotations are usually required to maintain balance.
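The basic rebalancing step, a left rotation, can be sketched as follows (a simplified illustration rather than a full AVL implementation; `rotate_left` is a hypothetical helper):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(x):
    # The right child y becomes the new subtree root;
    # y's old left subtree is reattached as x's right subtree.
    y = x.right
    x.right = y.left
    y.left = x
    return y

# A right-leaning chain 1 -> 2 -> 3 violates the height-difference-of-1 rule ...
root = Node(1, right=Node(2, right=Node(3)))
# ... and one left rotation rebalances it: 2 on top, children 1 and 3.
root = rotate_left(root)
print(root.key, root.left.key, root.right.key)  # 2 1 3
```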

 

 

 

Index structures (B-tree, B+ tree, hash, spatial, full-text)

1.1 B-Tree index

        B-Tree indexes are the default index type for most MySQL storage engines.

B-tree

       A B-tree is a multiway search tree (not binary):

       1. Any non-leaf node has at most M children, with M > 2;

       2. The root node has between 2 and M children;

       3. Every non-leaf node other than the root has between ⌈M/2⌉ and M children;

       4. Each node stores at least ⌈M/2⌉ − 1 keys and at most M − 1 keys;

       5. For a non-leaf node, the number of keys = the number of child pointers − 1;

       6. The keys of a non-leaf node, K[1], K[2], ..., K[M-1], satisfy K[i] < K[i+1];

       7. The child pointers of a non-leaf node are P[1], P[2], ..., P[M]: P[1] points to the subtree whose keys are smaller than K[1], P[M] points to the subtree whose keys are greater than K[M-1], and each other P[i] points to the subtree whose keys fall in the open interval (K[i-1], K[i]);

       8. All leaf nodes are on the same level;

B+ tree

       The B+ tree is a variant of the B-tree and is also a multiway search tree. Its definition is basically the same as the B-tree's, except:

       1. In a non-leaf node, the number of subtree pointers equals the number of keys;

       2. The subtree pointer P[i] of a non-leaf node points to the subtree whose keys fall in [K[i], K[i+1]) (in the B-tree this interval is open);

       3. All leaf nodes are linked together with a chain pointer;

       4. All keys appear in the leaf nodes;
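Within a single node, locating the child pointer for a key is a binary search over the node's sorted keys. A sketch of that one step under the B+ tree's [K[i], K[i+1]) convention (single-node only; real nodes correspond to fixed-size disk pages, and targets below K[1] fall outside every subtree under this convention):

```python
from bisect import bisect_right

def child_index(keys, target):
    # Pointer P[i] covers [K[i], K[i+1]), so follow the last key <= target.
    return bisect_right(keys, target) - 1

keys = [10, 20, 30]            # keys of one non-leaf node
print(child_index(keys, 10))   # 0 -> subtree covering [10, 20)
print(child_index(keys, 25))   # 1 -> subtree covering [20, 30)
print(child_index(keys, 99))   # 2 -> subtree covering [30, ...)
```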

1.2 Hash index

    Based on the hash table implementation, the advantage is that the search is very fast.

    Only the Memory engine explicitly supports hash indexes in MySQL.

    A hash index stores only hash values and row pointers, not the field values themselves, so the index cannot be used to avoid reading the rows, and because hashing destroys key order it cannot serve range queries or sorting.
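The trade-off can be illustrated with a plain dictionary standing in for the hash index (a toy model, not the MEMORY engine's actual structure): equality probes are O(1) and end with a fetch through the row pointer, while a range condition must fall back to scanning the table:

```python
# Toy "table": row pointer -> row. Toy hash index: hash(value) -> row pointers.
table = {0: ("alice", 30), 1: ("bob", 25), 2: ("carol", 30)}
hash_index = {}
for rowid, (name, age) in table.items():
    hash_index.setdefault(hash(age), []).append(rowid)

# Equality lookup: O(1) probe, then fetch each row by pointer
# (the index holds no field values, so the rows must still be read).
matches = [table[rid] for rid in hash_index.get(hash(30), [])]
print(matches)  # [('alice', 30), ('carol', 30)]

# Range query: hash values carry no order, so only a full scan works.
in_range = [row for row in table.values() if 25 <= row[1] < 30]
print(in_range)  # [('bob', 25)]
```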

1.3. Spatial index (R-Tree)

    The MyISAM storage engine supports spatial indexes and can be used for geographic data storage.

    A spatial index indexes the data across all dimensions at once and can efficiently serve queries that combine any of those dimensions.

1.4 Full-text indexing

    The MyISAM storage engine supports full-text indexing, which is used to find keywords within text rather than to compare values directly against the index.

What kinds of indexes does MySQL have, how do they differ, and what are their characteristics?

· Clustered index: the data is stored together with the index, so finding the index entry finds the data.

· Non-clustered index: the index is stored in a structure separate from the data; its leaf nodes usually store only the primary key, so a second lookup is required.

In InnoDB, an index built on top of the clustered index is called a secondary (auxiliary) index. Accessing data through a secondary index always requires a second lookup. Non-clustered indexes are secondary indexes, for example composite indexes, prefix indexes, and unique indexes (the indexed column must be unique, but NULL values are allowed). The leaf nodes of a secondary index store not the physical location of the row but the primary key value.

A clustered index is not an index type, but a data storage method.

The term "clustered" means that data rows and adjacent key values are stored close together; InnoDB stores the data rows of its clustered index in the leaf pages of the B-Tree.

Because there is no way to store data rows in two different places, a table can have only one clustered index.

advantage

  1. Related data can be stored together to reduce I/O operations;
  2. Because data is stored in B-Tree , data access is faster.

shortcoming

  1. Clustered indexes give the biggest performance boost to I/O-intensive applications; if all the data fits in memory, a clustered index provides no particular advantage.
  2. Insertion speed is heavily dependent on insertion order, with insertion in primary key order being fastest.
  3. Update operations are expensive because each updated row is moved to a new location.
  4. When inserting into a full page, the storage engine will split the page into two pages to accommodate the row, and the page split will cause the table to take up more disk space.
  5. Clustered indexes can slow down full table scans if rows are sparse, or if data storage is not contiguous due to page splits.

index optimization

3.1 Independent columns

    In a query, an indexed column cannot be part of an expression or an argument to a function; otherwise the index cannot be used.

For example, the following query cannot use an index on the actor_id column:

SELECT actor_id FROM sakila.actor WHERE actor_id + 1 = 5;

3.2 Prefix index

For BLOB, TEXT, and long VARCHAR columns, you must use a prefix index, which indexes only the leading characters.

The prefix length should be chosen according to index selectivity: the ratio of distinct index values to the total number of records. The higher the selectivity, the more efficient the query. The maximum is 1, in which case every record has a unique index entry.
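Selectivity as a function of prefix length can be computed directly. A sketch on made-up data (in MySQL itself you would compare `COUNT(DISTINCT LEFT(col, n)) / COUNT(*)` for candidate values of n):

```python
emails = [
    "alice@example.com", "albert@example.com",
    "bob@example.com", "bonnie@example.com",
    "carol@example.com",
]

def prefix_selectivity(values, n):
    # distinct n-character prefixes / total rows; 1.0 = every row unique
    return len({v[:n] for v in values}) / len(values)

for n in (1, 3, 6):
    print(n, round(prefix_selectivity(emails, n), 2))
# length 1 gives 0.6 ('a', 'b', 'c'); by length 3 all five prefixes
# are distinct, so a longer prefix buys nothing more.
```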

3.3 Multi-column index

When a query filters on several columns, a multi-column index is better than multiple single-column indexes. For example, for the following statement it is better to create a multi-column index on actor_id and film_id.

SELECT film_id, actor_id FROM sakila.film_actor WHERE actor_id = 1 OR film_id = 1;

3.4 Order of index columns

Put the most selective index columns first.

Slow query log: it records every query whose execution time exceeds long_query_time, as well as queries that do not use indexes, so you can optimize the corresponding SQL statements based on it.

Use covering indexes: a covering index contains every column the query needs. Using one avoids a large number of back-to-table operations, and range scans on the index reduce disk I/O.

(Back-to-table operation: scan the index first, then use the primary key to fetch the columns the index cannot provide, i.e., go back to the table.)

Index column order: put the most selective columns first so that most rows are filtered out early. If there is an (a, b) composite index, there is no need to maintain a separate index on a alone.

Regularly clear outdated indexes: Maintaining indexes consumes a lot of system resources, especially when there is a lot of data in the table.

Try to extend an existing index rather than create a new one.

Query performance optimization (Explain and others)

1. Explain

Used to analyze SQL statements, the more important fields in the analysis results are:

  • select_type : the query type — simple query, union query, or subquery
  • key : the index actually used
  • rows : the estimated number of rows to scan

2. Reduce the returned columns

Slow queries are mainly caused by accessing too much data: besides scanning too many rows, this includes fetching too many columns.

It is best not to use the SELECT * statement, but to select the columns to be queried as needed.

3. Fewer rows returned

It is best to use the LIMIT statement to fetch the desired rows.

You can also create indexes so that conditional statements avoid full table scans. For example, without an index the following statement requires a full table scan, while with an index only a few rows need to be examined. Run EXPLAIN and compare the rows field to see the difference.

SELECT * FROM sakila.film_actor WHERE film_id = 1;
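The same experiment can be run locally. The sakila database is not available here, so an invented `film_actor` stands in, and SQLite's EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN (the output formats differ, but the scan-versus-search distinction carries over):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER)")
cur.executemany("INSERT INTO film_actor VALUES (?, ?)",
                [(a, f) for a in range(50) for f in range(50)])

def plan(sql):
    # Join the plan's detail strings into one line for inspection.
    return " | ".join(row[-1] for row in
                      cur.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM film_actor WHERE film_id = 1"
p_noindex = plan(query)   # detail contains "SCAN": full table scan
cur.execute("CREATE INDEX idx_film ON film_actor(film_id)")
p_index = plan(query)     # detail contains "SEARCH ... idx_film": indexed lookup
print(p_noindex)
print(p_index)
```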

4. Split large DELETE or INSERT statements

Executed in one go, such a statement may lock a large amount of data at once, fill the transaction log, exhaust system resources, and block many small but important queries.
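The batching idea can be sketched like this (SQLite used for illustration; the table and column names are invented). A stock SQLite build lacks `DELETE ... LIMIT`, hence the rowid subquery; in MySQL you would simply loop `DELETE ... LIMIT 10000` until no rows are affected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, level TEXT)")
cur.executemany("INSERT INTO logs (level) VALUES (?)",
                [("debug",)] * 9500 + [("error",)] * 500)

BATCH = 1000
deleted = 0
while True:
    # Delete in small batches: each transaction stays short and holds
    # locks briefly, instead of locking every matching row at once.
    cur.execute("""DELETE FROM logs WHERE rowid IN
                   (SELECT rowid FROM logs WHERE level = 'debug' LIMIT ?)""",
                (BATCH,))
    conn.commit()
    if cur.rowcount == 0:
        break
    deleted += cur.rowcount

remaining = cur.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(deleted)    # 9500
print(remaining)  # 500
```

Pausing briefly between batches additionally gives blocked queries a chance to run.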

Index Design Principles

      ① The primary key automatically creates a unique index

  ②Fields that are frequently used as query conditions should create indexes

③ Index fields that join to other tables in queries, i.e., the columns involved in foreign-key relationships

Build indexes on columns that are frequently sorted or grouped; if several columns are sorted together, build a composite index on them.

  ④Frequently updated fields are not suitable for indexing, because each update not only updates the record but also updates the index

  ⑤ Do not create indexes for fields that are not used in the WHERE condition

  ⑥ Single-column or composite index — which one? (Under high concurrency, lean toward composite indexes)

  ⑦ For fields used for sorting in queries, sorting is much faster if they can be accessed through an index

  ⑧ Statistics or grouping fields in the query

Under what circumstances do not create an index

  ① Too few table records

  ② Tables that are frequently added, deleted, and modified

Indexes improve query speed, but they also slow down updates to the table, such as INSERT, UPDATE, and DELETE.

    This is because when the table is updated, MySQL must maintain not only the data but also the index file.

    For fields whose values are highly repetitive and evenly distributed, an index helps little; only the most frequently queried and most frequently sorted columns are worth indexing.

  ③ Note that if a data column contains many duplicate values, indexing it will have little practical effect.

More indexes are not always better. Too many indexes will take up a lot of memory space, and will affect the performance of adding, deleting and modifying.

Avoid creating too many indexes on frequently updated tables, and keep the columns in the index as few as possible.

It is best not to create an index for a table with a small amount of data. Due to the small amount of data, the query time may be shorter than the time spent traversing the index.

Build indexes on columns that have many distinct values and often appear in conditional expressions. For example, the gender column of a student table has only the two values "male" and "female", so there is no need to index it.

When indexing string columns, specify a prefix length where possible. If a CHAR(255) column's values are mostly unique within the first 10 or 30 characters, there is no need to index the whole column. Short indexes not only improve query speed but also save disk space and reduce I/O operations.

In a query that uses the LIKE keyword, the index does not work if the pattern's first character is "%"; the index works only when "%" is not in the first position.

With a multi-column index, the index is used only if the first of the indexed columns appears in the query condition: the leftmost-prefix matching principle.

ex: CREATE INDEX index_id_price ON fruits(f_id, f_price);

SELECT * FROM fruits WHERE f_id='12';

At this point the index will be used.

SELECT * FROM fruits WHERE f_price=5.2;

The index will not be used at this time.

An indexed column cannot be part of an expression, nor can it be an argument to a function.

ex: select * from table where id-1>4;

At this time, the id is part of the expression and the index will not be used.
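Both pitfalls — the leftmost-prefix rule and expressions on indexed columns — can be observed with SQLite's EXPLAIN QUERY PLAN (illustrative; MySQL reports the same behavior through EXPLAIN's key field). In the plan detail, SEARCH means an index is used and SCAN means a full scan:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE fruits (f_id TEXT, f_name TEXT, f_price REAL)")
cur.execute("CREATE INDEX index_id_price ON fruits(f_id, f_price)")
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")

def plan(sql):
    # Join the plan's detail strings into one line for inspection.
    return " | ".join(r[-1] for r in cur.execute("EXPLAIN QUERY PLAN " + sql))

# Leftmost column of the composite index -> index usable (SEARCH)
p_leading = plan("SELECT * FROM fruits WHERE f_id = '12'")
# Second column alone -> leftmost prefix missing -> full scan (SCAN)
p_second = plan("SELECT * FROM fruits WHERE f_price = 5.2")
# Indexed column wrapped in an expression -> index unusable (SCAN)
p_expr = plan("SELECT * FROM t WHERE id - 1 > 4")
# Same predicate rewritten to leave the column bare -> index usable (SEARCH)
p_bare = plan("SELECT * FROM t WHERE id > 5")

print(p_leading)
print(p_second)
print(p_expr)
print(p_bare)
```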

Talk about locks in the database

global lock

A global lock locks the entire database instance. The command is FLUSH TABLES WITH READ LOCK (FTWRL). After it executes, the whole instance is read-only: data modification statements (INSERT, UPDATE, DELETE), data definition statements (creating tables, altering table structures), and commits of update transactions are all blocked.

Global locks are typically used to back up libraries whose tables use MyISAM: add the global read lock, then take the backup.

But there is a risk that the entire library is read-only.

Risks of using global lock backups

If the backup is taken on the master, no update operations can run during the backup, and the business essentially stops.

If the backup is taken on a slave, the slave cannot apply the binlog received from the master during the backup, causing master-slave replication lag.

However, without the lock, if data is modified during the backup the result is inconsistent: the source library has been modified but the backup has not.

One way to ensure data consistency is to start a transaction at the repeatable read isolation level.

mysqldump --single-transaction starts a transaction before dumping the data, guaranteeing a consistent snapshot. Thanks to MVCC, the data can still be updated normally while the dump runs.

But --single-transaction only works when every table in the library uses a transactional engine, so for MyISAM only FTWRL can be used.

Why use FTWRL rather than set global readonly=true?

In some systems, the value of readonly is used for other logic, such as to determine whether a library is a primary library or a standby library.

If the client disconnects abnormally after executing FTWRL, MySQL automatically releases the global lock and the whole library returns to a normally updatable state. But if an exception occurs on the client after the library has been set to readonly, the database stays in the readonly state, leaving the whole library unwritable for a long time. The risk is too high.

table lock

There are two types of table-level locks in MySQL: table locks and metadata locks (MDL).

The usage of table lock is lock tables ... read/write. It can be released actively by using unlock tables , or it can be released automatically when the client is disconnected.

MDL does not need to be taken explicitly; it is acquired automatically when a table is accessed. MDL guarantees the correctness of reads and writes, for example by locking the table during a query so that other threads cannot modify its structure.

DML on a table takes an MDL read lock; DDL on a table takes an MDL write lock.

Read locks are not mutually exclusive, so multiple threads can insert, delete, update, and query the same table at the same time.

Read locks and write locks are mutually exclusive, as are write locks with each other.

In the classic four-session example, sessions A and B (both queries) execute normally; session C (an ALTER TABLE) is blocked because A's MDL read lock has not yet been released; and session D (another query) is blocked in turn behind C's pending MDL write lock.

This shows that MDL locks are released only after the transaction commits.

The correct approach is to set a wait timeout on the ALTER TABLE: if the MDL write lock can be acquired within that time, proceed; if not, give up rather than block subsequent business statements, and retry later.

row lock

InnoDB supports row locks; MyISAM does not, supporting only table locks. So for concurrent business workloads, InnoDB performs better.

In an InnoDB transaction, a row lock is taken when it is needed, but it is not released as soon as it is no longer needed: it is held until the end of the transaction. This is the two-phase locking protocol.

If you need to lock multiple rows in a transaction, put the locks that are most likely to cause lock conflicts and affect concurrency as far back as possible.

This is because the later a lock is taken, the shorter the time until the transaction commits, i.e., the shorter the lock is held and the less time other transactions spend blocked.

Setting a shared lock: SELECT ... LOCK IN SHARE MODE; shared locks taken this way are row locks.
Setting an exclusive lock: SELECT ... FOR UPDATE; exclusive locks can be row locks or table locks.

INSERT and UPDATE automatically take exclusive locks.

1. Shared lock (also known as read lock), exclusive lock (also known as write lock):

The InnoDB engine's locking: InnoDB supports transactions and uses both row locks and table locks. MyISAM does not support transactions and supports only table locks.

 

Shared lock (S): allows a transaction to read a row and prevents other transactions from obtaining an exclusive lock on the same data set.

Exclusive lock (X): allows the transaction holding it to update the data and prevents other transactions from obtaining shared read locks or exclusive write locks on the same data set.

Intention shared lock (IS): the transaction intends to set shared row locks; it must obtain the table's IS lock before taking a shared lock on a row.

Intention exclusive lock (IX): the transaction intends to set exclusive row locks; it must obtain the table's IX lock before taking an exclusive lock on a row.

The two intent locks are automatically added by the system.

2. Optimistic lock, pessimistic lock:

Pessimistic lock: as the name suggests, pessimistic locking takes a conservative attitude toward the possibility of the data being modified by the outside world (including other transactions in the current system and transactions from external systems), so the data stays locked throughout processing. Pessimistic locking usually relies on the locking mechanism provided by the database itself: only a database-level lock can truly guarantee exclusive access to the data; a lock implemented only inside the application cannot guarantee that an external system will not modify the data.

Optimistic lock:

Optimistic Locking: compared with pessimistic locking, optimistic locking assumes that under normal circumstances the data will not conflict, so conflicts are checked for only when the update is submitted. If a conflict is found, an error is returned to the user, who decides what to do (usually roll back the transaction).

From the descriptions above, each kind of lock has its advantages and disadvantages; neither is simply better than the other. Optimistic locking suits read-heavy workloads with few writes, where conflicts rarely occur: it saves the overhead of locking and increases overall system throughput. But with many writes, conflicts occur frequently, the upper-layer application keeps retrying, and performance actually drops, so pessimistic locking is the better fit for write-heavy scenarios.
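A common implementation of optimistic locking is a version column checked at update time. A sketch with SQLite standing in (the `account` table and `withdraw` helper are invented; in MySQL the same `UPDATE ... WHERE version = ?` pattern applies):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE account "
            "(id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
cur.execute("INSERT INTO account VALUES (1, 100, 0)")

def withdraw(amount):
    # Optimistic read: take no lock, just remember the version we saw.
    balance, version = cur.execute(
        "SELECT balance, version FROM account WHERE id = 1").fetchone()
    # Conditional write: succeeds only if nobody changed the row meanwhile.
    cur.execute("UPDATE account SET balance = ?, version = version + 1 "
                "WHERE id = 1 AND version = ?", (balance - amount, version))
    return cur.rowcount == 1   # 0 rows updated -> conflict, caller retries

ok_first = withdraw(30)        # no conflict: balance 100 -> 70, version 0 -> 1
# Simulate a concurrent writer bumping the version behind our back ...
cur.execute("UPDATE account SET version = version + 1 WHERE id = 1")
# ... so an update still carrying the stale version 0 matches no rows:
cur.execute("UPDATE account SET balance = 0 WHERE id = 1 AND version = 0")
ok_stale = cur.rowcount == 1   # False -> conflict detected; retry or roll back
print(ok_first, ok_stale)
```

The conflict check costs one extra WHERE condition instead of a lock held across the read-modify-write, which is why it pays off when conflicts are rare.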


Origin blog.csdn.net/hebtu666/article/details/127205011