Review: database

https://yanglinwei.blog.csdn.net/article/details/103972288

MySQL master-slave replication: master-slave replication enables data backup, failover, MySQL clustering, high availability, read-write separation, and so on. The principle is as follows:

  1. The slave library starts two threads: an I/O thread and a SQL thread;
  2. The I/O thread requests the binlog from the master library and writes the received binlog events to the relay log file;
  3. The master library starts a log dump thread to transmit the binlog to the slave's I/O thread;
  4. The SQL thread reads the relay log and replays the events as concrete operations, keeping the slave consistent with the master and reaching eventual data consistency;
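The thread interaction above can be sketched as a toy model in Python (plain lists stand in for the binlog and relay log; the event format and all names are illustrative, not MySQL internals):

```python
# Toy model of MySQL replication. The master's binlog is a list of events;
# the slave's I/O thread copies them into the relay log, and the SQL
# thread replays the relay log against the slave's data.
binlog = [("INSERT", "users", 1), ("INSERT", "users", 2), ("DELETE", "users", 1)]

relay_log = []      # written by the slave's I/O thread
slave_data = set()  # row ids present on the slave, mutated by the SQL thread

def io_thread(events):
    """Request binlog events from the master and append them to the relay log."""
    for event in events:
        relay_log.append(event)

def sql_thread():
    """Read the relay log and replay each event as a concrete operation."""
    for op, table, row_id in relay_log:
        if op == "INSERT":
            slave_data.add(row_id)
        elif op == "DELETE":
            slave_data.discard(row_id)

io_thread(binlog)
sql_thread()
print(slave_data)  # {2} -- the slave converges to the master's final state
```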


MyCat: MyCat works mainly by intercepting SQL, applying a set of rules for sharding analysis, routing analysis, read-write-separation analysis, cache analysis, and so on, then sending the SQL to the real data nodes on the backend, and finally processing the returned results appropriately before returning them to the client.


MyCat installation and configuration: mainly a matter of configuring the schema.xml file in the extracted package

<?xml version="1.0"?>
<!DOCTYPE mycat:schema SYSTEM "schema.dtd">
<mycat:schema xmlns:mycat="http://io.mycat/">
    <!-- TESTDB1 is MyCat's logical database name, used in the connection string -->
    <schema name="mycat_testdb" checkSQLschema="false" sqlMaxLimit="100" dataNode="dn1"></schema>
    <!-- database is the name of the MySQL database -->
    <dataNode name="dn1" dataHost="localhost1" database="test" />
    <!--
    dataNode attributes:
    name: the logical data node name;
    dataHost: the physical host node backing this logical data node;
    database: the database on the physical host. If one host carries several databases, an expression such as db$0-99 designates the 100 databases db0 through db99;

    dataHost attributes:
        name: physical host node name;
        maxCon: maximum of 1000 connections to the physical host;
        minCon: minimum of 10 connections kept to the physical host;
        writeType: write mode;
            0: write only to the writeHost node;
            1: write to all nodes. Enable with caution; with multiple nodes the write order follows the configuration order, and if the first node fails MyCat switches to the next;
        dbType: database type;
        dbDriver: database driver;
        balance: load-balancing mode for the physical hosts.
            0: read-write separation disabled;
            1: all readHosts and standby writeHosts take part in load balancing of select statements; in short, in a dual-master/dual-slave setup (M1->S1, M2->S2, with M1 and M2 as each other's standby), M2, S1 and S2 normally share the select load;
            2: all readHosts and writeHosts take part in load balancing of select statements, so when write pressure is low every host can carry read load;
            3: all readHosts take part in load balancing of select statements while the writeHost carries no read load (the mode used below);
-->
    <dataHost name="localhost1" maxCon="1000" minCon="10" balance="3" writeType="0" dbType="mysql" dbDriver="native" switchType="1"  slaveThreshold="100">
        <heartbeat>select user()</heartbeat>
        <!-- multiple master-slave groups can be configured -->
        <writeHost host="hostM1" url="192.168.162.132:3306" user="root" password="123456">
            <!-- multiple slave libraries can be configured -->
            <readHost host="hostS2" url="192.168.162.133:3306" user="root" password="123456" />
        </writeHost>
    </dataHost>
</mycat:schema>

Database splitting principles: splitting follows two approaches, vertical splitting and horizontal splitting.

  • Vertical split: databases are divided by business domain, for example a member database, an order database, a payment database, and so on. Vertical splitting is very common in large e-commerce systems. The disadvantage is that some business tables can no longer be joined and must instead be composed through service interfaces, which increases system complexity.
  • Horizontal split: the same table is split across different databases. Horizontal splitting can be understood as splitting by data rows: some rows of a table go into one database while other rows go into others, in two modes, table sharding and database sharding. Horizontal sharding improves the stability and load capacity of the system, but cross-database joins perform poorly. Typical examples split tables by time, by hash, or by business rules.

MyCat's ten horizontal sharding strategies include:

  • Modulo algorithm: take the id modulo the shard count (in decimal); the result is the partition index, very similar to how an ES cluster routes documents.
  • Sharding by enumeration: for example, storing rows by province or by district/county
  • Range agreement (sharding by agreed key ranges)
  • Date designation (sharding by date)
  • Fixed-shard hash algorithm
  • Delivery model
  • ASCII-code modulo with wildcards
  • Programmatically specified
  • String-split hash parsing
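The first strategy, modulo sharding, reduces to one line (the shard count here is an illustrative value; MyCat reads it from its rule configuration):

```python
SHARD_COUNT = 4  # illustrative; in MyCat this comes from the sharding rule config

def shard_for(record_id: int) -> int:
    """Modulo sharding: the remainder of id / shard count is the partition index."""
    return record_id % SHARD_COUNT

# ids 0..3 land on shards 0..3, then the pattern repeats
print([shard_for(i) for i in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```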

Sharding-Jdbc: provides standardized data sharding, distributed transactions, and database governance functions, applicable to a wide variety of scenarios such as Java homogeneous and heterogeneous languages, containers, and cloud native deployments. Application scenarios:

  • Database read-write separation
  • Database sharding and table sharding

The difference between Sharding-Jdbc and MyCat:

  • MyCat is a proxy-style database middleware deployed as a third-party application. All JDBC requests from the client go to MyCat first, and MyCat forwards them to the actual backend servers.
  • Sharding-Jdbc is shipped as a jar that rewrites JDBC's native methods in the application layer to implement database sharding.
  • In short, MyCat is server-side database middleware, while Sharding-Jdbc is a local, in-process database middleware framework.

The configuration is as follows:

server:
  port: 9002
mybatis-plus:
#  mapper-locations: classpath*:/mapper/*.xml
  global-config:
    db-config:
      column-underline: true
# Sharding-JDBC configuration
sharding:
  jdbc:
    data-sources:
      ### first slave database
      ds_slave_0:
        password: root
        jdbc-url: jdbc:mysql://192.168.212.203:3306/test?useUnicode=true&characterEncoding=utf-8&useSSL=true
        driver-class-name: com.mysql.jdbc.Driver
        username: root
      ### master database configuration
      ds_master:
        password: root
        jdbc-url: jdbc:mysql://192.168.212.202:3306/test?useUnicode=true&characterEncoding=utf-8&useSSL=true
        driver-class-name: com.mysql.jdbc.Driver
        username: root
    ### read-write separation configuration
    master-slave-rule:
    ### slave selection strategy: round-robin and random are provided; round-robin is used here
      load-balance-algorithm-type: round_robin
      #### designate the slave database(s)
      slave-data-source-names: ds_slave_0
      name: ds_ms
      #### designate the master database
      master-data-source-name: ds_master
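The round_robin strategy named in load-balance-algorithm-type above can be sketched as follows (a toy scheduler, not Sharding-JDBC's implementation; a second slave name is invented here just to make the rotation visible, since the config above lists only ds_slave_0):

```python
import itertools

class RoundRobinLoadBalancer:
    """Cycle through the configured slave data sources for read queries."""
    def __init__(self, slave_names):
        self._cycle = itertools.cycle(slave_names)

    def next_slave(self):
        """Return the data source the next read query should be routed to."""
        return next(self._cycle)

lb = RoundRobinLoadBalancer(["ds_slave_0", "ds_slave_1"])
print([lb.next_slave() for _ in range(4)])
# ['ds_slave_0', 'ds_slave_1', 'ds_slave_0', 'ds_slave_1']
```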

Database index: a data structure used to improve the retrieval speed and query performance of database tables, similar to the table of contents of a book: it provides a fast path to specific rows in a table. By creating indexes, the database system can execute queries more efficiently, reducing the time spent scanning data and thus speeding up retrieval and query operations.

Default data and index file location: /var/lib/mysql


Index implementation principles

  • Hash algorithm: a data structure that accesses records directly by key, mapping each key to a position in a table to speed up lookups. The mapping function is called a hash function, and the array that stores the records is called a hash table. The drawback is that range queries are not possible.
  • Balanced binary tree (AVL) algorithm: both its left and right subtrees are balanced binary trees, and the absolute difference between their depths (the balance factor, the height of the left subtree minus that of the right) is at most 1; in other words, every node's balance factor is -1, 0, or 1. The advantage is that queries work like binary search and range queries are supported. The disadvantage is that insertions require rotations to rebalance the tree.
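The trade-off between the two can be illustrated with Python stand-ins: a dict behaves like a hash index (fast point lookup, no ordering), while a sorted key list behaves like an ordered index (binary search plus cheap range scans):

```python
import bisect

# Hash index (dict): O(1) point lookup by key, but no ordering,
# so a range scan would have to examine every key.
hash_index = {15: "row_a", 3: "row_b", 42: "row_c", 27: "row_d"}
print(hash_index[27])  # row_d -- fast point lookup

# Ordered index (sorted keys + binary search): point lookups are
# O(log n), and a range query is just a contiguous slice.
ordered_keys = sorted(hash_index)           # [3, 15, 27, 42]
lo = bisect.bisect_left(ordered_keys, 10)   # first position with key >= 10
hi = bisect.bisect_right(ordered_keys, 30)  # position past the last key <= 30
print(ordered_keys[lo:hi])  # [15, 27] -- all keys in the range [10, 30]
```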

  • Data structure B-tree: a B-tree is a tree data structure that stores data in sorted order and allows lookups, sequential reads, insertions, and deletions in O(log n) time. In short, a B-tree is a generalization of a binary search tree in which a node may have more than 2 children. Unlike self-balancing binary search trees, B-trees are optimized for reading and writing large blocks of data, reducing the number of intermediate steps needed to locate a record and thereby speeding up access. Commonly used in databases and file systems.

  • Data structure B+ tree: compared with the B-tree, the B+ tree changes the relationship between leaf and non-leaf nodes: leaf nodes contain both keys and values, while non-leaf nodes contain only keys for navigation. All leaf nodes are chained together with a linked list and kept in sorted order, so range queries are very efficient.
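The linked leaf chain that makes B+ tree range queries cheap can be sketched like this (a toy structure showing only the leaf level, not a full B+ tree):

```python
class Leaf:
    """A B+ tree leaf node: sorted keys plus a pointer to the next leaf."""
    def __init__(self, keys, nxt=None):
        self.keys = keys
        self.next = nxt

# Three linked leaves holding keys 1..9 in sorted order across the chain
leaf3 = Leaf([7, 8, 9])
leaf2 = Leaf([4, 5, 6], leaf3)
leaf1 = Leaf([1, 2, 3], leaf2)

def range_scan(start_leaf, lo, hi):
    """Walk the leaf chain collecting keys in [lo, hi]; stop once past hi."""
    out, leaf = [], start_leaf
    while leaf:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next  # follow the sibling pointer -- no tree re-descent
    return out

print(range_scan(leaf1, 2, 7))  # [2, 3, 4, 5, 6, 7]
```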


MyISAM engine (suited to read-heavy, write-light workloads; does not support transactions): uses a B+Tree as the index structure, with the data field of each leaf node storing the address of the data record. Uses table-level locking. Transactions are not supported.


**InnoDB engine (recommended; suited to write-heavy workloads; supports transactions):** also uses a B+Tree as the index structure, storing both data and indexes in the same table space. This storage structure lets InnoDB perform better under heavy write loads and concurrent transactions. Uses row-level locking and supports transactions.


MySQL improvement plan

  • Index optimization
  • Slow query optimization: enable the slow query log and set the slow-query time threshold (slow_query_log, long_query_time)
  • Table optimization

explain command: the first command to reach for when diagnosing database performance; most performance problems can be analyzed quickly with it


Leftmost prefix principle of composite indexes: refers to the leftmost-match rule of an index. In MySQL, when you create a multi-column index, a query can use the index only for a leftmost prefix of its columns, starting from the first column.

Example:

-- Create a composite (multi-column) index
CREATE INDEX idx_example ON your_table (col1, col2, col3);

-- The leftmost column of the index is col1
-- The following query can use the index
SELECT * FROM your_table WHERE col1 = 'some_value';

-- The following query cannot make full use of the index, because it does not involve the leftmost column
SELECT * FROM your_table WHERE col2 = 'some_value';

This is because when MySQL uses an index, it matches from the leftmost column onward; only when the query conditions involve the leftmost column(s) of the index can MySQL make full use of it.
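The leftmost-match rule can be mimicked with a small helper (illustrative only; the real optimizer also considers range conditions, sort order, and index condition pushdown):

```python
def usable_prefix(index_cols, query_cols):
    """Return the index columns actually usable by a query: the longest
    leftmost run of index columns that all appear in the query's
    equality conditions. A gap stops prefix matching."""
    used = []
    for col in index_cols:
        if col not in query_cols:
            break
        used.append(col)
    return used

idx = ["col1", "col2", "col3"]
print(usable_prefix(idx, {"col1"}))          # ['col1']
print(usable_prefix(idx, {"col1", "col2"}))  # ['col1', 'col2']
print(usable_prefix(idx, {"col2", "col3"}))  # [] -- col1 missing, index unusable
```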


**How much data a B+ tree can store:** the main consideration is how many entries the leaf nodes of the B+ tree can hold. The leaf nodes contain the actual index data, while the non-leaf nodes contain only the key values used for navigation.
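A common back-of-the-envelope version of this question, under the usual assumptions (16 KB InnoDB page, an 8-byte BIGINT key plus a 6-byte child-page pointer per non-leaf entry, and rows averaging about 1 KB in the leaves):

```python
PAGE_SIZE = 16 * 1024   # default InnoDB page size in bytes
KEY_BYTES = 8           # BIGINT primary key
POINTER_BYTES = 6       # child-page pointer in a non-leaf node
ROW_BYTES = 1024        # assumed average row size in a leaf page

fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)  # entries per non-leaf page
rows_per_leaf = PAGE_SIZE // ROW_BYTES             # rows per leaf page

# A 3-level B+ tree: root -> internal level -> leaves
capacity = fanout * fanout * rows_per_leaf
print(fanout, rows_per_leaf, capacity)  # 1170 16 21902400
```

Under these assumptions a 3-level B+ tree holds roughly twenty million rows, which is why a primary-key lookup on even a large table needs only a handful of page reads.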


Transaction propagation mechanism: defines how a transaction behaves and propagates when one transactional operation is nested inside another. Common propagation behaviors include:

  1. REQUIRED (default): "required"; if a transaction currently exists, the new operation joins it; if there is no current transaction, a new one is created.
  2. REQUIRES_NEW: "requires new"; a new transaction is created regardless of whether one currently exists.
  3. NESTED: "nested"; execute within a nested transaction of the current transaction.
  4. SUPPORTS: "supports"; if a transaction currently exists, join it; otherwise execute non-transactionally.
  5. NOT_SUPPORTED: "not supported"; execute non-transactionally; if a transaction currently exists, it is suspended.
  6. MANDATORY: "mandatory"; if a transaction currently exists, join it; if there is none, throw an exception.
  7. NEVER: "never"; execute non-transactionally; if a transaction currently exists, throw an exception.
  8. TIMEOUT: "timeout"; strictly a transaction attribute rather than a propagation behavior: based on relative time, if the transaction does not complete within the specified interval it is rolled back.
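As a rough illustration, the decision each behavior makes on entry can be modeled as a small function (a toy covering four of the behaviors, not Spring's actual transaction manager; names are illustrative):

```python
def enter(propagation, current_tx, new_tx_name):
    """Return the transaction the new operation runs in, given the
    propagation behavior and the current transaction (None if absent)."""
    if propagation == "REQUIRED":
        # join the existing transaction, or create one if none exists
        return current_tx if current_tx is not None else new_tx_name
    if propagation == "REQUIRES_NEW":
        # always open a fresh transaction
        return new_tx_name
    if propagation == "MANDATORY":
        if current_tx is None:
            raise RuntimeError("MANDATORY requires an existing transaction")
        return current_tx
    if propagation == "NEVER":
        if current_tx is not None:
            raise RuntimeError("NEVER forbids an existing transaction")
        return None  # runs non-transactionally
    raise ValueError(f"unhandled propagation: {propagation}")

print(enter("REQUIRED", None, "tx2"))       # tx2 -- none existed, create one
print(enter("REQUIRED", "tx1", "tx2"))      # tx1 -- join the existing one
print(enter("REQUIRES_NEW", "tx1", "tx2"))  # tx2 -- always a new one
```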

Transaction isolation levels:

  1. **READ UNCOMMITTED:** a transaction can read data that other transactions have not yet committed, which may lead to dirty reads, non-repeatable reads, and phantom reads.
  2. **READ COMMITTED:** a transaction can only read data already committed by other transactions, which eliminates dirty reads, but non-repeatable reads and phantom reads can still occur.
  3. **REPEATABLE READ:** reading the same record multiple times within one transaction yields consistent results; dirty reads and non-repeatable reads are eliminated, but phantom reads may still occur.
  4. **SERIALIZABLE:** the highest isolation level; by locking data it eliminates dirty reads, non-repeatable reads, and phantom reads, but it can also reduce concurrency performance.
  5. **READ SNAPSHOT (snapshot read):** a transaction reads the data version that existed in its snapshot at the start of the transaction, unaffected by other transactions, avoiding dirty reads, non-repeatable reads, and phantom reads; this is a special implementation found in some database systems rather than a standard SQL level.
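A toy contrast of the first two levels (plain Python dicts stand in for committed and in-flight data; no real database is involved):

```python
# Committed state, plus a change transaction A has made but not yet committed
committed = {"balance": 100}
uncommitted = {"balance": 50}  # A's pending write

def read(key, isolation):
    """Simulate what a concurrent transaction B sees under each level."""
    if isolation == "READ UNCOMMITTED":
        # B sees in-flight changes -> dirty read possible
        return uncommitted.get(key, committed[key])
    if isolation == "READ COMMITTED":
        # B sees only committed data -> no dirty reads
        return committed[key]
    raise ValueError(f"unhandled isolation level: {isolation}")

print(read("balance", "READ UNCOMMITTED"))  # 50 -- dirty read of A's pending write
print(read("balance", "READ COMMITTED"))    # 100 -- committed value only
```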

Dirty reads, non-repeatable reads, and phantom reads:

  1. Dirty read (reading another transaction's uncommitted write):

    • Definition: a dirty read occurs when one transaction reads data that another transaction has not yet committed.
    • Scenario: transaction A modifies a row but has not yet committed it; transaction B reads this uncommitted data, which may produce a dirty read.
  2. Non-repeatable read (re-reading after another transaction commits an update):

    • Definition: a transaction reads the same row multiple times, and in between another transaction modifies and commits that row, so transaction A reads different data; this is a non-repeatable read.
    • Scenario: transaction A reads a row; transaction B modifies and commits that row; when transaction A reads the same row again, it may get a result different from before.
  3. Phantom read (re-reading a range after another transaction commits an insert or delete):

    • Definition: a transaction queries a range of data multiple times, and in between another transaction inserts or deletes rows matching that range, so transaction A gets different result sets; this is a phantom read.
    • Scenario: transaction A queries a batch of rows; transaction B inserts a new row matching the query conditions; when A runs the query again the result contains the newly inserted row, a phantom read.

    FOR UPDATE: a clause used to lock the selected rows within a transaction. When executing a SELECT inside a transaction, add FOR UPDATE if you want to prevent other transactions from modifying the data. It places an exclusive lock on the selected rows; other transactions attempting to update, delete, or lock those rows are blocked until the holding transaction releases the lock.


Origin blog.csdn.net/qq_20042935/article/details/134660747