interview1-DB

If you need project experience, you can go to Gitee to find project resources.

 1. Redis

 

 1. Cache

The key points of caching can be divided into penetration, breakdown, avalanche, double-write consistency, persistence, data expiration, and elimination strategies.

(1) Penetration, breakdown, avalanche

1. Cache penetration

If a query asks for data that does not exist, MySQL finds nothing and nothing is written to the cache, so every subsequent request for that key still hits the database.

  • Solution 1: Cache empty data. If the query returns no data, still cache the empty result.

Advantages: simple. Disadvantages: consumes extra memory, and inconsistency may occur.

  • Solution two: Bloom filter.

A Bloom filter is mainly used to test whether an element belongs to a set. We used the Bloom filter implemented by Redisson. Under the hood it initializes a fairly large array that stores binary 0s and 1s, all 0 at the start. When a key is added, it is run through several (e.g., three) hash functions; each hash value is taken modulo the array length to find a position, and the 0 at that position is set to 1. Those marked positions represent the existence of the key, and lookups follow the same process. The drawback is that a Bloom filter can produce false positives. We generally configure the false-positive rate, usually no more than 5%; some false positives are unavoidable unless the array is made larger. A false-positive rate under 5% is acceptable for ordinary projects and will not overwhelm the database under high concurrency.

The smaller the array, the higher the false-positive rate; the larger the array, the lower the false-positive rate, but the more memory it consumes.
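
A minimal sketch of how this might look with Redisson's RBloomFilter, assuming a local Redis at 127.0.0.1:6379; the filter name, sizing, and key values are illustrative:

```java
import org.redisson.Redisson;
import org.redisson.api.RBloomFilter;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class BloomFilterDemo {
    public static void main(String[] args) {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        // Size the filter for the expected number of keys and an acceptable false-positive rate
        RBloomFilter<String> filter = redisson.getBloomFilter("user:id:filter");
        filter.tryInit(1_000_000L, 0.05);   // ~1M expected insertions, 5% false positives

        // Pre-load the valid ids at startup (illustrative)
        filter.add("user:1001");

        // Before querying cache/DB, reject ids that definitely do not exist
        if (filter.contains("user:9999")) {
            System.out.println("may exist -> query cache, then DB");
        } else {
            System.out.println("definitely not present -> reject, protect the DB");
        }

        redisson.shutdown();
    }
}
```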

2. Cache breakdown

An expiration time is set for a hot key. When that key expires, a large number of concurrent requests for it arrive at the same moment, and these requests may instantly overwhelm the DB.

  • Solution 1: Mutex lock

When the cache misses, do not load the DB immediately. First set a mutex with a Redis command such as SETNX; if that succeeds, load the DB, rebuild the cache, and release the mutex. Otherwise, wait briefly and retry the cache read.
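
A sketch of this mutex approach using Jedis; the key names, TTLs, and the loadFromDb stub are illustrative:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class MutexCacheLoader {
    private final Jedis jedis = new Jedis("127.0.0.1", 6379);

    public String queryWithMutex(String key) throws InterruptedException {
        String value = jedis.get(key);
        if (value != null) {
            return value;                                   // cache hit
        }
        String lockKey = "lock:" + key;
        // SET lockKey "1" NX EX 10 -> only one caller wins the mutex
        String ok = jedis.set(lockKey, "1", SetParams.setParams().nx().ex(10));
        if ("OK".equals(ok)) {
            try {
                value = loadFromDb(key);                    // rebuild from the database
                jedis.setex(key, 60, value);                // reset the cache
            } finally {
                jedis.del(lockKey);                         // release the mutex
            }
            return value;
        }
        Thread.sleep(50);                                   // lost the race: wait, then retry the cache
        return queryWithMutex(key);
    }

    private String loadFromDb(String key) { return "db-value"; }  // stand-in for the real DB query
}
```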

  • Solution 2: Logical expiration

①: When setting the key, store an expiration-time field together with the value in the cache, and do not set a TTL on the key itself.

②: When querying, after retrieving the data from Redis, check whether that logical time has expired.

③: If it has expired, start another thread to synchronize/rebuild the data, while the current thread returns the data it has (which may not be the latest).
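
A rough sketch of this logical-expiration flow; the RedisData class, the thread pool, and the stubbed Redis/lock helpers are all illustrative:

```java
import java.time.LocalDateTime;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class RedisData {
    Object data;                  // the cached business object
    LocalDateTime expireTime;     // logical expiry stored with the value; no Redis TTL is set
}

public class LogicalExpireCache {
    private static final ExecutorService REBUILD_POOL = Executors.newFixedThreadPool(4);

    public Object query(String key) {
        RedisData cached = getFromRedis(key);                       // key never expires in Redis itself
        if (cached.expireTime.isAfter(LocalDateTime.now())) {
            return cached.data;                                     // still logically valid
        }
        if (tryLock("lock:" + key)) {                               // only one thread rebuilds
            REBUILD_POOL.submit(() -> {
                try { rebuildCache(key); } finally { unlock("lock:" + key); }
            });
        }
        return cached.data;                                         // return stale data immediately
    }

    // Stubs standing in for real Redis access and a real distributed lock
    private RedisData getFromRedis(String key) {
        RedisData d = new RedisData();
        d.data = "cached-value";
        d.expireTime = LocalDateTime.now().minusSeconds(1);         // pretend it just expired
        return d;
    }
    private boolean tryLock(String key) { return true; }
    private void unlock(String key) { }
    private void rebuildCache(String key) { }
}
```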

If you choose strong data consistency, the distributed lock (mutex) solution is recommended; its performance is not as high because threads have to wait for the lock, and deadlocks are possible.

If you choose logical expiration of the key, you are prioritizing high availability and high performance, but data synchronization cannot achieve strong consistency.

3. Avalanche problem

It means that a large number of cache keys expire at the same time or the Redis service is down, resulting in a large number of requests reaching the database, bringing huge pressure.

The usual solution is to spread out cache expiration times, for example by adding a random value (say 1-5 minutes) to the base expiration time. This lowers the chance that many keys expire at the same moment and fail collectively.
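
A tiny sketch of adding jitter to the TTL; the base TTL and jitter range are illustrative:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class TtlJitter {
    // Base TTL plus a random 1-5 minutes, so keys written together do not all expire together
    public static long ttlWithJitterSeconds(long baseMinutes) {
        return TimeUnit.MINUTES.toSeconds(baseMinutes)
                + ThreadLocalRandom.current().nextLong(60, 301);
    }
}
```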

(2) Double-write consistency and persistence

1. Double-write consistency

Remember to introduce the business scenario first, whether the data has high consistency requirements or allows delay.  

This is the problem of keeping the Redis cache and the database data in sync: when the database data is modified, the cached data must also be updated, so that cache and database remain consistent.

  • Solution 1: Read-write lock.

We use the read-write lock implemented by Redisson. Reads take a shared lock, which allows concurrent reads while making reads and writes mutually exclusive. Updates take an exclusive (write) lock, mutually exclusive with both reads and writes, so no other thread can read while data is being written, which avoids dirty data. Note that the read method and the write method must use the same lock.
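
A sketch with Redisson's RReadWriteLock; the lock names, lease times, and cache/DB helpers are illustrative, and both methods deliberately use the same lock name:

```java
import org.redisson.api.RReadWriteLock;
import org.redisson.api.RedissonClient;

import java.util.concurrent.TimeUnit;

public class StockCache {
    private final RedissonClient redisson;

    public StockCache(RedissonClient redisson) { this.redisson = redisson; }

    public String read(String key) {
        RReadWriteLock rw = redisson.getReadWriteLock("lock:" + key);
        rw.readLock().lock(10, TimeUnit.SECONDS);        // shared lock: reads do not block each other
        try {
            return readFromCacheOrDb(key);
        } finally {
            rw.readLock().unlock();
        }
    }

    public void write(String key, String value) {
        RReadWriteLock rw = redisson.getReadWriteLock("lock:" + key);
        rw.writeLock().lock(10, TimeUnit.SECONDS);       // exclusive lock: blocks both readers and writers
        try {
            updateDb(key, value);
            updateCache(key, value);
        } finally {
            rw.writeLock().unlock();
        }
    }

    private String readFromCacheOrDb(String key) { return "value"; }  // stand-ins for real cache/DB access
    private void updateDb(String key, String value) { }
    private void updateCache(String key, String value) { }
}
```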

  • Solution 2: Delay double deletion.

For a write operation we first delete the data in the cache, then update the database, and finally delete the cached data again after a delay. The length of that delay is hard to determine, and dirty data may appear during the delay window, so strong consistency is not guaranteed; we did not use it.

  • Solution 3: Use Canal. Canal subscribes to the MySQL binlog, so when the database changes the cache can be updated asynchronously without the business code touching the cache itself.

2. Persistence

Redis provides two methods of data persistence: 1. RDB 2. AOF

The full name of RDB is Redis Database Backup file (Redis data backup file), also called Redis data snapshot. To put it simply, all the data in the memory is recorded to the disk. When the Redis instance fails and restarts, the snapshot file is read from the disk and the data is restored.

AOF stands for Append Only File. Every write command processed by Redis is appended to the AOF file, so it can be regarded as a command log file. When a Redis instance needs to recover data after a crash, the commands in this file are re-executed to rebuild the data.

   Which of these two methods recovers faster?

Because the RDB file is binary, it is relatively small and restores faster, but it may lose data. In our projects we usually use AOF to restore data: AOF restores more slowly, but the risk of data loss is much smaller. You can set the disk-flush policy for the AOF file; we set it to flush batched write commands once per second (appendfsync everysec).

(3) Data expiration and elimination strategy

1. Data expiration

Redis sets the validity time of the data. After the data expires, the data needs to be deleted from the memory. Deletion can be carried out according to different rules. This deletion rule is called the data deletion policy (data expiration policy).

  • Lazy deletion: After setting the expiration time of the key, we leave it alone. When the key is needed, we check whether it has expired. If it expires, we delete it, otherwise the key is returned.

  • Regular deletion: Every once in a while, we check some keys and delete the expired keys (take a certain number of random keys from a certain number of databases for inspection, and delete the expired keys).

SLOW mode is a scheduled task. The execution frequency defaults to 10 Hz, and each run takes no more than 25 ms. This frequency can be adjusted by modifying the hz option in the redis.conf configuration file.

The execution frequency of FAST mode is not fixed. Each event loop will try to execute, but the interval between two times is not less than 2ms, and each time takes no more than 1ms.

 Redis's expired deletion strategy: lazy deletion + regular deletion are used together.

2. Elimination strategy

When Redis runs out of memory and a new key is added, Redis deletes data from memory according to certain rules; these deletion rules are called the memory eviction (elimination) policy.

  • LRU (Least Recently Used): current time minus the last access time; the larger the value, the higher the eviction priority.

  • LFU (Least Frequently Used): the access frequency of each key is counted; the smaller the value, the higher the eviction priority.

  1. Prefer the allkeys-lru policy. Take full advantage of the LRU algorithm and keep the most recently accessed data in the cache. If the business has obvious distinction between hot and cold data, it is recommended to use it.

  2. If there is little difference in data access frequency in the business and there is no obvious distinction between hot and cold data, it is recommended to use allkeys-random and randomly select and eliminate them.

  3. If the business needs to pin certain data in the cache, you can use the volatile-lru strategy and not set an expiration time on the pinned data; such data will never be deleted, while other keys that do have expiration times are evicted.

  4. If there is short-term and high-frequency access data in the business, you can use the allkeys-lfu or volatile-lfu policy.

2. Distributed lock

  • How to implement Redis distributed lock?

Redis provides the SETNX command (SET if Not eXists). Because Redis processes commands in a single thread, once one client has set a given key with this command, other clients cannot set that key until it expires or is deleted.

  • So how do you control the effective duration of distributed locks implemented by Redis?

The bare SETNX command makes the lock duration hard to control, so we used Redisson, a framework built on top of Redis. With Redisson you lock explicitly and can control both the lock's expiration (lease) time and the wait time. If the locked business has not finished yet, Redisson's watchdog mechanism periodically checks whether the current thread still holds the lock and, if so, extends the hold time. When the business finishes, you simply release the lock.
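
A sketch of Redisson locking where the watchdog applies; the lock name and wait time are illustrative, and leaving out the lease time is what enables automatic renewal:

```java
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;

import java.util.concurrent.TimeUnit;

public class StockDeductService {
    private final RedissonClient redisson;

    public StockDeductService(RedissonClient redisson) { this.redisson = redisson; }

    public void deduct(String productId) throws InterruptedException {
        RLock lock = redisson.getLock("lock:stock:" + productId);
        // Wait up to 10s to acquire; no leaseTime is given, so the watchdog keeps
        // renewing the default 30s lease while this thread still holds the lock.
        if (!lock.tryLock(10, TimeUnit.SECONDS)) {
            return;                                   // another client holds the lock
        }
        try {
            // ... business logic protected by the lock ...
        } finally {
            lock.unlock();                            // release when the business is done
        }
    }
}
```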

  • Is the distributed lock implemented by redisson reentrant?

Yes, it is reentrant, which helps avoid deadlock. Internally, reentrancy is implemented by checking whether the lock is already held by the current thread: if so, a counter is incremented, and each release decrements it by one. The lock data is stored in a Redis hash structure: the big key can be customized for your business, the small key (field) is the unique identifier of the current thread, and the value is that thread's reentry count.
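
A small illustration of reentrancy with Redisson (the lock name is illustrative): the same thread can lock the same lock again without deadlocking, and the reentry count rises and falls with each lock/unlock pair:

```java
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;

public class ReentrantLockDemo {
    public static void outer(RedissonClient redisson) {
        RLock lock = redisson.getLock("lock:order");
        lock.lock();                  // reentry count = 1
        try {
            inner(redisson);          // same thread acquires the same lock again
        } finally {
            lock.unlock();            // count back to 0, lock released
        }
    }

    private static void inner(RedissonClient redisson) {
        RLock lock = redisson.getLock("lock:order");
        lock.lock();                  // reentry count = 2, no deadlock
        try {
            // nested critical section
        } finally {
            lock.unlock();            // count back to 1
        }
    }
}
```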

3. Cluster

(1) Master-slave replication

The concurrency capability of single-node Redis has an upper limit. To further improve the concurrency capability of Redis, it is necessary to build a master-slave cluster to achieve separation of reading and writing.

Replication Id: replid for short, identifies the data set; the same id means the same data set. Each master has a unique replid, and slaves inherit the replid of their master node.

offset: the offset, which grows as data is written to repl_baklog. When the slave synchronizes, it also records its current offset; if the slave's offset is smaller than the master's, the slave's data lags behind the master and needs to be updated.

The process of master-slave synchronization data:

Full synchronization means that the slave node uses full synchronization when it establishes a connection with the master node for the first time. The process is as follows:

  1. The slave node requests the master node to synchronize data (replication id, offset)

  2. The master node determines whether this is the first request; if it is, it sends its version information (replication id and offset) to the slave node

  3. The master node executes bgsave to generate an RDB file and sends it to the slave node, which loads it.

  4. While the RDB is being generated and transferred, the master node records new write commands in a buffer (a log file) in command form

  5. Send the generated command log file to the slave node for synchronization

Incremental synchronization is used when a slave node restarts and its data has fallen behind; at that point the slave node requests the master node to synchronize data:

  1. The slave node requests synchronization from the master node. The master node determines that this is not the first request and obtains the slave's offset value.

  2. The master node obtains the data after the offset value from the command log and sends it to the slave node for data synchronization

(2) Sentinel mode

The Sentinel mechanism is used to realize automatic failure recovery of the master-slave cluster. The structure and functions of Sentinel are as follows:

  • Monitoring: Sentinel constantly checks that your master and slaves are working as expected.

  • Automatic failure recovery: if the master fails, Sentinel promotes one of the slaves to master. When the failed instance recovers, it follows the new master.

  • Notification: Sentinel acts as a service discovery source for the Redis client. When a cluster failover occurs, the latest information will be pushed to the Redis client.

How to solve the split brain in redis cluster?

A cluster split brain happens when the master node, the slave nodes, and the sentinels end up in different network partitions. The sentinels can no longer reach the master via heartbeat, so they elect and promote a slave to master, and there are now two masters, like a split brain. Clients may keep writing data to the old master while the new master cannot receive that data. When the network recovers, the sentinels demote the old master to a slave, and it then synchronizes from the new master, so the data written to it during the partition is lost.

Solution: we can adjust the Redis configuration to require a minimum number of connected slave nodes for writes and a maximum master-slave replication lag (the min-replicas-to-write and min-replicas-max-lag options). If these requirements are not met, write requests are rejected, which avoids losing large amounts of data.

(3) Sharded cluster

Master-slave and sentry can solve the problems of high availability and high concurrent reading. But there are still two problems that have not been solved: the problem of massive data storage and the problem of high concurrent writing.

Using sharded clusters can solve the above problems. Features of sharded clusters:

  • There are multiple masters in the cluster, and each master stores different data.

  • Each master can have multiple slave nodes

  • Masters monitor each other's health status through ping

  • Client requests can access any node in the cluster and will eventually be forwarded to the correct node.

The Redis sharded cluster introduces the concept of hash slots. The cluster has 16384 hash slots; each key is run through CRC16 and the result is taken modulo 16384 to determine which slot it belongs to. Each node in the cluster is responsible for a portion of the hash slots.
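
A quick way to see which slot a key maps to, assuming the Jedis utility class JedisClusterCRC16 is on the classpath; the key name is illustrative:

```java
import redis.clients.jedis.util.JedisClusterCRC16;

public class SlotDemo {
    public static void main(String[] args) {
        // slot = CRC16(key) mod 16384 decides which master in the cluster owns the key
        String key = "user:1001";
        int slot = JedisClusterCRC16.getSlot(key);
        System.out.println(key + " -> slot " + slot + " of 16384");
    }
}
```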

4. Why is Redis so fast?

  1. Completely memory-based, written in C language

  2. Use a single thread to avoid unnecessary context switching and race conditions

  3. Uses an I/O multiplexing model with non-blocking I/O

For example: bgsave and bgrewriteaof both perform operations in the background, do not affect the normal use of the main thread, and will not cause blocking.

Can you explain the I/O multiplexing model?

Redis operates purely in memory, so command execution is very fast; its performance bottleneck is network latency rather than execution speed. The I/O multiplexing model is mainly what makes network requests efficient.

The I/O multiplexing model means using a single thread to monitor multiple sockets at the same time and being notified when a socket becomes readable or writable, which avoids useless waiting and makes full use of CPU resources. The current implementation uses epoll: the kernel notifies the user process of which sockets are ready and writes the ready sockets into user space, so there is no need to traverse every socket to check readiness, which improves performance.

Redis network model:

2. MySQL 

1. Positioning slow query

Symptoms: Page loading is too slow, interface stress test response time is too long (more than 1s)

  1. Option 1: Open source tools

  • Debugging Tool: Arthas

  • Operation and maintenance tools: Prometheus, Skywalking

  2. Option 2: MySQL's built-in slow query log

    The slow query log records every SQL statement whose execution time exceeds the long_query_time threshold (unit: seconds, default 10 seconds). To enable the slow query log, configure it in the MySQL configuration file (/etc/my.cnf), for example by setting slow_query_log=1 and long_query_time=2.

 After the configuration is complete, restart the MySQL server, run the statements you want to test, and check the information recorded in the slow log file /var/lib/mysql/localhost-slow.log.

2. SQL execution plan

Suppose a SQL statement executes very slowly. How do we analyze it?

You can use the EXPLAIN (or DESC) command that comes with MySQL to obtain information about how MySQL executes a SELECT statement.

  1. Check whether the index is hit through key and key_len (whether the index itself is invalid)

  2. Use the type field to check whether there is room to further optimize the SQL, for example whether a full index scan or full table scan occurs.

  3. Use the Extra column to determine whether a table return has occurred; if so, try adding a covering index or changing the returned columns to fix it.

If a SQL statement executes very slowly, we usually use MySQL's built-in EXPLAIN to look at how the statement is executed. For example, key and key_len show whether an index was hit and, if one was created, whether it has become invalid. The type field shows whether there is room for further optimization, such as a full index scan or a full table scan. The Extra column shows whether a table return has occurred; if it has, we can try adding a (covering) index or changing the returned columns to fix it.
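
A small sketch of running EXPLAIN from Java and printing the fields discussed above; the connection URL, credentials, and the tb_user table are illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ExplainDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/demo", "root", "password");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "EXPLAIN SELECT * FROM tb_user WHERE name = 'tom'")) {
            while (rs.next()) {
                // type: access method; key/key_len: index actually used; Extra: table-return hints
                System.out.printf("type=%s key=%s key_len=%s Extra=%s%n",
                        rs.getString("type"), rs.getString("key"),
                        rs.getString("key_len"), rs.getString("Extra"));
            }
        }
    }
}
```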

3. Index

An index is an (ordered) data structure that helps MySQL retrieve data efficiently. Besides the data itself, the database system maintains data structures (B+ trees) that satisfy specific search algorithms. These structures reference (point to) the data in a way that allows efficient search algorithms to run on them; such a data structure is an index.

  • Index is a data structure that helps MySQL obtain data efficiently (ordered)

  • Improve the efficiency of data retrieval and reduce the IO cost of the database (no need for full table scan)

  • Sorting data through index columns reduces the cost of data sorting and reduces CPU consumption.

(1) Storage engine

The storage engine is the implementation of technologies such as storing data, building indexes, and updating/querying data. The storage engine is table-based, not library-based, so the storage engine can also be called a table type.

There are many storage engines provided in mysql, the more common ones are InnoDB, MyISAM, Memory

  • InnoDB storage engine is the default engine after mysql5.5. It supports transactions, foreign keys, table-level locks and row-level locks.

  • MyISAM is an early engine. It does not support transactions, only table-level locks, and no foreign keys. It is not used much.

  • Memory mainly stores data in memory, supports table-level locks, has no foreign keys and transactions, and is not used much.

(2) Index underlying data structure

MySQL's InnoDB engine uses the B+ tree data structure to store indexes

  • The B+ tree has a higher order (more children per node), so the tree is shorter and search paths are shorter

  • Disk read/write cost is lower in a B+ tree: non-leaf nodes store only pointers, and leaf nodes store the data

  • B+ trees are convenient for scans and range queries, because the leaf nodes form a doubly linked list

(3) Clustered and non-clustered indexes

  • If a primary key exists, the primary key index is a clustered index.

  • If no primary key exists, the first unique (UNIQUE) index will be used as the clustered index.

  • If the table does not have a primary key, or a suitable unique index, InnoDB will automatically generate a rowid as a hidden clustered index.

The clustered index mainly refers to putting the data and the index together. The leaf nodes of the B+ tree store the entire row of data , and there is only one. Generally, the primary key is used as the clustered index.

A non-clustered (secondary) index stores the data and the index separately: the leaf nodes of its B+ tree store the corresponding primary key. There can be more than one, and the indexes we define ourselves are generally non-clustered indexes.

A table return (back-to-table) query uses a secondary index to find the corresponding primary key value and then looks up the entire row in the clustered index; this process is the table return.

A covering index means that the query uses an index and all the columns that need to be returned can be found in the index.

A covering index means that the SELECT statement uses an index and all of the returned columns can be found in that index. If we query by id, the lookup goes directly to the clustered index: a single index scan returns the data, with high performance. If we query through a secondary index and the returned columns are not all contained in that index, a table return may be triggered; so try to avoid SELECT * and try to keep the returned columns within the indexed fields.

Handling very deep pagination in MySQL: deep pagination usually appears when the data volume is large and we page with LIMIT while also sorting, which becomes very inefficient. We can solve it with a covering index plus a subquery: first page through only the id column of the data, and once the ids are determined, use a subquery to fetch only the rows in that id list. Because the id-only paging is satisfied by a covering index, efficiency improves a lot.
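
A sketch of this id-first pagination from Java; the tb_order table, its columns, and the offset are illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class DeepPagingDemo {
    public static void main(String[] args) throws SQLException {
        // Inner query: page through ids only, which a covering index can satisfy.
        // Outer query: fetch the full rows just for those ids via the clustered index.
        String sql = "SELECT o.* FROM tb_order o "
                + "JOIN (SELECT id FROM tb_order ORDER BY id LIMIT 9000000, 10) t "
                + "ON o.id = t.id";
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/demo", "root", "password");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getLong("id"));
            }
        }
    }
}
```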

(4) Index Creation Principles

Generally there are the following principles: primary key index, unique index, index created based on business (composite index)

  1. Tables with large amounts of data and frequent queries (generally more than 100,000 rows)

  2. Often used as a field for query conditions, sorting, and grouping

  3. Field content is highly differentiated

  4. If the content is long, use prefix index

  5. Use composite (multi-column) indexes where appropriate

  6. Control the number of indexes

  7. If an indexed column cannot store NULL values, constrain it using NOT NULL when creating the table

(5) Index failure scenario

  • Violation of the leftmost prefix rule

  • Range query on the right column, index cannot be used

  • Performing operations (functions or calculations) on an indexed column invalidates the index.

  • Strings not enclosed in single quotes cause implicit type conversion, and the index fails.

  • Like fuzzy query starting with %, index failure

4. SQL optimization experience

(1) Table design optimization

  • For example, choose an appropriate numeric type (tinyint, int, bigint) according to the actual value range.

  • For example, choose an appropriate string type: char is fixed-length and more efficient, while varchar is variable-length and slightly less efficient.

(2) SQL statement optimization

  • SQL statement optimization SELECT statement must specify the field name (avoid using select * directly)

  • Avoid writing SQL statements in ways that cause index failure

  • Try to use UNION ALL instead of UNION; UNION performs extra deduplication and is less efficient.

  • Avoid expression operations on fields in where clauses

  • For join optimization, use inner join instead of left join or right join where possible. If an outer join must be used, make sure the small table drives the large one. With inner joins the optimizer rearranges the two tables, putting the small table on the outside and the large table on the inside; with left join or right join, the order is not re-adjusted.

(3) Master-slave replication, read-write separation

Master-slave replication with read-write separation: if the database workload involves a lot of reads, a read-write separation architecture can be adopted to avoid the performance impact of write operations on queries. Separating reads and writes keeps writes from dragging down query efficiency.

(4) Sub-database and sub-table

Timing of sub-database sub-table:

  1. The premise is that the project's business data keeps growing or the business is developing rapidly (a single table reaches 10 million rows or 20 GB)

  2. Optimization can no longer solve performance problems (master-slave read-write separation, query index...)

  3. IO bottleneck (disk IO, network IO), CPU bottleneck (aggregation query, too many connections)

Split strategy:


  1. Horizontal database splitting splits the data of one database into multiple databases to solve the problem of massive data storage and high concurrency.

  2. Horizontally split tables to solve single table storage and performance issues

  3. Vertical sub-database: split by business, which under high concurrency increases the available disk IO and network connections (commonly used with microservices)

  4. Vertical table splitting, hot and cold data separation, multiple tables do not affect each other

When doing horizontal splitting, mycat middleware is generally used to divide databases and tables:

 Mycat middleware can solve the problems encountered when sharding databases and tables, such as:

  • Distributed transaction consistency issues

  • Cross-node related query

  • Cross-node paging and sorting functions

  • Duplicate primary key

5. Transactions

(1) Transaction characteristics

A transaction is a set of operations, an indivisible unit of work. A transaction commits or rolls back its operations to the system as a whole, that is, the operations either all succeed or all fail together.

ACID properties of transactions:

  • Atomicity : A transaction is an indivisible minimum unit of operations, either all succeed or all fail.

  • Consistency : When a transaction is completed, all data must be in a consistent state.

  • Isolation : The isolation mechanism provided by the database system ensures that transactions run in an independent environment that is not affected by external concurrent operations.

  • Durability : Once a transaction is committed or rolled back, its changes to the data in the database are permanent.

redo log: records the physical changes of the data page. If the service is down, it can be used to synchronize the data.

undo log: records the logical log. When the transaction is rolled back, the original data is restored through the reverse operation.

The redo log ensures the durability of the transaction, and the undo log ensures the atomicity and consistency of the transaction.

(2) Isolation level

Concurrent transaction issues: dirty reads, non-repeatable reads, phantom reads

Solution: Isolate transactions

Isolation levels: read uncommitted, read committed, repeatable read, serializable

 Note: The higher the transaction isolation level, the more secure the data, but the lower the performance.

 What guarantees the isolation of transactions?

  • Lock: exclusive lock (if a transaction acquires an exclusive lock on a data row, other transactions cannot acquire other locks on the row)

  • MVCC: multi-version concurrency control

(3)MVCC

The full name is Multi-Version Concurrency Control, multi-version concurrency control. Refers to maintaining multiple versions of a data, so that read and write operations do not conflict. The specific implementation of MVCC mainly depends on the implicit fields in the database records , undo log logs , and readView .

  • Hidden fields:

  1. trx_id (transaction id), records the transaction id of each operation, is auto-incrementing

  2. roll_pointer (rollback pointer), points to the transaction version record address of the previous version

  • undo log:

  1. The rollback log, which stores old versions of the data

  2. Version chain: Multiple transactions operate a certain row of records in parallel, record the versions of data modified by different transactions, and form a linked list through the roll_pointer pointer.

  • readView solves the problem of selecting a version of a transaction query

  1. Determine which version of data should be accessed based on the matching rules of readView and some current transaction IDs

  2. Snapshot reads are different at different isolation levels, and the final access results are different.

RC: ReadView is generated every time a snapshot read is performed.

RR: ReadView is only generated when a snapshot read is performed for the first time in a transaction and is reused later.

6. Master-slave synchronization principle

 The core of MySQL master-slave replication is the binary log.

The binary log (BINLOG) records all DDL (data definition language) statements and DML (data manipulation language) statements, but does not include data query (SELECT, SHOW) statements.

Copying is divided into three steps:

  1. When the Master database commits a transaction, it will record the data changes in the binary log file Binlog.

  2. The slave library reads the master library's binary log file (Binlog) and writes it to the slave's relay log (Relay Log).

  3. The slave replays the events in the relay log, applying the changes to its own data.


Origin blog.csdn.net/yueyue763184/article/details/132456364