Alibaba, Tencent, and Meituan's favorite MySQL and Redis interview questions: a collection for experienced-hire, spring, and autumn recruitment!

What is MySQL?

  • MySQL is a relational database that is very commonly used in Java enterprise development, because MySQL is open source, free, and easy to scale. Alibaba's database systems also make heavy use of MySQL, so its stability is proven. Because MySQL is open source, anyone can download it under the GPL (General Public License) and modify it to their own needs. MySQL's default port number is 3306.

The difference between MyISAM and InnoDB

  • MyISAM was MySQL's default storage engine before version 5.5. Its performance is excellent and it provides many features, including full-text indexing, compression, and spatial functions, but MyISAM does not support transactions or row-level locks, and its most serious flaw is that it cannot recover safely after a crash. From version 5.5 onward, MySQL ships InnoDB (a transactional storage engine), and after version 5.5 the default storage engine is InnoDB.
  • Most of the time we use the InnoDB storage engine, but in some cases MyISAM is also appropriate, such as read-intensive workloads (provided you can live with MyISAM's crash-recovery issues).

A comparison of the two (see the sketch after this list):

  • Whether row-level locking is supported: MyISAM only offers table-level locking, while InnoDB supports both row-level and table-level locking, defaulting to row-level.
  • Whether transactions and safe crash recovery are supported: MyISAM emphasizes performance, each query is atomic, and it executes faster than InnoDB, but it provides no transaction support. InnoDB provides advanced database features such as transactions and foreign keys; its transaction-safe (ACID-compliant) tables support commit, rollback, and crash recovery.
  • Whether foreign keys are supported: MyISAM does not support them; InnoDB does.
  • Whether MVCC is supported: only InnoDB supports it. For highly concurrent transactions, MVCC is more effective than simple locking; MVCC only operates under the READ COMMITTED and REPEATABLE READ isolation levels; MVCC can be implemented with optimistic or pessimistic locking; and the implementation of MVCC differs from database to database.
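
To make the comparison concrete, here is a minimal JDBC sketch showing how the storage engine is chosen per table with the ENGINE clause. The connection URL, credentials, and table definitions are hypothetical; note the default port 3306 in the URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class EngineDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; MySQL listens on port 3306 by default.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "root", "password");
             Statement stmt = conn.createStatement()) {
            // The storage engine is selected per table via the ENGINE clause.
            stmt.execute("CREATE TABLE logs (id INT PRIMARY KEY, msg VARCHAR(255)) ENGINE=MyISAM");
            stmt.execute("CREATE TABLE orders (id INT PRIMARY KEY, amount DECIMAL(10,2)) ENGINE=InnoDB");
        }
    }
}
```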

Character set and collation rules

  • A character set is a mapping from binary codes to a set of character symbols, and collation rules are the sorting rules under a given character set. Each character set in MySQL corresponds to a set of collation rules.
  • MySQL specifies character-set defaults in an inheritance-like fashion: each database and each table has its own default, inherited layer by layer. For example, the default character set of every table in a database is the one specified for that database, and a table falls back to that default only when no character set is specified for it explicitly (see the sketch below).
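
A minimal sketch of this inheritance of defaults, assuming a local MySQL server and hypothetical names: the database sets a default character set, and a table either inherits it or overrides it.

```java
import java.sql.*;

public class CharsetDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/", "root", "password");
             Statement stmt = conn.createStatement()) {
            // Database-level default: inherited by tables that do not override it.
            stmt.execute("CREATE DATABASE demo CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci");
            // Table-level override: this table ignores the database default.
            stmt.execute("CREATE TABLE demo.t1 (name VARCHAR(50)) CHARACTER SET latin1");
        }
    }
}
```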

Talking about indexes

  • The data structures used by MySQL indexes are mainly BTree indexes and hash indexes. For a hash index, the underlying data structure is a hash table, so when most queries are single-record lookups, a hash index gives the fastest query performance; for other scenarios, a BTree index is the better choice.

MySQL's BTree index is actually implemented with a B+Tree, but the two main storage engines implement it differently.

  • MyISAM: the data field of a B+Tree leaf node stores the address of the data record. During a lookup, the index is searched with the B+Tree algorithm; if the key exists, the value of its data field is read, and that value is then used as the address to fetch the data record. This is called a "non-clustered index".
  • InnoDB: the data file itself is the index file. Unlike MyISAM, where index files and data files are separate, the table's data file is itself a B+Tree-organized index structure whose leaf-node data fields hold the complete data records. The key of this index is the table's primary key, so the InnoDB data file itself is the primary index. This is called a "clustered index". All other indexes are secondary indexes, whose data fields store the primary-key value of the corresponding record rather than an address, which is another difference from MyISAM. A lookup through the primary index finds the node holding the key and reads the data directly; a lookup through a secondary index first retrieves the primary-key value and then walks the primary index. Therefore, when designing a table, it is not recommended to use overly long fields as the primary key, nor non-monotonic fields, since these cause frequent splits of the primary index. A table-definition sketch follows.
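
A sketch under those guidelines, with hypothetical names: a short, monotonically increasing primary key serves as the clustered index, and a secondary index whose leaves store only the primary key.

```java
import java.sql.*;

public class IndexDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/app", "root", "password");
             Statement stmt = conn.createStatement()) {
            // id is the clustered (primary) index: its leaf nodes hold the full row.
            // idx_email is a secondary index: its leaves store the primary key (id),
            // so a lookup by email finds id first, then revisits the clustered index.
            stmt.execute("CREATE TABLE users ("
                       + "  id BIGINT AUTO_INCREMENT PRIMARY KEY,"  // short, monotonic key
                       + "  email VARCHAR(100),"
                       + "  name VARCHAR(50),"
                       + "  INDEX idx_email (email)"
                       + ") ENGINE=InnoDB");
        }
    }
}
```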

What is a transaction?

  • A transaction is a logical set of operations, either all of them are executed or none of them are executed.
  • The classic example of a transaction is a bank transfer. If Xiao Ming wants to transfer 1,000 yuan to Xiao Hong, the transfer involves two key operations: decrease Xiao Ming's balance by 1,000 yuan and increase Xiao Hong's balance by 1,000 yuan. If something goes wrong between these two operations, such as the banking system crashing, Xiao Ming's balance would decrease while Xiao Hong's never increases, which would be wrong. A transaction guarantees that these two key operations either both succeed or both fail, as in the sketch below.
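
A minimal JDBC sketch of this transfer, with hypothetical table and column names: both updates commit together or roll back together.

```java
import java.sql.*;

public class TransferDemo {
    public static void transfer(Connection conn, long from, long to, int amount) throws SQLException {
        try {
            conn.setAutoCommit(false); // start the transaction
            try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                         "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setInt(1, amount);  debit.setLong(2, from);  debit.executeUpdate();
                credit.setInt(1, amount); credit.setLong(2, to);   credit.executeUpdate();
            }
            conn.commit();   // both updates take effect together ...
        } catch (SQLException e) {
            conn.rollback(); // ... or neither takes effect
            throw e;
        } finally {
            conn.setAutoCommit(true);
        }
    }
}
```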

Four characteristics of transactions (ACID)

  • Atomicity: a transaction is the smallest unit of execution and cannot be divided. Atomicity ensures that its actions either complete in full or have no effect at all;
  • Consistency: the data remains consistent before and after the transaction executes, and multiple transactions reading the same data see the same result;
  • Isolation: under concurrent access to the database, one user's transaction is not disturbed by other transactions; the database is independent between concurrent transactions;
  • Durability: once a transaction is committed, its changes to the data in the database are permanent, and even a database failure should not affect them.

What problems do concurrent transactions bring?

In a typical application, multiple transactions run concurrently and often manipulate the same data to complete their respective tasks (multiple users operating on the same data). Concurrency is necessary, but it can cause the following problems.

  • Dirty read: one transaction is accessing and modifying data, but the modification has not yet been committed to the database, when another transaction also reads and uses that data. Because the data has not been committed, what the second transaction read is "dirty data", and operations based on dirty data may be incorrect.
  • Lost modification: while one transaction reads a piece of data, another transaction also accesses it; after the first transaction modifies the data, the second transaction modifies it too, so the first transaction's modification is lost. For example: transaction 1 reads A=20 from a table, transaction 2 also reads A=20, transaction 1 sets A=A-1, transaction 2 also sets A=A-1; the final result is A=19 and transaction 1's modification is lost.
  • Non-repeatable read: one transaction reads the same data multiple times, and before it finishes, another transaction also accesses and modifies that data. Between the first transaction's two reads, the second transaction's modification may make the two results differ. Reading different values for the same data twice inside one transaction is therefore called a non-repeatable read.
  • Phantom read: similar to a non-repeatable read. It occurs when one transaction (T1) reads some rows, and another concurrent transaction (T2) then inserts new rows. In the subsequent query, T1 finds records that did not exist before, as if an illusion had occurred, hence the name phantom read.

The difference between non-repeatable read and phantom read:

  • The point of a non-repeatable read is modification: reading a record multiple times, you find that the values of some columns have changed. The point of a phantom read is insertion or deletion: reading a range multiple times, you find that the number of records has increased or decreased.

What are the transaction isolation levels? What is the default isolation level of MySQL?

  • READ-UNCOMMITTED: the lowest isolation level; uncommitted data changes may be read, which can cause dirty reads, phantom reads, and non-repeatable reads.
  • READ-COMMITTED: allows reading only data already committed by concurrent transactions. Dirty reads are prevented, but phantom reads and non-repeatable reads may still occur.
  • REPEATABLE-READ: multiple reads of the same field within a transaction return consistent results unless the data is modified by the transaction itself. Dirty reads and non-repeatable reads are prevented, but phantom reads may still occur.
  • SERIALIZABLE: the highest isolation level, fully compliant with ACID. All transactions execute one after another, so interference between transactions is impossible; this level prevents dirty reads, non-repeatable reads, and phantom reads.

    The default isolation level of the MySQL InnoDB storage engine is REPEATABLE-READ. We can view it with the SELECT @@tx_isolation; command, as in the sketch below.
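
A small sketch of checking and changing the isolation level from Java. Connection details are hypothetical; note that @@tx_isolation was removed in MySQL 8.0 in favor of @@transaction_isolation.

```java
import java.sql.*;

public class IsolationDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "root", "password")) {
            // Ask the server for its current isolation level (pre-8.0 variable name).
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT @@tx_isolation")) {
                if (rs.next()) System.out.println("current level: " + rs.getString(1));
            }
            // Change the level for this connection via JDBC.
            conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        }
    }
}
```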

Locking mechanism and InnoDB lock algorithm

Locks used by MyISAM and InnoDB storage engines:

  • MyISAM uses table-level locking.
  • InnoDB supports row-level locking and table-level locking, and defaults to row-level locking.

A comparison of table-level and row-level locks:

  • Table-level lock: the lock with the largest granularity in MySQL. It locks the entire table being operated on; it is simple to implement, consumes few resources, locks quickly, and cannot deadlock. But because its granularity is largest, the probability of lock conflicts is highest and concurrency is lowest. Both the MyISAM and InnoDB engines support table-level locks.
  • Row-level lock: the lock with the smallest granularity in MySQL; it locks only the rows involved in the current operation. Row-level locks greatly reduce conflicts between database operations. Their granularity is smallest and concurrency highest, but the locking overhead is also largest, locking is slow, and deadlocks can occur.

There are three lock algorithms for the InnoDB storage engine:

  • Record lock: a lock on a single record
  • Gap lock: locks a range, excluding the records themselves
  • Next-key lock: record + gap; locks a range, including the record itself

Related knowledge points:

  • InnoDB uses next-key locks for row queries
  • Next-key locking exists to solve the phantom problem
  • When the queried index has a unique attribute, the next-key lock is downgraded to a record lock
  • Gap locks exist to prevent multiple transactions from inserting records into the same range, which would cause phantom reads
  • There are two ways to explicitly disable gap locks (after which, except for foreign-key constraints and uniqueness checks, only record locks are used): A. set the transaction isolation level to RC; B. set the parameter innodb_locks_unsafe_for_binlog to 1. A locking sketch follows.
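
A sketch with a hypothetical items table: under REPEATABLE READ, a range query with FOR UPDATE takes next-key locks over the scanned range, while an equality lookup on a unique index would be downgraded to a record lock.

```java
import java.sql.*;

public class NextKeyLockDemo {
    public static void reserveRange(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id, stock FROM items WHERE id BETWEEN 10 AND 20 FOR UPDATE")) {
            // Under REPEATABLE READ, this range scan takes next-key locks, so other
            // transactions cannot insert ids inside the range until we commit.
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) { /* ... update stock ... */ }
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```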

Limit the scope of the data

  • Always prohibit query statements that carry no condition limiting the data range. For example, when a user queries their order history, we can restrict it to a certain window, as sketched below;
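
A sketch of a range-bounded query; the table, columns, and the one-month window are hypothetical.

```java
import java.sql.*;

public class BoundedQueryDemo {
    // Never issue an unbounded scan; constrain the range and cap the row count.
    public static void printRecentOrders(Connection conn, long userId) throws SQLException {
        String sql = "SELECT id, amount FROM orders "
                   + "WHERE user_id = ? AND created_at >= NOW() - INTERVAL 1 MONTH "
                   + "ORDER BY created_at DESC LIMIT 100";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + " " + rs.getBigDecimal("amount"));
                }
            }
        }
    }
}
```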

Read/write separation

  • In the classic database-splitting scheme, the primary database handles writes and the replica databases handle reads, as in the sketch below;
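
A naive routing sketch, assuming two hypothetical hosts, db-primary and db-replica: writes get connections from the primary, reads from the replica.

```java
import java.sql.*;

public class ReadWriteSplitDemo {
    // Writes are routed to the primary instance.
    static Connection writer() throws SQLException {
        return DriverManager.getConnection("jdbc:mysql://db-primary:3306/app", "root", "password");
    }
    // Reads are routed to a replica instance.
    static Connection reader() throws SQLException {
        return DriverManager.getConnection("jdbc:mysql://db-replica:3306/app", "root", "password");
    }
}
```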

Vertical partition

  • Split according to the relatedness of columns in a table. For example, if the user table contains both the user's login information and the user's basic profile, it can be split into two separate tables, and the two can even be placed in separate databases. Simply put, vertical splitting splits a table's columns: a table with many columns becomes several tables (see the sketch after this list).
  • Advantages of vertical splitting: row data becomes smaller, fewer blocks need to be read per query, and the number of I/Os decreases. In addition, vertical partitioning can simplify table structure and ease maintenance.
  • Disadvantages of vertical splitting: the primary key becomes redundant, redundant columns must be managed, and it induces join operations (which can be avoided by joining in the application layer). In addition, vertical partitioning makes transactions more complicated;
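
A sketch of such a split with hypothetical names: login columns and profile columns live in two tables that share user_id.

```java
import java.sql.*;

public class VerticalSplitDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/app", "root", "password");
             Statement stmt = conn.createStatement()) {
            // Login columns and profile columns are joined, when needed, on user_id.
            stmt.execute("CREATE TABLE user_login (user_id BIGINT PRIMARY KEY, "
                       + "username VARCHAR(50), password_hash CHAR(64))");
            stmt.execute("CREATE TABLE user_profile (user_id BIGINT PRIMARY KEY, "
                       + "nickname VARCHAR(50), avatar_url VARCHAR(255))");
        }
    }
}
```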

Horizontal partition

  • Keep the table structure unchanged and partition the data rows by some strategy. Each chunk of data is scattered across different tables or databases, achieving a distributed goal. Horizontal splitting can support very large data volumes.
  • Horizontal splitting splits a table's rows. When a table exceeds about 2 million rows it slows down, and at that point its data can be divided across multiple tables. For example, we can split the user-information table into several user-information tables, avoiding the performance impact of too much data in a single table (see the routing sketch after this note).

    Horizontal splitting can support very large data volumes. One caveat: splitting into multiple tables only solves the problem of a single table holding too much data; since the data still lives on the same machine, it does little for MySQL's concurrency, so it is best to split across databases as well.
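
A minimal sketch of modulo-hash routing across a hypothetical set of four user_info_N tables:

```java
public class ShardRouter {
    private static final int TABLE_COUNT = 4; // user_info_0 .. user_info_3 (hypothetical)

    // Route a user id to its shard table by simple modulo hashing.
    public static String tableFor(long userId) {
        return "user_info_" + (userId % TABLE_COUNT);
    }

    public static void main(String[] args) {
        System.out.println(tableFor(1001L)); // prints user_info_1
    }
}
```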

Explain what the pooling design idea is. What is a database connection pool? Why do you need a database connection pool?

  • Pooling is hardly a new idea. Things we use all the time, such as the Java thread pool, the JDBC connection pool, and the Redis connection pool, are representative implementations of this kind of design. The design pre-allocates resources up front; the problem it solves is amortizing the cost of acquiring a resource each time, such as the overhead of creating a thread or establishing a remote connection. It is like getting food at a canteen: the staff fill boxes of rice in advance, so when you arrive you just add your dishes to the box and go, instead of waiting for rice to be scooped on the spot, which is far more efficient. Besides initializing resources, a pooling design is also concerned with characteristics such as the pool's initial size, its active size, and its maximum size, which map directly onto member properties of the Java thread pool and of database connection pools. (This introduction to the pooling idea is quoted directly, to avoid reinventing the wheel.)
  • A database connection is essentially a socket connection. The database server also maintains caches and user permission information for it, so each connection occupies memory. We can think of a database connection pool as a cache of maintained database connections that can be reused by future requests to the database. Opening and maintaining a database connection for every user, especially for dynamic database-driven web applications, is expensive and wastes resources. With a connection pool, a connection is placed into the pool after it is created and used again later, so a new connection need not be created each time; if all connections are in use, a new one is created and added to the pool. The pool also reduces the time users must wait to establish a connection to the database. A pool-configuration sketch follows.
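
As a concrete example, here is a minimal sketch using HikariCP, one popular JDBC connection pool (the URL and credentials are hypothetical); the min/max settings correspond to the pool characteristics mentioned above.

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;

public class PoolDemo {
    public static void main(String[] args) throws Exception {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/app"); // hypothetical database
        config.setUsername("root");
        config.setPassword("password");
        config.setMinimumIdle(5);      // the pool's "initial/active" size
        config.setMaximumPoolSize(20); // the pool's maximum size

        try (HikariDataSource ds = new HikariDataSource(config);
             Connection conn = ds.getConnection()) { // borrowed from the pool, not newly created
            // ... use the connection; close() returns it to the pool
        }
    }
}
```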

Introduction to redis

  • Simply put, Redis is a database, but unlike traditional databases, Redis keeps its data in memory, so reads and writes are very fast, which is why Redis is widely used for caching. In addition, Redis is often used for distributed locks. It provides a variety of data types to support different business scenarios, and it additionally supports transactions, persistence, Lua scripts, LRU eviction, and several cluster schemes. A basic usage sketch follows.
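
A minimal Jedis sketch, assuming a local Redis server on the default port 6379, touching a couple of the data types:

```java
import redis.clients.jedis.Jedis;

public class RedisDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.set("user:1:name", "Xiao Ming");          // simple k/v
            jedis.lpush("recent:orders", "o-100", "o-101"); // list type
            jedis.expire("recent:orders", 3600);            // TTL in seconds
            System.out.println(jedis.get("user:1:name"));
        }
    }
}
```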

Why use redis/why use cache

  • This question is mainly considered from two angles: "high performance" and "high concurrency".

High performance:

  • Suppose a user accesses some data in the database for the first time. The process is slow because it reads from disk. If we store the accessed data in the cache, the next access can fetch it straight from the cache. Operating on the cache means operating directly on memory, which is very fast. If the corresponding data in the database changes, the corresponding cache entry can be updated in step. A cache-aside sketch follows.
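
A sketch of this cache-aside pattern with Jedis; loadFromDb stands in for a hypothetical slow database query.

```java
import redis.clients.jedis.Jedis;

public class CacheAside {
    public static String getUser(Jedis jedis, long id) {
        String key = "user:" + id;
        String cached = jedis.get(key);
        if (cached != null) {
            return cached;               // fast path: served from memory
        }
        String fromDb = loadFromDb(id);  // slow path: hits the disk-backed database
        jedis.setex(key, 3600, fromDb);  // populate the cache for the next access
        return fromDb;
    }

    private static String loadFromDb(long id) { return "user-" + id; } // stand-in
}
```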

High concurrency:

  • The number of requests the cache can serve directly is far greater than what direct database access can handle, so we can consider moving part of the data from the database into the cache; part of the users' requests then go straight to the cache without touching the database.

Why use redis instead of map/guava for caching?

  • Caches divide into local caches and distributed caches. Taking Java as an example, using a plain map or Guava is a local cache: its main traits are being lightweight and fast, its lifecycle ends when the JVM is destroyed, and with multiple instances each instance must keep its own copy of the cache, so the caches are not consistent with each other (see the sketch below).
  • Using Redis or Memcached is called a distributed cache: with multiple instances, all instances share one copy of the cached data, and the cache is consistent. The drawback is the need to keep the Redis or Memcached service highly available, which complicates the overall architecture.
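
A local-cache sketch using Guava (the size and TTL values are arbitrary): fast, but private to this JVM.

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

public class LocalCacheDemo {
    public static void main(String[] args) {
        // A JVM-local Guava cache: each application instance keeps its own copy.
        Cache<String, String> cache = CacheBuilder.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(10, TimeUnit.MINUTES)
                .build();
        cache.put("k", "v");
        System.out.println(cache.getIfPresent("k")); // "v", or null after eviction/expiry
    }
}
```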

The difference between redis and memcached

  • Redis supports richer data types (and hence more complex application scenarios): besides simple k/v data, it provides storage for list, set, zset, hash, and other structures. Memcached supports only the simple String type.
  • Redis supports persistence: it can keep in-memory data on disk and reload it after a restart, whereas Memcached stores all data purely in memory.
  • Cluster mode: Memcached has no native cluster mode and relies on the client to shard data across a cluster; Redis supports cluster mode natively.
  • Memcached uses a multi-threaded, non-blocking IO multiplexing network model; Redis uses a single-threaded IO multiplexing model.

redis transaction

  • Redis implements transactions through commands such as MULTI, EXEC, and WATCH. A transaction packages multiple command requests and then executes them all at once, in order; during execution the server will not interrupt the transaction to serve other clients' command requests, but runs every command in the transaction first and only then handles other clients' requests.
  • In traditional relational databases, the ACID properties are used to test the reliability and safety of transactions. In Redis, transactions always have atomicity, consistency, and isolation, and when Redis runs in a particular persistence mode, transactions also have durability. A MULTI/EXEC/WATCH sketch follows.
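
A MULTI/EXEC/WATCH sketch with Jedis (the key names are hypothetical):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class RedisTxDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.watch("balance");          // guard the key against concurrent changes
            Transaction tx = jedis.multi();  // commands below are queued, not yet run
            tx.decrBy("balance", 100);
            tx.incrBy("spent", 100);
            // If "balance" changed since WATCH, the server discards the transaction.
            System.out.println(tx.exec());
        }
    }
}
```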

Cache avalanche and cache penetration problem solutions

Cache avalanche

Introduction: a large number of cache entries expire at the same moment, so subsequent requests all fall on the database, which then collapses under the burst of load.

Solutions

  • Beforehand: keep the Redis cluster as highly available as possible, replace crashed machines quickly, and choose an appropriate memory eviction policy.
  • During the incident: local ehcache caching plus Hystrix rate limiting and degradation, to keep MySQL from being overwhelmed.
  • Afterwards: use Redis's persistence mechanism to restore the cache as quickly as possible.

Cache penetration

  • Put simply, cache penetration means that a large number of requests target keys that do not exist in the cache at all, so the requests bypass the cache layer and go straight to the database. For example, an attacker deliberately fabricates keys that are not in our cache and fires a large number of requests at them, all of which land on the database.

    Cache-penetration flow: request → cache (miss) → database (miss) → nothing is cached, so every such request hits the database again.

How to solve Redis's concurrent competition Key problem

  • The so-called problem of concurrent competition for a Redis key is that multiple systems operate on the same key at the same time, but the actual execution order differs from the order we expect, leading to wrong results. One recommended approach: distributed locks (both ZooKeeper and Redis can implement them). (If there is no concurrent competition for the key, do not use a distributed lock; it hurts performance.)
  • A distributed lock can be implemented with ZooKeeper's ephemeral sequential nodes. The general idea: when a client locks a method, it creates a unique ephemeral sequential node under the directory of the node designated for that method on ZooKeeper. Determining whether the lock is acquired is simple: just check whether your node has the smallest sequence number among the ordered nodes. Releasing the lock only requires deleting that ephemeral node, which also prevents the deadlock that would otherwise arise when a service crashes while holding a lock it can no longer release. After the business flow completes, delete the corresponding child node to release the lock (see the sketch below).
  • In practice, reliability comes first, so ZooKeeper is the one I would push.
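
A sketch of the ZooKeeper approach using Apache Curator's InterProcessMutex recipe, which is built on ephemeral sequential nodes; the connection string and lock path are hypothetical.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkLockDemo {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // The mutex creates an ephemeral sequential node under the lock path.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/order-key");
        lock.acquire();      // blocks until our node has the smallest sequence number
        try {
            // ... operate on the contended key ...
        } finally {
            lock.release();  // deletes our ephemeral node, letting the next waiter in
        }
        client.close();
    }
}
```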

To sum up

Thank you for reading this far. Please correct me if the article has any shortcomings. If you found it helpful, remember to like, bookmark, and share!


Origin blog.csdn.net/jiagouwgm/article/details/113883704