Frequently asked database questions in the latest Java interviews in 2020

MySQL

Why use an auto-increment column as the primary key

  1. If a primary key (PRIMARY KEY) is defined, InnoDB uses it as the clustered index. If no primary key is explicitly defined, InnoDB picks the first unique index whose columns are all NOT NULL as the clustered index. If there is no such unique index either, InnoDB falls back to a built-in 6-byte ROWID as an implicit clustered index. This ROWID increases as rows are written; unlike Oracle's ROWID, it is hidden and cannot be referenced in queries.
  2. The data records themselves are stored in the leaf nodes of the primary index (a B+Tree). Records within one leaf node (one memory page or disk page) must therefore be kept in primary-key order, so whenever a new record is inserted, MySQL places it in the appropriate node and position according to its primary key. When the page reaches the load factor (15/16 by default in InnoDB), a new page (node) is allocated.
  3. If the table uses an auto-increment primary key, each new record is simply appended after the last record of the current index node. When a page is full, a new page is allocated automatically.
  4. If you use a non-incremental primary key (such as an ID card number or student number), the inserted key values are close to random, so each new record has to go somewhere in the middle of an existing index page. MySQL has to move data to make room for the new record. The target page may even have been flushed to disk and evicted from the cache, in which case it must first be read back from disk. This adds a lot of overhead. The frequent moves and page splits also cause heavy fragmentation and a less compact index structure, which eventually has to be repaired by rebuilding the table with OPTIMIZE TABLE.
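
The cost difference between sequential and random keys can be sketched with a toy model. Here a sorted Python list stands in for the ordered contents of an index page, and count_shifts is a hypothetical helper that counts how many existing entries sit to the right of each insertion point. This only illustrates the idea; it is not how InnoDB physically lays out a page.

```python
import bisect
import random


def count_shifts(keys):
    """Insert keys one by one into a sorted list and count the entries
    that have to move to make room (a toy stand-in for page reorganisation)."""
    page = []
    shifts = 0
    for key in keys:
        position = bisect.bisect_left(page, key)
        shifts += len(page) - position  # entries to the right must move
        page.insert(position, key)
    return shifts


sequential_keys = list(range(1, 1001))      # auto-increment style keys
random_keys = sequential_keys[:]
random.Random(42).shuffle(random_keys)      # same keys in random order

print("shifts with sequential keys:", count_shifts(sequential_keys))
print("shifts with random keys:", count_shifts(random_keys))
```

Sequential keys always land at the end, so nothing has to move, while the shuffled keys keep displacing existing entries.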

 

Why indexes improve query efficiency

  1. Index entries are stored in sorted order
  2. Because the entries are sorted, looking up a value through the index does not require scanning every index record
  3. In the limiting case, an index lookup is as efficient as a binary search, that is, about log2(N) comparisons
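
The log2(N) figure can be checked with a small sketch. binary_search_steps is a hypothetical helper that searches a sorted list and counts the comparisons it needs; a real index is a B+ tree rather than a flat array, so this shows only the underlying idea.

```python
import math


def binary_search_steps(sorted_keys, target):
    """Return (position, steps); position is -1 when the key is absent."""
    low, high = 0, len(sorted_keys) - 1
    steps = 0
    while low <= high:
        steps += 1
        middle = (low + high) // 2
        if sorted_keys[middle] == target:
            return middle, steps
        if sorted_keys[middle] < target:
            low = middle + 1
        else:
            high = middle - 1
    return -1, steps


keys = list(range(0, 2_000_000, 2))          # 1,000,000 sorted keys
position, steps = binary_search_steps(keys, 1_234_566)
print("steps used:", steps)
print("upper bound ceil(log2(N)):", math.ceil(math.log2(len(keys))))
```

A full scan of the same list would need up to a million comparisons, while the sorted search never needs more than about twenty.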


 

The difference between B+ tree index and hash index

A B+ tree is a balanced multi-way tree: every leaf node sits at the same depth below the root, and the nodes on one level are linked to their neighbours by pointers, so the keys are kept in order.

A hash index applies a hash function to the key value to obtain a hash value. It does not have to descend level by level from the root to a leaf the way a B+ tree does; a single hash computation is enough. The entries are unordered.

Advantages of hash index:

  1. Equality queries. A hash index has a clear advantage here, provided there are not many duplicate key values. With many duplicates its efficiency drops sharply because of hash collisions.

Scenarios where hash index is not applicable:

  1. Range queries are not supported
  2. The index cannot be used to satisfy a sort
  3. The leftmost-prefix matching rule of a composite index is not supported
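
The trade-off above can be illustrated with two in-memory structures. In this sketch a Python dict plays the role of a hash index and a sorted list plays the role of an ordered (B+ tree style) index; the data and the helper names equality_lookup and range_lookup are made up for illustration.

```python
import bisect

rows = {"alice": 1, "bob": 2, "carol": 3, "dave": 4, "erin": 5}

hash_index = dict(rows)        # unordered: direct jump from key to value
ordered_keys = sorted(rows)    # ordered: neighbours are adjacent


def equality_lookup(name):
    """One hash computation finds the entry, no traversal needed."""
    return hash_index.get(name)


def range_lookup(low, high):
    """Only the ordered structure can answer low <= key <= high cheaply."""
    start = bisect.bisect_left(ordered_keys, low)
    end = bisect.bisect_right(ordered_keys, high)
    return ordered_keys[start:end]


print(equality_lookup("carol"))
print(range_lookup("bob", "dave"))
```

Answering the range query from the dict alone would require looking at every key, which is why a hash index cannot serve range scans or sorting.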

Generally, the B+ tree index structure suits most scenarios. A hash index is the better choice in the following case:

In a HEAP table, when the stored data has few duplicates (that is, the cardinality is high) and the column is queried mainly by equality, with no range queries and no sorting, a hash index is a particularly good fit. For example:

select id,name from table where name='李明'; -- equality query only

The commonly used InnoDB engine builds B+ tree indexes by default and monitors in real time how the indexes on a table are used. If it judges that a hash index would speed up queries, it automatically builds one in an in-memory buffer, the adaptive hash index (enabled by default in InnoDB). Based on the observed search pattern, MySQL builds the hash index from a prefix of the index key. If most of a table is in the buffer pool, the hash index can speed up equality queries.

Note: Under some workloads, the speed-up from hash index lookups far outweighs the extra cost of monitoring index searches and maintaining the hash table structure. Under heavy load, however, the read/write lock that guards the adaptive hash index can itself become a point of contention, for example with highly concurrent joins. LIKE operators and % wildcards do not benefit from the adaptive hash index either. For such workloads it may be necessary to turn the adaptive hash index off.

The difference between B tree and B+ tree

  1. B-tree: every node stores both keys and data, and all nodes together make up the tree. The pointers in the leaf nodes are null; in the textbook definition the leaves are empty external nodes that hold no key information.
  2. B+ tree: the leaf nodes together contain every key, along with pointers to the records for those keys, and the leaves are linked to one another in ascending key order. All non-leaf nodes act purely as the index part: each holds only the largest (or smallest) key of each of its subtrees. In a B-tree, by contrast, the non-leaf nodes also hold the actual data being searched for.

 

Why is the B+ tree better suited than the B-tree for file-system and database indexes in practice?

  1. B+ tree disk reads and writes cost less. Its internal nodes hold no pointers to the record data for each key, so they are smaller than B-tree internal nodes. If all the keys of one internal node are stored in the same disk block, the block holds more keys, so more of the keys to be searched are read into memory in a single read and the number of I/O operations drops.
  2. B+ tree query performance is more stable. Internal nodes never point to the record itself; they only index the keys stored in the leaves. Every lookup therefore follows a path from the root to a leaf, all lookups have the same path length, and every query costs about the same.
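
A rough calculation shows why smaller internal nodes matter. All sizes below are assumptions chosen for illustration (a 16 KB page, 8-byte keys, 6-byte child pointers, 200-byte records), and levels_needed is a hypothetical helper. The model is deliberately crude: it compares only the fan-out of nodes that carry keys and pointers against nodes that also carry the record data.

```python
PAGE_SIZE = 16 * 1024    # assumed page size in bytes
KEY_SIZE = 8             # assumed key size
POINTER_SIZE = 6         # assumed child pointer size
RECORD_SIZE = 200        # assumed size of the row data stored with a key


def levels_needed(total_keys, fanout):
    """Smallest number of levels whose capacity fanout**levels covers total_keys."""
    levels = 1
    capacity = fanout
    while capacity < total_keys:
        capacity *= fanout
        levels += 1
    return levels


# B+ tree internal node: keys and pointers only
bplus_fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)
# B-tree node: every key also carries its record data
btree_fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE + RECORD_SIZE)

total = 10_000_000
print("B+ tree fan-out:", bplus_fanout, "levels:", levels_needed(total, bplus_fanout))
print("B-tree fan-out:", btree_fanout, "levels:", levels_needed(total, btree_fanout))
```

Under these assumptions the node without record data fits far more keys per page, so fewer levels, and therefore fewer page reads, are needed to reach any key.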

MySQL composite (joint) index

  1. A composite index is an index on two or more columns. MySQL uses the columns of a composite index from left to right. A query can use just part of the index, but only a leftmost part. For example, with key index (a,b,c), searches on a, on a,b, and on a,b,c can use the index, but a search on b,c cannot. The index is most effective when the leftmost column is compared with a constant.
  2. Extra columns in the index narrow the search, but one index on two columns is not the same as two separate indexes. A composite index is structured like a phone book, in which a person's name consists of a surname and a first name. The phone book is sorted first by surname, and people with the same surname are then sorted by first name. If you know the surname, the phone book is very useful; if you know both names, it is even more useful; but if you know only the first name, it is useless.
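
The leftmost-prefix rule can be written down as a few lines of code. usable_prefix is a hypothetical helper, and the model is simplified: it looks only at equality conditions and ignores range conditions and any optimizer features that relax the rule.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that a query with equality
    conditions on equality_columns can use."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break  # a gap ends the usable prefix
        used.append(column)
    return used


index_abc = ["a", "b", "c"]
print(usable_prefix(index_abc, {"a"}))
print(usable_prefix(index_abc, {"a", "b"}))
print(usable_prefix(index_abc, {"b", "c"}))
print(usable_prefix(index_abc, {"a", "c"}))
```

A query on b,c gets an empty prefix, and a query on a,c can use only the a part, which matches the phone book analogy.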

When not to create an index, or to create fewer indexes

  1. The table has too few rows
  2. The table is inserted into, deleted from, and updated frequently
  3. Columns with many duplicate values that are evenly distributed. If a table has 100,000 rows and column A takes only the two values T and F, each with a probability of about 50%, then an index on column A generally does not speed up queries.
  4. Columns that are always queried together with a main column, when the index on the main column is already highly selective

MySQL partition

1. What is a table partition?

Table partitioning refers to decomposing a table in the database into multiple smaller, easy-to-manage parts according to certain rules. Logically, there is only one table, but the bottom layer is composed of multiple physical partitions.

 

2. The difference between table partitioning and table splitting

Table splitting: dividing one table into several separate tables according to some rule. For example, user order records can be split into multiple tables by time.

The difference is that a partitioned table is still logically a single table, while table splitting breaks one table into several tables.

 

3. What are the benefits of table partitioning?

  1. The data of a partitioned table can be spread across different physical devices, making efficient use of multiple hardware devices.
  2. Compared with a single disk or file system, it can store more data.
  3. Queries can be optimized. When the WHERE clause contains the partitioning condition, only one or a few partitions need to be scanned, which improves query efficiency; for SUM and COUNT queries, the partitions can also be processed in parallel and the results combined at the end.
  4. Partitioned tables are easier to maintain. For example, to delete a large amount of data in bulk, you can drop the whole partition.
  5. Partitioned tables can avoid some special bottlenecks, such as mutually exclusive access to a single InnoDB index or inode lock contention in the ext3 file system.

 

4. Limitations of partitioned tables

  1. A table can have at most 1024 partitions (the limit was raised to 8192 in MySQL 5.6.7)
  2. In MySQL 5.1 the partitioning expression must be an integer or an expression that returns an integer. MySQL 5.5 added support for partitioning on non-integer columns.
  3. If the table has a primary key or unique indexes, each of them must include every column used in the partitioning expression. In other words, the partitioning columns must be part of every primary key and unique key on the table.
  4. Partitioned tables cannot use foreign key constraints
  5. MySQL partitioning applies to all the data and indexes of a table. You cannot partition the data but not the indexes, the indexes but not the data, or only part of the table's data.

 

5. How to check whether the current MySQL supports partitioning?

Command: show variables like '%partition%'; Result:

mysql> show variables like '%partition%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| have_partitioning | YES   |
+-------------------+-------+
1 row in set (0.00 sec)

A have_partitioning value of YES means partitioning is supported. This variable was removed in MySQL 5.6; on 5.6 and 5.7, look for the partition plugin in the output of SHOW PLUGINS instead.

 

6. What partition types does MySQL support?

  1. RANGE partitioning: rows are assigned to partitions by ranges of values. For example, a table can be split into several partitions by year
  2. LIST partitioning: rows are assigned to partitions by matching a value against predefined lists of values. The difference from RANGE is that LIST matches sets of discrete values, while the values of a RANGE partition form a continuous interval.
  3. HASH partitioning: a hash key is computed from one or more columns of the table, and rows are placed in the partition that corresponds to the resulting value. For example, a table can be hash-partitioned on its primary key.
  4. KEY partitioning: an extension of HASH partitioning in which the hash key is generated by the MySQL server.
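
How a row is routed to a partition under the first three schemes can be sketched as follows. The function names and sample data are hypothetical. The RANGE helper mimics VALUES LESS THAN bounds, and the HASH helper uses a plain modulo on an integer value, which is a simplification of what the server does.

```python
import bisect


def range_partition(value, upper_bounds):
    """RANGE: partition i holds values below upper_bounds[i]."""
    index = bisect.bisect_right(upper_bounds, value)
    if index == len(upper_bounds):
        raise ValueError(f"no partition accepts value {value}")
    return index


def list_partition(value, value_lists):
    """LIST: partition i holds exactly the values listed in value_lists[i]."""
    for index, values in enumerate(value_lists):
        if value in values:
            return index
    raise ValueError(f"no partition lists value {value}")


def hash_partition(value, partition_count):
    """HASH: spread integer values over a fixed number of partitions."""
    return value % partition_count


year_bounds = [2019, 2020, 2021]              # p0: < 2019, p1: < 2020, p2: < 2021
region_lists = [{"north", "east"}, {"south", "west"}]

print(range_partition(2019, year_bounds))
print(list_partition("south", region_lists))
print(hash_partition(10, 4))
```

The RANGE bounds cover a continuous interval each, while the LIST partitions name their members one by one, which is the difference described in item 2.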

 

Four isolation levels

  1. Serializable: prevents dirty reads, non-repeatable reads, and phantom reads.
  2. Repeatable read: prevents dirty reads and non-repeatable reads.
  3. Read committed: prevents dirty reads.
  4. Read uncommitted: the lowest level; none of these anomalies is prevented.

 

About MVCC

The MySQL InnoDB storage engine implements multi-version concurrency control, MVCC (its counterpart is lock-based concurrency control, LBCC). The biggest advantage of MVCC is that reads take no locks, so reads and writes do not conflict. In OLTP applications with many reads and few writes this matters a great deal and greatly improves the concurrency of the system. Today almost every RDBMS supports MVCC.

  1. LBCC: Lock-Based Concurrency Control.
  2. MVCC: Multi-Version Concurrency Control. A purely lock-based mechanism offers little concurrency; MVCC improves on it, mainly by increasing the concurrency of read operations.

In MVCC concurrency control, read operations fall into two categories:

  1. Snapshot read: reads a visible version of the record (possibly a historical one) without locking. Not even a shared (S) lock is taken, so it does not block writes from other transactions.
  2. Current read: reads the latest version of the record and locks the returned record so that other transactions cannot modify it concurrently.
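
The two kinds of read can be contrasted with a toy version chain. Everything here is a simplified assumption: Version, snapshot_read, and current_read are hypothetical names, visibility is decided only by a set of transaction ids that had committed when the snapshot was taken, and the locking that a real current read performs is left out.

```python
from dataclasses import dataclass


@dataclass
class Version:
    value: str
    created_by: int  # id of the transaction that wrote this version


def snapshot_read(versions, committed_at_snapshot):
    """Return the newest version written by a transaction that had
    already committed when the reader's snapshot was taken."""
    for version in reversed(versions):
        if version.created_by in committed_at_snapshot:
            return version.value
    return None


def current_read(versions):
    """Return the latest version (a real current read would also lock it)."""
    return versions[-1].value


# Versions of one row, oldest first; transaction 30 wrote after the snapshot
row_versions = [Version("v1", 10), Version("v2", 20), Version("v3", 30)]
snapshot = {10, 20}

print(snapshot_read(row_versions, snapshot))
print(current_read(row_versions))
```

The snapshot reader keeps seeing the older version without blocking the writer, while the current read always goes to the newest version.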

 

Advantages of row-level locking:

  1. Fewer lock conflicts when many threads access different rows.
  2. Fewer changes to undo on rollback.
  3. A single row can be locked for a long time.

Disadvantages of row-level locking:

  1. It uses more memory than page-level or table-level locking.
  2. When used on a large part of the table, it is slower than page-level or table-level locking because many more locks have to be acquired.
  3. If you often run GROUP BY over a large part of the data or must scan the whole table frequently, it is noticeably slower than the other lock types.
  4. With higher-level locks you can also tune the application more easily by supporting different lock types, because their locking overhead is lower than that of row-level locks.

 

Simple example of MySQL trigger

  1. CREATE TRIGGER <trigger name> -- A trigger must have a name of at most 64 characters, optionally followed by a delimiter. Naming follows essentially the same rules as other MySQL objects.
  2. {BEFORE | AFTER} -- The trigger timing: it can fire before or after the event.
  3. {INSERT | UPDATE | DELETE} -- The trigger event: the trigger can fire on an insert, an update, or a delete.
  4. ON <table name> -- A trigger belongs to one table: an insert, update, or delete on that table activates it. Older MySQL versions (before 5.7.2) do not allow two triggers for the same event and timing on one table.
  5. FOR EACH ROW -- The trigger granularity: the FOR EACH ROW clause makes the trigger run once for every affected row, rather than once for the whole table.
  6. <trigger SQL statement> -- The statement the trigger runs: any legal statement, including compound statements, subject to the same restrictions as stored functions.

What is a stored procedure

Simply put, a stored procedure is a named set of SQL statements that can implement fairly complex logic, similar to a method in Java.

PS: Stored procedures resemble triggers in that both are sets of SQL statements, but a stored procedure is called explicitly and is more capable than a trigger, whereas a trigger runs automatically when its event fires.

What are the characteristics

  1. They have input and output parameters, can declare variables, and offer control statements such as if/else, case, and while, so complex logic can be implemented by writing stored procedures.
  2. They share the general traits of functions: modularity, encapsulation, and code reuse.
  3. They are fast: only the first execution goes through compilation and optimization; later calls run directly and skip those steps.
 
 

DROP PROCEDURE IF EXISTS `proc_adder`;

DELIMITER ;;

CREATE DEFINER=`root`@`localhost` PROCEDURE `proc_adder`(IN a int, IN b int, OUT sum int)
BEGIN
   -- Treat NULL inputs as 0 so the result is never NULL
   if a is null then set a = 0;
   end if;

   if b is null then set b = 0;
   end if;

   set sum = a + b;
END
;;

DELIMITER ;

-- Call the procedure; @s receives 0 + 5
set @b=5;
call proc_adder(0,@b,@s);
SELECT @s as sum;

   
   
   
 

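-- tab1 is not defined in the original example; it is assumed here to mirror tab2
create table tab1(
  tab1_id varchar(11)
);

create table tab2(
  tab2_id varchar(11)
);

DROP TRIGGER if EXISTS t_ai_on_tab1;

DELIMITER ;;

create TRIGGER t_ai_on_tab1
AFTER INSERT ON tab1
for EACH ROW
BEGIN
  -- Copy every id inserted into tab1 over to tab2
  INSERT INTO tab2(tab2_id) values(new.tab1_id);
end
;;

DELIMITER ;

INSERT INTO tab1(tab1_id) values('0001');

SELECT * FROM tab2;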

MySQL optimization

  1. Enable the query cache so that repeated queries can be served from it (note that the query cache was removed in MySQL 8.0)
  2. EXPLAIN your SELECT queries. This helps you analyse performance bottlenecks in the query or the table structure. The EXPLAIN output also shows how indexes and the primary key are used and how the table is searched and sorted
  3. Use LIMIT 1 when you only need one row. The MySQL engine then stops searching after finding the first match instead of continuing to look for further matching rows.
  4. Index the columns you search on
  5. Use ENUM rather than VARCHAR. If a column such as "gender", "country", "ethnicity", "status", or "department" takes a limited, fixed set of values, use ENUM instead of VARCHAR.
  6. Use prepared statements. Much like stored procedures, prepared statements are sets of SQL statements that run in the background. They bring many benefits for both performance and security: prepared statements check the variables you bind, which protects your program against SQL injection attacks
  7. Split tables vertically
  8. Choose the right storage engine
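
The protection that bound parameters give against SQL injection (item 6) can be demonstrated in a few lines. As an assumption for the sake of a self-contained example, this sketch uses Python's built-in sqlite3 module as a stand-in for MySQL; the users table and the input string are made up, and the placeholder syntax differs between database drivers.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
connection.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

malicious_input = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and changes its meaning
unsafe_sql = f"SELECT name FROM users WHERE name = '{malicious_input}'"
unsafe_rows = connection.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a value and is never parsed as SQL
safe_rows = connection.execute(
    "SELECT name FROM users WHERE name = ?", (malicious_input,)
).fetchall()

print("rows returned by concatenated SQL:", len(unsafe_rows))
print("rows returned by bound parameter:", len(safe_rows))
connection.close()
```

With string concatenation the injected OR condition matches every row, while the bound parameter is compared as a literal name and matches nothing.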

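The difference between key and index

  1. A key is a physical structure of the database with two meanings and roles: it is a constraint (focused on constraining and enforcing the structural integrity of the database) and it is an index (used to assist queries). Keys include primary key, unique key, foreign key, and so on.
  2. An index is a physical structure of the database that only assists queries. When created, it is stored in a separate tablespace (the InnoDB tablespace in MySQL) in a structure similar to a table of contents. Indexes can be classified into prefix indexes, full-text indexes, and so on.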

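What are the differences between MyISAM and InnoDB in MySQL?

Differences:

  1. InnoDB supports transactions; MyISAM does not. In InnoDB every SQL statement is wrapped in its own transaction and auto-committed by default, which hurts speed, so it is best to put several statements between begin and commit to form one transaction.
  2. InnoDB supports foreign keys; MyISAM does not. Converting an InnoDB table that has foreign keys to MyISAM fails.
  3. InnoDB uses a clustered index: the data file is bound to the index, a primary key is required, and lookups through the primary key index are very efficient. A secondary index lookup, however, takes two steps: first find the primary key, then use the primary key to find the row. The primary key should therefore not be too large, because a large primary key makes every other index large as well. MyISAM uses non-clustered indexes: the data file is separate and the index stores pointers into it. The primary key index and the secondary indexes are independent of each other.
  4. InnoDB does not store the exact row count of a table, so select count(*) from table needs a full table scan. MyISAM keeps the row count of the whole table in a variable and only has to read that variable, which is very fast.
  5. Before MySQL 5.6, InnoDB did not support full-text indexes while MyISAM did, and MyISAM was the more efficient of the two for such queries. InnoDB has supported full-text indexes since MySQL 5.6.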
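
The two-step lookup through an InnoDB secondary index (item 3) can be modelled with two dictionaries. The data and the name find_by_name are hypothetical; the point is only that the secondary index stores the primary key, not the row itself.

```python
# Clustered index: primary key -> full row
clustered_index = {
    1: {"id": 1, "name": "alice", "city": "Paris"},
    2: {"id": 2, "name": "bob", "city": "Berlin"},
}

# Secondary index on name: indexed value -> primary key
secondary_index_on_name = {"alice": 1, "bob": 2}


def find_by_name(name):
    """Look up a row by name: secondary index first, then clustered index."""
    primary_key = secondary_index_on_name.get(name)  # first lookup
    if primary_key is None:
        return None
    return clustered_index[primary_key]              # second lookup


print(find_by_name("bob"))
```

Because every secondary index entry carries a copy of the primary key, a wide primary key inflates all secondary indexes, which is why the text recommends keeping it small.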

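How to choose:

  1. Do you need transactions? If so, choose InnoDB; if not, MyISAM is an option.
  2. If the vast majority of operations on the table are reads, MyISAM is an option; if reads and writes are both frequent, use InnoDB.
  3. After a system crash, MyISAM is harder to recover. Decide whether that is acceptable.
  4. Since MySQL 5.5, InnoDB has been the default engine of MySQL (previously it was MyISAM), which speaks for its advantages. If you don't know which to use, use InnoDB; it will not be the worse choice.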


 

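Things to note when creating database tables

1. Sensible column names and column configuration

  1. Remove columns that are only loosely related to the table
  2. Name columns by a consistent rule and with a matching meaning (do not mix English with pinyin, and avoid opaque names such as a, b, c)
  3. Avoid abbreviations in column names where possible (most abbreviations do not convey the column's meaning)
  4. Do not mix upper and lower case in column names (for readability, join multiple English words with underscores)
  5. Do not use reserved words or keywords as column names
  6. Keep column names and types consistent
  7. Choose numeric types carefully
  8. Leave enough headroom in text columns

2. Special system columns and recommendations after creation

  1. Add deletion markers (for example the operator and the deletion time)
  2. Set up a versioning mechanism

3. Sensible table structure

  1. Handling polymorphic columns: check whether a column in the table can be broken down into smaller independent parts (for example, people can be divided into men and women)
  2. Handling multi-valued columns: the table can be split into three tables, which makes retrieval and sorting more orderly and preserves data integrity

4. Other recommendations

  1. Store large data columns (for example a description column) in a separate table so that they do not hurt performance
  2. Use varchar instead of char: varchar allocates length dynamically, while a char column has a fixed length.
  3. Give every table a primary key; a table without one suffers in queries and in index definitions.
  4. Avoid nullable columns and set default values instead (for example 0 for an int column); the gain in index lookups is immediate.
  5. Create indexes, preferably on unique and non-null columns. Too many indexes slow down later inserts and updates, so create them according to actual needs.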

 

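Redis

The Redis single-thread question

"Single-threaded" means that the network request module uses one thread (so concurrency safety is not a concern there): a single thread handles all network requests, while other modules still use multiple threads.

Why Redis executes so fast

  1. The vast majority of requests are pure in-memory operations (very fast)
  2. It is single-threaded, which avoids unnecessary context switches and race conditions
  3. Non-blocking I/O with I/O multiplexing

Redis internals

Internally Redis uses epoll together with its own simple event framework. Reads, writes, closes, and connects in epoll are all turned into events, and epoll's multiplexing ensures that no time is wasted waiting on I/O. The three conditions above are not independent of one another, especially the first: if requests were slow to process, a single thread would give poor throughput and performance. Redis chose a technical approach that fits its particular scenario.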
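
The idea of turning socket activity into events handled by one thread can be sketched with Python's standard selectors module, which picks epoll on Linux where available. This is only an illustration of I/O multiplexing, not Redis code: the socket pair, the handle_readable callback, and the upper-casing "command" are all made up.

```python
import selectors
import socket

selector = selectors.DefaultSelector()
client, server = socket.socketpair()   # an in-process connected pair
server.setblocking(False)

replies = []


def handle_readable(connection):
    """Callback run by the single event-loop thread when data is ready."""
    data = connection.recv(1024)
    replies.append(data.upper())


# Register interest in "readable" events; the callback travels as event data
selector.register(server, selectors.EVENT_READ, handle_readable)

client.sendall(b"ping")

# One iteration of the event loop: wait for ready sockets, dispatch callbacks
for key, _events in selector.select(timeout=1):
    key.data(key.fileobj)

selector.unregister(server)
selector.close()
client.close()
server.close()

print(replies)
```

The loop never blocks on an idle connection: it only touches sockets that the selector reports as ready, which is what lets one thread serve many clients.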

 

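Redis and thread safety

Redis in effect applies thread confinement: tasks are confined to one thread, which naturally avoids thread-safety problems. Compound operations that depend on several Redis commands still need a lock, though, possibly a distributed lock.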

 

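What are the benefits of using Redis?

  1. It is fast, because data lives in memory, much like a HashMap, whose strength is O(1) time complexity for lookups and updates
  2. It supports rich data types: string, list, set, sorted set, hash
  3. It supports transactions and its operations are atomic; atomicity means that changes to the data are either all executed or not executed at all
  4. Rich features: it can be used for caching and messaging, and keys can be given an expiry time after which they are deleted automatically

What advantages does Redis have over memcached?

  1. All memcached values are plain strings; Redis, as its replacement, supports richer data types
  2. Redis is reported to be faster than memcached in many workloads
  3. Redis can persist its data
  4. Redis supports data backup, that is, backup in master-slave mode.
  5. The underlying models differ: the two differ in their low-level implementation and in the application protocol used to talk to clients. Early Redis versions built their own VM mechanism, because calling ordinary system functions costs extra time for moving data and issuing requests (this mechanism was removed in later versions).
  6. Value size: a Redis value can be up to 512 MB, while memcached is limited to 1 MB by default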

 

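Redis master-slave replication

How the process works:

  1. Once a slave has established the master-slave relationship with the master, it sends the SYNC command to the master
  2. On receiving SYNC, the master starts saving a snapshot in the background (the RDB persistence process) and buffers the write commands it receives in the meantime
  3. When the snapshot is complete, the master sends the snapshot file and all buffered write commands to the slave
  4. The slave loads the snapshot file and then executes the buffered commands it received
  5. From then on, the master forwards every write command it receives to the slave, which keeps the data consistent

Disadvantage: replication and synchronization for all slave nodes are handled by the master, which puts too much load on the master. A chained master-slave-slave topology solves this.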
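
The five steps above can be traced in a small in-memory model. Master, Slave, and the writes_during_snapshot parameter are hypothetical names for illustration; real replication runs over the network and the snapshot is an RDB file, neither of which is modelled here.

```python
class Slave:
    def __init__(self):
        self.data = {}

    def load(self, snapshot):
        """Step 4a: replace local data with the received snapshot."""
        self.data = dict(snapshot)

    def apply(self, key, value):
        """Steps 4b and 5: execute one replicated write command."""
        self.data[key] = value


class Master:
    def __init__(self):
        self.data = {}
        self.slaves = []

    def write(self, key, value):
        """Step 5: apply a write and forward it to every attached slave."""
        self.data[key] = value
        for slave in self.slaves:
            slave.apply(key, value)

    def sync(self, slave, writes_during_snapshot=()):
        """Steps 1 to 4: snapshot, buffer concurrent writes, send both."""
        snapshot = dict(self.data)                 # step 2: take the snapshot
        buffered = []
        for key, value in writes_during_snapshot:  # writes arriving meanwhile
            self.write(key, value)
            buffered.append((key, value))
        slave.load(snapshot)                       # step 3: send the snapshot
        for key, value in buffered:                # step 3: send buffered writes
            slave.apply(key, value)
        self.slaves.append(slave)                  # from now on: step 5


master = Master()
master.write("a", "1")
replica = Slave()
master.sync(replica, writes_during_snapshot=[("b", "2")])
master.write("c", "3")
print(replica.data == master.data)
```

Without the buffered writes of step 2, the key written while the snapshot was being taken would be missing on the slave.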

 

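Advantages and disadvantages of the two Redis persistence methods

  1. RDB persistence produces point-in-time snapshots of the dataset at specified intervals
  2. AOF persistence logs every write command the server executes and restores the dataset at server start-up by re-executing those commands.
  3. Redis can also use AOF persistence and RDB persistence at the same time. When Redis restarts, it prefers the AOF file for restoring the dataset, because the dataset saved in the AOF file is usually more complete than the one saved in the RDB file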
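
Why an AOF usually restores a more complete dataset than an RDB snapshot can be seen in a toy model. The command tuples, apply_command, and execute are hypothetical; real Redis writes the AOF to disk in its own protocol format.

```python
def apply_command(dataset, command):
    """Apply one write command to a dictionary acting as the dataset."""
    name, *arguments = command
    if name == "SET":
        key, value = arguments
        dataset[key] = value
    elif name == "DEL":
        (key,) = arguments
        dataset.pop(key, None)
    else:
        raise ValueError(f"unsupported command {name}")


live_data = {}
aof_log = []


def execute(command):
    """Run a command against the live data and append it to the AOF."""
    apply_command(live_data, command)
    aof_log.append(command)


execute(("SET", "a", "1"))
execute(("SET", "b", "2"))
rdb_snapshot = dict(live_data)       # RDB: point-in-time copy taken here
execute(("DEL", "a"))
execute(("SET", "c", "3"))

# Simulated restart: rebuild the dataset from each persistence source
restored_from_rdb = dict(rdb_snapshot)
restored_from_aof = {}
for logged_command in aof_log:
    apply_command(restored_from_aof, logged_command)

print("from RDB:", restored_from_rdb)
print("from AOF:", restored_from_aof)
```

The snapshot misses everything written after it was taken, while replaying the log reproduces the final state.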

 

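Advantages of RDB:

  1. RDB is a very compact file that holds the Redis dataset at a point in time. Such files are well suited to backups: for example, you can back up the RDB file every hour for the last 24 hours and also keep one RDB file for every day of the month. That way, even if something goes wrong, you can restore the dataset to different versions at any time.
  2. RDB is well suited to disaster recovery: it is a single, very compact file that can be transferred (after encryption) to another data centre or to Amazon S3.
  3. RDB maximizes Redis performance: the only thing the parent process does to save an RDB file is fork a child process, which then handles all the saving work; the parent performs no disk I/O.
  4. Restoring a large dataset from RDB is faster than restoring it from AOF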

 

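What are the common Redis performance problems, and how are they solved?

  1. The master writing memory snapshots: the save command calls the rdbSave function, which blocks the main thread. With a large snapshot the performance impact is severe and the service pauses intermittently, so the master should preferably not write memory snapshots.
  2. AOF persistence on the master: if the AOF file is not rewritten, this persistence method has the smallest performance impact, but the AOF file keeps growing, and an oversized AOF file slows recovery when the master restarts. The master should preferably do no persistence work at all, neither memory snapshots nor AOF logging, and memory snapshots in particular should not be enabled. If the data is critical, have one slave enable AOF to back it up, with a policy of syncing once per second.
  3. The master calling BGREWRITEAOF to rewrite the AOF file: the rewrite consumes a lot of CPU and memory, which drives the load too high and causes brief service pauses.
  4. Replication performance: for replication speed and connection stability, the slave and the master should preferably be on the same LAN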

 

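Redis provides 6 data eviction policies

  1. volatile-lru: evict the least recently used data from the set of keys that have an expiry (server.db[i].expires)
  2. volatile-ttl: evict the data closest to expiring from the set of keys that have an expiry (server.db[i].expires)
  3. volatile-random: evict randomly chosen data from the set of keys that have an expiry (server.db[i].expires)
  4. allkeys-lru: evict the least recently used data from the whole dataset (server.db[i].dict)
  5. allkeys-random: evict randomly chosen data from the whole dataset (server.db[i].dict)
  6. noeviction: never evict data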
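
The behaviour of an allkeys-lru style policy can be sketched with an ordered dictionary. LruCache and max_keys are hypothetical names, and this is an exact LRU on a key count; Redis itself limits memory rather than key count and approximates LRU by sampling keys, so treat this only as an illustration of the eviction order.

```python
from collections import OrderedDict


class LruCache:
    """Keeps at most max_keys entries, evicting the least recently used."""

    def __init__(self, max_keys):
        self.max_keys = max_keys
        self.entries = OrderedDict()   # oldest access first, newest last

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # reading counts as a use
        return self.entries[key]

    def set(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.max_keys:
            self.entries.popitem(last=False)  # drop the least recently used


cache = LruCache(max_keys=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")        # "a" is now more recently used than "b"
cache.set("c", 3)     # over the limit: "b" is evicted
print(list(cache.entries))
```

A volatile-lru variant would apply the same ordering but consider only the keys that carry an expiry.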


 


Origin blog.csdn.net/lanzhupi/article/details/109241218