Java core knowledge: databases

1.1. Storage Engine

1.1.1. Concept

The database storage engine is the underlying software component of a database; the database management system (DBMS) uses the storage engine to create, query, update, and delete data. Different storage engines provide different storage mechanisms, indexing techniques, locking levels, and other features, so choosing a storage engine also selects a specific set of capabilities. Many database management systems now support several storage engines. The main storage engines include:

  1. MyISAM
  2. InnoDB
  3. Memory
  4. Archive
  5. Federated

1.1.2. InnoDB (B+ tree)

The underlying storage structure of InnoDB is a B+ tree. Each node of the B+ tree corresponds to an InnoDB page. The page size is fixed, generally 16 KB. Non-leaf nodes hold only key values, while leaf nodes contain the complete row data.

Applicable scenarios:
1) Frequently updated tables; well suited to handling many concurrent update requests.
2) Supports transactions.
3) Supports crash recovery (via bin-log, etc.).
4) Supports foreign key constraints; it is the only common MySQL engine that does.
5) Supports auto-increment columns (auto_increment).
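To get a feel for why a fixed 16 KB page gives InnoDB such a shallow, wide B+ tree, the back-of-the-envelope arithmetic below is a sketch: the byte sizes (8-byte key plus 6-byte child pointer, ~1 KB rows) are assumptions for illustration, not InnoDB's exact on-disk format.

```java
public class BPlusTreeCapacity {
    static final int PAGE_SIZE = 16 * 1024; // fixed InnoDB page size: 16 KB

    // Entries that fit in one non-leaf page (fanout).
    static int fanout(int keyPlusPointerBytes) {
        return PAGE_SIZE / keyPlusPointerBytes;
    }

    // Rows reachable by a 3-level tree: root -> internal -> leaf,
    // where leaf pages hold the complete row data.
    static long threeLevelCapacity(int keyPlusPointerBytes, int rowBytes) {
        long f = fanout(keyPlusPointerBytes);
        return f * f * (PAGE_SIZE / rowBytes);
    }

    public static void main(String[] args) {
        int f = fanout(8 + 6); // assumed 8-byte key + 6-byte pointer
        System.out.println("fanout per non-leaf page: " + f);
        System.out.println("~rows in a 3-level tree: " + threeLevelCapacity(8 + 6, 1024));
    }
}
```

With these assumptions a tree only three levels deep already addresses tens of millions of rows, which is why most InnoDB lookups cost just a few page reads.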

1.1.3. TokuDB (Fractal Tree-node with data)

The underlying storage structure of TokuDB is a Fractal Tree, which is somewhat similar to a B+ tree. In a Fractal Tree, each child pointer, in addition to pointing to a child node, carries a Message Buffer: a FIFO queue used to cache update operations.
For example, an insert only needs to land in the Message Buffer of some node before returning immediately; there is no need to search down to a leaf node. These cached updates are merged asynchronously and applied to the corresponding nodes during queries or in the background.

TokuDB adds indexes online, does not affect read and write operations, and has very fast write performance. Fractal-tree has advantages in transaction implementation. It is mainly suitable for archiving data or historical data that is not frequently accessed.

1.1.4. MyISAM

MyISAM was MySQL's default engine (before version 5.5), but it does not support transactions, row-level locks, or foreign keys. Therefore, when INSERTing or UPDATEing data, the write operation must lock the entire table, which lowers efficiency.
ISAM performs read operations very quickly without consuming a lot of memory or storage. It was designed around data organized as fixed-length records stored in order; ISAM is a static index structure.
Its drawback is that it does not support transaction processing.

1.1.5. Memory

Memory (also called HEAP): tables are created with their contents held in memory; each MEMORY table actually corresponds to only one disk file. Access to MEMORY tables is very fast because the data lives in memory, and a HASH index is used by default. But once the server shuts down, the data in the table is lost. Memory supports both hash indexes and B-tree indexes. B-tree indexes support partial-key and wildcard queries, and operators such as <, > and >= can be used, which is convenient for data mining. Hash indexes are faster for equality comparisons but much slower for range queries.
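The equality-versus-range trade-off can be felt with plain JDK collections, using HashMap as a stand-in for a hash index and TreeMap for a B-tree index. This is an analogy only, not MEMORY-engine internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class IndexAnalogy {
    static final Map<Integer, String> hashIndex = new HashMap<>();      // hash index stand-in
    static final TreeMap<Integer, String> btreeIndex = new TreeMap<>(); // B-tree index stand-in
    static {
        for (int id = 1; id <= 100; id++) {
            hashIndex.put(id, "row-" + id);
            btreeIndex.put(id, "row-" + id);
        }
    }

    // Equality lookup: the hash index answers in O(1) on average.
    static String byId(int id) {
        return hashIndex.get(id);
    }

    // Range query: only the ordered structure can seek and scan in order;
    // a hash index would have to examine every entry.
    static NavigableMap<Integer, String> byRange(int lo, int hi) {
        return btreeIndex.subMap(lo, true, hi, true);
    }

    public static void main(String[] args) {
        System.out.println(byId(42));                 // row-42
        System.out.println(byRange(10, 13).keySet()); // [10, 11, 12, 13]
    }
}
```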

1.2. Index

An index is a data structure that helps MySQL obtain data efficiently. Common lookup algorithms include sequential search, binary search, binary sort trees, hashing, block search, and the balanced multi-way search tree (B-tree).

1.2.1. Common indexing principles

  1. Prefer unique indexes: the values in a unique index are unique, so a record can be located more quickly through the index.
  2. Create indexes on fields that are frequently used for sorting, grouping, and join operations.
  3. Create indexes on fields that are frequently used in query conditions.
  4. Limit the number of indexes: the more indexes a table has, the more time it takes to update it.
  5. Prefer indexes on short values: if the indexed value is very long, query speed suffers.
  6. Prefer prefix indexes: if an indexed field's values are very long, it is best to index a prefix of the value.
  7. Delete indexes that are no longer used or rarely used.
  8. The leftmost-prefix matching principle for composite indexes is very important.
  9. Prefer columns with high selectivity as indexes; selectivity is the proportion of distinct (non-repeated) values in the column.
  10. Index columns must not participate in calculations; keep columns "clean": a column wrapped in a function in the query cannot use the index.
  11. Extend an existing index whenever possible instead of creating a new one.
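The leftmost-prefix rule (principle 8) can be sketched with a TreeMap keyed on a composite (lastName, firstName) string: a query on lastName alone can seek into the sorted keys, but firstName alone cannot. This is an illustrative model with made-up names, not MySQL's index internals.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class LeftmostPrefix {
    // Composite index on (lastName, firstName): keys sorted lexicographically,
    // with '|' as an illustrative column separator.
    static final TreeMap<String, Integer> index = new TreeMap<>();
    static {
        index.put("Smith|Alice", 1);
        index.put("Smith|Bob", 2);
        index.put("Young|Carol", 3);
    }

    // WHERE lastName = ? uses the leftmost column: a cheap ordered range scan.
    static SortedMap<String, Integer> byLastName(String last) {
        return index.subMap(last + "|", last + "|\uffff");
    }

    public static void main(String[] args) {
        // Finds both Smith rows by seeking into the sorted keys.
        System.out.println(byLastName("Smith").keySet());
        // WHERE firstName = ? alone cannot seek: firstName is not a prefix
        // of the key order, so every entry would have to be scanned.
    }
}
```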

1.3. The three normal forms of database design

A normal form is a table structure with minimal redundancy. The three normal forms are as follows:

1.3.1. First Normal Form (1st NF-columns are not subdivided)

The goal of the first normal form is to ensure the atomicity of each column: if every column is the smallest data unit that cannot be subdivided (the smallest atomic unit), then the table satisfies first normal form (1NF).

1.3.2. Second Normal Form (2nd NF-each table only describes one thing)

First normal form must be satisfied, and no non-primary-key column may have a partial dependence on the primary key. The second normal form requires that each table describe only one thing.

1.3.3. Third Normal Form (3rd NF-there is no transitive dependency on non-primary key columns)

The third normal form requires that second normal form is satisfied and that no column in the table has a transitive dependency on a non-primary-key column. For example, in an orders table with primary key order number, the customer name depends on the customer number, which is itself a non-primary-key column: a transitive dependency.

1.4. Database transactions

A transaction (TRANSACTION) is a series of operations performed as a single logical unit of work. These operations are submitted to the system as a whole: either all of them execute or none of them do. A transaction is an indivisible logical unit of work.

The transaction must have the following four attributes, referred to as ACID attributes:
Atomicity

  1. A transaction is a complete operation. Each step of the transaction is indivisible (atomic); either all are executed or none are.

Consistency

  2. When the transaction completes, the data must be in a consistent state.

Isolation

  3. All concurrent transactions that modify data are isolated from one another: a transaction must be independent and must not depend on or affect other transactions in any way.

Durability

  4. After the transaction completes, its modifications to the database are maintained permanently; the transaction log maintains this durability.

1.5. Stored procedures (set of SQL statements for specific functions)

A stored procedure is a set of SQL statements that accomplishes a specific function. It is stored in the database and compiled on first use; later calls do not recompile it. The user executes it by specifying its name and supplying arguments (if the stored procedure takes parameters). Stored procedures are important database objects. Optimization ideas for stored procedures:

  1. Try to replace small loops with set-based SQL statements, such as aggregate and average functions.
  2. Store intermediate results in a temporary table and index it.
  3. Use cursors sparingly. SQL is a set-oriented language with high performance on set operations, while cursors operate procedurally, row by row. For example, for a query over 1 million rows, a cursor reads the table 1 million times, while the set-based form needs only a few reads.
  4. The shorter the transaction, the better. SQL Server supports concurrent operations; if a transaction is too long, or its isolation level too high, it causes blocking and deadlocks among concurrent operations, making queries extremely slow while CPU usage stays extremely low.
  5. Use try-catch to handle errors and exceptions.
  6. Try not to put query statements inside loops.

1.6. Trigger

(A program that executes automatically.) A trigger is a program that executes automatically; it is a special kind of stored procedure. The difference between a trigger and an ordinary stored procedure is that a trigger fires when an operation is performed on a table: for operations such as UPDATE, INSERT, and DELETE, the system automatically invokes the corresponding trigger on that table. In SQL Server 2005, triggers fall into two classes: DML triggers and DDL triggers. DDL triggers respond to data definition language statements, including CREATE, ALTER, and DROP.

1.7. Database Concurrency Strategy

Concurrency control generally uses one of three methods: optimistic locking, pessimistic locking, and timestamps.

1.7.1. Optimistic locking

Optimistic locking assumes that while a user reads data, no one else will write to the data being read. Pessimistic locking assumes the opposite: while you are reading from the database, someone else may be writing to the very data you just read, a more conservative attitude. Timestamps use no locks at all and control concurrency through timestamp comparison.

1.7.2. Pessimistic lock

Pessimistic locking means that when reading data, you first lock it to prevent others from modifying what you are reading; only after you finish reading may others modify that data. Conversely, while you are modifying a piece of data, others are not allowed to read it. In other words, only when your whole transaction commits is the lock released and other users allowed to access that data.

1.7.3. Timestamp

The timestamp approach adds an extra column, such as "TimeStamp", to the table. Each time a row is read, that column is read too; when writing back, the column's value is incremented by 1 and, before committing, compared against the current value in the database: the save is allowed only if the new value is greater than the database's value, otherwise it is rejected. Although this approach does not use the lock mechanism provided by the database system, it can greatly increase the concurrency the database can handle. The "lock" mentioned under pessimistic locking above actually comes in several kinds, namely exclusive locks (write locks) and shared locks (read locks).
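The version-column check described above can be sketched in memory with an atomic compare-and-set: a writer records the version it read, and the write succeeds only if the version has not moved since. The class and field names are illustrative, and a real database would do the check and the data write in one atomic UPDATE.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OptimisticRow {
    private volatile String data;
    private final AtomicInteger version = new AtomicInteger(0);

    OptimisticRow(String data) { this.data = data; }

    int readVersion() { return version.get(); }
    String readData() { return data; }

    // Write back only if nobody has bumped the version since we read it.
    boolean update(int versionSeen, String newData) {
        if (version.compareAndSet(versionSeen, versionSeen + 1)) {
            this.data = newData;
            return true;
        }
        return false; // conflict: the caller must re-read and retry
    }

    public static void main(String[] args) {
        OptimisticRow row = new OptimisticRow("v0");
        int seen = row.readVersion();
        System.out.println(row.update(seen, "writer-A")); // true: first write wins
        System.out.println(row.update(seen, "writer-B")); // false: stale version rejected
    }
}
```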

1.8. Database Lock

1.8.1. Row-level locks

A row-level lock is an exclusive lock that prevents other transactions from modifying the row. Oracle automatically applies row-level locks when the following statements are used:

  1. INSERT, UPDATE, DELETE, SELECT … FOR UPDATE [OF columns] [WAIT n | NOWAIT];
  2. The SELECT … FOR UPDATE statement allows users to lock multiple records at once for update.
  3. Use COMMIT or ROLLBACK to release the lock.

1.8.2. Table-level lock

Table-level locking locks the entire table being operated on. It is simple to implement, consumes few resources, and is supported by most MySQL engines; the most commonly used, MyISAM and InnoDB, both support table-level locking. Table-level locks come in two kinds: table shared read locks (shared locks) and table exclusive write locks (exclusive locks).

1.8.3. Page level lock

Page-level locks have a granularity between row-level and table-level locks in MySQL. Table-level locking is fast but causes many conflicts; row-level locking causes few conflicts but is slow. Page-level locking is a compromise: it locks one group of adjacent records at a time. BDB supports page-level locks.

1.9. Distributed lock based on Redis

  1. When acquiring the lock, use SETNX (SETNX key val: if and only if key does not exist, set key to the string val and return 1; if key exists, do nothing and return 0). The lock's value is a randomly generated UUID, which is checked when the lock is released. Also use the EXPIRE command to give the lock a timeout, after which it is released automatically.

  2. When acquiring the lock, call SETNX: if it returns 0, the lock is held by someone else; if it returns 1, the lock was acquired successfully. Also set an acquisition timeout; if it is exceeded, give up acquiring the lock.

  3. When releasing the lock, check the UUID to verify that it is your lock; if it is, execute DELETE to release it.
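The three steps above can be simulated without a Redis server by modelling SETNX's set-if-absent semantics with ConcurrentHashMap.putIfAbsent (the EXPIRE step is noted but omitted for brevity). The key point the sketch preserves is the UUID check on release, which stops one client from deleting another client's lock.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class SetnxLock {
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    // SETNX semantics: set only if absent. Returns the UUID token on success,
    // or null if the lock is already held.
    String tryLock(String key) {
        String token = UUID.randomUUID().toString();
        boolean acquired = store.putIfAbsent(key, token) == null;
        // A real implementation would also EXPIRE the key here so that a
        // crashed holder cannot leave the lock stuck forever.
        return acquired ? token : null;
    }

    // Release only if the stored UUID is ours: verify ownership before DELETE.
    boolean unlock(String key, String token) {
        return token != null && store.remove(key, token);
    }

    public static void main(String[] args) {
        SetnxLock locks = new SetnxLock();
        String t = locks.tryLock("order:42");
        System.out.println(t != null);                   // acquired
        System.out.println(locks.tryLock("order:42"));   // null: already held
        System.out.println(locks.unlock("order:42", t)); // released by owner
    }
}
```

ConcurrentHashMap.remove(key, value) removes the entry only when the current value matches, mirroring the check-then-delete step (which in real Redis should be done atomically, e.g. with a Lua script).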

1.10. Splitting databases and tables

There are two ways to split databases and tables: vertical and horizontal.

Vertical partitioning (by functional module)
divides tables by functional module and degree of relatedness, deploying them to different databases. For example, we might create a work database workDB, a commodity database payDB, a user database userDB, a log database logDB, and so on, used respectively to store the project's work definition tables, commodity tables, user data tables, and log data tables.

Horizontal partitioning (splitting rows by rule)
When a table holds too much data, we can split its rows according to some rule, such as hashing the userID, and store them in multiple tables with the same structure, possibly in different databases.
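A hypothetical routing function for the userID-hash scheme just described: the shard index is derived from a stable hash of the key, so the same user always lands in the same table. Table names like `user_0` are invented for illustration.

```java
public class ShardRouter {
    // Route a userID to one of n same-structured tables, e.g. user_0 .. user_3.
    static String tableFor(String userId, int shardCount) {
        // floorMod keeps the result non-negative even for negative hashCodes.
        int shard = Math.floorMod(userId.hashCode(), shardCount);
        return "user_" + shard;
    }

    public static void main(String[] args) {
        System.out.println(tableFor("u10001", 4));
        System.out.println(tableFor("u10001", 4)); // deterministic: same table every time
    }
}
```

One caveat of plain modulo routing: changing shardCount remaps most keys, which is why production systems often reach for consistent hashing or a lookup table instead.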

1.11. Two-phase commit protocol

Distributed transactions refer to transactions that involve operating multiple databases. In a distributed system, each node is physically independent of each other, and communicates and coordinates through the network.

XA is the interface specification (ie interface function) between the transaction middleware and the database defined by X/Open DTP. The transaction middleware uses it to notify the database transaction of the start, end, commit, rollback, etc. XA interface functions are provided by database vendors.
Two-phase Commit (2PC) is an algorithm, from the field of computer networks and databases, designed to keep all nodes of a distributed system consistent when committing a transaction; it is usually also called a protocol. In a distributed system, each node knows whether its own operation succeeded or failed, but cannot know whether the operations of other nodes succeeded or failed. When a transaction spans multiple nodes, preserving the transaction's ACID properties requires introducing a coordinator component to control the outcomes of all the other nodes (called participants) and finally instruct them whether to actually commit the result (for example, write the updated data to disk).

The idea of two-phase commit can therefore be summarized as: the participants report the success or failure of their operations to the coordinator, and the coordinator, based on feedback from all participants, decides whether each participant should commit or abort the operation.

1.11.1. Preparation phase

The transaction coordinator (transaction manager) sends a Prepare message to each participant (resource manager). Each participant either returns a failure directly (for example, a permission check fails) or executes the transaction locally, writing its local redo and undo logs but not committing, reaching a state of "everything ready, awaiting only the final order".

1.11.2. Commit phase

If the coordinator receives a failure message from a participant, or times out, it sends a Rollback message to every participant; otherwise it sends a Commit message. Each participant then commits or rolls back as instructed and releases all lock resources used during transaction processing. (Note: lock resources must be released in this final stage.)

1.11.3. Disadvantages

Synchronous blocking problem
1. During execution, all participating nodes are blocked on the transaction.

Single point of failure
2. Because of the coordinator's importance, once the coordinator fails, participants may block forever.

Data inconsistency (split-brain problem)
3. In the second phase, if a local network fault occurs while the coordinator is sending Commit requests, or the coordinator fails partway through sending them, only some participants receive the Commit request. The result is data inconsistency across the distributed system (a split-brain phenomenon).

A case the second phase cannot resolve (transaction state uncertain)
4. If the coordinator crashes after sending a Commit message, and the only participant that received it also crashes, then even if a new coordinator is elected through an election protocol, the transaction's state remains uncertain: no one knows whether it was committed.

1.12. Three-phase commit protocol

Three-phase commit (3PC) is an improved version of two-phase commit (2PC).
Unlike two-phase commit, three-phase commit makes two changes:
1. It introduces a timeout mechanism, in both the coordinator and the participants.
2. It inserts a preparation step between the first and second phases, ensuring that the states of all participating nodes are consistent before the final commit. In other words, besides introducing timeouts, 3PC splits 2PC's prepare phase in two, giving three-phase commit its three phases: CanCommit, PreCommit, and DoCommit.

1.12.1. CanCommit phase

The coordinator sends a CanCommit request to each participant; a participant returns a Yes response if it can commit, otherwise a No response.

1.12.2. PreCommit phase

The coordinator decides how to proceed based on the participants' responses. There are two possibilities: if the coordinator receives a Yes response from all participants, the transaction is pre-executed; if any participant sends a No response, or the coordinator times out without receiving a response, the transaction is interrupted.

1.12.3. doCommit phase

The real transaction commit happens in this phase. It mainly involves: 1. the coordinator sends the commit request; 2. the participants commit the transaction; 3. the participants send feedback (an Ack response after the transaction is committed); 4. the coordinator completes the transaction after receiving the Acks.

1.13. Flexible transactions

1.13.1. Flexible transactions

In Internet scenarios such as e-commerce, traditional transactions have exposed bottlenecks in database performance and processing capacity. Based on the CAP and BASE theories from the distributed-systems field, the concept of flexible transactions was proposed. CAP (Consistency, Availability, Partition tolerance) has been covered many times and is not described again here. BASE is an extension of CAP, comprising Basically Available, Soft State, and Eventual Consistency.
Generally speaking, flexible transactions come in several types: two-phase, compensation, asynchronous-assurance, and best-effort notification.

Two-phase type
1. Distributed-transaction two-phase commit, corresponding to technologies such as XA and JTA/JTS. This is the typical transaction-processing model in distributed environments.

Compensation type
2. TCC-style transactions (Try/Confirm/Cancel) can be classified as compensation type.

WS-BusinessActivity provides a compensation-based model for long-running transactions. Server A initiates the transaction and server B participates in it. If server A's transaction executes smoothly, transaction A commits first; if transaction B then also executes smoothly, it commits too and the whole transaction completes. But if transaction B fails, it rolls back itself, while transaction A has already committed; a compensation operation must then be performed to reverse what the committed transaction A did, restoring the state from before transaction A executed. Such a SAGA transaction model sacrifices some isolation and consistency, but improves the availability of long-running transactions.
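A minimal Try/Confirm/Cancel shape for the compensation pattern above, using an invented account example: Try reserves the resource without spending it, Confirm applies the reservation, and Cancel is the compensation that releases it when another participant fails. Names and amounts are illustrative.

```java
public class TccAccount {
    private long balance;
    private long frozen; // funds reserved by Try but not yet confirmed

    TccAccount(long balance) { this.balance = balance; }

    // Try: reserve funds without actually spending them.
    boolean tryDebit(long amount) {
        if (balance - frozen < amount) return false; // insufficient available funds
        frozen += amount;
        return true;
    }

    // Confirm: the reserved funds are really deducted.
    void confirmDebit(long amount) {
        frozen -= amount;
        balance -= amount;
    }

    // Cancel: compensation - release the reservation, balance untouched.
    void cancelDebit(long amount) {
        frozen -= amount;
    }

    long available() { return balance - frozen; }

    public static void main(String[] args) {
        TccAccount acct = new TccAccount(100);
        if (acct.tryDebit(30)) {
            // ... suppose some other participant failed, so we compensate:
            acct.cancelDebit(30);
        }
        System.out.println(acct.available()); // back to 100 after cancellation
    }
}
```

Because Try only reserves, a Cancel after a partial failure leaves no trace, which is exactly the property the compensation model trades isolation for.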

Asynchronous guaranteed type
3. By changing a series of synchronous transaction operations into asynchronous operations based on message execution, the influence of synchronous blocking operations in distributed transactions is avoided.

The best-effort notification type (multiple attempts)
4. This is the least demanding type of distributed transaction and can also be implemented with message middleware. The difference from the asynchronous-assurance type above is that after the MQ server delivers the message to the consumer, the transaction is allowed to end normally once the maximum number of retries is reached.

1.14. CAP

The CAP principle, also known as the CAP theorem, states that in a distributed system, Consistency, Availability, and Partition tolerance cannot all be achieved at once; at most two of the three can be guaranteed.

Consistency (C):
1. Whether all data backups in a distributed system have the same value at the same time. (Equivalent to all nodes accessing the same copy of the latest data)

Availability (A):
2. After some nodes in the cluster fail, whether the entire cluster can respond to client read and write requests. (High availability for data update)

Partition tolerance (P):
3. In practical terms, a partition corresponds to a time limit on communication: if the system cannot achieve data consistency within the time limit, a partition has occurred, and the current operation must choose between C and A.


Origin blog.csdn.net/qq_46914021/article/details/109204516