Study notes: "High Performance MySQL"

For database knowledge, I read two books: "High Performance MySQL" and "Database System Implementation". They work best read together: "High Performance MySQL" approaches the subject from the perspective of usage, while "Database System Implementation" approaches it from the perspective of principles. When I first started learning about databases I was obsessed with understanding how everything is implemented, so I went straight out and bought "MySQL Kernel: InnoDB Storage Engine". I could not follow it, and it ended up on the shelf.

For learning databases, I personally think the following order is reasonable.

Hard disk

"Database System Implementation" devotes the whole of its second chapter to the principles of disk storage. This is because the hard disk is the only built-in computer component that provides persistent storage, and software has to yield to hardware. Understanding the characteristics of disk storage is therefore the way to understand the logic behind the software design. Disk storage has the following characteristics:

  1. Characteristic A: compared with CPU latency, disk latency is enormous. The book "Systems Performance" makes the comparison: for a 3.3 GHz CPU, one instruction cycle takes about 0.3 ns, while one I/O on a mechanical hard disk takes 1 to 10 ms. How big is the gap? If one CPU cycle lasted 1 s, one mechanical disk I/O would last 1 to 12 months. By the time it finishes, the flowers have wilted.

  2. Characteristic B: the disk is a block device, so data is read and written one block at a time. A block is usually 512 bytes. So when using a hard disk, keep in mind that you cannot send a cargo ship to America carrying a single potato.

  3. Characteristic C: sequential I/O performs far better than random I/O, because sequential I/O avoids seek time and rotational latency.
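The gap described in characteristic A can be checked with a little arithmetic. The sketch below rescales latencies so that one 0.3 ns CPU cycle lasts one second; the function name `scaled_seconds` and the 30-day month are illustrative assumptions, not something taken from either book.

```python
CPU_CYCLE_NS = 0.3  # one cycle of a 3.3 GHz CPU, as quoted above

# Assumption for illustration: a month is 30 days.
SECONDS_PER_MONTH = 30 * 24 * 3600


def scaled_seconds(latency_ns: float) -> float:
    """Rescale a latency so that one CPU cycle lasts one second."""
    return latency_ns / CPU_CYCLE_NS


for label, latency_ms in [("fast disk I/O", 1), ("slow disk I/O", 10)]:
    latency_ns = latency_ms * 1_000_000  # 1 ms = 1,000,000 ns
    months = scaled_seconds(latency_ns) / SECONDS_PER_MONTH
    print(f"{label}: {latency_ms} ms feels like {months:.1f} months")
```

On this scale a 1 ms I/O comes out at a bit over one month and a 10 ms I/O at a bit under thirteen, which is in line with the "1 to 12 months" quoted above.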

These characteristics shaped not only database design but also operating system design, for example the page cache.

Read and write

Database operations are essentially higher-level reads and writes: select and delete/update are the operations we use most often. So the core problem a database has to solve is how to organize data so that reads and writes are fast.

### Transaction
High performance cannot ignore concurrent reads and writes, and under concurrency there will be data consistency problems. Transactions exist to solve the data consistency problem.

By default each SQL statement is its own transaction; you can change this by setting the commit point manually.

MySQL has four transaction isolation levels. There is no need to memorize them, because they can be derived from the application scenarios.

Read Uncommitted

Transaction A modifies a record but has not committed; transaction B reads the table and sees A's uncommitted change. This is read uncommitted. If we designed a database ourselves and modified fields in place in the original data with no other control mechanism, this is exactly what would happen under concurrency. Because dirty data is read, it is also called a dirty read. It can help to swap "transaction" for a more familiar concept: a thread.

Read Committed

To fix the read uncommitted problem, keep a transaction's changes inside its own scope until it commits, so that uncommitted data does not affect other transactions. This isolation level is read committed. It is also called non-repeatable read, because running the same query twice inside one transaction may return different results.
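The difference between the two levels so far can be sketched with a toy key-value store. This is only a model for building intuition, with made-up names (`ToyStore`, `pending`, the level strings); it is not how MySQL or InnoDB implement isolation.

```python
class ToyStore:
    """Toy key-value store; NOT how MySQL implements isolation."""

    def __init__(self):
        self.committed = {}  # data visible to every transaction
        self.pending = {}    # txn id -> that transaction's uncommitted writes

    def write(self, txn, key, value):
        # Writes stay in the transaction's private scope until commit.
        self.pending.setdefault(txn, {})[key] = value

    def commit(self, txn):
        # Publish the transaction's writes to everyone.
        self.committed.update(self.pending.pop(txn, {}))

    def read(self, txn, key, level):
        # A transaction always sees its own writes.
        if key in self.pending.get(txn, {}):
            return self.pending[txn][key]
        if level == "read uncommitted":
            # Dirty read: peek at other transactions' uncommitted writes.
            for other_writes in self.pending.values():
                if key in other_writes:
                    return other_writes[key]
        return self.committed.get(key)


store = ToyStore()
store.committed["balance"] = 100
store.write("A", "balance", 50)  # transaction A has not committed yet

dirty = store.read("B", "balance", "read uncommitted")  # sees A's dirty value
first = store.read("B", "balance", "read committed")    # sees committed value
store.commit("A")
second = store.read("B", "balance", "read committed")   # same query, new result
print(dirty, first, second)
```

Here transaction B gets two different answers from the same read committed query (`first` and `second`), which is the non-repeatable read described above.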

Repeatable Read

The non-repeatable read problem that read committed faces is avoided under the repeatable read isolation level. It guarantees that a record read repeatedly within one transaction does not change. Of course, if the transaction itself changes the record, that is another matter. This level brings a different problem: phantom reads. They are easy to understand with two transactions. Transaction A reads and finds that the record it is about to write does not exist; transaction B then writes that record. Transaction A's write may then fail.

Serializable

Transactions execute serially, one after another. This is the approach with the lowest performance.

Index

Queries usually fall into two typical scenarios: equality queries and range queries, that is, `select * from table where field = a` or `select * from table where field between a and b`. Without an index, the only option is a full table scan, which is like looking for a needle in a haystack. Programmers generally care about two things: where the problem is and how to optimize it. For equality queries, the best optimization is a hash. For range queries, a hash is of no use, because a range query carries a hidden requirement: ordering. Typical data structures that keep data ordered are the sorted array, linked list, skip list, AVL tree, red-black tree, B-tree, and B+ tree.
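The contrast between the two query types can be sketched in a few lines. The rows and the helper names `equal_query` and `range_query` below are made up for illustration; a plain dict stands in for a hash index and a sorted list with binary search stands in for an ordered index.

```python
import bisect

# Made-up rows of (field, payload), deliberately unsorted.
rows = [(7, "g"), (3, "c"), (9, "i"), (1, "a"), (5, "e")]

# Equality query: a hash index maps a key straight to its rows.
hash_index = {}
for key, payload in rows:
    hash_index.setdefault(key, []).append(payload)


def equal_query(key):
    """field = key, answered by one hash lookup."""
    return hash_index.get(key, [])


# Range query: an ordered index lets us binary-search both ends of the range.
sorted_index = sorted(rows)
sorted_keys = [key for key, _ in sorted_index]


def range_query(low, high):
    """field between low and high (inclusive), answered from the sorted index."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return [payload for _, payload in sorted_index[start:end]]


print(equal_query(5))
print(range_query(3, 7))
```

The hash index cannot answer `range_query` without visiting every key, because hashing throws away the order of the keys; the sorted index keeps it.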

Why a B+ tree?

  1. The hard disk is a block device. With a B+ tree, N elements fit in a single block, which keeps the height of the tree to no more than about three levels and reduces the number of I/Os.
  2. The leaf nodes of a B+ tree form a linked list, which suits sequential disk I/O.
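The effect of packing N elements into one block can be seen by counting levels. The sketch below is a simplified model: it assumes every node holds exactly `fanout` entries, and the fanout of 1000 is an illustrative figure rather than a measured one.

```python
def btree_levels(num_keys: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= num_keys."""
    levels = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


# Assumed fanout of 1000 entries per block versus a binary tree (fanout 2).
print(btree_levels(1_000_000_000, 1000))
print(btree_levels(1_000_000_000, 2))
```

Under these assumptions a billion keys need three levels at a fanout of 1000 but thirty levels in a binary tree. If each level costs one disk I/O, that is the difference between 3 and 30 I/Os per lookup.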

Performance

Performance means response time. Troubleshoot performance problems top-down:

  1. Are the CPU, memory, network, I/O, and hard disk OK?
  2. If something is not OK, there are only three cases: insufficient headroom, unreasonable configuration, or a fault.
  3. Index optimization, table structure optimization, and query optimization go hand in hand.

Performance problems call for more knowledge of the operating system.

Replication

This is MySQL's killer feature. Without replication, MySQL would not be so popular. The features that grow out of replication are read/write splitting, load balancing, high availability, failover, backup, and upgrade testing. These are all concepts you hear everywhere.

Replication methods: statement-based replication and row-based replication.

Scalability

Scalability means increasing system capacity by adding resources. For example, with read/write splitting in MySQL, each slave node you add raises the database's capacity for concurrent reads. Of course, because a system is layered, the system as a whole is scalable only if every layer supports scalability.

At the database level, the common scaling strategy is sharding: splitting data across multiple databases and multiple tables.

Also, scalability and system performance are two different things. Take Hive as an example: its performance is relatively low, but that does not affect its scalability.
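A minimal sketch of how a key might be routed under such a scheme follows. The function `route`, the naming pattern `db_N` / `user_N`, and the 4 x 8 layout are all hypothetical; real deployments usually rely on middleware or a lookup table instead of a bare modulo.

```python
import zlib


def route(user_id: str, num_databases: int = 4, tables_per_database: int = 8):
    """Map a key to a (database, table) pair using a stable hash.

    The layout and the names are made up for illustration.
    """
    total_slots = num_databases * tables_per_database
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    slot = zlib.crc32(user_id.encode("utf-8")) % total_slots
    database = slot // tables_per_database
    table = slot % tables_per_database
    return f"db_{database}", f"user_{table}"


db_name, table_name = route("alice")
print(db_name, table_name)
```

The same key always lands in the same place, which is what lets reads find the data later. The cost of a plain modulo is that changing the number of databases or tables moves most keys, so resharding needs a migration plan.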

High Availability

Availability essentially means less downtime. If we looked at people the way we look at computers, a person's availability would generally be only about 50%; someone who works eight hours a day is only 33% available.

Thinking top-down, there are only two ways to achieve high availability. The first is to increase the mean time between failures (MTBF), in short, to make the interval between two outages as long as possible. The second is to reduce the mean time to recovery (MTTR), in short, to repair an outage as quickly as possible. High availability is therefore mostly an architecture-level matter, for example mutual standby between MySQL servers and load balancing.
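Both levers show up in the usual steady-state formula, availability = MTBF / (MTBF + MTTR). The sketch below treats MTBF as the average uptime between two outages, which is an assumption about the definition, and uses illustrative numbers.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: uptime / (uptime + downtime)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# The eight-hour working day: 8 hours "up", 16 hours "down".
print(f"{availability(8, 16):.0%}")

# Two ways to improve on a baseline of 1000 hours up, 10 hours to repair.
baseline = availability(1000, 10)
longer_mtbf = availability(2000, 10)  # fail half as often
shorter_mttr = availability(1000, 5)  # recover twice as fast
print(baseline, longer_mtbf, shorter_mttr)
```

In this model doubling MTBF and halving MTTR give exactly the same availability, so in practice the choice comes down to which one is cheaper to achieve.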

Backup

Backup is an easily overlooked topic. It usually only gets attention at the critical moment when you need it to save you. Backups serve three basic purposes: disaster recovery, auditing, and testing.

"High Performance MySQL" strikes a balance between depth and breadth. It does not go far into the details of the underlying principles, which makes it suitable reading for developers. For more in-depth study, "Database System Implementation" is probably the more appropriate book.

Origin blog.51cto.com/sbp810050504/2428524