MySQL Optimization Reference

Part one

  • Normalize the table design (the three normal forms)
  • Add indexes (ordinary index, primary key index, unique index, full-text index)
  • Split tables (horizontal split, vertical split)
  • Separate reads and writes (writes are insert, update and delete)
  • Stored procedures
  • Tune the MySQL configuration (set the maximum number of concurrent connections in my.ini, adjust the cache sizes)
  • Upgrade the MySQL server hardware
  • Regularly delete unneeded data and defragment regularly (MyISAM)

 

The three normal forms (3NF):

Denormalization (anti-3NF):

 

SQL statement optimization:

Locating slow queries:

  1. Query the running status (uptime, number of concurrent connections, and so on): show status
  2. show status like 'com_select'
  3. show global status
  4. show global status like 'slow_queries' (the number of slow queries)

    How to locate slow queries

     

 

Index Tuning:

Explain

 

Database Engine:

Defragmentation: optimize table table_name

Scheduled backup: mysqldump -uroot -proot temp > D:/a.bak

Scheduling (crontab entry, daily at midnight): 0 0 * * * mybash.sh

 

Read/write separation:

Master-slave replication:

 

Part two

  • 1. Consider performance when designing the database and creating tables
  • 2. Pay attention to optimization when writing SQL
  • 3. Partitioning
  • 4. Table sharding
  • 5. Database sharding

1. Consider performance when designing the database and creating tables

MySQL itself is highly flexible, which means good performance is not guaranteed out of the box; it depends heavily on the skill of the developer. The more capable the developer, the better MySQL performs. This is a common trait of many relational databases, which is why company DBAs are usually very well paid.

Points to note when designing tables:

  1. Avoid NULL values in fields. NULL values are hard to optimize in queries and take up extra index space; use a default such as the number 0 instead of NULL.
  2. Prefer INT over BIGINT, and add UNSIGNED for non-negative values (this doubles the maximum value). TINYINT, SMALLINT and MEDIUMINT are better still where they fit.
  3. Use enumerations or integers instead of string types.
  4. Prefer TIMESTAMP over DATETIME.
  5. Do not put too many fields in a single table; fewer than 20 is recommended.
  6. Store IP addresses as integers.
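
As a rough illustration of storing an IP as an integer, the conversion can be sketched in Python with the standard `ipaddress` module. The helper names `ip_to_int` and `int_to_ip` are made up for this example. On the server side, MySQL's `INET_ATON()` and `INET_NTOA()` perform the same conversion, and an IPv4 address fits in an INT UNSIGNED column.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Convert dotted-quad IPv4 text to the integer a numeric column would store."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value: int) -> str:
    """Convert the stored integer back to dotted-quad text."""
    return str(ipaddress.IPv4Address(value))


stored = ip_to_int("192.168.1.10")
print(stored)             # 3232235786
print(int_to_ip(stored))  # 192.168.1.10
```

Integer storage also makes range checks (for example, "all addresses in a subnet") a plain numeric comparison instead of a string match.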

Indexes

  1. More indexes are not always better; create them selectively based on the queries. Consider indexing the columns used in WHERE and ORDER BY clauses, and use EXPLAIN to check whether a query uses an index or scans the full table.
  2. Avoid testing fields for NULL in the WHERE clause, since this may cause the engine to abandon the index and scan the full table.
  3. Fields with few distinct values are poor candidates for an index, for example "gender", which has only two or three values.
  4. For character fields, build only a prefix index.
  5. Character fields are best not used as the primary key.
  6. Do not use foreign keys; enforce the constraints in application code.
  7. Try to avoid UNIQUE; enforce the constraint in application code.
  8. When using a multi-column index, keep the column order consistent with the query conditions, and delete unnecessary single-column indexes.

In short, use appropriate data types and choose appropriate indexes.

Choosing appropriate data types:

  1. Use the smallest data type that can hold the data: integer < date, time < char, varchar < blob.
  2. Use simple data types. Integers have lower processing overhead than characters, because string comparison is more complicated. For example, store times as int, and convert IP addresses to BIGINT with a conversion function.
  3. Use reasonable field lengths; fixed-length tables are faster. Use enum and char instead of varchar.
  4. Define fields as NOT NULL wherever possible.
  5. Use text as little as possible; if it cannot be avoided, it is best to split it into a separate table.

Choosing suitable index columns:

  1. Frequently queried columns: those that appear in where, group by, order by and on clauses.
  2. Columns that appear in where conditions with <, <=, =, >, >=, between, in, and like with a string plus wildcard (%).
  3. Short columns. Index fields should be as small as possible, because the database stores data in pages, and the smaller the field, the more entries fit in one page.
  4. Columns with high dispersion (many distinct values), which should come first in a composite index. To check dispersion, count the distinct values of the column; the larger the count, the higher the dispersion.
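
The dispersion check can be sketched as a small Python helper. This is only an illustration: SQLite stands in for MySQL so that the snippet runs without a server (the `COUNT(DISTINCT ...)` query itself is also valid MySQL), the `users` table and the function name `column_selectivity` are hypothetical, and dividing by the row count is a common normalization that the text above does not spell out.

```python
import sqlite3


def column_selectivity(conn: sqlite3.Connection, table: str, column: str) -> float:
    """Return distinct values / total rows for a column (1.0 means every row is unique)."""
    # Identifiers cannot be bound as parameters, so only pass trusted names here.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total if total else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "m"), (2, "f"), (3, "m"), (4, "f"), (5, "m"), (6, "f")],
)

# A low ratio (gender) suggests a poor index candidate; a high one (user_id) a good one.
for col in ("gender", "user_id"):
    print(col, round(column_selectivity(conn, "users", col), 2))
```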

The original developers have already left and the tables were built long ago, so I cannot modify them. This approach cannot be carried out, so give up!

2. Pay attention to optimization when writing SQL

  1. Use LIMIT to restrict the number of records a query returns.
  2. Avoid SELECT *; list only the fields you need.
  3. Use joins (JOIN) instead of subqueries.
  4. Split large DELETE or INSERT statements.
  5. Enable the slow query log to find slow SQL.
  6. Do not do arithmetic on columns: SELECT id WHERE age + 1 = 10. Any operation on a column causes a table scan, including database functions and computed expressions; move the operation to the right side of the equals sign whenever possible.
  7. Keep SQL statements as simple as possible: one SQL statement runs on only one CPU; split big statements into small ones to reduce lock time; one big SQL statement can block the entire database.
  8. Rewrite OR as IN: OR is O(n), while IN is O(log n); keep the number of IN values under about 200.
  9. Do not use functions and triggers; implement the logic in the application.
  10. Avoid %xxx-style queries (a leading wildcard).
  11. Use JOIN sparingly.
  12. Compare values of the same type, for example '123' with '123' and 123 with 123.
  13. Avoid != or <> in the WHERE clause where possible; otherwise the engine gives up the index and scans the full table.
  14. For continuous values, use BETWEEN rather than IN: SELECT id FROM t WHERE num BETWEEN 1 AND 5
  15. Do not fetch an entire table to build a list; paginate with LIMIT, and keep the page size modest.
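
Splitting a large DELETE (item 4) might look like the following sketch. SQLite is used only so the example runs without a server; the `access_log` table, its columns, and the function name `delete_in_batches` are assumptions for illustration. In MySQL the batch statement would more likely be written as `DELETE ... WHERE ... LIMIT n`, since MySQL does not accept LIMIT inside an IN subquery.

```python
import sqlite3


def delete_in_batches(conn: sqlite3.Connection, cutoff: int, batch_size: int = 1000) -> int:
    """Delete rows with created_at < cutoff in small transactions; return rows deleted."""
    deleted = 0
    while True:
        with conn:  # commits after each batch, so locks are held only briefly
            cur = conn.execute(
                "DELETE FROM access_log WHERE id IN ("
                "  SELECT id FROM access_log WHERE created_at < ? LIMIT ?)",
                (cutoff, batch_size),
            )
        if cur.rowcount == 0:
            return deleted
        deleted += cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (id INTEGER PRIMARY KEY, created_at INTEGER)")
conn.executemany(
    "INSERT INTO access_log (created_at) VALUES (?)", [(i,) for i in range(10_000)]
)
conn.commit()

print(delete_in_batches(conn, cutoff=7_500, batch_size=1_000))        # rows removed
print(conn.execute("SELECT COUNT(*) FROM access_log").fetchone()[0])  # rows left
```

Each batch commits on its own, so other sessions get a chance to run between batches instead of waiting behind one long statement.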

The original developers have left and the program is already live, so I cannot edit the SQL. This approach cannot be carried out, so give up!

Engines

MyISAM and InnoDB are currently the two most widely used engines:

MyISAM

MyISAM was the default engine in MySQL 5.1 and earlier. Its characteristics are:

  1. No row-level locks: a read takes a shared read lock on every table it touches, and a write takes an exclusive lock on the table.
  2. No transaction support.
  3. No foreign key support.
  4. No safe recovery after a crash.
  5. New records can be inserted into a table while read queries are running on it.
  6. Supports indexes on the first 500 characters of BLOB and TEXT columns, and supports full-text indexes.
  7. Supports delayed index updates, which greatly improves write performance.
  8. Tables that will not be modified can be compressed, which greatly reduces disk usage.

InnoDB

InnoDB became the default engine in MySQL 5.5. Its characteristics are:

  1. Supports row-level locks and uses MVCC to support high concurrency.
  2. Supports transactions.
  3. Supports foreign keys.
  4. Supports safe recovery after a crash.
  5. Does not support full-text indexes before MySQL 5.6 (InnoDB gained full-text indexes in 5.6).

Overall, MyISAM suits SELECT-intensive tables, and InnoDB suits INSERT-intensive and UPDATE-intensive tables.

MyISAM may be fast and take up little storage space, but the program requires transaction support, so InnoDB is mandatory. This approach cannot be carried out, so give up!

3. Partitioning

Partitioning, introduced in MySQL 5.1, is a simple form of horizontal splitting. Users add partition parameters when creating the table; it is transparent to the application, and no code changes are needed.

To the user, a partitioned table is a single logical table, but underneath it consists of multiple physical sub-tables. The partitioning code is really a wrapper around a set of objects for the underlying tables, and to the SQL layer it is a black box that fully encapsulates them. The way MySQL implements partitioning also means that indexes are defined per partition sub-table; there is no global index.

User SQL needs to be optimized for the partitioned table: include the partitioning column in the query conditions so that the query is confined to a small number of partitions; otherwise it scans all partitions. EXPLAIN PARTITIONS shows which partitions a statement will touch, which helps with SQL optimization. In my tests, even queries whose conditions did not include the partitioning column became faster, so this measure is worth trying.

Benefits of partitioning:

  1. A single table can store more data.
  2. Partitioned data is easier to maintain: large amounts of data can be deleted in bulk by dropping an entire partition, and new partitions can be added to hold newly inserted data. Individual partitions can also be optimized, checked and repaired on their own.
  3. Some queries can be determined from their conditions to fall on only a few partitions, so they run fast.
  4. Partition data can be distributed across different physical devices, making efficient use of multiple hardware devices.
  5. Partitioned tables can be used to avoid certain special bottlenecks, such as mutually exclusive access to a single InnoDB index, or inode lock contention on the ext3 file system.
  6. Individual partitions can be backed up and restored.

Limitations and drawbacks of partitioning:

  1. A table can have at most 1024 partitions.
  2. If the table has a primary key or unique index, every primary key and unique index must include all columns used in the partitioning expression.
  3. Partitioned tables cannot use foreign key constraints.
  4. NULL values defeat partition pruning.
  5. All partitions must use the same storage engine.

Partition types:

  1. RANGE partitioning: assigns rows to partitions based on column values that fall within given contiguous ranges.
  2. LIST partitioning: similar to RANGE partitioning, except that the partition is selected by matching the column value against a set of discrete values.
  3. HASH partitioning: the partition is selected by the return value of a user-defined expression, computed from the column values of the rows being inserted. The expression can be any valid MySQL expression that yields a non-negative integer.
  4. KEY partitioning: similar to HASH partitioning, except that KEY partitioning only takes one or more columns to evaluate, and the MySQL server supplies its own hash function. The columns may hold non-integer values, because the server's hash function always returns an integer.
  5. For the specifics of MySQL partitioning, search online or check the official documentation; this is only a starting point for discussion.

I first partitioned the online access-record table by month with RANGE into 12 partitions. Query efficiency improved about six times, which was not a noticeable enough effect, so I switched to HASH partitioning on id with 64 partitions, and query speed improved significantly. Problem solved!

The results are as follows:

    PARTITION BY HASH (id) PARTITIONS 64
    SELECT COUNT(*) FROM readroom_website; -- 11,901,336 rows
    /* Affected rows: 0  Records found: 1  Warnings: 0  Duration for 1 query: 5.734 sec. */
    SELECT * FROM readroom_website WHERE month(accesstime) = 11 LIMIT 10;
    /* Affected rows: 0  Records found: 10  Warnings: 0  Duration for 1 query: 0.719 sec. */
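
To make the HASH (id) layout more concrete, here is a small Python sketch of how a row's partition is chosen. MySQL's HASH partitioning places a row in partition MOD(expr, number_of_partitions); the function name `hash_partition` and the sample ids are made up for illustration.

```python
from collections import Counter

NUM_PARTITIONS = 64  # matches PARTITIONS 64 in the statement above


def hash_partition(partition_key: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Partition number for a HASH-partitioned row: MOD(expr, num_partitions)."""
    if partition_key < 0:
        raise ValueError("the partitioning expression must yield a non-negative integer")
    return partition_key % num_partitions


# A lookup by id only needs the one partition that id maps to.
print(hash_partition(130))  # 2

# Sequential ids spread evenly, so every partition stays about the same size.
spread = Counter(hash_partition(i) for i in range(1, 6401))
print(min(spread.values()), max(spread.values()))  # 100 100
```

A condition such as month(accesstime) = 11 does not mention id, so it cannot be narrowed to one partition this way; each of the 64 partitions is simply much smaller than the original table.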

4. Table sharding

Table sharding applies when a large table has gone through all the optimizations above and queries still hang: split the table into multiple tables, turn one query into multiple queries, and then combine the results and return them to the user.

Table sharding comes in vertical and horizontal splits, and usually splits on a single field. For example, to split on the id field into 100 tables, name each table tableName_id%100.
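
The tableName_id%100 naming rule can be sketched in Python as follows. The helper names `shard_table_name` and `group_ids_by_table` are hypothetical; the second helper shows how one logical lookup turns into one query per shard table, whose results the application then merges.

```python
from collections import defaultdict

SHARD_COUNT = 100  # number of sub-tables, as in the id % 100 example


def shard_table_name(base: str, row_id: int, shard_count: int = SHARD_COUNT) -> str:
    """Name of the sub-table that holds the row with this id."""
    return f"{base}_{row_id % shard_count}"


def group_ids_by_table(base: str, ids: list[int], shard_count: int = SHARD_COUNT) -> dict[str, list[int]]:
    """Group ids by sub-table so that each table is queried only once."""
    groups: dict[str, list[int]] = defaultdict(list)
    for row_id in ids:
        groups[shard_table_name(base, row_id, shard_count)].append(row_id)
    return dict(groups)


print(shard_table_name("tableName", 12345))  # tableName_45
print(group_ids_by_table("tableName", [7, 107, 12345]))
# {'tableName_7': [7, 107], 'tableName_45': [12345]}
```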

However, table sharding requires modifying the source code, which creates a lot of development work and greatly increases development cost. It is therefore only suitable when the large data volume is anticipated early in development and sharding is built in from the start; it is not suitable for retrofitting an application that is already live, because the cost is too high! Rather than this option, I would choose the lower-cost second and third schemes described here. Not recommended.

5. Database sharding

Split one database into several. I suggest simply doing read/write separation; real database sharding also brings substantial development cost, and the gain is not worth the loss. Not recommended.
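
A minimal sketch of read/write separation at the application level, assuming one primary and a list of replicas kept in sync by master-slave replication. The class name `ReadWriteRouter` and the server labels are invented for this example, and a real setup would return connections rather than strings.

```python
import itertools

READ_VERBS = {"SELECT", "SHOW", "EXPLAIN"}


class ReadWriteRouter:
    """Send plain reads to replicas in round-robin order and everything else to the primary."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        # Locking reads belong to a write transaction, so keep them on the primary.
        locking_read = "FOR UPDATE" in sql.upper()
        if verb in READ_VERBS and not locking_read:
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("SELECT id FROM t WHERE num BETWEEN 1 AND 5"))  # replica-1
print(router.route("UPDATE t SET num = 2 WHERE id = 1"))           # primary
```

Because replication is asynchronous by default, a read issued right after a write may not see that write on a replica; reads that must be fresh can be sent to the primary.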

Scheme two in detail: upgrade the database to one that is 100% compatible with MySQL

If MySQL does not perform well enough, switch to something else. To avoid modifying the source code and to migrate the existing business smoothly, the replacement must be 100% compatible with MySQL.

Open source options

  1. TiDB https://github.com/pingcap/tidb
  2. Cubrid https://www.cubrid.org/

Open source databases bring substantial operations and maintenance costs, and their industrial quality still lags behind MySQL, so there are many pitfalls. If your company insists on a self-hosted database, choose this type of product.

Cloud database options

Alibaba Cloud POLARDB
https://www.aliyun.com/product/polardb?spm=a2c4g.11174283.cloudEssentials.47.7a984b5cS7h4wH

Official description: POLARDB is Alibaba Cloud's self-developed next-generation cloud-native distributed relational database. It is 100% compatible with MySQL, offers storage capacity of up to 100 TB, and delivers up to 6 times the performance of MySQL. POLARDB combines the stability, reliability and high performance of commercial databases with the simplicity, scalability and continuous iteration of open source databases, at only 1/10 the cost of a commercial database.

I activated it and tested it: it supports free MySQL data migration with no operational cost, performance improved about 10 times, and the price is almost the same as RDS. A good alternative!

Alibaba Cloud OceanBase

Used by Taobao, it withstood Double Eleven with outstanding performance. It is still in public beta, so I could not try it, but it is worth looking forward to.

Alibaba Cloud HybridDB for MySQL (formerly PetaData)
https://www.aliyun.com/product/petadata?spm=a2c4g.11174283.cloudEssentials.54.7a984b5cS7h4wH

Official description: the cloud database HybridDB for MySQL (formerly PetaData) is an HTAP (Hybrid Transaction/Analytical Processing) relational database that supports both online transaction processing (OLTP) and online analytical processing (OLAP) on massive data.

I also tested it. It is a solution compatible with both OLAP and OLTP, but the price is too high, up to 10 yuan per hour. It is too wasteful for storage alone; it suits businesses that need storage and analysis together.

Tencent Cloud DCDB
https://cloud.tencent.com/product/dcdb_for_tdsql

Official description: DCDB, also known as TDSQL, is a high-performance distributed database that is compatible with the MySQL protocol and syntax and supports automatic horizontal splitting: the business sees a complete logical table, while the data is evenly split across multiple shards. Each shard uses a primary/standby architecture by default, and the product provides a complete set of solutions for disaster recovery, restore, monitoring and non-stop scaling, suited to TB-scale or PB-scale data scenarios.

I do not like using Tencent products, so I will not say much. The reason is that when a problem comes up you cannot find anyone to help, and problems that cannot be solved online are a headache! But it is cheap and suits very small companies that want to experiment.

Scheme three in detail: drop MySQL and switch to a big data engine to process the data

Once the data volume passes a hundred million rows, there is no other choice; only big data will do.

Open source solutions

The Hadoop family: just go with HBase or Hive. But the operations and maintenance cost is very high and ordinary companies cannot afford it; without an investment of a hundred thousand there will be no good output!

Cloud solutions

Alibaba Cloud MaxCompute with DataWorks, billed pay-as-you-go, at very low cost.

MaxCompute can be thought of as a counterpart to the open source Hive. It lets you manipulate data through SQL, MapReduce, AI algorithms, Python scripts and shell scripts; data is presented as tables and stored in a distributed fashion, and it is processed with scheduled tasks and batch jobs. DataWorks provides a workflow-based way to manage your data processing tasks, with scheduling and monitoring.

Of course, you can also choose other products such as Alibaba Cloud HBase. I mainly do offline processing, so I chose MaxCompute, which is operated almost entirely through a graphical interface.

 

 

 

Reposted from the original blog: https://www.cnblogs.com/lovebing/p/10437717.html


Origin www.cnblogs.com/hcm-php/p/11858616.html