Figure: the MySQL query process
Why optimize
- The throughput bottleneck of a system often appears at database access speed.
- As an application runs, the data in the database keeps growing, and processing slows down.
- Data is stored on disk, whose read/write speed cannot compare with memory.
How to optimize
- At design time: table structure, field definitions, and choice of storage engine.
- Make good use of the features MySQL itself provides, such as indexes.
- Scale out: MySQL clustering, load balancing, read/write splitting.
- Optimizing individual SQL statements (usually yields only limited gains).
I. Field design
Choose the most appropriate field types
1. Make field widths as small as possible
MySQL supports access to large volumes of data well, but generally speaking, the smaller the table, the faster queries against it run. Therefore, when creating a table, set the width of its columns as small as the data allows in order to get better performance.
2. Make fields NOT NULL whenever possible
Wherever possible, declare fields as NOT NULL; then, when executing queries, the database never has to compare against NULL values.
3. Define fields with a fixed set of values as ENUM
For text fields whose values come from a fixed set, such as "province" or "gender", we can define them as the ENUM type. In MySQL, ENUM values are treated as numeric data, and numeric data is processed much faster than text, so this improves database performance.
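A minimal sketch of this idea (the table and column names are invented for illustration):

```sql
-- gender is stored and compared internally as a number (1/2), not as text
CREATE TABLE member (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name   VARCHAR(30)  NOT NULL,
    gender ENUM('male', 'female') NOT NULL,
    PRIMARY KEY (id)
);
```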
4. A single table should not have too many fields; a few fields may be reserved
Provided business needs are met, two or three reserved fields are the limit; they are set aside to make future expansion easier.
Follow the normal forms when designing tables
1. First normal form (1NF)
Field values are atomic and cannot be subdivided (all relational database systems enforce first normal form). For example, a name field that stores first and last name as one whole violates this when the two must be distinguished; in that case, two separate fields must be created. (Fields are indivisible.)
2. Second normal form (2NF)
A table must have a primary key that uniquely identifies each row. Prerequisite: first normal form must already be satisfied. (There is a primary key, and every non-key field depends on the primary key.)
3. Third normal form (3NF)
A table must not contain non-key fields that belong to another table, i.e. a table must not carry redundant fields. Prerequisite: second normal form must already be satisfied. (Non-key fields must not depend on each other.)
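A minimal sketch of removing a redundant field per 3NF (all table and column names are invented for illustration):

```sql
-- Before: a dept_name column in employee would depend on dept_id rather than
-- on the key emp_id, violating 3NF. After: department attributes live in
-- their own table, and employee only keeps the reference.
CREATE TABLE department (
    dept_id   INT UNSIGNED NOT NULL,
    dept_name VARCHAR(50)  NOT NULL,
    PRIMARY KEY (dept_id)
);

CREATE TABLE employee (
    emp_id  INT UNSIGNED NOT NULL,
    name    VARCHAR(30)  NOT NULL,
    dept_id INT UNSIGNED NOT NULL,   -- reference only; no dept_name here
    PRIMARY KEY (emp_id)
);
```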
II. Choosing a storage engine
Comparing MyISAM and InnoDB
- InnoDB supports transactions; MyISAM does not.
- InnoDB supports row-level locking; MyISAM only supports table-level locking.
- InnoDB supports MVCC (multi-version concurrency control, essentially an implementation of optimistic locking); MyISAM does not.
- InnoDB supports foreign keys; MyISAM does not.
- InnoDB does not support full-text indexes, while MyISAM does. (InnoDB gained full-text index support in MySQL 5.6.)
III. Indexes
What an index is
The mapping from keys to data is called an index (==the key includes the address of the corresponding record on disk==). A key is extracted from the data and is used to identify and retrieve a particular record.
Why indexes are fast
- Compared with the data itself, the keys are a small amount of data.
- The keys are ordered, so binary search can locate a position quickly.
Index types
- General index: `key`
- Unique index: `unique key`
- Primary key index: `primary key`
- Full-text index: `fulltext key`
The first three kinds are indexed in the same way; they differ only in the constraints placed on the key: a general index puts no restriction on the key; a unique index requires that key values not repeat; a primary key index additionally requires that the key not be NULL.
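All four index types can be declared in a single table definition (the table and column names are invented for illustration):

```sql
CREATE TABLE article (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
    slug   VARCHAR(100) NOT NULL,
    author VARCHAR(30)  NOT NULL,
    body   TEXT         NOT NULL,
    PRIMARY KEY (id),              -- primary key index: unique and NOT NULL
    UNIQUE KEY uk_slug (slug),     -- unique index: no duplicate values
    KEY idx_author (author),       -- general index: no restrictions
    FULLTEXT KEY ft_body (body)    -- full-text index
) ENGINE = MyISAM;                 -- FULLTEXT requires MyISAM before MySQL 5.6
```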
IV. The query cache
Checking whether it is enabled
Enabling it
On Windows the configuration file is my.ini; on Linux it is my.cnf. Configure the `query_cache_type` item in the [mysqld] section:
- 0: disabled.
- 1: enabled; everything is cached by default, and a statement opts out by adding `sql_no_cache` to its SELECT.
- 2: enabled; nothing is cached by default, and a statement opts in by adding `sql_cache` to its SELECT (==the commonly used mode==).
(Note: the query cache was deprecated in MySQL 5.7 and removed entirely in MySQL 8.0.)
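With `query_cache_type = 2`, caching is opted into per statement. A sketch (the table and column names are invented for illustration):

```sql
-- Check whether the query cache is enabled
SHOW VARIABLES LIKE 'query_cache_type';

-- With query_cache_type = 2, this result is cached explicitly
SELECT SQL_CACHE * FROM article WHERE author = 'alice';

-- With query_cache_type = 1, this statement opts out of the cache
SELECT SQL_NO_CACHE * FROM article WHERE author = 'alice';
```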
Setting the cache size from the client
After enabling the cache, set its size: `set global query_cache_size = 64*1024*1024;`
Cache invalidation (the big problem)
Whenever a data table changes, every cache entry based on that table is deleted. (Management is at the table level, not the record level, so the invalidation rate is high.)
Caveats
- Applications should not concern themselves with how the `query cache` is used. They may try to use it, but business logic must never depend on the `query cache`, since it is managed by the DBA.
- The cache is keyed by the SQL statement text, so two statements that are functionally identical but differ by even one space or in capitalization will not match the same cache entry.
V. Partitioning
Normally each table we create corresponds to one set of storage files: with the MyISAM storage engine, a .MYI and a .MYD file; with the InnoDB storage engine, an .ibd file plus an .frm file (the table structure).
When the data volume is large (typically millions of records or more), MySQL's performance begins to decline, so we need to spread the data across multiple sets of files to keep each individual file efficient.
Creating a partitioned table:
Viewing the data directory:
Table partitioning is done on the server side and is transparent to the client: the client inserts data as usual, while the server distributes the data among the partitions according to the partitioning algorithm.
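A sketch of creating a hash-partitioned table (the table and column names are invented for illustration):

```sql
-- Split the table's storage into 10 partitions by hash of the primary key
CREATE TABLE visit_log (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id INT UNSIGNED    NOT NULL,
    PRIMARY KEY (id)
) PARTITION BY HASH (id) PARTITIONS 10;
```

In the data directory you would then see one data file per partition (named along the lines of `visit_log#P#p0.ibd` through `visit_log#P#p9.ibd`, exact naming varies by version) instead of a single file for the table.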
Partitioning algorithms provided by MySQL
The partitioning field must be part of the primary key. The point of partitioning is to locate data quickly, so the partitioning field should be one that is searched with high frequency (a strong search field); otherwise partitioning on that field is meaningless.
- hash(field): the same input yields the same output, and the output bears no relation to any pattern in the input. ==Applies to integer fields only.==
- key(field): same properties as `hash(field)`, except that `key` also handles strings; concretely, it does one more step than `hash()`, deriving an integer from the string before the modulo operation.
- range: a ==conditional== partitioning algorithm; rows are placed into different partitions according to ranges of the field's value.
- list: also conditional; rows are partitioned according to a list of values (`in (value list)`).
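A sketch of the two conditional algorithms (all table and column names are invented for illustration; note that the partition column must appear in every unique key):

```sql
-- Range partitioning: one partition per publication-year range
CREATE TABLE book (
    id       INT UNSIGNED NOT NULL,
    pub_year SMALLINT     NOT NULL,
    PRIMARY KEY (id, pub_year)
) PARTITION BY RANGE (pub_year) (
    PARTITION p0 VALUES LESS THAN (2000),
    PARTITION p1 VALUES LESS THAN (2010),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);

-- List partitioning: partitions chosen from explicit value lists
CREATE TABLE orders (
    id        INT UNSIGNED NOT NULL,
    region_id TINYINT      NOT NULL,
    PRIMARY KEY (id, region_id)
) PARTITION BY LIST (region_id) (
    PARTITION north VALUES IN (1, 2),
    PARTITION south VALUES IN (3, 4)
);
```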
Using partitions
The efficiency gain from partitioning only shows itself when the table holds a large amount of data.
The gain is most visible when the search field is the partition field, so the choice of ==partition field== is very important, and the business logic should be adjusted to fit the partition field as much as possible (i.e. query on the partition field whenever possible).
VI. Clustering
Master-slave replication
Read/write splitting (built on master-slave replication)
Load balancing
- Round robin
- Weighted round robin: weights assigned according to processing capacity
- Load-based distribution: route by each node's current idle state (but probing every node's memory usage, CPU utilization, etc. and comparing them to pick the least busy one is too inefficient)
High availability
In a server architecture, to guarantee 7x24 availability with no server downtime, every single point (a service handled by one server, such as the write server or the database middleware) needs a redundant standby machine.
For the write server, a matching write-redundant server must be provided. While the write server is healthy (verified via a heartbeat between it and the standby), the standby acts as a replica, synchronizing the write server's content; when the write server goes down, the standby takes over and continues serving as the write server. The process is transparent to the outside world, which accesses the service through a single IP.
VII. Typical SQL
Online DDL
DDL (Data Definition Language) is the language for defining table structure (`create table`) and maintaining it (`alter table`). In versions below MySQL 5.6, executing DDL takes an exclusive lock on the entire table; during that time the table is under maintenance and non-operational, so no access to the table gets a response. From MySQL 5.6 on, Online DDL is supported, which greatly shortens the lock time.
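Since MySQL 5.6, the desired behavior can also be requested explicitly; if the operation cannot be done in place, the statement fails rather than silently locking the table (the table and column names are invented for illustration):

```sql
-- Ask for an in-place index build that keeps the table readable and writable
ALTER TABLE article
    ADD INDEX idx_author (author),
    ALGORITHM = INPLACE,
    LOCK = NONE;
```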
The optimization technique used for maintaining table structure (such as adding a column or adding an index) is the ==copy== strategy. The idea: create a new table with the desired structure, then ==copy== the old table's rows into it batch by batch, keeping the locked portion small at any moment (only the rows currently being copied are locked), so that other operations can still be performed on the old table. During the copy, every operation on the old table is recorded in a log; after the copy completes, the log is replayed against the new table (to guarantee consistency). Finally, the new table replaces the old one (done in the application, or by renaming in the database).
With MySQL's upgrades, however, this problem has almost faded away.
Bulk import statements
When restoring data, you may need to import a large volume of rows. A few tricks make the import fast:
- Disable indexes and constraints before importing: `alter table table-name disable keys`. After the data import completes, re-enable them and build the index in one pass: `alter table table-name enable keys`.
- If the table uses the InnoDB engine, by default ==every write statement gets its own transaction== (which also costs time). It is recommended to open a transaction manually, perform a batch of inserts, and commit manually at the end.
- If the bulk-imported SQL statements have the same form but different data, ==prepare== them once (precompile), which saves a great deal of repeated compilation time.
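The three tricks combined, as a sketch (the table name is invented for illustration):

```sql
ALTER TABLE big_table DISABLE KEYS;      -- skip index maintenance per row

START TRANSACTION;                       -- one transaction for the whole batch
PREPARE ins FROM 'INSERT INTO big_table (id, val) VALUES (?, ?)';
SET @id = 1, @val = 'a';
EXECUTE ins USING @id, @val;             -- repeat EXECUTE once per row
-- ... many more EXECUTEs ...
DEALLOCATE PREPARE ins;
COMMIT;

ALTER TABLE big_table ENABLE KEYS;       -- rebuild the indexes in one pass
```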
limit offset,rows
Try to avoid large `offset` values. For example, `limit 10000,10` is equivalent to fetching and discarding the first 10000 matching rows and then taking 10. Add conditions that filter the rows down (completing the screening) instead of using `limit` to skip over data the query already retrieved. This is the "==offset does useless work==" problem. In a real project, avoid deep pagination and try to guide the user toward condition-based filtering.
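Keyset pagination is the usual fix: remember where the previous page ended and filter on that instead of skipping rows (the table and column names are invented for illustration):

```sql
-- Deep pagination: scans and discards 10000 rows before returning 10
SELECT id, title FROM article ORDER BY id LIMIT 10000, 10;

-- Keyset pagination: jumps straight to the right place via the index
-- (assumes the previous page ended at id = 10000)
SELECT id, title FROM article WHERE id > 10000 ORDER BY id LIMIT 10;
```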
Use select * sparingly
That is, `select` only the fields you need. The impact is not huge, since a few tens or hundreds of extra bytes over the network add little delay, and today's popular ORM frameworks all use `select *`. But when designing a table, take care to split out fields holding large data: product details, for example, can be pulled out into a separate product-details table, so that loading a product summary page is not slowed down by them.
Do not use order by rand()
Logically it is a random ordering (generate a random number for each row, then sort by those numbers). A statement like `select * from student order by rand() limit 5` is very inefficient, because it generates a random number for, and sorts, every row in the table, while we only want the first 5.
Workaround: generate the random primary key values in the application, then retrieve by primary key from the database.
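A sketch of the workaround, using the `student` table from the example above (the specific ids are made up; in practice the application draws them at random within the known id range):

```sql
-- Instead of: select * from student order by rand() limit 5
SELECT * FROM student WHERE id IN (12, 97, 203, 431, 876);
```

If the id sequence has gaps, the application should over-draw or retry until it has enough rows.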
Single-table versus multi-table queries
Multi-table queries: `join` and subqueries both involve multiple tables. If you analyze the execution plan with `explain`, you will find that a multi-table query is processed one table at a time, with the results merged at the end. So we can say that single-table queries put the computational burden on the application, while multi-table queries put it on the database.
(With single-table queries, if there are foreign keys the application queries the associated tables itself, one table at a time.) Today's ORM frameworks solve the object-mapping problem that this single-table query style brings.
count(*)
The MyISAM storage engine automatically keeps a row count per table, so `count(*)` returns instantly. InnoDB has no such internal counter; we have to count the records manually, and the usual idea is to maintain the count in a separate table:
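A sketch of such a counter table, kept in step with triggers (all names are invented for illustration):

```sql
CREATE TABLE article_count (total INT UNSIGNED NOT NULL) ENGINE = InnoDB;
INSERT INTO article_count VALUES (0);

CREATE TRIGGER article_ins AFTER INSERT ON article
FOR EACH ROW UPDATE article_count SET total = total + 1;

CREATE TRIGGER article_del AFTER DELETE ON article
FOR EACH ROW UPDATE article_count SET total = total - 1;

-- Reading the count no longer scans the table:
SELECT total FROM article_count;
```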
limit 1
If you know only one row can match, add `limit 1` so the scan stops at the first hit. In fact, ORM frameworks do this for us (single-object query operations automatically append `limit 1`).
VIII. The slow query log
It records every SQL statement whose execution time exceeds a given threshold, so slow queries can be located quickly as a reference for our optimization.
Turning on the slow query log
The configuration item is `slow_query_log`.
Use `show variables like 'slow_query_log'` to check whether it is on; if the value is OFF, turn it on with `set GLOBAL slow_query_log = on`, after which an xxx-slow.log file is generated under `datadir`.
Setting the threshold time
Configuration item: `long_query_time`
View: `show variables like 'long_query_time'` (in seconds)
Set: `set long_query_time = 0.5`
In real operation, shorten the threshold step by step, optimizing away the slowest SQL first.
Viewing the log
Any SQL statement that exceeds the threshold we set is recorded in xxx-slow.log.
IX. Profile information
Turning on profiling
Once turned on, detailed information about every SQL execution is recorded automatically.
Viewing profile information
Look up the timing of each detailed step of one SQL statement by its Query_ID.
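The standard commands, as a sketch (the profiled statement and its table are invented for illustration; Query_IDs vary per session):

```sql
SET profiling = 1;             -- turn profiling on for this session

SELECT COUNT(*) FROM article;  -- any statement to be profiled

SHOW PROFILES;                 -- list recent statements with their Query_ID
SHOW PROFILE FOR QUERY 1;      -- per-step timing for Query_ID 1
```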
X. Typical server configuration
`max_connections`
The maximum number of client connections.
`table_open_cache`
The table file handle cache (table data is stored on disk; caching the handles of opened files makes reading their data faster).
`key_buffer_size`
The index cache size (indexes read from disk are cached in memory; it can be set fairly large, which helps fast retrieval).
`innodb_buffer_pool_size`
The InnoDB storage engine's buffer pool size (the single most important setting for InnoDB; if all tables use InnoDB, it is even suggested to set this to 80 percent of physical memory, since many of InnoDB's performance features, such as its indexes, rely on it).
`innodb_file_per_table`
(InnoDB stores table data in .ibd files; if this item is set to ON, each table gets its own .ibd file; otherwise, all InnoDB tables share one tablespace.)