MySQL optimization

The philosophy of MySQL performance tuning

  1. The best optimization is no query at all! This is not a joke.
  2. Understand where performance is wasted, in rough order of impact: too many queries driven by business logic > unreasonable table structure > inefficient SQL statements > hardware.
  3. If a server is overloaded for a long time, overloaded periodically, or occasionally freezes,
    how should we deal with it?
    Answer: the general approach is to ask:
    Is it a periodic fluctuation or an occasional problem?
    Is it an overall server performance problem, or the problem of a single statement?
    For a single statement, is the time spent waiting (e.g., on locks) or spent executing the query?
    The only way to find out is to monitor and observe the server's status.
    To observe the server status, the following two commands are commonly used:
    show status; and show processlist;
    Example: mysql> show status;
    From the shell: # mysqladmin ext
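As an illustration (these are standard MySQL commands; only which counters to watch is a judgment call), the status counters can be filtered and the running statements listed like this:

```sql
-- Look at a few key counters instead of the full show status dump
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Slow_queries';
SHOW GLOBAL STATUS LIKE 'Created_tmp%';   -- tmp tables in memory vs on disk

-- See what every connection is doing right now (state, time, current query)
SHOW FULL PROCESSLIST;
```

From the shell, `mysqladmin ext -r -i 10` prints the same counters as relative deltas every 10 seconds, which makes periodic fluctuations easy to spot.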

MySQL periodic fluctuation solution:

  1. Reduce irrelevant requests (this belongs to the business-logic level and we will not discuss it here, but it is actually the most effective method).
  2. If the number of requests is fixed and cannot be reduced, try to keep the request rate as stable as possible, without drastic fluctuations.
    In many cases it is not that the server cannot handle the total query volume, but that it cannot handle the peak requests during a certain period.
  3. Let caches expire in a concentrated way at night when the load is low. This creates a short peak, but since there are few visitors at night the peak is mild. By around 10:00 in the morning, when traffic picks up, part of the cache has already been rebuilt, so the daytime peaks are also mild.
  4. Alternatively, randomize the cache lifetime within a certain range, which also smooths out sharp peaks.
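A minimal sketch of point 4, assuming a hypothetical `cache` table with an `expires_at` column: add a random 0-5 minute jitter to a 30-minute base TTL so entries do not all expire at the same instant:

```sql
-- Hypothetical cache table; the point is the randomized expiry, not the schema
INSERT INTO cache (cache_key, cache_value, expires_at)
VALUES ('hot_item_42', '...',
        NOW() + INTERVAL (1800 + FLOOR(RAND() * 300)) SECOND);
```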

Observation of irregular delays

Irregular delays are often caused by inefficient statements. How do we catch these inefficient statements?
You can observe with the show processlist command over a long period, or use the slow query log.

If you observe the following thread states, pay attention:

Converting HEAP to MyISAM
The query result is too large and is being spilled from memory to disk (the statement is poorly written and returns too much data).

Creating tmp table
A temporary table is being created (e.g., to store intermediate results while grouping; this suggests the indexes are not well built).

Copying to tmp table on disk
An in-memory temporary table is being copied to disk (the indexes are poor, or the table columns are poorly chosen).

Locked
The query is locked by other queries (this usually happens with transactions and is less common in typical Internet applications).

Logging slow query
A slow query is being recorded.

Temporary tables are created in the following situations:

  • If the column in group by has no index, an internal temporary table must be generated.
  • If order by and group by use different columns, or if the order by and group by columns are not from the first table in the join, a temporary table will be generated (Experiment 1).
  • Using distinct together with order by may result in a temporary table (Experiment 2).
  • A temporary table is used when union merges query results.

To determine whether a query needs a temporary table, you can use the EXPLAIN query plan and check the Extra column to see if there is Using temporary.
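For example (hypothetical table `t` with no index on `col_b`), the Extra column reveals the temporary table:

```sql
-- group by on an unindexed column: Extra typically shows
-- "Using temporary" (and often "Using filesort" as well)
EXPLAIN SELECT col_b, COUNT(*) FROM t GROUP BY col_b\G
```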

Through the previous experiments, it can be seen that the optimization of the database is a systematic project:

  1. Table design: split the table structure. For example, core fields use fixed-length types such as int, char, and enum;
    non-core fields, or fields using text and very long varchar, go into a separate table.

  2. Indexing: Reasonable indexes can reduce internal temporary tables (detailed in the index optimization strategy)

  3. Write Statements: Unreasonable statements will result in large data transfers and the use of internal temporary tables.

Table Optimization and Column Type Selection

Table optimization:

  1. Separate fixed-length and variable-length columns.
    For example, id int occupies 4 bytes, and char(4) occupies a fixed length of 4 characters; date/time types are also fixed-length.
    That is, the bytes occupied by each value are fixed.
    Core, frequently used fields should be made fixed-length and placed together in one table.
    Variable-length fields such as varchar, text, and blob are better placed in a separate table, associated with the core table by primary key.
  2. Separate frequently used fields from rarely used ones.
    Analyze the site's concrete business and the query scenarios of each field, and split out the fields with low query frequency.
  3. Add redundant fields where reasonable.
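A sketch of points 1-2 combined, with hypothetical table and column names: a fixed-length core table plus a variable-length extension table linked by primary key:

```sql
-- Core table: fixed-length, frequently queried fields only
CREATE TABLE member (
  id       INT UNSIGNED  NOT NULL AUTO_INCREMENT PRIMARY KEY,
  username CHAR(20)      NOT NULL DEFAULT '',
  gender   ENUM('m','f') NOT NULL DEFAULT 'm',
  created  INT UNSIGNED  NOT NULL DEFAULT 0
) ENGINE=InnoDB;

-- Extension table: variable-length, rarely queried fields
CREATE TABLE member_profile (
  id        INT UNSIGNED  NOT NULL PRIMARY KEY,  -- same value as member.id
  signature VARCHAR(2000) NOT NULL DEFAULT '',
  intro     TEXT
) ENGINE=InnoDB;
```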

Column selection principles:

  1. Field type priority: integer > date/time > enum/char > varchar > blob.
  2. Use just enough space; don't be generous (e.g., prefer smallint, or varchar(N) with a small N).
  3. Try to avoid NULL (nullable columns complicate index storage and comparisons).

Index optimization strategy

1: Index type
1.1 B-tree index
Note: it is called a btree index, and broadly speaking it is a balanced tree, but the concrete implementation differs slightly by engine.
For example, strictly speaking, the NDB engine uses a T-tree,
while MyISAM and InnoDB use B-tree (B+tree) indexes by default.

Abstractly, a B-tree index can be understood as a "sorted, fast lookup structure".

1.2 Hash index
In memory (HEAP) tables, the default index is a hash index; the theoretical lookup time complexity of a hash is O(1).

Question: since hash lookup is so efficient, why don't we use hash indexes everywhere?
Answer:
1. The result of the hash function is effectively random: rows that are adjacent in key order may be scattered on disk.
2. Range queries cannot be optimized.
3. Prefix indexes cannot be used. In a btree, if a column's value is "helloworld", the query xx='helloworld' can naturally use the index, and so can xx='hello' (leftmost-prefix matching).
A hash index cannot, because hash('helloworld') and hash('hello') bear no relation to each other.
4. Sorting cannot be optimized.
5. Lookups must always go back to the table: the index only yields the row's position, and the data must then be fetched from the table.

2: Common misunderstandings about btree indexes
2.1 "Add a separate index to every column used in the where condition."
Example: where cat_id=3 and price>100; // query products in category 3 priced above 100 yuan
Mistake: adding independent indexes on cat_id and on price.
Wrong because: only one of the cat_id or price indexes can be used at a time, since they are independent indexes.

2.2 "After an index is built on multiple columns, the index will work whichever column is queried."
Wrong: on a multi-column index, the index only works when the query satisfies the leftmost-prefix requirement.
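A sketch of the two points above, assuming a hypothetical `goods` table: one composite index instead of two independent ones, used only when the leftmost prefix is present:

```sql
-- One composite index instead of independent indexes on each column
ALTER TABLE goods ADD INDEX idx_cat_price (cat_id, price);

-- Can use the index: the leftmost column cat_id is constrained
SELECT * FROM goods WHERE cat_id = 3 AND price > 100;

-- Cannot use idx_cat_price: the leftmost column cat_id is missing
SELECT * FROM goods WHERE price > 100;
```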

Clustered and non-clustered indexes

The index difference between innodb and myisam:

In innodb, the row data is stored directly in the leaves of the primary index; this is called a clustered index. Secondary indexes store a reference to the primary key.
In myisam, both the primary index and secondary indexes point to the physical row (its disk location).

Advantages and disadvantages of clustered indexes

Advantage: when querying a small number of rows by primary key, no extra row lookup is needed (the data sits right under the primary key node).
Disadvantage: inserting data in irregular (non-sequential) order causes frequent page splits.

High-performance indexing strategy

For innodb, because the row data lives under the node, splitting a node is slower.
For an innodb primary key, try to use an integer type, and an increasing one.
Inserting irregular (non-sequential) values causes page splits, which hurt insert speed.
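In practice this usually means declaring the InnoDB primary key as an auto-increment integer, as in this sketch (table and columns are hypothetical):

```sql
-- New rows always land at the "right edge" of the clustered index,
-- appending to the last page instead of splitting pages in the middle
CREATE TABLE orders (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id INT UNSIGNED    NOT NULL,
  amount  DECIMAL(10,2)   NOT NULL
) ENGINE=InnoDB;
```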

Index coverage:

Index covering means that the queried columns happen to be part of an index, so the query can be answered from the index file alone, with no need to go back to the table rows for data.
Such queries are very fast and are called "covering index" queries.
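A sketch with a hypothetical `user` table: if the index covers all the queried columns, EXPLAIN shows Using index:

```sql
ALTER TABLE user ADD INDEX idx_name_age (name, age);

-- Only indexed columns are selected, so the index alone answers the
-- query; Extra shows "Using index" (a covering-index read)
EXPLAIN SELECT name, age FROM user WHERE name = 'alice'\G

-- SELECT * needs columns outside the index, so it must go back to the rows
EXPLAIN SELECT * FROM user WHERE name = 'alice'\G
```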

The ideal index

  1. Queried frequently
  2. Highly selective (high degree of distinction)
  3. Small in length
  4. Covers the commonly queried fields as much as possible

Delayed association

select a.* from it_area as a inner join (select id from it_area where name like '%东山%') as t on a.id=t.id;

In the second approach, the inner query walks only the name index layer, and the name index layer already contains the id values.
So after walking the index layer and collecting all the matching ids,
the join then uses those ids to fetch all the columns in one pass: walk the name index first, then fetch the rows.

The first (straightforward) approach walks the name index file, and for each entry that matches the condition takes its id
and immediately fetches the row data by that id, fetching as it goes.

The process of fetching a row by id is postponed. — This technique is called "delayed association" (a deferred join).

Indexing and Sorting

Sorting may happen in two ways:
1: With a covering index, reading directly along the index already yields ordered results: Using index.
2: Rows are fetched first into a temporary result and then sorted: filesort (file sorting, though the "file" may be on disk or in memory).

Our goal: the rows should come out already ordered, i.e., use the index to sort.
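A hedged sketch, assuming a hypothetical `goods` table with a composite index on (cat_id, price): when the where and order by columns follow the same index, no filesort is needed:

```sql
-- Assuming an index on (cat_id, price):
-- rows come out of the index already sorted by price; no filesort
EXPLAIN SELECT goods_id FROM goods
WHERE cat_id = 3 ORDER BY price\G

-- Sorting by an unindexed column forces "Using filesort"
EXPLAIN SELECT goods_id FROM goods
WHERE cat_id = 3 ORDER BY goods_name\G
```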

SQL statement optimization

  1. Where does a SQL statement's time go?
    Answer: waiting time and execution time.
    These two are not independent: if a single statement executes faster, it blocks other statements with locks for less time.
    So let's focus on how to reduce execution time.

  2. Where is the execution time of the sql statement?
    Answer:
    a: Query --> Query along the index, or even full table scan
    b: Fetch --> After finding the row, take out the data (sending data)

  3. SQL statement optimization ideas?

    • Don't query: compute from business logic instead. For example, for a forum's registered-member count, the program can estimate daily registrations from statistics over the previous 3 months.

    • Query less: be as precise as possible and fetch fewer rows. Observe news sites and comment sections: they generally fetch only about 10-30 items at a time.

    • When you must query, try to resolve the rows on an index.

Pitfalls of in-type subqueries

Reason: MySQL's query optimizer (in older versions) rewrites an IN-type subquery into a correlated EXISTS-style execution.
So when the goods table gets larger, the query gets slower.

Improvement: use join queries instead of subqueries

explain select goods_id,g.cat_id,g.goods_name from  goods as g inner join (select cat_id from ecs_category where parent_id=6) as t using(cat_id) \G

from subquery:

Note: the temporary (derived) table produced by an inner FROM subquery has no index.
Therefore: the result set returned by the FROM subquery should be as small as possible.

count() optimization

Misunderstanding:
1. "MyISAM's count() is always very fast."
Answer: it is fast, but only when counting all rows of the table, because MyISAM stores the row count.
Once there is a condition, the count is no longer fast, especially when there is no index on the column in the where condition.

  1. Suppose merchants with id<100 are all our internal test accounts, and we want to count how many real merchants there are:
select count(*) from lx_com where id>=100;  -- took 6.x seconds over 10+ million rows

Tip: subtract the small indexed count from the stored total:

select count(*) from lx_com;                 -- fast (MyISAM stores the total)
select count(*) from lx_com where id<100;    -- fast (short indexed range)
select (select count(*) from lx_com) - (select count(*) from lx_com where id<100);  -- fast

group by

Note:
1. Grouping is for statistics, not for filtering out duplicate rows.
For example, it is suitable for computing average or maximum scores, but not for de-duplicating data.
Use indexes to avoid temporary tables and file sorting.
2. Taking a join of tables A and B as an example, where we mainly query the columns of table A:
the group by and order by columns should be the same where possible, and should be explicitly qualified as columns of A.
3. Union optimization
Note: unless de-duplication is required, use union all,
because the de-duplication cost of union is very high; do the de-duplication in application code instead.
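A minimal sketch of point 3, with hypothetical table names: prefer union all when duplicates are acceptable (or are removed in application code):

```sql
-- UNION must de-duplicate the combined result (costly, may use a temp table)
SELECT user_id FROM orders_2023
UNION
SELECT user_id FROM orders_2024;

-- UNION ALL just concatenates the two result sets; much cheaper
SELECT user_id FROM orders_2023
UNION ALL
SELECT user_id FROM orders_2024;
```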

limit and page turn optimization

limit offset, N: when the offset is very large, efficiency is extremely low.
The reason is that mysql does not skip offset rows and then fetch only N rows; it reads offset+N rows, discards the first offset rows, and returns N rows.
The larger the offset, the lower the efficiency.

Optimization methods:
1. Solve it at the business level: simply disallow deep paging beyond, say, 100 pages.
Take Baidu as an example: it generally only allows paging to around page 70.
2. Avoid offset entirely and use a condition query instead (e.g., filter on the last seen id).
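A sketch of method 2 with a hypothetical `news` table: remember the last id of the previous page and filter on it, instead of using a large offset:

```sql
-- Deep paging with offset: reads and discards 100000 rows first
SELECT id, title FROM news ORDER BY id LIMIT 100000, 10;

-- Condition query ("keyset" paging): jumps straight to the right place
-- via the primary key; 100000 here is the last id seen on the previous page
SELECT id, title FROM news WHERE id > 100000 ORDER BY id LIMIT 10;
```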

Why use a single table query

  1. Single-table queries have higher cache hit rates than multi-table join queries.
  2. It makes future sharding (splitting tables and databases) easier.
  3. A single-table query is often faster than a multi-table join query.
