Interviewer: How do you optimize SQL?

During the interview, the interviewer asked: How did you design your table structure? Did you draw an ER diagram? Then he dug deeper: if there is a slow query, how do you optimize your SQL?

Today I will talk about how to answer this question. First of all, stay calm: these are projects you built yourself, so the first question should not be a big problem. The second question needs thorough preparation before the interview ...

Before answering, one should first understand how a query runs. A query consists of a series of subtasks: it travels from the client to the server, and the server then parses it, generates an execution plan, executes it, and returns the results to the client. "Execution" can be considered the most important stage of the whole life cycle. It includes a large number of calls to the storage engine to retrieve data, plus post-processing of the retrieved data such as sorting and grouping. While completing these tasks, a query spends time in different places, including the network, CPU computation, generating statistics and execution plans, and waiting for locks. When some of these operations are unnecessary, or are repeated more often than needed, they consume a lot of time.

The most basic reason for poor query performance is accessing too much data. Some queries inevitably need to filter large amounts of data, but most poorly performing queries can be optimized by reducing the amount of data they access. An inefficient query can be analyzed in the following two steps:

Confirm whether the application is retrieving more data than it needs.

Confirm whether the MySQL server is analyzing more rows than it needs.
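
One way to check both steps is MySQL's EXPLAIN statement. The following is only a sketch: the `orders` table and its `id`, `customer_id`, `status`, and `created_at` columns are hypothetical names that do not come from the text above.

```sql
-- Hypothetical table and columns, used only for illustration.

-- Step 1: is the application asking for more than it needs?
-- Request only the columns and rows the page actually displays.
SELECT id, status
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 20;

-- Step 2: is the server examining more rows than it needs?
-- In the EXPLAIN output, `type` shows the access method (ALL means a full
-- table scan) and `rows` is the optimizer's estimate of rows to examine.
EXPLAIN
SELECT id, status
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 20;
```

If the estimated `rows` value is far larger than the number of rows returned, the query is probably a candidate for a better index or a rewrite.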

That is all theory. In practice, MySQL optimization covers three areas: optimizing SQL statements, optimizing indexes, and optimizing the table structure.

Optimizing SQL statements:
1, Use fewer subqueries

Minimize the use of subqueries, because a subquery can generate a temporary table. The exception is when the temporary table is small, as with a count(*) result.
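
As a rough illustration of this tip, a subquery can often be rewritten as a join. The `customers` and `orders` tables below are hypothetical, and whether the join is actually faster depends on the MySQL version and the data, so it is worth comparing both forms with EXPLAIN.

```sql
-- Hypothetical tables: customers(id, name, country), orders(id, customer_id, total).

-- Subquery form.
SELECT o.id, o.total
FROM orders AS o
WHERE o.customer_id IN (
    SELECT c.id FROM customers AS c WHERE c.country = 'DE'
);

-- Join form. customers.id is assumed to be the primary key, so each order
-- matches at most one customer and the join cannot duplicate order rows.
SELECT o.id, o.total
FROM orders AS o
INNER JOIN customers AS c ON c.id = o.customer_id
WHERE c.country = 'DE';
```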

2, Use SELECT * less

Every time you see SELECT *, look at it with suspicion: do you really need all of the columns returned? Retrieving all columns prevents the optimizer from using optimizations such as covering index scans, and it also costs the server additional I/O, memory, and CPU.
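
A minimal sketch of the covering-index point, assuming a hypothetical `users` table and a hypothetical index name:

```sql
-- Hypothetical table: users(id, email, name, bio, avatar, created_at).
CREATE INDEX idx_users_email_name ON users (email, name);

-- Needs every column, so MySQL must read the full row for each match.
SELECT * FROM users WHERE email = 'a@example.com';

-- Needs only columns contained in idx_users_email_name, so the index alone
-- can answer the query. EXPLAIN should then show "Using index" in the
-- Extra column.
SELECT email, name FROM users WHERE email = 'a@example.com';
```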

3, Query only the necessary records

A common mistake is to assume that MySQL returns only the rows you need, on demand. In fact, MySQL produces the entire result set, and the client then fetches all of it and discards what it does not use. If only the first few rows are needed, add LIMIT to the query.

4, Do not repeatedly query the same data

It is easy to keep executing the same query and get exactly the same data back every time. A better approach is to cache the data the first time it is queried and read it from the cache when needed, which clearly performs better.

5, COUNT query optimization

COUNT() is an aggregate function with two roles: it can count the values in a column, or it can count rows. When counting column values, only non-NULL values are counted. When writing a COUNT() query, try to make it examine as few rows as possible.

For example: if we directly count the records with id > 100, the query scans more than 20 million rows. However, thanks to how COUNT() works, we can take count(*) for the whole table and subtract the count of rows with id <= 100, so only about 100 rows need to be scanned. This relies on count(*) without a WHERE clause being cheap, which is the case for MyISAM tables.
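
The inverted count described above could be written as follows. The table name `records` is hypothetical, and the rewrite only pays off when the unfiltered COUNT(*) is cheap for the storage engine in use.

```sql
-- Hypothetical table: records(id PRIMARY KEY, ...).

-- Direct form: examines every row with id > 100.
SELECT COUNT(*) FROM records WHERE id > 100;

-- Inverted form: total row count minus the rows with id <= 100,
-- so the filtered part only touches about 100 rows.
SELECT (SELECT COUNT(*) FROM records) - COUNT(*) AS cnt
FROM records
WHERE id <= 100;
```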

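6, In the WHERE clause, write the join conditions between tables before the other WHERE conditions, and write the conditions that filter out the most records at the end of the WHERE clause. HAVING is applied last. Note that this ordering rule comes from rule-based optimizers; MySQL's cost-based optimizer chooses the evaluation order itself, so here the order mostly helps readability.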

7, Replace IN with EXISTS, and NOT IN with NOT EXISTS.

8, Avoid performing calculations on indexed columns.

9, Avoid using IS NULL and IS NOT NULL on indexed columns where possible.

10, When optimizing a query, try to avoid full table scans; first consider creating indexes on the columns used in WHERE and ORDER BY.

11, Avoid testing fields for NULL in the WHERE clause, which may cause the engine to give up the index and perform a full table scan.

12, Avoid applying expressions to fields in the WHERE clause, which causes the engine to give up the index and perform a full table scan.
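
Tips 7, 8, and 12 above can be sketched as before/after pairs. The tables, columns, and indexes here are hypothetical and only serve as illustration.

```sql
-- Hypothetical tables: orders(id, customer_id, price, created_at) with
-- indexes on price and created_at; customers(id, name).

-- A function wrapped around the indexed column: the index on created_at
-- cannot be used for a range lookup.
SELECT id FROM orders WHERE YEAR(created_at) = 2019;

-- Same rows, with the column left bare so a range scan is possible.
SELECT id FROM orders
WHERE created_at >= '2019-01-01' AND created_at < '2020-01-01';

-- A calculation on the indexed column ...
SELECT id FROM orders WHERE price * 2 > 100;
-- ... moved to the constant side of the comparison.
SELECT id FROM orders WHERE price > 50;

-- NOT IN ...
SELECT c.id, c.name
FROM customers AS c
WHERE c.id NOT IN (SELECT o.customer_id FROM orders AS o);

-- ... rewritten as NOT EXISTS.
SELECT c.id, c.name
FROM customers AS c
WHERE NOT EXISTS (
    SELECT 1 FROM orders AS o WHERE o.customer_id = c.id
);
```

One caveat on the last pair: the two forms only return the same rows when `orders.customer_id` contains no NULLs. If it does, the NOT IN form returns no rows at all, so the NOT EXISTS form is usually the safer one as well.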

Index Tuning
1, Join query optimization

Make sure the columns in the ON or USING clause are indexed. Consider the join order when creating indexes: when tables A and B are joined on column c and the optimizer joins them in the order B, A, you only need to create the index on table A. Unused indexes just take up storage.
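
A small sketch of this rule, using the tables A and B and the column c from the paragraph above; the index name is a hypothetical choice.

```sql
-- If the optimizer reads B first and then looks up matching rows in A,
-- the index that matters is the one on A.c.
CREATE INDEX idx_a_c ON A (c);

-- EXPLAIN lists the tables in the order MySQL would read them, which shows
-- the join order the optimizer chose and whether idx_a_c is used.
EXPLAIN
SELECT A.c
FROM B
INNER JOIN A ON A.c = B.c;
```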

2, GROUP BY and DISTINCT optimization

The most effective way to optimize GROUP BY and DISTINCT is to use indexes. All grouped columns should be indexed. For example:

select product, count(*) from orders group by product;

For a query like this, the product column should be indexed.
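
For the query above, the index could be created as follows. The index name is hypothetical, and the exact EXPLAIN output varies by MySQL version.

```sql
-- idx_orders_product is a hypothetical index name.
CREATE INDEX idx_orders_product ON orders (product);

-- With the index in place, MySQL can read the groups in index order instead
-- of building a temporary table; compare the Extra column of EXPLAIN before
-- and after creating the index.
EXPLAIN
SELECT product, COUNT(*) FROM orders GROUP BY product;
```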

3, LIMIT pagination optimization

Pagination is usually implemented by querying with an offset, together with an ORDER BY clause. If the ORDER BY column is indexed, performance is generally good, so the ORDER BY column should always be indexed. But for a query such as LIMIT 10000,10, MySQL must first read the preceding 10,000 records in order to retrieve the 10 target records, which is costly. The easiest way to optimize this case is to use a covering index.
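
One common way to apply a covering index here is a "deferred join": the inner query pages through the index only, and the outer query fetches full rows for just the 10 matching ids. This is a sketch that assumes a hypothetical InnoDB `orders` table with primary key `id` and an index on `created_at`.

```sql
-- Hypothetical table: orders(id PRIMARY KEY, customer_id, total, created_at).
-- In InnoDB a secondary index also contains the primary key, so this index
-- covers a query that selects only id and created_at.
CREATE INDEX idx_orders_created_at ON orders (created_at);

-- Plain offset pagination: reads and discards 10,000 full rows.
SELECT * FROM orders ORDER BY created_at, id LIMIT 10000, 10;

-- Deferred join: the offset is skipped inside the index, and full rows are
-- fetched only for the 10 ids that make up the page.
SELECT o.*
FROM orders AS o
INNER JOIN (
    SELECT id FROM orders ORDER BY created_at, id LIMIT 10000, 10
) AS page ON page.id = o.id
ORDER BY o.created_at, o.id;
```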

Note the cases in which an index cannot be used:

1) A LIKE pattern that begins with "%" (leading-wildcard fuzzy matching)

2) An OR condition in which the columns on both sides are not both indexed

3) An implicit data type conversion occurs (e.g. a varchar column compared with a value written without single quotation marks, so the value is treated as a number)
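
The three cases above can be illustrated like this, assuming a hypothetical `users` table in which `name` and `phone` are indexed VARCHAR columns and `age` is not indexed:

```sql
-- Hypothetical table: users(id, name VARCHAR, phone VARCHAR, age INT),
-- with indexes on name and phone only.

-- 1) Leading wildcard: the index on name cannot narrow the search.
SELECT id FROM users WHERE name LIKE '%son';
-- A prefix pattern can use the index.
SELECT id FROM users WHERE name LIKE 'John%';

-- 2) OR with an unindexed side: age has no index, so the whole condition
-- falls back to a table scan.
SELECT id FROM users WHERE name = 'John' OR age = 30;

-- 3) Implicit conversion: phone is a string column compared with a number,
-- so every value must be converted and the index on phone is not used.
SELECT id FROM users WHERE phone = 13800000000;
-- Quoting the value keeps the comparison string-to-string.
SELECT id FROM users WHERE phone = '13800000000';
```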

Database optimization
A few tips for choosing optimal data types:

Smaller is usually better: use the smallest data type that can store the data correctly, because it occupies less disk, memory, and CPU cache.

Simple is good: choose integers rather than strings, use MySQL's built-in types rather than strings to store dates and times, and use integers to store IP addresses.

Try to avoid NULL: many tables contain nullable columns simply because NULL is the default for a column, so you need to declare columns NOT NULL explicitly.

Integer data generally uses INT and boolean data uses TINYINT, but integer calculations are generally performed as 64-bit BIGINT arithmetic.

When exact decimal calculation is required, for example when storing financial data, use DECIMAL (FLOAT and DOUBLE use imprecise floating-point arithmetic). DECIMAL calculations are costly, however, so consider using BIGINT in place of DECIMAL and multiplying each value by the appropriate power of ten to remove the decimal places.
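
Put together, these tips might lead to a table definition like the one below. The table and column names are hypothetical; INET_ATON and INET_NTOA are MySQL's built-in functions for converting IPv4 addresses to and from integers.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE payment (
    id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    is_refunded  TINYINT NOT NULL DEFAULT 0,  -- boolean stored as TINYINT
    client_ip    INT UNSIGNED NOT NULL,       -- IPv4 address as an integer
    amount_cents BIGINT NOT NULL,             -- money in cents, not DECIMAL
    paid_at      DATETIME NOT NULL            -- built-in type, not a string
);

INSERT INTO payment (client_ip, amount_cents, paid_at)
VALUES (INET_ATON('192.168.0.1'), 1999, '2019-11-11 10:00:00');

-- Convert back to readable values only when presenting the data.
SELECT INET_NTOA(client_ip) AS ip, amount_cents / 100 AS amount
FROM payment;
```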

VARCHAR and CHAR

Use VARCHAR when you need to store variable-length strings. It saves more space than CHAR, and it uses 1 or 2 additional bytes to record the length. CHAR is suitable in the following situations: first, when storing very short strings (for example, only the values Y and N); second, when all values are close to the same fixed length (for example, MD5 values); third, when the value changes often.

BIT

Before MySQL 5.0, BIT was a synonym for TINYINT. In MySQL 5.0 and later, it is a completely different data type. The new BIT type behaves as follows: a single BIT column can store one or more true/false values. MySQL treats BIT as a string type rather than a numeric type. When you retrieve a BIT(1) value, the result is a string containing the binary value 0 or 1, not the ASCII character "0" or "1".

SET

To store many true/false values, consider combining them into a single SET column, which MySQL represents internally as a packed set of bits.

Use ENUM in place of the usual string types. MySQL stores enumerations very compactly: each value is stored internally as an integer, and the "number-to-string" mapping is kept as a "lookup table" in the table's .frm file.

DATETIME has the wider range: it stores values from the year 1001 to 9999 with one-second precision, independent of time zone, using 8 bytes of storage, and it displays in a sortable, unambiguous format. TIMESTAMP stores the number of seconds since midnight on January 1, 1970, uses 4 bytes of storage, can only represent 1970 to 2038, and depends on the time zone. Because it is more space-efficient, TIMESTAMP is recommended.
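
The BIT, SET, ENUM, DATETIME, and TIMESTAMP types discussed above could appear together in a table like this one. It is only a sketch with hypothetical table, column, and member names.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE employee (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    department  ENUM('sales', 'engineering', 'support') NOT NULL,
    permissions SET('read', 'write', 'admin') NOT NULL DEFAULT '',
    is_active   BIT(1) NOT NULL DEFAULT b'1',
    hired_on    DATETIME NOT NULL,            -- wide range, no time zone
    updated_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO employee (department, permissions, hired_on)
VALUES ('engineering', 'read,write', '2019-11-11 09:00:00');

-- Adding 0 makes MySQL return the BIT value as a number instead of a
-- binary string.
SELECT department, permissions, is_active + 0 AS is_active
FROM employee;

-- FIND_IN_SET tests whether a member is present in a SET column.
SELECT id FROM employee WHERE FIND_IN_SET('write', permissions) > 0;
```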

The BLOB and TEXT string types are designed to store very large data; BLOB stores it as binary strings and TEXT as character strings.

Avoid having too many columns in a table

It is best to keep a single query to no more than 12 joined tables

Do not be afraid to use NULL when a value is genuinely unknown

In practice, normalized and denormalized designs need to be mixed: use a partially normalized schema, cache tables, and other techniques. The most common way to denormalize data is to copy or cache it, storing the same specific columns in different tables.

Modify the .frm file to speed up ALTER TABLE operations

Choose the most suitable field attributes: keep field widths as small as possible and set fields to NOT NULL wherever possible. For example, 'Department' and 'Sex' are best stored as ENUM

Use joins (JOIN) instead of subqueries

Use UNION instead of manually created temporary tables

Lock tables and optimize transaction processing


Origin blog.51cto.com/14587687/2449141