My four-month internship is coming to an end. While processing real production data with SQL, the biggest performance gains came from optimizing queries against Hive tables with millions of rows. This post organizes what I learned.

Full table scans and index scans

In a database, a query that does not use an index is generally called a full table scan. In a full table scan the database server reads every record in the table, one by one, until all records that match the given criteria have been returned.

An index is the database's directory: a structure that sorts the values of one or more columns of a table so that specific rows can be reached quickly. If you want to find an employee by a specific name, an index gets you there faster than searching every row in the table. The main purpose of an index is to speed up retrieval from the table: it is an auxiliary data structure that helps the searcher find the IDs of the records that satisfy the conditions as quickly as possible.

In short, a full table scan has to go through all the data in the table to produce the result, while an index scan only needs to read part of the data.

As a rule of thumb: if the rows you need make up most of the table, a full table scan is more appropriate (first, indexes slow down inserts, updates and deletes; second, indexes have their own overhead, and a badly chosen index can hurt query performance); if you need only a small part of the table, an index scan is usually better.
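To make the difference concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a real database server, which is an assumption made only so the example runs anywhere; the staff table, the idx_staff_name index and the plan helper are made up for illustration. SQLite's EXPLAIN QUERY PLAN reports SCAN for a full table scan and SEARCH ... USING INDEX for an index lookup. MySQL's explain output looks different but carries the same idea.

```python
import sqlite3

def plan(conn, sql):
    """Return the readable steps of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO staff (name, dept) VALUES (?, ?)",
    [("name%d" % i, "dept%d" % (i % 10)) for i in range(1000)],
)

query = "SELECT id, dept FROM staff WHERE name = 'name42'"

# No index on name yet: every row has to be examined.
plan_without_index = plan(conn, query)

# With an index on name, only the matching entries are visited.
conn.execute("CREATE INDEX idx_staff_name ON staff (name)")
plan_with_index = plan(conn, query)

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan text varies between SQLite versions, so it is safer to look for the SCAN and SEARCH keywords than to compare whole strings.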

Some basics about indexes:

1. What is an index?

An index is the database's directory, like the alphabetical listing at the front of a dictionary (useful for large amounts of data)

2. Advantages and disadvantages of indexes

Pros: queries are fast
Cons: inserts, updates and deletes are slower, because the database has to keep the index files in sync

3. What kinds of index are there?

Normal, primary key, unique, and composite

4. Why are index lookups fast?

Index structure: B+ tree

5. When should you usually create an index?

1. The primary key automatically gets a unique index
2. Fields that are frequently used as query conditions should be indexed
3. Fields used to join to other tables (foreign key relationships) should be indexed
4. When choosing between single-column and composite indexes, a composite index usually gives better value
5. Sort fields: accessing the sort field through an index greatly speeds up sorting
6. Fields used for statistics or grouping
7. Fields that filter well (high selectivity) are good candidates for an index

6. How do you know whether an index is being used?

Run explain on the SQL to see the execution plan, and look mainly at the key column, which shows which index is used

7. When a composite index is used, does the column order matter?

Yes, it is used in order

8. When does an index fail to be used? (Related to part two, SQL optimization)

1. like
2. like "%123%"; do not put % at the front
3. Using the keywords in, or, null, !=

4....

Creating indexes

First, principles for creating database indexes

1. Determine whether the table mostly serves queries or mostly takes inserts, updates and deletes.

2. Build indexes to help specific queries. Check your SQL statements and index the fields that frequently appear in the where clause.

3. Try composite indexes to further improve system performance. Bear in mind that modifying a composite index costs more, and that composite indexes also take up disk space.

4. For small tables, an index may hurt performance.

5. Avoid indexing fields with few distinct values (such as gender, which only has male and female; name, street or zip code are better suited to indexing).

6. Avoid choosing columns with large data types as index columns.

A. Principles for creating indexes in MySQL

Index lookups are an important way of querying records in a database. Whether to build an index, and on which fields, should be decided together with the query requirements of the actual system. Some common principles from practice are given below:

1. Index the fields that are often used as filters;

2. Index the fields that often appear in GROUP BY and ORDER BY clauses;

3. Do not index fields with few distinct values, such as a gender field;

4. Avoid indexing columns that are written to very frequently;

5. Index the columns used for joins (primary key / foreign key);

6. Build composite indexes on groups of columns that are frequently accessed together, but note that the column order of a composite index should be decided by how often each column is used;

7. Indexes are non-clustered by default, but a clustered index may be worth considering in cases such as: a column with a limited number (not many) of unique values; queries over a wide range; or when the index can be fully exploited to reduce table-scan I/O and avoid searching the whole table. Of course, a sensible index has to be built on analysis and prediction of the various queries, and it also depends on the database structure designed by the DBA.

Second, statements for creating indexes

1. PRIMARY KEY (primary key index)
mysql> ALTER TABLE `table_name` ADD PRIMARY KEY (`column`)
2. UNIQUE (unique index)
mysql> ALTER TABLE `table_name` ADD UNIQUE (`column`)
3. INDEX (normal index)
mysql> ALTER TABLE `table_name` ADD INDEX index_name (`column`)
4. FULLTEXT (full-text index)
mysql> ALTER TABLE `table_name` ADD FULLTEXT (`column`)
5. Multi-column index
mysql> ALTER TABLE `table_name` ADD INDEX index_name (`column1`, `column2`, `column3`)

6. Dropping an index

drop index index_name on table_name

Part two: some methods for optimizing SQL

Overall optimization steps

I. First, turn on the database's slow query log, locate the SQL queries with relatively low efficiency, then find the corresponding SQL statements and analyze them

1. Check whether the table design is sound and follows the three normal forms
(1) First normal form: atomicity is guaranteed (columns cannot be split further)
(2) Second normal form: every table has a primary key
(3) Third normal form: every column is directly related to the primary key
2. Look at the data in the table: are there many redundant fields, and are the field data types reasonable?
3. When creating tables, prefer varchar over char; purely numeric values should not be stored as characters
4. Avoid null values where possible; use a default value instead of null, for example 0 for numbers and an empty string for character fields

II. Check whether the SQL statement is well written

(1) Avoid the keywords or, in, not in, !=, <>, and avoid SELECT *
(2) Try to avoid subqueries; most subqueries can be rewritten as joins
(3) Where or is used, union can often be used instead
(4) Where in is used, exists can often be used instead

III. Analyze whether the SQL can use an index

(1) Run explain on the SQL to view the execution plan and focus on a few columns: check that type is not a full table scan
(2) Check whether an index is used, mainly via the key column, which shows which index is used
(3) Check whether rows, the number of rows scanned, is very large

explain

explain shows how MySQL uses indexes to process select statements and join tables. It can help you choose better indexes and write better-optimized queries.

To use it, just put explain in front of the select statement:

For example:

explain select surname, first_name from a, b where a.id = b.id
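As a rough illustration of reading a plan for a join like the one above, the sketch below runs the same query shape through SQLite's EXPLAIN QUERY PLAN from Python. SQLite is only a stand-in here (an assumption so the example is runnable without a MySQL server), and the table definitions and sample rows are invented. The plan has one step per table access, which plays the same role as one row of MySQL's explain output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, first_name TEXT);
    INSERT INTO a VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO b VALUES (1, 'Ann'), (2, 'Bob');
""")

sql = "SELECT surname, first_name FROM a, b WHERE a.id = b.id"

# One plan row per table access; the last column is the readable step.
plan_steps = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
for step in plan_steps:
    print(step)

rows = sorted(conn.execute(sql).fetchall())
print(rows)
```

One table is read in full as the outer loop, and the other is looked up by primary key for each outer row, which is the pattern you want to see for a join on key columns.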

Some optimization tricks

1. When optimizing a query, try to avoid full table scans; first consider creating indexes on the columns involved in where and order by.

2. Avoid testing a field for null in the where clause, which may cause the engine to give up the index and do a full table scan, for example:
select id from t where num is null
You can set a default value of 0 on num, make sure the num column never contains null, and then query like this:
select id from t where num = 0

3. Avoid the != or <> operators in the where clause, otherwise the engine may give up the index and do a full table scan.

4. Avoid joining conditions with or in the where clause, which may cause the engine to give up the index and do a full table scan, for example:
select id from t where num = 10 or num = 20
You can query like this instead:
select id from t where num = 10
union all
select id from t where num = 20
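Before swapping or for union all it is worth checking that both forms return the same rows. The following is a small sketch of such a check, using Python's sqlite3 module as a stand-in database (an assumption for runnability); the sample data and the idx_t_num index are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.executemany("INSERT INTO t (num) VALUES (?)", [(n % 50,) for n in range(500)])
conn.execute("CREATE INDEX idx_t_num ON t (num)")

or_ids = sorted(r[0] for r in conn.execute(
    "SELECT id FROM t WHERE num = 10 OR num = 20"))

union_ids = sorted(r[0] for r in conn.execute(
    "SELECT id FROM t WHERE num = 10 "
    "UNION ALL "
    "SELECT id FROM t WHERE num = 20"))

print(len(or_ids), or_ids == union_ids)
```

Note that union all does not remove duplicates, so the rewrite is only equivalent when no row can satisfy both branches, as is the case for two different constants on the same column.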

5. Use in and not in with caution, otherwise they may lead to a full table scan, for example:
select id from t where num in (1, 2, 3)
For continuous values, use between instead of in:
select id from t where num between 1 and 3

6. The following query will result in a full table scan:
select id from t where name like '%abc%'

7. Avoid applying expressions to a field in the where clause, which will cause the engine to give up the index and do a full table scan. For example:
select id from t where num / 2 = 100
should be changed to:
select id from t where num = 100 * 2
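The effect of tip 7 can be observed in a query plan. This is a minimal sketch with Python's sqlite3 module standing in for the database (an assumption; the table, the idx_t_num index and the plan helper are invented for illustration). It compares plans only and not result sets, because SQLite's / on integers is integer division, unlike MySQL's.

```python
import sqlite3

def plan(conn, sql):
    """Return the readable steps of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER, note TEXT)")
conn.executemany("INSERT INTO t (num, note) VALUES (?, 'x')",
                 [(n,) for n in range(1000)])
conn.execute("CREATE INDEX idx_t_num ON t (num)")

# Arithmetic on the column hides it from the index.
plan_expression = plan(conn, "SELECT id, note FROM t WHERE num / 2 = 100")

# Moving the arithmetic to the constant side leaves the column bare.
plan_plain = plan(conn, "SELECT id, note FROM t WHERE num = 100 * 2")

print(plan_expression)
print(plan_plain)
```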

8. Avoid applying functions to a field in the where clause, which will cause the engine to give up the index and do a full table scan. For example:
select id from t where substring(name, 1, 3) = 'abc' -- ids whose name starts with abc
should be changed to:
select id from t where name like 'abc%'
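Tips 6 and 8 can be compared side by side in a query plan. The sketch below again uses Python's sqlite3 module as a stand-in (an assumption), with an invented table and index. One SQLite-specific detail: its like is case-insensitive by default, so the index is declared with NOCASE collation to let a prefix like use it; MySQL has its own collation rules.

```python
import sqlite3

def plan(conn, sql):
    """Return the readable steps of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO t (name, note) VALUES (?, 'x')",
    [("abc%03d" % i,) for i in range(100)] + [("xyz%03d" % i,) for i in range(900)],
)
# NOCASE collation so that SQLite may use the index for a prefix LIKE.
conn.execute("CREATE INDEX idx_t_name ON t (name COLLATE NOCASE)")

# Function on the column: the index on name cannot be used.
plan_function = plan(conn, "SELECT id, note FROM t WHERE substr(name, 1, 3) = 'abc'")
# Prefix pattern: becomes a range lookup on the index.
plan_prefix = plan(conn, "SELECT id, note FROM t WHERE name LIKE 'abc%'")
# Leading wildcard: no usable prefix, so back to a scan.
plan_leading = plan(conn, "SELECT id, note FROM t WHERE name LIKE '%abc%'")

same_rows = (
    conn.execute("SELECT count(*) FROM t WHERE substr(name, 1, 3) = 'abc'").fetchone()
    == conn.execute("SELECT count(*) FROM t WHERE name LIKE 'abc%'").fetchone()
)
print(plan_function, plan_prefix, plan_leading, same_rows, sep="\n")
```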

9. Do not apply functions, arithmetic or other expressions on the left side of "=" in the where clause, otherwise the system may not be able to use the index correctly.

10. When an indexed field is used as a condition and the index is a composite index, the condition must use the first field of the index for the system to use it; otherwise the index will not be used. As far as possible, keep the order of the fields in the condition consistent with the index order.
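The first-field rule for composite indexes in tip 10 can be seen in a small experiment. This is a sketch under the assumption that SQLite (via Python's sqlite3) behaves closely enough to illustrate the point; the orders table and the idx_city_status index are hypothetical.

```python
import sqlite3

def plan(conn, sql):
    """Return the readable steps of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT, status TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (city, status, amount) VALUES (?, ?, ?)",
    [("city%d" % (i % 20), "s%d" % (i % 5), i) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_city_status ON orders (city, status)")

# Condition on the first index column: the index is usable.
plan_first = plan(conn, "SELECT amount FROM orders WHERE city = 'city3'")
# Conditions on both columns, in index order: the index is usable.
plan_both = plan(conn, "SELECT amount FROM orders WHERE city = 'city3' AND status = 's1'")
# Condition on the second column only: the index is skipped.
plan_second_only = plan(conn, "SELECT amount FROM orders WHERE status = 's1'")

print(plan_first, plan_both, plan_second_only, sep="\n")
```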

11. Do not write queries that make no sense, for example to create an empty table structure:
select col1, col2 into #t from t where 1 = 0
This kind of code returns no result set but still consumes system resources; it should be changed to:
create table #t (...)

12. In many cases exists is a good replacement for in:
select num from a where num in (select num from b)
can be replaced with:
select num from a where exists (select 1 from b where num = a.num)
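A quick way to convince yourself that the two forms in tip 12 are interchangeable is to run both against the same data. The sketch below does this with Python's sqlite3 module (a stand-in, chosen only so the example is runnable); the sample values are invented, and b deliberately contains a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (num INTEGER);
    CREATE TABLE b (num INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (2), (4), (4), (6);
""")

in_rows = sorted(r[0] for r in conn.execute(
    "SELECT num FROM a WHERE num IN (SELECT num FROM b)"))

exists_rows = sorted(r[0] for r in conn.execute(
    "SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.num = a.num)"))

print(in_rows, exists_rows)
```

Both forms only test for membership, so the duplicate 4 in b does not produce a duplicate row in the result. Which form is faster depends on the engine and the data, so it is best measured with explain.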

13. Not every index helps every query. SQL optimizes queries based on the data in the table; when an indexed column contains a large number of duplicate values, the query may not use the index at all. For example, if a table has a field sex that is roughly half male and half female, an index on sex does nothing for query efficiency.
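One simple way to quantify the point in tip 13 is selectivity: the number of distinct values divided by the number of rows. The sketch below computes it with Python's sqlite3 module; the person table, its data and the selectivity helper are all hypothetical, and no fixed threshold is implied.

```python
import sqlite3

def selectivity(conn, table, column):
    """Distinct values divided by total rows; close to 1.0 means highly selective.

    The table and column names are interpolated directly, so pass trusted
    identifiers only.
    """
    distinct, total = conn.execute(
        "SELECT COUNT(DISTINCT %s), COUNT(*) FROM %s" % (column, table)
    ).fetchone()
    return distinct / total if total else 0.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, sex TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO person (sex, phone) VALUES (?, ?)",
    [("male" if i % 2 else "female", "555-%04d" % i) for i in range(1000)],
)

sex_selectivity = selectivity(conn, "person", "sex")      # 2 distinct / 1000 rows
phone_selectivity = selectivity(conn, "person", "phone")  # 1000 distinct / 1000 rows
print(sex_selectivity, phone_selectivity)
```

A column like sex, where each value matches about half the table, gives the engine little to gain from an index, while a near-unique column like phone is a much better candidate.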

14. More indexes are not always better. An index can certainly improve the efficiency of the corresponding select, but it also reduces the efficiency of insert and update,
because the index may have to be rebuilt on insert or update; how to build indexes needs careful thought, case by case.
A table should preferably have no more than six indexes; if there are more, consider whether the indexes on rarely used columns are really necessary.

15. Use numeric fields where possible. If a field holds only numeric information, do not design it as a character field: that lowers query and join performance and increases storage overhead.
This is because the engine compares strings character by character when processing queries and joins, whereas a numeric value needs only a single comparison.

16. Use varchar instead of char where possible: first, variable-length fields take less storage space;
second, for queries, searching within a smaller field is clearly more efficient.

17. Do not use select * from t anywhere; replace "*" with a specific list of fields, and do not return any fields you do not use.

18. Avoid frequently creating and dropping temporary tables, to reduce the consumption of system table resources.

19. Temporary tables are not forbidden; using them appropriately can make some routines more efficient, for example when you need to repeatedly reference a data set from a large table or a commonly used table. For one-off events, however, it is better to use an export table.

20. When creating a temporary table, if a large amount of data is inserted at once, use select into instead of create table to avoid generating a large amount of log
and so improve speed; if the amount of data is small, then to ease the load on system table resources, create table first and then insert.

21. If temporary tables are used, be sure to delete all of them explicitly at the end of the stored procedure: first truncate table, then drop table, which avoids locking system tables for a long time.

22. Try to avoid cursors, because they are inefficient; if a cursor operates on more than 10,000 rows, consider rewriting it.

23. Before reaching for a cursor-based or temporary-table-based method, look for a set-based solution first; set-based methods are usually more efficient.
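To show what "set-based" means in tip 23, the sketch below applies the same change twice: once row by row, the way a cursor loop would, and once as a single statement. It uses Python's sqlite3 module as a stand-in database (an assumption), and the account table and make_db helper are invented for the example.

```python
import sqlite3

def make_db():
    """Build a fresh in-memory table with balances 0..999."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account (balance) VALUES (?)",
                     [(i,) for i in range(1000)])
    return conn

# Row-by-row, cursor style: fetch every row, then issue one UPDATE per row.
row_db = make_db()
for acc_id, balance in row_db.execute("SELECT id, balance FROM account").fetchall():
    if balance >= 500:
        row_db.execute("UPDATE account SET balance = ? WHERE id = ?",
                       (balance + 10, acc_id))

# Set-based: one statement describes the whole change.
set_db = make_db()
set_db.execute("UPDATE account SET balance = balance + 10 WHERE balance >= 500")

row_total = row_db.execute("SELECT SUM(balance) FROM account").fetchone()[0]
set_total = set_db.execute("SELECT SUM(balance) FROM account").fetchone()[0]
print(row_total, set_total)
```

Both versions end in the same state, but the set-based one issues a single statement instead of hundreds, which leaves the engine free to choose how to execute it.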

24. Like temporary tables, cursors are not forbidden. Using a FAST_FORWARD cursor on a small data set is often better than other row-by-row methods, especially when several tables must be referenced to get the required data.
Routines that include "totals" in the result set are usually faster than doing the same with a cursor. If development time allows, try both the cursor-based and the set-based approach and see which works better.

25. Try to avoid large transactions, to improve system concurrency.

26. Avoid returning large amounts of data to the client; if the data volume is too large, consider whether the requirement itself is reasonable.

Daniel Lee _
