(Reposted) How to improve query efficiency when searching a database with tens of millions of rows?

How can query efficiency be improved when searching a database with tens of millions of rows?

1) Database design: 
a. To optimize queries, avoid full table scans as much as possible; first consider building indexes on the columns used in where and order by.
b. Try to avoid testing a field for null in the where clause; otherwise the engine may give up using the index and perform a full table scan. For example:
select id from t where num is null
You can set a default value of 0 on num to ensure that the num column contains no null values, and then query like this:
select id from t where num=0

c. Not every index helps a query. The optimizer chooses a plan based on the data in the table, and when the indexed column contains many duplicate values the query may not use the index. For example, if a table has a sex column whose values are roughly half male and half female, an index on sex does little for query efficiency.

d. More indexes are not always better. An index improves the efficiency of the corresponding select, but it also reduces the efficiency of insert and update, because the index may have to be rebuilt during an insert or update. How to build indexes therefore needs careful consideration, case by case. As a rule of thumb, a table should not have more than 6 indexes; if it has more, consider whether the indexes on infrequently used columns are really necessary.

e. Avoid updating clustered index columns as much as possible, because the order of the clustered index columns is the physical storage order of the table's records. Once a value in such a column changes, the records have to be reordered, which consumes considerable resources. If the application needs to update these columns frequently, reconsider whether the index should be built as a clustered index.

f. Use numeric fields as much as possible. If a field contains only numeric information, do not design it as a character field: doing so reduces the performance of queries and joins and increases storage overhead. This is because the engine compares a string character by character when processing queries and joins, whereas a number needs only one comparison.

g. Use varchar/nvarchar instead of char/nchar as much as possible. First, variable-length fields take less storage space. Second, searching within a smaller field is more efficient.

h. Try to use table variables instead of temporary tables. If a table variable holds a lot of data, be aware that its indexing is very limited (only the primary key index).

i. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.

j. Temporary tables are not off limits, and using them appropriately can make certain routines more efficient, for example when a dataset from a large or frequently used table needs to be referenced repeatedly. For one-time operations, however, a derived table is the better choice.

k. When creating a new temporary table, if a large amount of data is inserted at one time, you can use select into instead of create table to avoid generating a large volume of log records and so improve speed. If the amount of data is small, create table first and then insert, to ease the load on the system tables.

l. If temporary tables are used, explicitly delete all of them at the end of the stored procedure: truncate table first, then drop table. This avoids locking the system tables for a long time.
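
Points a and b above can be checked on a small scale. The following is a minimal sketch, not a benchmark: it assumes Python's built-in sqlite3 module as a stand-in for the database engine, so the plan text and optimizer behaviour will differ on SQL Server and other engines. The table t and column num come from the text; the index name idx_t_num and the helper plan() are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row is (id, parent, notused, detail); only the detail text is kept.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# num is NOT NULL with a default of 0, as suggested in point b.
conn.execute(
    "CREATE TABLE t ("
    "id INTEGER PRIMARY KEY, "
    "num INTEGER NOT NULL DEFAULT 0, "
    "name TEXT)"
)
conn.executemany(
    "INSERT INTO t (num, name) VALUES (?, ?)",
    [(i % 1000, f"name{i}") for i in range(10000)],
)
# One row that relies on the default instead of storing NULL.
conn.execute("INSERT INTO t (name) VALUES ('no num given')")

query = "SELECT id FROM t WHERE num = 0"

# Without an index on num, the only option is to read every row.
plan_without_index = plan(conn, query)

# Hypothetical index name; the column choice follows point a.
conn.execute("CREATE INDEX idx_t_num ON t (num)")
plan_with_index = plan(conn, query)

matching_rows = conn.execute(
    "SELECT COUNT(*) FROM t WHERE num = 0"
).fetchone()[0]
null_rows = conn.execute(
    "SELECT COUNT(*) FROM t WHERE num IS NULL"
).fetchone()[0]

print(plan_without_index)
print(plan_with_index)
print(matching_rows, null_rows)
```

On a real system the equivalent check is to compare the execution plan before and after adding the index, using whatever plan viewer the engine provides.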

2) SQL statements:

a. Try to avoid using the != or <> operator in the where clause; otherwise the engine may give up using the index and perform a full table scan.

b. Try to avoid using or to join conditions in the where clause; otherwise the engine may give up using the index and perform a full table scan. For example:
select id from t where num=10 or num=20
can be rewritten as:
select id from t where num=10
union all
select id from t where num=20

c. in and not in should also be used with caution, because they can lead to a full table scan. For example:
select id from t where num in(1,2,3)
For consecutive values, use between instead of in where possible:
select id from t where num between 1 and 3

d. The following query will also result in a full table scan, because the leading wildcard prevents an index lookup on name:
select id from t where name like '%abc%'

e. Using a local variable in the where clause can also cause a full table scan. SQL resolves local variables only at run time, but the optimizer cannot defer the choice of an access plan until run time; it must choose it at compile time. When the access plan is built at compile time, the value of the variable is still unknown and so cannot be used as an input for index selection. For example, the following statement may perform a full table scan:
select id from t where num=@num
It can be changed to force the query to use the index:
select id from t with(index(index name)) where num=@num

f. Avoid applying expressions to a field in the where clause as much as possible, because this causes the engine to give up using the index and perform a full table scan. For example:
select id from t where num/2=100
should be changed to:
select id from t where num=100*2
(If num is an integer column, num/2=100 also matches 201 under integer division, so check which rows are actually intended before rewriting.)

g. Try to avoid applying functions to fields in the where clause, which will cause the engine to give up using the index and perform a full table scan. For example:
select id from t where substring(name,1,3)='abc' --ids whose name starts with abc
select id from t where datediff(day,createdate,'2005-11-30')=0 --ids generated on '2005-11-30'
should be changed to:
select id from t where name like 'abc%'
select id from t where createdate>='2005-11-30' and createdate<'2005-12-1'

h. Do not apply functions, arithmetic, or other expressions to the left of the "=" in the where clause; otherwise the system may not be able to use the index correctly.

i. Do not write queries that do no useful work. For example, to generate an empty table structure:
select col1,col2 into #t from t where 1=0
This kind of code returns no result set but still consumes system resources. It should be changed to:
create table #t(…)

j. Replacing in with exists is often a good choice:
select num from a where num in(select num from b)
can be replaced with:
select num from a where exists(select 1 from b where num=a.num)

k. Do not use select * from t anywhere; replace "*" with a list of the specific fields you need, and do not return any fields that are not used.

l. Try to avoid cursors, because they are inefficient. If a cursor operates on more than 10,000 rows, consider rewriting the logic.

m. Try to avoid returning a large amount of data to the client. If the amount of data is too large, consider whether the underlying requirement is reasonable.

n. Try to avoid large transaction operations to improve system concurrency.
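
Points f and g above (keep the indexed column bare on one side of the comparison) can be tried out with the sketch below. It assumes Python's built-in sqlite3 module as a stand-in engine, so datediff is replaced by SQLite's date() function and the plan text is SQLite's, not SQL Server's. The index names and the helpers plan() and ids() are hypothetical. The sketch also shows that for an integer column num/2=100 and num=100*2 do not select the same rows.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


def ids(conn, sql):
    """Run sql and return the selected ids in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t ("
    "id INTEGER PRIMARY KEY, "
    "num INTEGER NOT NULL, "
    "createdate TEXT NOT NULL)"
)
# 3000 rows: num cycles through 0..499, dates cycle through 1..30 November.
conn.executemany(
    "INSERT INTO t (num, createdate) VALUES (?, ?)",
    [(i % 500, f"2005-11-{i % 30 + 1:02d} 08:00:00") for i in range(3000)],
)
conn.execute("CREATE INDEX idx_t_num ON t (num)")
conn.execute("CREATE INDEX idx_t_createdate ON t (createdate)")

# Function applied to the column: the index cannot be used for a lookup.
date_wrapped = "SELECT id FROM t WHERE date(createdate) = '2005-11-30'"
# Same rows expressed as a range on the bare column.
date_range = (
    "SELECT id FROM t "
    "WHERE createdate >= '2005-11-30' AND createdate < '2005-12-01'"
)

# Arithmetic on the column versus arithmetic on the constant.
num_wrapped = "SELECT id FROM t WHERE num / 2 = 100"
num_plain = "SELECT id FROM t WHERE num = 100 * 2"

for sql in (date_wrapped, date_range, num_wrapped, num_plain):
    print(plan(conn, sql))

# The two date queries return identical rows.
print(ids(conn, date_wrapped) == ids(conn, date_range))
# Integer division makes num / 2 = 100 match both 200 and 201.
print(len(ids(conn, num_wrapped)), len(ids(conn, num_plain)))
```

The design point is that a rewrite is only safe when both forms select the same rows, so it is worth comparing result sets as well as plans before replacing a predicate.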

3) Java side:

a. Create as few objects as possible.

b. Position the system design sensibly. Bulk data operations and small data operations must be kept separate, and bulk data operations should not be handled by an ORM framework.

c. Use JDBC to connect to the database and manipulate the data directly.

d. Keep memory under control and let the data flow: instead of reading everything into memory and then processing it, process it while reading.

e. Use memory sensibly; some data should be cached.
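
Point d above (process while reading) can be sketched as follows. The text is about Java and JDBC; this sketch uses Python's built-in sqlite3 module instead so that it is self-contained, so treat it as an illustration of the pattern and not as JDBC code. The table t, the function stream_sum, and the batch size are hypothetical.

```python
import sqlite3


def stream_sum(conn, batch_size=1000):
    """Sum t.num while holding at most batch_size fetched rows at a time."""
    cur = conn.cursor()
    cur.execute("SELECT num FROM t")
    total = 0
    largest_batch = 0
    while True:
        # fetchmany returns at most batch_size rows, or [] when exhausted.
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        largest_batch = max(largest_batch, len(batch))
        # Process the batch immediately, then let it be garbage collected.
        total += sum(row[0] for row in batch)
    return total, largest_batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER NOT NULL)")
conn.executemany("INSERT INTO t (num) VALUES (?)", [(i,) for i in range(10000)])

total, largest_batch = stream_sum(conn, batch_size=1000)
print(total, largest_batch)
```

In JDBC the comparable pattern is iterating over a ResultSet row by row; Statement.setFetchSize gives the driver a hint about how many rows to fetch per round trip, and how that hint is honoured depends on the driver.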

How do you optimize a database and improve its performance?

Answer:

1) Tune the hardware. The factors most likely to affect performance are disk and network throughput. Solutions include: expanding virtual memory and ensuring there is enough room for it to grow; turning off unnecessary services on the database server; running the database on a separate server; maximizing the throughput of the SQL database server; and running SQL on a machine with more than one processor.

2) Tune the database

If a table is queried frequently, build indexes on it. When building them, think through all the query and search operations on the table and index according to the where conditions. Try to build the one and only clustered index on an integer key, so that the data sits on the data pages in physical order and the search range is shortened. Build non-clustered indexes on the columns that queries use often, so that they cover the queries as far as possible. Do not build too many indexes, however, because the overhead of maintaining them increases dramatically. Also avoid having too many key columns in one index, avoid indexing columns with large data types, and make sure each index key value corresponds to only a few rows.
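
The idea of an index that covers a query can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in engine: SQLite has no clustered/non-clustered distinction in the SQL Server sense, but its query plan does report when an index alone satisfies a query. The table layout, the index names idx_t_num and idx_t_num_name, and the helper plan() are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t ("
    "id INTEGER PRIMARY KEY, "
    "num INTEGER NOT NULL, "
    "name TEXT NOT NULL, "
    "payload TEXT)"
)
conn.executemany(
    "INSERT INTO t (num, name, payload) VALUES (?, ?, ?)",
    [(i % 1000, f"name{i}", "x" * 50) for i in range(5000)],
)

query = "SELECT name FROM t WHERE num = 10"

# Index on the filter column only: each match still needs a table lookup
# to fetch name.
conn.execute("CREATE INDEX idx_t_num ON t (num)")
plan_single_column = plan(conn, query)

# Index that also contains the selected column: the query can be answered
# from the index alone.
conn.execute("DROP INDEX idx_t_num")
conn.execute("CREATE INDEX idx_t_num_name ON t (num, name)")
plan_covering = plan(conn, query)

names = sorted(row[0] for row in conn.execute(query))

print(plan_single_column)
print(plan_covering)
print(len(names))
```

A wider index covers more queries but costs more to maintain on every write, which is the trade-off the paragraph above warns about.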

3) Use a stored procedure

When implementing the application, any database operation that can be done with a stored procedure should be done through one as far as possible. A stored procedure is designed, coded, and tested once on the database server and then reused; an application that needs to perform the task simply executes the stored procedure and receives only the result set or value. This makes the program modular, improves response speed, and reduces network traffic. Because input is accepted through parameters, the logic is also implemented consistently across the application.

4) Application structure and algorithm

Building indexes on the query conditions is only a prerequisite for improving speed; the response time also depends on how the indexes are used. People often fall into a trap when using SQL: they pay too much attention to whether the results are correct. This is especially true when working on a database with a small amount of data, where whether an index is built and used makes little difference to the program's response time. As a result, programmers overlook the possible performance differences between different implementations when writing programs. These differences become particularly evident when the amount of data is very large, or in large or complex database environments (such as online transaction processing, OLTP, or decision support systems, DSS). In practice, bad SQL often comes from inappropriate index design, insufficient join conditions, and unoptimizable where clauses. Once these are properly optimized, the running speed improves significantly.

Original address: http://blog.csdn.NET/xlgen157387/article/details/44156679

 
