SQL efficiency problems

1. On the efficiency of SQL queries: with 1,000,000 rows of data the query takes only 1 second. Sharing the test with you:
Machine:
CPU: P4 2.4
Memory: 1 GB
OS: Windows 2003
Database: MS SQL Server 2000
Purpose: query performance test, comparing the performance of two queries

SQL query efficiency step by step

-- step 1.
-- create table
create table t_userinfo
(
userid int identity(1,1) primary key nonclustered,
nick varchar(50) not null default '',
classid int not null default 0,
writetime datetime not null default getdate()
)
go

-- create clustered index
create clustered index ix_userinfo_classid on t_userinfo(classid)
go

-- step 2.

declare @i int
declare @k int
declare @nick varchar(10)
set @i = 1
while @i<1000000
begin
set @k = @i % 10
set @nick = convert(varchar,@i)
insert into t_userinfo(nick,classid,writetime) values(@nick,@k,getdate())
set @i = @i + 1
end
-- took 08:27, so be patient

-- step 3.
select top 20 userid,nick,classid,writetime from t_userinfo
where userid not in
(
select top 900000 userid from t_userinfo order by userid asc
)

-- took 8 seconds, which is long

-- step 4.
select a.userid,b.nick,b.classid,b.writetime from
(
select top 20 a.userid from
(
select top 900020 userid from t_userinfo order by userid asc
) a order by a.userid desc
) a inner join t_userinfo b on a.userid = b.userid
order by a.userid asc

-- took 1 second, unbelievably fast

-- step 5: query with a where clause
select top 20 userid,nick,classid,writetime from t_userinfo
where classid = 1 and userid not in
(
select top 90000 userid from t_userinfo
where classid = 1
order by userid asc
)
-- took 2 seconds

-- step 6: query with a where clause
select a.userid,b.nick,b.classid,b.writetime from
(
select top 20 a.userid from
(
select top 90020 userid from t_userinfo -- 90000 skipped rows + 20 page rows, as in step 4
where classid = 1
order by userid asc
) a order by a.userid desc
) a inner join t_userinfo b on a.userid = b.userid
order by a.userid asc

-- Query Analyzer shows less than 1 second.
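
The two paging styles compared in steps 3 to 6 can be checked for equivalence on a small table. The following is only a sketch, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for SQL Server 2000, with LIMIT in place of TOP and 1,000 rows instead of one million. SKIP and PAGE are hypothetical names, and the timings above do not carry over to this setup.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; LIMIT plays the role of TOP.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t_userinfo ("
    " userid integer primary key,"
    " nick text not null,"
    " classid integer not null)"
)
# 1,000 rows, with the same classid = i % 10 pattern as step 2.
conn.executemany(
    "insert into t_userinfo(userid, nick, classid) values (?, ?, ?)",
    [(i, str(i), i % 10) for i in range(1, 1001)],
)

SKIP, PAGE = 900, 20  # hypothetical paging parameters

# Step 3 style: exclude the first SKIP ids with NOT IN.
not_in_page = conn.execute(
    "select userid from t_userinfo"
    " where userid not in"
    " (select userid from t_userinfo order by userid asc limit ?)"
    " order by userid asc limit ?",
    (SKIP, PAGE),
).fetchall()

# Step 4 style: take SKIP + PAGE ids, keep the last PAGE of them, join back.
nested_page = conn.execute(
    "select b.userid from"
    " (select x.userid from"
    "   (select userid from t_userinfo order by userid asc limit ?) x"
    "  order by x.userid desc limit ?) a"
    " inner join t_userinfo b on a.userid = b.userid"
    " order by a.userid asc",
    (SKIP + PAGE, PAGE),
).fetchall()

print(not_in_page == nested_page)
```

Both queries return the same 20 rows here. Note that the NOT IN form only defines a page when the outer query has its own order by.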


Query efficiency analysis:
For a subquery, the nested query must be processed for every result of the outer query to ensure that duplicate values are eliminated. In this case, consider using a join instead.
If you do want to use a subquery, replace IN with EXISTS and NOT IN with NOT EXISTS. A subquery introduced by EXISTS only tests whether any row satisfies the specified condition, so it is more efficient. In either case NOT IN is the least efficient, because it performs a full traversal of the table in the subquery.

Build reasonable indexes, avoid scanning redundant data, and avoid table scans!
With reasonable indexes, several million rows can still be queried in tens of milliseconds.
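
The rewrites suggested above (NOT IN to NOT EXISTS, or a join) can be sketched on two tiny tables. This assumes SQLite through Python's sqlite3 module rather than SQL Server, and the tables a and b with a num column are made up for the illustration. It only shows that the three forms return the same rows when num is never null. It says nothing about their relative speed on a given engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; num is declared not null on purpose, because NOT IN
# returns no rows at all when the subquery yields a NULL.
conn.execute("create table a (num integer not null)")
conn.execute("create table b (num integer not null)")
conn.executemany("insert into a values (?)", [(n,) for n in range(1, 11)])
conn.executemany("insert into b values (?)", [(n,) for n in range(2, 11, 2)])

not_in = conn.execute(
    "select num from a where num not in (select num from b) order by num"
).fetchall()

not_exists = conn.execute(
    "select num from a"
    " where not exists (select 1 from b where b.num = a.num)"
    " order by num"
).fetchall()

# Join form: keep the rows of a that found no partner in b.
anti_join = conn.execute(
    "select a.num from a left join b on b.num = a.num"
    " where b.num is null order by a.num"
).fetchall()

print(not_in == not_exists == anti_join)
```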
2. Improving SQL query efficiency
2008-05-12 21:20
1. To optimize a query, try to avoid full table scans. First consider building indexes on the columns used in where and order by.

2. Try to avoid testing a field for null in the where clause, otherwise the engine may give up the use of the index and perform a full table scan, such as:
select id from t where num is null
You can set a default value of 0 on num to ensure that there are no null values in the num column, and then query like this:
select id from t where num=0

3. Try to avoid using the != or <> operator in the where clause, otherwise the engine may give up the use of the index and perform a full table scan.

4. Try to avoid using or to join conditions in the where clause, otherwise the engine will give up using the index and perform a full table scan, such as:
select id from t where num=10 or num=20
You can query like this:
select id from t where num=10
union all
select id from t where num=20

5. in and not in should also be used with caution, otherwise they may result in a full table scan, such as:
select id from t where num in(1,2,3)
For consecutive values, use between instead of in:
select id from t where num between 1 and 3

6. The following query will also cause a full table scan:
select id from t where name like '%abc%'
To improve efficiency, consider full-text search.

7. Using a local variable in the where clause can also cause a full table scan. SQL resolves local variables only at run time, but the optimizer cannot defer the choice of an access plan to run time; it must choose at compile time. When the access plan is built at compile time, the value of the variable is still unknown and so cannot be used as input for index selection. For example, the following statement will perform a full table scan:
select id from t where num=@num
It can be changed to force the query to use the index:
select id from t with(index(index_name)) where num=@num

8. The expression operation on the field in the where clause should be avoided as much as possible, which will cause the engine to give up the use of the index and perform a full table scan. For example:
select id from t where num/2=100
should be changed to:
select id from t where num=100*2

9. Try to avoid applying functions to fields in the where clause, which will cause the engine to give up the use of the index and perform a full table scan. Such as:
select id from t where substring(name,1,3)='abc' -- ids whose name starts with abc
select id from t where datediff(day,createdate,'2005-11-30')=0 -- ids created on '2005-11-30'
Should be changed to:
select id from t where name like 'abc%'
select id from t where createdate>='2005-11-30' and createdate<'2005-12-1'
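
The effect described in point 9 can be observed in a query plan. This is a sketch only, assuming SQLite through Python's sqlite3 module instead of SQL Server. SQLite's date() is used in place of datediff, dates are stored as text, and the index name ix_t_createdate is made up. In SQLite's EXPLAIN QUERY PLAN output, a line starting with SCAN reads every row, while SEARCH uses the index to seek.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, createdate text not null)")
conn.execute("create index ix_t_createdate on t(createdate)")
conn.executemany(
    "insert into t(createdate) values (?)",
    [
        ("2005-11-29 23:59:59",),
        ("2005-11-30 00:00:00",),
        ("2005-11-30 18:30:00",),
        ("2005-12-01 00:00:00",),
    ],
)


def plan(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("explain query plan " + sql)]


# Function applied to the column: the index cannot be used to seek.
wrapped = "select id from t where date(createdate) = '2005-11-30'"
# Plain range on the column: the index can be used to seek.
ranged = (
    "select id from t where createdate >= '2005-11-30'"
    " and createdate < '2005-12-01'"
)

wrapped_ids = sorted(r[0] for r in conn.execute(wrapped))
ranged_ids = sorted(r[0] for r in conn.execute(ranged))

print(plan(wrapped)[0])  # a SCAN line
print(plan(ranged)[0])   # a SEARCH line
```

Both queries return the same ids; only the access path differs.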

10. Do not perform functions, arithmetic operations or other expression operations on the left side of the "=" in the where clause, otherwise the system may not be able to use the index correctly.

11. When using an indexed field as a condition, if the index is a composite index, the first field of the index must be used in the condition to ensure that the system uses the index; otherwise the index will not be used. As far as possible, keep the field order consistent with the index order.
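
The leading-column rule in point 11 can be sketched with a query plan. This assumes SQLite through Python's sqlite3 module as a stand-in for SQL Server; the table, columns and index name ix_t_a_b are made up. SQLite behaves this way as long as ANALYZE has not been run, and other engines may have additional strategies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, a integer, b integer, c text)")
# Composite index: a is the leading column, b the second.
conn.execute("create index ix_t_a_b on t(a, b)")
conn.executemany(
    "insert into t(a, b, c) values (?, ?, ?)",
    [(i % 5, i % 7, "row%d" % i) for i in range(100)],
)


def plan(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("explain query plan " + sql)]


both_columns = plan("select c from t where a = 1 and b = 2")
leading_only = plan("select c from t where a = 1")
second_only = plan("select c from t where b = 2")

print(both_columns[0])  # SEARCH using ix_t_a_b
print(leading_only[0])  # SEARCH using ix_t_a_b
print(second_only[0])   # SCAN: the leading column a is missing
```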

12. Don't write meaningless queries, such as one that only generates an empty table structure:
select col1,col2 into #t from t where 1=0
This kind of code returns no result set but still consumes system resources. It should be changed to this:
create table #t(...)

13. It is often a good choice to use exists instead of in:
select num from a where num in(select num from b)
Replace it with the following statement:
select num from a where exists(select 1 from b where num=a.num)

14. Not every index is useful for a query. SQL optimizes the query based on the data in the table; when the index column contains a large amount of duplicate data, the query may not use the index. For example, if a table has a field sex whose values are almost half male and half female, then even an index on sex will do nothing for query efficiency.

15. More indexes are not always better. An index can certainly improve the efficiency of the corresponding select, but it also reduces the efficiency of insert and update, because the index may have to be rebuilt during insert or update, so how to build indexes needs careful consideration, case by case. The number of indexes on a table should preferably not exceed 6. If there are more, consider whether the indexes on infrequently used columns are necessary.

16. Avoid updating the clustered index data column as much as possible, because the order of the clustered index data column is the physical storage order of the table records. Once the value of this column changes, the order of the entire table records will be adjusted, which will consume considerable resources. If the application system needs to update the clustered index data column frequently, it needs to consider whether the index should be built as a clustered index.

17. Try to use numeric fields. If the fields only contain numeric information, try not to design them as character fields, which will reduce the performance of query and connection and increase the storage overhead. This is because the engine compares each character of the string one by one when processing queries and joins, whereas only one comparison is required for numbers.

18. Use varchar/nvarchar instead of char/nchar as much as possible, because first of all, the storage space of variable-length fields is small, which can save storage space. Secondly, for queries, the search efficiency in a relatively small field is obviously higher.

19. Do not use select * from t anywhere, replace "*" with a list of specific fields, and do not return any fields that are not used.

20. Try to use table variables instead of temporary tables. If the table variable contains a lot of data, be aware that the indexes are very limited (only the primary key index).

21. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.

22. Temporary tables are not unusable; used appropriately they can make certain routines more efficient, for example when a large table, or a dataset in a frequently used table, needs to be referenced repeatedly. For one-time events, however, it is better to use a derived table.

23. When creating a temporary table, if a large amount of data is inserted at once, use select into instead of create table to avoid generating a large amount of log and so improve speed; if the amount of data is small, create table first and then insert, to ease the load on the system tables.

24. If temporary tables are used, all temporary tables must be explicitly deleted at the end of the stored procedure, first truncate table, and then drop table, which can avoid long-term locking of system tables.

25. Try to avoid using the cursor, because the efficiency of the cursor is poor, if the data operated by the cursor exceeds 10,000 rows, then you should consider rewriting.

26. Before using the cursor-based method or the temporary table method, you should look for a set-based solution to solve the problem, and the set-based method is usually more efficient.

27. Like temporary tables, cursors are not unavailable. Using FAST_FORWARD cursors on small datasets is often preferable to other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that include "totals" in the result set are usually faster than using cursors. If development time allows, try both the cursor-based approach and the set-based approach to see which one works better.

28. Set SET NOCOUNT ON at the beginning of all stored procedures and triggers, and set SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after each statement of stored procedures and triggers is executed.

29. Try to avoid large transaction operations and improve system concurrency capabilities.

30. Try to avoid returning a large amount of data to the client. If the amount of data is too large, consider whether the requirement itself is reasonable.

1. Avoid setting the field to "allow to be empty"
2. The data table design should be standardized
3. Analyze in depth the operations performed on the database
4. Try not to use temporary tables
5. Use transactions
6. Try not to use cursors
7. Avoid deadlocks
8. Pay attention to the use of read-write locks
9. Do not open large data sets
10. Do not use server-side cursors
11. Use a database with a large amount of data when coding the program
12. Don't create an index for the "gender" column
13. Pay attention to the timeout problem
14. Don't use Select *
15. When inserting records in the detail table, do not execute Select MAX(ID) in the main table.
16. Try not to use the TEXT data type
17. Use parameterized queries
18. Do not use Insert to import large amounts of data
Referential integrity
21. Use INNER JOIN and LEFT JOIN instead of Where


Improving SQL query efficiency (key points and tips):
· Tip 1:
Problem type: When the ACCESS database field contains Japanese katakana or other unknown characters, the query will prompt memory overflow.
Solution: Modify the query statement
sql="select * from tablename where column like '%"&word&"%'"
to
sql="select * from tablename"
rs.filter = " column like '%"&word&"%'"
==========================================================
Tip 2:
Problem type: How to implement a Baidu-style multi-keyword query in a simple way (keywords separated by spaces or other symbols).
Solution:
'// Split the query string on spaces
ck = Split(word, " ")
'// Index of the last keyword
sck = UBound(ck)

sql = "select * from tablename where "
tempJoinWord = ""
'// Query in a single field
For i = 0 To sck
sql = sql & tempJoinWord & "(" & _
"column like '" & ck(i) & "%')"
tempJoinWord = " and "
Next
'// Or query two fields at the same time (use instead of the loop above)
For i = 0 To sck
sql = sql & tempJoinWord & "(" & _
"column like '" & ck(i) & "%' or " & _
"column1 like '" & ck(i) & "%')"
tempJoinWord = " and "
Next
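
The same keyword-splitting idea can be sketched outside ASP. The version below is an illustration in Python with the built-in sqlite3 module, not the original author's code. build_keyword_query, the articles table and its columns are hypothetical names. It binds the keywords as parameters instead of concatenating them into the SQL text, which is a deliberate change from the loop above.

```python
import sqlite3


def build_keyword_query(table, columns, text):
    """Build a prefix search in which every keyword must match some column.

    table and columns are inserted into the SQL text as-is, so they must be
    trusted identifiers; only the keywords are bound as parameters.
    """
    clauses, params = [], []
    for word in text.split():
        # One bracketed group per keyword: (col1 like ? or col2 like ?)
        clauses.append("(" + " or ".join(c + " like ?" for c in columns) + ")")
        params.extend([word + "%"] * len(columns))
    where = " and ".join(clauses) if clauses else "1=1"
    return "select * from " + table + " where " + where, params


conn = sqlite3.connect(":memory:")
conn.execute("create table articles (title text, author text)")
conn.executemany(
    "insert into articles values (?, ?)",
    [("sql tuning", "alice"), ("index basics", "sql team"), ("cooking", "bob")],
)

cols = ["title", "author"]
hits_one = sorted(
    r[0] for r in conn.execute(*build_keyword_query("articles", cols, "sql"))
)
hits_two = sorted(
    r[0] for r in conn.execute(*build_keyword_query("articles", cols, "sql alice"))
)
print(hits_one, hits_two)
```

Adding a second keyword narrows the result, because the groups are joined with and.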
==========================================================
Tip 3: Several techniques that greatly improve query efficiency

1. Try not to use or; using or will cause a full table scan, which greatly reduces query efficiency.
2.


(referring to the sqlserver database)
4. The difference between '%"&word&"%' and '"&word&"%' when querying:
For example, suppose the field content is "a vulnerable woman".
'%"&word&"%': matches anywhere in the string, so searching for "vulnerable" or for "a" both return the row.
'"&word&"%': matches only at the start of the string. Searching for "vulnerable" returns nothing; only searching for "a" returns the row.
5. Field extraction should follow the principle of "take only what you need": avoid "select *" and use "select field 1, field 2, field 3........" instead. Practice has shown that every field you leave out speeds up data extraction, and how much depends on the size of the fields you discard.
6. order by is most efficient when sorting by the clustered index column. A sqlserver table can have only one clustered index, which is generally on ID by default and can be changed to another field.
7. Build appropriate indexes for your tables; a good index can speed up queries by tens or even hundreds of times.
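
The matching difference between the two wildcard forms in point 4 can be tried directly. This is a sketch assuming SQLite through Python's sqlite3 module; the one-row news table and the search helper are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table news (title text)")
conn.execute("insert into news values ('a vulnerable woman')")


def search(pattern):
    """Return the titles matching a LIKE pattern."""
    return conn.execute(
        "select title from news where title like ?", (pattern,)
    ).fetchall()


contains_middle = search("%vulnerable%")  # wildcard on both sides
prefix_middle = search("vulnerable%")     # keyword must start the string
prefix_start = search("a%")               # keyword does start the string

print(len(contains_middle), len(prefix_middle), len(prefix_start))
```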
(Referring to the sqlserver database)
The following is an analysis of query efficiency with and without indexes:
Sqlserver index and query efficiency analysis.
Table: News
Fields:
Id: auto number
Title: article title
Author: author
Content: content
Star: priority
Addtime: time
Records: 1 million
Test machine: P4 2.8/1G memory/IDE hard disk
==========================================================
Scheme 1:
Primary key Id, clustered index by default, no other non-clustered indexes are created
select * from News where Title like '%"&word&"%' or Author like '%"&word&"%' order by Id desc
Fuzzy retrieval from the fields Title and Author, sorted by Id
Query time: 50 seconds
==========================================================
Scheme 2:
Primary key Id, clustered index by default; a non-clustered index created on Title, Author, Star
select * from News where Title like '"&word&"%' or Author like '"&word&"%' order by Id desc
Fuzzy retrieval from the fields Title and Author, sorted by Id
Query time: 2 - 2.5 seconds
==========================================================
Scheme 3:
Primary key Id, clustered index by default; a non-clustered index created on Title, Author, Star
select * from News where Title like '"&word&"%' or Author like '"&word&"%' order by Star desc
Fuzzy retrieval from fields Title and Author, sorted by Star
Query time: 2 seconds
==========================================================
Scheme 4:
Primary key Id, clustered index by default; a non-clustered index created on Title, Author, Star
select * from News where Title like '"&word&"%' or Author like '"&word&"%'
Fuzzy retrieval from the fields Title and Author, Unsorted
Query time: 1.8 - 2 seconds
==========================================================
Scheme 5:
Primary key Id, clustered index by default; a non-clustered index created on Title, Author, Star
select * from News where Title like '"&word&"%'
or
select * from News where Author like '"&word&"%'
Retrieval from the field Title or the field Author, unsorted
Query time: 1 second
· How to improve the query efficiency of the SQL language?
Q: How can I improve the query efficiency of the SQL language?
A: This has to start from the beginning:
    Since SQL is a result-oriented rather than a procedure-oriented query language, large relational databases that support SQL generally use a cost-based query optimizer to provide an optimal execution strategy for ad hoc queries. For the optimizer, the input is a query statement and the output is an execution strategy.
    A SQL query statement can have multiple execution strategies; the optimizer estimates the cost of each and picks the one with the lowest cost, that is, the one that takes the least time. All optimization is based on the where clause of the query statement used by the user, and the optimizer mainly uses search arguments (SARGs) to optimize the where clause.
    The core idea of a search argument is that the database uses the index on a field of the table to look up data, instead of reading the data in the records directly.
    Conditions with operators such as =, <, <=, >, >= can use indexes directly. For example, the following are search arguments:
    emp_id = "10001", salary > 3000, a = 1 and c = 7
    and the following are not search arguments:
    salary = emp_salary, dep_id != 10, salary * 12 >= 3000, a = 1 or c = 7
    Provide some redundant search arguments where possible, to give the optimizer more choices. Consider the following 3 methods:
    The first method:
    select employee.emp_name,department.dep_name from department,employee where (employee.dep_id = department.dep_id) and (department.dep_code="01") and (employee.dep_code="01");
    Its search analysis result is as follows:
    Estimate 2 I/O operations
    Scan department using primary key
    for rows where dep_code equals "01"
    Estimate getting here 1 times
    Scan employee sequentially
    Estimate getting here 5 times
    The second method:
    select employee.emp_name,department.dep_name from department,employee where (employee.dep_id = department.dep_id) and (department.dep_code="01");
    Its search analysis result is as follows:
    Estimate 2 I/O operations
    Scan department using primary key
    for rows where dep_code equals "01"
    Estimate getting here 1 times
    Scan employee sequentially
    Estimate getting here 5 times
    The first method runs the same as the second one, but the first method is the best as it gives the optimizer more choice opportunities.
    The third method:
    select employee.emp_name,department.dep_name from department,employee where (employee.dep_id = department.dep_id) and (employee.dep_code="01");
    This method is the worst because it cannot use an index, that is, it cannot be optimized...
When using SQL statements, you should pay attention to the following points:
    1. Avoid using incompatible data types. For example, Float and Integer, Char and Varchar, Binary and Long Binary are not compatible. Incompatible data types may prevent the optimizer from performing optimizations it could otherwise perform. For example:
    select emp_name from employee where salary > 3000;
    In this statement, if salary is of type Float, it is hard for the optimizer to optimize it, because 3000 is an integer. We should write 3000.0 in the program rather than leave the conversion to the DBMS at run time.
    2. Try not to use expressions, because their values cannot be obtained at compile time, so SQL can only use average density to estimate the number of records that will be hit.
    3. Avoid using other mathematical operators for search parameters. Such as:
       select emp_name from employee where salary * 12 > 3000;
       should be changed to:
       select emp_name from employee where salary > 250;
    4. Avoid using operators such as != or <>, because they prevent the system from using the index, so that it can only search the data in the table directly.
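
The rewrite in point 3 (moving the arithmetic off the column, since 3000 / 12 = 250) can be checked on a small table. This is only a sketch, assuming SQLite through Python's sqlite3 module. The rows and the index name ix_employee_salary are made up, and SCAN versus SEARCH is SQLite's plan wording, not that of the DBMS discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employee ("
    " emp_id integer primary key, emp_name text, salary real)"
)
conn.execute("create index ix_employee_salary on employee(salary)")
conn.executemany(
    "insert into employee(emp_name, salary) values (?, ?)",
    [("ann", 200.0), ("bob", 250.0), ("cy", 300.0)],
)


def plan(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("explain query plan " + sql)]


# Arithmetic on the column: not a search argument.
computed = "select emp_name from employee where salary * 12 > 3000"
# Same condition with the constant folded by hand.
direct = "select emp_name from employee where salary > 250"

computed_rows = conn.execute(computed).fetchall()
direct_rows = conn.execute(direct).fetchall()

print(plan(computed)[0])  # a SCAN line
print(plan(direct)[0])    # a SEARCH line using ix_employee_salary
```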
· Application in Oracle
A table with 16 million rows: the SMS uplink (MO) table TBL_SMS_MO
Structure:
CREATE TABLE TBL_SMS_MO
(
SMS_ID NUMBER,
MO_ID VARCHAR2(50),
MOBILE VARCHAR2(11),
SPNUMBER VARCHAR2(20),
MESSAGE VARCHAR2(150),
TRADE_CODE VARCHAR2(20),
LINK_ID VARCHAR2(50),
GATEWAY_ID NUMBER,
GATEWAY_PORT NUMBER,
MO_TIME DATE DEFAULT SYSDATE
);
CREATE INDEX IDX_MO_DATE ON TBL_SMS_MO (MO_TIME)
  PCTFREE 10
  INITRANS 2
  MAXTRANS 255
  STORAGE
  (
    INITIAL 1M
    NEXT 1M
    MINEXTENTS 1
    MAXEXTENTS UNLIMITED
    PCTINCREASE 0
  );
CREATE INDEX IDX_MO_MOBILE ON TBL_SMS_MO (MOBILE)
  PCTFREE 10
  STORAGE
  (
    MINEXTENTS 1
    MAXEXTENTS UNLIMITED
    PCTINCREASE 0
  );
Question: Query the short messages sent by one mobile phone within a certain period of time from the table, using the following SQL statement:
SELECT MOBILE,MESSAGE,TRADE_CODE,MO_TIME
FROM TBL_SMS_MO
WHERE MOBILE='130XXXXXXXX'
AND MO_TIME BETWEEN TO_DATE('2006-04-01','YYYY-MM-DD HH24:MI:SS') AND TO_DATE('2006-04-07','YYYY-MM-DD HH24:MI:SS')
ORDER BY MO_TIME DESC
It takes about 10 minutes to return results, which is unbearable for a web query.
Analysis:
In PL/SQL Developer, click the "Explain Plan" button (or press F5) to analyze the SQL. It turns out that the index used by default is IDX_MO_DATE. The problem may lie here: compared with the 16 million rows in total, the data for a single mobile number is very small, so it is much easier to pin down the data through IDX_MO_MOBILE.
Optimize as follows:
SELECT /*+ index(TBL_SMS_MO IDX_MO_MOBILE) */ MOBILE,MESSAGE,TRADE_CODE,MO_TIME
FROM TBL_SMS_MO
WHERE MOBILE='130XXXXXXXX'
AND MO_TIME BETWEEN TO_DATE('2006-04-01','YYYY-MM-DD HH24:MI:SS') AND TO_DATE('2006-04-07','YYYY-MM-DD HH24:MI:SS')


Press F8 to run this SQL, wow~... 2.360s, that's the difference.

Origin http://43.154.161.224:23101/article/api/json?id=326352047&siteId=291194637