How to optimize SQL queries over large data sets: a round-up for SQL Server and Oracle


Optimizing WHERE conditions

In principle, many databases (in particular, older rule-based optimizers) process WHERE conditions in order from left to right, so put the conditions that filter out the most data first and the less selective conditions last.

SQL1:

select * from employee
where salary > 1000    -- condition 1: filters out less data
  and dept_id = '01'   -- condition 2: filters out more data than condition 1

The SQL above does not follow this principle. The condition that filters out more data should come first, so it is better written as:

select * from employee
where dept_id = '01'   -- the more selective condition comes first
  and salary > 1000

In relational databases, besides optimizing the physical design of the database and normalizing the schema, a simple, direct and effective method is to adjust the SQL statement itself, reducing the amount of computation and memory required and improving response time.
  a. Multiple selection conditions on the same table. The order of the selection conditions has a large impact on performance, because it affects both index selection and the size of intermediate temporary tables. Take the following query as an example:

select * from customer
where city='beijing' and fname='li'

Suppose the table has 1,000,000 records, of which 100,000 have city='beijing', 20,000 have fname='li', and 2,000 satisfy both conditions. Under a left-to-right evaluation model in SQL Server, executing the first condition first produces a 100,000-row temporary table, from which the final result is then selected. If the condition is changed to where fname='li' and city='beijing', a 20,000-row temporary table is produced first, yielding the same final result. Clearly, the order of the selection conditions greatly affects the amount of computation a query performs. To improve response time, write the stricter conditions first and the weaker conditions later.
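The effect of predicate selectivity can be sketched in miniature. The snippet below uses Python with SQLite purely as a convenient stand-in (the article's numbers refer to SQL Server) and a much smaller, hypothetical data set: whichever order the conditions are written in, the result is the same set of rows, and the ordering only matters to optimizers that evaluate conditions strictly left to right.

```python
import sqlite3

# Hypothetical miniature of the customer example above (SQLite stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (city TEXT, fname TEXT)")
rows = [("beijing", "li")] * 2000 \
     + [("beijing", "wang")] * 8000 \
     + [("shanghai", "li")] * 3000 \
     + [("shanghai", "zhao")] * 7000
conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)

# Selectivity of each predicate on its own:
n_city = conn.execute("SELECT COUNT(*) FROM customer WHERE city='beijing'").fetchone()[0]
n_name = conn.execute("SELECT COUNT(*) FROM customer WHERE fname='li'").fetchone()[0]

# Either ordering returns the same rows; the order only matters to
# optimizers that evaluate conditions strictly left to right.
a = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE city='beijing' AND fname='li'").fetchone()[0]
b = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE fname='li' AND city='beijing'").fetchone()[0]
print(n_city, n_name, a, b)  # 10000 5000 2000 2000
```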
 
 
 

A collection of 30 tips (mainly for SQL Server)

1. To optimize queries, try to avoid full table scans; first consider creating indexes on the columns involved in WHERE and ORDER BY clauses.
2. Try to avoid testing a field for NULL in the WHERE clause, otherwise the engine may give up using an index and perform a full table scan, for example:
select id from t where num is null
You can instead set a default value of 0 on num, ensure the num column never contains NULL, and query like this:
select id from t where num=0
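A quick way to check whether such a rewrite lets the engine use an index is to inspect the query plan. This minimal sketch uses SQLite's EXPLAIN QUERY PLAN as a stand-in (whether IS NULL itself can use an index is engine- and version-specific; the default-0 rewrite sidesteps the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER NOT NULL DEFAULT 0)")
conn.execute("CREATE INDEX idx_num ON t(num)")
conn.executemany("INSERT INTO t (num) VALUES (?)", [(i % 5,) for i in range(100)])

# With NOT NULL DEFAULT 0 in place, the predicate num=0 is a plain
# index lookup; the plan names the index instead of a full scan.
detail = conn.execute("EXPLAIN QUERY PLAN SELECT id FROM t WHERE num=0").fetchall()[0][-1]
print(detail)
```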
3. Try to avoid using the != or <> operator in the WHERE clause, otherwise the engine may give up using an index and perform a full table scan.
4. Try to avoid using OR to connect conditions in the WHERE clause, otherwise the engine may give up using an index and perform a full table scan, for example:
select id from t where num=10 or num=20
can be rewritten as:
select id from t where num=10
union all
select id from t where num=20
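A minimal check that the OR form and the UNION ALL form are interchangeable (Python with SQLite as a stand-in; the table name t and column num follow the article's example, with a few hypothetical rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.executemany("INSERT INTO t (num) VALUES (?)", [(n,) for n in (10, 20, 30, 10, 20)])

with_or = conn.execute("SELECT id FROM t WHERE num=10 OR num=20").fetchall()
with_union = conn.execute(
    "SELECT id FROM t WHERE num=10 UNION ALL SELECT id FROM t WHERE num=20"
).fetchall()
# Same rows either way (order may differ, so compare as sorted lists).
print(sorted(with_or) == sorted(with_union))  # True
```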
5. Use IN and NOT IN with caution, otherwise they may lead to a full table scan, for example:
select id from t where num in (1,2,3)
For consecutive values, use BETWEEN instead of IN where you can:
select id from t where num between 1 and 3
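For consecutive values the two predicates are equivalent, which can be confirmed in miniature (SQLite stand-in, hypothetical data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.executemany("INSERT INTO t (num) VALUES (?)", [(n,) for n in range(6)])

in_rows = conn.execute("SELECT id FROM t WHERE num IN (1,2,3)").fetchall()
between_rows = conn.execute("SELECT id FROM t WHERE num BETWEEN 1 AND 3").fetchall()
print(sorted(in_rows) == sorted(between_rows))  # True
```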
6. The following query will also cause a full table scan:
select id from t where name like '%abc%'
To improve efficiency, you can consider full-text search.
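The difference between a leading-wildcard and a prefix LIKE shows up directly in the query plan. A sketch using SQLite as a stand-in (note the case_sensitive_like pragma, which SQLite requires before it will rewrite a prefix LIKE into an index range scan; engines differ, but prefix patterns are generally the index-friendly form):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA case_sensitive_like=ON")  # needed for SQLite's LIKE optimization
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_name ON t(name)")
conn.executemany("INSERT INTO t (name) VALUES (?)", [("abcdef",), ("xabc",), ("zzz",)])

contains = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM t WHERE name LIKE '%abc%'").fetchall()[0][-1]
prefix = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM t WHERE name LIKE 'abc%'").fetchall()[0][-1]
print(contains)  # a SCAN: every row must be examined
print(prefix)    # a SEARCH: only the 'abc' range of idx_name is read
```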
7. Using local variables as parameters in the WHERE clause can also cause a full table scan. Because SQL Server resolves local variables only at run time, the optimizer cannot defer the choice of access plan until then; it must choose at compile time, when the value of the variable is still unknown and therefore cannot be used to drive index selection. For example, the following statement performs a full table scan:
select id from t where num=@num
It can be changed to force the query to use an index:
select id from t with(index(index_name)) where num=@num
8. Try to avoid performing expression operations on fields in the WHERE clause, which will cause the engine to give up using an index and perform a full table scan. For example:
select id from t where num/2=100
should be changed to:
select id from t where num=100*2
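The plans confirm the point (SQLite stand-in, hypothetical data). One caveat worth adding: with integer division, num/2=100 also matches num=201, so the rewrite to num=100*2 assumes the application guarantees exact arithmetic; only the plans are compared here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.execute("CREATE INDEX idx_num ON t(num)")
conn.executemany("INSERT INTO t (num) VALUES (?)", [(i,) for i in range(500)])

expr = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM t WHERE num/2=100").fetchall()[0][-1]
plain = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM t WHERE num=100*2").fetchall()[0][-1]
print(expr)   # a SCAN: the expression hides num from the index
print(plain)  # a SEARCH on idx_num
```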
9. Try to avoid applying functions to fields in the WHERE clause, which will also cause the engine to give up using an index and perform a full table scan. For example:
select id from t where substring(name,1,3)='abc'   --ids whose name starts with 'abc'
select id from t where datediff(day,createdate,'2005-11-30')=0   --ids created on 2005-11-30
should be changed to:
select id from t where name like 'abc%'
select id from t where createdate>='2005-11-30' and createdate<'2005-12-1'
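The date example can be checked the same way. A sketch with SQLite as a stand-in (TEXT dates in ISO format so that the half-open range is both correct and index-friendly; SQLite's date() plays the role of SQL Server's datediff here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, createdate TEXT)")
conn.execute("CREATE INDEX idx_cd ON t(createdate)")
conn.executemany("INSERT INTO t (createdate) VALUES (?)",
                 [("2005-11-29",), ("2005-11-30",), ("2005-11-30",), ("2005-12-01",)])

# Wrapping the column in a function forces a scan:
fn_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM t WHERE date(createdate)='2005-11-30'"
).fetchall()[0][-1]
# The half-open range leaves the column bare, so the index applies:
rng_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM t "
    "WHERE createdate>='2005-11-30' AND createdate<'2005-12-01'"
).fetchall()[0][-1]
hits = conn.execute(
    "SELECT COUNT(*) FROM t WHERE createdate>='2005-11-30' AND createdate<'2005-12-01'"
).fetchone()[0]
print(fn_plan, "|", rng_plan, "|", hits)
```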
10. Do not perform functions, arithmetic or other expression operations on the left side of the "=" in the WHERE clause, otherwise the system may not be able to use the index correctly.
11. When using an indexed field as a condition, if the index is a composite index, the first field of the index must appear in the condition for the system to use the index; otherwise the index will not be used. The field order in the condition should match the index column order as far as possible.
12. Do not write meaningless queries. For example, to generate an empty table structure:
select col1,col2 into #t from t where 1=0
This kind of code returns no result set but still consumes system resources; it should be changed to:
create table #t(...)
13. In many cases, replacing IN with EXISTS is a good choice:
select num from a where num in (select num from b)
can be replaced with:
select num from a where exists (select 1 from b where num=a.num)
14. Not all indexes are effective for all queries. SQL Server optimizes queries based on the data in the table; when an index column contains a large number of duplicate values, the query may not use the index at all. For example, if a sex field in a table is roughly half 'male' and half 'female', an index on sex contributes nothing to query efficiency.
15. More indexes are not always better. While indexes improve the efficiency of the corresponding SELECTs, they reduce the efficiency of INSERT and UPDATE, because each insert or update may have to maintain the indexes, so how to build indexes needs careful, case-by-case consideration. A table should preferably have no more than 6 indexes; if there are more, consider whether indexes on rarely used columns are really necessary.
16. Avoid updating clustered index columns as much as possible, because the order of the clustered index columns is the physical storage order of the table's records. Once a column value changes, the order of the whole table's records has to be adjusted, which consumes considerable resources. If the application frequently needs to update clustered index columns, consider whether the index should be clustered at all.
17. Use numeric fields as much as possible. If a field contains only numeric information, try not to design it as a character type; that reduces the performance of queries and joins and increases storage overhead, because the engine compares strings character by character when processing queries and joins, while a numeric type needs only one comparison.
18. Use varchar/nvarchar instead of char/nchar as much as possible. Variable-length fields take less storage space, and for queries, searching within a smaller field is clearly more efficient.
19. Do not use select * from t anywhere; replace "*" with a specific field list, and do not return any fields that are not used.
20. Try to use table variables instead of temporary tables. If the table variable contains a lot of data, please note that the index is very limited (only the primary key index).
21. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.
22. Temporary tables are not unusable, using them appropriately can make certain routines more effective, for example, when you need to repeatedly refer to a large table or a data set in a commonly used table. However, for one-time events, it is better to use an export table.
23. When creating a new temporary table, if a large amount of data is inserted at once, use select into instead of create table to avoid generating a large amount of log and to increase speed; if the amount of data is small, to ease pressure on the system tables, create table first and then insert.
24. If temporary tables are used, explicitly delete them all at the end of the stored procedure: first truncate table, then drop table. This avoids locking system tables for a long time.
25. Try to avoid using cursors, because the efficiency of cursors is poor. If the data operated by the cursor exceeds 10,000 rows, then you should consider rewriting.
26. Before using the cursor-based method or the temporary table method, you should first find a set-based solution to solve the problem. The set-based method is usually more effective.
27. Like temporary tables, cursors are not unusable. Using a FAST_FORWARD cursor on a small data set is usually superior to other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that compute "totals" in the result set are usually faster than doing the same with a cursor. If development time permits, try both the cursor-based and the set-based approach and see which works better.
28. Set SET NOCOUNT ON at the beginning of all stored procedures and triggers and SET NOCOUNT OFF at the end, so that a DONE_IN_PROC message is not sent to the client after each statement of the stored procedure or trigger.
29. Try to avoid large transaction operations, so as to improve system concurrency.

30. Try to avoid returning a large amount of data to the client. If the amount of data is too large, you should consider whether the corresponding demand is reasonable.


Recommendations for writing good Oracle SQL

(1) Choose the most efficient order of table names (only valid for the rule-based optimizer):
Oracle's parser processes table names in the FROM clause from right to left, so the table written last in the FROM clause (the base table, or driving table) is processed first. When the FROM clause contains multiple tables, choose the table with the fewest records as the base table. If more than 3 tables are joined, choose the intersection table, i.e. the table referenced by the other tables, as the base table.
(2) The order of joins in the WHERE clause:
Oracle parses the WHERE clause bottom-up. By this principle, joins between tables must be written before other WHERE conditions, and the conditions that filter out the maximum number of records must be written at the end of the WHERE clause.
(3) Avoid using '*' in the SELECT clause:
During parsing, Oracle converts '*' into the full list of column names, work that is done by querying the data dictionary and therefore takes more time.
(4) Reduce the number of accesses to the database:
Each time a statement is executed, Oracle does a lot of work internally: parsing the SQL statement, estimating index utilization, binding variables, reading data blocks, and so on.
(5) Resetting the ARRAYSIZE parameter in SQL*Plus, SQL*Forms and Pro*C can increase the amount of retrieved data for each database access. The recommended value is 200.
(6) Use the DECODE function to reduce processing time: 
Use the DECODE function to avoid repeated scanning of the same record or repeated connection to the same table. 
(7) Integrate simple, unrelated database accesses:
If you have several simple database query statements, you can integrate them into one query (even if there is no relationship between them).
(8) Delete duplicate records:
The most efficient way to delete duplicate records (because it uses ROWID), for example:
DELETE FROM EMP E WHERE E.ROWID > (SELECT MIN(X.ROWID) FROM EMP X WHERE X.EMP_NO = E.EMP_NO);
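The same ROWID idiom can be tried in miniature, since SQLite also exposes a rowid (a stand-in here; the article's statement targets Oracle, and the data is hypothetical). Keeping the minimum rowid per key deletes every duplicate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_no INTEGER, name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (2, "b"), (3, "c")])

# Delete every row whose rowid is not the smallest for its emp_no.
conn.execute(
    "DELETE FROM emp WHERE rowid > "
    "(SELECT MIN(x.rowid) FROM emp x WHERE x.emp_no = emp.emp_no)"
)
remaining = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(remaining)  # 3
```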
(9) Replace DELETE with TRUNCATE:
When deleting records from a table, under normal circumstances rollback segments are used to store information that can be used to undo the operation. If the transaction has not been committed, Oracle restores the data to the state before the delete (more precisely, to the state before the delete command was executed). With TRUNCATE, the rollback segment stores no recovery information; once the command runs, the data cannot be restored. It therefore uses few resources and executes quickly. (TRUNCATE only applies to deleting a whole table, and it is DDL, not DML.)
(10) Use COMMIT as much as possible:
Wherever possible, commit frequently in your programs; this improves performance because of the resources COMMIT releases:
a. The information on rollback segments used to undo the transaction.
b. The locks acquired by the program's statements.
c. The space in the redo log buffer.
d. Oracle's internal overhead for managing the above three resources.
(11) Replace the HAVING clause with the WHERE clause:
Avoid using the HAVING clause where possible; HAVING filters the result set only after all records have been retrieved, and this processing requires sorting, totalling and other operations. If the number of records can be limited by the WHERE clause instead, that overhead is reduced.
(Non-Oracle note) Of the three clauses that can carry conditions, ON, WHERE and HAVING, ON executes first, WHERE second and HAVING last. ON filters out non-matching records before they are joined, so it reduces the data handled by intermediate operations and should in theory be fastest; WHERE should be faster than HAVING, because it filters the data before aggregation; and ON only applies when two tables are joined, so for a single table the comparison is between WHERE and HAVING. For single-table aggregation queries, if the filter condition does not involve the computed columns, the two give the same result, but WHERE can use Rushmore technology while HAVING cannot, so HAVING is slower. If the condition involves a computed column, the column's value is undetermined before the computation: WHERE acts before the computation and HAVING only after it, so in that case the two give different results. In multi-table join queries, ON acts earlier than WHERE: the system first combines the tables into an intermediate result according to the join conditions, then filters with WHERE, then aggregates, then filters again with HAVING. To make a filter condition work correctly, you must first understand when it should take effect, and then decide where to place it.
(12) Reduce queries against tables:
In SQL statements containing subqueries, pay special attention to reducing the number of queries against the table. Example:
SELECT TAB_NAME FROM TABLES WHERE (TAB_NAME, DB_VER) = (SELECT TAB_NAME, DB_VER FROM TAB_COLUMNS WHERE VERSION = 604)
(13) Improve SQL efficiency through internal functions:
Complex SQL often sacrifices execution efficiency. Mastering the use of such functions to solve problems is very valuable in practical work.
(14) Use table aliases:
When joining multiple tables in a SQL statement, use table aliases and prefix each column with its alias. This reduces parsing time and avoids syntax errors caused by ambiguous column names.
(15) Replace IN with EXISTS and NOT IN with NOT EXISTS:
In many queries based on a base table, it is often necessary to join another table in order to satisfy some condition. In such cases, using EXISTS (or NOT EXISTS) usually improves query efficiency. In a subquery, NOT IN performs an internal sort and merge; in any case NOT IN is the least efficient option, because it traverses the full table in the subquery. To avoid NOT IN, rewrite it as an outer join or NOT EXISTS.
Example:
(efficient) SELECT * FROM EMP (base table) WHERE EMPNO > 0 AND EXISTS (SELECT 'X' FROM DEPT WHERE DEPT.DEPTNO = EMP.DEPTNO AND LOC = 'MELB')
(inefficient) SELECT * FROM EMP (base table) WHERE EMPNO > 0 AND DEPTNO IN (SELECT DEPTNO FROM DEPT WHERE LOC = 'MELB')
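A miniature equivalence check of the two forms (SQLite stand-in with hypothetical DEPT/EMP data; SQLite will not show Oracle's cost difference, only that the rewrite preserves the result):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (deptno INTEGER PRIMARY KEY, loc TEXT);
CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER);
INSERT INTO dept VALUES (10,'MELB'),(20,'SYD');
INSERT INTO emp VALUES (1,10),(2,10),(3,20);
""")

with_in = conn.execute(
    "SELECT empno FROM emp WHERE empno>0 AND deptno IN "
    "(SELECT deptno FROM dept WHERE loc='MELB')"
).fetchall()
with_exists = conn.execute(
    "SELECT empno FROM emp WHERE empno>0 AND EXISTS "
    "(SELECT 'X' FROM dept WHERE dept.deptno=emp.deptno AND loc='MELB')"
).fetchall()
print(sorted(with_in) == sorted(with_exists))  # True
```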
(16) Identify SQL statements that execute inefficiently:
Although there is an endless stream of graphical tools for SQL optimization, writing your own SQL tool is always the best way to solve the problem:
SELECT EXECUTIONS, DISK_READS, BUFFER_GETS,
       ROUND((BUFFER_GETS - DISK_READS) / BUFFER_GETS, 2) Hit_ratio,
       ROUND(DISK_READS / EXECUTIONS, 2) Reads_per_run,
       SQL_TEXT
FROM V$SQLAREA
WHERE EXECUTIONS > 0
  AND BUFFER_GETS > 0
  AND (BUFFER_GETS - DISK_READS) / BUFFER_GETS < 0.8
ORDER BY 4 DESC;
(17) Use indexes to improve efficiency:
An index is a conceptual part of a table used to improve the efficiency of data retrieval; Oracle implements it with a complex self-balancing B-tree structure. Usually, querying data through an index is faster than a full table scan. The Oracle optimizer uses indexes when it looks for the best path to execute queries and UPDATE statements, and using indexes when joining multiple tables can likewise improve efficiency. Another advantage of an index is that it provides uniqueness verification of the primary key. Except for columns of the LONG and LONG RAW data types, almost any column can be indexed. Using indexes on large tables is particularly effective; of course, indexes can also improve efficiency when scanning small tables. Although indexes improve query efficiency, they also have a cost: indexes need storage space and regular maintenance, and whenever a record is added to or removed from the table, or an indexed column is modified, the index itself must be modified too. This means that every INSERT, DELETE and UPDATE costs 4 or 5 extra disk I/Os. Because indexes need additional storage and processing, unnecessary indexes slow down query response time. Periodic index rebuilds are necessary:
ALTER INDEX <INDEXNAME> REBUILD <TABLESPACENAME>
(18) Replace DISTINCT with EXISTS:
When submitting a query involving one-to-many table information (such as a department table and an employee table), avoid using DISTINCT in the SELECT clause; generally consider replacing it with EXISTS. EXISTS makes the query faster, because the RDBMS core returns a result as soon as the subquery's condition is satisfied. Example:
(inefficient): SELECT DISTINCT DEPT_NO, DEPT_NAME FROM DEPT D, EMP E WHERE D.DEPT_NO = E.DEPT_NO
(efficient): SELECT DEPT_NO, DEPT_NAME FROM DEPT D WHERE EXISTS (SELECT 'X' FROM EMP E WHERE E.DEPT_NO = D.DEPT_NO);
(19) Write SQL statements in uppercase, because Oracle always parses the SQL statement first, converting lowercase letters to uppercase before executing.
(20) Use the connector "+" to connect strings as little as possible in Java code.
(21) Avoid using NOT on indexed columns. NOT has the same effect as applying a function to an indexed column: when Oracle "encounters" NOT, it stops using the index and performs a full table scan.
(22) Avoid calculations on indexed columns. In the WHERE clause, if an indexed column is part of an expression, the optimizer will not use the index but will use a full table scan instead.
Inefficient: SELECT … FROM DEPT WHERE SAL * 12 > 25000;
Efficient: SELECT … FROM DEPT WHERE SAL > 25000/12;
(23) Replace > with >=:
Efficient: SELECT * FROM EMP WHERE DEPTNO >= 4
Inefficient: SELECT * FROM EMP WHERE DEPTNO > 3
The difference is that in the former the DBMS jumps directly to the first record with DEPTNO equal to 4, while in the latter it first locates the records with DEPTNO = 3 and then scans forward to the first record with DEPTNO greater than 3.
(24) Replace OR with UNION (applies to indexed columns):
Under normal circumstances, replacing OR in the WHERE clause with UNION gives better results; using OR on an indexed column causes a full table scan. Note that this rule is only valid when all the columns involved are indexed; if some columns are not indexed, query efficiency may actually drop because you avoided OR. In the example below, both LOC_ID and REGION are indexed.
Efficient: SELECT LOC_ID, LOC_DESC, REGION FROM LOCATION WHERE LOC_ID = 10 UNION SELECT LOC_ID, LOC_DESC, REGION FROM LOCATION WHERE REGION = 'MELBOURNE'
Inefficient: SELECT LOC_ID, LOC_DESC, REGION FROM LOCATION WHERE LOC_ID = 10 OR REGION = 'MELBOURNE'
(25) Replace OR with IN:
This is a simple, easy-to-remember rule, but the actual execution effect needs testing; under Oracle 8i the execution paths of the two appear to be the same:
Inefficient:
SELECT... FROM LOCATION WHERE LOC_ID = 10 OR LOC_ID = 20 OR LOC_ID = 30
High efficiency:
SELECT … FROM LOCATION WHERE LOC_ID IN (10,20,30);
(26) Avoid using IS NULL and IS NOT NULL on indexed columns:
Avoid using any nullable column in an index; Oracle will not be able to use the index. For a single-column index, if a column value is null, the record does not exist in the index. For a composite index, if every column is null, the record likewise does not exist in the index; if at least one column is not null, the record exists in the index. For example, if a unique index is built on columns A and B of a table, and the table holds a record with A, B values (123, null), Oracle will not accept another record (insert) with the same A, B values (123, null). However, if all index columns are null, Oracle considers the whole key to be null, and null is never equal to null, so you can insert 1000 records with the same (all-null) key. Because null values are absent from the index, comparing an indexed column with a null value in the WHERE clause makes Oracle stop using the index.
Inefficient (index disabled):
SELECT … FROM DEPARTMENT WHERE DEPT_CODE IS NOT NULL;
Efficient (index enabled):
SELECT … FROM DEPARTMENT WHERE DEPT_CODE >= 0;
(27) Always use the first column of the index:
If an index is built on multiple columns, the optimizer will choose to use it only when its first column (leading column) is referenced by the WHERE clause. This is a simple but important rule: when only the second column of the index is referenced, the optimizer ignores the index and uses a full table scan.
(28) Replace UNION with UNION ALL (when possible):
When a SQL statement combines two result sets with UNION, the two result sets are merged UNION-ALL style and then sorted before the final result is output. With UNION ALL, no sort is needed and efficiency improves accordingly. Note that UNION ALL outputs duplicate records from the two result sets, so you must still analyse from the business requirements whether UNION ALL is feasible. UNION sorts the result set, an operation that uses SORT_AREA_SIZE, so tuning that memory area is also important.
Inefficient: SELECT ACCT_NUM, BALANCE_AMT FROM DEBIT_TRANSACTIONS WHERE TRAN_DATE = '31-DEC-95' UNION SELECT ACCT_NUM, BALANCE_AMT FROM DEBIT_TRANSACTIONS WHERE TRAN_DATE = '31-DEC-95'
Efficient: SELECT ACCT_NUM, BALANCE_AMT FROM DEBIT_TRANSACTIONS WHERE TRAN_DATE = '31-DEC-95' UNION ALL SELECT ACCT_NUM, BALANCE_AMT FROM DEBIT_TRANSACTIONS WHERE TRAN_DATE = '31-DEC-95'
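The semantic difference is easy to demonstrate (SQLite stand-in, hypothetical rows): UNION deduplicates (and sorts), so the self-UNION collapses to the distinct rows, while UNION ALL returns every row from both branches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE debit_transactions (acct_num INTEGER, balance_amt REAL, tran_date TEXT)")
conn.executemany("INSERT INTO debit_transactions VALUES (?,?,?)",
                 [(1, 100.0, '1995-12-31'), (2, 200.0, '1995-12-31')])

q = "SELECT acct_num, balance_amt FROM debit_transactions WHERE tran_date='1995-12-31'"
union_rows = conn.execute(q + " UNION " + q).fetchall()          # deduplicated
union_all_rows = conn.execute(q + " UNION ALL " + q).fetchall()  # duplicates kept
print(len(union_rows), len(union_all_rows))  # 2 4
```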
(29) Replace ORDER BY with WHERE:
The ORDER BY clause uses an index only under two strict conditions:
All columns in the ORDER BY must be contained in the same index and keep the same order as in the index.
All columns in the ORDER BY must be defined NOT NULL.
The index used by the WHERE clause and the index used by the ORDER BY clause cannot be different ones.
For example, table DEPT contains the following columns:
DEPT_CODE PK NOT NULL
DEPT_DESC NOT NULL
DEPT_TYPE NULL
Inefficient (the index is not used):
SELECT DEPT_CODE FROM DEPT ORDER BY DEPT_TYPE
Efficient (the index is used):
SELECT DEPT_CODE FROM DEPT WHERE DEPT_TYPE > 0
(30) Avoid changing the type of an indexed column:
When data of different types are compared, Oracle automatically performs a simple type conversion on one side. Suppose EMPNO is a numeric indexed column: SELECT … FROM EMP WHERE EMPNO = '123'. After Oracle's type conversion, the statement becomes: SELECT … FROM EMP WHERE EMPNO = TO_NUMBER('123').
Fortunately, the conversion does not occur on the indexed column, so the index is still usable. Now suppose EMP_TYPE is an indexed column of character type: SELECT … FROM EMP WHERE EMP_TYPE = 123.
This statement is converted by Oracle to: SELECT … FROM EMP WHERE TO_NUMBER(EMP_TYPE) = 123. Because of this internal type conversion, the index will not be used! To prevent Oracle from implicitly converting types in your SQL, it is best to express type conversions explicitly. Note that when characters are compared with numbers, Oracle gives priority to converting the character type to numeric.
(31) WHERE clauses to be careful with:
Some WHERE clauses in SELECT statements do not use indexes. Some examples:
(1) '!=' will not use the index. Remember, an index can only tell you what exists in the table, not what does not exist.
(2) '||' is the character concatenation function. Like other functions, it disables the index.
(3) '+' is an arithmetic operator. Like other arithmetic operations, it disables the index.
(4) The same indexed columns cannot be compared with each other; that too triggers a full table scan.
(32) a. If a query retrieves more than 30% of the records in a table, using an index will not significantly improve efficiency.
b. In certain circumstances using an index may be slower than a full table scan, but the difference is within the same order of magnitude, whereas under normal circumstances using an index is several times or even thousands of times faster than a full table scan!
(33) Avoid resource-intensive operations:
SQL statements with DISTINCT, UNION, MINUS, INTERSECT or ORDER BY start the SQL engine's resource-intensive SORT function. DISTINCT needs one sort operation; the others need at least two. Usually, SQL statements with UNION, MINUS or INTERSECT can be rewritten in other ways, but if your database's SORT_AREA_SIZE is well configured, using UNION, MINUS and INTERSECT can also be considered; after all, they are very readable.
(34) Optimize GROUP BY:
Improve the efficiency of GROUP BY statements by filtering out unwanted records before grouping. The following two queries return the same results, but the second is obviously much faster.
Inefficient: SELECT JOB, AVG(SAL) FROM EMP GROUP BY JOB HAVING JOB = 'PRESIDENT' OR JOB = 'MANAGER'
Efficient: SELECT JOB, AVG(SAL) FROM EMP WHERE JOB = 'PRESIDENT' OR JOB = 'MANAGER' GROUP BY JOB
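A miniature check of the two forms (SQLite stand-in with a hypothetical EMP table): the WHERE version groups only the rows that survive the filter, and the results match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, sal REAL)")
conn.executemany("INSERT INTO emp VALUES (?,?,?)", [
    ("king", "PRESIDENT", 5000), ("blake", "MANAGER", 2850),
    ("clark", "MANAGER", 2450), ("smith", "CLERK", 800),
])

# HAVING filters after every group (including CLERK) has been aggregated:
having_rows = conn.execute(
    "SELECT job, AVG(sal) FROM emp GROUP BY job "
    "HAVING job='PRESIDENT' OR job='MANAGER' ORDER BY job"
).fetchall()
# WHERE discards CLERK rows before grouping; same result, less work:
where_rows = conn.execute(
    "SELECT job, AVG(sal) FROM emp WHERE job='PRESIDENT' OR job='MANAGER' "
    "GROUP BY job ORDER BY job"
).fetchall()
print(having_rows == where_rows)  # True
```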

Origin blog.csdn.net/cao919/article/details/68485289