db big data processing

How can multi-table queries over large data sets be optimized in a database?

Professional answer
  Information System Architect XX Institute Software Development
2015-04-23 21:26
1. Try to avoid testing a field for null in the where clause; otherwise the engine may give up the index and perform a full table scan, for example:
select id from t where num is null
You can set a default value of 0 on num to ensure that the num column contains no null values, and then query like this:
select id from t where num=0
2. Try to avoid using the != or <> operator in the where clause; otherwise the engine may give up the index and perform a full table scan. The optimizer cannot determine how many rows the index will hit, so it has to search all the rows of the table.
3. Try to avoid using or to connect conditions in the where clause; otherwise the engine may give up the index and perform a full table scan, for example:
select id from t where num=10 or num=20
You can query like this:
select id from t where num=10
union all
select id from t where num=20
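The rewrite in tip 3 only helps if both forms return the same rows. The following is a minimal sketch that checks this, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in engine and a hypothetical table t(id, num) with made-up rows; it says nothing about which form a particular engine runs faster.

```python
import sqlite3

# Hypothetical table t(id, num), mirroring the names used in the tip above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.executemany(
    "INSERT INTO t (id, num) VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 10), (5, None)],
)

or_rows = conn.execute(
    "SELECT id FROM t WHERE num = 10 OR num = 20"
).fetchall()
union_rows = conn.execute(
    "SELECT id FROM t WHERE num = 10 "
    "UNION ALL "
    "SELECT id FROM t WHERE num = 20"
).fetchall()

# Row order is not guaranteed, so compare sorted lists.
same_rows = sorted(or_rows) == sorted(union_rows)
print(same_rows)
```

The two forms match here because the two conditions cannot both be true for one row. If the branches could overlap, union all would return the overlapping rows twice.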
4. Use in and not in with caution as well, because they can keep the system from using the index, so that it has to search the table data directly. For example:
select id from t where num in(1,2,3)
For continuous values, use between instead of in where you can:
select id from t where num between 1 and 3
5. Try to avoid non-leading (non-prefix) searches on indexed character data. They also keep the engine from taking advantage of the index.
See the following examples:
SELECT * FROM T1 WHERE NAME LIKE '%L%'
SELECT * FROM T1 WHERE SUBSTRING(NAME,2,1)='L'
SELECT * FROM T1 WHERE NAME LIKE 'L%'
Even if the NAME field is indexed, the first two queries cannot use the index to speed up the search, and the engine has to examine every row in the table one by one. The third query can use the index to speed up the operation.
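One way to see the difference between the '%L%' and 'L%' patterns is to ask the engine for its query plan. The sketch below assumes SQLite as a stand-in engine, a hypothetical NOTE column, a hypothetical index name IDX_T1_NAME and made-up rows. The column is declared COLLATE NOCASE because SQLite only considers an index for its default case-insensitive LIKE under that collation; other engines have their own rules and their own plan tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; NOTE exists only so that SELECT * is not covered by the index.
conn.execute(
    "CREATE TABLE T1 (ID INTEGER PRIMARY KEY, NAME TEXT COLLATE NOCASE, NOTE TEXT)"
)
conn.execute("CREATE INDEX IDX_T1_NAME ON T1 (NAME)")
conn.executemany(
    "INSERT INTO T1 (NAME, NOTE) VALUES (?, ?)",
    [("Lee", ""), ("Allen", ""), ("Bob", ""), ("lara", "")],
)

def plan(sql):
    """Return the plan detail strings SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

contains_plan = plan("SELECT * FROM T1 WHERE NAME LIKE '%L%'")
prefix_plan = plan("SELECT * FROM T1 WHERE NAME LIKE 'L%'")

contains_names = sorted(
    r[0] for r in conn.execute("SELECT NAME FROM T1 WHERE NAME LIKE '%L%'")
)
prefix_names = sorted(
    r[0] for r in conn.execute("SELECT NAME FROM T1 WHERE NAME LIKE 'L%'")
)

print(contains_plan)
print(prefix_plan)
```

On a typical SQLite build the prefix pattern is reported as a search on IDX_T1_NAME, while the '%L%' pattern is reported as a scan of the table. The exact plan text varies between SQLite versions.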
6. Force the query optimizer to use an index if necessary. Using a local variable in the where clause, for example, can also result in a full table scan. SQL resolves local variables only at runtime, but the optimizer cannot defer the choice of an access plan to runtime; it must choose it at compile time. At compile time the value of the variable is still unknown, so it cannot be used as an input for index selection. For example, the following statement may perform a full table scan:
select id from t where num=@num
It can be changed to force the query to use the index (replace index_name with the name of the index):
select id from t with(index(index_name)) where num=@num
7. Try to avoid expression operations on a field in the where clause; they can cause the engine to give up the index and perform a full table scan. For example:
SELECT * FROM T1 WHERE F1/2=100
should be changed to:
SELECT * FROM T1 WHERE F1=100*2
SELECT * FROM RECORD WHERE SUBSTRING(CARD_NO,1,4)='5378'
should be changed to:
SELECT * FROM RECORD WHERE CARD_NO LIKE '5378%'
SELECT member_number, first_name, last_name FROM members
WHERE DATEDIFF(yy,dateofbirth,GETDATE()) > 21
should be changed to:
SELECT member_number, first_name, last_name FROM members
WHERE dateofbirth < DATEADD(yy,-21,GETDATE())
That is, any operation on a column, including database functions and calculation expressions, can cause a table scan. Move operations to the right-hand side of the comparison whenever possible.
8. Avoid function operations on a field in the where clause as much as possible; they can cause the engine to give up the index and perform a full table scan. For example:
select id from t where substring(name,1,3)='abc' -- ids whose name starts with abc
select id from t where datediff(day,createdate,'2005-11-30')=0 -- ids created on '2005-11-30'
These should be changed to:
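The effect of wrapping a column in an expression can be checked against a query plan. This is a minimal sketch assuming SQLite as a stand-in engine, a hypothetical NOTE column and a hypothetical index name IDX_T1_F1; the helper uses_index is also made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; NOTE exists only so that SELECT * is not covered by the index.
conn.execute("CREATE TABLE T1 (ID INTEGER PRIMARY KEY, F1 INTEGER, NOTE TEXT)")
conn.execute("CREATE INDEX IDX_T1_F1 ON T1 (F1)")

def uses_index(sql):
    """True if SQLite's reported plan for the statement mentions IDX_T1_F1."""
    details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return any("IDX_T1_F1" in d for d in details)

# Expression applied to the column: the index on F1 cannot be searched.
wrapped = uses_index("SELECT * FROM T1 WHERE F1/2 = 100")
# Bare column compared with a constant expression: the index can be searched.
bare = uses_index("SELECT * FROM T1 WHERE F1 = 100*2")

print(wrapped, bare)
```

One caveat when applying this kind of rewrite: with an integer column, F1/2=100 uses integer division and also matches F1=201, so the two predicates are only interchangeable when that difference does not matter for the data.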
select id from t where name like 'abc%'
select id from t where createdate>='2005-11-30' and createdate<'2005-12-1'
9. Do not perform functions, arithmetic operations or other expression operations on the left side of "=" in the where clause; otherwise the system may not be able to use the index correctly.
10. When using an indexed field as a condition, if the index is a composite index, the first field in the index must be used in the condition for the system to use the index; otherwise the index will not be used. As far as possible, keep the field order in the condition consistent with the index order.
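The leading-column rule in tip 10 can be observed in a query plan. The sketch below assumes SQLite as a stand-in engine; the table orders, its columns and the index name idx_orders_cust_status are hypothetical, and no statistics have been gathered, which is the situation in which SQLite does not try to skip the leading column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a composite index on (customer_id, status).
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, note TEXT)"
)
conn.execute(
    "CREATE INDEX idx_orders_cust_status ON orders (customer_id, status)"
)

def uses_index(sql):
    """True if SQLite's reported plan mentions the composite index."""
    details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return any("idx_orders_cust_status" in d for d in details)

# Condition on the first index field only.
leading = uses_index("SELECT * FROM orders WHERE customer_id = 7")
# Conditions on both index fields.
both = uses_index(
    "SELECT * FROM orders WHERE customer_id = 7 AND status = 'open'"
)
# Condition on the second index field only.
trailing = uses_index("SELECT * FROM orders WHERE status = 'open'")

print(leading, both, trailing)
```

Other engines may behave differently for the last query, so it is worth checking the plan on the actual database.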
11. Using exists is often a good choice:
select num from a where num in(select num from b)
can be replaced with the following statement:
select num from a where exists(select 1 from b where num=a.num)
SELECT SUM(T1.C1) FROM T1 WHERE
(SELECT COUNT(*) FROM T2 WHERE T2.C2=T1.C2)>0
SELECT SUM(T1.C1) FROM T1 WHERE EXISTS(
SELECT * FROM T2 WHERE T2.C2=T1.C2)
Both produce the same result, but the latter is usually more efficient than the former, because it does not generate a large number of locking table scans or index scans.
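Before swapping in for exists, it is worth confirming that both forms return the same rows on the data at hand. This is a minimal sketch assuming SQLite as a stand-in engine and made-up rows in the tables a and b from tip 11; the subquery column is qualified as b.num for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (num INTEGER)")
conn.execute("CREATE TABLE b (num INTEGER)")
conn.executemany("INSERT INTO a (num) VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO b (num) VALUES (?)", [(2,), (4,), (4,), (6,)])

in_rows = conn.execute(
    "SELECT num FROM a WHERE num IN (SELECT num FROM b)"
).fetchall()
exists_rows = conn.execute(
    "SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.num = a.num)"
).fetchall()

# Row order is not guaranteed, so compare sorted lists.
same_rows = sorted(in_rows) == sorted(exists_rows)
print(same_rows)
```

Duplicate values in b do not produce duplicate rows in either form, because both only test whether a match exists.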

The only advantage of a view is that it simplifies the SQL written during development; in efficiency it is no different from querying the tables directly.
A physical table can be used to hold the aggregated data. Queries against it are fast, but the disadvantage is that the data is static: if the customer information changes, the table must be re-aggregated to keep the information accurate.
Follow up:
The query has to run from the front end, and joining more than 4 million rows one by one in SQL issued directly by the front-end program would take a long time. So I am considering a table in the database that stores the joined business data, so that the front end can query this table directly when it needs it.
Follow up:
That is aggregation, and a static table is definitely the fastest, but it has the disadvantage I mentioned above.
Follow up:
The cost of re-aggregation does not need to be considered, because the business tables are also reloaded every day. The main question is how to build the aggregate table and how to make that faster.
Follow up:
Run the aggregation in the background: use a scheduled batch process or an Oracle job to run the aggregation SQL around 1:00 a.m. every day.

After the aggregation completes, the front end can query the static table directly and paginate the results when presenting them.
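The approach discussed in this thread (rebuild a static table in a background job, then let the front end page through it) can be sketched as follows. The thread runs the job in Oracle; this sketch assumes SQLite as a stand-in, and the tables customer, orders and customer_summary, the sample rows, and the functions rebuild_summary and fetch_page are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical business tables with a few made-up rows.
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
INSERT INTO customer VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO orders VALUES (1, 1, 10), (2, 1, 5), (3, 2, 7), (4, 3, 1), (5, 3, 2);
""")

def rebuild_summary(conn):
    """Body of the scheduled job: rebuild the static table from the business tables."""
    conn.execute("DROP TABLE IF EXISTS customer_summary")
    conn.execute("""
        CREATE TABLE customer_summary AS
        SELECT c.id AS customer_id, c.name AS name, SUM(o.amount) AS total
        FROM customer c JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
    """)

def fetch_page(conn, page, page_size):
    """Front-end query: read one page (1-based) from the static table only."""
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT customer_id, name, total FROM customer_summary "
        "ORDER BY customer_id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

rebuild_summary(conn)
first_page = fetch_page(conn, 1, 2)
second_page = fetch_page(conn, 2, 2)
print(first_page, second_page)
```

The join runs once per rebuild rather than once per front-end request, which is the trade-off the thread describes: fast reads in exchange for data that is only as fresh as the last run.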
