2-8: SQL execution steps and optimization strategies

SQL execution steps

Syntax check: checks the spelling of the SQL statement and its compliance with the SQL syntax specification.

Semantic check: checks whether the objects being accessed exist and whether the user has the required privileges.

Parsing: checks the shared pool for an identical statement that has already been parsed. If one exists, the steps of choosing and generating an execution plan are skipped and the statement runs directly.

Hard parse: the submitted SQL is parsed completely from scratch, building a parse tree and generating an execution plan. This is an expensive operation, so in many projects the code for the same function is required to use identical SQL text with bind variables.

Soft parse: an identical, already-parsed statement is found in the shared pool, so the last two steps of a hard parse are skipped.

Execution plan: the steps of a SQL statement, displayed as an indented list.
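The point about bind variables can be sketched in Python with the standard sqlite3 module. This is only an illustration under assumptions: SQLite has no shared pool, and the table and function names are made up. What it shows is that concatenated literals produce a new statement text for every value, while a bound parameter keeps the text identical, which is what allows a cache keyed on the statement text to reuse earlier parse work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])

def literal_sql(user_id):
    # Concatenating the value gives a different statement text per value,
    # so a cache keyed on the text cannot reuse earlier parse work.
    return f"SELECT name FROM users WHERE id = {int(user_id)}"

# With a bind variable the text is constant; only the bound value changes.
BOUND_SQL = "SELECT name FROM users WHERE id = ?"

def find_name(user_id):
    row = conn.execute(BOUND_SQL, (user_id,)).fetchone()
    return row[0] if row else None

distinct_literal_texts = {literal_sql(i) for i in (1, 2)}
names = [find_name(i) for i in (1, 2)]
```

Two lookups produce two distinct literal statements but only one bound statement.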

 


Hardware optimization:

1. Access paths: B+ tree index access, hash index access, cluster access

2. Physical layout of the data: how it is stored

3. Available memory

4. Available processors

5. Centralized versus distributed storage. Centralized storage means that all data is stored on the same node. This helps the efficiency of queries and modifications, but it carries a large risk: if the node suffers irreversible damage, for example in an earthquake, the database is lost.

6. Reasonable and efficient operation algorithms: full table scan, index scan, nested loop join, sort-merge join

7. Solid-state storage

8. RAID

 

Indexes

Five advantages of indexes

A unique index guarantees the uniqueness of each row in a database table.

An index can greatly speed up data retrieval, which is the main reason for creating one.

An index speeds up joins between tables, which matters especially for enforcing referential integrity.

When a query uses grouping and sorting clauses, an index can significantly reduce the time spent on grouping and sorting.

With indexes available, the optimizer can apply its optimizations while processing the query, improving system performance.

 

Columns that should be indexed

1) Columns that are searched often, to speed up the search;

2) Primary key columns, to enforce uniqueness and to organize how the data in the table is arranged;

3) Columns that are often used in joins, mainly foreign key columns, to speed up the joins;

4) Columns that are often searched by range, because the index is sorted and the specified range is contiguous. Indexing foreign keys not only speeds up joins but also reduces the chance of deadlocks;

5) Columns that often need to be sorted, because the index is already sorted and the query can use that order to cut sorting time;

6) Columns that are often used in WHERE clause conditions, to speed up evaluating those conditions.
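The effect of indexing a column used in WHERE conditions can be checked against the query plan. The sketch below assumes SQLite (through Python's sqlite3 module) and a made-up `orders` table. The exact plan wording differs between SQLite versions, so only the index name is inspected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 50, float(i)) for i in range(1000)],
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT total FROM orders WHERE customer_id = 7"
plan_before = plan(query)   # no usable index yet: the table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)    # the new index is used for the lookup
```

Printing `plan_before` and `plan_after` shows the change from a table scan to an index search.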


System (design)

1. Algebraic optimization: the heuristic optimizer

2. Counter tables

3. Summary tables

4. Table splitting

5. Denormalization, on top of a well-designed logical schema

6. Foreign key indexes

7. Avoid user-defined functions where possible. Write plain SQL as far as you can, because the optimizer cannot optimize a user-defined function, and overusing them degrades database performance. In some cases, however, a particular query can only be implemented with a user-defined function.

 

Heuristic query optimizer

1. Perform selection operations as early as possible. This is the most important and most basic rule. It often cuts execution cost by several orders of magnitude, because it greatly shrinks the intermediate results.

2. Perform projection and selection at the same time. If several projections and selections operate on the same relation, all of them can be completed in a single scan of that relation, avoiding repeated scans.

3. Combine a projection with the binary operation before or after it. There is no need to scan the relation again just to drop some fields.

4. Combine certain selections with the Cartesian product that precedes them into a join. A join is much cheaper than a Cartesian product over the same relations.

5. Find common subexpressions.
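Rules 1 and 4 can be illustrated with plain Python lists standing in for relations. The data and function names are invented for the sketch; it only compares the size of the intermediate result when selection runs after the Cartesian product versus before it.

```python
from itertools import product

employees = [(i, i % 4) for i in range(100)]        # (emp_id, dept_id)
departments = [(d, f"dept{d}") for d in range(4)]   # (dept_id, name)

def select_after_product(emps, depts, wanted_dept):
    # Naive plan: build the full Cartesian product, then filter it.
    intermediate = list(product(emps, depts))
    rows = [(e, d) for e, d in intermediate
            if e[1] == d[0] and d[0] == wanted_dept]
    return rows, len(intermediate)

def select_before_join(emps, depts, wanted_dept):
    # Heuristic plan: apply the selections first, then combine what is left.
    emps_f = [e for e in emps if e[1] == wanted_dept]
    depts_f = [d for d in depts if d[0] == wanted_dept]
    intermediate = list(product(emps_f, depts_f))
    rows = [(e, d) for e, d in intermediate if e[1] == d[0]]
    return rows, len(intermediate)

naive_rows, naive_size = select_after_product(employees, departments, 2)
pushed_rows, pushed_size = select_before_join(employees, departments, 2)
```

Both plans return the same rows, but the naive plan materializes 400 intermediate pairs and the pushed-down plan only 25.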

 

1) Normalization and denormalization

First normal form: every column is an indivisible atomic data item.

Second normal form: builds on first normal form by eliminating partial dependencies.

Third normal form: builds on second normal form by eliminating transitive dependencies.


Denormalization is defined relative to third normal form: it breaks third normal form by adding redundancy, while the first two normal forms must still be followed.


Advantages of normalization:

a. Writes are fast, because no redundant data has to be written, which reduces the write load.

b. Updates are fast, because usually less data needs to be updated.

c. There is no redundancy, so data inconsistencies cannot arise.

d. DISTINCT and GROUP BY are needed less often.

The disadvantage: joins are required.


The disadvantages of normalization are the advantages of denormalization: no joins are needed, and because the data sits in a single table, suitable indexes can be designed for it.

In practice, schemas are usually not fully normalized; some redundancy is added to reduce joins between tables and speed up queries.

 

2) Table splitting

If the rows of a table can be classified by state, for example running and completed, consider splitting it into a running-state table and a completed-state table, and move each row to the completed-state table once it reaches the completed state. Because rows always move from running to completed, the running-state table stays almost the same size no matter how long the system has been running, and completed rows are rarely queried except for statistical analysis. This greatly improves system speed and keeps the amount of data in the table under control.

For statistical analysis, to avoid UNIONs across the two tables, the business logic may need to choose one of the two states, running or completed, to query.
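A minimal sketch of the running/completed split, assuming SQLite through Python's sqlite3 module and hypothetical table names `task_active` and `task_done`. The copy and the delete run in one transaction so that a row is never visible in both tables or in neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task_active (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE task_done   (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
""")
conn.executemany(
    "INSERT INTO task_active VALUES (?, ?)",
    [(1, "running"), (2, "completed"), (3, "running"), (4, "completed")],
)

def archive_completed(conn):
    # "with conn" commits on success and rolls back if either statement fails.
    with conn:
        conn.execute(
            "INSERT INTO task_done SELECT id, status FROM task_active "
            "WHERE status = 'completed'"
        )
        cur = conn.execute("DELETE FROM task_active WHERE status = 'completed'")
    return cur.rowcount  # number of rows moved

moved = archive_completed(conn)
active_count = conn.execute("SELECT COUNT(*) FROM task_active").fetchone()[0]
done_count = conn.execute("SELECT COUNT(*) FROM task_done").fetchone()[0]
```

In a real system this would run as a periodic job or as part of the state transition itself; which one fits depends on the application.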

 

For very large data sets, you can also split storage across tables by the hash of a field. This does increase application complexity, but that cannot be helped. No approach is perfect: architecture means making trade-offs for the actual application scenario, and since you cannot have it both ways, you pick the approach that fits best.
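Routing by hash can be sketched as a small helper that maps a key to a table name. The shard count, the `orders_N` naming scheme, and the choice of CRC32 are all assumptions for illustration; any hash that is stable across processes would do.

```python
import zlib

NUM_SHARDS = 8  # hypothetical number of split tables

def shard_table(base_name, key, num_shards=NUM_SHARDS):
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    slot = zlib.crc32(str(key).encode("utf-8")) % num_shards
    return f"{base_name}_{slot}"

tables = {shard_table("orders", user_id) for user_id in range(1000)}
same_twice = shard_table("orders", "user-42") == shard_table("orders", "user-42")
```

The application then has to run each query against the table returned by `shard_table`, which is the added complexity the text mentions.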

Table splitting can also be handled by a distributed database, which stores the split tables and merges query results automatically. The distributed database middleware hides the complexity and takes care of all the dirty work.

 

3) Summary tables

For reports and statistics over large amounts of data, if real-time results are not required, the data can be summarized periodically, for example once an hour or once a day. If real-time results are demanded, running all kinds of GROUP BY queries over large tables is not only very slow but also likely to disrupt normal business operations. At a company the author used to work for, a wide variety of scheduled summary jobs ran every night while the database was idle; from midnight to 6:00 in the morning the schedule was completely full. It was an exhausting pace, but fortunately computers do not lose their temper. Of course, such report data is only current as of the previous day; a one-day delay is usually acceptable.
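A periodic summary job might look like the sketch below, which assumes SQLite through Python's sqlite3 module and invented table names (`page_hits` for raw rows, `daily_hits` for the summary). The job deletes and rebuilds one day's summary, so rerunning it does not double count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_hits (hit_day TEXT NOT NULL, page TEXT NOT NULL);
    CREATE TABLE daily_hits (
        hit_day TEXT NOT NULL, page TEXT NOT NULL, hits INTEGER NOT NULL,
        PRIMARY KEY (hit_day, page)
    );
""")
conn.executemany(
    "INSERT INTO page_hits VALUES (?, ?)",
    [("2020-03-13", "/home"), ("2020-03-13", "/home"),
     ("2020-03-13", "/about"), ("2020-03-14", "/home")],
)

def summarize_day(conn, day):
    # Replace the day's summary rows inside a single transaction.
    with conn:
        conn.execute("DELETE FROM daily_hits WHERE hit_day = ?", (day,))
        conn.execute(
            "INSERT INTO daily_hits "
            "SELECT hit_day, page, COUNT(*) FROM page_hits "
            "WHERE hit_day = ? GROUP BY hit_day, page",
            (day,),
        )

summarize_day(conn, "2020-03-13")
summarize_day(conn, "2020-03-13")  # rerunning leaves the same totals
report = conn.execute(
    "SELECT page, hits FROM daily_hits WHERE hit_day = ? ORDER BY page",
    ("2020-03-13",),
).fetchall()
```

Reports then read the small `daily_hits` table instead of grouping the raw table on every request.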

 

4) Counter table

To record the number of clicks in a web application, you can design a click-counter table:

create table hit_counter(cnt int unsigned not null);

 

Because the table holds only one row, lock contention on it is severe. The solution follows the same idea as lock striping in ConcurrentHashMap.

The table structure is changed as follows:

create table hit_counter(slot tinyint unsigned not null primary key,cnt int unsigned not null);

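Insert 100 rows in advance (slots 0 through 99). After this change, the following statement can be used:

update hit_counter set cnt = cnt + 1 where slot = floor(rand() * 100);

Note that MySQL evaluates rand() in a WHERE clause once per row, so choosing the slot in the application and passing it in as a value is more predictable.

To get the total, just sum the rows: select sum(cnt) cnt from hit_counter;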
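The slotted counter can be exercised end to end with the sketch below. It assumes SQLite through Python's sqlite3 module, so the column types are adapted and the reduced lock contention itself is not observable (SQLite allows one writer at a time). It only demonstrates the mechanics: the slot is picked in application code, exactly one row is updated per hit, and the total is the sum over all slots.

```python
import random
import sqlite3

NUM_SLOTS = 100

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hit_counter ("
    "slot INTEGER NOT NULL PRIMARY KEY, cnt INTEGER NOT NULL)"
)
# Pre-insert one row per slot, numbered 0..99.
conn.executemany(
    "INSERT INTO hit_counter (slot, cnt) VALUES (?, 0)",
    [(slot,) for slot in range(NUM_SLOTS)],
)

def record_hit(conn):
    # Pick the slot in the application so exactly one row is updated.
    slot = random.randrange(NUM_SLOTS)
    cur = conn.execute(
        "UPDATE hit_counter SET cnt = cnt + 1 WHERE slot = ?", (slot,)
    )
    return cur.rowcount

rows_touched = [record_hit(conn) for _ in range(500)]
total = conn.execute("SELECT SUM(cnt) FROM hit_counter").fetchone()[0]
```

After 500 hits, every update has touched a single row and the summed count is 500.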

 


System (application)

1. Use SQL statements and methods sensibly and efficiently, and hint to the optimizer how to optimize the SQL

Write good queries

Pay attention to the size of the result set and of intermediate data sets

2. Watch the number of users and the level of concurrency, and review the physical design and system design accordingly

3. Optimize the server configuration

 

Factors to consider when using SQL statements

1) Total amount of data

The most important factor for SQL is the amount of data that must be accessed. Without knowing the size of the target data, it is difficult to judge how efficiently a query executes

2) Query conditions that define the result set

Good query: few rows satisfy its conditions, so it filters out a large amount of data

WHERE clauses: there may be several of them, especially in subqueries or views

Filtering efficiency varies and is affected by other factors

Factors: the filter conditions and the main SQL statement have a huge impact on the data queried

3) Size of the result set

The amount of data a query returns is important but easily overlooked

The size depends on the details of the tables and on the filter conditions

An exception: several conditions that are each inefficient on their own can be very efficient in combination

From a technical point of view, the size of the result set matters less than how the user perceives it

Skilled developers should strive to make response time proportional to the number of records returned

4) Number of tables involved in obtaining the result set: the number of tables can affect performance

Joins: too many table joins (eight or more) call the correctness of the design into question. For the optimizer, complexity grows exponentially as the number of tables increases. When writing a complex query over many tables, there are many ways to join them, and the probability of choosing badly is very high
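To get a feel for how quickly the optimizer's search space grows, the sketch below counts left-deep join orders only, which is simply the number of permutations of the tables. This is a simplification: real optimizers also consider other plan shapes and prune the search, so the figure is an illustration rather than what any particular engine enumerates.

```python
from math import factorial

def left_deep_join_orders(num_tables):
    # Each permutation of the tables is one left-deep join order.
    return factorial(num_tables)

growth = {n: left_deep_join_orders(n) for n in (2, 3, 4, 8)}
```

Going from 4 tables to 8 raises the count from 24 to 40320 possible orders.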

Views: they hide the fact that a query is a multi-table join

Reduce complex queries and complex views

5) Number of concurrent users (users modifying data at the same time)

Design considerations: block access contention, blocking, latches, and guaranteeing read consistency

In general, overall throughput matters more than individual response time

Data is stored in fixed-size blocks. Fetching several records per block keeps I/O simple and lets processing happen in the buffer cache. However, when a modified record becomes too long, it is migrated to another block, and too much data in one block can cause block access contention, which hurts concurrent performance

 


The SQL optimizer and related concepts


Optimizer: uses relational theory (relational algebra) to apply valid, semantically equivalent transformations to the original query, looks for the best access path, and generates a new, optimal execution plan

Optimization: takes place when the data processing is actually about to be executed

Factors that influence optimization: indexes, the physical layout of the data, the amount of available memory, the number of available processors, and the volume of data in the tables and indexes directly or indirectly involved

A SQL statement performs its relational operations first and its non-relational operations (ORDER BY) afterwards

 

Logical query processing phases (found online; for reference)

FROM: computes the Cartesian product (cross join) of the first two tables in the FROM clause, producing virtual table VT1.

ON: applies the ON filter to VT1. Only rows for which <join_condition> is true are inserted into VT2.

OUTER (JOIN): if an OUTER JOIN is specified (as opposed to CROSS JOIN or INNER JOIN), rows from the preserved table that found no match are added to VT2 as outer rows, producing VT3. For a LEFT OUTER JOIN the left table is the preserved table, for a RIGHT OUTER JOIN the right table, and for a FULL OUTER JOIN both tables. If the FROM clause contains more than two tables, steps 1 through 3 are repeated between the result of the last join and the next table until all tables have been processed.

WHERE: applies the WHERE filter to VT3. Only rows for which <where_condition> is true are inserted into VT4.

GROUP BY: groups the rows of VT4 by the column list in the GROUP BY clause, producing VT5.

CUBE | ROLLUP: inserts supergroups into VT5, producing VT6.

HAVING: applies the HAVING filter to VT6. Only groups for which <having_condition> is true are inserted into VT7.

SELECT: processes the SELECT list, producing VT8.

DISTINCT: removes duplicate rows from VT8, producing VT9.

ORDER BY: sorts the rows of VT9 by the column list in the ORDER BY clause, producing a cursor (VC10).

TOP: selects the specified number or percentage of rows from the beginning of VC10, producing table VT11, which is returned to the caller.
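One visible consequence of the ON, OUTER, WHERE ordering is that the same predicate gives different results depending on whether it sits in ON or in WHERE of an outer join. The sketch below assumes SQLite through Python's sqlite3 module and two made-up tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 'paid'), (11, 2, 'open');
""")

# Filter in ON: applied before outer rows are added, so bob is preserved
# as an outer row with NULL order columns.
filter_in_on = conn.execute("""
    SELECT c.name, o.id FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'paid'
    ORDER BY c.name
""").fetchall()

# Filter in WHERE: applied after the outer join step, so bob's row
# (joined to his 'open' order) is filtered out.
filter_in_where = conn.execute("""
    SELECT c.name, o.id FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'paid'
    ORDER BY c.name
""").fetchall()
```

The first query returns both customers, with no order for bob; the second returns only ann.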

 

Where the optimizer is effective

The optimizer needs information that it can find in the database

It can perform transformations that are equivalent in the mathematical sense

The optimizer considers overall response time

What the optimizer improves is individual queries

Strategy: if there are several small queries, the optimizer optimizes each of them separately; if there is one big query, the optimizer optimizes it as a whole

 

Filtering

1) How the result set is defined is the most critical factor, and the deciding factor in choosing among SQL techniques

2) What counts as a filter condition:

WHERE clauses and HAVING clauses

JOIN filter conditions

SELECT filter conditions

3) The quality of the filter conditions depends on:

What data is ultimately needed, and which tables it comes from

What input values are passed to the DBMS engine

Which conditions can filter out unwanted data

Highly efficient filter conditions are the main driving force of a query

 

SQL query optimization summary

1. How to hint to the query optimizer

Use the join order of the tables as the hint. When joining several tables, consider using EXISTS and IN to optimize. If you do not use JOIN, the query optimizer is left to optimize on its own and decides the join order itself (small tables first, then large ones), which may be less efficient

2. Reduce the dimensionality of multi-dimensional queries. Do not join more than three tables in one query; beyond that, turn non-correlated subqueries into inline views to reduce the dimensionality

3. Consider the proportion of the table that is retrieved. As a rule of thumb, when a query returns more than 10% of the table's rows, do not use an index; a query whose result set is under 10% is a good query

4. Avoid DISTINCT at the top level; use EXISTS or IN instead

5. Avoid SELECT * at the top level; it puts redundant columns in the result set and reduces performance
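Point 4 can be illustrated with a query that lists customers who have at least one order. The sketch assumes SQLite through Python's sqlite3 module and made-up tables; it shows only that the two forms return the same rows, not how any particular engine executes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# The join yields one row per order, so DISTINCT must remove duplicates afterwards.
with_distinct = conn.execute("""
    SELECT DISTINCT c.name FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()

# EXISTS only tests for a match, so each customer appears at most once.
with_exists = conn.execute("""
    SELECT c.name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.name
""").fetchall()
```

The EXISTS form never produces the duplicate rows that DISTINCT would otherwise have to remove.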

 

Principles for querying large amounts of data

Principle: the sooner unneeded data is discarded, the less data the later stages of the query have to process, and the more efficient the query is

Applications:

Set operations such as UNION, but do not simply cut and paste statements

GROUP BY and HAVING clauses

Every condition that affects the result of an aggregate function should be placed in the HAVING clause

Any condition that does not depend on aggregation should be placed in the WHERE clause

This reduces the amount of data that GROUP BY has to sort
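The WHERE versus HAVING placement can be shown on a small example, assuming SQLite through Python's sqlite3 module and a made-up `sales` table. Both queries return the same result; the difference is how many rows reach the grouping step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 10), ('east', 30), ('west', 5), ('west', 7), ('north', 50);
""")

# The region test does not depend on any aggregate, so it belongs in WHERE;
# the SUM(amount) test does, so it has to stay in HAVING.
filtered_early = conn.execute("""
    SELECT region, SUM(amount) FROM sales
    WHERE region <> 'north'
    GROUP BY region
    HAVING SUM(amount) > 20
""").fetchall()

# Same result, but every region is grouped before 'north' is thrown away.
filtered_late = conn.execute("""
    SELECT region, SUM(amount) FROM sales
    GROUP BY region
    HAVING region <> 'north' AND SUM(amount) > 20
""").fetchall()
```

On a large table, the first form hands fewer rows to GROUP BY, which is the saving the text describes.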

 

Turning non-correlated subqueries into inline views reduces the dimensionality of the query
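As an illustration only (the schema and queries are invented, and SQLite through Python's sqlite3 module is assumed), a non-correlated subquery used as a filter can be moved into the FROM clause as an inline view and joined:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'eng', 100), (2, 'eng', 140), (3, 'ops', 80), (4, 'ops', 90);
""")

# Non-correlated subquery used as a filter in the WHERE clause.
as_subquery = conn.execute("""
    SELECT id FROM employees
    WHERE dept IN (SELECT dept FROM employees
                   GROUP BY dept HAVING AVG(salary) > 100)
    ORDER BY id
""").fetchall()

# The same subquery moved into FROM as an inline view and joined.
as_inline_view = conn.execute("""
    SELECT e.id FROM employees e
    JOIN (SELECT dept FROM employees
          GROUP BY dept HAVING AVG(salary) > 100) d
      ON d.dept = e.dept
    ORDER BY e.id
""").fetchall()
```

Both forms return the employees of departments whose average salary exceeds 100; the inline view is computed once and then treated like a small table in the join.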


 


 

Statement processing order

1. Combine the tables in the FROM clause (Cartesian product)

2. Use the conditions in the WHERE clause to select tuples, discarding those that do not satisfy them

3. Group the retained tuples according to the GROUP BY clause

4. Use the conditions in the HAVING clause to select groups of tuples, discarding those that do not satisfy them

5. Evaluate the SELECT clause, including aggregate calculations, to generate the tuples of the result relation

6. Sort the result according to the ORDER BY clause

 


Origin blog.csdn.net/m0_37302219/article/details/104856963