How to speed up SQL queries on tables with more than one million rows

Tips for speeding up queries on tables with more than one million rows:
 1. Avoid using the != or <> operator in the WHERE clause; otherwise the engine abandons the index and performs a full table scan.
 2. To optimize a query, avoid full table scans; first consider building indexes on the columns used in WHERE and ORDER BY.
 3. Avoid testing a column for NULL in the WHERE clause; this causes the engine to abandon the index and scan the whole table. For example:
     SELECT id FROM t WHERE num IS NULL
    Instead, give num a default value of 0, make sure the column never actually contains NULL, and query like this:
     SELECT id FROM t WHERE num = 0
 4. Avoid using OR to join conditions in the WHERE clause; this causes the engine to abandon the index and scan the whole table. For example:
     SELECT id FROM t WHERE num = 10 OR num = 20
    can be rewritten as:
     SELECT id FROM t WHERE num = 10
     UNION ALL
     SELECT id FROM t WHERE num = 20
 5. A leading wildcard in LIKE causes a full table scan:
     SELECT id FROM t WHERE name LIKE '%abc%'
    To improve efficiency, consider full-text search instead.
 6. Use IN and NOT IN with caution, as they too can lead to a full table scan. For example:
     SELECT id FROM t WHERE num IN (1, 2, 3)
 7. For a contiguous range of values, use BETWEEN instead of IN:
     SELECT id FROM t WHERE num BETWEEN 1 AND 3
 8. Avoid performing expressions on a column in the WHERE clause; this causes the engine to abandon the index and scan the whole table. For example:
     SELECT id FROM t WHERE num / 2 = 100
    should be written as:
     SELECT id FROM t WHERE num = 100 * 2
 9. Avoid applying functions to a column in the WHERE clause; this causes the engine to abandon the index and scan the whole table. For example:
     SELECT id FROM t WHERE SUBSTRING(name, 1, 3) = 'abc'                 -- ids whose name starts with 'abc'
     SELECT id FROM t WHERE DATEDIFF(day, createdate, '2005-11-30') = 0   -- ids created on 2005-11-30
    should be written as:
     SELECT id FROM t WHERE name LIKE 'abc%'
     SELECT id FROM t WHERE createdate >= '2005-11-30' AND createdate < '2005-12-1'
 10. Do not put functions, arithmetic, or other expressions on the left side of "=" in the WHERE clause, or the system may fail to use the index correctly.
 11. When an indexed field is used as a condition and the index is a composite index, the first column of the index must appear in the condition or the index will not be used; where possible, keep the order of the condition fields consistent with the order of the index columns.
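A minimal sketch of tip 11, with made-up table, column, and index names: with a composite index, only conditions that constrain the leading column can seek on it.

```sql
-- Hypothetical table with a composite index on (lastname, firstname).
CREATE INDEX ix_person_name ON dbo.person (lastname, firstname);

-- Can use the index: the leading column lastname is constrained.
SELECT id FROM dbo.person WHERE lastname = 'Smith' AND firstname = 'Anna';
SELECT id FROM dbo.person WHERE lastname = 'Smith';

-- Cannot seek on the index: the leading column is missing from the condition.
SELECT id FROM dbo.person WHERE firstname = 'Anna';
```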
 12. Do not write meaningless queries. For example, to create an empty table with the same structure:
     SELECT col1, col2 INTO #t FROM t WHERE 1 = 0
     This code returns no result set but still consumes system resources; it should be changed to:
     CREATE TABLE #t (...)
 13. EXISTS is in many cases a good substitute for IN:
     SELECT num FROM a WHERE num IN (SELECT num FROM b)
     can be replaced with the following statement:
     SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE num = a.num)
 14. Not every index helps every query. SQL Server optimizes queries based on the data in the table; when an indexed column contains many duplicate values, the query may not use the index at all. For example, if a table has a sex column that is roughly half male and half female, an index on sex does nothing for query efficiency.
 15. More indexes are not always better. An index can certainly improve the efficiency of a SELECT, but it also reduces the efficiency of INSERT and UPDATE, since those operations may have to update the indexes. How to build indexes therefore needs careful, case-by-case consideration. A table is best kept to no more than six indexes; if you have more, reconsider whether indexes on rarely used columns are really necessary.
 16. Avoid updating clustered-index columns whenever possible, because the order of the clustered index determines the physical storage order of the rows in the table; changing a clustered-key value forces the rows of the whole table to be reordered, which consumes considerable resources. If the application needs to update clustered-index columns frequently, reconsider whether that index should be clustered.
 17. Use numeric fields where possible. If a field contains only numeric data, do not design it as a character type; that reduces the performance of queries and joins and increases storage cost. The engine compares character strings one character at a time when processing queries and joins, whereas a numeric comparison is done in a single operation.
 18. Use varchar/nvarchar instead of char/nchar where possible. Variable-length fields first of all take less storage, saving space, and secondly searches within a smaller field are clearly more efficient.
 19. Never use SELECT * FROM t anywhere; replace "*" with a specific column list, and do not return any columns you do not need.
 20. Prefer table variables to temporary tables. Note, however, that if a table variable holds a large amount of data, its indexing is very limited (only the primary key index).
 21. Avoid creating and dropping temporary tables frequently, to reduce the consumption of system-table resources.
 22. Temporary tables are not unusable; using them appropriately can make certain routines more efficient, for example when you need to reference a large table, or a commonly used data set, repeatedly. For one-off operations, however, an export table is better.
 23. When creating a temporary table, if a large amount of data is inserted at once, use SELECT INTO instead of CREATE TABLE to avoid heavy logging and increase speed; for small amounts of data, to ease the load on system-table resources, CREATE TABLE first and then INSERT.
 24. If temporary tables are used, be sure to drop them all explicitly at the end of the stored procedure: TRUNCATE TABLE first, then DROP TABLE, to avoid locking the system tables for a long time.
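A minimal sketch of tips 23 and 24, with made-up table and column names: SELECT INTO for a large load, then TRUNCATE before DROP at the end of the procedure.

```sql
-- Large insert: SELECT INTO keeps logging light.
SELECT id, name
INTO #work
FROM dbo.source_table
WHERE created >= '2005-01-01';

-- ... use #work here ...

-- Explicit cleanup at the end of the stored procedure:
TRUNCATE TABLE #work;   -- release the data pages first
DROP TABLE #work;       -- then drop the table, minimizing time spent locking system tables
```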
 25. Avoid cursors wherever possible, because they are inefficient; if a cursor operates on more than 10,000 rows, consider rewriting it.
 26. Before resorting to a cursor-based or temporary-table-based method, look for a set-based solution to the problem; set-based methods are usually more efficient.
 27. Like temporary tables, cursors are not unusable. Using a FAST_FORWARD cursor on a small data set is often better than other row-by-row methods, especially when several tables must be referenced to obtain the required data. Routines that compute "running totals" in the result set are usually faster with a cursor than without. If development time permits, try both the cursor-based and the set-based approach to see which works better.
 28. Put SET NOCOUNT ON at the beginning of all stored procedures and triggers and SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after every statement in a stored procedure or trigger.
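A minimal sketch of tip 28 (the procedure and table names are made up for illustration):

```sql
CREATE PROCEDURE dbo.usp_open_orders
AS
BEGIN
    SET NOCOUNT ON;    -- suppress the DONE_IN_PROC "(n rows affected)" messages

    SELECT id, total
    FROM dbo.orders
    WHERE status = 'open';

    SET NOCOUNT OFF;
END
```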
 29. Avoid returning large amounts of data to the client; if the data volume is too large, reconsider whether the requirement itself is reasonable.
 30. Avoid large transactions wherever possible, to improve system concurrency.
Reasons for slow queries:
1. No index, or the index is not used (the most common cause of slow queries, and a defect of program design).
2. Low I/O throughput, forming a bottleneck.
3. No computed columns were created, so queries are not optimized.
4. Insufficient memory.
5. Slow network.
6. The data retrieved is too large (this can be reduced with multiple queries or other means of limiting the data volume).
7. Locks or deadlocks (also among the most common causes of slow queries, and a defect of program design).
8. sp_lock and sp_who show active users, i.e. contention for read/write resources.
9. Unnecessary rows and columns are returned.
10. The query statement is poorly written and not optimized.
The query can be optimized with the following methods:
1. Put the data, the log, and the indexes on different I/O devices to increase read speed. (Tempdb could formerly be placed on RAID 0, but SQL 2000 no longer supports this.) The larger the data volume, the more important improving I/O becomes.
2. Partition tables vertically and horizontally to reduce table size (see sp_spaceused).
3. Upgrade the hardware.
4. Build indexes according to the queries, optimize the indexes and access patterns, and limit the size of the result set. Note that the fill factor should be appropriate (the default value of 0 is usually best). Keep indexes as small as possible: build them on columns with few bytes (see the documentation on index creation), and do not build a single index on a column with few distinct values, such as a sex field.
5. Improve network speed.
6. Expand the server's memory; Windows 2000 and SQL Server 2000 can support 4-8 GB of memory. Configure virtual memory: the virtual memory size should be based on the services running concurrently on the machine. When running Microsoft SQL Server 2000, consider setting virtual memory to 1.5 times the physical memory installed in the computer. If full-text search is also installed and you intend to run the Microsoft Search service for full-text indexing and querying, consider configuring virtual memory to at least 3 times the installed physical memory. Configure the SQL Server max server memory option to 1.5 times the physical memory (half the virtual memory setting).
7. Increase the number of server CPUs, but understand that parallel processing needs more resources, such as memory, than serial processing does. MSSQL automatically evaluates whether to use a parallel or serial plan. A single task is split into multiple subtasks that can run on different processors; for example, sorting, joins, scans, and GROUP BY for a deferred query can execute simultaneously. SQL Server determines the optimal degree of parallelism based on the system load; complex queries that consume a lot of CPU benefit most from parallel processing. However, the update operations UPDATE, INSERT, and DELETE cannot be processed in parallel.
8. If you use LIKE in a query, a plain index does not always help, and a full-text index consumes space. LIKE 'a%' uses the index; LIKE '%a' does not; and with LIKE '%a%' the query time is proportional to the total length of the field values, so do not use CHAR, use VARCHAR, and build full-text indexes on fields with long values.
9. Separate the DB server from the application server; separate OLTP from OLAP.
10. Distributed partitioned views can be used to build a federation of database servers. A federation is a group of separately managed servers that cooperate to share the system's processing load. This mechanism of forming a federation of database servers by partitioning the data allows a group of servers to be scaled out to support the processing needs of a large multi-tier web site. For more information, see Designing Federated Database Servers (and 'partitioned view' in the SQL Server help file).
    a. Before implementing a partitioned view, you must first partition the table horizontally.
    b. After creating the member tables, define a distributed partitioned view on each member server, each view having the same name. Queries referencing the distributed partitioned view name can then run on any member server. The system behaves as if each member server held a copy of the original table, whereas in fact each server has only one member table and one distributed partitioned view. The location of the data is transparent to the application.
11. Rebuild indexes with DBCC DBREINDEX and DBCC INDEXDEFRAG; shrink data and log with DBCC SHRINKDATABASE and DBCC SHRINKFILE; set the log to shrink automatically. For large databases, do not set the database to grow automatically, as that reduces server performance. How T-SQL is written matters a great deal; the common points are listed below. First, this is how a DBMS processes a query plan:
    1. Lexical and syntactic checking of the query.
    2. The statement is submitted to the DBMS query optimizer.
    3. The optimizer performs algebraic optimization and access-path optimization.
    4. The precompile module generates a query plan.
    5. The plan is then submitted to the system and executed at an appropriate time.
    6. Finally, the execution result is returned to the user.
Next, look at how SQL Server stores data: a page is 8 KB (8060 bytes usable), 8 pages form an extent, and storage is organized as a B-tree.
12. The difference between COMMIT and ROLLBACK: ROLLBACK rolls the whole transaction back; COMMIT commits the current transaction. There is no need to open transactions inside dynamic SQL; if you must, put the transaction on the outside, e.g. BEGIN TRAN EXEC(@s) COMMIT TRAN, or move the dynamic SQL into a function or stored procedure.
13. Use a WHERE clause in SELECT statements to limit the number of rows returned and avoid table scans. Returning unneeded data wastes the server's I/O resources, adds to the network load, and reduces performance. If the table is large, it is locked during a table scan, which blocks other joins from accessing it, with serious consequences.
14. SQL comments have no effect on execution.
15. Avoid cursors as far as possible; they consume a lot of resources. If row-by-row processing really is needed, prefer non-cursor techniques such as client-side loops, temporary tables, table variables, subqueries, and CASE statements. Cursors can be classified by the fetch options they support: a forward-only cursor must fetch rows in order from the first to the last, and FETCH NEXT, the default, is the only fetch operation allowed; a scrollable cursor can fetch any row at any position in the cursor at random. Cursor technology became quite powerful in SQL 2000; its purpose is to support loops.
There are four concurrency options:
READ_ONLY: positioned updates (UPDATE) through the cursor are not allowed, and no locks are held on the rows in the result set.
OPTIMISTIC WITH VALUES: optimistic concurrency control is a standard part of transaction-control theory. It is used where there is only a small chance that a second user will update a row in the interval between the cursor being opened and the row being updated. When a cursor is opened with this option, no locks are held on the underlying rows, which helps maximize throughput. If the user attempts to modify a row, the row's current values are compared with the values obtained when the row was last fetched. If any value has changed, the server knows that someone else has updated the row and returns an error; if the values are the same, the server performs the modification.
OPTIMISTIC WITH ROW VERSIONING: this optimistic concurrency-control option is based on row versioning. With row versioning, the table must have a version identifier that the server can use to determine whether the row has changed since it was read into the cursor.
In SQL Server this capability is provided by the timestamp data type, a binary number indicating the relative sequence of changes in the database. Each database has a global current timestamp value, @@DBTS. Whenever a row with a timestamp column is changed in any way, SQL Server stores the current @@DBTS value in the timestamp column and then increments @@DBTS. If a table has a timestamp column, the timestamps are recorded at the row level; the server can then compare a row's current timestamp value with the timestamp value stored at the last fetch to determine whether the row has been updated. The server does not have to compare the values of all the columns, only the timestamp column. If an application asks for row-versioning-based optimistic concurrency on a table with no timestamp column, the cursor defaults to values-based optimistic concurrency control.
SCROLL LOCKS: this option implements pessimistic concurrency control, in which the application attempts to lock the underlying database rows as they are read into the cursor's result set. With server cursors, an update lock is placed on a row as it is read into the cursor. If the cursor is opened within a transaction, the transaction update lock is held until the transaction is committed or rolled back, though the cursor lock is dropped when the next row is fetched. If the cursor is opened outside a transaction, the lock is dropped when the next row is fetched. Therefore, whenever full pessimistic concurrency control is needed, the cursor should be opened within a transaction. An update lock prevents any other task from acquiring an update or exclusive lock, and thus prevents other tasks from updating the row.
An update lock does not block shared locks, however, so it does not prevent other tasks from reading the row, unless the second task is also requesting a read with an update lock. Scroll locks: depending on the locking hints specified in the cursor's SELECT statement, these cursor concurrency options may generate scroll locks. A scroll lock is acquired on each row as it is fetched and held until the next fetch or until the cursor is closed, whichever comes first. At the next fetch the server acquires scroll locks for the newly fetched rows and releases the scroll locks on the rows fetched previously. Scroll locks are independent of transaction locks and can be held across a commit or rollback operation. If the option to close cursors at commit is off, a COMMIT statement does not close any open cursors, and scroll locks are held across the commit to preserve the isolation of the fetched data. The type of scroll lock acquired depends on the cursor concurrency option and the locking hints in the cursor's SELECT statement.
The combinations of locking hints and concurrency options yield the following lock types:

    Hint        Read-only   Optimistic (values)   Optimistic (row versioning)   Scroll locks
    (none)      Unlocked    Unlocked              Unlocked                      Update
    NOLOCK *    Unlocked    Unlocked              Unlocked                      Unlocked
    HOLDLOCK    Shared      Shared                Shared                        Update
    UPDLOCK     Error       Update                Update                        Update
    TABLOCKX    Error       Unlocked              Unlocked                      Update
    Other       Unlocked    Unlocked              Unlocked                      Update

    * Specifying the NOLOCK hint makes the table it is specified on read-only within the cursor.
16. Use Profiler to trace queries, obtain the time each query needs, and find the problem SQL; use the Index Tuning Wizard to optimize the indexes.
17. Note the difference between UNION and UNION ALL. UNION ALL is better, because it does not remove duplicates.
18. Be careful with DISTINCT; do not use it unless necessary, because like UNION it slows the query down. Duplicate records are not a problem in a query.
19. Do not return unneeded rows or columns from a query.
20. Use sp_configure 'query governor cost limit' or SET QUERY_GOVERNOR_COST_LIMIT to limit the resources a query may consume. When the estimated resource consumption of a query exceeds the limit, the server cancels it automatically, killing the query before it runs. SET LOCK_TIMEOUT sets the lock-wait time.
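For example (the numeric values are illustrative, not recommendations):

```sql
SET QUERY_GOVERNOR_COST_LIMIT 300;  -- refuse queries whose estimated cost exceeds 300 seconds
SET LOCK_TIMEOUT 5000;              -- stop waiting for a lock after 5000 ms
```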
21. Use SELECT TOP 100 / TOP 10 PERCENT to limit the number of rows returned to the user, or SET ROWCOUNT to limit the rows an operation touches.
22. Before SQL 2000, the following were in general not to be used: "IS NULL", "<>", "!=", "!>", "!<", "NOT", "NOT EXISTS", "NOT IN", "NOT LIKE", and "LIKE '%500'", because they do not use the index and scan the whole table.
Do not apply functions such as CONVERT or SUBSTRING to a column name in the WHERE clause. If a function is necessary, create a computed column and index it; rewriting can also work, e.g. change WHERE SUBSTRING(firstname, 1, 1) = 'm' to WHERE firstname LIKE 'm%' (an index scan). Be sure to keep functions and column names separated. And do not build too many or too large indexes.
NOT IN scans the table repeatedly; replace it with EXISTS, NOT EXISTS, IN, or LEFT OUTER JOIN. A left join in particular, and EXISTS, are faster than IN, while NOT operations are the slowest. Formerly the index would not work if a column's values contained leading spaces; the 2000 optimizer now handles this, and can likewise optimize IS NULL, NOT, NOT EXISTS, and NOT IN, whereas "<>" and the like still cannot be optimized and will not use the index.
23. Use Query Analyzer to examine the query plans of your SQL statements and evaluate whether they are well optimized. In general, 20% of the code consumes 80% of the resources, so optimization should focus on those slow spots.
24. If using IN or OR you find that the query does not use the index, specify it explicitly with an index hint, e.g.: SELECT * FROM PersonMember (INDEX = IX_Title) WHERE processid IN ('male', 'female')
25. Precompute the results you will need and store them in a table, then SELECT from that table at query time. This was the most important technique before SQL 7.0, for example for calculating hospital inpatient fees.
26. MIN() and MAX() can make use of a suitable index.
27. Databases have a principle of keeping the code as close to the data as possible, so prefer, in that order, Default, then Rules, Triggers, Constraint (constraints such as primary key, foreign key, CHECK, and UNIQUE, and the maximum lengths of data types), then Procedure. This not only means less maintenance work and higher-quality programs, but also faster execution.
28. To insert a large binary value into an image column, use a stored procedure; never use an inline INSERT (I do not know whether this applies to Java). With an inline INSERT the application first converts the binary value to a character string (doubling its size), and the server converts it back to binary after receiving the characters; a stored procedure involves no such conversions. Method: CREATE PROCEDURE p_insert AS INSERT INTO table (fimage) VALUES (@image); call this stored procedure from the front end and pass the parameter in as binary, and processing speed improves markedly.
29. BETWEEN is at certain times faster than IN, since BETWEEN can locate a range via the index more quickly. The difference is visible with the query optimizer. SELECT * FROM chineseresume WHERE title IN ('male', 'female') and SELECT * FROM chineseresume WHERE title BETWEEN 'male' AND 'female' return the same result, but because IN compares multiple times it is sometimes slower.
30. Create indexes on global or local temporary tables where necessary; this can sometimes improve speed, though not always, because creating the index also costs significant resources. Index creation on them works the same as on real tables.
31. Do not create transactions that accomplish nothing, such as when generating a report; that wastes resources. Open a transaction only when it is necessary.
32. A statement with OR can be decomposed into several queries connected by UNION. Whether their speed matches depends only on whether indexes are used; if the queries need a composite index, UNION ALL executes more efficiently. When a statement with several ORs uses no index, rewrite it in UNION form and try to match the indexes. The key problem is whether indexes are used.
33. Minimize the use of views; they are inefficient. Operating through a view is slower than operating on tables directly; use stored procedures instead. In particular, do not nest views; nesting makes it harder to trace the source data. Consider what a view really is: a well-optimized query plan for a SQL statement, stored on the server. When retrieving data from a single table, do not use a view that joins multiple tables; read directly from the table, or use a view that contains only that table, otherwise you add unneeded overhead and interfere with the query. To speed up view queries, MSSQL added indexed views.
34. Do not use DISTINCT or ORDER BY when not necessary; these operations can instead be performed on the client. They add extra overhead, for the same reason as UNION versus UNION ALL. For example:
    SELECT TOP 20 ad.companyname, comid, position, ad.referenceid, worklocation, CONVERT(varchar(10), ad.postDate, 120) AS postDate1, workyear, degreedescription FROM jobcn_query.dbo.COMPANYAD_query ad WHERE referenceID IN ('JCNAD00329667', 'JCNAD132168', 'JCNAD00337748', 'JCNAD00338345', 'JCNAD00333138', 'JCNAD00303570', 'JCNAD00303569', 'JCNAD00303568', 'JCNAD00306698', 'JCNAD00231935', 'JCNAD00231933', 'JCNAD00254567', 'JCNAD00254585', 'JCNAD00254608', 'JCNAD00254607', 'JCNAD00258524', 'JCNAD00332133', 'JCNAD00268618', 'JCNAD00279196', 'JCNAD00268613')

36. SELECT INTO locks system tables (sysobjects, sysindexes, etc.), blocking other connections. Create temporary tables with explicit declaration statements rather than with SELECT INTO. While DROP TABLE t_lxh BEGIN TRAN SELECT * INTO t_lxh FROM chineseresume WHERE name = 'XYZ' --COMMIT is running, a SELECT * FROM sysobjects on another connection shows that SELECT INTO locks the system tables; CREATE TABLE also locks the system tables (whether the table is temporary or not). So never use SELECT INTO inside a transaction! If temporary tables are used this frequently, use a real table instead, or a table variable.
37. A WHERE clause can usually eliminate the extra rows before GROUP BY and HAVING, so try not to use HAVING to do the row-filtering work. The optimal order of operations is: the WHERE clause selects the right rows, GROUP BY groups and aggregates them, and HAVING eliminates the redundant groups. That leaves GROUP BY and HAVING with little to do, and the query is fast; grouping and applying HAVING over large row sets is very resource-intensive. If the purpose of the GROUP BY includes no computation and is merely grouping, DISTINCT is faster.
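A small sketch of tip 37, with made-up table and column names: filtering in WHERE means fewer rows ever reach the grouping step.

```sql
-- Slower: every row is grouped first, then unwanted groups are discarded.
SELECT dept, COUNT(*) AS cnt
FROM dbo.employee
GROUP BY dept
HAVING dept IN ('sales', 'hr');

-- Faster: unwanted rows are eliminated before GROUP BY.
SELECT dept, COUNT(*) AS cnt
FROM dbo.employee
WHERE dept IN ('sales', 'hr')
GROUP BY dept;
```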
38. Updating several records in one statement is faster than updating them one at a time in several statements; that is, batching is good.
39. Use temporary tables sparingly; replace them with result sets and table-type variables where possible. A table variable is better than a temporary table.
40. In SQL 2000, computed fields can be indexed, provided the following conditions are met:
  a. The expression of the computed field is deterministic.
  b. The TEXT, ntext, and image data types are not used.
  c. The following options must be set: ANSI_NULLS = ON, ANSI_PADDING = ON, .......
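A minimal sketch of tip 40 (the table and column names are made up): index a deterministic computed column so the WHERE clause can seek instead of scan.

```sql
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
GO
CREATE TABLE dbo.orders
(
    id    int   NOT NULL PRIMARY KEY,
    qty   int   NOT NULL,
    price money NOT NULL,
    total AS (qty * price)            -- deterministic expression, so it can be indexed
);
GO
CREATE INDEX ix_orders_total ON dbo.orders (total);
```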
41. Process data on the server as far as possible to reduce network overhead, for example by using stored procedures. A stored procedure is SQL that has been compiled and optimized, organized into an execution plan, and stored in the database; it is a collection of control-flow language and is naturally fast. For dynamic SQL that is executed repeatedly, use a temporary stored procedure, which (like a temporary table) is placed in tempdb. Because SQL Server formerly did not support complex mathematical calculations, this work had to be pushed to other layers, increasing network overhead. SQL 2000 supports UDFs and now handles complex mathematical calculations, but the return value of a function should not be too large, as that carries heavy overhead. User-defined functions consume large amounts of resources when executed, much like cursors; if a large result must be returned, use a stored procedure instead.
42. Do not call the same function repeatedly in the same statement; it wastes resources. Put the result in a variable first and then use the variable, which is faster.
43. SELECT COUNT(*) is inefficient; try to adapt your code so it is not needed, and note that EXISTS is fast. Also note the difference: SELECT COUNT(nullable_column) FROM table and SELECT COUNT(non_nullable_column) FROM table return different values, because COUNT skips NULLs.
44. When the server has ample memory, set the number of prepared threads = maximum number of connections + 5 to maximize efficiency; otherwise set the number of prepared threads < maximum number of connections and let SQL Server's thread pool handle it. If in that case the number of threads = maximum connections + 5, server performance is seriously damaged.
45. Access your tables in a fixed order. If you lock table A first and then table B, then all stored procedures must lock them in that order. If a stored procedure (inadvertently) locks table B first and then table A, this may lead to a deadlock. Deadlocks are hard to find if the lock order is not carefully designed in advance.
46. Monitor SQL Server load through the appropriate hardware counters. Memory: if the Page Faults/sec counter is occasionally high, threads are competing for memory; if it stays high, memory may be the bottleneck. Process:
    1. % DPC Time is the percentage of processor time spent, during the sample interval, receiving and servicing deferred procedure calls (DPCs). (DPCs run at a lower priority than standard intervals.) Since DPCs execute in privileged mode, the DPC time percentage is part of the privileged time percentage. These intervals are counted separately and are not part of the total interval count. This total shows the average busy time as a percentage of instance time.
    2. If the % Processor Time counter stays above 95%, the CPU is the bottleneck; consider adding a processor or changing to a faster one.
    3. % Privileged Time is the percentage of non-idle processor time spent in privileged mode. (Privileged mode is a processing mode designed for operating-system components and hardware drivers; it allows direct access to hardware and all memory. The other mode is user mode, a restricted processing mode designed for applications, environment subsystems, and integral subsystems. The operating system switches application threads to privileged mode to access operating-system services.) % Privileged Time includes the time spent servicing interrupts and DPCs. A high privileged-time ratio may be caused by a large number of intervals generated by failing devices. This counter shows the average busy time as a portion of the sample time.
    4. % User Time represents CPU-consuming database operations such as sorting and executing aggregate functions. If this value is high, consider adding indexes, using simple table joins, and horizontally partitioning large tables to reduce it. Physical Disk: the Current Disk Queue Length counter should not exceed 1.5 to 2 times the number of disks; to improve performance, add disks. SQLServer: Cache Hit Ratio, the higher the better; if it stays below 80%, consider adding memory. Note that this value accumulates from the moment SQL Server starts, so after it has been running for a while it no longer reflects the current state of the system.
47. Analyze SELECT emp_name FROM employee WHERE salary > 3000. If salary is of type float in this statement, the optimizer converts the constant to CONVERT(float, 3000), because 3000 is an integer. We should write 3000.0 when programming rather than leave the conversion to the DBMS at run time. The same applies to conversions between character and integer data.

Origin www.cnblogs.com/peipeiyu/p/11448004.html