MySQL Basics 4: Slow Queries and SQL Optimization

9 What are the specific contents of the MySQL execution plan? What is MySQL EXPLAIN?

EXPLAIN shows the execution plan MySQL chooses for a statement. For each table involved it reports columns such as id, select_type, table, partitions, type (the access method, e.g. const, ref, range, ALL), possible_keys, key, key_len, ref, rows (estimated rows examined), filtered, and Extra (e.g. Using index, Using filesort, Using temporary). Together these tell you which indexes are considered and actually used, roughly how many rows will be scanned, and whether expensive operations such as sorting or temporary tables are involved.

10 How to locate a slow SQL statement

10.1 Locating slow queries

For most relational database management systems (RDBMS), there are corresponding tools and techniques to identify slow queries:

  1. Logs:

    • MySQL: Enable the slow query log (slow_query_log) to record queries whose execution time exceeds a threshold (long_query_time); see the example after this list.
    • PostgreSQL: Set the log_min_duration_statement parameter so that queries whose execution time exceeds a certain threshold are written to the log.
    • SQL Server: Use SQL Server Profiler to capture slow queries.
    • Oracle: Use AWR (Automatic Workload Repository) and ADDM (Automatic Database Diagnostic Monitor) reports to identify slow queries.
  2. Performance and monitoring tools: Most RDBMSs provide tools to view performance statistics for running or recently run queries. For example, MySQL has SHOW PROCESSLIST, and PostgreSQL has the pg_stat_statements module.
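For example, in MySQL the slow query log can be switched on at runtime. A minimal sketch (the threshold and file path are illustrative, not recommendations):

```sql
-- Enable the slow query log and set the threshold (values are illustrative)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;   -- log statements running longer than 1 second
SET GLOBAL slow_query_log_file = '/var/lib/mysql/slow.log';   -- hypothetical path

-- Verify the current settings
SHOW VARIABLES LIKE 'slow_query_log%';
SHOW VARIABLES LIKE 'long_query_time';
```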

In short, regularly inspecting and analyzing database performance is key to keeping it healthy. With the methods and tools above, slow queries can be effectively identified and then optimized.

10.2 How to analyze a slow query and view its execution plan

  1. EXPLAIN command: When you have a SQL statement that is suspected to be slow, use the EXPLAIN command (or, in some databases such as Oracle, EXPLAIN PLAN) to view the query's execution plan. This helps you understand why the query is slow and gives you clues for optimizing it.
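A minimal MySQL example (the table and column names are hypothetical):

```sql
-- Inspect the plan for a suspected slow query
EXPLAIN SELECT id, name FROM users WHERE email = 'a@example.com';

-- Key columns to read in the output:
--   type:  access method (const/ref/range are good, ALL means a full table scan)
--   key:   the index actually chosen, if any
--   rows:  estimated number of rows MySQL will examine
--   Extra: warnings such as Using filesort or Using temporary
```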

10.3 How to optimize SQL statements

SQL optimization is a very in-depth topic, involving various techniques and strategies. Here are some common SQL optimization strategies:

  1. Proper use of indexes:

    • Create indexes for frequently queried columns and columns used in WHERE clauses.
    • Avoid applying functions or calculations to indexed columns, which can render the index unusable.
    • Use composite indexes to cover queries.
  2. Avoid full table scans:

    • Make sure the WHERE clause can use an index.
    • Prefer queries that only need to scan part of the data rather than the entire table.
  3. Optimize the query structure:

    • Reduce the number of columns requested in the SELECT clause and select only the columns you really need.
    • Avoid SELECT * unless you really need all the columns in the table.
    • Use JOINs instead of subqueries when appropriate.
  4. Optimize JOINs:

    • Use indexes in JOIN conditions.
    • Reduce the number of JOINs and use them only when necessary.
    • Use EXPLAIN to analyze JOIN operations.
  5. Use the appropriate data type:

    • Using the most appropriate data type can reduce storage and increase query speed.
    • For example, use INT instead of VARCHAR to store integers.
  6. Optimize the database structure:

    • Normalize the database to reduce data redundancy.
    • Use denormalization where necessary to improve query performance.
  7. Optimize subqueries:

    • Replace subqueries with JOINs whenever possible.
    • In some cases subqueries may be more efficient, but generally JOINs are faster.
  8. Use partitions:

    • Divide large tables into smaller, more manageable pieces based on the values of certain columns.
  9. Limit the result set:

    • Use LIMIT to reduce the size of the result set, especially when only some of the results are of interest.
  10. Optimize the storage engine:

    • Choose an appropriate storage engine based on the needs of your application (for example, read-intensive or write-intensive).
  11. Regular maintenance:

    • Periodically run OPTIMIZE TABLE to reorganize data and reclaim unused space.
    • Update statistics to help the query optimizer make better decisions.
  12. Use caching:

    • Take advantage of MySQL's query cache (note that the query cache was removed in MySQL 8.0).
    • Use an external caching solution such as Redis or Memcached.
  13. Avoid certain high-overhead SQL functions and operations, such as LIKE searches, especially those with leading wildcards.

  14. Evaluate and possibly rewrite slow queries:

    • Use the slow query log to find out which queries need optimization.
    • Use EXPLAIN to view the execution plan of a query and identify bottlenecks.

These are just some common ways to optimize SQL. In practice, each database and application scenario may require a specific optimization strategy.
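To make a few of the points above concrete (a composite index matching the WHERE clause, selecting only the needed columns, limiting the result set, and not wrapping an indexed column in a function), here is a hedged sketch; the schema and values are hypothetical:

```sql
-- Composite index matching the filter and sort of a common query
CREATE INDEX idx_orders_user_status_created ON orders (user_id, status, created_at);

-- Select only the needed columns and limit the result set
SELECT id, status, created_at
FROM orders
WHERE user_id = 42 AND status = 'PAID'
ORDER BY created_at DESC
LIMIT 20;

-- Avoid functions on indexed columns: rewrite DATE(created_at) = '2023-01-01'
-- as a range predicate so an index on created_at can still be used
SELECT id
FROM orders
WHERE created_at >= '2023-01-01' AND created_at < '2023-01-02';
```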

11 A table contains a large amount of data, and you query for a record that is not in the database, so no rows match. How do you optimize this? The optimization does not necessarily have to happen in MySQL itself; other tools may also be used.

When querying for records that do not exist in a large table, the key to optimization is to reduce the amount of data that must be scanned. Here are some ways to optimize this kind of query:

  1. Build proper indexes:

    • Make sure the fields being queried are indexed so that the database can quickly look up the data instead of doing a full table scan.
    • Use the EXPLAIN command to view the query's execution plan and confirm whether the index is used.
  2. Database partitioning:

    • For very large tables, consider using database partitions. This way, queries only need to search one or a few relevant partitions rather than the entire table.
  3. Data summary (digest):

    • For frequently queried fields, a digest (such as a hash) can be calculated and stored in a separate column or table.
    • When querying for values that are not in the database, only the digest needs to be checked, not the actual data.
  4. Use an external search engine:

    • For complex search needs, full-text search tools such as Elasticsearch or Solr can be used. These tools are very effective for quickly finding data.
  5. Use Bloom filters:

    • A Bloom filter is a data structure that can quickly and efficiently check whether an element is in a set.
    • Do a quick check with the Bloom filter first, and only query the database if the filter indicates that the record might exist.
  6. Reduce data:

    • Periodically archive or delete data that is no longer needed to keep the table size within a manageable range.
  7. Use caching:

    • If you frequently query certain datasets, consider a caching solution such as Redis or Memcached to cache the results. That way, common queries can check the cache first.
  8. Hardware and configuration optimization:

    • Make sure the hardware meets the requirements, e.g. using SSDs, adding RAM, optimizing the network.
    • On the database side, consider increasing buffer sizes, tuning the query cache, and similar adjustments to improve performance.
  9. Application-layer strategies:

    • If possible, do some preprocessing or checking at the application layer to reduce unnecessary queries.
    • For example, use an application-level cache or maintain data structures for fast lookups.
  10. Consider the data storage strategy:

    • For example, for tables that are infrequently queried but hold a huge amount of data, consider columnar storage (such as Apache Parquet) instead of traditional row storage.

In general, optimizing queries for records that don't exist in large tables requires a deep understanding of your specific needs and data. Different strategies and technologies have their applicable scenarios, so it is usually necessary to combine multiple methods to achieve the best performance.
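As a concrete illustration of point 1 above (proper indexing), here is a hedged sketch with hypothetical names: index the looked-up column so that a miss is answered by a few B+ tree page reads rather than a full table scan.

```sql
-- Index the column used for the lookup
CREATE INDEX idx_orders_order_no ON orders (order_no);

-- Existence check: stops at the first match, and returns empty quickly on a miss
SELECT 1 FROM orders WHERE order_no = '20230815-000123' LIMIT 1;

-- Confirm the plan: type should be ref/const rather than ALL (full table scan)
EXPLAIN SELECT 1 FROM orders WHERE order_no = '20230815-000123' LIMIT 1;
```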

11.1 Build appropriate indexes. Why are appropriate indexes still recommended for large tables? If you query with a primary key value that does not exist, does the query use the index or a full table scan?

If you query with a primary key that doesn't exist, the database will use the primary key index instead of doing a full table scan.

Primary key indexes (in many databases, primary keys are unique by default and indexed) are designed for fast lookup of records. When you make a query using a primary key value, the database looks up that index even if the record doesn't exist. Due to the design of the index structure (such as B-tree or hash index), the database can quickly determine whether a record exists without checking the entire table.

Therefore, for a query on a non-existent primary key, the database very quickly determines that there are no matching records and returns the result without performing a full table scan.

11.2 Why are other optimization measures still needed after the index is built, in order to greatly reduce the number of disk I/Os?

Answer: Because a large table may hold tens of millions of rows, its index B+ tree may be three to four levels high, which still means roughly four to five disk I/Os per lookup, and that is also time-consuming.
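A rough back-of-the-envelope estimate (assuming a 16 KB InnoDB page, a BIGINT key plus a 6-byte page pointer, so roughly 1,100-1,200 entries per internal node, and on the order of 15-20 rows per leaf page): a three-level B+ tree can already address about 1,200 × 1,200 × 16 ≈ 20 million rows. A point lookup on such a tree costs about three page reads, plus an extra lookup back through the primary key index when a secondary index is used, so a cold (uncached) query still pays several disk I/Os even though the index avoids a full scan.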

11.3 Why does database partitioning help avoid full table scans? If you query for data that does not exist in a partitioned table, will the B+ tree indexes of all partitions be searched?

When you use database partitioning, query operations avoid scanning all partitions as much as possible. If the query condition contains partition key information, the database system usually only operates on the relevant partition. However, if the query conditions do not contain information that can locate a specific partition, the query may need to be performed on all partitions. Therefore, for queries on non-existent data, if the condition does not include the partition key, it is indeed possible that the B+ tree indexes of all partitions will be searched.

For example, if a date column is used for range partitioning, then querying for a date that does not exist in the table still only requires searching one partition. Suppose the first partition stores data from 2021-01-01 to 2022-01-01 and the second partition stores data from 2022-01-01 to 2023-01-01: when querying for a row dated 2022-08-08, only the second partition is searched, regardless of whether that row exists; the first partition is never touched.
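A minimal sketch of this setup (hypothetical table and partition names), where partition pruning restricts the search to a single partition:

```sql
CREATE TABLE orders (
  id BIGINT NOT NULL,
  order_date DATE NOT NULL,
  amount DECIMAL(10, 2),
  PRIMARY KEY (id, order_date)   -- the partitioning column must be part of every unique key
)
PARTITION BY RANGE (TO_DAYS(order_date)) (
  PARTITION p2021 VALUES LESS THAN (TO_DAYS('2022-01-01')),
  PARTITION p2022 VALUES LESS THAN (TO_DAYS('2023-01-01')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Whether or not a matching row exists, only partition p2022 is searched
EXPLAIN SELECT * FROM orders WHERE order_date = '2022-08-08';
-- The "partitions" column of the EXPLAIN output should list only p2022.
```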

11.4 Data summary: does this mean extracting frequently queried fields, computing a hash value for them and storing it, then computing the hash of the search value before querying to see whether there is a match? If there is a match, query the MySQL database; if not, skip the query entirely, right?

Yes, you understand correctly. The purpose of data summarization is to provide a quick check mechanism to determine whether a value is likely to exist in the database without directly querying the raw data.

This approach is based on the properties of hash functions. A hash function converts input data (such as a string or number) into a fixed-length value (the hash). If the hash values of two inputs are different, the two inputs must be different. However, two different inputs can produce the same hash value, which is called a "hash collision". Although collisions are possible, in practice the probability of a well-designed hash function producing one is very low.

Here's how to use it in this scenario:

  1. For frequently queried fields, compute their hash and store it in a separate column or table.
  2. When a specific value needs to be queried, the hash value of the value is first calculated.
  3. Query this hash value where it is stored. If the hash value doesn't exist, then the original value certainly doesn't exist in the original data either, so there's no need to query the original data.
  4. If the hash exists, the original data can be queried further to determine the actual match (since there may be a hash collision).

This strategy can significantly speed up the query in some scenarios, especially when the amount of raw data is very large or the query operation is very frequent. However, also note that this approach adds storage complexity (because of the additional storage of the hash value) and some additional computational overhead (because of the need to calculate the hash value).
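A hedged sketch of this idea in MySQL (hypothetical table and column names), using a stored generated column so the digest stays in sync automatically:

```sql
-- Store a CRC32 digest of a long, frequently queried column and index the digest
ALTER TABLE articles
  ADD COLUMN url_crc INT UNSIGNED AS (CRC32(url)) STORED,
  ADD INDEX idx_url_crc (url_crc);

-- Probe the cheap digest first; re-check the raw value to rule out hash collisions
SELECT id
FROM articles
WHERE url_crc = CRC32('https://example.com/some/long/path')
  AND url = 'https://example.com/some/long/path';
```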

11.5 Using a Bloom filter: isn't this method very similar to the data summary described above?

You are right, Bloom filters work much like the data summaries described above. Both are used to quickly determine whether data is likely to exist, without unnecessary storage lookups or database queries. However, a Bloom filter has a particular property: it may produce false positives (reporting that an element exists when it does not), but never false negatives (it will never report that an existing element is absent). Using a Bloom filter can greatly improve query efficiency, especially in big-data scenarios.

11.6 Why can using search engines such as ES speed up queries?

Both Elasticsearch and Apache Solr are built on top of Lucene. They are mainly used for full-text search, but they are also commonly used as search and analysis tools for big data. Here are the main reasons they work efficiently with large datasets and return query results quickly:

  1. Inverted index:

    • This is the main reason Lucene (and by extension Elasticsearch and Solr) are so fast to query. In simple terms, an inverted index is a mapping from words extracted from a document to the document in which they are contained. When a user queries a word or phrase, the search engine only needs to find the entry of the word in the inverted index, and then quickly find all documents containing the word without scanning each document.
  2. Sharding and parallel processing:

    • Both Elasticsearch and Solr support splitting data into pieces, or shards, which can be distributed across multiple nodes. When queries are issued, these queries can run in parallel on multiple shards, greatly speeding up the query process. Additionally, this sharding mechanism provides high availability and fault tolerance.
  3. Near real-time indexing:

    • When data is added to Elasticsearch or Solr, it is indexed quickly and becomes searchable almost immediately, even if it has only just been added.
  4. Compressed and optimized storage:

    • Lucene uses various techniques to compress its indexes to reduce storage space and speed up queries.
  5. Caching mechanism:

    • Both Elasticsearch and Solr use various caches (such as filter cache, query cache, etc.) to reduce repeated calculations and speed up the response time of common queries.
  6. Query optimization:

    • For example, when you query for multiple terms, the tools may query for the least common term first because it narrows down the result set the fastest.

Therefore, when you query a record that does not exist, due to the inverted index and other optimizations, the search engine can quickly determine that the record does not exist without a full table scan.

These properties make Elasticsearch and Solr ideal tools for fast search and analysis on large datasets. However, like any technology choice, they are not always the best solution. The decision to use these tools must be based on the specific needs of the application.

11.7 Reduce data: regularly archive or delete data that is no longer needed to keep the table size within a manageable range. Is a flash-sale (seckill) product table a good candidate for this kind of regular archiving?

A flash-sale product table typically accumulates data that has expired or is no longer used, because each flash-sale event starts and ends at a specific time, so regularly archiving or deleting this data is indeed a good strategy. Archive data that is no longer active but may need to be referenced later (for example, for historical trend analysis or audit requirements), and delete data that is truly no longer needed; this keeps the table efficient and reduces storage overhead.
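A hedged sketch of such periodic archiving (hypothetical table and column names; the 90-day window is illustrative):

```sql
-- Copy finished flash-sale orders older than 90 days into an archive table
INSERT INTO seckill_order_archive
SELECT *
FROM seckill_order
WHERE status = 'FINISHED'
  AND created_at < NOW() - INTERVAL 90 DAY;

-- Then remove them from the hot table in batches to keep locks and undo logs small
DELETE FROM seckill_order
WHERE status = 'FINISHED'
  AND created_at < NOW() - INTERVAL 90 DAY
LIMIT 10000;
```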


Origin blog.csdn.net/yxg520s/article/details/132635254