001. SQL slow query troubleshooting (inconsistent field types)

One: background

  • After the new business went live, a partner team noticed that once our engine finished executing, the callback to them took significantly longer than before. They asked us to investigate the reason.

Two: The investigation process

2.1: Locating the slow SQL query

  • I first traced the logs and found one SQL statement that took about 3 minutes to execute, so I concluded that a slow query was the cause.

  • The SQL being executed was:

    select overdue_total from nc_cases where status=0 and uid = 12345678990; 
    
  • I checked the execution plan with EXPLAIN: the uid index was not used, a different index was chosen instead, and more than 5 million rows were scanned.

  • I then checked the column definition: uid is a varchar, but my query compared it with a numeric literal, so the uid index could not be used.

  • After changing the value in the SQL to a string literal and checking the plan again, the uid index was used and only 7 rows were scanned.

  • The conclusion: the uid value in the SQL I wrote had the wrong type, and that caused the slow query.
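
  • Why the index cannot help here: when a string column is compared with a number, MySQL compares the two as numbers, and many different stored strings convert to the same number. The sketch below imitates that conversion in plain Python. The function mysql_like_number and the sample values are made up for illustration, and the regex is only a rough approximation of MySQL's string-to-number conversion, not its actual code.

```python
import re

# Rough approximation (an assumption, not MySQL's real implementation):
# skip leading whitespace, take the longest numeric prefix, and treat a
# string with no numeric prefix as 0.
_NUMERIC_PREFIX = re.compile(r"\s*([+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?)")


def mysql_like_number(text):
    """Convert a stored string to a number roughly the way MySQL would."""
    match = _NUMERIC_PREFIX.match(text)
    return float(match.group(1)) if match else 0.0


# Hypothetical varchar values stored in an indexed column.
stored_uids = ["123", "0123", " 123", "123abc", "1230"]

# WHERE uid = 123 (numeric literal): every row has to be converted and
# compared, because several distinct strings are "equal" to 123.
matches = [uid for uid in stored_uids if mysql_like_number(uid) == 123]
print(matches)
```

    With a string literal (uid = '123') only the exact string matches, so the index on the varchar column can be used for a direct lookup.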

2.2: Why the string became a number at the Python level

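  • With the code as it was written, I found that even though I call str(uid) on the right side of %, the value still reaches the database as a number: %s inserts the digits without quotes, so the SQL contains a numeric literal.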
    sql = "select overdue_total from nc_cases where status=0 and uid = %s" % str(uid)
    rows = db.collection.select(sql)
    
  • I changed the code to put single quotes around %s. In testing, the query ran much faster.
    sql = "select overdue_total from nc_cases where status=0 and uid = '%s'" % str(uid)
    rows = db.collection.select(sql)
    
  • Since I am using the DBUtils tool, I also checked the officially recommended usage.
    • There, select accepts not only the SQL but also a list of values, and each value is written into the SQL with the type it was passed in as.
    • In the following call, for example, the string in the list stays a string in the final SQL statement. Its type does not change.
    sql = "select overdue_total from nc_cases where status=0 and uid = %s"
    rows = db.collection.select(sql, ["12345678990"])
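
  • To see what the database receives in each of the three variants, here is a small sketch using Python's built-in sqlite3 module as a stand-in. This is an assumption for illustration only: the article's database and db.collection wrapper are not used, sqlite3 takes ? as its placeholder instead of %s, and SQLite's typeof() is used just to show the type of the value that arrives.

```python
import sqlite3

uid = 12345678990
conn = sqlite3.connect(":memory:")

# Variant 1: plain string formatting. The digits land in the SQL unquoted,
# so the database parses them as a number even though str() was called.
formatted_sql = "select typeof(%s)" % str(uid)
formatted_type = conn.execute(formatted_sql).fetchone()[0]

# Variant 2: quotes around %s turn the value into a string literal.
quoted_sql = "select typeof('%s')" % str(uid)
quoted_type = conn.execute(quoted_sql).fetchone()[0]

# Variant 3: parameter binding. The driver passes the value separately and
# keeps the Python type (str here), with no manual quoting.
bound_type = conn.execute("select typeof(?)", [str(uid)]).fetchone()[0]

print(formatted_sql, "->", formatted_type)
print(quoted_sql, "->", quoted_type)
print("select typeof(?) with a str parameter ->", bound_type)

conn.close()
```

    Passing the value as a parameter has a second benefit over quoting %s by hand: the value is never spliced into the SQL text, which avoids SQL injection when it comes from outside.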
    

2.3: A pitfall when measuring SQL execution time in Python

  • Having found the cause, I wanted to verify the SQL execution time from Python, so I wrote a script to compare the two statements.
    import time

    sql_3 = "select overdue_total from nc_cases where status=0 and xiaoying_uid = %s " % str(12345678990)
    sql_4 = "select overdue_total from nc_cases where status=0 and xiaoying_uid = '%s' " % str(12345678990)
    
    start_time3 = time.time()
    rows_3 = db.collection.select(sql_3)
    end_time3 = time.time()
    print(sql_3 + " use time is {}".format(end_time3 - start_time3))
    
    start_time4 = time.time()
    rows_4 = db.collection.select(sql_4)
    end_time4 = time.time()
    print(sql_4 + " use time is {}".format(end_time4 - start_time4))
    
  • When I ran it, the two execution times were almost the same. Why?
  • Tracing the underlying code of the SQL tool showed that select() returns a generator object, so the query is not executed at all until the results are iterated.
  • So I added a loop over the results to the test code:
    start_time3 = time.time()
    rows_3 = db.collection.select(sql_3)
    for row in rows_3:
        overdue_total = row[0]
        print(overdue_total)
    end_time3 = time.time()
    print(sql_3 + " use time is {}".format(end_time3 - start_time3))
    
    start_time4 = time.time()
    rows_4 = db.collection.select(sql_4)
    for row in rows_4:
        overdue_total = row[0]
        print(overdue_total)
    end_time4 = time.time()
    print(sql_4 + " use time is {}".format(end_time4 - start_time4))
    
  • This time the two execution times were very different.
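
  • The effect can be reproduced without a database. In the sketch below, lazy_select is a hypothetical stand-in for a select() that returns a generator, and time.sleep simulates the query cost. None of these names come from the SQL tool itself.

```python
import time

executed = []  # records each SQL statement at the moment it really runs


def lazy_select(sql, delay=0.2):
    """Hypothetical select(): a generator, so the body runs only on iteration."""
    time.sleep(delay)      # simulated database round trip
    executed.append(sql)
    yield (42,)            # one fake result row


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Calling the function only creates the generator; nothing has run yet.
rows, call_only = timed(lambda: lazy_select("select 1"))
ran_before_iteration = bool(executed)

# Consuming the generator is what actually triggers the "query".
_, with_iteration = timed(lambda: list(rows))

print("call only:      {:.4f}s, query ran: {}".format(call_only, ran_before_iteration))
print("with iteration: {:.4f}s, query ran: {}".format(with_iteration, bool(executed)))
```

    When timing a wrapper like this, it is safer to put the iteration (or a list(...) call) inside the timed region, as in the corrected test script above.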

Three: Summary

  • For a slow SQL query, first use EXPLAIN to check which index is used and how many rows are scanned.
  • When running SQL from Python, pass values in the officially recommended parameterized form instead of formatting them into the SQL string.
  • Calling select from Python does not necessarily execute the SQL. If the underlying wrapper returns a generator object, the database is only queried when the results are iterated.

Origin blog.csdn.net/qq_41341757/article/details/128862531