Django ORM tuning practice

First, find the main cause of the slow response

Break the work a request performs into pieces by function and print the execution time of each module with time.time(). In most cases the time is mostly spent in a single module, i.e. 80% of the performance problem lives in 20% of the code.

After finding the main cause, focus the optimization on that module.
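For example, each module can be wrapped with a small timing helper. This is a minimal sketch; timed, load_items and process_items are hypothetical names, not part of any particular project:

from functools import wraps
from time import time

def timed(func):
    """Print how long each call to func takes."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time()
        try:
            return func(*args, **kwargs)
        finally:
            print(f'{func.__name__} took {time() - start:.3f}s')
    return wrapper

@timed
def load_items():
    # placeholder for one module of the task
    return list(range(100000))

@timed
def process_items(items):
    # placeholder for another module of the task
    return sum(items)

process_items(load_items())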

Second, use django.db.connection.queries to see the SQL executed by a request

from django.db import connection
...
print(connection.queries)
# [{'sql': <the executed SQL statement>, 'time': <its execution time>}, ...]

Note that connection.queries is only populated when DEBUG = True.
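Because connection.queries accumulates over the life of the connection, django.db.reset_queries() is handy for isolating the queries of the block you are measuring. A small sketch; the Item query is just a placeholder:

from django.db import connection, reset_queries

reset_queries()                       # discard previously recorded queries
list(Item.objects.filter(id=10000))   # the code under investigation (placeholder)
for q in connection.queries:          # only the queries issued since reset_queries()
    print(q['sql'], q['time'])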

Multi-database

django.db.connections is a dictionary-like object: the connection for a given data source is obtained through its alias, e.g. connections['my_db_alias'].

from django.db import connections
for key in connections:
    print(key)
# Prints the alias of every configured data source; Django creates a connection for each one

From django/db/__init__.py:

class DefaultConnectionProxy:
    """
    Proxy for accessing the default DatabaseWrapper object's attributes. If you
    need to access the DatabaseWrapper object itself, use
    connections[DEFAULT_DB_ALIAS] instead.
    """
    def __getattr__(self, item):
        return getattr(connections[DEFAULT_DB_ALIAS], item)

    def __setattr__(self, name, value):
        return setattr(connections[DEFAULT_DB_ALIAS], name, value)

    def __delattr__(self, name):
        return delattr(connections[DEFAULT_DB_ALIAS], name)

    def __eq__(self, other):
        return connections[DEFAULT_DB_ALIAS] == other


connection = DefaultConnectionProxy()

Since DEFAULT_DB_ALIAS = 'default', the connection obtained via from django.db import connection is simply connections['default'].

So, when using multiple databases, the queries list or a cursor for a specific database can be obtained through connections:

from django.db import connections
connections['my_db_alias'].queries
cursor = connections['my_db_alias'].cursor()

Print the total SQL execution time:

sql_time = 0.0
for q in connections['my_db_alias'].queries:
    sql_time += float(q['time'])
print('sql_time', sql_time)

Third, execution speed of different ways to write an update

The table involved holds about 600,000 rows.

If an update does not actually change any data, the recorded SQL execution time drops sharply; the timings below were all recorded with updates that actually modify data.

1. Using a raw SQL query

from time import time
from django.db import connections

cursor = connections['my_db_alias'].cursor()
# Time spent instantiating the cursor is not counted
start = time()
cursor.execute("update item set result=%s, modified_time=Now() where id=%s", (result, 10000))
print(time() - start)
print(connections['my_db_alias'].queries)
# About 0.004s, roughly the same as the SQL execution time

2. Using the ORM update() method

Item.objects.using('my_db_alias').filter(id=10000).update(result=result)
# About 0.008s; the SQL execution time is 0.004s

3. Using the object's save() method

item = Item.objects.using('my_db_alias').filter(id=10000).first()
item.result = result
item.save(using='my_db_alias')
# About 0.012s; the SQL execution time is 0.004s

So for update efficiency: raw SQL > update() > save(). The save() approach is slowest because it first has to SELECT the object and then write all of its fields back, while update() issues a single UPDATE, and raw SQL additionally avoids the ORM overhead.

Fourth, use prefetch_related to reduce database queries

prefetch_related uses a separate query per relationship: it first fetches the rows of table A that match the filter, then queries table B for all related rows in one batch, and finally joins the two result sets in Python.

Suppose we have a blog application with two tables, Blog and Comment, where a blog can have multiple associated comments:

from django.db import models

class Blog(models.Model):
    name = models.CharField(max_length=255)
    author = models.CharField(max_length=100)
    content = models.TextField()

class Comment(models.Model):
    author = models.CharField(max_length=100)
    content = models.TextField()
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE, related_name='comments')

Now suppose we need the content of every comment on blogs named "Django教程" ("Django tutorial").

This example shows how prefetch_related reduces the number of database queries.

def test_prefetch_related():
    blogs = Blog.objects.filter(name="Django教程")
    for blog in blogs:
        comments = Comment.objects.filter(blog_id=blog.id)
        for comment in comments:
            print(comment.content)
    print(len(blogs)) # 34
    print(len(connection.queries)) # 39

There are 34 blogs matching the specified name, and to get each blog's comments the Comment table is queried once per blog, 34 Comment queries in total, which is very inefficient. Our goal should be to obtain the required data with one query on the Blog table and one query on the Comment table.

def test_prefetch_related():
    blogs = Blog.objects.filter(name="Django教程").prefetch_related('comments')
    for blog in blogs:
        for comment in blog.comments.all():
            print(comment.content)
    print(len(blogs)) # 34
    print(len(connection.queries)) # 6
    for query in connection.queries:
        print(query)

The number of SQL statements issued drops from 39 to 6.

Specifically:

{'sql': 'SELECT @@SQL_AUTO_IS_NULL', 'time': '0.000'}
{'sql': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED', 'time': '0.000'}
{'sql': 'SELECT VERSION()', 'time': '0.000'}
{'sql': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED', 'time': '0.000'}

# Find all blog posts matching the filter condition
{'sql': "SELECT `blog`.`id`, `blog`.`name`, `blog`.`author`, `blog`.`content`  FROM `blog` WHERE `blog`.`name` = 'Django教程'", 'time': '0.014'}

# Fetch the corresponding comments using the blog post ids found above
{'sql': 'SELECT `comment`.`id`, `comment`.`author`, `comment`.`content`, `comment`.`blog_id` FROM `comment` WHERE `comment`.`blog_id` IN (5160, 1307, 2984, 5147, 5148, 3062, 5148, 5161, 2038, 1923, 2103, 3014, 1466, 2321, 5166, 5154, 1980, 3550, 3542, 5167, 2077, 2992, 3209, 5168, 8855, 1163, 368, 174, 3180, 5168, 8865, 2641, 3224, 4094)', 'time': '0.007'}

This matches our goal.

When the prefetch_related cache is ignored

Note that when working with a QuerySet, once a chained operation changes the underlying database query, the data previously cached by prefetch_related is ignored. Django then queries the database again to fetch the corresponding data, which causes performance problems. "Changing the database query" here means any operation such as filter() or exclude() that ultimately changes the generated SQL.

prefetch_related('comments') implicitly caches the result of blog.comments.all(); since all() does not change the underlying query, it does not trigger a new database request.

However,

for comment in blog.comments.filter(author="jack"):

will cause Django to query the database again.
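This can be checked by counting connection.queries around each loop. A rough sketch, assuming DEBUG = True; "jack" is just an example author value:

from django.db import connection, reset_queries

blogs = Blog.objects.filter(name="Django教程").prefetch_related('comments')

reset_queries()
for blog in blogs:
    list(blog.comments.all())                  # served from the prefetch cache
print(len(connection.queries))                 # roughly 2: one Blog query, one batched Comment query

reset_queries()
for blog in blogs:
    list(blog.comments.filter(author="jack"))  # filter() bypasses the cache
print(len(connection.queries))                 # grows by one query per blog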

Fetch only the fields you need

The content field of a blog post can be very large, and fetching it when it is not needed can hurt performance. The previous query can be further optimized by retrieving only some of the fields of Blog and Comment:

from django.db.models import Prefetch

blogs = Blog.objects.filter(name="Django教程").only('id').\
    prefetch_related(
        Prefetch('comments', queryset=Comment.objects.only('id', 'content', 'blog_id'))
    )

only() restricts the query to the specified fields, and a Prefetch object customizes the queryset used by prefetch_related (the default is queryset=Comment.objects.all()).

Note that the comment.blog_id field must be included: joining each comment to its blog in Python requires matching blog.id against comment.blog_id, so if blog_id is left out of the Prefetch queryset, Django has to issue many extra queries just to load comment.blog_id.
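A quick way to see the effect is to compare query counts with and without blog_id in the Prefetch queryset. A sketch only; as described above, deferring blog_id forces Django to load it separately for each comment:

from django.db import connection, reset_queries
from django.db.models import Prefetch

def count_queries(comment_fields):
    reset_queries()
    blogs = Blog.objects.filter(name="Django教程").only('id').prefetch_related(
        Prefetch('comments', queryset=Comment.objects.only(*comment_fields))
    )
    for blog in blogs:
        list(blog.comments.all())
    return len(connection.queries)

print(count_queries(['id', 'content', 'blog_id']))  # roughly 2 queries
print(count_queries(['id', 'content']))             # extra queries to fetch the deferred blog_id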

Multi-database

When multiple databases are in use, prefetch_related queries the same data source as the main query.

For example:

blogs = Blog.objects.using('my_db_alias').filter(name="Django教程").only('id').\
    prefetch_related(
        Prefetch('comments', queryset=Comment.objects.only('id', 'content', 'blog_id'))
    )

The Comment table will be queried on the same data source as the Blog table.

Fifth, use bulk_create when inserting data into the database

# The following code issues 10 separate database inserts:
for i in range(10):
    Comment.objects.create(content=str(i), author="kim", blog_id=1)

# The following code issues only one database insert:
comments = []
for i in range(10):
    comments.append(Comment(content=str(i), author="kim", blog_id=1))
Comment.objects.bulk_create(comments, batch_size=5000)

Note:

  1. bulk_create does not return ids: when you bulk insert you don't get the primary keys back.

  2. Watch out for the database connection dropping: inserting too much data in a single statement can cause a "MySQL server has gone away" error. Specifying batch_size=5000 avoids this: when more than 5000 rows are inserted, the data is split across multiple bulk-insert SQL statements.

Sixth, avoid fetching the same data repeatedly

Data read from the database can be cached in an in-memory dictionary keyed by id, so that later uses do not hit the database again, which improves efficiency.
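A minimal sketch of this idea, using the Blog model from above and a plain dictionary as the cache (stale data becomes a concern if rows change while the task runs):

blog_cache = {}  # id -> Blog instance, kept for the lifetime of the task

def get_blog(blog_id):
    """Return the Blog with this id, querying the database only on first access."""
    if blog_id not in blog_cache:
        blog_cache[blog_id] = Blog.objects.get(id=blog_id)
    return blog_cache[blog_id]

first = get_blog(1)   # hits the database
again = get_blog(1)   # served from the dictionary, no query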


Source: www.cnblogs.com/luozx207/p/12163380.html