Advanced model layer topics

Work at the appropriate level

Do work at the level it belongs to. For things like counting, the database level is the fastest (and if you only need to know whether a record exists, `exists()` is faster still).
But note: querysets are lazy, so sometimes letting a higher level (such as the template) control when the queryset actually executes can be more efficient.

The following code fragments illustrate what the different levels mean:

# QuerySet operation on the database
# fast, because that's what databases are good at
my_bicycles.count()

# counting Python objects
# slower, because it requires a database query anyway, and processing
# of the Python objects
len(my_bicycles)

# Django template filter
# slower still, because it will have to count them in Python anyway,
# and because of template language overheads
{{ my_bicycles|length }}
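The count()-versus-len() difference can be sketched with plain sqlite3 (standing in for what Django's ORM sends; this is an illustration, not Django internals): a database-level count returns a single number, while counting in Python first materializes every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bicycle (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO bicycle (name) VALUES (?)",
                 [(f"bike{i}",) for i in range(100)])

# Database-level count: only one integer crosses the driver boundary
db_count = conn.execute("SELECT COUNT(*) FROM bicycle").fetchone()[0]

# Python-level count: all 100 rows are fetched first, then counted
py_count = len(conn.execute("SELECT * FROM bicycle").fetchall())

print(db_count, py_count)  # 100 100
```

Both give the same answer, but the first asks the database to do the work it is good at.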

Understanding QuerySet objects

Slicing

QuerySets support slice syntax, which is equivalent to SQL's LIMIT and OFFSET. Negative indexing is not supported.

print(models.Book.objects.all()[:4])
print(models.Book.objects.all()[4:8])

(0.002) SELECT `app01_book`.`id`, `app01_book`.`name`, `app01_book`.`pub_time`, `app01_book`.`publish_id` FROM `app01_book` LIMIT 4; 
(0.000) SELECT `app01_book`.`id`, `app01_book`.`name`, `app01_book`.`pub_time`, `app01_book`.`publish_id` FROM `app01_book` LIMIT 4 OFFSET 4; 
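The mapping from slices to LIMIT/OFFSET can be checked with plain sqlite3 (a sketch; the table name here is illustrative, not the Django-generated one):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO book (name) VALUES (?)",
                 [(f"book{i}",) for i in range(10)])

all_rows = conn.execute("SELECT id FROM book ORDER BY id").fetchall()

# qs[:4]   ->  LIMIT 4
first_four = conn.execute("SELECT id FROM book ORDER BY id LIMIT 4").fetchall()
# qs[4:8]  ->  LIMIT 4 OFFSET 4
next_four = conn.execute("SELECT id FROM book ORDER BY id LIMIT 4 OFFSET 4").fetchall()

assert first_four == all_rows[:4]
assert next_four == all_rows[4:8]
```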

Iteration

QuerySet objects support iteration.

for book in models.Book.objects.filter(pk__gt=5):
    print(book)

Lazy queries

Lazy evaluation is one of the more important features of QuerySet objects. Look at the following example:

q = models.Book.objects.filter(name__startswith='西')
q = q.filter(pk__gt=5)
q = q.filter(pub_time__year=2018)

print(q)

The example above looks like it queries the database three times, but in fact the query only actually runs when print is executed. After a queryset is created, the ORM only goes to the database to "fetch" values for us when we need the concrete data.

The officially recommended style is to chain the calls:

q = models.Book.objects.filter(
    name__startswith='西'
).filter(
    pk__gt=5
).filter(
    pub_time__year=2018
)

So when is the data actually fetched? The official documentation describes the following cases:

  1. Iteration: i.e. running a for loop over the QuerySet.
  2. Slicing with a step: q = models.Book.objects.all()[5:10:2] executes the database query immediately when a step size is specified in the slice.
  3. Pickling/caching
  4. repr() / str()
  5. len() (note: if you only want the length of the result set, calling the count() method is still the most efficient approach, since it maps to SQL's COUNT())
  6. list()
  7. bool()
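Lazy evaluation can be illustrated with a toy class (all names here are hypothetical; this is a sketch of the idea, not Django's implementation): chained filter() calls only accumulate state, and the "query" runs only when the result is actually demanded.

```python
EXECUTIONS = 0   # stands in for "number of SQL queries sent"

class LazyQuery:
    """Toy queryset: filter() only records a predicate; evaluation is deferred."""
    def __init__(self, rows, filters=()):
        self._rows = rows
        self._filters = filters

    def filter(self, pred):
        # Returns a new query object; nothing is evaluated here
        return LazyQuery(self._rows, self._filters + (pred,))

    def _execute(self):
        global EXECUTIONS
        EXECUTIONS += 1
        out = self._rows
        for pred in self._filters:
            out = [r for r in out if pred(r)]
        return out

    def __iter__(self):            # iteration is one of the evaluation triggers
        return iter(self._execute())

books = [3, 6, 9, 12]
q = LazyQuery(books).filter(lambda x: x > 5).filter(lambda x: x % 3 == 0)
assert EXECUTIONS == 0             # three chained calls, zero "queries" so far
result = list(q)                   # list() forces evaluation
assert EXECUTIONS == 1
assert result == [6, 9, 12]
```

Django's QuerySet works on the same principle, only the deferred work is building and sending SQL.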

Caching mechanism

Every queryset contains a cache to minimize requests to the database; fully understanding the caching mechanism helps us write efficient code.

When we create a new queryset, the first time one of the seven situations described above occurs, the database is queried and the results may be cached (stored on the queryset object). Repeating the same operation on that queryset then no longer goes back to the database for the data.

Compare the following two approaches:

# first approach
print([p.name for p in models.Publish.objects.all()])
print([p.addr for p in models.Publish.objects.all()])

# second approach
q = models.Publish.objects.all()
print([p.name for p in q])
print([p.addr for p in q])

The first approach actually hits the database twice; each QuerySet object is discarded right after it is generated, so the caching mechanism never comes into play.

The second approach queries the database only once: after the first iteration the QuerySet's results are cached, and the second iteration operates on the same QuerySet purely at the Python level.
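The two approaches can be mimicked with sqlite3, counting the SELECT statements actually sent via a trace callback (a sketch of the idea, not Django's cache implementation; table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publish (name TEXT, addr TEXT)")
conn.execute("INSERT INTO publish VALUES ('apress', 'NYC'), ('oreilly', 'SF')")

selects = []
conn.set_trace_callback(
    lambda sql: selects.append(sql) if sql.lstrip().upper().startswith("SELECT") else None
)

# First approach: two throwaway result sets -> two queries
names = [row[0] for row in conn.execute("SELECT name, addr FROM publish")]
addrs = [row[1] for row in conn.execute("SELECT name, addr FROM publish")]
assert len(selects) == 2

# Second approach: fetch once, cache, reuse at the Python level -> one query
selects.clear()
cache = conn.execute("SELECT name, addr FROM publish").fetchall()
names2 = [row[0] for row in cache]
addrs2 = [row[1] for row in cache]
assert len(selects) == 1
```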

Operations that will populate the cache:

[entry for entry in queryset]  # iterating over the whole queryset
bool(queryset)                # boolean evaluation
entry in queryset             # membership test with `in`
list(queryset)                # conversion to a list

Pay particular attention to the cases where caching does not happen:

q = models.Publish.objects.all()

print(q[2:])  # slicing queries the database, but the result is not cached on the original queryset
print(q[2])   # indexing also queries the database, and does not cache either

print(q)      # plain printing does not populate the cache
print(q)

# values() and values_list() do not cache either; both lines below query the database
print(q.values('name', 'addr'))
print(q.values('addr'))

Query Optimization

Several officially recommended optimization strategies:

  • Use the lazy nature of querysets to structure your code so it connects to the database as few times as possible.
  • If a queryset will only be iterated once, use iterator() to avoid consuming too much memory.
  • Push database-level work into the database, e.g. with filter/exclude, F expressions, annotate, and aggregate (which can be understood as GROUP BY).
  • Fetch all the data you want in one go, and don't fetch data you don't need.
    In other words, make clever use of select_related(), prefetch_related(), values_list() and values(). For example, if you only need the id field, values_list('id', flat=True) saves a lot of resources. Or use defer() and only() to skip loading a field / load only certain fields (needing these methods is often a sign the table design should be reconsidered).
  • Without select_related, avoid touching the foreign-key attributes of fetched objects, since each access triggers another query.
  • Use bulk (batch) operations on data, such as bulk_create.
  • When looking up a single record, query on indexed fields: the difference between O(1) or O(log n) and O(n) is still very large.
  • Use count() instead of len(queryset), and exists() instead of `if queryset:`.

Several of these optimizations are covered in more detail below.

For one-to-one fields (OneToOneField) and foreign-key fields (ForeignKey), you can use select_related() to optimize the QuerySet.

select_related() returns a QuerySet that, when it executes its query, follows the foreign-key relationship and fetches the related object's data in the same query. It generates a more complex query with some up-front performance cost, but subsequent use of the foreign-key relationship no longer requires a database query.

Briefly: after calling select_related(), Django fetches the corresponding foreign-key objects along with the main ones, so accessing them later needs no further database query.

Here is how it differs from an ordinary query:

# ordinary query
book = models.Book.objects.filter(pk=2).first()  # type: models.Book
print(book.publish.name)
SELECT
    `app01_book`.`id`,
    `app01_book`.`name`,
    `app01_book`.`pub_time`,
    `app01_book`.`publish_id` 
FROM
    `app01_book` 
WHERE
    `app01_book`.`id` = 2 
ORDER BY
    `app01_book`.`id` ASC 
    LIMIT 1;
    
SELECT
    `app01_publish`.`id`,
    `app01_publish`.`name`,
    `app01_publish`.`addr`,
    `app01_publish`.`pub_detail_id` 
FROM
    `app01_publish` 
WHERE
    `app01_publish`.`id` = 2;

The query above executed two SQL statements in total.

For comparison, here is the same query using select_related:

books = models.Book.objects.filter(pk__lt=4).select_related('publish')
    
for book in books:
    print(book.publish.name)
SELECT
    `app01_book`.`id`,
    `app01_book`.`name`,
    `app01_book`.`pub_time`,
    `app01_book`.`publish_id`,
    `app01_publish`.`id`,
    `app01_publish`.`name`,
    `app01_publish`.`addr`,
    `app01_publish`.`pub_detail_id` 
FROM
    `app01_book`
    INNER JOIN `app01_publish` ON ( `app01_book`.`publish_id` = `app01_publish`.`id` ) 
WHERE
    `app01_book`.`id` < 4;

Because select_related fetched the related field in advance, the subsequent cross-table access did not touch the database again.

select_related also supports following multiple foreign keys: you can keep traversing through related foreign-key fields. The example below spans three tables:

books = models.Book.objects.filter(pk__lt=3).select_related('publish__pub_detail')

for book in books:
    print(book.publish.pub_detail.email)
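What select_related does can be sketched directly in SQL with sqlite3: instead of one extra query per foreign-key access, a single JOIN brings back both tables at once. (The schema below is illustrative, not Django's generated one.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publish (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT, publish_id INTEGER);
INSERT INTO publish VALUES (1, 'apress'), (2, 'oreilly');
INSERT INTO book VALUES (1, 'b1', 1), (2, 'b2', 2), (3, 'b3', 1);
""")

# Without select_related: 1 query for the books + 1 per foreign-key access
books = conn.execute("SELECT id, name, publish_id FROM book WHERE id < 4").fetchall()
lazy_names = [
    conn.execute("SELECT name FROM publish WHERE id = ?", (pid,)).fetchone()[0]
    for _, _, pid in books
]  # three extra round trips here

# With select_related: one JOIN returns book and publisher together
joined = conn.execute("""
    SELECT book.name, publish.name
    FROM book INNER JOIN publish ON book.publish_id = publish.id
    WHERE book.id < 4
""").fetchall()
join_names = [pub for _, pub in joined]

assert lazy_names == join_names
```

Same data either way; the JOIN version just gets it in a single query.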

Summary:

  1. select_related mainly optimizes one-to-one and many-to-one relationships.
  2. select_related optimizes with SQL JOIN statements, improving performance by reducing the number of SQL queries.
  3. You can pass the field names to optimize to select_related as variable-length arguments, and connect field names with a double underscore "__" for recursive (multi-level) lookups.
  4. Fields that were not specified are not cached; if you access them, Django performs another SQL query.

For many-to-many fields (ManyToManyField) and reverse many-to-one relations, you can use prefetch_related() to optimize.

prefetch_related() has a very similar goal to select_related(): reduce the number of SQL queries. But the implementation differs. select_related solves the problem inside SQL with JOINs; for many-to-many relationships, solving it with a JOIN is unwise, because the joined result table would be very long, increasing both memory usage and SQL running time. If there are n objects and object i has Mi related rows in the many-to-many field, the result table contains M1 + M2 + ... + Mn rows.

prefetch_related()'s approach is to query each table separately and then stitch the relationships together in Python.

# only two database queries in total
books = models.Book.objects.prefetch_related('authors')
for book in books:
    print(book.authors.all())
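The strategy prefetch_related uses — one query per table, then joining in Python — can be sketched with sqlite3 (schema and names here are illustrative):

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book_authors (book_id INTEGER, author_id INTEGER);
INSERT INTO book VALUES (1, 'b1'), (2, 'b2');
INSERT INTO author VALUES (1, 'a1'), (2, 'a2');
INSERT INTO book_authors VALUES (1, 1), (1, 2), (2, 2);
""")

# Query 1: the books themselves
books = conn.execute("SELECT id, name FROM book").fetchall()

# Query 2: all related authors for those books, fetched with one IN (...) query
ids = [b[0] for b in books]
placeholders = ",".join("?" * len(ids))
rows = conn.execute(f"""
    SELECT ba.book_id, a.name
    FROM book_authors ba JOIN author a ON a.id = ba.author_id
    WHERE ba.book_id IN ({placeholders})
""", ids).fetchall()

# The "join" happens in Python: exactly two queries total, no giant result table
authors_by_book = defaultdict(list)
for book_id, author_name in rows:
    authors_by_book[book_id].append(author_name)

assert sorted(authors_by_book[1]) == ['a1', 'a2']
assert sorted(authors_by_book[2]) == ['a2']
```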

defer and only

only(*fields): returns a queryset whose objects load only the fields listed in the call.

defer(*fields): the opposite of only; returns a queryset optimized to load every field except the ones listed.

With both methods you can still access the excluded fields, but doing so requires another database query.

books = models.Book.objects.only('name', 'pk')
books2 = models.Book.objects.values('name', 'pk')
books3 = models.Book.objects.defer('id')
print(books)
print(books2)
print(books3)
SELECT `app01_book`.`id`, `app01_book`.`name` FROM `app01_book` LIMIT 21;
SELECT `app01_book`.`name`, `app01_book`.`id` FROM `app01_book` LIMIT 21;
SELECT `app01_book`.`id`, `app01_book`.`name`, `app01_book`.`pub_time`, `app01_book`.`publish_id` FROM `app01_book` LIMIT 21;

From the SQL statements above, values and only execute the same query, but only returns a queryset of model objects while values returns a queryset of dictionaries. defer works on the same principle as only, just with the field set inverted. So if you only need a small amount of data but still want model instances, these two methods fit well.

only and defer cannot optimize across tables. In the example below, with many books, a query is executed per book, which is very inefficient:

books = models.Book.objects.only('pk', 'publish')
for book in books:
    print(book.pk, book.publish.name)

Transaction Optimization

Transactions don't just guarantee data safety; they are also very useful for performance. Wrapping work in a transaction sidesteps Django's default autocommit behavior and avoids committing to the database frequently, which can give a nice performance boost.

The syntax for opening a transaction in Django is very simple:

from django.db import transaction
with transaction.atomic():
    pass
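The underlying idea — group many writes into one commit, and roll them all back together on error — can be sketched with sqlite3's own transaction context manager (an illustration of what atomic() guarantees, not Django's implementation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

# Many writes, one transaction, a single COMMIT at the end
with conn:                       # sqlite3's context manager commits on success
    for i in range(100):
        conn.execute("INSERT INTO t VALUES (?)", (i,))

# On an exception, everything inside the block rolls back together
try:
    with conn:
        conn.execute("INSERT INTO t VALUES (999)")
        raise sqlite3.IntegrityError("simulated failure")
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
assert count == 100              # the failed block left no trace
```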

Batch operations

QuerySets offer many bulk operations, such as delete(), update(), and bulk_create().

These correspond to batch operations at the database level, and batching where possible effectively avoids flooding the database with frequent requests.

details = [models.PublishDetail(email=f'email{i}') for i in range(5)]

# one query per object: 5 INSERT statements in total
for d in details:
    d.save()

# batch alternative: bulk_create issues a single query
models.PublishDetail.objects.bulk_create(details)

The loop above needs 5 database requests for 5 objects, while bulk_create needs only one. The more data there is, the more obvious the efficiency gap becomes.
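The difference can be sketched with sqlite3: the loop issues one INSERT per row, while a single multi-row INSERT (the shape of statement bulk_create generates) writes all rows in one request. (Table name illustrative.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publish_detail (id INTEGER PRIMARY KEY, email TEXT)")

emails = [f"email{i}" for i in range(5)]

# Loop: one INSERT statement per row -> 5 round trips
for e in emails:
    conn.execute("INSERT INTO publish_detail (email) VALUES (?)", (e,))

# Batch: a single multi-row INSERT handles all 5 rows in one statement
placeholders = ",".join(["(?)"] * len(emails))
conn.execute(f"INSERT INTO publish_detail (email) VALUES {placeholders}", emails)

count = conn.execute("SELECT COUNT(*) FROM publish_detail").fetchone()[0]
assert count == 10   # 5 rows from the loop + 5 from the single batch statement
```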

Origin: www.cnblogs.com/yscl/p/11609940.html