Reprinted from the Pinlue Library: http://www.pinlue.com/article/2020/04/0211/3710102817445.html
Foreword
I recently helped a friend process some stock data, using Django as the web framework.
Django's Admin module is one of the main reasons I prefer Django over Flask.
For small projects, especially ones for your own use where the Admin interface doesn't need polish, Django Admin saves quite a bit of code.
I needed to sync some five-minute stock data and wrote a simple Django Admin page for it. Along the way I solved two small problems, and since I haven't updated this column in a while, here is a quick write-up in the hope that it offers some useful optimization ideas.
0x01 Where the trouble began
When the data volume grew to somewhere between fifty and one hundred million rows, I opened the corresponding Django Admin page to browse the data. The page took about half a minute to load, which counts as painfully slow.
I should explain that my ModelAdmin was written like this:
@admin.register(Stock5Min)
class Stock5MinAdmin(ReadOnlyAdminMixin, admin.ModelAdmin):
    list_display = (
        "stock_name",
        "code",
        "datetime",
        "date",
        "open",
        "high",
        "low",
        "close",
    )

    def stock_name(self, instance):
        return instance.stock.name
OK, time to track down the problem.
Opening the Developer Tools and judging from the response, the HTML wasn't oversized and there was no sign of a JS infinite loop or memory leak, so front-end causes were ruled out.
Install django-debug-toolbar for profiling.
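For reference, a minimal setup sketch for django-debug-toolbar, assuming a standard Django project layout; check the version-specific django-debug-toolbar documentation before relying on it:

```python
# settings.py -- minimal django-debug-toolbar configuration (sketch)

INSTALLED_APPS = [
    # ... your existing apps ...
    "debug_toolbar",
]

MIDDLEWARE = [
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    # ... the rest of your middleware ...
]

# The toolbar only renders for requests coming from these client IPs.
INTERNAL_IPS = ["127.0.0.1"]

# urls.py -- mount the toolbar's own URLs
# from django.urls import include, path
# urlpatterns = [path("__debug__/", include("debug_toolbar.urls"))] + urlpatterns
```

With this in place, the toolbar's SQL panel shows every query the page issues, with timings and stack traces.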
The SQL panel made it clear the page load was basically stuck on SQL; clicking into the panel gives the details.
There are two problems:
Problem 1: counting a massive table
The two SQL statements highlighted in red and blue were clearly the hard ones, and expanding them showed that both were executing a count.
A count requires a full table scan every time, so of course it's slow. Slow is one problem; more embarrassingly, it runs twice.
Problem 2: the N + 1 problem
The query count was 105, with many similar but not identical statements — the classic symptom of the ORM N + 1 problem.
0x02 Solving the problems
OK, let's get started.
Counting a massive table
Expanding the stack trace in django-debug-toolbar leads to the relevant code.
Following the code, there are two count call sites worth optimizing:
# odin-py3.7/lib/python3.7/site-packages/django/contrib/admin/views/main.py
class ChangeList:
    def get_results(self, request):
        paginator = self.model_admin.get_paginator(
            request, self.queryset, self.list_per_page
        )
        result_count = paginator.count  # count1 optimization point

        # Get the total number of objects, with no admin filters applied.
        if self.model_admin.show_full_result_count:
            full_result_count = self.root_queryset.count()  # count2 optimization point
        else:
            full_result_count = None
count2 looks like the easier one: add the following to the ModelAdmin and count2 is skipped entirely:
show_full_result_count = False
Then optimize count1. All the paginator really needs is a plausible total, and since the row count is close to 100 million, specifying the following paginator in the ModelAdmin is enough:
class LargeTablePaginator(Paginator):
    def _get_count(self):
        return 100000000

    count = property(_get_count)
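Why does overriding `_get_count` work? The paginator computes `count` lazily from the queryset, so replacing it short-circuits the expensive `SELECT COUNT(*)`. A Django-free sketch of the same pattern (the `SlowPaginator`/`FastPaginator` names are invented for illustration):

```python
class SlowPaginator:
    """Stand-in for a paginator that counts by scanning everything."""

    def __init__(self, rows):
        self.rows = rows

    def _get_count(self):
        # Imagine this being a full table scan over ~10^8 rows.
        return sum(1 for _ in self.rows)

    count = property(_get_count)


class FastPaginator(SlowPaginator):
    """Override the count with a constant, as LargeTablePaginator does."""

    def _get_count(self):
        return 100_000_000

    count = property(_get_count)


slow = SlowPaginator(range(1000))
fast = FastPaginator(range(1000))
print(slow.count)  # 1000 -- real scan
print(fast.count)  # 100000000 -- hardcoded estimate, no scan
```

Redefining `count = property(_get_count)` in the subclass is what makes the `count` attribute pick up the overridden method on older-style paginators.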
With that, a page that used to need 40s+ now takes only 6s.
At this point you may jump in cleverly: "This 'optimization' is braindead — how can you just hardcode the count? Isn't that dodging the problem?"
But avoidance, while shameful, is useful.
Just kidding — the author already knows what you're thinking. Hold that thought; we'll come back to the remaining 6s after dealing with the other problem.
Solving the N + 1 problem
A page full of similar, repeated queries is, eight or nine times out of ten, the N + 1 problem: every call to instance.stock.name fetches the stock from the database, producing one database hit per row. Each hit is quick, but the sheer number of round-trips is itself a waste.
There are essentially two solutions to the N + 1 problem:
Django's built-in select_related, which pulls in the relation within the same query via a SQL JOIN
Django's built-in prefetch_related, which fetches the related stocks in advance in a separate query, reducing the number of database hits
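To make the difference concrete, here is a Django-free sketch that counts "database hits": the naive per-row lookup issues one query per row (the N in N + 1), while batching the lookups issues a single extra query. The `FakeDB` class and its methods are invented purely for illustration:

```python
class FakeDB:
    """Toy stand-in for a database that counts how often it is hit."""

    def __init__(self):
        self.stocks = {i: f"stock-{i}" for i in range(100)}
        self.hits = 0

    def get_stock_name(self, stock_id):
        self.hits += 1  # one round-trip per call
        return self.stocks[stock_id]

    def get_stock_names(self, stock_ids):
        self.hits += 1  # one round-trip for the whole batch
        return {i: self.stocks[i] for i in stock_ids}


rows = list(range(100))  # pretend these are the Stock5Min rows on the page

# N + 1 style: the equivalent of instance.stock.name inside a loop
db = FakeDB()
names = [db.get_stock_name(i) for i in rows]
print(db.hits)  # 100

# prefetch style: fetch all related stocks in one extra query
db = FakeDB()
lookup = db.get_stock_names(rows)
names = [lookup[i] for i in rows]
print(db.hits)  # 1
```

select_related goes one step further than the batch lookup above: it folds the related fetch into the original query with a JOIN, so there is no extra query at all.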
Flipping through the official docs, admin.ModelAdmin supports the first option directly:
list_select_related = ["stock"]
With that, the page that needed 6s now opens in under 1s, with database time under 200ms.
0x03 Four schemes for fast counts
Good. Now back to the problem we shelved earlier.
Think about it: is an exact count really that important here?
Not particularly. In other words, if the table holds one hundred million rows now and a few thousand more three minutes later, it is still, for all practical purposes, one hundred million rows.
Clearly, in this scenario, an estimate is perfectly fine.
So Scheme 1 — the one we already used — is actually fine.
Scheme 1: hardcode a human estimate. The simplest and most direct approach:
def _get_count(self):
    return 100000000
If you want the number to look a bit more real, what then?
Scheme 2: periodically cache the count
Scheme 2 is then more appropriate:
def _get_count(self):
    key = "stock5min"
    count = cache.get(key)
    if not count:
        count = do_count()
        cache.set(key, count, 30 * 60)  # cache for 30 minutes
    return count
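The snippet above relies on Django's cache framework. Without Redis or any external cache, the same idea fits in a few lines as a process-local expiring cache; every name below (`ExpiringCount`, `do_count`, `ttl`) is invented for illustration:

```python
import time


class ExpiringCount:
    """Cache an expensive count for `ttl` seconds, process-locally."""

    def __init__(self, do_count, ttl=30 * 60):
        self.do_count = do_count
        self.ttl = ttl
        self._value = None
        self._expires_at = 0.0

    def get(self):
        now = time.monotonic()
        if self._value is None or now >= self._expires_at:
            self._value = self.do_count()  # the expensive COUNT(*)
            self._expires_at = now + self.ttl
        return self._value


calls = []
counter = ExpiringCount(lambda: calls.append(1) or 100_000_000, ttl=60)
print(counter.get())  # computes once: 100000000
print(counter.get())  # served from cache; do_count not called again
print(len(calls))     # 1
```

Note the trade-off versus a shared cache like Redis: this cache is per-process, so each worker process recomputes the count once per TTL instead of sharing one cached value.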
If you say: I don't want to pull in a cache like Redis, but I still want a number reasonably close to the real one — don't MySQL or Postgres keep some metadata I can read to get a rough count?
Yes. Scheme 3.
Scheme 3: read the count from table metadata
Taking PG as an example, Scheme 3 reads pg_class. Note that reltuples is the planner's row estimate, maintained by VACUUM and ANALYZE, so it is approximate by design:
def _get_count(self):
    if getattr(self, "_count", None) is not None:
        return self._count
    query = self.object_list.query
    if not query.where:  # only valid when counting the whole table, unfiltered
        try:
            cursor = connection.cursor()
            cursor.execute(
                "SELECT reltuples FROM pg_class WHERE relname = %s",
                [query.model._meta.db_table],
            )
            self._count = int(cursor.fetchone()[0])
        except Exception:
            self._count = super()._get_count()
    else:
        self._count = super()._get_count()
    return self._count
Any other schemes? Of course — Scheme 4.
Scheme 4: count with a statement timeout
If the count takes longer than 200ms to execute, fall back to a default number:
def _get_count(self):
    with transaction.atomic(), connection.cursor() as cursor:
        # PostgreSQL: abort any statement in this transaction running longer than 200 ms
        cursor.execute("SET LOCAL statement_timeout TO 200;")
        try:
            return super().count
        except OperationalError:
            return 100000000
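The snippet above leans on PostgreSQL's server-side `statement_timeout`, which is the right tool in production. The fall-back-on-timeout pattern itself can be sketched in plain Python with a worker thread; all names here are invented for illustration, and note the caveat that the abandoned thread keeps running in the background, which a real database-side timeout avoids:

```python
import threading


def count_with_timeout(do_count, timeout, fallback):
    """Run do_count(); if it exceeds `timeout` seconds, return `fallback`."""
    result = {}

    def worker():
        result["value"] = do_count()

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():  # still counting: give up and use the estimate
        return fallback
    return result["value"]


fast = count_with_timeout(lambda: 42, timeout=0.5, fallback=100_000_000)
slow = count_with_timeout(
    lambda: __import__("time").sleep(1) or 42, timeout=0.1, fallback=100_000_000
)
print(fast)  # 42 -- finished in time
print(slow)  # 100000000 -- timed out, estimate returned
```

The structure mirrors Scheme 4: try the exact count, but cap how long you're willing to wait for it.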
The page now opens in a stable ~1s. Optimization done — mission complete.