Django Admin: optimization techniques for massive data

Reprinted from the Pinlue Library: http://www.pinlue.com/article/2020/04/0211/3710102817445.html

Foreword

I recently helped a friend process some stock data, using Django as the web framework.

Django's Admin module is one of the main reasons I prefer Django over Flask.

For small projects, especially ones just for your own use where the Admin UI doesn't need to be fancy, Django Admin saves quite a bit of code.

I wrote a simple Django Admin page to sync five-minute stock data, and along the way solved two small problems. Since I had some spare time, I'm writing up the optimization ideas here for anyone who might find them useful.

0x01 Where it all started

When the data grew to somewhere between fifty and one hundred million rows, I opened the corresponding Django Admin page to look at the data. The page took about half a minute to load, which is painfully slow.

For context, my ModelAdmin is written like this:

@admin.register(Stock5Min)
class Stock5MinAdmin(ReadOnlyAdminMixin, admin.ModelAdmin):
    list_display = (
        "stock_name",
        "code",
        "datetime",
        "date",
        "open",
        "high",
        "low",
        "close",
    )

    def stock_name(self, instance):
        return instance.stock.name

OK, time to track down the problem.

Opening Developer Tools and judging from the response content, the HTML is not overly large and there's no JS infinite loop or memory leak, so frontend issues are ruled out.

Install django-debug-toolbar to get some instrumentation.

From the SQL panel it's clear that loading the page is basically stuck on SQL. Click into the panel to see the details.

There are two problems:

Problem 1: counting massive data

Two SQL statements (highlighted in red and blue) clearly stand out as the hard parts; expanding them shows that both are count queries.

Each count requires a full table scan, so of course it's slow. Being slow is one problem; what's even more embarrassing is that it runs twice.

Problem 2: the N+1 problem

There are 105 similar, non-duplicated queries: the classic symptom of an ORM N+1 problem.

0x02 Solving the problems

OK, let's get started.

Counting massive data

Expand the relevant code from the stack trace in django-debug-toolbar.

Following the code, there are two count calls that need optimizing:

# odin-py3.7/lib/python3.7/site-packages/django/contrib/admin/views/main.py
class ChangeList:
    def get_results(self, request):
        paginator = self.model_admin.get_paginator(
            request, self.queryset, self.list_per_page
        )
        result_count = paginator.count  # count #1 -- optimization point
        # Get the total number of objects, with no admin filters applied.
        if self.model_admin.show_full_result_count:
            full_result_count = self.root_queryset.count()  # count #2 -- optimization point
        else:
            full_result_count = None

Count #2 looks easier, so start there: add the following to the ModelAdmin to skip it entirely.

show_full_result_count = False

Then optimize count #1. We only need the paginator to report a plausible number, and since the real count is close to 100 million, specify the following paginator in the ModelAdmin:

class LargeTablePaginator(Paginator):
    def _get_count(self):
        return 100000000

    count = property(_get_count)
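Mechanically, the trick is just overriding the paginator's count so the expensive query never runs; the custom class is then hooked into the ModelAdmin via its paginator attribute (paginator = LargeTablePaginator). Here is a plain-Python sketch of why the override short-circuits the query; slow_real_count is a hypothetical stand-in for the full-table SELECT COUNT(*):

```python
def slow_real_count(object_list):
    # Stand-in for the real SELECT COUNT(*) full table scan.
    return len(object_list)

class BasePaginator:
    """Minimal sketch of Django's Paginator count machinery."""
    def __init__(self, object_list):
        self.object_list = object_list

    def _get_count(self):
        # Django's real Paginator issues SELECT COUNT(*) here.
        return slow_real_count(self.object_list)

    count = property(_get_count)

class LargeTablePaginator(BasePaginator):
    def _get_count(self):
        # Hard-coded estimate: the expensive count is never executed.
        return 100000000

    count = property(_get_count)

print(LargeTablePaginator([]).count)  # 100000000
print(BasePaginator([1, 2, 3]).count)  # 3
```

Because the subclass redefines both _get_count and the count property, any caller reading paginator.count gets the constant immediately.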

With that, the page that used to take 40s+ now takes only 6s.

At this point you may want to jump up and object: this "optimization" is brain-dead. How can you just hard-code the count? Isn't that dodging the problem?

Dodging is shameful, but it works.

Just kidding. I know exactly what you're thinking; hold that objection for now. Let's deal with the remaining 6s first and come back to it later.

Solving the N+1 problem

The remaining time is filled with similar repeated queries, which is eight or nine times out of ten the N+1 problem. Every call to instance.stock.name fetches the stock from the database, so rendering the page hits the database once per row. Each individual query is fast, but the sheer number of round trips is itself a waste.

There are basically two kinds of solutions to N+1:

Django's built-in select_related, which pulls in the related table with a JOIN

Django's built-in prefetch_related, which fetches the stocks ahead of time to reduce the number of database hits
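To make the difference concrete, here is a plain-Python sketch of the two access patterns; fetch_stock, ROWS, and the hit counter are all made up for illustration and just stand in for per-query database round trips:

```python
db_hits = {"n": 0}

STOCKS = {1: "AAPL", 2: "GOOG"}
ROWS = [{"stock_id": 1}, {"stock_id": 2}, {"stock_id": 1}] * 35  # 105 rows

def fetch_stock(stock_id):
    # Stand-in for one database round trip.
    db_hits["n"] += 1
    return STOCKS[stock_id]

def names_n_plus_one():
    # N+1 style: one database hit per row.
    return [fetch_stock(r["stock_id"]) for r in ROWS]

def names_batched():
    # select_related / prefetch_related style: fetch each related
    # stock once up front, then resolve rows from the local lookup.
    ids = {r["stock_id"] for r in ROWS}
    lookup = {i: fetch_stock(i) for i in ids}
    return [lookup[r["stock_id"]] for r in ROWS]

db_hits["n"] = 0
names_n_plus_one()
print(db_hits["n"])  # 105 hits

db_hits["n"] = 0
names_batched()
print(db_hits["n"])  # 2 hits
```

Same output, two database hits instead of 105.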

Skimming the official docs, admin.ModelAdmin supports the first option directly:

list_select_related = ["stock"]

With that, the page that previously took 6s now opens in under 1s, with database time under 200ms.

0x03 Four quick-count schemes

Now, back to the problem we shelved earlier.

Think about it: is an exact count really that important here?

Actually, no. If the table holds 100 million rows now and a few hundred more three minutes from now, displaying "about 100 million" the whole time causes no trouble at all.

Clearly, in this scenario, it's not a problem.

So scheme 1, which is exactly what we did above, is actually fine.

Scheme 1: hard-code an estimate (the simplest, most direct approach)

def _get_count(self):
    return 100000000

What if you want the number to be a bit closer to reality?

Scheme 2: cache the count and refresh it periodically

Then scheme 2 is more appropriate:

def _get_count(self):
    key = "stock5min"
    count = cache.get(key)
    if not count:
        count = do_count()
        cache.set(key, count, 30 * 60)  # refresh every 30 minutes
    return count
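The caching pattern can be sketched without Django at all. Below, FakeCache is a minimal stand-in for Django's cache backend and do_count for the real, slow SELECT COUNT(*) (all names are made up for illustration):

```python
import time

class FakeCache:
    """Minimal stand-in for Django's cache: stores (value, expiry)."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, timeout):
        self._data[key] = (value, time.monotonic() + timeout)

cache = FakeCache()
calls = {"n": 0}

def do_count():
    # Stand-in for the expensive full-table count.
    calls["n"] += 1
    return 100000000

def get_count():
    count = cache.get("stock5min")
    if not count:
        count = do_count()
        cache.set("stock5min", count, 30 * 60)  # cache for 30 minutes
    return count

print(get_count())  # computed once
print(get_count())  # served from cache
print(calls["n"])   # 1 -- the expensive count ran only once
```

Within the cache window, every page load pays only a cache lookup instead of a table scan.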

What if you don't want to bring in a cache like Redis, but still want something reasonably close to the real count? Doesn't MySQL or Postgres keep table metadata somewhere that I can read to get a rough number?

Yes: that's scheme 3.

Scheme 3: read the count from table metadata

Taking PG as an example, we can read the planner's row estimate from pg_class:

def _get_count(self):
    if getattr(self, "_count", None) is not None:
        return self._count
    query = self.object_list.query
    if not query.where:  # no filters: full-table count, use the estimate
        try:
            cursor = connection.cursor()
            cursor.execute(
                "SELECT reltuples FROM pg_class WHERE relname = %s",
                [query.model._meta.db_table],
            )
            self._count = int(cursor.fetchone()[0])
        except Exception:
            self._count = super()._get_count()
    else:
        self._count = super()._get_count()
    return self._count

Any other schemes? Of course: scheme 4.

Scheme 4: count with a timeout

If the count takes longer than 200ms to execute, fall back to a default number.

def _get_count(self):
    with transaction.atomic(), connection.cursor() as cursor:
        cursor.execute("SET LOCAL statement_timeout TO 200;")
        try:
            return super().count
        except OperationalError:
            return 100000000
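The fallback logic in isolation looks like this. QueryTimeout is a hypothetical stand-in for the OperationalError that Postgres raises when statement_timeout cancels the query; count_with_timeout and the sample counts are made up for illustration:

```python
class QueryTimeout(Exception):
    """Stand-in for the OperationalError raised on statement_timeout."""

def count_with_timeout(run_count, fallback=100000000):
    # Try the real count; if the database kills it for exceeding
    # the timeout, return the hard-coded fallback instead.
    try:
        return run_count()
    except QueryTimeout:
        return fallback

def fast_count():
    return 42

def slow_count():
    raise QueryTimeout("canceling statement due to statement timeout")

print(count_with_timeout(fast_count))  # 42
print(count_with_timeout(slow_count))  # 100000000
```

Small, filtered result sets get an exact count; anything that would trigger a long scan gets the estimate.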

The page now opens in about 1s consistently. Optimization done; mission complete.

 


Origin blog.csdn.net/yihuliunian/article/details/105340917