Two Suggestions for Writing Loops in Python | Goose Factory in Practice

Author: Tencent Technology Engineering
Link: https://zhuanlan.zhihu.com/p/68128557
Source: Zhihu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.

Loops are a common control structure in programs. We often say that one of a machine's biggest advantages over a human is that the machine can repeat a task around the clock, while a person cannot. The "loop" is the key concept that lets machines do repetitive work.

When it comes to loop syntax, Python is both traditional and untraditional. It abandons the common three-part for (init; condition; increment) structure, yet it still uses the two classic keywords for and while to express loops. In most cases, our looping needs can be met with for <item> in <iterable>; by comparison, while <condition> is used less often.
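
As a rough illustration of the paragraph above, the sketch below writes the same countdown twice: once as a literal translation of a C-style counter loop, and once in the more usual for ... in form. The function names are made up for this example.

```python
# C-style `for (i = start; i > 0; i--)` has no direct Python spelling.

def countdown_with_while(start):
    """Literal translation: manage the counter by hand."""
    result = []
    i = start
    while i > 0:
        result.append(i)
        i -= 1
    return result


def countdown_with_for(start):
    """The usual Python spelling: let range() produce the values."""
    result = []
    for i in range(start, 0, -1):
        result.append(i)
    return result
```

Both functions return the same list; the for version simply moves the counter bookkeeping into the iterable.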

Although loop syntax is simple, writing loops well is not easy. In this article, we will explore what "authentic" loop code is and how to write it.


What is an "authentic" loop?

"Authentic" is a word often used to describe someone doing something in a way that is true to local tradition and doing it very well. For example, say you go to a gathering of friends and there is a Cantonese person at your table. As soon as she opens her mouth, every sentence comes out in a standard Beijing accent, with perfect erhua (the rhotic "-r" ending). Then you can say to her: "Your Beijing dialect is really authentic."

Since "authentic" usually describes real-life things such as accents or the flavor of a dish, what does "authentic" loop code mean? Let me explain with a classic example.

If you ask someone who has been learning Python for only a month, "How do I get the current index while traversing a list?", they might hand you this code:

index = 0
for name in names:
    print(index, name)
    index += 1

Although the loop above is correct, it is not "authentic". A developer with three years of Python experience would say the code should look like this:

for i, name in enumerate(names):
    print(i, name)

enumerate() is a Python built-in function. It takes an iterable object as its argument and returns a new iterable that keeps yielding (current index, current element) pairs. It is the best fit for this scenario.

So, in the example above, we consider the second piece of loop code more "authentic" than the first, because it gets the job done more cleverly with more intuitive code.
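
To make the behavior of enumerate() concrete, here is a small sketch. The sample names are made up; the optional start argument is part of the built-in function's signature.

```python
names = ["alice", "bob", "carol"]  # sample data, made up for illustration

# enumerate() yields (index, element) pairs lazily; list() collects them
pairs = list(enumerate(names))

# the optional `start` argument changes the first index (the default is 0)
numbered = list(enumerate(names, start=1))
```

Here pairs begins with (0, "alice") and numbered begins with (1, "alice"), which is handy when the index is shown to people rather than used for lookups.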

The programming idea behind enumerate()

However, whether a piece of loop code is authentic is not judged merely by whether its author knows a particular built-in function. We can dig something deeper out of the example above.

As you can see, Python's for loop has only the for <item> in <iterable> structure, and the first half of that structure, assigning to item, leaves little room for tricks. So the iterable in the second half is the only place where we can really do something. "Modifier functions", with enumerate() as their representative, offer exactly this idea: optimize the loop itself by modifying the iterable.
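
The same idea applies to other built-ins that wrap an iterable. The sketch below uses zip() and reversed(), two built-in functions chosen here purely as further illustrations, with made-up sample data.

```python
names = ["alice", "bob", "carol"]
scores = [90, 72, 85]

# zip() pairs up two iterables, so the loop body needs no index bookkeeping
report = []
for name, score in zip(names, scores):
    report.append(f"{name}: {score}")

# reversed() changes the traversal order without touching the loop body
backwards = []
for name in reversed(names):
    backwards.append(name)
```

In both loops the body stays trivial because the work of pairing or reordering happens in the iterable.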

This brings me to my first suggestion.


Suggestion 1: Use functions that modify the iterable to optimize loops

Processing the iterable with a modifier function can affect the loop code in many ways. We do not have to look far for a good example to demonstrate this approach: the built-in module itertools is a perfect one.

In short, itertools is a collection of utility functions for working with iterables. I mentioned it in an earlier article in this series, the one about containers.

If you want to learn itertools, the official Python documentation is the first choice; it has very detailed material on the module. In this article, though, the focus is slightly different from the official docs: I will use some common code scenarios to explain in detail how itertools improves loop code.

1. Use product to flatten nested loops

We all know that "flat is better than nested". But sometimes, for certain requirements, it seems we just have to write nested loops. For example, the following code:

def find_twelve(num_list1, num_list2, num_list3):
    """Look for 3 numbers, one from each of the 3 lists, whose sum is 12
    """
    for num1 in num_list1:
        for num2 in num_list2:
            for num3 in num_list3:
                if num1 + num2 + num3 == 12:
                    return num1, num2, num3

For code like this, which needs nested loops to traverse several objects, we can use the product() function to optimize it. product() accepts multiple iterables and keeps yielding results from their Cartesian product.

from itertools import product


def find_twelve_v2(num_list1, num_list2, num_list3):
    for num1, num2, num3 in product(num_list1, num_list2, num_list3):
        if num1 + num2 + num3 == 12:
            return num1, num2, num3

Compared with the previous code, the version using product() needs only one for loop to complete the task, and the code is more concise.
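
A quick way to convince yourself that the two versions visit the same combinations in the same order is to compare them on small inputs. The sketch below does that with made-up sample lists.

```python
from itertools import product

list1, list2 = [1, 2], ["a", "b"]  # sample data, made up for illustration

# the nested-loop version
nested = []
for x in list1:
    for y in list2:
        nested.append((x, y))

# the flattened version: the rightmost input varies fastest,
# exactly like the innermost loop above
flat = list(product(list1, list2))
```

One design point to keep in mind: product() reads each input completely into memory before it yields anything, so it is not suitable for infinite or very large iterators.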

2. Use islice to process every other line inside a loop

Suppose an external data file contains Reddit post titles in the following format:

python-guide: Python best practices guidebook, written for humans.
---
Python 2 Death Clock
---
Run any Python Script with an Alexa Voice Command
---
<... ...>

Probably for the sake of appearance, there is a "---" separator between every two titles in this file. Now we need a list of all the titles in the file, so while traversing the file contents we must skip these meaningless separators.

Drawing on what we learned about enumerate() earlier, we can do this by adding an if check on the current loop index inside the loop:

def parse_titles(filename):
    """Read reddit topic titles from a data file with alternating lines
    """
    with open(filename, 'r') as fp:
        for i, line in enumerate(fp):
            # skip the meaningless '---' separators
            if i % 2 == 0:
                yield line.strip()

However, for this kind of every-other-line processing inside a loop, using the islice() function from itertools to modify the object being looped over makes the loop code simpler and more direct.

The islice(seq, start, end, step) function takes almost exactly the same parameters as list slicing (list[start:stop:step]). If you need to process every other line inside the loop, just set the step argument to 2 (the default is 1).

from itertools import islice

def parse_titles_v2(filename):
    with open(filename, 'r') as fp:
        # set step=2 to skip the meaningless '---' separators
        for line in islice(fp, 0, None, 2):
            yield line.strip()
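
To try the islice() approach without creating a file, the sketch below feeds it an in-memory io.StringIO object (a stand-in chosen for this example) that holds the sample titles shown earlier.

```python
import io
from itertools import islice

# In-memory stand-in for the data file, using the titles shown above
sample = io.StringIO(
    "python-guide: Python best practices guidebook, written for humans.\n"
    "---\n"
    "Python 2 Death Clock\n"
    "---\n"
    "Run any Python Script with an Alexa Voice Command\n"
)

# start=0, stop=None (read to the end), step=2 (take every other line)
titles = [line.strip() for line in islice(sample, 0, None, 2)]
```

Note that this relies on titles and separators alternating strictly; if a separator could be missing, a content check such as line.strip() != '---' would be the safer filter. Unlike list slicing, islice() does not accept negative values.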

3. Use takewhile to replace break statements

Sometimes, at the start of each iteration, we need to decide whether the loop should end early. For example:

for user in users:
    # once the first unqualified user appears, stop processing the rest
    if not is_qualified(user):
        break

    # do the processing ... ...

For loops like this that need to break out early, we can use the takewhile() function to simplify them. While iterating over iterable, takewhile(predicate, iterable) calls predicate with the current element and tests the result. If the function returns a truthy value, the element is yielded and the loop continues; otherwise, the loop stops immediately.

Sample code using takewhile:

from itertools import takewhile

for user in takewhile(is_qualified, users):
    # do the processing ... ...
    pass
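
Here is a runnable sketch of the same pattern. The is_qualified() rule (a score of at least 60) and the user records are hypothetical and exist only to make the example self-contained.

```python
from itertools import takewhile


def is_qualified(user):
    """Hypothetical predicate: a user qualifies with a score of 60 or more."""
    return user["score"] >= 60


users = [
    {"name": "alice", "score": 90},
    {"name": "bob", "score": 75},
    {"name": "carol", "score": 40},  # first unqualified user: iteration stops here
    {"name": "dave", "score": 88},   # never reached, even though it would qualify
]

processed = [user["name"] for user in takewhile(is_qualified, users)]
```

One subtle difference from the break version: when users is a one-shot iterator rather than a list, takewhile() has already consumed the first failing element by the time it stops, so that element cannot be retrieved from the iterator afterwards.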

itertools also has other interesting utility functions that can be combined with loops, for example using the chain function to flatten two-level nested loops, or using zip_longest to loop over several objects at the same time.

Since space is limited, I will not introduce them all here. If you are interested, you can go to the official documentation to learn more.
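
For a quick taste of the two functions named above, here is a minimal sketch with made-up data. It uses chain.from_iterable(), the variant of chain that takes a single iterable of iterables.

```python
from itertools import chain, zip_longest

groups = [["alice", "bob"], ["carol"]]  # sample data, made up for illustration

# chain.from_iterable() flattens one level of nesting into a single loop
flat_names = []
for name in chain.from_iterable(groups):
    flat_names.append(name)

# zip_longest() keeps going until the longest input is exhausted,
# filling the gaps with `fillvalue` (None by default)
pairs = list(zip_longest(["a", "b", "c"], [1, 2], fillvalue=None))
```

The built-in zip() would stop after two pairs here; zip_longest() yields a third pair with the missing value filled in.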

4. Use generators to write your own modifier functions

Besides the functions that itertools provides, we can also easily use generators to define our own loop modifier functions.

Let's take a simple function as an example:

def sum_even_only(numbers):
    """Sum all the even numbers in numbers"""
    result = 0
    for num in numbers:
        if num % 2 == 0:
            result += num
    return result

In the function above, the loop body introduces an extra if statement in order to filter out all the odd numbers. To simplify the loop body, we can define a generator function dedicated to filtering even numbers:

def even_only(numbers):
    for num in numbers:
        if num % 2 == 0:
            yield num


def sum_even_only_v2(numbers):
    """Sum all the even numbers in numbers"""
    result = 0
    for num in even_only(numbers):
        result += num
    return result

After the numbers variable is wrapped with the even_only function, sum_even_only_v2 no longer has to care about the "filter even numbers" logic; it only needs to do the summing.

Hint: Of course, the function above is not really practical. In the real world, a requirement this simple is best handled directly with a generator expression or list comprehension: sum(num for num in numbers if num % 2 == 0)
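
Because a generator-based modifier is lazy, it also composes with the itertools functions from earlier sections. The sketch below repeats even_only() so that it runs on its own, then applies it to the infinite iterator itertools.count(); the combination is an illustration added here, not something the scenario above requires.

```python
from itertools import count, islice


def even_only(numbers):
    """Same generator as above: pass through even numbers only."""
    for num in numbers:
        if num % 2 == 0:
            yield num


# count(1) is infinite, but nothing is computed until the loop asks for it,
# so islice() can safely take just the first three results
first_three_evens = list(islice(even_only(count(1)), 3))
```

The same call with a list-building filter would never finish, which is one practical reason to write modifier functions as generators.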

Suggestion 2: Break up complex code blocks in the loop body by responsibility

I have always felt that loops are rather wonderful things. Every time you write a new loop block, it is like opening up a circle of black magic: everything inside the circle starts repeating endlessly.

But I have also found that, besides its benefits, this black magic circle keeps tempting you to cram more and more code into it: filtering out invalid elements, preprocessing data, printing logs, and so on. Even things that do not belong to the same level of abstraction get stuffed into the same circle.

You might think all of this is only natural: we badly need the buff that the magic circle provides. If we do not stuff all that logic into the loop body, where else could it go?

Let's look at the following business scenario. A website has a script that runs once every 30 days. Its job is to find the users who logged in during a specific time window on weekends over the past 30 days, and then send them bonus points.

The code is as follows:

import time
import datetime


def award_active_users_in_last_30days():
    """Find all users who logged in between 8 pm and 10 pm on weekends in the past 30 days and send them bonus points
    """
    days = 30
    for days_delta in range(days):
        dt = datetime.date.today() - datetime.timedelta(days=days_delta)
        # 5: Saturday, 6: Sunday
        if dt.weekday() not in (5, 6):
            continue

        time_start = datetime.datetime(dt.year, dt.month, dt.day, 20, 0)
        time_end = datetime.datetime(dt.year, dt.month, dt.day, 23, 0)

        # convert to unix timestamps, which the ORM query below requires
        ts_start = time.mktime(time_start.timetuple())
        ts_end = time.mktime(time_end.timetuple())

        # query the users and send each of them 1000 bonus points
        for record in LoginRecord.filter_by_range(ts_start, ts_end):
            # more complex logic could be added here
            send_awarding_points(record.user_id, 1000)

The function above consists mainly of two loop layers. The outer loop is responsible for picking the times that meet the requirements within the past 30 days and converting them to UNIX timestamps. The inner loop then uses these two timestamps to send the points.

As mentioned earlier, the black magic circle opened up by the outer loop has been crammed full. But on closer inspection, we can see that the whole loop body is actually made up of two completely unrelated tasks: "pick the dates and prepare the timestamps" and "send the bonus points".


How complex loops cope with new requirements

What is wrong with code like this? Let me tell you.

One day, the product team came over and said that some users stay up late on weekends browsing our website, and we should send them a notification telling them to go to bed earlier. So a new requirement appeared: "Send a notification to users who logged in between 3 am and 5 am on weekends in the past 30 days".

A new problem follows. Sharp as you are, you will notice right away that the user-filtering part of this new requirement is very, very similar to the previous one. But if you open up the earlier loop body and take a look, you will find that the code cannot be reused at all, because completely different pieces of logic are coupled together inside the loop. ☹️

In the computing world, we often use the word "coupling" to describe how things are tied together. In the example above, "picking the time" and "sending the points" live in the same loop body and have formed a very strong coupling.

To reuse code better, we need to decouple the "picking the time" part from the loop body. And our old friend, the "generator function", is the best choice for this job.

Decoupling the loop body with a generator function

To decouple the "picking the time" part from the loop, we need to define a new generator function, gen_weekend_ts_ranges(), dedicated to generating the required UNIX timestamps:

def gen_weekend_ts_ranges(days_ago, hour_start, hour_end):
    """Generate the given time window on Saturdays and Sundays over a past period, returned as UNIX timestamps
    """
    for days_delta in range(days_ago):
        dt = datetime.date.today() - datetime.timedelta(days=days_delta)
        # 5: Saturday, 6: Sunday
        if dt.weekday() not in (5, 6):
            continue

        time_start = datetime.datetime(dt.year, dt.month, dt.day, hour_start, 0)
        time_end = datetime.datetime(dt.year, dt.month, dt.day, hour_end, 0)

        # convert to unix timestamps, which the ORM query below requires
        ts_start = time.mktime(time_start.timetuple())
        ts_end = time.mktime(time_end.timetuple())
        yield ts_start, ts_end

With this generator function, both the old requirement, "send bonus points", and the new requirement, "send notifications", can reuse it in their loops to get the job done:

def award_active_users_in_last_30days_v2():
    """Send bonus points"""
    for ts_start, ts_end in gen_weekend_ts_ranges(30, hour_start=20, hour_end=23):
        for record in LoginRecord.filter_by_range(ts_start, ts_end):
            send_awarding_points(record.user_id, 1000)


def notify_nonsleep_users_in_last_30days():
    """Send notifications"""
    for ts_start, ts_end in gen_weekend_ts_ranges(30, hour_start=3, hour_end=6):
        for record in LoginRecord.filter_by_range(ts_start, ts_end):
            notify_user(record.user_id, 'You should sleep more')
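
A side benefit of pulling the date logic into its own generator is that it becomes easy to test in isolation. The sketch below is a variant written for this purpose, not the article's function: it takes a hypothetical today parameter instead of calling datetime.date.today(), and it yields datetime objects rather than UNIX timestamps so that the result does not depend on the local time zone.

```python
import datetime


def gen_weekend_ranges(today, days_ago, hour_start, hour_end):
    """Yield (start, end) datetimes for weekend days in the past `days_ago` days.

    `today` is passed in explicitly (an addition for illustration only)
    so that the output is deterministic.
    """
    for days_delta in range(days_ago):
        dt = today - datetime.timedelta(days=days_delta)
        # 5: Saturday, 6: Sunday
        if dt.weekday() not in (5, 6):
            continue
        start = datetime.datetime.combine(dt, datetime.time(hour_start))
        end = datetime.datetime.combine(dt, datetime.time(hour_end))
        yield start, end


# 2019-06-10 was a Monday, so the 7 days ending on it contain one weekend
ranges = list(gen_weekend_ranges(datetime.date(2019, 6, 10), 7, 20, 23))
```

The generator walks backwards from today, so the Sunday window (2019-06-09) is yielded before the Saturday window (2019-06-08).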

Summary

In this article, we first briefly explained what "authentic" loop code means. We then made the first suggestion: use modifier functions to improve loops. After that, I made up a business scenario to illustrate the importance of breaking up the code in a loop body by responsibility.

Some key points:

  • Using functions to modify the object being looped over can improve the code in the loop body
  • itertools has many utility functions that can be used to improve loops
  • Generator functions make it easy to define your own modifier functions
  • The inside of a loop is a place where "code bloat" is very likely to occur
  • Use generator functions to decouple blocks of code with different responsibilities inside a loop, for better flexibility


Origin www.cnblogs.com/focus-z/p/10990175.html