Writing high-quality Python programs (4): Libraries

This series of articles summarizes the essence of "Writing High-Quality Code: 91 Suggestions for Improving Python Programs".

Choose sort() or sorted() according to your needs

The functions commonly used for sorting in Python are sort() and sorted().

Their signatures are as follows (Python 2 shown; the cmp parameter was removed in Python 3):

sorted(iterable[, cmp[, key[, reverse]]])
s.sort([cmp[, key[, reverse]]])

sort() and sorted() take three common parameters:

  • cmp: a user-defined comparison function whose arguments are two comparable elements (from the iterable or list); it returns a negative number, zero, or a positive number according to whether the first argument is less than, equal to, or greater than the second. The default is None.
  • key: a one-argument function used to extract a comparison key from each element. The default is None (i.e., elements are compared directly).
  • reverse: indicates whether the sort order is reversed.

Comparison of the two:

  • sorted() works on any iterable object, while sort() generally acts on lists.

  • sorted() returns a new sorted list and leaves the original list unchanged, while sort() modifies the original list in place and returns None. When the original list needs to be preserved, sorted() is the better choice; otherwise sort() can be chosen, since it does not need to copy the original list, which means less memory consumption and higher efficiency.

  • For both sort() and sorted(), passing the key parameter is more efficient than passing cmp. A cmp function is called many times over the course of the sort, which carries a large overhead, whereas key is applied only once per element, so using key is more efficient than using cmp.

  • sorted() is very powerful: it can sort different data structures to meet different needs.
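The in-place vs. copy distinction above can be sketched in a few lines (Python 3 shown, where the cmp parameter no longer exists and only key and reverse remain):

```python
data = [3, 1, 2]

# sorted() builds and returns a new list; the original is untouched
new_list = sorted(data)

# list.sort() sorts in place and returns None
ret = data.sort(reverse=True)

# key extracts a comparison value once per element
names = ["bob", "Alice"]
by_case = sorted(names, key=str.lower)
```

Here new_list is [1, 2, 3] while data ends up as [3, 2, 1], and by_case sorts without regard to letter case.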

Example:

Sorting a dictionary:

>>> phone_book = {"Linda": "7750", "Bob": "9345", "Carol": "5834"}
>>> from operator import itemgetter
>>> sorted_pb = sorted(phone_book.items(), key=itemgetter(1))
>>> print(sorted_pb)
[('Carol', '5834'), ('Linda', '7750'), ('Bob', '9345')]

Sorting a multidimensional list: in practice you will run into situations where data must be sorted by several fields. This is very easy to do with SQL statements inside a database, but combining a multidimensional list with sorted() gets it done just as easily:

>>> import operator
>>> game_result = [["Bob",95,"A"],["Alan",86,"C"],["Mandy",82.5,"A"],["Rob",86,"E"]]
>>> sorted(game_result, key=operator.itemgetter(2, 1))
[['Mandy', 82.5, 'A'], ['Bob', 95, 'A'], ['Alan', 86, 'C'], ['Rob', 86, 'E']]

Sorting lists mixed into a dictionary: when the keys or values of a dictionary are lists, sort by the element at a given position in those lists:

>>> my_dict = {"Li":["M",7],"Zhang":["E",2],"Wang":["P",3],"Du":["C",2],"Ma":["C",9],"Zhe":["H",7]}
>>> import operator
>>> sorted(my_dict.items(), key=lambda item:operator.itemgetter(1)(item[1]))
[('Du', ['C', 2]), ('Zhang', ['E', 2]), ('Wang', ['P', 3]), ('Zhe', ['H', 7]), ('Li', ['M', 7]), ('Ma', ['C', 9])]

Sorting dictionaries mixed into a list: each element of the list is a dictionary, and the sort uses multiple keys of those dictionaries:

>>> import operator
>>> game_result = [{"name":"Bob","wins":10,"losses":3,"rating":75},{"name":"David","wins":3,"losses":5,"rating":57},{"name":"Carol","wins":4,"losses":5,"rating":57},{"name":"Patty","wins":9,"losses":3,"rating":71.48}]
>>> sorted(game_result, key=operator.itemgetter("rating","name"))
[{'losses': 5, 'name': 'Carol', 'rating': 57, 'wins': 4}, {'losses': 5, 'name': 'David', 'rating': 57, 'wins': 3}, {'losses': 3, 'name': 'Patty', 'rating': 71.48, 'wins': 9}, {'losses': 3, 'name': 'Bob', 'rating': 75, 'wins': 10}]

Use the copy module to deep-copy objects

  • Shallow copy: constructs a new compound object and inserts into it references to the objects found in the original. Shallow copies can be made in several ways, for example with factory functions, slicing, or the copy module's copy() operation.
  • Deep copy: also constructs a new compound object, but when it encounters a reference it recursively copies the object the reference points to. The result is therefore unaffected by operations performed on the original through other references. Deep copies rely on the copy module's deepcopy() operation.

A shallow copy is not a complete copy: when the original contains mutable objects such as lists or dictionaries, only references to them are copied. To solve this, a deep copy is needed. A deep copy copies not only the reference but also the object the reference points to, so the deep-copied object and the original are independent of each other.
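A minimal sketch of the difference (the nested list here is just an illustration):

```python
import copy

original = [1, [2, 3]]

shallow = copy.copy(original)      # new outer list, inner list still shared
deep = copy.deepcopy(original)     # inner list recursively copied as well

original[1].append(4)

# the shallow copy observes the mutation; the deep copy does not
```

After the append, shallow[1] is [2, 3, 4] (it is the very same object as original[1]), while deep[1] is still [2, 3].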

Use Counter for counting statistics

Counting statistics is to count the number of occurrences of an item. It can be implemented using different data structures:

  • For example, an implementation using defaultdict:
from collections import defaultdict
some_data = ["a", "2", 2, 4, 5, "2", "b", 4, 7, "a", 5, "d", "a", "z"]
count_frq = defaultdict(int)
for item in some_data:
    count_frq[item] += 1
print(count_frq)
# defaultdict(<class 'int'>, {'a': 3, '2': 2, 2: 1, 4: 2, 5: 2, 'b': 1, 7: 1, 'd': 1, 'z': 1})

But a more elegant and Pythonic solution is to use collections.Counter:

from collections import Counter
some_data = ["a", "2", 2, 4, 5, "2", "b", 4, 7, "a", 5, "d", "z", "a"]
print(Counter(some_data))
# Counter({'a': 3, '2': 2, 4: 2, 5: 2, 2: 1, 'b': 1, 7: 1, 'd': 1, 'z': 1})
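Counter offers more than plain counting: most_common() returns the highest counts, update() adds counts incrementally, and missing keys read as 0 rather than raising KeyError (the string used here is illustrative):

```python
from collections import Counter

c = Counter("abracadabra")         # count each character
top = c.most_common(1)             # the single most frequent character
c.update("aaa")                    # add three more occurrences of "a"
missing = c["z"]                   # absent keys yield 0, no KeyError
```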

A deeper understanding of ConfigParser

Common configuration file formats include XML and ini. On MS Windows the ini format is especially widespread, and even the operating system's API provides interface functions to support it. Ini-like formats are also very common on Linux; for example, pylint's configuration file uses this format. Python supports it with the standard library module ConfigParser (renamed configparser in Python 3).

The basic usage of ConfigParser can be learned from the manual, but a few points are still worth noting. The first is the getboolean() function, which converts a configuration item's value into a Boolean according to certain rules. For example, given the following configuration:

[section1]
option1=0

When you call getboolean("section1", "option1"), it will return False.

getboolean()'s truth rules: 0, no, false, and off are converted to False; correspondingly, 1, yes, true, and on are converted to True; any other value raises a ValueError.

It should also be noted how configuration items are looked up. First, the file format supported by ConfigParser includes a [DEFAULT] section: when a configuration item is not found in the specified section, ConfigParser falls back to the [DEFAULT] section. In addition, two more mechanisms make the lookup more complex: the defaults parameter of the ConfigParser constructor, and the vars parameter of its get(section, option[, raw[, vars]]) method. When all of these mechanisms are in play, the lookup rules for a configuration item's value are:

  • If the section name is not found, NoSectionError is raised
  • If the configuration item appears in the vars argument of get(), the value from vars is returned
  • If the configuration item exists in the specified section, its value is returned
  • If the item exists in [DEFAULT], its value is returned
  • If the item exists in the defaults parameter of the constructor, its value is returned
  • Otherwise, NoOptionError is raised
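The lookup order above can be observed directly (Python 3 spells the module configparser; the section and option names below are made up for illustration):

```python
import configparser

# constructor defaults end up in the DEFAULT section as well
cfg = configparser.ConfigParser(defaults={"opt_ctor": "from-constructor"})
cfg.read_string("""
[DEFAULT]
opt_default = from-default-section
[section1]
option1 = 0
""")

flag = cfg.getboolean("section1", "option1")        # "0" -> False
from_vars = cfg.get("section1", "option1",
                    vars={"option1": "1"})          # vars wins over the section
from_default = cfg.get("section1", "opt_default")   # falls back to [DEFAULT]
from_ctor = cfg.get("section1", "opt_ctor")         # constructor defaults
```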

Use argparse to handle command line arguments

Although applications can usually change their behavior through configuration files without modifying code, providing flexible and easy-to-use command-line parameters is still very worthwhile. For instance, it reduces the user's learning cost: the usage of a program's command-line parameters can usually be discovered just by running it with --help, whereas mastering a configuration file usually requires reading the manual.

Regarding command line processing, the standard library for parameter processing at this stage is argparse.

  • add_argument() The method is used to add a parameter declaration.
import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')

args = parser.parse_args()
print(args.accumulate(args.integers))
  • In addition to basic numeric types such as int and float, argparse also supports file types; as long as the argument is legal, the program can use the corresponding file object.
parser = argparse.ArgumentParser()
parser.add_argument("bar", type=argparse.FileType("w"))
parser.parse_args(["out.txt"])
  • It is also easy to extend types: any callable object, such as a function, can be passed as the type argument. The choices parameter likewise supports many types, e.g. parser.add_argument("door", type=int, choices=range(1, 4)).
  • add_argument() also supports required parameters: simply pass required=True. When such a parameter is missing, argparse automatically exits the program and prompts the user.
  • Parameter grouping is supported as well. add_argument_group() makes help output clearer, which is very helpful in CLI applications with complicated usage:
parser = argparse.ArgumentParser(prog="PROG", add_help=False)
group1 = parser.add_argument_group("group1", "group1 description")
group1.add_argument("foo", help="foo help")
group2 = parser.add_argument_group("group2", "group2 description")
group2.add_argument("--bar", help="bar help")
parser.print_help()
  • add_mutually_exclusive_group(required=False) is also very practical: it guarantees that at most one parameter in the group is given, and with required=True that exactly one is given.
  • argparse also supports sub-commands. For example, pip has install/uninstall/freeze/list/show and other sub-commands, each accepting its own parameters; similar functionality can be built with ArgumentParser.add_subparsers():
import argparse
parser = argparse.ArgumentParser(prog="PROG")
subparsers = parser.add_subparsers(help="sub-command help")
parser_a = subparsers.add_parser("a", help="a help")
parser_a.add_argument("--bar", type=int, help="bar help")
parser.parse_args(["a", "--bar", "1"])
  • Besides parameter processing, illegal parameters need some handling too; once handled, a prompt message is generally printed and the application exits. ArgumentParser provides two methods for this, exit(status=0, message=None) and error(message), which save you from importing sys and calling sys.exit() yourself.
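A short sketch of the mutually exclusive group and the automatic error handling mentioned above (the option names are invented for the example):

```python
import argparse

parser = argparse.ArgumentParser(prog="PROG")
group = parser.add_mutually_exclusive_group()
group.add_argument("--verbose", action="store_true")
group.add_argument("--quiet", action="store_true")

# one option from the group parses normally
args = parser.parse_args(["--verbose"])

# passing both options makes argparse print an error and exit (SystemExit)
try:
    parser.parse_args(["--verbose", "--quiet"])
    conflict = False
except SystemExit:
    conflict = True
```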

Understand the pros and cons of the pickle module

Serialization is simply the process of converting a data structure in memory into a text or binary representation of an object without losing its identity and type information. The serialized form of the object should be able to restore the original object after the deserialization process.

There are many modules in Python that support serialization, such as pickle, json, marshal, and shelve.

Pickle is the most common serialization module. It also has a C implementation, cPickle, whose performance is better (its speed is about 1000 times that of pickle), so cPickle should be used in most applications. (Note: apart from the fact that cPickle's classes cannot be subclassed, their usage is basically the same; in Python 3, pickle uses the C implementation automatically.) pickle's two most important functions are dump() and load(), used respectively to serialize and deserialize objects.

The good features of pickle are summarized in the following points:

  • The interface is simple and easy to use: serialization and deserialization are easily achieved with dump() and load().

  • pickle's storage format is universal and can be shared by Python interpreters on different platforms; for example, a file serialized under Linux can be deserialized by Python on Windows, so compatibility is good.

  • A wide range of data types is supported: numbers, booleans, strings; tuples, dictionaries, and lists containing only serializable objects; functions and classes defined at the top level of a module; and instances of classes whose __dict__ or __getstate__() return value can be serialized.

  • The pickle module is extensible. For instance objects, pickle does not normally call __init__() when reconstructing the object. If __init__() must be called for initialization, an old-style class can define __getinitargs__() returning a tuple; during unpickling Python then calls __init__() automatically, passing the tuple returned by __getinitargs__() as its arguments. A new-style class can instead provide __getnewargs__() to supply the arguments used at object creation, so that during unpickling the object is created via Class.__new__(Class, *args). For objects that cannot be serialized directly, such as sockets, file handles, and database connections, the problem can also be solved by implementing the pickle protocol, mainly through the special methods __getstate__() and __setstate__(), which capture and restore the instance's state.

    Examples:

    import cPickle as pickle
    class TextReader:
        def __init__(self, filename):
            self.filename = filename    # file name
            self.file = open(filename)    # handle of the opened file
            self.position = self.file.tell()    # current position in the file

        def readline(self):
            line = self.file.readline()
            self.position = self.file.tell()
            if not line:
                return None
            if line.endswith("\n"):
                line = line[:-1]
            return "{}: {}".format(self.position, line)

        def __getstate__(self):    # capture the instance state at pickle time
            state = self.__dict__.copy()    # copy the instance __dict__
            del state["file"]    # open file handles cannot be pickled
            return state

        def __setstate__(self, state):    # restore state after deserialization
            self.__dict__.update(state)
            file = open(self.filename)
            self.file = file

    reader = TextReader("zen.text")
    print(reader.readline())
    print(reader.readline())
    s = pickle.dumps(reader)    # dumps() calls __getstate__ by default
    new_reader = pickle.loads(s)    # loads() calls __setstate__ by default
    print(new_reader.readline())
    
  • pickle automatically maintains references between objects: if an object is referenced in several places, pickling does not change the relationships between the references, and circular and recursive references are handled automatically.

    >>> a = ["a", "b"]
    >>> b = a    # b references the same object as a
    >>> b.append("c")
    >>> p = pickle.dumps((a, b))
    >>> a1, b1 = pickle.loads(p)
    >>> a1
    ['a', 'b', 'c']
    >>> b1
    ['a', 'b', 'c']
    >>> a1.append("d")    # after deserialization, modifying a1 still affects b1
    >>> b1
    ['a', 'b', 'c', 'd']
    

However, the use of pickle also has the following limitations:

  • Pickle cannot guarantee atomicity of operations. Pickling is not atomic: if an exception occurs partway through a pickle call, part of the data may already have been saved. Also, if an object is deeply recursive, Python's maximum recursion depth may be exceeded; the limit can be raised with sys.setrecursionlimit().
  • Pickle has security issues. Python's documentation states clearly that it provides no security guarantees, so never deserialize data received from an untrusted source. Since loads() accepts a string as a parameter, a carefully crafted string opens the door to intrusion: entering pickle.loads("cos\nsystem\n(S'dir'\ntR.") in a Python 2 interpreter lists all files in the current directory, and dir can be replaced with far more destructive commands. To further improve security, users can subclass pickle.Unpickler and override its find_class() method.
  • The pickle protocol is specific to Python, and compatibility across languages is hard to guarantee: a pickle file created in Python may not be usable from other languages.
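The find_class() hardening can be sketched like this in Python 3, where pickle.Unpickler is subclassable (the whitelist below is purely illustrative):

```python
import io
import pickle

SAFE_GLOBALS = {("builtins", "list"), ("builtins", "dict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # refuse any global that is not explicitly whitelisted
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("forbidden global %s.%s" % (module, name))

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# plain data references no globals, so it loads fine
ok = restricted_loads(pickle.dumps([1, 2, 3]))

# a pickle referencing a non-whitelisted global (builtins.print) is rejected
try:
    restricted_loads(pickle.dumps(print))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```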

Another good option for serialization: JSON

The most commonly used methods of Python's standard json library parallel pickle's: dump/dumps for serialization and load/loads for deserialization. Note that json does not support non-ASCII-based encodings by default; if load does not display Chinese characters properly, the corresponding character encoding may need to be specified through the encoding parameter (Python 2). In terms of serialization, JSON has the following advantages over pickle:

  • It is simple to use and supports multiple data types. A JSON document is built from only two data structures:
    • A collection of name/value pairs, realized in various languages as an object, record, struct, dictionary, hash table, keyed list, or associative array.
    • An ordered list of values, realized in most languages as an array, vector, list, or sequence. The corresponding Python types include dictionaries, lists, strings, integers, floats, True, False, and None. Note that the mapping between JSON structures and Python types is not exactly one-to-one; there are some differences.
  • The storage format is more readable and easy to modify. Compared with pickle, the json format is much closer to how programmers think, and it is far easier to read and modify. The dumps() function provides an indent parameter to make the generated JSON more readable: 0 means each value goes on a separate line; a number greater than 0 means each value goes on a separate line and nested structures are indented by that many spaces. Note that this parameter comes at the cost of a larger file.
  • json supports cross-platform and cross-language use. For example, a JSON file generated in Python can easily be parsed with JavaScript, so interoperability is much stronger, whereas a pickle file can only be read from Python. JSON is also native to JavaScript, so client browsers need no extra interpreter for it, which makes it especially suitable for fast, compact, and convenient serialization in Web applications. In addition, compared with pickle, JSON's storage format is more compact and takes up less space.
  • It is highly extensible. The json module also provides an encoder class (JSONEncoder) and a decoder class (JSONDecoder) so that users can extend serialization to types it does not support by default.

In Python, the standard json module performs slightly worse than pickle and cPickle; if you have very high requirements on serialization performance, the cPickle module can be used.
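The JSONEncoder extension point mentioned above can be sketched as follows (the date handling is an illustrative choice, not something the json module does by itself):

```python
import json
from datetime import date

class DateEncoder(json.JSONEncoder):
    def default(self, obj):
        # turn dates into ISO strings; defer everything else to the base class
        if isinstance(obj, date):
            return obj.isoformat()
        return super().default(obj)

doc = {"name": "release", "day": date(2020, 4, 21)}
text = json.dumps(doc, cls=DateEncoder, indent=2, sort_keys=True)
round_trip = json.loads(text)
```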

Use the threading module to write multi-threaded programs

Because of the GIL, multithreaded Python programs cannot, for the time being, take advantage of multiple processors to improve execution speed. But in several scenarios, such as waiting for an external resource to return, building a responsive user interface for a better user experience, or serving multi-user applications, multithreading is still a good solution.

Python provides two very simple and clear modules for multithreaded programming: thread and threading.

The thread module is a low-level module that processes and controls threads in a primitive way and is relatively complicated to use. The threading module is a wrapper built on top of thread that objectifies thread operations and provides rich functionality at the language level. In practice, prefer the threading module over the thread module.

  • For thread synchronization and mutual exclusion, the threading module supports not only the Lock and reentrant RLock locks, but also the condition variable Condition, the semaphores Semaphore and BoundedSemaphore, and Event events.

  • The threading module lets the main thread and sub-threads interact: join() blocks the calling thread until the thread whose join() method was called terminates, or until the optional timeout is reached. This makes it easy to control execution between the main thread and sub-threads, and among sub-threads themselves.
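As a minimal illustration of the Lock primitive listed above, a shared counter incremented from several threads stays consistent only when the read-modify-write step is protected:

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:              # make the read-modify-write step atomic
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock in place the final value is exactly 40000.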

In fact, in many cases we want the main thread to wait for all sub-threads to finish before exiting; this is where the threading module's daemon threads come in. A thread's daemon attribute can be set with setDaemon(). When the daemon attribute is True, the main thread may exit without waiting for that child thread to complete. The daemon flag is False by default, and the main thread ends only after all non-daemon threads have ended.

import threading
import time
def myfunc(a, delay):
    print("I will calculate square of {} after delay for {}".format(a, delay))
    time.sleep(delay)
    print("calculate begins...")
    result = a * a
    print(result)
    return result

t1 = threading.Thread(target=myfunc, args=(2, 5))
t2 = threading.Thread(target=myfunc, args=(6, 8))
print(t1.isDaemon())
print(t2.isDaemon())
t2.setDaemon(True)
t1.start()
t2.start()
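Conversely, when the main thread must wait for its workers, join() blocks until each thread has terminated (a minimal Python 3 sketch):

```python
import threading

results = []

def worker(n):
    results.append(n * n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()          # block until this thread has finished
```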

Use Queue to make multithreaded programming safer

Multithreaded programming is not an easy task. Synchronization and mutual exclusion between threads, sharing of data between threads, etc. are all issues to be considered concerning thread safety.

The Queue module in Python (renamed queue in Python 3) provides three kinds of queues:

  • Queue.Queue(maxsize): first in, first out; maxsize is the queue size, and a non-positive value means an unbounded queue

  • Queue.LifoQueue(maxsize): Last in, first out, equivalent to stack

  • Queue.PriorityQueue(maxsize): Priority queue
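The ordering behavior of the three queue types can be seen side by side (Python 3 spells the module queue):

```python
import queue

fifo = queue.Queue()
lifo = queue.LifoQueue()
prio = queue.PriorityQueue()

for q in (fifo, lifo, prio):
    for item in (3, 1, 2):
        q.put(item)

fifo_order = [fifo.get() for _ in range(3)]   # first in, first out
lifo_order = [lifo.get() for _ in range(3)]   # last in, first out (stack)
prio_order = [prio.get() for _ in range(3)]   # smallest first
```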

These three queues support the following methods:

  • Queue.qsize(): Returns the queue size.
  • Queue.empty(): True when the queue is empty, otherwise False
  • Queue.full(): When the queue size is set, return True if the queue is full, otherwise return False.
  • Queue.put(item[, block[, timeout]]): adds element item to the queue. When block is False, a Full exception is raised if the queue is full. When block is True and timeout is None, the call waits until a free slot is available; otherwise it raises Full after the given timeout.
  • Queue.put_nowait(item): equivalent to put(item, False)
  • Queue.get([block[, timeout]]): removes an element from the queue and returns it. When block is False, an Empty exception is raised if the queue is empty. When block is True and timeout is None, the call waits until an element is available; otherwise it raises Empty after the given timeout.
  • Queue.get_nowait(): equivalent to get(False)
  • Queue.task_done(): sends a signal indicating that a previously fetched task is complete; commonly used in consumer threads
  • Queue.join(): Block until all elements in the queue are processed

The Queue module is thread-safe. Note that the queues in the Queue module are not the same thing as the queue represented by collections.deque: the former are mainly used for communication between threads and implement a locking mechanism internally, while the latter is mainly a data-structure concept.

Example of multi-thread download:

import os
import Queue
import threading
import urllib2
class DownloadThread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            url = self.queue.get()    # take one url off the queue
            print(self.name + " begin download " + url + "...")
            self.download_file(url)    # download the file
            self.queue.task_done()    # signal that this download is finished
            print(self.name + " download completed!!!")
    def download_file(self, url):    # download a single file
        urlhandler = urllib2.urlopen(url)
        fname = os.path.basename(url) + ".html"    # local file name
        with open(fname, "wb") as f:    # open the output file
            while True:
                chunk = urlhandler.read(1024)
                if not chunk:
                    break
                f.write(chunk)
if __name__ == "__main__":
    urls = ["https://www.createspace.com/3611970","http://wiki.python.org/moni.WebProgramming"]
    queue = Queue.Queue()
    # create a thread pool and give them a queue
    for i in range(5):
        t = DownloadThread(queue)    # start 5 threads downloading concurrently
        t.setDaemon(True)
        t.start()

    # give the queue some data
    for url in urls:
        queue.put(url)

    # wait for the queue to finish
    queue.join()


Origin www.cnblogs.com/monteyang/p/12742947.html