A guide to best practices for engineering Python projects

There is no single, unified specification for Python project engineering and management, perhaps because Python's rise has been relatively recent and fewer companies have used it for large-scale projects. To help colleagues solve Python engineering problems, and to share my own development habits and code-management ideas, I wrote this article.

dependency management

Before PEP 518 and pyproject.toml were introduced, a project could not tell a tool like pip which build tools it needed. setuptools does have a setup_requires parameter for declaring what is required to build the project, but nothing can read that setting unless setuptools is already installed — you cannot use setuptools to declare that you require setuptools. This chicken-and-egg problem is why tools like virtualenv install setuptools by default, and why pip always injects setuptools and wheel when running a setup.py, whether you explicitly installed them or not. You could not even depend on a specific version of setuptools to build your project, because there was no way to specify a version; you had to settle for whatever the user happened to have installed.

After PEP 518, you can declare your build tools and the versions you require.
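For example, a project built with PDM might declare its build backend like this in pyproject.toml (the backend shown is PDM's own; substitute whichever backend your project uses):

```toml
[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"
```

With this table present, any PEP 517-compliant frontend (pip included) knows exactly which build tool to install, at which version, before building the project.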

In the past we often used files such as requirements.txt to record a project's dependencies, but that format has no good way to distinguish between production, development, and test dependencies: they must be split across multiple files and declared separately. A requirements.txt alone also cannot declare the Python version, system environment, and so on that we need. Some of the newer build tools solve these problems.

For this internal project I chose PDM, which aims to be a next-generation Python package management tool. It was originally born as a personal-interest project. If you find Pipenv or Poetry genuinely pleasant to use and don't want to introduce a new package manager, go ahead and use them; but if you hit something those tools don't support, you can probably find it in PDM.

Poetry also looks like a good choice. Like Pipenv and PDM, Poetry is a Python virtual-environment and dependency-management tool; in addition, it provides package-management functions such as packaging and publishing. You can think of it as a superset of tools like Pipenv and Flit: it lets you manage both Python libraries and Python applications with a single tool.

If you are using PDM or Poetry, create a virtualenv-based environment in a folder called .venv (or similar) inside the project directory. Why do I recommend virtualenv instead of PEP 582? First, many systems rely on a default Python version: with PEP 582 the system's bundled Python is used by default, and if that is not the version you need, you must generate an extra environment for the required version anyway. Second, VS Code currently does not support PEP 582.

We can use these build tools to separate our development environment, test environment, and production environment dependencies.
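With PDM, for example, production dependencies go in the standard [project] table, while development-only groups can be declared separately (the package names and group names below are illustrative):

```toml
[project]
dependencies = ["requests>=2.31"]

[tool.pdm.dev-dependencies]
test = ["pytest>=7"]
lint = ["black", "mypy"]
```

`pdm install --prod` then installs only the production dependencies, while `pdm install -G test` adds the test group on top.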

We can also add extra commands to pyproject.toml for the build tool to run, such as a start command for launching the service and a test command for running tests, similar to what the PDM scripts documentation describes. Starting the service through the build tool effectively solves the problem of where packages are located, and forces the working directory of every run to be the project root.
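A sketch of such commands using PDM's scripts table (my_project and the concrete commands are placeholders for your own):

```toml
[tool.pdm.scripts]
start = "python -m my_project"
test = "pytest tests/"
```

These are then run from anywhere in the project as `pdm run start` or `pdm run test`, always with the project root as the working directory.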

project structure

The recommended project structure is as follows:

Dockerfile

This file is used to build Docker images. When deploying to a production environment, deploying through Docker is recommended.

docs

A folder dedicated to saving documents.

LICENSE

If this is an open source project, then this file is generally used to place the open source protocol used.

pyproject.toml

The configuration file based on the PEP518 specification saves the project introduction, author contact information, dependent packages, used build tools, etc.

README.md

A Markdown document containing the project introduction and usage instructions.

{project_name}

The folder containing the actual project code. You can choose any name, but make sure it does not clash with the name of any third-party package you depend on. The reason for not using src is that this layout is more convenient when writing test cases, or when using the project as a Python package from other projects.

{project_name}-stubs

The project_name in {project_name}-stubs must match the folder name above, with -stubs appended; for example, if the project is called andy, this folder is called andy-stubs. If your project is a Python library, you can store type stub files for mypy in this folder. mypy automatically reads the type stubs in such a package, which helps with type checking. See the mypy documentation for details.

tests

Folder for unit tests etc.

tox.ini

tox configuration file.

.gitignore

Used to tell git which files should be ignored.

module reference

Python modules are one of the main abstraction layers, and quite possibly the most natural one. An abstraction layer allows code to be divided into parts, each holding related data and functionality.

For example, one layer of a project might handle user-facing interfaces while another handles low-level data operations. The most natural way to separate these two layers is to gather all interface functionality in one file and encapsulate all low-level operations in another. The interface file then needs to import the file that encapsulates the low-level operations, which is done with import and from ... import statements. Once you use an import statement, the module is available — whether it is a built-in module such as os or sys, an installed third-party module, or a module inside the project.

To comply with the style guide, keep module names short and lowercase, and avoid special symbols such as the dot (.) and question mark (?). A name like my.spam.py must not be used: it interferes with Python's module lookup — Python would expect to find a file spam.py inside a folder my, which is not the case. If you like, you can name a module my_spam.py, though underscores in module names are discouraged. Other characters (spaces or hyphens) will prevent importing altogether (- is the subtraction operator), so keep module names simple enough that you don't need to separate words. Most importantly, don't namespace with underscores; use submodules instead.

# OK
import library.plugin.foo
# not OK
import library.foo_plugin

Beyond these naming restrictions, nothing special is required for a Python file to become a module, but to use the concept properly and avoid problems you need to understand how import works. Concretely, the statement import modu looks for the file modu.py in the calling directory, if it exists. If the file is not found, the Python interpreter searches the directories listed in the "PYTHONPATH" environment variable, and if it is still not found, an ImportError exception is raised.

Once modu.py is found, the Python interpreter executes the module in an isolated scope. All top-level statements are executed, including further imports. Function and class definitions are stored in the module's dictionary, and the module's variables, functions, and classes are then exposed to callers through its namespace — a particularly useful and powerful core concept in Python.

In many other languages, an include file directive makes a preprocessor take all the code in a file and 'copy' it into the caller's code. Python is different: the included code lives independently in the module's namespace, which means you generally don't need to worry that it might cause bad side effects, such as shadowing an existing function with the same name.

A special form of the import statement, from modu import *, can emulate that more familiar behavior, but import * is generally considered bad practice: code using from modu import * is harder to read, and its dependencies are less self-contained. Using from modu import func pinpoints the function you want to import and places it in the global namespace. It is better than from modu import * because it states explicitly what is imported; compared with a plain import modu, its only advantage is that calling the function later requires less typing.

import modu
[...]
x = modu.sqrt(4)

Also, when importing a module from your own project — say the project is called my and the module is called modu — it is not recommended to write from my import modu; the strongly recommended form is the relative import from . import modu.

Next, if other modules need to reference some of your classes, expose them in the module's __init__.py, using as to re-export them, much like export in some other languages.

from .config import Config as Config

It is not recommended to place a large amount of code in __init__.py at all; use it only for such re-exports.

If too much code is added to __init__.py, then as the project grows in complexity the directory structure gets deeper and deeper, and subpackages and ever more deeply nested subpackages appear. Importing a component from a multi-level nested subpackage then requires executing every __init__.py file encountered along the path. If the modules and subpackages in a package have no need to share code, leaving the __init__.py files blank is normal and even good practice.

type checking

Python is a dynamic language: much of the time we may not know a function's parameter types or return type, and some values may not even have a well-specified type. Coming back to code written a while ago, we have likely forgotten what parameters a function takes and what type it returns, and have to read its body to find out, which slows down reading. The typing module solves this problem very well.

Since Python 3.5, PEP 484 has provided type annotations (type hints) for Python.

Mypy is a static type checker for Python. It has a powerful and easy-to-use type system with many nice features, such as type inference, generics, callable types, tuple types, union types, and structural subtyping. It is recommended to use mypy as the type-checking tool and to have every method explicitly declare its parameters, parameter types, and return type.

def register(
    self, factory: Optional[PooledObjectFactory] = None, name: Optional[str] = None
) -> None:
    pass

If a parameter or return value can be empty, mark it as Optional, or use the | None union syntax available since Python 3.10 (PEP 604), such as PooledObjectFactory | None.
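For illustration, here are both spellings of an optional return type (the function name and logic are hypothetical):

```python
from typing import Optional

def first_or_none(items: list[int]) -> Optional[int]:
    """Return the first element, or None if the list is empty."""
    return items[0] if items else None

# On Python 3.10+ the same signature can be written with the union syntax:
# def first_or_none(items: list[int]) -> int | None: ...

print(first_or_none([4, 5]))  # 4
print(first_or_none([]))      # None
```

mypy treats Optional[X] and X | None as exactly the same type; which one you write is purely a matter of the minimum Python version you target.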

In VS Code, you can install the Mypy extension so that type checking runs directly in the editor.

Code formatting and style checking

To help developers unify code style, the Python community proposed the PEP 8 coding style, although no one is required to follow it. A tool to check whether code conforms to PEP 8 was also released; it too was called pep8 (and has since been renamed pycodestyle).

Black calls itself "The uncompromising code formatter".

Black claims to be an uncompromising Python code formatting tool. It is "uncompromising" because when it detects code that does not conform to its style, it simply reformats everything for you — you don't confirm anything; it makes the decisions. In return, Black is fast, and by producing minimal diffs it enables faster code reviews. Using Black is very simple: once installed, it works like any other command; just pass the file or directory to format after the black command.

In a sense, a code formatting and checking tool with very little configurability serves a team better than a heavily customizable one. Modern IDEs generally provide support for Black.
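Black deliberately exposes only a handful of options; if you do need them, they live in pyproject.toml. A minimal sketch (the values shown are illustrative — line-length 88 is in fact Black's default):

```toml
[tool.black]
line-length = 88
target-version = ["py311"]
```

Keeping even this section empty and accepting the defaults is a perfectly reasonable team policy.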

configuration management

It is recommended to put the configuration under the {project_name}/{project_name} folder and save it in YAML format. The reason for not using formats such as TOML is that configuration-mapping mechanisms such as Kubernetes ConfigMaps may not support them, while YAML is well supported by other systems.

You can read the YAML file containing the configuration and deserialize it into a configuration object. This object can be a Python dataclass or an ordinary class, with each configuration field declared on it.

Configuration fields are frequently added and removed, and we should not access them in a dict-like way.
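A minimal sketch of this approach, using a dataclass and a plain dict of the kind yaml.safe_load produces (PyYAML is assumed for the actual file parsing; the field names here are illustrative):

```python
from dataclasses import dataclass, fields

@dataclass
class AppConfig:
    host: str = "127.0.0.1"
    port: int = 8000
    debug: bool = False

def load_config(raw: dict) -> AppConfig:
    """Build an AppConfig from a parsed mapping (e.g. the output of
    yaml.safe_load), ignoring unknown keys so old config files keep working."""
    known = {f.name for f in fields(AppConfig)}
    return AppConfig(**{k: v for k, v in raw.items() if k in known})

# With PyYAML installed:  raw = yaml.safe_load(open("config.yaml"))
cfg = load_config({"host": "0.0.0.0", "port": 9000, "stale_key": 1})
print(cfg.port)  # 9000
```

Accessing cfg.port instead of raw["port"] means a typo or removed field fails loudly at load time (or is caught by mypy), rather than surfacing as a KeyError deep in the program.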

exception management

Exceptions exist in almost all programming languages. The exception can quickly point out the problems in the program, which is convenient for troubleshooting. Developers can also throw custom exceptions according to the situation to indicate that the expected content does not match the actual content. Good exception design and usage habits can improve the quality of the program.

Logic may fail to meet expectations and raise related exceptions. When coding, for the program to keep running normally, such logic must be handled and the exceptions caught.

To catch exceptions, wrap the code that needs exception handling in a try...except block. except catches the specified exception type and enters the corresponding code path when it occurs. Exceptions you do not want to handle can be re-thrown with raise.

When capturing, try not to capture broad exception base classes such as Exception, but to capture specific exceptions, such as ValueError.

When handling an exception without re-raising it, log the information — unless you know that producing no output will not make troubleshooting difficult. Custom project exceptions should end with Error, similar to the naming of the standard exceptions.
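A short sketch combining these points — a specific except clause rather than a bare Exception, an ERROR-level log, and a custom exception ending in Error (all names here are illustrative):

```python
import logging

logger = logging.getLogger(__name__)

class ConfigNotFoundError(Exception):
    """Custom exceptions end with 'Error', like the standard ones."""

def read_port(settings: dict) -> int:
    try:
        return int(settings["port"])
    except KeyError:
        # Log before translating into a domain-specific exception.
        logger.error("missing 'port' in settings: %r", settings)
        raise ConfigNotFoundError("'port' is required") from None
    except ValueError:
        # Can't handle a malformed value here: log it and re-raise as-is.
        logger.error("non-numeric 'port': %r", settings.get("port"))
        raise
```

Catching KeyError and ValueError separately keeps unrelated bugs from being swallowed, which is exactly what catching a broad Exception would risk.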

test

Besides Python's built-in test framework, there are many third-party test frameworks, and some non-test frameworks ship with testing utilities of their own. Their purpose is to add features on top of the built-in framework so that writing tests is more convenient and the test process runs more smoothly.

In order to facilitate the test framework to find test cases, certain specifications should be followed when writing tests:

  • Test modules should start with test_
  • Test methods should start with test_
  • Test class names should start with Test

The tests are placed under the tests folder.

Pytest adds a lot of syntactic sugar on the basis of unittest to make testing easier and more flexible. And it has a plug-in function to facilitate the integration of other functions.

Since Pytest is compatible with most other testing frameworks, and it also has powerful functions, it is recommended to use Pytest as the main testing framework.
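A minimal test module following the naming conventions above might look like this (slugify is a hypothetical helper standing in for a function from your own package):

```python
# tests/test_slugify.py

def slugify(title: str) -> str:
    """Hypothetical helper under test; normally imported from {project_name}."""
    return "-".join(title.lower().split())

class TestSlugify:                 # test class names start with Test
    def test_basic(self):          # test methods start with test_
        assert slugify("Hello World") == "hello-world"

    def test_collapses_whitespace(self):
        assert slugify("  A  B ") == "a-b"
```

Running `pytest` from the project root discovers the module by its test_ prefix; plain assert statements are enough, since Pytest rewrites them to produce detailed failure messages.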

tox is a general-purpose virtual environment management and test command-line tool. It lets us define multiple independent, isolated Python environments on the same host. If your project needs to be compatible with multiple Python versions, tox is strongly recommended.
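A minimal tox.ini for running the test suite against several interpreters might look like this (the environment list and dependencies are illustrative):

```ini
[tox]
envlist = py310, py311

[testenv]
deps = pytest
commands = pytest tests/
```

Running `tox` then builds one isolated virtual environment per entry in envlist and executes the same test command in each.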


Origin blog.csdn.net/weixin_73136678/article/details/128794036