Several Interesting Python Libraries Worth Bookmarking

Having topped the programming language rankings many times, Python is widely loved and praised.

Every Python release adds new modules and introduces new and better ways of doing things. While we are all used to the good old Python libraries and our familiar ways of doing things, now is the time to upgrade and take advantage of the new and improved modules and their features.

Pathlib

pathlib is definitely one of the bigger recent additions to the Python standard library. It has been part of the standard library since Python 3.4, yet many people still use the os module for filesystem operations.

However, pathlib has a number of advantages over the old os.path: while the os module represents paths as raw strings, pathlib uses an object-oriented style, which makes code more readable and more natural to write:

from pathlib import Path
import os.path

# Old way
two_dirs_up = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# New way, much more readable
two_dirs_up = Path(__file__).resolve().parent.parent

The fact that paths are treated as objects rather than strings also makes it possible to create an object once and then look up or manipulate its properties:

readme = Path("README.md").resolve()

print(f"Absolute path: {readme.absolute()}")
# Absolute path: /home/martin/some/path/README.md
print(f"File name: {readme.name}")
# File name: README.md
print(f"Path root: {readme.root}")
# Path root: /
print(f"Parent directory: {readme.parent}")
# Parent directory: /home/martin/some/path
print(f"File extension: {readme.suffix}")
# File extension: .md
print(f"Is it absolute: {readme.is_absolute()}")
# Is it absolute: True
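Because paths are objects, you can also derive new paths from existing ones without any string slicing. The sketch below uses PurePosixPath (an assumption, chosen so the result doesn't depend on your filesystem or OS) with a made-up path:

```python
from pathlib import PurePosixPath

# Hypothetical path; PurePosixPath never touches the filesystem
readme = PurePosixPath("/home/martin/some/path/README.md")

print(readme.stem)                  # README  (file name without the suffix)
print(readme.with_suffix(".txt"))   # /home/martin/some/path/README.txt
print(readme.with_name("LICENSE"))  # /home/martin/some/path/LICENSE
print(readme.parts)                 # ('/', 'home', 'martin', 'some', 'path', 'README.md')
```

Each call returns a new path object, so the original readme stays unchanged.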

One of my favorite features of pathlib is that you can use the / ("division") operator to concatenate paths:

# Operators:
etc = Path('/etc')

joined = etc / "cron.d" / "anacron"
print(f"Exists? - {joined.exists()}")
# Exists? - True

It's important to note that pathlib replaces only os.path, not the entire os module. It does, however, also include the functionality of the glob module, so if you're used to combining os.path with glob.glob, you can replace both with pathlib.
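As a sketch of how glob.glob maps onto pathlib, the example below builds a throwaway directory tree (the file names are made up for illustration) and searches it with Path.glob and Path.rglob:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Create a small, disposable directory tree to search
    (root / "docs").mkdir()
    (root / "README.md").write_text("# readme")
    (root / "docs" / "guide.md").write_text("# guide")
    (root / "setup.py").write_text("")

    # Like glob.glob(os.path.join(tmp, "*.md")): top level only
    top_level = sorted(p.name for p in root.glob("*.md"))
    # Like glob.glob(os.path.join(tmp, "**", "*.md"), recursive=True): all levels
    recursive = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))

print(top_level)   # ['README.md']
print(recursive)   # ['README.md', 'docs/guide.md']
```

rglob("*.md") is shorthand for glob("**/*.md"), so no separate recursive flag is needed.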

In the snippet above we showed some handy path manipulation and object attributes, but pathlib also provides equivalents for the os and os.path functions you're used to, for example:

print(f"Working directory: {Path.cwd()}")  # same as os.getcwd()
# Working directory: /home/martin/some/path
(Path.cwd() / "new_dir").mkdir(parents=True, exist_ok=True)  # same as os.makedirs()
print(Path("README.md").resolve())  # like os.path.abspath(), but also resolves symlinks
# /home/martin/some/path/README.md
print(Path.home())  # same as os.path.expanduser("~")
# /home/martin

See the official documentation for a complete mapping of os.path functions to new functions in pathlib.
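Beyond the os and os.path equivalents, Path objects can also read and write small files directly, which replaces the usual open()/read()/close() boilerplate. A minimal sketch, using a hypothetical notes/todo.txt inside a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes" / "todo.txt"  # hypothetical file

    # Like os.makedirs(..., exist_ok=True): create missing parent directories
    notes.parent.mkdir(parents=True, exist_ok=True)

    # Write and read back the whole file in one call each
    notes.write_text("buy milk\n", encoding="utf-8")
    content = notes.read_text(encoding="utf-8")
    print(notes.exists())  # True

print(repr(content))  # 'buy milk\n'
```

For large files or streaming, Path.open() is still available and works like the built-in open().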

Secrets

Speaking of the os module, another part of it you should stop using is os.urandom. Instead, use the secrets module, available since Python 3.6:

# Old way:
import os

length = 64

value = os.urandom(length)
print(f"Bytes: {value}")
# Bytes: b'\xfa\xf3...\xf2\x1b\xf5\xb6'
print(f"Hex: {value.hex()}")
# Hex: faf3cc656370e31a938e7...33d9b023c3c24f1bf5

# New way:
import secrets

value = secrets.token_bytes(length)
print(f"Bytes: {value}")
# Bytes: b'U\xe9n\x87...\x85>\x04j:\xb0'
value = secrets.token_hex(length)
print(f"Hex: {value}")
# Hex: fb5dd85e7d73f7a08b8e3...4fd9f95beb08d77391

Using os.urandom isn't really the problem here. The secrets module was introduced because people were using the random module to generate passwords and the like, even though the random module does not produce cryptographically secure tokens.

According to the docs, the random module should not be used for security purposes; you should use either secrets or os.urandom. The secrets module is definitely preferable, though, as it is newer and includes convenience functions for hex tokens and URL-safe tokens.
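Here is a short sketch of those convenience functions in the password and token use cases mentioned above. The 16-character password length and the token_matches helper are illustrative assumptions, not recommendations from the secrets docs:

```python
import secrets
import string

# URL-safe token, e.g. for a password-reset link; 32 random bytes -> 43 characters
reset_token = secrets.token_urlsafe(32)

# Random password built with secrets.choice instead of random.choice
alphabet = string.ascii_letters + string.digits
password = "".join(secrets.choice(alphabet) for _ in range(16))

def token_matches(supplied: str, expected: str) -> bool:
    """Hypothetical helper: constant-time comparison to resist timing attacks."""
    return secrets.compare_digest(supplied, expected)

print(reset_token)
print(password)
print(token_matches(reset_token, reset_token))  # True
```

compare_digest is worth using whenever a user-supplied token is checked against a stored one, since a plain == can leak timing information.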

Zoneinfo

Before Python 3.9, there was no built-in library for time zone manipulation, so everyone used pytz. Now that zoneinfo is in the standard library, it's time to switch:

# Old way:
from datetime import datetime
import pytz  # pip install pytz

dt = datetime(2022, 6, 4)
nyc = pytz.timezone("America/New_York")

localized = nyc.localize(dt)
print(f"Datetime: {localized}, Timezone: {localized.tzname()}, TZ Info: {localized.tzinfo}")

# New way:
from zoneinfo import ZoneInfo

nyc = ZoneInfo("America/New_York")
localized = datetime(2022, 6, 4, tzinfo=nyc)
print(f"Datetime: {localized}, Timezone: {localized.tzname()}, TZ Info: {localized.tzinfo}")
# Datetime: 2022-06-04 00:00:00-04:00, Timezone: EDT, TZ Info: America/New_York

The datetime module delegates all time zone operations to the abstract base class datetime.tzinfo, which needs a concrete implementation. Before zoneinfo, that implementation most likely came from pytz; now that zoneinfo is in the standard library, we can use it instead.
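To make the "concrete implementation" point tangible, the sketch below checks that ZoneInfo is a tzinfo and uses it for a UTC-to-New York conversion. The timestamps are arbitrary, and the example assumes time zone data is available on the system:

```python
from datetime import datetime, tzinfo
from zoneinfo import ZoneInfo

nyc = ZoneInfo("America/New_York")

# ZoneInfo is a concrete implementation of the abstract datetime.tzinfo
print(isinstance(nyc, tzinfo))  # True

# Convert an aware UTC timestamp to New York local time (summer: EDT, UTC-4)
utc_noon = datetime(2022, 6, 4, 12, 0, tzinfo=ZoneInfo("UTC"))
local = utc_noon.astimezone(nyc)
print(local)            # 2022-06-04 08:00:00-04:00
print(local.tzname())   # EDT

# The same zone picks the standard-time offset in winter (EST, UTC-5)
winter = datetime(2022, 12, 4, 12, 0, tzinfo=nyc)
print(winter.tzname())  # EST
```

Unlike pytz, there is no localize() step: you pass the ZoneInfo object straight to the datetime constructor and daylight saving time is handled for you.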

However, there is a caveat to using zoneinfo: it assumes time zone data is available on the system. That is usually the case on UNIX systems; if your system has no time zone data (Windows, for example), install the tzdata package, a first-party library maintained by the CPython core developers that contains the IANA time zone database.
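If you can't be sure the data is present, you can check what zoneinfo sees and fail gracefully. A minimal sketch; load_zone and its fallback to "UTC" are hypothetical choices for illustration:

```python
import zoneinfo

def load_zone(key: str, fallback: str = "UTC") -> zoneinfo.ZoneInfo:
    """Hypothetical helper: load a time zone, falling back if the key is unknown."""
    try:
        return zoneinfo.ZoneInfo(key)
    except zoneinfo.ZoneInfoNotFoundError:
        # Raised when neither the system database nor the tzdata package
        # provides this key (a typo, or no time zone data installed at all)
        return zoneinfo.ZoneInfo(fallback)

# Directories zoneinfo searches for system time zone data
print(zoneinfo.TZPATH)

# Number of IANA keys visible to this interpreter (0 means no data was found)
print(len(zoneinfo.available_timezones()))

print(load_zone("Not/A_Zone"))  # UTC
```

When the system database is missing, installing tzdata (pip install tzdata) is enough; zoneinfo falls back to it automatically.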

Dataclasses

An important addition in Python 3.7 is the dataclasses module, which can serve as a replacement for namedtuple.

You might wonder why you'd need to replace namedtuple. Here are some reasons to consider switching to dataclasses:

  • They can be mutable,

  • __init__, __repr__, and __eq__ are generated by default (and __hash__ too when frozen=True),

  • They allow specifying default values,

  • They support inheritance.

Additionally, dataclasses support the frozen and slots (since Python 3.10) parameters, which provide feature parity with named tuples.
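The sketch below exercises each of the points above: mutability, default values, inheritance, and frozen instances. The class and field names are made up for illustration:

```python
from dataclasses import FrozenInstanceError, dataclass, field

@dataclass
class User:
    name: str
    surname: str
    roles: list[str] = field(default_factory=list)  # safe mutable default
    active: bool = True                               # plain default value

@dataclass
class Admin(User):  # inheritance: parent fields come first, then new ones
    level: int = 1

u = User("John", "Doe")
u.active = False  # instances are mutable by default
print(u)  # User(name='John', surname='Doe', roles=[], active=False)
print(u == User("John", "Doe", active=False))  # True (generated __eq__)

a = Admin("Jane", "Roe", level=2)
print(a)  # Admin(name='Jane', surname='Roe', roles=[], active=True, level=2)

@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
print(hash(p) == hash(Point(1.0, 2.0)))  # True: frozen + eq generates __hash__
try:
    p.x = 3.0
except FrozenInstanceError:
    print("Point is immutable")
```

Note that a plain (non-frozen) dataclass with the default eq=True sets __hash__ to None, so its instances are unhashable; that is why hashing is tied to frozen=True.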

Switching really shouldn't be too hard, since you just need to change the definition:

# Old way:
# from collections import namedtuple
from typing import NamedTuple
import sys

User = NamedTuple("User", [("name", str), ("surname", str), ("password", bytes)])

u = User("John", "Doe", b'tfeL+uD...\xd2')
print(f"Size: {sys.getsizeof(u)}")
# Size: 64

# New way:
from dataclasses import dataclass

@dataclass()
class User:
   name: str
   surname: str
   password: bytes

u = User("John", "Doe", b'tfeL+uD...\xd2')

print(u)
# User(name='John', surname='Doe', password=b'tfeL+uD...\xd2')

print(f"Size: {sys.getsizeof(u)}, {sys.getsizeof(u) + sys.getsizeof(vars(u))}")
# Size: 48, 152

We also included a size comparison in the code above, because memory footprint is one of the bigger differences between namedtuples and dataclasses. As you can see, namedtuples are much smaller, because dataclasses store their attributes in a per-instance dict (unless slots=True is used).

As for speed, unless you plan on creating millions of instances, attribute access times should be essentially the same, or at least not different enough to matter:

import timeit

setup = '''
from typing import NamedTuple
User = NamedTuple("User", [("name", str), ("surname", str), ("password", bytes)])
u = User("John", "Doe", b'')
'''

print(f"Access speed: {min(timeit.repeat('u.name', setup=setup, number=10000000))}")
# Access speed: 0.16838401100540068

setup = '''
from dataclasses import dataclass

@dataclass(slots=True)
class User:
  name: str
  surname: str
  password: bytes

u = User("John", "Doe", b'')
'''

print(f"Access speed: {min(timeit.repeat('u.name', setup=setup, number=10000000))}")
# Access speed: 0.17728697300481144

If the above convinced you to switch to dataclasses, give them a try as soon as possible.

If, on the other hand, you don't want to switch and really want to use named tuples for some reason, you should at least use NamedTuple from the typing module instead of namedtuple from collections:

# Bad way:
from collections import namedtuple
Point = namedtuple("Point", ["x", "y"])

# Better way:
from typing import NamedTuple
class Point(NamedTuple):
    x: float
    y: float

Finally, if you're using neither namedtuples nor dataclasses, you might want to consider using Pydantic directly.

Proper Logging

Logging isn't a recent addition to the standard library, but it's worth using: you should use proper logging instead of print statements. print is fine when you're debugging issues locally, but for any production-ready program that runs without user intervention, proper logging is a must.

Especially considering that setting up Python logging is as simple as:

import logging
logging.basicConfig(
    filename='application.log',
    level=logging.WARNING,
    format='[%(asctime)s] {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s',
    datefmt='%H:%M:%S'
)

logging.error("Some serious error occurred.")
# [12:52:35] {<stdin>:1} ERROR - Some serious error occurred.
logging.warning('Some warning.')
# [12:52:35] {<stdin>:1} WARNING - Some warning.

Compared with print statements, even this simple configuration gives you a much better debugging experience. Most importantly, you can further customize the logging library to log to different locations, change log levels, rotate logs automatically, and so on.
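As one example of that customization, here is a sketch of automatic log rotation with RotatingFileHandler from logging.handlers. The logger name "myapp", the 1 MB limit, and the temporary log directory are assumptions made for the demo:

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path

log_dir = Path(tempfile.mkdtemp())  # throwaway directory for this demo
log_file = log_dir / "application.log"

# Roll over to application.log.1, .2, .3 once the file reaches ~1 MB
handler = RotatingFileHandler(log_file, maxBytes=1_000_000, backupCount=3, encoding="utf-8")
handler.setFormatter(logging.Formatter("[%(asctime)s] %(name)s %(levelname)s - %(message)s"))

logger = logging.getLogger("myapp")  # hypothetical application logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.addHandler(logging.StreamHandler())  # also echo records to stderr

logger.info("Service started")
logger.debug("Not written: below the INFO level")

handler.flush()
contents = log_file.read_text(encoding="utf-8")
print(contents)
```

Using a named logger (rather than the module-level logging functions) lets each part of an application be configured independently.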

f-strings

Python offers several ways to format strings: C-style (%) formatting, f-strings, template strings, and the .format method. Of these, f-strings (formatted string literals) are the most natural to write, the most readable, and the fastest.

So I don't think it's necessary to argue for using them. However, there are certain situations where f-strings can't or shouldn't be used:

The only reason to use the % format is for logging:

import logging

things = "something happened..."

logger = logging.getLogger(__name__)
logger.error("Message: %s", things)  # evaluated inside the logger method
logger.error(f"Message: {things}")  # evaluated immediately

In the example above, the f-string is evaluated immediately, whereas with C-style formatting the substitution is deferred until it's actually needed; if the record is filtered out by its level, it never happens at all. Deferral also matters for message grouping, where all messages with the same template can be logged as one. This doesn't work with f-strings, because the template is filled in with data before it is passed to the logger.
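The deferred evaluation can be observed directly. In this sketch, Expensive is a hypothetical class that counts how often it is converted to a string, and the logger is set to WARNING so that DEBUG records are discarded:

```python
import logging

class Expensive:
    """Hypothetical object with a costly string conversion; counts calls."""
    calls = 0

    def __str__(self) -> str:
        Expensive.calls += 1
        return "expensive value"

logger = logging.getLogger("lazy-demo")
logger.setLevel(logging.WARNING)  # DEBUG records will be discarded

obj = Expensive()

logger.debug("Value: %s", obj)  # level check fails first: __str__ never runs
print(Expensive.calls)          # 0

logger.debug(f"Value: {obj}")   # f-string is built before the level check
print(Expensive.calls)          # 1
```

With cheap values the difference is negligible, but for debug messages inside hot loops the %-style form avoids formatting work that would be thrown away.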

Also, there are some things f-strings simply cannot do, such as populating templates at runtime (i.e., dynamic formatting). That's why f-strings are called formatted string literals: the template has to be written literally in the source code.

# Set both the template and its parameters dynamically
def func(tpl: str, param1: str, param2: str) -> str:
    return tpl.format(param1=param1, param2=param2)

some_template = "First template: {param1}, {param2}"
another_template = "Other template: {param1} and {param2}"

print(func(some_template, "Hello", "World"))
print(func(another_template, "Hello", "Python"))

# Dynamically reuse the same template with different parameters
inputs = ["Hello", "World", "!"]
template = "Here's some dynamic value: {value}"

for value in inputs:
    print(template.format(value=value))
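Template strings, also mentioned above, are another option for runtime templates. A minimal sketch with string.Template; the message text is made up. Because Template only supports simple $name substitution (no attribute access or format specs), it is often considered the safer choice when the template itself comes from users:

```python
from string import Template

# A template that could come from a config file or user input at runtime
tpl = Template("Hello $name, you have $count new messages")

print(tpl.substitute(name="Martin", count=3))
# Hello Martin, you have 3 new messages

# safe_substitute leaves unknown placeholders in place instead of raising KeyError
print(tpl.safe_substitute(name="Martin"))
# Hello Martin, you have $count new messages
```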

Most importantly, use f-strings whenever possible, as they are more readable and more performant, but note that there are still situations where other formatting styles are preferred and/or required.

Tomllib

TOML is a widely used configuration format that is especially important to Python's tooling and ecosystem, since it's used for pyproject.toml configuration files. Until now you had to use an external library to work with TOML files, but starting with Python 3.11 there is a built-in library called tomllib, which is based on the tomli package.

So, once you switch to Python 3.11, you should get in the habit of using import tomllib instead of import tomli. One less dependency to worry about!

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # older Pythons: pip install tomli

with open("pyproject.toml", "rb") as f:
    config = tomllib.load(f)
    print(config)
    # {'project': {'authors': [{'email': '[email protected]',
    #                           'name': 'Martin Heinz'}],
    #              'dependencies': ['flask', 'requests'],
    #              'description': 'Example Package',
    #              'name': 'some-app',
    #              'version': '0.1.0'}}

toml_string = """
[project]
name = "another-app"
description = "Example Package"
version = "0.1.1"
"""

config = tomllib.loads(toml_string)
print(config)
# {'project': {'name': 'another-app', 'description': 'Example Package', 'version': '0.1.1'}}

Setuptools

The last one is more like a deprecation notice:

Since Distutils is deprecated, use of any functions or objects from distutils is likewise discouraged, and Setuptools is intended to replace or deprecate all such uses.

It's time to say goodbye to the distutils package and switch to setuptools; distutils was removed from the standard library entirely in Python 3.12. The setuptools documentation provides guidance on how to replace distutils usage. In addition, PEP 632 provides migration advice for parts of distutils not covered by setuptools.

Summary

Every new Python version brings new features, so I recommend checking the "New Modules", "Deprecated Modules", and "Removed Modules" sections of the Python release notes. They're a great way to keep track of important changes to the Python standard library, so you can continually incorporate new features and best practices into your projects.


Origin blog.csdn.net/JACK_SUJAVA/article/details/131245935