The epoch-making Python package manager - PDM caching mechanism

PDM Series Catalog

1. The epoch-making Python package manager: PDM tutorial - Introduction
2. The epoch-making Python package manager: PDM tutorial - Principle
3. The epoch-making Python package manager - PDM local & global configuration
4. The epoch-making Python package manager - PDM local & global project
5. The epoch-making Python package manager - PDM caching mechanism
6. Reader asked: How to make PyCharm support PDM?


PDM adopts the local package directory proposed in PEP 582, and many people ask: if every project keeps its packages under its own project directory, how is that different from a venv virtual environment?

Many people do not have a deep understanding of virtual environments and PEP 582, so this is a natural question.

The first difference is that a virtual environment has its own Python interpreter, while PEP 582 does not add a new interpreter, so PEP 582 is more lightweight.

The second difference is today's main topic: PDM's caching mechanism.

If several PDM projects depend on the same version of the same Python package, then by default each project saves its own copy in its __pypackages__ directory.

But this causes two problems:

  1. Disk space is wasted
  2. Installation is slow

You may think that disk space is the cheapest hardware resource these days and that wasting some does not matter, but some Python projects have more dependencies than you can imagine. OpenStack, one of the largest Python projects, has a very long dependency list. Even if you do not mind the disk usage, your time is surely precious, right?

Every time you create a new PDM project you would have to reinstall all of those dependencies, and that can take a very long time. That is when you appreciate the importance of caching.

1. Enable cache

PDM's cache is disabled by default. If you need it, you can enable it with the following command:

$ pdm config install.cache on


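There are three cache-related settings:

  • install.cache: whether the cache is enabled
  • install.cache_method: how packages are linked to the cache
  • cache_dir: the directory where the cache is stored

Unless you have special requirements, you can leave cache_dir alone and use the default directory, which on my Mac is: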

/Users/iswbm/Library/Caches/pdm


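The setting that is harder to understand, and worth explaining, is install.cache_method. It has two possible values:

  • symlink: link to the cache with symbolic links
  • pth: link to the cache with .pth files

I explain the difference between them in detail later on, so keep reading.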

2. A simple example

Here is a simple example to show how the cache works.

First I create two PDM projects:

# Initialize the first pdm project
mkdir pdm-demo1 && cd pdm-demo1
pdm init

# Go back up, then initialize the second pdm project
cd ..
mkdir pdm-demo2 && cd pdm-demo2
pdm init


In pdm-demo1, install the typer package:

pdm add typer


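Then enter the Python interactive interpreter, import typer, and check which path it was imported from.

You can see that it is stored in the directory configured by cache_dir.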
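
The same check can also be scripted. Here is a minimal sketch using only the standard library. The helper name resolved_location is my own, and json stands in for typer so that the snippet runs anywhere; inside pdm-demo1 you would pass "typer" instead and run it with the project's interpreter (for example via pdm run python). It resolves symlinks, so the reported path is the real storage location however the package is linked.

```python
import importlib.util
import os


def resolved_location(module_name):
    """Return the real on-disk path of a module, following symlinks."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    # realpath follows symlinks, so a package that is only linked into
    # the project reports the location it is actually stored in.
    return os.path.realpath(spec.origin)


# "json" is a stand-in; replace it with "typer" inside the pdm project.
print(resolved_location("json"))
```

If the cache is in use, the printed path should sit under the directory configured by cache_dir.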

Next, go into pdm-demo2 and install the typer package there as well:

pdm add typer


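Again, enter the Python interactive interpreter, import typer, and check which path it was imported from.

You can see that the imported typer has the same path as in pdm-demo1. The two projects use the same typer package, which avoids installing the same version of the same package twice.

3. How the cache works

The principle behind the cache is not difficult, and it differs depending on install.cache_method.

cache_method=symlink

symlink is the default linking method and also the easiest one to understand.

After you install typer, you can see in the local package directory that typer is a symbolic link pointing to the typer package in the cache directory.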
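
To make the symlink idea concrete, here is a small sketch that imitates it with temporary directories. It is not PDM's actual code: the directory layout, the demo_pkg package, and the link_from_cache helper are all made up for illustration. One cached copy is linked into two project directories, and both links resolve to the same place.

```python
import os
import tempfile


def link_from_cache(cache_pkg_dir, project_lib_dir):
    """Expose a cached package inside a project's lib directory via a symlink."""
    os.makedirs(project_lib_dir, exist_ok=True)
    link_path = os.path.join(project_lib_dir, os.path.basename(cache_pkg_dir))
    os.symlink(cache_pkg_dir, link_path, target_is_directory=True)
    return link_path


root = tempfile.mkdtemp()

# Hypothetical layout: one cached copy of a package ...
cached_pkg = os.path.join(root, "cache", "demo_pkg")
os.makedirs(cached_pkg)
with open(os.path.join(cached_pkg, "__init__.py"), "w") as fh:
    fh.write("VALUE = 1\n")

# ... and two projects that both need it.
link1 = link_from_cache(cached_pkg, os.path.join(root, "pdm-demo1", "lib"))
link2 = link_from_cache(cached_pkg, os.path.join(root, "pdm-demo2", "lib"))

# Both links resolve to the single cached directory.
print(os.path.realpath(link1) == os.path.realpath(link2))
```

Only the links live in each project; the package files exist once, in the cache.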

cache_method=pth

Many people are probably unfamiliar with how .pth files are used and how they work, so here is a brief explanation.

When Python processes a site directory (such as site-packages) and finds a .pth file there, it adds each path recorded in that file to sys.path, so libraries in the locations listed in the .pth file can also be found by the Python runtime.

Back to PDM: with cache_method=pth, every time you install a package a .pth file is generated that records the lib directory of that package in the cache.

In this way, when Python looks for packages in the __pypackages__ directory and finds a .pth file, it adds the path recorded in that file to sys.path.

In the example above, if you look at the __pypackages__ directory you will find many aaa_xxx.pth files, and the content of each file is the lib directory of the corresponding package in the cache directory.
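
The .pth mechanism can be tried out without PDM. The sketch below assumes nothing about PDM's internals: the directory names, the aaa_demo_pkg.pth file name, and the pth_demo_mod module are hypothetical. It relies on the standard library's site.addsitedir, which adds a directory to sys.path and processes any .pth files found in it.

```python
import os
import site
import sys
import tempfile

root = tempfile.mkdtemp()

# Hypothetical stand-ins for a cached package's lib directory and a
# project's local package directory.
cached_lib = os.path.join(root, "cache", "demo_pkg", "lib")
project_lib = os.path.join(root, "project", "lib")
os.makedirs(cached_lib)
os.makedirs(project_lib)

# A module that exists only in the "cache".
with open(os.path.join(cached_lib, "pth_demo_mod.py"), "w") as fh:
    fh.write("VALUE = 42\n")

# A .pth file holds one path per line; here it points at the cached lib.
with open(os.path.join(project_lib, "aaa_demo_pkg.pth"), "w") as fh:
    fh.write(cached_lib + "\n")

# Register project_lib as a site directory, which processes its .pth files.
site.addsitedir(project_lib)

import pth_demo_mod  # found through the path recorded in the .pth file

print(pth_demo_mod.VALUE)
```

After addsitedir runs, the cached directory appears in sys.path even though nothing was copied into the project directory.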

4. Cache management

PDM's cache management commands are as follows:

  • pdm cache clear: clear all caches
  • pdm cache info: View all cache information
  • pdm cache remove [pattern]: remove the files matching the pattern
  • pdm cache list: List all wheel files in the cache


Origin juejin.im/post/7086101465206358053