A while ago I came across tnpm and learned how it optimizes npm install. It does not help my day-to-day business work directly, but the optimization ideas are well worth studying. While learning, I also wanted to summarize the strengths and weaknesses of npm, yarn, and pnpm over recent years.
Origins
- In the beginning, code was shared via URLs. For example, to use jQuery you went to the jQuery website and used its download link. The drawback: every time you wanted someone else's open source code, you had to track down its download link and import it into your files by hand.
- npm came into being: a tool that gathers open source code in one place for management, so that a command can bring it into a project when needed. In the early days, however, people were reluctant to put their code in one place.
- After Node.js was born it urgently needed a package manager, and npm was a natural fit, so npm took off.
npm
npm install principle
- Read configuration, in priority order: project-level .npmrc > user-level .npmrc > global-level .npmrc > npm's built-in .npmrc.
- Check whether a package-lock.json file exists:
  - If it exists, check whether the dependencies in package-lock.json are consistent with those declared in package.json:
    - Consistent: use package-lock.json directly to handle dependencies;
    - Inconsistent: the handling differs depending on the npm version.
  - If it does not exist, build the dependency tree recursively from package.json. During construction:
    - Whether a package is a direct dependency or a sub-dependency, it is placed in the root node_modules directory when possible;
    - When the same module is encountered again, check whether the version already placed in the dependency tree satisfies the new module's version range. If it does, skip it; otherwise, place the module under the node_modules of the current module.
- Look up each package of the dependency tree in the cache:
  - No cache entry: download the package from the npm remote repository and check its integrity:
    - Check fails: download again;
    - Check passes: copy the downloaded tarball to the cache directory, then unpack it into node_modules with pacote.
  - Cache entry exists: unpack the cached package into node_modules with pacote.
- Generate the package-lock.json file.
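The integrity check in the download step can be sketched with Node's built-in crypto module. This is a minimal illustration, not npm's actual implementation: it assumes the expected value is an SRI-style string of the form sha512-<base64 digest>, which is the shape of the integrity field in package-lock.json, and the tarball bytes here are made up.

```javascript
const crypto = require('node:crypto');

// Compute an SRI-style string ("sha512-<base64 digest>") for a tarball buffer.
function computeIntegrity(tarball) {
  const digest = crypto.createHash('sha512').update(tarball).digest('base64');
  return `sha512-${digest}`;
}

// True when the downloaded bytes match the integrity value from the lockfile.
function verifyIntegrity(tarball, expected) {
  return computeIntegrity(tarball) === expected;
}

// Hypothetical data standing in for a downloaded tarball.
const tarball = Buffer.from('pretend this is a .tgz file');
const expected = computeIntegrity(tarball); // what the lockfile would record

console.log(verifyIntegrity(tarball, expected));                  // true
console.log(verifyIntegrity(Buffer.from('corrupted'), expected)); // false
```

A failed check corresponds to the "download again" branch in the list above.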
flat structure
a waste of resource
As mentioned earlier, npm has used a flat structure since npm 3, but this structure has drawbacks too. Suppose we need to install the following:
node_modules
├── A
│   └── node_modules
│       └── B V1.0
├── C
│   └── node_modules
│       └── B V2.0
├── D
│   └── node_modules
│       └── B V2.0
└── E
From the installation mechanism described earlier, it is easy to see what happens: because A depends on B V1.0, both A and B V1.0 are installed in the root directory. When C is installed, the root directory already contains B V1.0, so B V2.0 is installed in C's own node_modules; the same goes for D. The resulting node_modules structure is roughly as follows:
node_modules
├── A
├── B V1.0
├── C
│   └── node_modules
│       └── B V2.0
├── D
│   └── node_modules
│       └── B V2.0
└── E
It is easy to see that B V2.0 is installed twice here, which wastes disk space.
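As a rough sketch of the placement rule described above (not npm's real algorithm, which works with semver ranges and a full dependency graph), the following models each package with exact versions and a hypothetical flatten function. The first version of a dependency to arrive takes the root slot; a conflicting version is nested under the package that needs it.

```javascript
// Top-level packages and the dependency each one needs (hypothetical data).
const toInstall = [
  { name: 'A', deps: { B: '1.0' } },
  { name: 'C', deps: { B: '2.0' } },
  { name: 'D', deps: { B: '2.0' } },
  { name: 'E', deps: {} },
];

// Place packages in install order. A dependency is hoisted to the root if its
// name is still free there; if a different version already occupies the root,
// it is nested under the dependent package instead. Exact versions stand in
// for semver ranges to keep the sketch small.
function flatten(packages) {
  const root = {};   // name -> version at the node_modules root
  const nested = {}; // package name -> { dep name -> version } in its own node_modules
  for (const pkg of packages) {
    root[pkg.name] = root[pkg.name] ?? 'top-level';
    for (const [dep, version] of Object.entries(pkg.deps)) {
      if (!(dep in root)) {
        root[dep] = version; // first version seen wins the root slot
      } else if (root[dep] !== version) {
        nested[pkg.name] = { ...nested[pkg.name], [dep]: version };
      }
    }
  }
  return { root, nested };
}

console.log(flatten(toInstall));
```

With this input, B 1.0 lands at the root and B 2.0 is nested under both C and D. If the input is reordered so that C comes before A, the same function hoists B 2.0 and nests B 1.0 under A instead.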
Uncertainty (Non-Determinism)
Installing from the same package.json file may not produce the same node_modules directory structure.
Take the previous example again: A depends on B@1.0 and C depends on B@2.0. Which of B 1.0 and B 2.0 gets hoisted to the root after installation? Either of the following two layouts is possible:
node_modules
├── A
├── B@1.0
├── C
│   └── node_modules
│       └── B@2.0
└── D
    └── node_modules
        └── B@2.0

node_modules
├── A
│   └── node_modules
│       └── B@1.0
├── B@2.0
├── C
└── D
It depends on the order in which the user installed the packages.
Duplicate modules: modules with the same name whose semver (semantic versioning) ranges are compatible. Each semver range corresponds to a set of allowed versions; if the allowed ranges of two modules overlap, a single version compatible with both can be chosen.
phantom dependency
Packages that were never declared as dependencies can be accessed illegitimately. Even though we did not list module B in dependencies ourselves, we can still call require('B'); directly. This is a phantom dependency. Once the library is published, an error may be reported, because B is not guaranteed to be installed when a user installs the library.
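One way to make phantom dependencies visible is to compare what the code requires with what the manifest declares. The sketch below is only an illustration under simplifying assumptions: the manifest and source text are hypothetical, and a regular expression stands in for real parsing of the code.

```javascript
// Hypothetical manifest: only A is declared; B arrives transitively via A.
const manifest = { dependencies: { A: '^1.0.0' } };

// Hypothetical source file contents of our library.
const source = `
const a = require('A');
const b = require('B');
const path = require('node:path');
`;

// Collect package names from require('...') calls, skipping relative paths
// and "node:"-prefixed built-ins. A real tool would parse the code instead.
function requiredPackages(code) {
  const names = new Set();
  for (const match of code.matchAll(/require\(['"]([^'"]+)['"]\)/g)) {
    const spec = match[1];
    if (spec.startsWith('.') || spec.startsWith('node:')) continue;
    names.add(spec);
  }
  return [...names];
}

// Anything required but not declared is a phantom dependency.
function phantomDependencies(code, pkg) {
  const declared = new Set(Object.keys(pkg.dependencies ?? {}));
  return requiredPackages(code).filter((name) => !declared.has(name));
}

console.log(phantomDependencies(source, manifest)); // [ 'B' ]
```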
time consuming
The flattening algorithm itself is very complex and takes a long time.
Yarn
Yarn addresses some of the problems of npm 3: slow dependency installation and non-determinism.
Improve installation speed
When npm installs dependencies, the installation tasks are serialized: packages are installed one by one in order, and each package must be fully installed before the next one starts.
To speed up installation, yarn runs these operations in parallel, which improves performance significantly. As for caching, yarn caches every package on disk, so the next time a package is needed it can be installed offline from disk without touching the network.
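The difference between serialized and parallel installation can be sketched with promises. Everything here is hypothetical: installPackage only simulates work with a timer, and the peak counter simply records how many simulated installs were in flight at once. It is not how npm or yarn schedule work internally.

```javascript
// Hypothetical stand-in for downloading and unpacking one package.
// `stats` tracks how many installs are in flight at the same time.
async function installPackage(name, stats) {
  stats.active += 1;
  stats.peak = Math.max(stats.peak, stats.active);
  await new Promise((resolve) => setTimeout(resolve, 10));
  stats.active -= 1;
  return name;
}

// One at a time: each package waits for the previous one to finish.
async function installSerially(names) {
  const stats = { active: 0, peak: 0 };
  for (const name of names) await installPackage(name, stats);
  return stats.peak;
}

// All at once: every install starts immediately, then we wait for all of them.
async function installInParallel(names) {
  const stats = { active: 0, peak: 0 };
  await Promise.all(names.map((name) => installPackage(name, stats)));
  return stats.peak;
}

async function main() {
  const names = ['A', 'B', 'C', 'D'];
  console.log('serial peak concurrency:', await installSerially(names));     // 1
  console.log('parallel peak concurrency:', await installInParallel(names)); // 4
}
main();
```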
lockfile resolves uncertainty
During dependency installation, a yarn.lock file is generated from package.json. It records each dependency, its sub-dependencies, the resolved versions, the download addresses, and the hashes used to verify module integrity. Even if the installation order differs, the same dependencies produce a stable node_modules directory structure in any environment or container, which makes dependency installation deterministic.
disadvantages
Phantom dependencies and duplicated copies of dependencies are still not solved.
npm later adopted package-lock.json to solve the non-determinism problem as well, and added a caching strategy from v5 onward, so as npm has evolved, many of yarn's advantages have become less pronounced.
CNPM
cnpm speeds up the download of packages. In principle, what cnpm does is switch the registry for you.
The Taobao mirror changed its address in June this year, so when using cnpm you need to update to the latest registry address. See: The original Taobao npm domain name will stop resolving soon
Yarn berry
ditch node_modules
Both npm and yarn have a cache. In most cases, installing dependencies really means copying the relevant packages from the cache into node_modules, and that much IO is very time-consuming.
Yarn PnP skips this copy step and instead maintains a static mapping table in the project, .pnp.cjs.
.pnp.cjs records the exact location of each dependency in the cache, and all dependencies are stored in the global cache. Yarn also ships its own resolver, which helps Node find dependencies in the global cache directory instead of searching node_modules when a dependency is referenced.
This avoids a large number of I/O operations, and no node_modules directory is generated in the project. Only one copy of each dependency version exists globally, and both installation speed and resolution speed improve greatly.
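The idea of resolving through a static table rather than a directory walk can be sketched as follows. This is a toy model only: the cache path, the table entries, and the resolveRequest function are all hypothetical, and the real file generated by Yarn is far richer than a single Map.

```javascript
const path = require('node:path');

// Hypothetical global cache location and mapping table.
const CACHE_ROOT = '/home/user/.yarn/cache';
const packageLocations = new Map([
  ['lodash', 'lodash-npm-4.17.21'],
  ['express', 'express-npm-4.18.2'],
]);

// Resolve a bare request such as "lodash/fp" by table lookup instead of
// walking up the directory tree looking for node_modules folders.
// Scoped names ("@scope/name") are not handled in this sketch.
function resolveRequest(request) {
  const [name, ...rest] = request.split('/');
  const location = packageLocations.get(name);
  if (!location) {
    throw new Error(`${name} is not listed in the mapping table`);
  }
  return path.posix.join(CACHE_ROOT, location, ...rest);
}

console.log(resolveRequest('lodash/fp'));
// /home/user/.yarn/cache/lodash-npm-4.17.21/fp
```

A lookup like this needs no file system access at all, which is where the savings in the text come from.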
Breaking away from the Node ecosystem
- Because PnP is used, there is no node_modules directory, but many front-end tools such as Webpack and Babel rely on node_modules. Although many tools have addressed this (for example with pnp-webpack-plugin), compatibility risks are inevitable.
- PnP uses its own dependency resolver, and all dependency references must go through it, so Node scripts can only be run through the yarn command.
PNPM
Advantage
As mentioned earlier, npm 3 and later use a flat structure, which has several problems:
- High disk usage;
- Phantom dependencies: packages that were never declared as dependencies can be accessed illegitimately;
- The flattening algorithm is complex and time-consuming.
pnpm improves on all of the problems above.
pnpm i express
Look at the node_modules folder:
.pnpm
.modules.yaml
express
Here express is a soft link with no node_modules directory inside it. It points into .pnpm, and the real files live in the .pnpm-store folder.
▾ node_modules
  ▾ .pnpm
    ▸ accepts@<version>
    ▸ array-flatten@<version>
    ...
    ▾ express@<version>
      ▾ node_modules
        ▸ accepts -> ../accepts@<version>/node_modules/accepts
        ▸ array-flatten -> ../array-flatten@<version>/node_modules/array-flatten
        ...
        ▾ express
          ▸ lib
            History.md
            index.js
            LICENSE
            package.json
            Readme.md
This layout stays basically consistent with the dependencies declared in package.json and also reduces resource usage. Managing dependencies this way also solves the security problem caused by dependency hoisting.
disadvantages
project isolation
Because the packages under .pnpm are all hard links to the same source files, modifying a package's files in one project modifies that package in every project, so a modified project cannot be isolated from the others.
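The shared-file effect of hard links is easy to reproduce with Node's fs module. The sketch below uses a throwaway temp directory and made-up file names standing in for a store and two projects; it only demonstrates the hard-link behavior and says nothing about pnpm's own code.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Throwaway directory standing in for a store plus two projects (hypothetical layout).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hardlink-demo-'));
const storeFile = path.join(dir, 'store-index.js');
const projectA = path.join(dir, 'project-a-index.js');
const projectB = path.join(dir, 'project-b-index.js');

fs.writeFileSync(storeFile, 'module.exports = 1;');
fs.linkSync(storeFile, projectA); // hard link: another name for the same data
fs.linkSync(storeFile, projectB);

// "Debugging" in project A by editing the file in place...
fs.appendFileSync(projectA, ' // patched in project A');

// ...is visible from project B, because both names point at the same data.
const seenFromB = fs.readFileSync(projectB, 'utf8');
console.log(seenFromB); // module.exports = 1; // patched in project A

fs.rmSync(dir, { recursive: true, force: true });
```

Note that this applies to in-place writes. A tool that saves by writing a new file and renaming it over the old one would replace the link rather than modify the shared data.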
fix dependencies
In front-end development we often run into bugs in third-party open source libraries. There are usually the following ways to deal with them:
- Fork the source code and fix it yourself. After the fix, you can build it locally and use it directly; if you want to share the fix with others, you can publish it to the npm registry or submit a PR to the upstream repository. The drawback is that it is hard to keep the fork in sync with the official library.
- Wait for the library author to fix it. This is unreliable, because open source authors are generally busy and your issue may not be high on their list.
- Patch the local npm package with patch-package.
However, patch-package does not support pnpm.
The optimization direction of npm
Lingyi, an npm engineer at Ant Group, presented tnpm (winner of Ant Group's Luban Award), a way to install npm dependencies in seconds, at SEE Conf 2022 (the Alipay Experience Technology Conference), and laid out the pain points of npm install along with optimization approaches:
HTTP request
Leaving caching aside, during npm install the package information of each dependency is fetched sequentially and recursively, and the corresponding tar is downloaded. The number of HTTP requests keeps growing, so generating the dependency tree takes longer and longer.
The number of HTTP requests can be reduced by aggregating them.
Send the project's package.json to the server and run @npmcli/arborist there to generate the dependency tree. The registry HTTP interface that arborist accesses is redirected to our own registry service, and dependency tree generation is sped up with an in-memory cache, a distributed cache, and a DB.
@npmcli/arborist is the package npm uses for low-level inspection and management of the node_modules tree.
I/O operation
The talk uses one installed package as an example. After the tar package is pulled to the local machine, the IO operations involved in unpacking it include creating folders, writing files, soft-linking bin files, and setting read and write permissions on bin. In total, 13 IO operations are involved.
Since npm install pulls the tar of each dependency from the registry, and the cache also stores tar packages, the optimization starts from tar. If the tar package does not need to be unpacked, none of the IO operations above are needed:
Appending files to the end of a tar is very simple, so we can merge two tars into one.
The flow for writing both packages then becomes:
- fs.createFile: create the shared file
- fs.appendFile: write the first package to it
- fs.appendFile: write the second package to it
This reduces 26 IO operations to 3.
Download and installation are fast now, but what has been installed cannot be used. A tar is not a JS file at all: JS cannot require it, it cannot be manipulated in the shell, and it cannot be edited in the IDE. Every existing habit is broken. Next we need to solve reading and writing of tar.
Look at what a require goes through in JS:
- The fs.readFile API is used to initiate a file read request.
- The JS layer packs the request data into a uv_req_t in libuv.
- libuv calls the read method in libc.
- The read method issues a system call into the file system in the kernel to read the file.
PnP saves packages in zip form. Reading from the package is implemented by hijacking node's require, and reading zip files in the IDE is supported by developing an IDE plug-in. However, the community has a large number of implementations that traverse the node_modules directory through the fs API, and developers also use the shell to operate on dependencies, so this approach does greater damage to existing habits.
- Reading tar files: use FUSE (Filesystem in Userspace, a module used on Linux to mount things such as network locations over SSH into the local file system) to implement a file system in a user-mode program;
- Modifying tar files: use an Overlay file system (widely used in Docker containers; following the copy-on-write idea, it splits files into two layers, Lower and Upper). The Lower directory is read-only and the Upper directory is readable and writable. Combining the two in an Overlay yields a directory that is readable and writable: modifications are reflected in the Upper directory without affecting the Lower directory. This solves project isolation.
cache
With installation speed solved, the last problem is disk space. npm installations take up far too much space; the "black hole" nickname is well deserved.
npm uses a global tar cache to speed up downloads and avoid repeated downloads, but unpacking the tar on every install still takes too long.
pnpm uses hard links to files to reduce the amount of writing, but a hard link means every project points to the same file. For example, if two projects depend on the same package and one of them modifies it while debugging, the other project is affected as well.
Another feature of Overlay is COW (copy-on-write): when a file in the lower layer is modified, it is first copied to the upper directory. So a single global cache can serve all projects.
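The layering behavior can be modeled in a few lines. This is an in-memory toy, not a file system: the OverlayDir class, the file name, and the contents are hypothetical, and a real overlay copies file data at the kernel level rather than storing strings in a Map.

```javascript
// Minimal in-memory model of an overlay: a shared read-only lower layer and a
// per-project writable upper layer. Names and contents are hypothetical.
class OverlayDir {
  constructor(lower) {
    this.lower = lower;     // shared Map, never written through this class
    this.upper = new Map(); // private to this project
  }

  read(file) {
    // The upper layer shadows the lower layer.
    return this.upper.has(file) ? this.upper.get(file) : this.lower.get(file);
  }

  write(file, content) {
    // Writes land in the upper layer only; the lower copy stays untouched.
    this.upper.set(file, content);
  }
}

const sharedCache = new Map([['express/index.js', 'original']]);
const projectA = new OverlayDir(sharedCache);
const projectB = new OverlayDir(sharedCache);

projectA.write('express/index.js', 'patched while debugging');

console.log(projectA.read('express/index.js')); // patched while debugging
console.log(projectB.read('express/index.js')); // original
```

Project B still sees the original file, which is the isolation that plain hard links cannot provide.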
other
Corepack "A manager that manages package managers"
Corepack is an experimental tool that first shipped with Node.js v16.9.0.
- Tools such as yarn and pnpm no longer need to be installed globally.
- A team project can be pinned to a specific package manager version without everyone having to sync it manually whenever it is updated, and an error is printed to the console if the package manager in use does not match the configuration.
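It can be configured in package.json:
"packageManager": "yarn@<version>"
// With the declared package manager, the matching yarn is downloaded automatically and then run
yarn install
// With any other package manager, the command is intercepted and an error is reported
pnpm install
Usage Error: This project is configured to use yarn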
Problems at this experimental stage:
- Currently only pnpm and yarn are supported; cnpm is not.
- There are still some compatibility problems, and npm cannot be intercepted. In other words, even if packageManager is set to yarn, a globally installed npm can still be invoked.
@antfu/ni
Before it runs, ni detects your yarn.lock / pnpm-lock.yaml / package-lock.json to determine the current package manager, and then runs the corresponding command.
When you use `ni` to install dependencies in a project:
If the project has a `yarn.lock` lockfile, it ends up running `yarn install`.
If the project has a `pnpm-lock.yaml` lockfile, it ends up running `pnpm i`.
If the project has a `package-lock.json` lockfile, it ends up running `npm i`.
npm i -g @antfu/ni
ni
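The lockfile detection described above can be sketched as a simple lookup. This is not ni's source code: the detectInstallCommand function, the order in which lockfiles are checked when several exist, and the null fallback are all assumptions made for illustration.

```javascript
// Lockfile -> install command, following the three cases listed above.
const LOCKFILE_COMMANDS = [
  ['yarn.lock', 'yarn install'],
  ['pnpm-lock.yaml', 'pnpm i'],
  ['package-lock.json', 'npm i'],
];

// `files` is the list of file names in the project root. Returns the install
// command for the first matching lockfile, or null when none is present.
function detectInstallCommand(files) {
  const present = new Set(files);
  const hit = LOCKFILE_COMMANDS.find(([lockfile]) => present.has(lockfile));
  return hit ? hit[1] : null;
}

console.log(detectInstallCommand(['package.json', 'pnpm-lock.yaml'])); // pnpm i
console.log(detectInstallCommand(['package.json']));                   // null
```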
reference
- "A way to install npm in seconds - Lingyi" speech video + text version
- npm install principle analysis
- In-depth explanation of npm & yarn & pnpm package management mechanism
- Record an npm&pnpm sharing within the group
- Node.js Corepack
- You Yuxi recommends the artifact ni, can it replace npm/yarn/pnpm? Simple and easy to use! Source code revealed!
If there are any mistakes, please point them out, thank you for reading~