[Front-end engineering] Development of package manager

I came across tnpm suddenly a while ago, and then learned the principle of optimizing npm install. Although it doesn't help the business, it is still very beneficial to learn some optimization ideas. In the process of learning, I just want to summarize the advantages and disadvantages of npm, yarn, and pnpm in recent years.

origin

  • At the beginning, use URLs to share code. For example, when you want to use JQuery, go to the JQuery official website to use its download link. The disadvantage is that every time you use other people’s open source code, you have to find its corresponding download link one by one, and then import it into the file;
  • Npm came into being: use a tool to gather these open source codes together for management, and execute commands to import them into the project when needed. However, in the early days, people were reluctant to throw the codes in one place;
  • After the birth of node.js, a package management tool was urgently needed, and it hit it off with npm, so npm became popular~

npm

npm install principle

insert image description here

  • Get configuration from project-level .npmrcfiles > user-level .npmrcfiles > global-level .npmrc> npmbuilt-in .npmrcfiles;
  • Check for package-lock.jsonfiles:
    • If so, check package-lock.jsonthat package.jsonthe dependencies are consistent with the declaration:
      • Consistent, use directly package.jsonto handle dependencies;
      • Inconsistency, according to different versions take different treatment.
    • if there is not,
      • According to package.jsonthe recursive construction of dependency tree relationship, the construction process:
        • Regardless of whether it is a direct dependency or a sub-dependency, it is preferred to put it in the root directory of node_modules;
        • When encountering the same module, judge whether the version of the module that has been placed in the dependency tree meets the version range of the new module. If so, skip it; otherwise, place the module under the node_modules of the current module.
      • Find each package in the dependency tree in turn from the cache to determine whether there is a cache:
        • No cache exists:
          • npmDownload from remote repository
          • Check the integrity of the package:
            • If it fails, re-download;
            • Verification passed:
              • Copy the downloaded compressed package to the cache directory;
              • pacoteUnzip the downloaded package with the help ofnode_modules
        • There is a cache, and the cached package is pacotedecompressed tonode_modules
      • pacoteUnzip the cached package with the help ofnode_modules
      • Generate package-lock.jsonfiles.

flat structure

a waste of resource

As mentioned earlier, npm3a flat structure will be adopted in the future, but this structure also has disadvantages. Suppose we need to install now:

node_modules
└──A
    └──node_modules
      └──B V1.0
└──C
    └──node_modules
      └──B V2.0
└──D
    └──node_modules
      └──B V2.0
└──E

From the previous installation mechanism, it is not difficult to guess that because Athe dependent package is internally dependent B V1.0, Athe dependent package and B V1.0will be installed in the root directory. When installing Cthe dependent package, since the root directory already exists B V2.0, it B V2.0will be installed in the Croot directory node_modules, Dthe same reason. That is node_modules, the structure is roughly as follows:

node_modules
├──A
├──B V1.0
└──C
    └──node_modules
      └──B V2.0
└──D
    └──node_modules
      └──B V2.0
└──E

insert image description here
It is not difficult to find here that the version is repeated, wasting resource space.

UncertaintyNon-Determinism

The same package.jsonfile install may not get the same node_modules directory structure after dependency.

Still the previous example, A depends on [email protected], C depends on [email protected], and whether B's 1.0 or 2.0 should be upgraded after installation depends.

node_modules
├── [email protected]
├── [email protected]
└── [email protected]
    └── node_modules
        └── [email protected]
└── [email protected]
    └── node_modules
        └── [email protected]

node_modules
├── [email protected]
│   └── node_modules
│       └── [email protected]
├── [email protected]
└── [email protected]
└── [email protected]

Depends on the user's order of installation.

Duplicate module : which refers to modules with the same name and semver (semantic version) compatibility. Each semver corresponds to a version allowable range. If the version allowable ranges of two modules overlap, then a compatible version can be obtained.

phantom dependency

Packages that have no declared dependencies can be illegally accessed. Although we dependenciesdon't write modules directly in B, we can do require('B');that directly phantom dependency. Once the library is released, an Berror will be reported because the module will not be installed when the user installs the library.

time consuming

The flattening algorithm itself is very complex and takes a long time.

Yarn

YarnSome problems that have been optimized npm3: slow dependency installation and uncertainty.

Improve installation speed

When installing dependencies in npm , the installation tasks are serialized, and the installation is performed one by one in package order, which means it waits for a package to be fully installed before moving on to the next one.

In order to speed up package installation, yarnparallel operation is adopted, which has a significant performance improvement. Moreover, in the caching mechanism, yarn each package will be cached on the disk. When the package is installed next time, it can be installed offline from the disk without the network.

lockfile resolves uncertainty

During dependency installation, a yarn.lock file will be generated according to package.josn, which records dependencies, dependent sub-dependencies, dependent versions, hashes for obtaining addresses and verifying module integrity. Even if the installation sequence is different, the same dependency can get a stable node_modules directory structure in any environment and container, ensuring the determinism of dependency installation.

disadvantages

The problem of ghost dependence and dependence on clones is still not solved.

Npm later also used package-lock.json to solve the problem of uncertainty. The caching strategy was also added after the V5 version, so with the upgrade of npm, many advantages of yarn are not obvious.

CNPM

Speed ​​up the download of related source files. In principle, cnpmwhat we do is to change it for everyone registry.

Taobaoyuan changed its address in June this year, so you need to update or replace the latest one when using cnpmregistry

Address: The original Taobao npm domain name will stop parsing soon

Yarn berry

ditch node_modules

Whether it is npm or yarn, it has the function of caching. In most cases, when installing dependencies, it is actually copying the relevant packages in the cache node_modules to . When it comes to IO operations, it is very time-consuming.

And yarn PnPwill not copy this step, but maintain a static mapping table in the project pnp.cjs.

pnp.cjs The specific location of dependencies in the cache will be recorded, and all dependencies are stored in the global cache. At the same time, a self-built parser is built to help node find dependencies from the global cache directory instead of searching when relying on references node_modules.

In this way, a large number of I/O operations are avoided, and the project directory will not be node_modules generated. There will only be one copy of dependencies of the same version globally, and the installation speed and parsing speed of dependencies will be greatly improved.

Breaking away from the node ecology

  • Because of the use of PnP, there will node_modules be , but Webpack,Babelvarious front-end tools such as node_modules. Although many tools such as pnp-webpack-pluginhave been solved, there will inevitably be compatibility risks.
  • PnP has built its own dependency parser, and all dependency references must be executed by the parser, so the node script can only be executed through the yarn command.

PNPM

Advantage

As mentioned earlier, npm3the flat structure is adopted, and there are several problems here:

  • High disk resource usage;
  • phantom dependency, which can illegally access packages that have not declared dependencies
  • The flattening algorithm is complex and time-consuming.

And pnpmall above-mentioned problems have been improved.

pnpm i express

Check node_modulesthe folder:

.pnpm
.modules.yaml
express

Here expressis a soft link, there is no node_modulesdirectory inside, and the real file location is in .pnpm-storethe folder.

▾ node_modules
  ▾ .pnpm
    ▸ [email protected][email protected]
    ...
    ▾ [email protected]
      ▾ node_modules
        ▸ accepts  -> ../[email protected]/node_modules/accepts
        ▸ array-flatten -> ../[email protected]/node_modules/array-flatten
        ...
        ▾ express
          ▸ lib
            History.md
            index.js
            LICENSE
            package.json
            Readme.md

This result can package.json be basically consistent with the declared dependencies, and also reduces resource usage. The management of this dependency method also solves the security problem of dependency promotion.

disadvantages

project isolation

Because .pnpmall are linked to the same source code file through hard links, when we modify the file of this package in a certain project, the package will be modified in all projects, which makes it impossible to isolate the modified projects.

fix dependencies

In the process of front-end development, we often encounter bugs in third-party open source libraries. Usually, we have the following methods to deal with them.

  • Fork a copy of the source code modification by yourself. After the repair, you can package it locally and use it directly. If you want to share your research results with others, you can upload them to the npm repository or submit a PR to the source repository. This method has a disadvantage, that is, it is difficult to keep the notes in sync with the official library.
  • Waiting for library author to fix. This method is not reliable, because open source authors are generally busy, and your needs may not be in the forefront.
  • By patching pacth-packagelocal npmpackages.

However pacth-packagepnpm is not supported.

The optimization direction of npm

Lingyi, an npm engineer of Ant Group, shared a second-level way to install npm at the SEE Conf 2022 Alipay Experience Technology Conference : tnpm (Ant Group Luban Award), and proposed the pain points and optimization solutions of npm install:

HTTP request

insert image description here

Regardless of caching, during execution npm install, we will sequentially and recursively obtain the package information of the current dependencies and tardownload them correspondingly, that is, the number of HTTP requests will increase, which will lead to a gradual increase in the generation time of the dependency tree.

In this regard, the amount of HTTP requests can be reduced by aggregation .

insert image description here

Send the project's package.jsonto the server, and run it on the server @npmcli/arborist to generate the dependency tree. Directly hijack the arboristaccessed interface to our service. Speed ​​up the generation process of dependency tree through memory cache/distributed cache/DB.registry HTTP registry

@npmcli/arboristIt is a package for npm's low-level inspection and management node_modules tree.

I/O operation

insert image description here

The article uses the package he installed as an example. After pulling the tar package to the local, the IO operations involved after decompression include: creating folders, writing files, soft links to bin files, and setting read and write permissions for bin. That is, a total of 13 IO operations are involved.

Since the package npm installcorresponding to the dependent package is pulled from the warehouse at the time tar, and the tar package is also used for caching, the optimization method starts from the tarbeginning. If you don't need to decompress tarthe package, you don't need the above IO operations:

insert image description here

tar It is very simple to add files at the end. So we can tar merge the two together.

insert image description here

The flow for writing both packages would become:

  1. fs.createFile: Create public files
  2. fs.appendFile: write the first package to
  3. fs.appendFile: write the second package to

Reduced from 26 IOs to 3 IO operations.

Although the download and installation are fast now, the installed things cannot be used. tar It is not a file at all JS , it cannot JS be used require, it cannot shell be manipulated in it, and it cannot be edited in the IDE. All habits are broken. Next we need to solve tar the problem of reading and writing.

Study requirethe process of going through JS:

  1. Use fs.readFilethis API to initiate a file read request.
  2. The JS method constructs the data result libuvinuv_req_t
  3. libuv libc The method in will be called read.
  4. readThe method will initiate a system call to access the file system in the kernel to read the file.

insert image description here

PnP is zip saved in the form of a package. The reading of the package is realized through the method of hijacking node , and the reading in is supported by developing a plug-in . However, there are a large number of implementations in the community to traverse the directory, and developers will also use it to perform some dependency operations. For the existing habits of use greater damage.requirezipIDE IDE fsAPInode_modulesshell

insert image description here

  • Solve the reading of tar files: use FUSE (the full name of FUSE is FileSystem In UserSpace, which is a module used in Linux to mount certain network spaces, such as SSH, to the local file system) to implement a file system in the user mode program;

insert image description here

  • Solve the modification of the tar file: With the help of the Overlay file system (the Overlay file system is a file system widely used in Docker containers, the idea of ​​copying when writing is to divide the file into two layers, lower and upper.) The Lower directory is read-only, and the Upper directory is readable and writable. We can combine the Upper and Lower directories in Overlay to construct a readable and writable directory. Modifications to files will be reflected in the Upper directory without affecting the Lower directory. Resolved project isolation .

insert image description here

cache

After solving the installation speed, we have to solve the final disk space problem. Now npm takes up too much space after installation, and the name of the black hole is well deserved.

NPM uses a global tarcache to speed up the download process and reduce repeated downloads. But it still takes too much time to decompress each time.

pnpm The form of file hard link is used to reduce the amount of writing, but the hard link means that the whole point points to the same file. For example, if two projects depend on the same package, if one of them debug makes

Another feature of Overlay is COW (Copy On Write), which copies the underlying files to the upper directory when modifying the underlying files. So we can use the same cache to support all projects globally.

insert image description here

other

Corepack "A manager that manages package managers"

Corepack is an experimental tool introduced in Node.js v16.13 release.

  • It is no longer necessary to install tools such as yarn pnpm globally.
  • It is possible to force team projects to use his specific package manager version without them having to manually sync it every time an update is needed, and will error out in the console if it doesn't match the configuration.

Can be configured in package.json:

"packageManager": "[email protected]"
// 声明的包管理器,会自动下载对应的 yarn,然后执行
yarn install

// 用非声明的包管理器,会自动拦截报错
pnpm install
Usage Error: This project is configured to use yarn

Problems in the test phase:

  • Currently only pnpm and yarn are supported, and cnpm is not supported
  • There are still some problems with compatibility, and npm cannot intercept it. That is to say, even if the packageManager is configured to use yarn, the global npm installation can still be called

@antfu/ni

Before it runs, it detects your yarn.lock// to know the current package manager, and runs the appropriate command.pnpm-lock.yamlpackage-lock.json

使用 `ni` 在项目中安装依赖时:
   假设你的项目中有锁文件 `yarn.lock`,那么它最终会执行 `yarn install` 命令。
   假设你的项目中有锁文件 `pnpm-lock.yaml`,那么它最终会执行 `pnpm i` 命令。
   假设你的项目中有锁文件 `package-lock.json`,那么它最终会执行 `npm i` 命令。

npm i -g @antfu/ni
ni

reference


If there are any mistakes, please point them out, thank you for reading~

Guess you like

Origin blog.csdn.net/qq_34086980/article/details/125837412