The Truth About Dependency Management—Exploring Front-End Package Managers


Foreword

npm is the package manager for Node.js. The community also offers similar tools such as yarn, pnpm and cnpm, and Alibaba uses tnpm internally. During project development we usually rely on one of these mainstream package managers to generate the node_modules directory, install dependencies, and manage them. This article explores the dependency management principles behind front-end package managers, in the hope that it will be helpful to readers.

npm

When we run the npm install command, npm downloads the required dependency packages, extracts them to the local cache, then builds the node_modules directory structure and writes the dependency files into it. So what does the structure of the packages inside node_modules look like? npm has gone through the following major changes.

1. npm v1/v2 dependency nesting

The earliest versions of npm managed dependencies with a very simple nested layout. Suppose our project depends on module A and module C, and A and C depend on different versions of module B. The generated node_modules directory is as follows:
[Figure: the project's dependency tree and the resulting nested node_modules directory]

Dependency Hell

As you can see, this is a nested node_modules structure: each dependency has its own node_modules directory that stores the dependencies of that dependency. This approach is simple and clear, but it has some serious problems. If we add a module D that also depends on version 2.0 of B, the generated node_modules directory looks like the one below. Although modules C and D depend on the same version of B, B is downloaded and installed twice, which wastes disk space. This is the dependency hell problem.

node_modules
├── A@1.0.0
│   └── node_modules
│       └── B@1.0.0
├── C@2.0.0
│   └── node_modules
│       └── B@2.0.0
└── D@2.0.0
    └── node_modules
        └── B@2.0.0
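
To make the duplication concrete, here is a minimal sketch that simulates a nested install and counts how many copies of each package end up on disk. The graph object, the version numbers of A, C and D, and the nestedInstall helper are assumptions made for illustration; this is not npm's actual implementation.

```javascript
// Hypothetical dependency graph: A needs B 1.0.0, while C and D both need B 2.0.0.
const graph = {
  project: ["A@1.0.0", "C@2.0.0", "D@2.0.0"],
  "A@1.0.0": ["B@1.0.0"],
  "C@2.0.0": ["B@2.0.0"],
  "D@2.0.0": ["B@2.0.0"],
  "B@1.0.0": [],
  "B@2.0.0": [],
};

// Nested layout: every package gets its own node_modules directory
// that holds its direct dependencies.
function nestedInstall(pkg, dir, out = []) {
  for (const dep of graph[pkg]) {
    const name = dep.split("@")[0];
    const depDir = `${dir}/node_modules/${name}`;
    out.push({ id: dep, dir: depDir });
    nestedInstall(dep, depDir, out);
  }
  return out;
}

const installed = nestedInstall("project", ".");

// Count how many physical copies of each name@version were written.
const copies = {};
for (const { id } of installed) {
  copies[id] = (copies[id] || 0) + 1;
}

console.log(installed.map((p) => p.dir));
console.log(copies["B@2.0.0"]); // 2
```

In this sketch B 2.0.0 is written once under C and once under D, even though both copies are identical.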


2. npm v3 flattening

npm v3 rewrote the dependency installer. It flattens the tree by installing sub-dependencies in the same directory as the main dependencies (hoisting), which reduces the deep trees and redundancy caused by nesting. The node_modules directory generated at this point is as follows:
[Figure: the dependency tree and the resulting flattened node_modules directory]
To make sure modules are still loaded correctly, npm also relies on a lookup algorithm whose core is a recursive upward search through node_modules directories. When installing a new package, npm keeps looking in the node_modules directories of the parent levels. If a package of the same version is found, it is not installed again. When a version conflict is encountered, the sub-dependency is stored in the node_modules directory under the module that needs it. This avoids installing a large number of packages repeatedly and keeps the dependency hierarchy from getting too deep.
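
The upward search can be sketched roughly as follows. This is a simplification of the walk Node performs: it ignores global folders, NODE_PATH and file extensions, and the installed set is a hypothetical flattened layout in which B 1.0 is hoisted and B 2.0 is nested under C.

```javascript
const path = require("path");

// Candidate node_modules directories for code living in `fromDir`, nearest first.
function lookupPaths(fromDir) {
  const paths = [];
  let dir = path.posix.resolve(fromDir);
  while (true) {
    // Node does not look for node_modules/node_modules.
    if (path.posix.basename(dir) !== "node_modules") {
      paths.push(path.posix.join(dir, "node_modules"));
    }
    const parent = path.posix.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return paths;
}

// Hypothetical set of package directories that exist on disk.
const installed = new Set([
  "/app/node_modules/A",
  "/app/node_modules/B", // B 1.0, hoisted
  "/app/node_modules/C",
  "/app/node_modules/C/node_modules/B", // B 2.0, kept nested because of the conflict
]);

// Return the first matching package directory while walking upward.
function resolveFrom(fromDir, name) {
  for (const dir of lookupPaths(fromDir)) {
    const candidate = path.posix.join(dir, name);
    if (installed.has(candidate)) return candidate;
  }
  return null;
}

console.log(lookupPaths("/app/node_modules/A"));
console.log(resolveFrom("/app/node_modules/A", "B")); // /app/node_modules/B
console.log(resolveFrom("/app/node_modules/C", "B")); // /app/node_modules/C/node_modules/B
```

Module A finds the hoisted copy one level up, while module C finds its own nested copy first, so each gets the version it asked for.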

The flat pattern solves the dependency hell problem, but also introduces additional new problems.

Phantom dependency

Phantom dependencies occur when a package is not declared in package.json but can still be referenced in the project. Consider the previous case, with the package.json shown in the figure.

[Figure: the project's package.json and the flattened node_modules directory]

In index.js we can require A directly because that dependency is declared in package.json, but requiring B also works.

var A = require('A');
var B = require('B'); // ???

Because B is a dependency of A, npm hoists B to the top level of node_modules during installation, so the require function can find it. But this can lead to unexpected problems:

  • Dependency incompatibility: the project does not declare which version of B it depends on, so a major update of B is completely legal as far as SemVer is concerned, and other users may install a version of B that is incompatible with the code that uses it.

  • Missing dependencies: we can also directly reference the sub-dependencies of a devDependency in the project, but other users will not install devDependencies, so an error may be thrown immediately at runtime.
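
One way to catch phantom dependencies is to compare what the code requires with what package.json declares. The sketch below does this with a simple regular expression; the manifest object and both helper functions are hypothetical, and a real tool would parse the source properly and also exclude Node's built-in modules.

```javascript
// Hypothetical manifest: only A is declared.
const manifest = { dependencies: { A: "^1.0.0" }, devDependencies: {} };

// Collect package names from require() calls. Relative and absolute paths are skipped.
function requiredPackages(source) {
  const names = new Set();
  const pattern = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
  for (const match of source.matchAll(pattern)) {
    const request = match[1];
    if (request.startsWith(".") || request.startsWith("/")) continue;
    // Scoped packages keep two segments (@scope/name); others keep one.
    const parts = request.split("/");
    names.add(request.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0]);
  }
  return [...names];
}

// Packages that are required in the source but not declared in the manifest.
function phantomDependencies(source, pkg) {
  const declared = new Set([
    ...Object.keys(pkg.dependencies || {}),
    ...Object.keys(pkg.devDependencies || {}),
  ]);
  return requiredPackages(source).filter((name) => !declared.has(name));
}

const indexJs = `
var A = require('A');
var B = require('B');
`;

console.log(phantomDependencies(indexJs, manifest)); // [ 'B' ]
```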

Multiple dependencies (doppelgangers)

[Figure: node_modules layout containing duplicated copies of B]

Suppose we go on to add module D, which depends on B 2.0, and module E, which depends on B 1.0. No matter whether B 2.0 or B 1.0 is hoisted to the top level, the other version ends up duplicated; in this example B 2.0 is duplicated. This leads to the following problems:

  • Broken singletons: modules C and D both import a singleton object exported by module B. Even though the code appears to load the same version of the same module, it is actually resolved and loaded from different copies, so the imported objects are different. Problems arise if both sides perform side-effecting operations on what they assume is a single object.

  • Type conflicts: although the code of each package does not pollute the others, their types can still affect each other, so duplicate versions may lead to global type naming conflicts.
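
The broken singleton can be illustrated with a toy module cache. Node caches loaded modules by resolved file name, so two physical copies of the same version produce two separate instances. The loadB factory, the requireAt helper and the paths below are hypothetical stand-ins, not Node internals.

```javascript
// Stand-in for the source of module B: it exports one shared counter object.
function loadB() {
  return { count: 0 };
}

// A tiny module cache keyed by resolved file path.
const cache = new Map();
function requireAt(resolvedPath) {
  if (!cache.has(resolvedPath)) cache.set(resolvedPath, loadB());
  return cache.get(resolvedPath);
}

// Two physical copies of B 2.0.0: one nested under C, one nested under D.
const bSeenByC = requireAt("/app/node_modules/C/node_modules/B/index.js");
const bSeenByD = requireAt("/app/node_modules/D/node_modules/B/index.js");

bSeenByC.count += 1;

console.log(bSeenByC === bSeenByD); // false
console.log(bSeenByD.count); // 0, D never sees the update made through C's copy
```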

Non-Determinism

In the context of front-end package management, determinism means that for a given package.json, running npm install in any environment produces the same node_modules directory structure. npm v3, however, is non-deterministic: its node_modules directory and dependency tree structure depend on the order in which the user installs packages.

Suppose the project has the following dependency tree; the node_modules directory generated by npm install is shown alongside it.
[Figure: the initial dependency tree and the node_modules directory generated by npm install]
Now suppose the user manually upgrades module A to version 2.0 with npm, which also upgrades its dependency B to version 2.0. The dependency tree then looks as follows.
[Figure: the dependency tree and node_modules directory after upgrading A to 2.0]
Development is now complete, the project is deployed to a server, and npm install is run again. Because the version of sub-dependency B has changed, the generated node_modules directory structure differs from the one produced during local development, as shown in the following figure. To keep the node_modules structure consistent, you have to delete node_modules and run npm install again whenever package.json is modified.
[Figure: the node_modules directory generated by a fresh npm install on the server]
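
A toy hoisting routine is enough to show why the result depends on installation order. The install function and the deps table below are assumptions made for illustration and are much simpler than npm's real installer: a package goes to the top level unless a different version of the same name is already there, in which case it is nested under its parent.

```javascript
function install(order, deps) {
  const top = {}; // name -> version at node_modules/<name>
  const nested = {}; // parent -> { name: version } at node_modules/<parent>/node_modules/<name>

  const place = (name, version, parent) => {
    if (top[name] === undefined) {
      top[name] = version; // first come, first hoisted
    } else if (top[name] !== version) {
      nested[parent] = { ...(nested[parent] || {}), [name]: version };
    }
  };

  for (const [name, version] of order) {
    place(name, version, null);
    for (const [depName, depVersion] of deps[`${name}@${version}`] || []) {
      place(depName, depVersion, name);
    }
  }
  return { top, nested };
}

// Hypothetical dependency table: A needs B 1.0.0, C needs B 2.0.0.
const deps = {
  "A@1.0.0": [["B", "1.0.0"]],
  "C@2.0.0": [["B", "2.0.0"]],
};

const aFirst = install([["A", "1.0.0"], ["C", "2.0.0"]], deps);
const cFirst = install([["C", "2.0.0"], ["A", "1.0.0"]], deps);

console.log(aFirst.top.B, aFirst.nested); // 1.0.0 { C: { B: '2.0.0' } }
console.log(cFirst.top.B, cFirst.nested); // 2.0.0 { A: { B: '1.0.0' } }
```

The same two top-level packages yield two different trees, which is exactly the kind of divergence a lock file is meant to prevent.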

3. npm v5 flattening + lock

npm v5 added package-lock.json. When a project has a package.json file and npm install is run for the first time, a package-lock.json file is generated automatically. It records the modules that package.json depends on as well as their sub-dependencies, and each dependency is annotated with its version, the address it was fetched from, and a hash used to verify the integrity of the module. With package-lock.json, installs become deterministic and compatible, so every install produces the same result.
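
The integrity hash can be sketched with Node's crypto module. The integrity field uses the Subresource Integrity format, an algorithm name followed by a base64 digest. The tarball contents and the lockEntry object below are made up for illustration, and a real lock file entry contains more fields.

```javascript
const crypto = require("crypto");

// Build an SRI-style string: "sha512-" followed by the base64 digest.
function integrityOf(buffer) {
  const digest = crypto.createHash("sha512").update(buffer).digest("base64");
  return `sha512-${digest}`;
}

// Hypothetical bytes standing in for a downloaded package tarball.
const tarball = Buffer.from("pretend this is B-1.0.0.tgz");

// Simplified lock entry recording the version and the expected hash.
const lockEntry = { version: "1.0.0", integrity: integrityOf(tarball) };

// On a later install, the downloaded bytes are checked against the recorded hash.
function verify(buffer, entry) {
  return integrityOf(buffer) === entry.integrity;
}

console.log(verify(tarball, lockEntry)); // true
console.log(verify(Buffer.from("tampered contents"), lockEntry)); // false
```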

Consistency

Returning to the case above, the initial installation generates the package-lock.json shown on the left of the figure. The dependencies listed in the dependencies object are all hoisted, and the requires object inside each dependency lists its sub-dependencies. If we now update dependency A to version 2.0, as shown on the right of the figure, the version of the hoisted sub-dependency does not change, so the regenerated node_modules directory structure does not change either.
[Figure: package-lock.json after the initial installation (left) and after updating A to 2.0 (right)]

Compatibility

Semantic Versioning

Any discussion of version compatibility has to mention the SemVer specification used by npm. The version format is as follows:

  • Major version number: incompatible API changes

  • Minor version number: new functionality that is backward compatible

  • Patch (revision) number: bug fixes that are backward compatible

[Figure: the MAJOR.MINOR.PATCH version format]
When using third-party dependencies, we usually specify a version range for each dependency in package.json. The semantic version range symbols mean:

  • ~: allows upgrades of the patch number only

  • ^: allows upgrades of the minor version number and the patch number

  • *: matches any version, so the latest version is installed
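
The range rules can be written down as a small matcher. This is a deliberately reduced sketch, not the semver library npm uses: it handles only full x.y.z versions and the three symbols listed above, and it follows the convention that for 0.x versions the caret keeps the leftmost non-zero component fixed.

```javascript
// Parse "x.y.z" into three numbers. Pre-release tags and partial versions are out of scope.
function parse(version) {
  return version.split(".").map(Number);
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Supports only "*", "~x.y.z", "^x.y.z" and an exact "x.y.z".
function satisfies(version, range) {
  if (range === "*") return true;
  const v = parse(version);
  const op = range[0];
  if (op !== "~" && op !== "^") return compare(v, parse(range)) === 0;

  const base = parse(range.slice(1));
  if (compare(v, base) < 0) return false; // never below the stated version
  if (op === "~") return v[0] === base[0] && v[1] === base[1];

  // "^": the leftmost non-zero component must stay the same.
  if (base[0] > 0) return v[0] === base[0];
  if (base[1] > 0) return v[0] === 0 && v[1] === base[1];
  return v[0] === 0 && v[1] === 0 && v[2] === base[2];
}

console.log(satisfies("1.2.9", "~1.2.3")); // true
console.log(satisfies("1.3.0", "~1.2.3")); // false
console.log(satisfies("1.9.0", "^1.2.3")); // true
console.log(satisfies("2.0.0", "^1.2.3")); // false
```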

Semantic versioning defines an ideal set of rules for updating version numbers. Ideally every dependency update would follow these rules, but many dependencies do not follow them strictly, so an unintended upgrade of a module's sub-dependency can introduce incompatibilities. package-lock.json therefore pins an exact version for every sub-dependency to avoid such problems.

Yarn

Yarn was open sourced in 2016, before npm v5 was released, to solve some of the problems in npm v3. Yarn describes itself as fast, secure, and reliable dependency management.

1. Yarn v1 lockfile

The node_modules directory structure generated by Yarn v1 is the same as that of npm v5, and a yarn.lock file is generated by default. For the example above, the generated yarn.lock file is as follows:

A@^1.0.0:
  version "1.0.0"
  resolved "uri"
  dependencies:
    B "^1.0.0"

B@^1.0.0:
  version "1.0.0"
  resolved "uri"

B@^2.0.0:
  version "2.0.0"
  resolved "uri"

C@^2.0.0:
  version "2.0.0"
  resolved "uri"
  dependencies:
    B "^2.0.0"

D@^2.0.0:
  version "2.0.0"
  resolved "uri"
  dependencies:
    B "^2.0.0"

E@^1.0.0:
  version "1.0.0"
  resolved "uri"
  dependencies:
    B "^1.0.0"

As you can see, yarn.lock uses a custom format instead of JSON and puts all dependencies at the top level. The stated reasons are that this is easier to read and review and reduces merge conflicts.
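
To show how the flat format maps a requested range to a pinned version, here is a rough parser for the subset of the yarn.lock syntax that appears above. It is a sketch under simplifying assumptions (two-space indentation, only the version and dependencies fields are read) and not Yarn's own parser.

```javascript
// A shortened copy of the lock file shown above.
const lockText = `
A@^1.0.0:
  version "1.0.0"
  resolved "uri"
  dependencies:
    B "^1.0.0"

B@^1.0.0:
  version "1.0.0"
  resolved "uri"
`;

function parseYarnLock(text) {
  const entries = {};
  let current = null;
  let inDeps = false;
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    const indent = line.length - line.trimStart().length;
    const content = line.trim();
    if (indent === 0) {
      // Entry header such as `A@^1.0.0:`; a header may list several comma-separated ranges.
      current = { version: null, dependencies: {} };
      inDeps = false;
      for (const key of content.replace(/:$/, "").split(",")) {
        entries[key.trim().replace(/"/g, "")] = current;
      }
    } else if (content === "dependencies:") {
      inDeps = true;
    } else if (inDeps && indent >= 4) {
      // A dependency line such as `B "^1.0.0"`.
      const [name, range] = content.split(" ");
      current.dependencies[name] = range.replace(/"/g, "");
    } else {
      // An ordinary field line; only `version` is kept in this sketch.
      inDeps = false;
      const [field, value] = content.split(" ");
      if (field === "version") current.version = value.replace(/"/g, "");
    }
  }
  return entries;
}

const entries = parseYarnLock(lockText);

// Look up the version pinned for a given name and range.
function pinned(name, range) {
  return entries[`${name}@${range}`].version;
}

console.log(pinned("A", "^1.0.0")); // 1.0.0
console.log(entries["A@^1.0.0"].dependencies); // { B: '^1.0.0' }
```

The keys still contain range symbols, while the version field inside each entry holds the exact version that was resolved.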

Yarn lock vs. npm lock

  • The file formats differ: npm v5 uses JSON, while yarn uses a custom format.

  • The dependency versions recorded in package-lock.json are all exact, and no semver range symbols (~ ^ *) appear in them, while range symbols still appear in the entry keys of yarn.lock.

  • package-lock.json contains more information and is the more complete lock file, because it also records how sub-dependencies are hoisted:

    • npm v5 only needs the package-lock.json file to determine the node_modules directory structure.

    • yarn.lock cannot identify the top-level dependencies, so both package.json and yarn.lock are required to determine the node_modules directory structure. The location of each package inside node_modules is computed internally by yarn, which can lead to different results when different versions of yarn are used.

2. Yarn v2 Plug'n'Play

Yarn 2.x introduced the Plug'n'Play (PnP) zero-install mode and abandoned node_modules, which improves the reliability of dependencies and speeds up builds.

Node relies on node_modules to find dependencies, and generating node_modules involves a series of I/O-heavy operations: downloading packages, extracting them to the cache, and copying them into the local directory, in addition to resolving dependencies and handling duplicates. All of these are time-consuming, and a package manager built around node_modules has little room left for optimization. Yarn therefore takes the opposite approach. Since the package manager already knows the structure of the project's dependency tree, it can tell the interpreter directly where each package is located on disk and manage the versions and sub-dependencies of the packages itself.

Run yarn --pnp to enable PnP mode. In PnP mode, yarn generates a .pnp.cjs file instead of node_modules. This file maintains a mapping from each package to its location on disk and to its list of sub-dependencies. The .pnp.cjs file also implements a resolveRequest method that handles require requests. It determines the location of a dependency in the file system directly from the mapping table, which avoids the I/O spent searching node_modules.
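
The idea of resolving from a table instead of walking directories can be sketched as follows. The registry layout, the package keys and the cache paths are hypothetical; the real .pnp.cjs file is structured differently, and this resolveRequest only borrows the name of the method mentioned above.

```javascript
// Hypothetical table: for each package, its location on disk and the exact
// versions of the dependencies it declares.
const registry = new Map([
  ["app@workspace", { location: "/app/", deps: { A: "1.0.0" } }],
  ["A@1.0.0", { location: "/cache/A-1.0.0/", deps: { B: "1.0.0" } }],
  ["B@1.0.0", { location: "/cache/B-1.0.0/", deps: {} }],
]);

// Resolve `request` on behalf of `issuer` with two table lookups and no disk access.
function resolveRequest(request, issuer) {
  const info = registry.get(issuer);
  const version = info.deps[request];
  if (version === undefined) {
    // An undeclared dependency is rejected instead of being found by accident.
    throw new Error(`${issuer} requires ${request}, but does not declare it as a dependency`);
  }
  return registry.get(`${request}@${version}`).location;
}

console.log(resolveRequest("A", "app@workspace")); // /cache/A-1.0.0/
console.log(resolveRequest("B", "A@1.0.0")); // /cache/B-1.0.0/

try {
  resolveRequest("B", "app@workspace");
} catch (error) {
  console.log(error.message);
}
```

Because every lookup goes through the issuer's declared dependencies, a strict table like this has no room for phantom dependencies.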

The advantages and disadvantages of PnP mode are both obvious:

  • Advantages: it gets rid of node_modules, so installation and module loading are fast; all npm modules are stored in a global cache directory, which avoids duplicated dependencies; in strict mode sub-dependencies are not hoisted, which avoids phantom dependencies (this can break some packages, so a loose mode that relies on hoisting is also supported).

  • Disadvantages: yarn uses its own resolver to handle Node's require, so Node scripts have to be run through yarn node. This departs from the existing Node ecosystem, and compatibility is not very good.

pnpm

pnpm 1.0 was officially released in 2017. Its advantages are fast installation, efficient use of disk space, and good security. It too appeared in order to solve problems in npm and yarn.

The flattened node_modules structure used by npm and yarn solves dependency hell, consistency and compatibility, but it offers no good answer to doppelgangers and phantom dependencies. Leaving circular dependencies aside, the actual dependency structure is a directed acyclic graph (DAG), yet what npm and yarn simulate through the file directory and Node's resolution algorithm is really a superset of that DAG, with many spurious links between ancestors and siblings, and this causes a lot of problems. pnpm uses a combination of hard links and symbolic links to model the DAG more accurately and so solve the problems of yarn and npm.

1. Non-flat node_modules

Hard links save disk space

A hard link is an additional directory entry that points to the same data on disk as the source file. It lets the same file be reached through different paths, and although each path reports the full file size, the content is stored only once. pnpm keeps package files in a global store directory, and the files in a project's node_modules are hard links to the files in that store. Because different projects link to the same files in the global store, a great deal of disk space is saved.
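
The behaviour of hard links can be observed directly with Node's fs module. The sketch below creates both files in a temporary directory; the file names are arbitrary and merely stand in for a file in the store and its link inside a project.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Work in a throwaway directory so that both paths are on the same file system.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "hardlink-demo-"));
const storeFile = path.join(root, "store-index.js");
const projectFile = path.join(root, "project-index.js");

fs.writeFileSync(storeFile, "module.exports = 1;\n");
fs.linkSync(storeFile, projectFile); // create the hard link

const storeStat = fs.statSync(storeFile);
const projectStat = fs.statSync(projectFile);

console.log(storeStat.ino === projectStat.ino); // true: both names refer to the same inode
console.log(storeStat.nlink); // 2: the data now has two directory entries

// Overwriting the content through one path is visible through the other.
fs.writeFileSync(projectFile, "module.exports = 2;\n");
console.log(fs.readFileSync(storeFile, "utf8"));
```

The last step also shows the flip side of sharing: editing a linked file in one place changes it everywhere it is linked.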

Symbolic links create nested structures

Symbolic links (soft links) can be understood as shortcuts. When a dependency is referenced, pnpm uses symbolic links to find its actual location under the .pnpm directory. Suppose we install a bar module that depends on a foo module in the project. The resulting node_modules directory looks like this.
[Figure: the node_modules directory after installing bar, which depends on foo]
As the figure shows, the bar directory under node_modules does not contain a node_modules directory of its own. It is a symbolic link: the actual files are located in .pnpm/<package-name>@<version>/node_modules/<package-name> and are hard-linked to the global store. The dependencies of bar sit next to it in .pnpm/<package-name>@<version>/node_modules. Each of them is in turn a symbolic link to its own .pnpm/<package-name>@<version>/node_modules/<package-name> directory, whose files are hard-linked to the global store.

The advantage of this nested node_modules structure is that code can only access the packages that are really declared as dependencies. In a flat structure every hoisted package is accessible, so this layout solves the phantom dependency problem well. In addition, because package files are always hard links into the store directory, the same dependency is only ever installed once, which also solves the doppelganger problem.
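
The layout can be reproduced in a temporary directory with a few symbolic links. This is a sketch of the directory shape only: the files are written directly instead of being hard-linked from a global store, the versions are made up, and the script assumes a platform where creating symlinks is permitted.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

const root = fs.mkdtempSync(path.join(os.tmpdir(), "pnpm-layout-demo-"));
const nm = path.join(root, "node_modules");
const virtualStore = path.join(nm, ".pnpm");

// Real package directories live under .pnpm/<name>@<version>/node_modules/<name>.
function writePackage(name, version) {
  const dir = path.join(virtualStore, `${name}@${version}`, "node_modules", name);
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, "index.js"), `module.exports = "${name}";\n`);
  return dir;
}

const barDir = writePackage("bar", "1.0.0");
const fooDir = writePackage("foo", "1.0.0");

// bar depends on foo: link foo next to bar, inside bar's private node_modules.
fs.symlinkSync(fooDir, path.join(virtualStore, "bar@1.0.0", "node_modules", "foo"), "dir");
// The project declares only bar: expose just bar at the top level.
fs.symlinkSync(barDir, path.join(nm, "bar"), "dir");

const topLevel = fs.readdirSync(nm).sort();
const realBar = fs.realpathSync(path.join(nm, "bar"));
const siblings = fs.readdirSync(path.dirname(realBar)).sort();

console.log(topLevel); // [ '.pnpm', 'bar' ]: foo is not reachable from the project root
console.log(siblings); // [ 'bar', 'foo' ]: foo is reachable from bar's real location
```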

The following diagram from the official pnpm website clearly illustrates pnpm's dependency management mechanism.
[Figure: pnpm's node_modules structure, from the official pnpm website]

2. Limitations

pnpm appears to solve these problems well, but it has some limitations.

  • package-lock.json is ignored. npm's lockfile is designed to reflect a flattened node_modules layout, but pnpm creates an isolated layout by default that npm's lockfile format cannot represent, so pnpm uses its own lockfile, pnpm-lock.yaml, instead.

  • Symbolic link compatibility. Symlinks do not work in some scenarios; for example, Electron applications and applications deployed on Lambda cannot use pnpm.

  • Sub-dependencies are placed in the same directory as the package that depends on them. Thanks to Node.js's logic of searching parent directories upward, this is generally compatible, but plugin loading logic such as that in Egg and Webpack, which uses relative paths, needs to be adapted.

  • The dependencies of different applications are hard links to the same files. If a file is modified during debugging, other projects may be affected inadvertently.

cnpm and tnpm

cnpm is an npm mirror for users in China that is maintained and open sourced by Alibaba, and it supports mirror synchronization with the official npm registry. tnpm is built on top of cnpm and serves developers across the Alibaba ecosystem. It provides a private npm registry and has accumulated many Node.js engineering practices.

The dependency management of cnpm/tnpm borrows from pnpm: it creates a non-flat node_modules structure through symbolic links, which maximizes installation speed. Installed packages appear under their package name in the node_modules folder, and each of these is a symbolic link to a directory named with the version number and package name. Unlike pnpm, cnpm does not use hard links and does not symlink sub-dependencies into separate directories for isolation.
[Figure: the node_modules directory generated by cnpm]
In addition, tnpm's new rapid mode uses a user-space file system (FUSE) to further optimize dependency management. FUSE is similar to a file-system version of ServiceWorker: it can take over the file system operations of a directory. Implementing a non-flat node_modules structure on top of it can solve the compatibility problems of symbolic links. Due to space limitations I won't go into details here. If you are interested, see "tnpm rapid mode - how to be 10 seconds faster than pnpm".

Other

Deno

Having explored the dependency management mechanisms of the mainstream package managers, we find that neither the flat nor the non-flat node_modules structure is perfect, and that the PnP mode, which abandons node_modules, is not compatible with the current Node ecosystem. There seems to be no complete solution, which suggests that the problem lies with node_modules itself. Ryan Dahl, the author of Node.js, also admitted at JSConf that node_modules is one of his top ten regrets about Node, but that it can no longer be undone, and then introduced his new work, Deno. So let's look at how Deno, another major JavaScript runtime, manages dependencies.

In Deno there is no npm, package.json or node_modules. Instead, the source, package name, version number and module name are all encoded in a URL. Dependencies are imported through the URL and cached globally, which not only saves disk space but also simplifies the project structure.

import * as log from "https://deno.land/[email protected]/log/mod.ts";

Deno therefore has no concept of a package manager. For dependency management within a project, Deno suggests the following approach: the developer creates a dep.ts file, references all required remote dependencies in it, and re-exports the required methods and classes. Local modules then import what they need from dep.ts alone, which avoids the inconsistencies that could arise if each module imported external dependencies by URL on its own.

// dep.ts
export {
  assert,
  assertEquals,
  assertStringIncludes,
} from "https://deno.land/[email protected]/testing/asserts.ts";

// index.ts
import { assert } from './dep.ts';

Although Deno's way of handling dependencies solves various problems caused by node_modules, the current experience is not very good. First, importing dependencies by URL is verbose and cumbersome, and the security of referencing files directly from the network is debatable. Second, developers have to maintain the dep.ts file by hand, the origin of each dependency is not obvious, and a change to a dependency also requires changes to the local files that import it. In addition, the ecosystem of available packages is far smaller than Node's.

But Deno does offer another way of thinking. Node's package managers seem to be just "pure tools" that install dependencies and generate node_modules. The real logic of resolving dependencies still lives in Node, so there is not much room for optimization at the package manager level. Yarn's PnP mode tried to change the role of the package manager, but it could not overcome the inertia of the powerful Node ecosystem. Deno therefore starts from scratch and merges installing and resolving dependencies into one step, so that node_modules and the package manager become unnecessary. It is just that Deno's current approach is not yet mature enough, and we look forward to its further evolution.

Epilogue

Although there is no perfect dependency management solution, the history of package managers is a process of mutual learning and continuous optimization among libraries and developers, and together they keep pushing front-end engineering forward. We look forward to better solutions emerging in the future.

References

  • The node_modules dilemma (https://zhuanlan.zhihu.com/p/137535779)

  • npm: How npm Works (https://npm.github.io/how-npm-works-docs/index.html)

  • Yarn: Plug'n'Play (https://yarnpkg.com/features/pnp)

  • pnpm: symlinked node_modules structure (https://pnpm.io/en/symlinked-node-modules-structure)

  • tnpm: tnpm rapid mode - how to be 10 seconds faster than pnpm (https://zhuanlan.zhihu.com/p/455809528)

  • deno: Linking to third party code (https://deno.land/[email protected]/linking_to_external_code)



Origin blog.csdn.net/xgangzai/article/details/123700324