Preface
For front-end developers, running npm install is practically muscle memory. For a long time my knowledge of npm stayed at the "I can use it" stage, without a real understanding of how it works internally. It wasn't until an npm package caused a problem in one of our projects that I decided to review the npm package workflow in detail.
npm install roughly goes through the stages shown in the illustration below. The following sections go through how each stage is implemented and why it was designed that way.
CheckConfig
This is the first stage after npm install is executed. Its main job is to load the npm config that applies to the current project and use it as the configuration for the installation. Running npm config ls -l in the terminal shows the current npm config, as below. It includes the settings we use most often: the package installation source (registry), package namespaces (scopes), the cache location, cache behavior, and so on.
npm supports configuration items at different levels, each with a different priority. The figure above shows four such levels; they are explained below from highest to lowest priority:
1. cli configs (highest priority): configuration passed on the current npm command (plus some default values). For example, if the registry is set when running the command, that setting is listed here.
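The priority rule can be modeled as a layered lookup: for every key, the highest-priority layer that defines it wins, and lower layers only fill in keys nobody above them set. Here is a minimal sketch of that rule; the layer names and values are hypothetical examples, and this is not npm's actual config loader.

```typescript
// One source of settings, e.g. CLI flags or an .npmrc file (illustrative only).
interface ConfigLayer {
  name: string;
  values: Record<string, string>;
}

// Layers are passed from highest to lowest priority.
// For every key, the first layer that defines it wins.
function resolveConfig(layers: ConfigLayer[]): Map<string, { value: string; from: string }> {
  const effective = new Map<string, { value: string; from: string }>();
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer.values)) {
      // Skip keys already set by a higher-priority layer.
      if (!effective.has(key)) {
        effective.set(key, { value, from: layer.name });
      }
    }
  }
  return effective;
}

// Hypothetical layers: a registry passed on the command line overrides the user config.
const config = resolveConfig([
  { name: "cli", values: { registry: "https://registry.example.com/" } },
  { name: "user", values: { registry: "https://registry.npmjs.org/", cache: "/home/me/.npm" } },
  { name: "default", values: { registry: "https://registry.npmjs.org/", "prefer-online": "false" } },
]);

for (const [key, { value, from }] of config) {
  console.log(`${key} = ${value} (from ${from})`);
}
```

Keeping track of which layer supplied each value is what makes a listing like npm config ls -l useful: you can see not only the effective setting but also where it came from.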
Package-lock.json
As the illustration at the beginning of the article shows, whether a valid package-lock file exists determines whether two steps need to run at all: fetching package information from the npm registry, and building the dependency tree.
The package-lock file was added in npm 5.x. Its job is to lock the dependency structure: as long as a package-lock.json file exists in your directory, the node_modules directory structure generated by every npm install run should be exactly the same.
For example, suppose the project needs to install the following dependencies:
{
  "name": "npm-demo",
  "dependencies": {
    "buffer": "^6.0.3",
    "ignore": "^5.1.9"
  }
}
ignore has no dependencies of its own, while buffer depends on base64-js and ieee754.
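Without a lock file, the network requests made during installation look like this; 8 requests were sent in total.
When package-lock.json exists, the network requests look like this instead; only 4 package requests were sent in total.
Four packages are involved here (buffer, ignore, base64-js and ieee754), so these numbers are consistent with two requests per package without a lock file (the package metadata, to resolve a version, then the tarball) and only the tarball download when the lock file already records the resolved versions and URLs. The generated package-lock.json looks like this: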
We also know that npm package versions follow the SemVer specification. Because dependencies are usually declared as version ranges rather than exact versions, the version that actually gets installed can change over time, which sometimes causes unexpected errors.
SemVer specification: semver.org/
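A declared version like ^6.0.3 is "not fixed" because the caret accepts any later version that keeps the leftmost non-zero component unchanged. Here is a minimal sketch of that caret rule for plain x.y.z versions only; prerelease tags and other range operators, which the real semver package also handles, are deliberately left out.

```typescript
type Version = [number, number, number];

// Parse a plain "x.y.z" version string; prerelease/build tags are not supported here.
function parse(v: string): Version {
  const parts = v.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

// Negative if a < b, 0 if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Exclusive upper bound of a caret range: bump the leftmost non-zero component.
// ^6.0.3 -> <7.0.0, ^0.2.3 -> <0.3.0, ^0.0.3 -> <0.0.4
function caretUpperBound([major, minor, patch]: Version): Version {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

// Does `version` satisfy the caret range `range` (e.g. "^6.0.3")?
function satisfiesCaret(version: string, range: string): boolean {
  if (!range.startsWith("^")) throw new Error(`not a caret range: ${range}`);
  const base = parse(range.slice(1));
  const v = parse(version);
  return compare(v, base) >= 0 && compare(v, caretUpperBound(base)) < 0;
}

console.log(satisfiesCaret("6.0.3", "^6.0.3")); // true
console.log(satisfiesCaret("6.9.0", "^6.0.3")); // true: a newer minor may be installed later
console.log(satisfiesCaret("7.0.0", "^6.0.3")); // false
```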
The package-lock file solves this problem. Once npm install has run, the integrity information of every dependency (version, download address, and sha512 digest) is saved in this file. When dependencies need to be reinstalled later, this information is read directly from the package-lock file, which greatly improves installation efficiency.
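Here is a minimal sketch of what "reading directly from the package-lock file" can amount to. It assumes the packages section used by lockfileVersion 2 and 3, where keys are node_modules paths and each entry records version, resolved (the tarball URL) and integrity. Because every location, URL and checksum is already known, installing becomes one download per entry with no version resolution. The registry host, versions and integrity strings below are placeholders, not real data.

```typescript
// One entry in the "packages" section of package-lock.json (v2/v3 layout).
interface LockEntry {
  version?: string;
  resolved?: string; // tarball URL
  integrity?: string; // e.g. "sha512-<base64 digest>"
  dependencies?: Record<string, string>;
}

interface Lockfile {
  lockfileVersion: number;
  packages: Record<string, LockEntry>;
}

interface Download {
  installPath: string; // where the package will be extracted
  url: string;
  integrity: string;
}

// No registry metadata lookups: just one verified tarball download per locked package.
function planDownloads(lock: Lockfile): Download[] {
  const plan: Download[] = [];
  for (const [path, entry] of Object.entries(lock.packages)) {
    if (path === "") continue; // the "" key describes the root project itself
    if (!entry.resolved || !entry.integrity) {
      throw new Error(`incomplete lock entry for ${path}`);
    }
    plan.push({ installPath: path, url: entry.resolved, integrity: entry.integrity });
  }
  return plan;
}

// Placeholder lock data for the buffer/ignore example.
const registry = "https://registry.example.com";
const lock: Lockfile = {
  lockfileVersion: 2,
  packages: {
    "": { dependencies: { buffer: "^6.0.3", ignore: "^5.1.9" } },
    "node_modules/base64-js": { version: "1.5.1", resolved: `${registry}/base64-js/-/base64-js-1.5.1.tgz`, integrity: "sha512-PLACEHOLDER" },
    "node_modules/buffer": { version: "6.0.3", resolved: `${registry}/buffer/-/buffer-6.0.3.tgz`, integrity: "sha512-PLACEHOLDER" },
    "node_modules/ieee754": { version: "1.2.1", resolved: `${registry}/ieee754/-/ieee754-1.2.1.tgz`, integrity: "sha512-PLACEHOLDER" },
    "node_modules/ignore": { version: "5.1.9", resolved: `${registry}/ignore/-/ignore-5.1.9.tgz`, integrity: "sha512-PLACEHOLDER" },
  },
};

const plan = planDownloads(lock);
console.log(plan.length); // 4
for (const d of plan) console.log(d.installPath, d.url);
```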
Build dependency tree
The dependency tree describes the relationship between an npm package and the packages it depends on, and it is mainly reflected in the directory structure inside node_modules. Based on how npm has behaved over time, it comes in two types: the nested structure and the flat structure.
Nested mode
Early versions, npm 1 and npm 2, used a nested structure: each dependency's own dependencies are stored in its own node_modules folder. For the example above, running npm install produces the following structure in node_modules:
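The nested layout can be derived mechanically from the dependency graph: every package is placed inside the node_modules folder of the package that depends on it. Here is a minimal sketch using a hypothetical in-memory map from package name to its dependency names; versions are omitted and the graph is assumed to have no cycles.

```typescript
// Hypothetical dependency graph: package name -> names of its direct dependencies.
type Graph = Record<string, string[]>;

// npm 1/2-style nested layout: each dependency lives under its parent's node_modules.
// Assumes the graph is acyclic; a cycle would recurse forever.
function nestedLayout(graph: Graph, roots: string[], prefix = "node_modules"): string[] {
  const paths: string[] = [];
  for (const name of roots) {
    const dir = `${prefix}/${name}`;
    paths.push(dir);
    // Recurse into this package's own node_modules folder.
    paths.push(...nestedLayout(graph, graph[name] ?? [], `${dir}/node_modules`));
  }
  return paths;
}

const graph: Graph = {
  buffer: ["base64-js", "ieee754"],
  ignore: [],
  "base64-js": [],
  ieee754: [],
};

for (const p of nestedLayout(graph, ["buffer", "ignore"])) console.log(p);
// node_modules/buffer
// node_modules/buffer/node_modules/base64-js
// node_modules/buffer/node_modules/ieee754
// node_modules/ignore
```

If ignore also listed base64-js in this graph, a second copy would appear under node_modules/ignore/node_modules, which is exactly the duplication problem of the nested mode.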
The advantages of this approach are obvious: the node_modules structure corresponds one-to-one with the dependency declarations in package.json, the hierarchy is clear, and every installation produces the same directory structure.
However, if base64-js had dependencies of its own, the nesting would continue another level down, and so on. This design has some problems:
1. If the dependency hierarchy is too deep, file paths become too long, especially on Windows, where the maximum file path length is 260 characters.
2. Many duplicate packages are installed, making node_modules extremely large. For example, if ignore, which sits next to buffer in the example above, also depended on the same version of base64-js, base64-js would be installed in the node_modules of both packages, i.e. installed twice.
Flat mode
To solve these problems, npm made a major change in version 3.x: it replaced the early nested structure with a flat one:
- When installing a module, whether it is a direct dependency or a sub-dependency, it is first installed in the root node_modules directory.
With the same dependencies as above, running npm install produces the following directory structure:
As you can see, buffer's two dependencies are installed directly in the root of node_modules. This does alleviate both problems of the nested mode: directory paths stay short, and package redundancy is reduced.
Now suppose the project itself also depends on base64-js@1.0.1:
{
  "name": "npm-demo",
  "dependencies": {
    "base64-js": "1.0.1",
    "buffer": "^6.0.3",
    "ignore": "^5.1.9"
  }
}
- When the same module is encountered again, check whether the version already installed satisfies the version range of the new module. If it does, skip it; if not, install the module in the current module's node_modules.
Running npm install now produces the following directory structure:
Because the base64-js version the project depends on (1.0.1) does not satisfy the version range that buffer requires, buffer's copy of base64-js is installed inside the buffer folder.
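The two rules above can be sketched as a small hoisting function. This is a simplified model of the npm 3.x behavior described here, not npm's actual algorithm: it supports only exact versions and caret ranges with a non-zero major, lets direct dependencies claim their root slots first, and replaces the registry with a hypothetical in-memory table whose versions are placeholders modeled on the example.

```typescript
// Only exact versions ("1.0.1") and caret ranges ("^1.3.1") are supported in this sketch.
type Range = string;

// Hypothetical registry data (versions are placeholders), listed in ascending order.
const versions: Record<string, string[]> = {
  "base64-js": ["1.0.1", "1.5.1"],
  ieee754: ["1.2.1"],
  buffer: ["6.0.3"],
  ignore: ["5.1.9"],
};
const manifests: Record<string, Record<string, Range>> = {
  "buffer@6.0.3": { "base64-js": "^1.3.1", ieee754: "^1.2.1" },
  "ignore@5.1.9": {},
  "base64-js@1.0.1": {},
  "base64-js@1.5.1": {},
  "ieee754@1.2.1": {},
};

function satisfies(version: string, range: Range): boolean {
  if (!range.startsWith("^")) return version === range;
  const [M, m, p] = range.slice(1).split(".").map(Number);
  const [vM, vm, vp] = version.split(".").map(Number);
  // Simplified caret rule (assumes major > 0): same major, not older than the base.
  return vM === M && (vm > m || (vm === m && vp >= p));
}

// Highest available version matching the range.
function pick(name: string, range: Range): string {
  const matching = versions[name].filter((v) => satisfies(v, range));
  if (matching.length === 0) throw new Error(`no version of ${name} matches ${range}`);
  return matching[matching.length - 1];
}

// Install path -> installed version, e.g. "node_modules/buffer" -> "6.0.3".
type Layout = Map<string, string>;

function installDeps(layout: Layout, pkgId: string, pkgDir: string): void {
  for (const [dep, range] of Object.entries(manifests[pkgId])) {
    install(layout, dep, range, pkgDir);
  }
}

function install(layout: Layout, name: string, range: Range, parentDir: string): void {
  const rootPath = `node_modules/${name}`;
  const rootVersion = layout.get(rootPath);
  // A compatible copy already sits at the root: reuse it.
  if (rootVersion !== undefined && satisfies(rootVersion, range)) return;
  const version = pick(name, range);
  // Hoist to the root if the slot is free; otherwise nest under the parent.
  const dir = rootVersion === undefined ? rootPath : `${parentDir}/node_modules/${name}`;
  layout.set(dir, version);
  installDeps(layout, `${name}@${version}`, dir);
}

function installProject(deps: Record<string, Range>): Layout {
  const layout: Layout = new Map();
  // Direct dependencies claim their root slots first; sub-dependencies follow in declaration order.
  const direct = Object.entries(deps).map(([name, range]) => {
    const version = pick(name, range);
    layout.set(`node_modules/${name}`, version);
    return [name, version] as const;
  });
  for (const [name, version] of direct) installDeps(layout, `${name}@${version}`, `node_modules/${name}`);
  return layout;
}

const layout = installProject({ "base64-js": "1.0.1", buffer: "^6.0.3", ignore: "^5.1.9" });
for (const [dir, version] of layout) console.log(`${dir} (${version})`);
```

With this data, base64-js 1.0.1 stays at the root, buffer gets its own 1.5.1 copy under node_modules/buffer/node_modules, and ieee754 is hoisted to the root. Adding a hypothetical buffer2 with the same base64-js range to the tables would produce a second nested copy under node_modules/buffer2.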
Correspondingly, when project code references a module, the module is looked up as follows:
- Search node_modules under the current module's path.
- Search node_modules under the parent module's path.
- …
- Continue up level by level, until the global node_modules path has been searched.
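The lookup order can be sketched as a function that lists the node_modules directories tried for a given directory, nearest first. This is a simplified model of Node's documented algorithm (POSIX paths only, global folders such as NODE_PATH left out); inside a real Node process, require.resolve.paths(request) returns the actual list.

```typescript
// Directories searched for a bare import like require("base64-js"),
// starting at the requiring file's directory and walking up to the filesystem root.
function nodeModulesPaths(fromDir: string): string[] {
  const parts = fromDir.split("/").filter((p) => p !== "");
  const dirs: string[] = [];
  for (let i = parts.length; i >= 0; i--) {
    // A directory that is itself named node_modules does not get node_modules/node_modules.
    if (i > 0 && parts[i - 1] === "node_modules") continue;
    dirs.push("/" + [...parts.slice(0, i), "node_modules"].join("/"));
  }
  return dirs;
}

for (const dir of nodeModulesPaths("/app/node_modules/buffer")) console.log(dir);
// /app/node_modules/buffer/node_modules
// /app/node_modules
// /node_modules
```

This is why the nested copy of base64-js under node_modules/buffer is found first when buffer requires it, while project code, searching from /app, only ever sees the root copy.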
Now assume the project also depends on another package, buffer2@^6.0.3, which also depends on base64-js@^1.3.1. The installation structure then looks like this:
It is easy to see that redundancy still occurs: the base64-js version at the root, which the project depends on, does not satisfy the range required by the other modules, so each of them gets its own nested copy. Therefore npm 3.x does not completely solve the module redundancy problem of older versions, and it may even introduce new problems.
Here is a summary of the main issues with flattening:
1. The dependency structure is non-deterministic.
2. The flattening algorithm itself is complex and time-consuming.
3. Packages that are not declared as dependencies can still be (illegally) accessed from the project.
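The third problem (often called phantom dependencies) is easy to detect mechanically: anything at the top level of a flat node_modules can be required from project code, whether or not package.json declares it. Here is a minimal sketch that takes the directory listing as a plain array, a hypothetical input (a real tool would read the file system); scoped packages are not handled.

```typescript
// Packages that project code can require() but that package.json does not declare.
function phantomDependencies(
  declared: Record<string, string>, // "dependencies" from package.json
  topLevelDirs: string[], // entries directly under node_modules
): string[] {
  return topLevelDirs
    .filter((name) => !name.startsWith(".")) // skip entries such as .bin
    .filter((name) => !(name in declared))
    .sort();
}

// Flat layout from the buffer/ignore example: base64-js and ieee754 were hoisted.
const phantoms = phantomDependencies(
  { buffer: "^6.0.3", ignore: "^5.1.9" },
  ["base64-js", "buffer", "ieee754", "ignore", ".bin"],
);
console.log(phantoms.join(", ")); // base64-js, ieee754
```

Code that imports base64-js here works only by accident: it breaks as soon as buffer stops depending on it or the hoisting result changes.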
The latter two are easy to understand. The first one, the non-deterministic dependency structure, is best explained with an example.
Assume that two different modules depend on the same module, but with different version ranges.
When a project depends on both of these modules, is the dependency tree built like this?
Or like this?
The answer is that both are possible: which form you get depends entirely on the order in which buffer and buffer2 are declared in package.json.
In addition, to let developers pick up the latest dependencies while staying safe, package.json usually pins only the major version. This means the dependency structure can change whenever some dependency releases a new minor version, and this uncertainty can cause unpredictable problems in programs.
This is where the non-deterministic dependency structure comes from, and it is why the package-lock file was born: to guarantee that installation always produces a deterministic node_modules structure.
Cache
To speed up installation, dependencies downloaded by npm install or npm update are stored in a local cache directory, and then copied into the project's node_modules directory.
You can query the cache location with the command npm config get cache. On Linux or Mac, the cached data lives by default in the .npm/_cacache directory under the user's home directory.
This directory contains two subdirectories, content-v2 and index-v5: content-v2 stores the cached tar packages themselves, and index-v5 stores the hashes of those tar packages.
When npm (5.x and later) runs an installation, it uses the integrity, version, and name stored in package-lock.json to generate a unique key. That key maps to a cache record in the index-v5 directory, which gives the hash of the tar package, and the hash is then used to find the cached tar package and use it directly.
We can pick a package and try this out in the cache directory by searching index-v5 for the package's path:
grep https://npm.corp.kuaishou.com/base64-js/-/base64-js-1.0.1.tgz -r index-v5
The command returns the path where the matching hash entry is stored, which here is under the 0d/dd directory.
Opening that file and formatting it as JSON shows a _shasum field, which is the cache hash of this package. The first four hex digits of the value give its path under content-v2/sha1: here they are 6926, which means the package is stored under the content-v2/sha1/69/26 directory.
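Both directories use the same addressing trick: a hex digest is split into 2 + 2 + remaining characters to form nested folders, which is why the index entry sits under 0d/dd and the content under 69/26. Here is a minimal sketch of that path scheme. Two details are assumptions based on my understanding of the cacache library that npm uses, not on the article: that the index bucket is the SHA-256 of the cache key, and that npm's keys have the form make-fetch-happen:request-cache:<url>. The printed index path is therefore not claimed to match the 0d/dd example.

```typescript
import { createHash } from "node:crypto";

// Split a hex digest into folders: "6926ab..." -> ["69", "26", "ab..."].
function hashToSegments(hex: string): string[] {
  return [hex.slice(0, 2), hex.slice(2, 4), hex.slice(4)];
}

// Index entry: bucket named after the SHA-256 of the cache key.
// The key format is an assumption about how npm's fetch layer names entries.
function indexPath(cacheDir: string, url: string): string {
  const key = `make-fetch-happen:request-cache:${url}`;
  const hex = createHash("sha256").update(key).digest("hex");
  return [cacheDir, "index-v5", ...hashToSegments(hex)].join("/");
}

// Content: folder named after the digest of the tarball itself, grouped by algorithm.
function contentPath(cacheDir: string, algorithm: string, hexDigest: string): string {
  return [cacheDir, "content-v2", algorithm, ...hashToSegments(hexDigest)].join("/");
}

const cacheDir = "/home/me/.npm/_cacache"; // hypothetical home directory
console.log(indexPath(cacheDir, "https://npm.corp.kuaishou.com/base64-js/-/base64-js-1.0.1.tgz"));
// A made-up 40-character sha1 digest that starts with 6926, like the _shasum above:
console.log(contentPath(cacheDir, "sha1", "6926" + "0".repeat(36)));
```

Addressing content by its own hash means identical tarballs are stored only once, and a corrupted file can be detected simply by re-hashing it.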
Building on the cache, npm provides offline installation modes:
- --prefer-offline: prioritize cached data, and download from the remote registry only if there is no matching cached data (default).
- --prefer-online: prioritize network data, and fall back to cached data if the network request fails. This mode picks up the latest modules promptly.
- --offline: do not touch the network and use cached data directly; if the data is not in the cache, the installation fails.
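The three modes differ only in which source is tried first and whether the other source is allowed as a fallback. Here is a minimal model of the behavior as described above; it is not npm's implementation, and the cache and network functions are hypothetical stand-ins.

```typescript
type Mode = "prefer-offline" | "prefer-online" | "offline";

// Hypothetical stand-ins for the cache and the registry.
interface Sources {
  readCache(name: string): string | undefined; // undefined on a cache miss
  fetchNetwork(name: string): Promise<string>; // rejects when the request fails
}

async function fetchPackage(name: string, mode: Mode, src: Sources): Promise<string> {
  if (mode === "prefer-online") {
    // Network first; fall back to the cache only if the request fails.
    try {
      return await src.fetchNetwork(name);
    } catch (err) {
      const cached = src.readCache(name);
      if (cached === undefined) throw err;
      return cached;
    }
  }
  // Both remaining modes try the cache first.
  const cached = src.readCache(name);
  if (cached !== undefined) return cached;
  if (mode === "offline") {
    // Never touch the network: a cache miss means the install fails.
    throw new Error(`${name} is not in the cache and --offline forbids network access`);
  }
  return src.fetchNetwork(name); // prefer-offline: download only on a cache miss
}

// Usage: the network is down, but buffer is cached.
const cacheStore = new Map<string, string>([["buffer", "buffer tarball (from cache)"]]);
const sources: Sources = {
  readCache: (name) => cacheStore.get(name),
  fetchNetwork: async (name) => {
    throw new Error(`network request for ${name} failed`);
  },
};

fetchPackage("buffer", "prefer-online", sources).then((data) => console.log(data)); // buffer tarball (from cache)
fetchPackage("ignore", "offline", sources).catch((err: Error) => console.log(err.message));
```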
file integrity
We mentioned file integrity many times above, so what is file integrity verification?
Before downloading a dependency, we can usually obtain the hash that npm computed for it. For example, running the npm info command shows the package's tarball (download link) and its shasum (hash):
After downloading the dependency locally, the user needs to make sure nothing went wrong during the download. So once the download completes, the file's hash is computed locally and compared with the published one. If the two hashes are the same, the downloaded dependency is complete; if they differ, it is downloaded again.
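The check itself is just a hash comparison. Here is a minimal sketch with Node's built-in crypto module that covers both formats mentioned in this article: the hex sha1 shasum shown by npm info, and the Subresource-Integrity-style string (sha512-<base64 digest>) stored in package-lock.json. The "tarball" bytes are a stand-in, and real integrity strings may list several hashes, which this sketch does not handle.

```typescript
import { createHash } from "node:crypto";

// Hex SHA-1 of the file: the format of the "shasum" shown by npm info.
function shasum(data: Uint8Array): string {
  return createHash("sha1").update(data).digest("hex");
}

// SRI-style string, e.g. "sha512-<base64 digest>", as stored in package-lock.json.
function integrityOf(data: Uint8Array, algorithm = "sha512"): string {
  return `${algorithm}-${createHash(algorithm).update(data).digest("base64")}`;
}

// Recompute the digest locally and compare it with the published value.
// Simplification: a single expected hash, not a space-separated list.
function verify(data: Uint8Array, expected: string): boolean {
  const dash = expected.indexOf("-");
  if (dash > 0) {
    // SRI form: the algorithm name precedes the first "-".
    return integrityOf(data, expected.slice(0, dash)) === expected;
  }
  return shasum(data) === expected.toLowerCase(); // plain hex sha1
}

const encoder = new TextEncoder();
const tarball = encoder.encode("pretend these are the bytes of a .tgz file");
const published = integrityOf(tarball); // what the registry would advertise
const damaged = encoder.encode("pretend these are the bytes of a .tgz fil3");

console.log(verify(tarball, published)); // true
console.log(verify(damaged, published)); // false, so the package would be downloaded again
console.log(verify(tarball, shasum(tarball))); // true
```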
Overall process review
- Check config.
- Check whether the project has a lock file.
  - No lock file:
    - npm fetches package information from the remote registry.
    - Build the dependency tree from package.json:
      - Every module, whether a direct dependency or a sub-dependency, is first placed in the root node_modules.
      - When the same module is encountered again, check whether the version already placed in the tree satisfies the new module's version range. If it does, skip it; if not, place the module in the current module's node_modules.
      - Note that this step only determines the logical dependency tree; nothing is installed yet. The packages are then downloaded, or taken from the cache, based on this structure.
    - Look up each package of the dependency tree in the cache, in order:
      - No cache:
        - npm downloads the package from the remote registry.
        - Verify the integrity of the package:
          - Verification fails: download it again.
          - Verification passes: copy the downloaded package into the npm cache directory, then extract it into node_modules according to the dependency structure.
      - Cache hit: extract the cached package into node_modules according to the dependency structure.
  - Extract the packages into node_modules.
- Generate the lock file.
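The cache-and-download part of the flow above can be sketched as a loop over the already-built dependency tree. This is a minimal model with hypothetical hooks standing in for npm's registry client, cache, checksum check, and tar extraction, and the retry limit is an assumption chosen only for illustration.

```typescript
// Where a package goes in the logical dependency tree, and how to fetch and check it.
interface PlannedPackage {
  installPath: string;
  url: string;
  integrity: string;
}

// Hypothetical hooks standing in for the registry, cache, checksum, and extraction steps.
interface InstallHooks {
  cacheGet(integrity: string): Uint8Array | undefined;
  cachePut(integrity: string, data: Uint8Array): void;
  download(url: string): Promise<Uint8Array>;
  verify(data: Uint8Array, integrity: string): boolean;
  extract(data: Uint8Array, installPath: string): void;
}

const MAX_ATTEMPTS = 3; // assumed retry limit, for illustration only

// Download, verify, and download again when the integrity check fails.
async function downloadVerified(pkg: PlannedPackage, hooks: InstallHooks): Promise<Uint8Array> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const data = await hooks.download(pkg.url);
    if (hooks.verify(data, pkg.integrity)) return data;
  }
  throw new Error(`integrity check failed for ${pkg.url} after ${MAX_ATTEMPTS} attempts`);
}

// Cache hit: extract. Cache miss: download, verify, store in the cache, then extract.
async function installTree(tree: PlannedPackage[], hooks: InstallHooks): Promise<void> {
  for (const pkg of tree) {
    let data = hooks.cacheGet(pkg.integrity);
    if (data === undefined) {
      data = await downloadVerified(pkg, hooks);
      hooks.cachePut(pkg.integrity, data);
    }
    hooks.extract(data, pkg.installPath);
  }
}

// Usage with in-memory fakes: the first download is corrupted, the retry succeeds,
// and the second install is served entirely from the cache.
const cache = new Map<string, Uint8Array>();
const extracted: string[] = [];
let downloads = 0;
const hooks: InstallHooks = {
  cacheGet: (integrity) => cache.get(integrity),
  cachePut: (integrity, data) => {
    cache.set(integrity, data);
  },
  download: async (url) => new TextEncoder().encode(++downloads === 1 ? "corrupted" : url),
  // Fake check for the demo: bytes are "valid" when they spell the expected string.
  verify: (data, integrity) => new TextDecoder().decode(data) === integrity,
  extract: (_data, installPath) => {
    extracted.push(installPath);
  },
};
const url = "https://registry.example.com/buffer.tgz";
const tree: PlannedPackage[] = [{ installPath: "node_modules/buffer", url, integrity: url }];
const done = installTree(tree, hooks).then(() => installTree(tree, hooks));
done.then(() => console.log(`downloads: ${downloads}, extractions: ${extracted.length}`)); // downloads: 2, extractions: 2
```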
Looking ahead
Besides npm, there are also third-party package managers worth recommending. Here are two of them: yarn and pnpm.
Yarn
yarn was released in 2016, when npm was still at v3 and package-lock.json did not exist yet. As mentioned above, developers often complained about npm's shortcomings, such as non-deterministic installs and slow installation speed. It was at this point that yarn was born:
The above are the advantages listed on yarn's official website, and they were very attractive at the time. Of course, npm later recognized its own problems and made many improvements. In those later improvements (lock files, caching, saving by default with -s, ...), we can more or less see yarn's influence, which shows how well yarn was designed.
yarn also uses npm v3's flat structure to manage dependencies. After installing dependencies, yarn generates a yarn.lock file by default. With the same dependencies as above, let's look at the structure of yarn.lock:
As you can see, it is quite similar to the package-lock.json file.
yarn's cache structure is similar to npm's before v5: each cached module is stored in its own folder, whose name contains the module name, version number, hash, and other information. Use the command yarn cache dir to view the directory of cached data:
Its caching strategy is the opposite of npm's: yarn defaults to prefer-online mode, using network data first and falling back to cached data if the network request fails.
Yarn also has a feature called PnP (Plug'n'Play): classic.yarnpkg.com/en/docs/pnp ...
Normally, npm or yarn generates a node_modules directory, and Node then looks up modules in it according to its module resolution rules. But Node does not actually know where a given module is: it searches node_modules, and if the module is not found there, it searches node_modules in the parent directory, and so on. This is very inefficient.
But as the package manager, Yarn knows the project's entire dependency tree. Why not let Yarn tell Node directly which directory to load each module from? That makes Node's module lookup faster and reduces the number of files copied into node_modules. This is the basic principle of Plug'n'Play.
Simply put, in PnP mode Yarn does not create a node_modules directory. Instead, it creates a .pnp.js file, which is itself a Node program containing the project's dependency tree information, the module lookup algorithm, and so on.
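To make the idea concrete, here is a minimal model of a PnP-style lookup: a table recording, for each package, where its files live and exactly which dependencies it declared, so resolving an import is a single table lookup instead of a walk up the directory tree. The table shape, locator strings, and paths are hypothetical and much simpler than the real .pnp.js, and rejecting undeclared dependencies is a choice made in this sketch.

```typescript
// Hypothetical, simplified PnP-style table: package locator -> location and declared deps.
interface PackageInfo {
  location: string; // directory where the package's files live
  dependencies: Record<string, string>; // dependency name -> locator of the exact package
}

const pnpTable: Record<string, PackageInfo> = {
  "npm-demo": {
    location: "/app",
    dependencies: { buffer: "buffer@6.0.3", ignore: "ignore@5.1.9" },
  },
  "buffer@6.0.3": {
    location: "/yarn-cache/buffer-6.0.3",
    dependencies: { "base64-js": "base64-js@1.5.1", ieee754: "ieee754@1.2.1" },
  },
  "ignore@5.1.9": { location: "/yarn-cache/ignore-5.1.9", dependencies: {} },
  "base64-js@1.5.1": { location: "/yarn-cache/base64-js-1.5.1", dependencies: {} },
  "ieee754@1.2.1": { location: "/yarn-cache/ieee754-1.2.1", dependencies: {} },
};

// One lookup instead of probing node_modules folders up the directory tree.
function resolveFrom(issuer: string, request: string): string {
  const info = pnpTable[issuer];
  if (!info) throw new Error(`unknown package ${issuer}`);
  const target = info.dependencies[request];
  // In this sketch, undeclared dependencies are rejected rather than found by accident.
  if (!target) throw new Error(`${issuer} tried to access ${request}, which it does not declare`);
  return pnpTable[target].location;
}

console.log(resolveFrom("npm-demo", "buffer")); // /yarn-cache/buffer-6.0.3
console.log(resolveFrom("buffer@6.0.3", "ieee754")); // /yarn-cache/ieee754-1.2.1
```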
Yarn does not use PnP by default. To enable it, add the following configuration to package.json and reinstall the dependencies:
"installConfig": {
  "pnp": true
}
pnpm
pnpm is a modern, new-generation package manager. Its official documentation describes it like this:
Fast, disk space efficient package manager
- Package installation is extremely fast;
- Disk space utilization is efficient.
Install:
npm i pnpm -g
High speed
Below is a benchmark comparison chart from the community, with pnpm shown in yellow. In most scenarios, pnpm installs packages significantly faster than npm, yarn, and yarn PnP, often 2 to 3 times faster.
Dependency tree (efficient use of disk space)
Until then, neither npm nor yarn (in non-PnP mode) had fully solved the problems of illegal access to dependencies and repeated installations. The author of pnpm, Zoltan Kochan, found that yarn had no intention of solving these problems, so he started from scratch and wrote a new package manager with a new dependency management mechanism.
Using the same dependency example as above, run the installation:
pnpm i
Now let's take another look at node_modules:
We can see the three familiar dependency packages directly, but note the small arrows after them: these three dependencies are just soft links (symlinks) pointing to the real locations inside the .pnpm folder.
For example, the actual storage location of buffer@6.0.3 is:
Note that there are three entries here: base64-js and ieee754 are soft links pointing to the corresponding package locations in the .pnpm directory, and the buffer link in the project root's node_modules points to the buffer folder here.
Putting the package itself and its dependencies in the same node_modules folder is fully compatible with native Node module resolution, and it keeps each package organized together with its dependencies. The design is very elegant: this nested mode, implemented through soft links, solves the biggest earlier problem, redundant module installation.
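The layout described above follows a simple rule that can be sketched as a function: every package gets a real folder at node_modules/.pnpm/<name>@<version>/node_modules/<name>, its dependencies become symlinks next to it in that same node_modules folder, and only the project's direct dependencies are linked from the root node_modules. This minimal sketch only computes the links and creates nothing on disk; the versions are hypothetical example data, link targets are written relative to the project root for readability, and peer dependencies and scoped packages are not handled.

```typescript
// Hypothetical resolved graph: "name@version" -> { depName: "depName@version" }.
type Resolved = Record<string, Record<string, string>>;

interface Link {
  link: string; // where the symlink is created
  target: string; // real package folder it points to
}

const nameOf = (id: string) => id.slice(0, id.lastIndexOf("@"));
// Real folder of a package inside the .pnpm virtual store.
const storeDir = (id: string) => `node_modules/.pnpm/${id}/node_modules/${nameOf(id)}`;

function pnpmLinks(direct: Record<string, string>, graph: Resolved): Link[] {
  const links: Link[] = [];
  // The root node_modules contains only the declared (direct) dependencies.
  for (const [name, id] of Object.entries(direct)) {
    links.push({ link: `node_modules/${name}`, target: storeDir(id) });
  }
  // Each package's dependencies are symlinked next to it in the same node_modules folder.
  for (const [id, deps] of Object.entries(graph)) {
    for (const [depName, depId] of Object.entries(deps)) {
      links.push({ link: `node_modules/.pnpm/${id}/node_modules/${depName}`, target: storeDir(depId) });
    }
  }
  return links;
}

const links = pnpmLinks(
  { buffer: "buffer@6.0.3", ignore: "ignore@5.1.9" },
  {
    "buffer@6.0.3": { "base64-js": "base64-js@1.5.1", ieee754: "ieee754@1.2.1" },
    "ignore@5.1.9": {},
    "base64-js@1.5.1": {},
    "ieee754@1.2.1": {},
  },
);
for (const { link, target } of links) console.log(`${link} -> ${target}`);
```

When buffer's code requires base64-js, Node walks up from buffer's real folder and finds the sibling link in .pnpm/buffer@6.0.3/node_modules. Project code, which only sees the root links, cannot reach base64-js unless the project declares it.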
pnpm's dependency management also neatly avoids illegal access to dependencies, a problem npm and yarn have not solved: if a package is not declared as a dependency in package.json, it cannot be accessed from the project.
Monorepo support
The purpose of a monorepo is to manage multiple sub-projects in a single git repository. All sub-projects are stored in the root packages directory, so each sub-project is one package, and the packages are managed with lerna. This approach is also used in elementUI-plus and in our own business projects.
In practice, if we want to add a dependency to every package, we have to run npm i in the project root and then run a lerna command as well. Another big difference between pnpm and npm/yarn is that pnpm supports monorepos, which is reflected in its subcommands. For example, running pnpm add A -r in the root directory adds dependency A to all packages. The --filter option is also supported for selecting which packages a command applies to.
Summary
At present, npm still has some flaws in day-to-day development, but we have to admit it is a mature and stable package manager. It ships with Node and currently handles package management well enough. As a developer, I hope npm keeps drawing inspiration from the interesting package managers in the community (as it did from yarn) and becomes even better to use.