So this is how npm install works

Preface

Running the npm install command should be "muscle memory" for us front-end developers. Until recently, my knowledge of npm had stayed at the "can use it" stage, without a good understanding of its internals. It wasn't until a problem caused by an npm package occurred in a project that I decided to review the npm package workflow in detail.

npm install goes through roughly the stages shown in the following illustration. The implementation details and design background of each stage are sorted out below.

CheckConfig

This is the first stage after npm install is executed. The main work of this stage is to read the npm config that applies to the current project and use it as the configuration for the installation. Entering npm config ls -l in the terminal shows the current npm config, as shown below, which includes commonly used settings such as the package installation source, npm package namespaces, the cache, and cache behavior.

npm supports configuration items at several levels with different priorities. For example, the four levels of configuration items shown in the figure above are explained one by one, from highest to lowest priority:

0. cli configs (highest priority): the configuration (and some default values) passed on the command line of the current npm invocation. For example, if the registry is set when the command is executed, the corresponding configuration item is output here.
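
As a rough illustration of priority-based lookup, the sketch below resolves a key by walking configuration layers from highest to lowest priority. Only the "cli" layer name comes from the text above; the other layer names, the function name resolveConfig, and all values are hypothetical placeholders, and this is not npm's actual implementation.

```javascript
// Minimal sketch of priority-based config lookup (not npm's real code).
// Layers are ordered from highest to lowest priority. Only "cli" is named
// in the article; the other layer names and every value are made up.
const layers = [
  { name: "cli", values: { registry: "https://registry.example.com/" } },
  { name: "project", values: { cache: "/tmp/project-cache" } },
  {
    name: "defaults",
    values: { registry: "https://registry.npmjs.org/", cache: "~/.npm" },
  },
];

// Return the value from the first (highest-priority) layer that defines the key.
function resolveConfig(key, orderedLayers) {
  for (const layer of orderedLayers) {
    if (Object.prototype.hasOwnProperty.call(layer.values, key)) {
      return { value: layer.values[key], from: layer.name };
    }
  }
  return undefined; // no layer defines this key
}

console.log(resolveConfig("registry", layers)); // taken from the "cli" layer
console.log(resolveConfig("cache", layers)); // taken from the "project" layer
```

The point of the sketch is only that a higher layer shadows a lower one: the registry passed on the command line wins over the default value.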

package-lock.json

As can be seen from the illustration at the beginning of the article, whether a valid package-lock file exists determines whether the two steps "obtain package information from the npm server" and "build the dependency tree" need to be executed.

The package-lock file was introduced in npm 5.x. Its function is to lock the dependency structure. That is, as long as there is a package-lock.json file in your directory, the node_modules directory structure generated by each npm install must be exactly the same.

For example, the dependent packages that currently need to be installed are as follows:

{
  "name": "npm-demo",
  "dependencies": {
    "buffer": "^6.0.3",
    "ignore": "^5.1.9"
  }
}

ignore has no other dependencies, while buffer also depends on base64-js and ieee754:

The requests made during installation are shown below. You can see that a total of 8 requests were initiated.

When package-lock.json exists, you can see the network requests shown below. Only 4 package requests were sent in total:

The generated package-lock.json is as follows

In addition, we know that the module versions of npm packages must follow the SemVer specification, so when we install dependencies, the versions are often not pinned, which can lead to unexpected version-related errors.

SemVer specification: semver.org/

After package-lock was introduced, this problem was solved. Once npm install has been executed, the information for all dependencies (version, package download address, sha512 file digest) is saved in this file. If dependencies need to be reinstalled later, this information is read directly from the package-lock file, which greatly improves installation efficiency.
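
To make "the versions are often not pinned" concrete, here is a minimal sketch of how a caret range such as ^6.0.3 accepts more than one version. It handles only plain x.y.z versions without pre-release tags, and the function names (parse, compare, satisfiesCaret) are hypothetical; a real project would normally rely on the semver package instead.

```javascript
// Parse "x.y.z" into [x, y, z]; anything else is rejected in this sketch.
function parse(version) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!m) throw new Error(`unsupported version: ${version}`);
  return m.slice(1).map(Number);
}

// Compare two parsed versions: -1 if a < b, 1 if a > b, 0 if equal.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// A caret range allows changes that keep the leftmost non-zero component fixed.
function satisfiesCaret(version, range) {
  if (!range.startsWith("^")) throw new Error("only caret ranges are supported");
  const v = parse(version);
  const min = parse(range.slice(1));
  if (compare(v, min) < 0) return false; // below the lower bound
  const pin = min.findIndex((n) => n !== 0);
  const fixed = pin === -1 ? 3 : pin + 1; // how many leading components must match
  for (let i = 0; i < fixed; i++) {
    if (v[i] !== min[i]) return false;
  }
  return true;
}

console.log(satisfiesCaret("6.0.3", "^6.0.3")); // true
console.log(satisfiesCaret("6.1.0", "^6.0.3")); // true
console.log(satisfiesCaret("7.0.0", "^6.0.3")); // false
```

Because both 6.0.3 and a later 6.1.0 satisfy ^6.0.3, two installs made at different times can end up with different versions unless something records the exact one.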

Build dependency tree

The so-called dependency tree is the relationship between an npm package and the packages it depends on, which is mainly reflected in the directory structure inside node_modules. Depending on how npm behaved in different periods, it comes in two forms: the nested structure and the flat structure.

Nested mode

The early versions npm 1 and npm 2 used a nested structure, which means that each dependency's own dependencies are stored in its own node_modules folder. For the example above, the module directory structure obtained in node_modules after executing npm install is as follows:

The advantages of this approach are obvious. The structure of node_modules corresponds one-to-one to the structure of package.json, the hierarchy is clear, and the directory structure is guaranteed to be the same for each installation.

However, imagine that base64-js had dependencies of its own: the nesting would continue. There are some problems with this design:

0. If the dependency level is too deep, file paths become too long, especially on Windows (where the maximum length of a file path is 260 characters).
1. A large number of duplicate packages are installed, and the total file size becomes extremely large. For example, if ignore, which sits at the same level as buffer in the example above, also depended on the same version of base64-js, base64-js would be installed in the node_modules of both, that is, installed twice.

Flat mode

To solve the above problems, npm made a major update in version 3.x, changing the early nested structure into a flat structure:

  • When installing a module, regardless of whether it is a direct dependency or a sub-dependency, it is first installed in the root node_modules directory.

With the same dependency structure as above, we get the following directory structure after executing npm install:

It can be seen that buffer's two dependencies are installed in the root directory of node_modules. This does alleviate both problems of nested mode and reduces package redundancy.

Now suppose the project itself depends on base64-js@1.0.1:

{
  "name": "npm-demo",
  "dependencies": {
    "base64-js": "1.0.1",
    "buffer": "^6.0.3",
    "ignore": "^5.1.9"
  }
}

  • When the same module is encountered again, check whether the already installed version satisfies the version range required by the new module. If it does, skip it. If it does not, install the module under the node_modules of the current module.

At this point, we get the following directory structure after executing npm install:

Since the base64-js version that the project depends on does not satisfy the base64-js version range that buffer depends on, buffer's copy is installed inside the buffer folder.
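
The two placement rules above can be sketched in a few lines. This is a simplified model, not npm's actual algorithm: each dependency carries its already resolved version, the satisfies helper only understands exact versions and caret ranges with a major version of at least 1, and the concrete resolved versions (for example base64-js 1.5.1) are illustrative assumptions.

```javascript
// Minimal range check: exact versions, or caret ranges with major >= 1.
function satisfies(version, range) {
  if (!range.startsWith("^")) return version === range;
  const [maj, min, pat] = version.split(".").map(Number);
  const [rMaj, rMin, rPat] = range.slice(1).split(".").map(Number);
  if (maj !== rMaj) return false;
  return min > rMin || (min === rMin && pat >= rPat);
}

// Place every dependency at the root first; nest it under its parent only
// when a different, incompatible version already occupies the root.
// Direct dependencies are placed before any sub-dependency is visited.
function flatten(deps, root = {}, parent = null) {
  const placed = [];
  for (const dep of deps) {
    const hoisted = root[dep.name];
    if (!hoisted) {
      root[dep.name] = { version: dep.version, nested: {} };
      placed.push([dep, root[dep.name]]);
    } else if (!satisfies(hoisted.version, dep.range)) {
      parent.nested[dep.name] = { version: dep.version, nested: {} };
      placed.push([dep, parent.nested[dep.name]]);
    }
    // Otherwise a compatible version is already at the root: skip.
  }
  for (const [dep, node] of placed) flatten(dep.deps || [], root, node);
  return root;
}

// The article's example; resolved versions are assumptions for illustration.
const project = [
  { name: "base64-js", range: "1.0.1", version: "1.0.1" },
  {
    name: "buffer",
    range: "^6.0.3",
    version: "6.0.3",
    deps: [
      { name: "base64-js", range: "^1.3.1", version: "1.5.1" },
      { name: "ieee754", range: "^1.2.1", version: "1.2.1" },
    ],
  },
  { name: "ignore", range: "^5.1.9", version: "5.1.9" },
];

const tree = flatten(project);
console.log(JSON.stringify(tree, null, 2));
```

In the printed tree, base64-js 1.0.1, buffer, ignore, and ieee754 sit at the root, while the base64-js copy that buffer needs is nested under buffer, which matches the structure described above.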

Correspondingly, when we reference a module in project code, the module lookup follows this process:

  • Search under the current module path
  • Search in node_modules under the current module path
  • Search in node_modules under the path of the parent module
  • And so on, up to node_modules in the global path
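
The upward walk in the list above can be sketched as a function that lists the node_modules directories to try, nearest first. It assumes POSIX-style absolute paths, the function name nodeModulesCandidates is hypothetical, and the global path in the last step is not modeled.

```javascript
// List the node_modules directories that would be searched when a file in
// `fromDir` requires a package, starting from the nearest one.
function nodeModulesCandidates(fromDir) {
  const parts = fromDir.split("/").filter(Boolean);
  const candidates = [];
  for (let i = parts.length; i >= 0; i--) {
    // Do not append node_modules to a directory that is itself node_modules.
    if (parts[i - 1] === "node_modules") continue;
    candidates.push("/" + [...parts.slice(0, i), "node_modules"].join("/"));
  }
  return candidates;
}

console.log(nodeModulesCandidates("/app/node_modules/buffer"));
// [ '/app/node_modules/buffer/node_modules',
//   '/app/node_modules',
//   '/node_modules' ]
```

This is why a nested copy such as buffer/node_modules/base64-js shadows the version at the root for code inside buffer, while code elsewhere in the project still sees the root version.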

Suppose we depend on another package, buffer2@^6.0.3, which also depends on base64-js@^1.3.1. The installation structure is then as follows:

It is not difficult to see that redundancy still occurs, because the base64-js@1.0.1 required by the project does not satisfy the version range required by the other modules. Therefore, npm 3.x does not completely solve the module redundancy problem of the old versions, and it may even bring new problems.

Here is a summary of the main issues with flattening:

0. The dependency structure is nondeterministic.
1. The flattening algorithm itself is very complex and time-consuming.
2. Packages that are not declared as dependencies can be illegally accessed in the project.

The latter two are easy to understand. The nondeterminism of the dependency structure in the first item can be explained with an example.

Assume that two different modules depend on the same module, but the version ranges are inconsistent.

When a project depends on these two modules, what happens when building the dependency tree?

Or is it this?

The answer is that both are possible. Which form you get depends entirely on the order in which buffer and buffer2 are declared in package.json.

In addition, to let developers use the latest dependencies while staying safe, we usually lock only the major version in package.json. This means that after minor versions of some dependencies are updated, the dependency structure may also change. This uncertainty can cause unpredictable problems for programs.

This is why the dependency structure is nondeterministic, and it is also why the package-lock file was born: to ensure that a deterministic node_modules structure is generated after installation.

Cache

To speed up installation, after the npm install or npm update command downloads dependencies, they are stored in the local cache directory, and then the corresponding dependencies are copied into the project's node_modules directory.

You can query the cache location with the npm config get cache command. On Linux or Mac, the default is the .npm/_cacache directory under the user's home directory.

There are two directories in this directory: content-v2 and index-v5. The content-v2 directory stores the cached tar packages, and the index-v5 directory stores the hashes of the tar packages.

When npm (5.x) performs an installation, it can generate a unique key from the integrity, version, and name stored in package-lock.json. This key corresponds to a cache record in the index-v5 directory, from which the hash of the tar package is found, and the hash is then used to find the cached tar package directly.

We can pick a package and test the lookup in the cache directory, searching index-v5 for the package path:

grep https://npm.corp.kuaishou.com/base64-js/-/base64-js-1.0.1.tgz -r index-v5 

After the command is executed, the corresponding hash storage path is returned. Here it is the 0d/dd directory.

After opening the file and formatting it as JSON, you can see a _shasum field, which is the cache hash of this package. The first four characters of the value give the path in content-v2/sha1. For example, here they are 6926, which means the package is under the content-v2/sha1/69/26 directory.
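
The mapping from hash to directory described above is simple enough to sketch. The function name contentDir is hypothetical, and the digest used here is a made-up value that merely starts with 6926, since the full hash is not shown in the article.

```javascript
// Derive the content-v2 directory from a hex digest, as described above:
// the first two characters and the next two characters become two
// directory levels under content-v2/<algorithm>.
function contentDir(algorithm, hexDigest) {
  if (hexDigest.length < 4) throw new Error("digest too short");
  return [
    "content-v2",
    algorithm,
    hexDigest.slice(0, 2),
    hexDigest.slice(2, 4),
  ].join("/");
}

// Hypothetical digest: only the "6926" prefix comes from the article.
const exampleDigest = "6926" + "0".repeat(36);
console.log(contentDir("sha1", exampleDigest)); // content-v2/sha1/69/26
```

Splitting the prefix into directory levels keeps any single directory from holding every cached tarball.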

Based on cached data, npm provides offline installation modes, which are as follows:

  • --prefer-offline: Prioritize using cached data. If there is no matching cached data, download it from the remote warehouse (default).
  • --prefer-online: Prioritize the use of network data. If the network data request fails, then request cached data. This mode can obtain the latest module in time.
  • --offline: Does not request the network and directly uses the cached data. Once the cached data does not exist, the installation will fail.
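
The three modes differ only in the order in which the cache and the network are tried. The sketch below models the behavior exactly as described in the list above; it is not npm's code, and the function name chooseSource and its cached and networkOk parameters are hypothetical.

```javascript
// Model of the three cache modes described above (not npm's real code).
// cached:    a matching entry exists in the local cache.
// networkOk: the request to the remote registry would succeed.
function chooseSource(mode, { cached, networkOk }) {
  switch (mode) {
    case "prefer-offline":
      if (cached) return "cache";
      return networkOk ? "network" : "fail";
    case "prefer-online":
      if (networkOk) return "network";
      return cached ? "cache" : "fail";
    case "offline":
      // Never touches the network.
      return cached ? "cache" : "fail";
    default:
      throw new Error(`unknown mode: ${mode}`);
  }
}

console.log(chooseSource("prefer-offline", { cached: true, networkOk: true })); // cache
console.log(chooseSource("prefer-online", { cached: true, networkOk: true })); // network
console.log(chooseSource("offline", { cached: false, networkOk: true })); // fail
```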

File integrity

We mentioned file integrity many times above, so what is file integrity verification?

Before downloading a dependency, we can usually get the hash value that npm calculated for it. For example, if we execute the npm info command, the output includes tarball (the download link) followed by shasum (the hash):

After the user downloads the dependency locally, they need to make sure that no errors occurred during the download. Therefore, after the download is completed, the hash of the file is calculated locally. If the two hash values are the same, the downloaded dependency is complete. If they differ, it is downloaded again.
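
The check itself is a hash comparison. A minimal sketch using Node's built-in crypto module is shown below; the function names sha1Hex and verifyDownload are hypothetical, and the "tarball" is just a stand-in buffer rather than a real package.

```javascript
const crypto = require("crypto");

// Compute the SHA-1 digest of some data as a hex string (the shasum format).
function sha1Hex(data) {
  return crypto.createHash("sha1").update(data).digest("hex");
}

// Compare the locally computed hash with the one published for the package.
function verifyDownload(data, expectedShasum) {
  return sha1Hex(data) === expectedShasum;
}

// Stand-in for a downloaded .tgz file and the shasum reported by npm info.
const tarball = Buffer.from("pretend this is the downloaded .tgz content");
const published = sha1Hex(tarball);

console.log(verifyDownload(tarball, published)); // true
console.log(verifyDownload(Buffer.from("corrupted"), published)); // false
```

The same idea applies to the sha512 integrity value stored in package-lock.json; only the hash algorithm and encoding differ.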

Overall process review

  • Check config
  • Check whether there is a lock file in the project.
    • No lock file:
      • Obtain package information from the npm remote warehouse
      • Build the dependency tree according to package.json. The construction process:
        • When building the dependency tree, regardless of whether it is a direct dependency or a sub-dependency, it is placed in the root node_modules first.
        • When the same module is encountered, determine whether the module version already placed in the dependency tree matches the version range of the new module. If it matches, skip it. If it does not match, place the module under the node_modules of the current module.
        • Note that this step only determines the logical dependency tree, not the actual installation. Later, the packages are downloaded or fetched from the cache based on this dependency structure.
      • Search the cache for each package in the dependency tree in turn
        • Not in the cache:
          • Download the package from the npm remote warehouse
          • Verify the integrity of the package
            • Verification fails: download again
            • Verification passes:
              • Copy the downloaded package to the npm cache directory
              • Extract the downloaded package into node_modules according to the dependency structure
        • In the cache: extract the cached package into node_modules according to the dependency structure
      • Extract the packages to node_modules
      • Generate the lock file for future installs

In addition to npm, there are also third-party package management tools worth recommending. Here are two: yarn and pnpm.

YARN

yarn was released in 2016, when npm was still in the v3 period and there was no package-lock.json file yet. As mentioned above, shortcomings such as instability and slow installation were often complained about by developers. This is when yarn was born:

The above are the advantages listed on the yarn official website, which were very attractive at the time. Of course, npm later realized its own problems and made many optimizations. In those subsequent optimizations (lock files, caching, default -s, ...), we can more or less see the shadow of yarn, which shows that yarn's design is excellent.

yarn also uses the flat structure of npm v3 to manage dependencies. After installing dependencies, yarn generates a yarn.lock file by default. Using the same dependencies as above, let's take a look at the structure of yarn.lock:

It can be seen that it is quite similar to the package-lock.json file.

yarn's cache structure is similar to that of npm before v5. Each cached module is stored in a separate folder whose name contains the module name, version number, hash, and other information. Use the yarn cache dir command to view the directory of cached data:

In terms of caching strategy, yarn is the opposite of npm. It defaults to prefer-online mode: it uses network data first, and if the network request fails, it falls back to cached data.

In addition, Yarn also has a feature called PnP (Plug'n'Play): classic.yarnpkg.com/en/docs/pnp ...

In the normal flow, npm/yarn generates a node_modules directory, and Node then searches node_modules according to its module lookup rules. But Node does not actually know where a module is: it searches node_modules, and if the module is not found, it searches node_modules in the parent directory, and so on. This is very inefficient.

But as a package manager, Yarn knows the dependency tree of your project. Can Yarn tell Node, so that Node goes directly to the right directory to load a module? This improves Node's module lookup efficiency and reduces the copying of files into node_modules. This is the basic principle of Plug'n'Play.

Simply put, in PnP mode Yarn does not create the node_modules directory. Instead, it creates a .pnp.js file, which is a Node program. This file contains the project's dependency tree information, the module lookup algorithm, and so on.

yarn does not use PnP by default. You need to add the following configuration to package.json and reinstall the dependencies.

"installConfig": {
  "pnp": true
}

PNPM

pnpm is a new-generation package management tool. Its official documentation describes it like this:

Fast, disk space efficient package manager

  • Package installation is extremely fast;
  • Disk space utilization is efficient.

Install:

npm i pnpm -g 

High speed

Here is a benchmark comparison chart from the community. pnpm is the yellow part: in most scenarios its package installation speed is significantly better than npm/yarn/yarn PnP, and it is even 2-3 times faster than them.

Dependency tree (efficient use of disk space)

Previously, neither npm nor yarn (non-PnP mode) had completely solved the problems of illegal access to undeclared dependencies and repeated installation. The author of pnpm, Zoltan Kochan, found that yarn had no intention of solving these problems, so he started from scratch and wrote a new package manager with a new dependency management mechanism.

Taking the dependency example above again, run the installation:

pnpm i 

Let's take another look at node_modules:

We directly see the three familiar dependencies, but it is worth noting that there are small arrows after them, indicating that they are all just soft links pointing to locations that are actually inside the .pnpm folder.

For example, for buffer, the actual storage location is

Note that there are three entries here, of which base64-js and ieee754 are soft links pointing to the corresponding package locations in the .pnpm directory.

The buffer in node_modules in the project root directory points to the buffer folder here.

Putting the package itself and its dependencies under the same node_modules folder is fully compatible with native Node, and it organizes a package together with its dependencies well. The design is very elegant. This nested mode, implemented through soft links, solves the previous biggest problem: redundant module installation.

pnpm's dependency management also cleverly avoids the illegal dependency access problem that npm/yarn have not yet solved: as long as a package is not declared as a dependency in package.json, it cannot be accessed in the project.

Support monorepo

The purpose of a monorepo is to manage multiple sub-projects in one git repository. All sub-projects are stored in the packages directory at the root, so each sub-project represents one package, and they are managed through lerna. This approach is also used in elementUI-plus and in our business.

In actual development, when we want to add a dependency to all packages, we may need to execute npm i in the project root directory and then run the lerna command again. Another big difference between pnpm and npm/yarn is that it supports monorepos, which is reflected in the functions of its subcommands. For example, running pnpm add A -r in the root directory adds the dependency A to all packages. The --filter option is also supported for filtering packages.

Summary

At present, npm still has certain flaws in actual development, but we have to admit that it is a mature and stable package management tool. It ships with node and currently handles package management well enough. As a developer, I hope that npm can draw inspiration from these interesting package management tools in the community (as it did from yarn) and become easier to use.

Origin blog.csdn.net/web2022050901/article/details/126941262