How should we think about the differences between npm, yarn, and pnpm?

As pnpm becomes more and more popular, you have probably been hearing its name more and more often. Today, let's learn together what makes pnpm so highly praised.


Table of contents

1. How npm/yarn install works
2. npm
3. yarn
4. pnpm
5. Ease of use
Summary


1. How npm/yarn install works

What actually happens when you run npm/yarn install? There are two main parts: first, how a package gets into node_modules; second, how node_modules manages dependencies internally.

After the command is executed, a dependency tree is built first, and then the package at each node goes through four steps:

  1. Resolve the version range of the dependency to a specific version number
  2. Download the corresponding tarball to the local offline mirror
  3. Decompress the dependency from the offline mirror into the local cache
  4. Copy the dependency from the cache into the current project's node_modules directory

After that, the corresponding package lands in the project's node_modules.
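To make the four steps concrete, here is a tiny Node sketch of the pipeline; the helper names (resolveVersion, downloadTarball, and so on) are hypothetical stand-ins, not npm's real internals:

const path = require('path');

// Hypothetical helpers standing in for the real network and disk work.
async function resolveVersion(name, range) { return '4.18.2'; }
async function downloadTarball(name, version) { return `/mirror/${name}-${version}.tgz`; }
async function extractToCache(tarball) { return `/cache/${path.basename(tarball, '.tgz')}`; }
async function copyToNodeModules(src, dest) { console.log(`copy ${src} -> ${dest}`); }

async function installPackage(name, range) {
  const version = await resolveVersion(name, range);                 // 1. range -> exact version
  const tarball = await downloadTarball(name, version);              // 2. tarball -> offline mirror
  const cached  = await extractToCache(tarball);                     // 3. mirror -> local cache
  await copyToNodeModules(cached, path.join('node_modules', name));  // 4. cache -> node_modules
}

installPackage('express', '^4.0.0');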

So what kind of directory structure do these dependencies form inside node_modules? In other words, what does the dependency tree we just built look like? The answer differs between versions, as explained below!

2. npm

Following the development history of npm, let's start with npm2:

Use a Node version manager to switch the Node version down to 4; the bundled npm version is then 2.x.

Then we create a new demo project and execute npm init -y to quickly create a package.json.

Then execute npm install express; the express package and its dependencies are downloaded:

Expand express and you will see that it has its own node_modules.

Expand a few more layers and you will find that every dependency has its own node_modules.

In other words, npm2's node_modules are nested:

node_modules
└─ express
   ├─ index.js
   ├─ package.json
   └─ node_modules
      └─ accepts
         ├─ index.js
         └─ package.json

This structure is exactly the dependency tree we mentioned above!

And accepts has dependencies of its own, so the nesting continues further down. Now imagine: what is wrong with such a design?

  1. The dependency hierarchy is too deep. Windows limits file paths to 260 characters, so deep nesting can exceed the Windows path length limit, which is fatal.
  2. A large number of duplicate packages are installed, wasting disk space. For example, if express and another package foo in the same project both depend on the same version of lodash, lodash gets installed into the node_modules of each of them, i.e. installed twice.
  3. Module instances cannot be shared. For example, React keeps some internal state; the React required from two different packages is not the same module instance, so that internal state is not shared, leading to unpredictable bugs (see the sketch after this list).
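A minimal sketch of problem 3, assuming the nested npm2-style layout above, where hypothetical packages pkg-a and pkg-b each carry their own copy of the same lodash version:

// Hypothetical npm2-style layout: pkg-a and pkg-b each bundle their own lodash.
const lodashA = require('pkg-a/node_modules/lodash');
const lodashB = require('pkg-b/node_modules/lodash');

// Two files on disk mean two module instances in memory,
// so any state kept inside the module is not shared.
console.log(lodashA === lodashB); // false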

At that time npm had not solved these problems yet, and the community came up with a new solution: yarn.

3. yarn

How does yarn solve the problems of duplicated dependencies and overly long nested paths?

Flattening: dependencies are no longer nested layer by layer but are all placed on the same level, so duplicate dependencies and overly long paths disappear.

Let's delete node_modules, switch to yarn, and execute yarn add express:

This time node_modules looks different: everything is laid out at the top level, and most packages no longer have a second-level node_modules.
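The original screenshot is not reproduced here, but the layout looks roughly like this (the names are real transitive dependencies of express; the exact list depends on the express version):

node_modules
├─ express
├─ accepts
├─ body-parser
├─ cookie
├─ debug
├─ ms
└─ … (the rest of express's transitive dependencies, all side by side)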

All dependencies are flattened into the node_modules directory, with no deep nesting anymore. When installing a new package, Node's require mechanism keeps looking for node_modules in parent directories; if a matching version of the package is already present, it is not installed again. This solves the problem of massively duplicated installs, and the dependency hierarchy no longer gets too deep.
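You can observe this upward lookup directly: every Node module carries the list of node_modules directories it will search, from the current directory up to the filesystem root.

// Print the node_modules lookup chain for the current module.
console.log(module.paths);
// For a file at /project/src/app.js this prints something like:
// [ '/project/src/node_modules',
//   '/project/node_modules',
//   '/node_modules' ]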


But if you expand a few more of the dependency packages, you will notice something strange: why is there still nesting?

Because a package may be required in multiple versions, and only one of them can be hoisted to the top level; when a different version of the same package is encountered later, it still has to be nested.
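For example (with illustrative version numbers), if lodash 4.x was hoisted first and another package insists on lodash 3.x, the layout becomes:

node_modules
├─ lodash              (4.x, hoisted first)
└─ some-pkg
   └─ node_modules
      └─ lodash        (3.x, the conflicting version stays nested)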


When npm was later upgraded to version 3, it adopted the same flattening scheme, making it very similar to yarn. yarn also introduced yarn.lock to lock versions, and npm implemented the same feature as well.

But both yarn and npm now rely on this flattened layout, so is the scheme problem-free?

No, it still has several problems. Let's sort them out:

  1. The dependency structure is nondeterministic.
  2. The flattening algorithm itself is very complex and takes a long time.
  3. Ghost dependencies (in plain terms: packages that were never declared as dependencies can still be accessed; see the sketch after this list).
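Here is a quick sketch of problem 3, assuming a project whose package.json declares only express: ms is a transitive dependency of express (via debug), yet flattening makes it importable.

// package.json declares only "express", yet under npm3+/yarn this works,
// because flattening hoists express's transitive dependency "ms" to the top level.
const ms = require('ms');
console.log(ms(60 * 1000)); // "1m"
// The danger: if express ever drops ms, this undeclared import suddenly breaks.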

The latter two are easy to understand, but how should we understand the nondeterminism in the first point?

Suppose the project depends on two packages, Barry and Lishen, each of which has dependencies of its own:

So after npm/yarn install flattens the tree, what does node_modules look like? Two different layouts are possible, as sketched below.
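The original diagrams are missing here, so assume for illustration that Barry depends on base@1.0 while Lishen depends on base@2.0. The two candidate layouts are:

Layout A (Barry declared first):

node_modules
├─ Barry
├─ base              (1.0, hoisted)
└─ Lishen
   └─ node_modules
      └─ base        (2.0, nested)

Layout B (Lishen declared first):

node_modules
├─ Barry
│  └─ node_modules
│     └─ base        (1.0, nested)
├─ base              (2.0, hoisted)
└─ Lishen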

The answer is: both are possible. It depends on the order in which Barry and Lishen appear in package.json: if Barry is declared first you get the first layout, otherwise the second. This is the "nondeterminism" of the dependency structure, and it is exactly why the lock files mentioned above were born. Whether it is package-lock.json (available since npm 5.x) or yarn.lock, the purpose is to guarantee that install always produces the same node_modules structure.


So how does pnpm solve these problems?

4. pnpm

Recall why npm3 and yarn flatten node_modules in the first place: isn't it because the same dependency would otherwise be copied many times, and because overly long paths cause problems on Windows?

What if we didn't copy at all, and used links instead?

First, a quick introduction to links, i.e. soft (symbolic) and hard links, a mechanism provided by the operating system. A hard link is another reference to the same file, while a soft link is a new file whose content points to another path. In everyday use the two behave similarly.
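A tiny Node demonstration of the difference, safe to run in an empty directory (creating symlinks may require extra privileges on Windows):

const fs = require('fs');

fs.writeFileSync('original.txt', 'hello');
fs.linkSync('original.txt', 'hard.txt');     // hard link: another name for the same file
fs.symlinkSync('original.txt', 'soft.txt');  // soft link: a new file pointing at a path

// The hard link shares its inode with the original; the soft link is its own entry.
console.log(fs.statSync('hard.txt').ino === fs.statSync('original.txt').ino); // true
console.log(fs.lstatSync('soft.txt').isSymbolicLink());                       // true

// Writing through any name is visible through all of them.
fs.appendFileSync('hard.txt', ' world');
console.log(fs.readFileSync('original.txt', 'utf8')); // "hello world"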

So what if we didn't copy files at all, kept just one copy of each npm package's contents in a global store, and linked to it from everywhere else?

That way, no disk space is wasted on repeated copies, and there is no long-path problem either, because the path length limit is essentially a limit on directory depth: these are now links to directories in various locations rather than one deeply nested directory, so the limit no longer applies.

That's right: this is exactly the idea pnpm implements.

Let's delete node_modules once more, switch to pnpm, and execute pnpm install.

You will find that it prints this line:

Packages are hard linked from the global store to the virtual store, where the virtual store is node_modules/.pnpm.

Let's open node_modules and take a look:

It is indeed no longer flat. If the project depends only on express, then node_modules contains only express, and there are no ghost dependencies.

Expand .pnpm to see:

All dependencies are laid out flat here, and each is hard-linked from the global store; the dependencies between packages are then organized through soft links. For example, the dependencies of express under .pnpm are soft links.

In other words, every dependency is hard-linked from the global store into .pnpm, and dependencies reference each other through soft links.
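Putting it together, the layout looks roughly like this (version numbers are illustrative):

node_modules
├─ express -> .pnpm/express@4.18.2/node_modules/express          (soft link)
└─ .pnpm
   ├─ express@4.18.2
   │  └─ node_modules
   │     ├─ express                                              (hard links into the global store)
   │     └─ accepts -> ../../accepts@1.3.8/node_modules/accepts  (soft link)
   └─ accepts@1.3.8
      └─ node_modules
         └─ accepts                                              (hard links into the global store)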

The official docs give a schematic diagram, and it makes this clear at a glance: pnpm official address

 This is how pnpm works .

So, looking back: what makes pnpm excellent?

First of all, the biggest advantage is saving disk space: only one copy of each package is stored globally, and everything else is a soft or hard link. Imagine how much disk space that saves.

The second advantage is speed: linking instead of copying is naturally fast.

 

5. Ease of use

Having said all this, you may feel that pnpm is quite complicated. Is it expensive to adopt?

On the contrary, pnpm is very easy to use. If you have prior npm/yarn experience, you can even migrate to pnpm seamlessly.

Is that all? There is something more important: its support for different project management styles.

My personal feeling is that pnpm better supports the transition from a multirepo project to a monorepo project, for the following reason:

Suppose package A depends on X, package B depends on X, and there is also a package C that does not declare X as a dependency but uses X in its code. Because of dependency hoisting, npm/yarn puts X into the root node_modules, so C runs fine locally: by Node's package resolution mechanism, it can load X from the monorepo root's node_modules. But imagine C being packaged separately and installed by a user on its own; then X cannot be found, and the code that references X throws an error as soon as it runs.
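A sketch of that failure mode, with hypothetical package names:

// packages/C/index.js (note: C's package.json does NOT declare X)
const x = require('X'); // works inside the monorepo: hoisting put X in the root node_modules
module.exports = () => x.doSomething();

// Once C is published and installed on its own:
//   Error: Cannot find module 'X'
// because no hoisting places X in the standalone consumer's tree.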

 

Summary


Of course, today was just a first deep dive. Which package management approach suits you best depends on your own project; it may vary from person to person, from project to project, and from business to business~

Everyone is welcome to discuss and study together~

 
