What is Node.js? A front-end learning summary

What is Node.js

Traditionally, JavaScript runs in the browser. This is because a browser kernel actually consists of two parts: a rendering engine, which renders HTML and CSS, and a JavaScript engine, which executes JavaScript. Chrome's JavaScript engine is V8, which is very fast.

Node.js is a server-side runtime built on top of the V8 engine. We know that Apache + PHP and Java Servlets can both be used to build dynamic web pages; Node.js plays a similar role, except that it is programmed in JavaScript.

 

With the definition out of the way, here is a simple example. Create a new app.js file with the following content:

var http = require('http');

http.createServer(function (request, response) {
    response.writeHead(200, {'Content-Type': 'text/plain'}); // HTTP response header
    response.end('Hello World\n'); // return the string "Hello World"
}).listen(8888); // listen on port 8888

// print the startup message to the terminal
console.log('Server running at http://127.0.0.1:8888/');

That is all it takes to finish a simple HTTP server. Run it with node app.js, then visit localhost:8888 to see the output.

Why use Node.js

 

When facing a new technology, it always pays to ask a few more whys. Since PHP, Python, and Java can all be used for back-end development, why bother learning Node.js? At the very least, we should know in which scenarios Node.js is the more appropriate choice.

 

In general, Node.js is suitable for the following scenarios:

 

1. Real-time applications, such as online multi-user collaboration tools and web chat applications.

2. High-concurrency, I/O-bound applications, such as APIs that read from a database on behalf of clients.

3. Streaming applications, for example, where clients frequently upload files.

4. Front-end/back-end separation.

In fact, the first two belong to the same category: the client makes heavy use of long-lived connections. The number of concurrent connections is high, but most of them are idle.

 

Node.js also has its limitations: it is not suited to CPU-intensive tasks, such as artificial intelligence computation or video and image processing.

 

Of course, these strengths and weaknesses are not something to memorize by rote or take on someone else's word. Making the right judgment requires a certain understanding of how Node.js works underneath.

Basic concepts

 

Before digging into Node.js, clarifying a few basic concepts will lead to a deeper understanding of it.

Concurrency

 

Unlike client developers, server developers care deeply about concurrency, that is, how many simultaneous client requests a server can support. The C10K problem posed in the early years asked how a single server could support 10K concurrent connections. With improvements in hardware and software, C10K is no longer a problem today, and attention has shifted to the C10M problem: how a single server can handle millions of concurrent connections.

 

When C10K was posed, we were still using the Apache server. Its working principle is that whenever a network request arrives, it forks a child process and runs the PHP script inside that child process; after the script finishes, the result is sent back to the client.

 

This ensures that processes do not interfere with one another: even if one process fails, the rest of the server is unaffected. But the drawback is just as obvious: a process is a relatively heavyweight construct, with its own heap and stack, and consumes a lot of memory, so the number of processes a server can run is capped at around a few thousand.

 

Although Apache later adopted FastCGI, it was essentially just a process pool: it reduced the cost of creating processes but could not meaningfully raise the concurrency ceiling.

 

Java Servlets use a thread pool, that is, each Servlet request runs on a thread. Threads are lighter than processes, but only relatively: by one measurement, each thread's private stack takes about 1 MB, which is still not efficient enough. On top of that, multithreaded programming brings all kinds of headaches, as programmers know only too well.

 

If threads are off the table, two options remain: coroutines and non-blocking I/O. Coroutines are lighter than threads; many coroutines can run on the same thread, with scheduling handled in user space rather than by the operating system. This technique is used extensively in the Go language. Non-blocking I/O is what Node.js uses to handle high-concurrency scenarios.

Non-blocking I/O

 

The I/O discussed here falls into two kinds, network I/O and file I/O, and the two are highly similar. An I/O operation can be broken into two steps: first, the content of the file (or network socket) is copied into a buffer that lives in memory owned exclusively by the operating system; then the content of that buffer is copied into the memory of the user program.

 

With blocking I/O, the process blocks all the way from issuing the read request, through the buffer becoming ready, until the user process has obtained the data: both steps block.

 

With non-blocking I/O, the process instead polls the kernel to ask whether the buffer is ready; if not, it carries on with other work. Once the buffer is ready, its content is copied to the user process, and that copy step does block.

 

I/O multiplexing means using a single thread to handle multiple network I/O streams. The oft-mentioned select and epoll are functions used to poll all sockets; Apache uses the former, while Nginx and Node.js use the latter, the difference being that the latter is more efficient. Since I/O multiplexing is in essence single-threaded polling, it is itself a non-blocking I/O solution.

 

Asynchronous I/O is the most ideal I/O model, but unfortunately true asynchronous I/O barely exists in practice. AIO on Linux delivers data through signals and callbacks, but it has flaws; the existing libeio, like IOCP on Windows, essentially uses a thread pool plus blocking I/O to simulate asynchronous I/O.

Node.js threading model

 

Many articles state that Node.js is single-threaded. That statement is not rigorous, and is even somewhat irresponsible, because it immediately raises at least the following questions:

 

1. How does Node.js handle concurrent requests on one thread?

2. How does Node.js perform asynchronous file I/O on one thread?

3. How does Node.js exploit the processing power of a server's multiple CPU cores?

Network I/O

 

Node.js can indeed handle a large number of concurrent requests on a single thread, but that requires certain programming techniques. Recall the code at the beginning of the article: after running app.js, the console prints immediately, and we see "Hello World" when we visit the page.

 

This is because Node.js is event-driven: its callback function is executed only when the network request event fires. When multiple requests arrive, they queue up and wait their turn to be handled.

 

This seems natural, but if you fail to fully appreciate that Node.js runs on a single thread and that this callback executes synchronously, and instead develop the program with a traditional mindset, serious problems follow. For a simple example, the "Hello World" string here might be the result produced by some other module. If generating "Hello World" is time-consuming, it blocks the callback of the current network request, so the next network request cannot be answered.

 

The solution is simple: use an asynchronous callback mechanism. We can pass the response parameter, which is used to produce the output, to another module, have that module generate the output asynchronously, and perform the real output in its callback. The advantage is that the callback of http.createServer does not block, so no request goes unanswered.

 

For example, let's modify the server's entry point to hand response off to another module; hand-rolling our own routing follows this same idea. Note that the module below still does its work synchronously (the sleep call simulates a time-consuming operation), so it exhibits exactly the blocking problem; an asynchronous rewrite follows it:

var http = require('http');
var output = require('./string'); // a third-party module

http.createServer(function (request, response) {
    output.output(response); // delegate the output to the third-party module
}).listen(8888);

The third-party module (string.js):

function sleep(milliSeconds) { // simulate a time-consuming operation
    var startTime = new Date().getTime();
    while (new Date().getTime() < startTime + milliSeconds);
}

function outputString(response) {
    sleep(10000); // block for 10 s: do the time-consuming work first, then output
    response.end('Hello World\n');
}

exports.output = outputString;
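For contrast, here is a minimal sketch of the asynchronous version described above (a hypothetical rewrite of the module, with setTimeout standing in for a genuinely asynchronous operation):

function outputString(response) {
    // defer the time-consuming work; the createServer callback returns right away,
    // so other requests can be answered while this one waits
    setTimeout(function () {
        response.end('Hello World\n');
    }, 10000);
}

exports.output = outputString;

With this version, concurrent requests each complete after about 10 seconds, instead of queuing up behind one another.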

In short, when programming with Node.js, every time-consuming operation must be performed asynchronously to avoid blocking the current function. After all, you are serving many clients, and all of your code runs on a single thread, executing in sequence.

File I/O

 

As emphasized earlier, asynchrony here improves responsiveness and avoids stalls. To genuinely save processing time and exploit the CPU's multiple cores, multi-threaded parallel processing is still required.

 

In fact, Node.js maintains a thread pool under the hood. As mentioned in the basic-concepts section, there is no true asynchronous file I/O, so it is usually simulated with a thread pool; by default the pool holds four threads for file I/O.
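To make this concrete, here is a minimal sketch (file.txt is a hypothetical local file): fs.readFile dispatches the read to the thread pool and returns immediately, and its callback later runs back on the main thread once the data is ready.

var fs = require('fs');

// the read is handed to libuv's thread pool; the main thread is not blocked
fs.readFile('file.txt', 'utf8', function (err, data) {
    if (err) throw err;
    console.log('file contents:', data);
});

console.log('reading...'); // printed first, while the read is still in flight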

 

Note that we cannot manipulate the underlying thread pool directly, and in fact we have no need to care about its existence. The pool exists only to complete I/O operations, not to perform CPU-intensive work such as image or video processing or large-scale computation.

 

If there are only a few CPU-intensive tasks to handle, we can start several Node.js processes and use the IPC mechanism for inter-process communication, or call out to external C++/Java programs. If there are many CPU-intensive tasks, that only means choosing Node.js was the wrong decision.
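As a sketch of the IPC approach (the file names and the toy computation are hypothetical), the built-in child_process module can fork a worker process and exchange messages with it. In main.js:

var child_process = require('child_process');

var worker = child_process.fork('./worker.js'); // a separate Node.js process with its own event loop

worker.on('message', function (result) { // the result arrives over the IPC channel
    console.log('result from worker:', result);
    worker.kill();
});

worker.send({ n: 42 }); // hand the CPU-intensive task to the worker

And in worker.js:

process.on('message', function (task) {
    var sum = 0;
    for (var i = 0; i < task.n * 1e6; i++) sum += i; // simulate heavy computation
    process.send(sum); // the main process's event loop was never blocked
});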

Squeeze the CPU

 

So far, we know that Node.js uses I/O multiplexing to handle network I/O on a single thread, and uses a thread pool with a small number of threads to simulate asynchronous file I/O. On a 32-core CPU, then, doesn't Node.js's single thread seem like a waste?

 

The answer is no: we can start multiple Node.js processes. Unlike in the previous section, the processes need not communicate with each other; each one listens on its own port, with Nginx doing load balancing at the outermost layer.

 

Load balancing with Nginx is very easy to set up; just edit the configuration file:

http {
    upstream sampleapp {
        # optional directives, such as least_conn or ip_hash
        server 127.0.0.1:3000;
        server 127.0.0.1:3001;
        # ... listen to more ports
    }
    ....
    server {
        listen 80;
        ...
        location / {
            proxy_pass http://sampleapp; # receive on port 80, then forward to the upstream group
        }
    }
}

The default load-balancing rule assigns network requests to the ports in turn (round-robin). The least_conn directive instead forwards each request to the Node.js process with the fewest connections, and ip_hash ensures that requests from the same IP are always handled by the same Node.js process.

 

Multiple Node.js processes can fully exploit the processing power of a multi-core CPU and also scale out well, as the sketch below shows.
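A minimal sketch of one such process (hypothetical: the port is taken from an environment variable, so the same script can be launched once per core, e.g. PORT=3000 node app.js and PORT=3001 node app.js):

var http = require('http');

var port = process.env.PORT || 3000;

http.createServer(function (request, response) {
    response.end('Served by process ' + process.pid + '\n'); // each instance reports its own pid
}).listen(port);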

 

Event loop

 

Node.js has an event loop inside, which students with iOS development experience may find familiar: yes, to a certain extent it resembles a RunLoop.

 

A complete Event Loop is divided into several phases, in order: timers, I/O callbacks, idle/prepare, poll, check, and close callbacks.

 

Since Node.js is event-driven, each event's callback is registered to a particular phase of the Event Loop. For example, the callback of fs.readFile is added to I/O callbacks, the callback of setImmediate is added to the check phase of the next loop, and the callback of process.nextTick() is run at the end of the current phase, before the next phase starts.

 

The callbacks of different asynchronous methods execute in different phases. Getting this right matters; otherwise the calling order will produce logic errors.
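A quick way to observe this ordering (a minimal sketch; run it as a standalone script):

setImmediate(function () {
    console.log('setImmediate'); // runs in the check phase of the next loop
});

process.nextTick(function () {
    console.log('nextTick'); // runs at the end of the current phase
});

console.log('synchronous');

// Prints: synchronous, nextTick, setImmediate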

 

The Event Loop cycles continuously, and within each phase, all callbacks registered for that phase execute synchronously. This is exactly why the network I/O section warned against calling blocking methods inside callbacks and urged an asynchronous approach to every time-consuming operation: a callback that runs too long can pin the Event Loop in one phase for a long time, leaving new network requests unanswered.

 

Since the goal of this article is a preliminary yet comprehensive picture of Node.js, I won't walk through each phase of the Event Loop in detail; see the official documentation for specifics.

 

As you can see, the Event Loop is fairly low-level. To make event-driven programming convenient, Node.js provides the EventEmitter class:
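A minimal illustrative sketch (the MyEmitter class and the event names thing1 and thing2 are assumptions, chosen to match the output discussed below):

var EventEmitter = require('events').EventEmitter;
var util = require('util');

function MyEmitter() {
    EventEmitter.call(this);
    var self = this;
    setImmediate(function () {
        self.emit('thing1'); // scheduled for the check phase of the next loop
    });
    process.nextTick(function () {
        self.emit('thing2'); // runs at the end of the current phase
    });
}
util.inherits(MyEmitter, EventEmitter);

var emitter = new MyEmitter();
emitter.on('thing1', function () { console.log('thing1'); });
emitter.on('thing2', function () { console.log('thing2'); });

// Prints: thing2, then thing1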

Judging from the output, although self.emit('thing2') appears later in the code, it executes first, which is exactly what the Event Loop's rules dictate.

 

Many modules in Node.js inherit from EventEmitter, such as fs.createReadStream, mentioned in the next section, which creates a readable file stream and emits the corresponding events when the file is opened, as data is read, and when reading completes.

 

Data streams

 

The benefits of data streams are obvious, and real life offers analogues. Take summer homework: if a student does a little of the work every day, the task is finished fairly easily; if it all piles up until the last day, the mountain of workbooks feels overwhelming.

 

Server development is the same. Suppose a user uploads a 1 GB file, or we read a 1 GB local file. Without the concept of a data stream, we would need to allocate a 1 GB buffer and then process everything in one go once the buffer is full.

 

With a data stream, we can define a much smaller buffer, say 1 MB. Each time the buffer fills, the callback is invoked to process that small chunk of data, so nothing piles up.

 

In fact, an HTTP request and the fs module's file reading are both readable streams:

var fs = require('fs');
var readableStream = fs.createReadStream('file.txt');
var data = '';

readableStream.setEncoding('utf8');

// process a small chunk of data each time one is ready
readableStream.on('data', function(chunk) {
    data += chunk;
});

// the whole file has been read
readableStream.on('end', function() {
    console.log(data);
});

With pipes, the content of one stream can be written into another stream:

var fs = require('fs');
var readableStream = fs.createReadStream('file1.txt');
var writableStream = fs.createWriteStream('file2.txt');

readableStream.pipe(writableStream);

Different streams can also be chained together, for example to read a compressed file, decompress it on the fly, and write the decompressed content out to another file:

var fs = require('fs');
var zlib = require('zlib');

fs.createReadStream('input.txt.gz')
  .pipe(zlib.createGunzip())
  .pipe(fs.createWriteStream('output.txt'));

Node.js offers a very concise API for working with data streams; the above is only a brief introduction.

Summary

 

For high-concurrency, long-lived connections, the event-driven model is far lighter than threads, and multiple Node.js processes scale easily behind a load balancer. Node.js is therefore well suited to serving I/O-intensive applications; the flip side of this approach is that it is poor at CPU-intensive tasks.

 

Node.js usually describes data as streams, for which it also provides a clean encapsulation.

 

Node.js is written in a front-end language (JavaScript) yet acts as a back-end server, so it offers a natural path to front-end/back-end separation. I will analyze this in the next article.

 

 
