Analysis of Browser Principles

Foreword

This article walks through how a browser works, from the moment a URL is entered to the moment the page is left.
The topics are covered in the following order:

  1. Entering and parsing the URL
  2. Fetching static files (TCP three-way handshake and four-way wave)
  3. Rendering the page (HTML, CSS, and the browser's rendering mechanism)
  4. Executing JS (browser processes and threads, the event loop)
  5. Garbage collection
  6. Several key browser events
  7. Optimization from the perspective of browser principles

1. What does the browser do after you enter a URL?

  1. Parse the URL
  2. Check whether a local cache can be used (covered in another post by the author); if the cache is used, skip steps 3-8
  3. Resolve the domain name through DNS (Domain Name System)
  4. Establish a TCP connection (three-way handshake)
  5. The client sends an HTTP request
  6. The server responds to the HTTP request
  7. The browser receives the response
  8. After the data transfer completes, close the TCP connection (four-way wave); with Connection: Keep-Alive, the connection stays open
  9. The client renders the page

URL parsing

The URL is split into a protocol, a domain name, and a resource path.
For example, parsing https://blog.csdn.net/qq_38217940/article/details/125349105 gives:
protocol: https
domain name: blog.csdn.net
resource path: /qq_38217940/article/details/125349105 (when visiting the page, there is an index.html under this path)
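
The same split can be reproduced with the standard URL class, which is available in browsers and in Node.js. This is only a small sketch; the variable name `parsed` is chosen for illustration.

```javascript
// Parse the example URL from the text with the built-in URL class.
const parsed = new URL('https://blog.csdn.net/qq_38217940/article/details/125349105');

console.log(parsed.protocol); // "https:" (the trailing colon is part of the value)
console.log(parsed.hostname); // "blog.csdn.net"
console.log(parsed.pathname); // "/qq_38217940/article/details/125349105"
```

Note that `protocol` includes the trailing colon, so compare against "https:" rather than "https".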

DNS resolution

The lookup proceeds as follows:

  1. Check the browser cache for a DNS record
  2. Check the operating system cache for a DNS record
  3. Check the hosts file (it can map a domain name to a fixed IP, although most people never edit it)
  4. Check the router cache for a DNS record
  5. Query the DNS cache of the ISP (Internet Service Provider)
  6. DNS recursive query
    Taking https://blog.csdn.net/qq_38217940/article/details/125349105 as the example, the steps are as follows
    (1) Ask a root server (root servers are not introduced here; look them up if you are interested) which servers are responsible for the .net suffix
    (2) Ask the .net server which server is responsible for the domain csdn.net
    (3) Ask that server for blog.csdn.net; it returns an IP such as 101.201.178.55. One domain name may have several IPs (load balancing)
    (4) The browser accesses 101.201.178.55
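
The "check each cache in turn" part of the lookup can be sketched as a simple chain of maps. This is a toy model, not how a real resolver is implemented: the `layers` array, the `resolveFromCaches` function, and the cache contents are all hypothetical, and only the IP address comes from the example above.

```javascript
// Hypothetical cache layers, checked in the order listed in the text.
// Each layer is modelled as a Map from domain name to IP address.
const layers = [
  { name: 'browser cache', records: new Map() },
  { name: 'OS cache', records: new Map() },
  { name: 'hosts file', records: new Map() },
  { name: 'router cache', records: new Map() },
  { name: 'ISP cache', records: new Map([['blog.csdn.net', '101.201.178.55']]) },
];

// Return the first layer that knows the domain; otherwise signal that
// a full recursive query would be needed.
function resolveFromCaches(domain) {
  for (const layer of layers) {
    const ip = layer.records.get(domain);
    if (ip !== undefined) {
      return { ip, source: layer.name };
    }
  }
  return { ip: null, source: 'recursive query needed' };
}

console.log(resolveFromCaches('blog.csdn.net'));
console.log(resolveFromCaches('example.org'));
```

The earlier a layer answers, the fewer network round trips are needed, which is why a repeat visit usually resolves faster than the first one.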

TCP three-way handshake and four-way wave

A brief explanation of a few terms first:

  1. TCP and UDP
  • UDP (User Datagram Protocol):
    UDP does not establish a connection before transmitting data, and the remote host does not send any acknowledgment after receiving a UDP message. It is generally used for real-time communication such as voice, video, and live streaming.
  • TCP (Transmission Control Protocol):
    TCP provides a connection-oriented service. A connection must be established before data is transmitted and released afterwards.
    TCP is generally used in scenarios such as file transfer, sending and receiving email, and remote login.
  2. Several important control bits in TCP segments
    SYN: synchronization control bit
    ACK: acknowledgment control bit
    FIN: termination control bit
    A simplifying premise for what follows: each side sends one message at a time. This helps explain why the handshake needs at least three steps and the wave at least four.

Three-way handshake

  1. The browser sends a SYN segment to the server to request a connection (the browser reaches out to the server)
  2. After the server accepts the connection, it replies with a SYN+ACK segment and allocates resources for this connection (the server reaches back to the browser)
  3. After the browser receives the SYN+ACK segment, it sends an ACK segment to the server and allocates resources (the browser confirms to the server)

Why three times instead of two?
The third step keeps the server from opening useless connections that increase its overhead. Without it, a stale connection request that reaches the server late would be treated as a new connection, resulting in an error.
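
To make the three segments concrete, they can be modelled as plain objects. This is a toy sketch rather than real networking code: the function name `threeWayHandshake` and the initial sequence numbers 100 and 300 are made up. It relies on the standard TCP behaviour that the server's reply carries both the SYN and ACK flags and that each acknowledgment number is the peer's sequence number plus one.

```javascript
// Toy model of the three handshake segments (no real sockets involved).
function threeWayHandshake(clientIsn, serverIsn) {
  // 1. Client -> server: SYN carrying the client's initial sequence number.
  const syn = { from: 'client', flags: ['SYN'], seq: clientIsn };

  // 2. Server -> client: SYN+ACK. ack = client seq + 1 confirms the SYN.
  const synAck = { from: 'server', flags: ['SYN', 'ACK'], seq: serverIsn, ack: syn.seq + 1 };

  // 3. Client -> server: ACK. ack = server seq + 1 confirms the server's SYN.
  const ack = { from: 'client', flags: ['ACK'], seq: syn.seq + 1, ack: synAck.seq + 1 };

  return [syn, synAck, ack];
}

const segments = threeWayHandshake(100, 300);
for (const segment of segments) {
  console.log(segment.from, segment.flags.join('+'), segment);
}
```

Only after the third segment does the server know that the client actually received its reply, which is the information a two-step exchange cannot provide.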

Example:
Suppose I want to visit your house. With only two handshakes:
Me: Can I come over to your house?
You: Sure.
...30 minutes later
Me: I came by plane. Why doesn't your house have an airport?
With three handshakes:
Me: Can I come over to your house?
You: Sure.
Me: I'm flying over. Does your house have an airport where I can park?
You: I travel by flying sword, so there is no airport. Don't come.
Why not four or more? Because there is no need.

Four-way wave

  1. The browser sends a FIN (browser waves), which is used to close the data transmission from the browser to the server, and the browser enters the FIN_WAIT_1 state.
  2. After the server receives the FIN, it sends an ACK to the browser (the server waves), and the server enters the CLOSE_WAIT state.
  3. The server sends a FIN (the server waves) to close the data transmission from the server to the browser, and the server enters the LAST_ACK state.
  4. After the browser receives the FIN, the browser enters the TIME_WAIT state, sends ACK to the server (the browser waves), and the server enters the CLOSED state.

Why four waves?
The server's ACK and FIN are not sent together. After acknowledging the browser's FIN, the server may still have data in transit, so it sends its own FIN only once that data has been completely transmitted.
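
The state changes on both sides can be traced with a small simulation. This is a simplified sketch: the function name `fourWayClose` is invented for illustration, no real connection is involved, and FIN_WAIT_2 is the standard TCP state the client occupies between the server's ACK and the server's FIN (it is not named in the steps above).

```javascript
// Trace the connection states of both sides during a four-way close.
function fourWayClose() {
  const client = { state: 'ESTABLISHED' };
  const server = { state: 'ESTABLISHED' };
  const trace = [];
  const record = (segment) => {
    trace.push(`${segment}: client=${client.state}, server=${server.state}`);
  };

  // 1. Client sends FIN; the server has not reacted yet.
  client.state = 'FIN_WAIT_1';
  record('client FIN');

  // 2. Server acknowledges the FIN but may still have data to send.
  server.state = 'CLOSE_WAIT';
  client.state = 'FIN_WAIT_2';
  record('server ACK');

  // 3. Server has finished sending and sends its own FIN.
  server.state = 'LAST_ACK';
  record('server FIN');

  // 4. Client acknowledges; the server closes, the client waits in TIME_WAIT
  //    for a while before it finally closes too.
  client.state = 'TIME_WAIT';
  server.state = 'CLOSED';
  record('client ACK');

  return trace;
}

const closeTrace = fourWayClose();
closeTrace.forEach((line) => console.log(line));
```

The gap between steps 2 and 3 is where the server finishes sending any remaining data, which is why the ACK and the FIN cannot simply be merged in this model.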

Example:
You: I'm leaving.
Me: OK, let me check whether you have left anything behind.
Me: Nothing left, off you go.
You: OK, I'm leaving.
With only two waves:
You: I'm leaving.
Me: Off you go.
...your plane never took off
With only three waves:
You: I'm leaving.
Me: Off you go.
You: Goodbye.
...your plane never took off

Browser rendering mechanism

  1. Parse the HTML and build the DOM tree
  2. Parse the CSS and build the CSSOM tree
  3. Merge the DOM tree and the CSSOM tree into a render tree
  4. Calculate the position of each node in the render tree (layout)
  5. Paint the page through the graphics card/GPU

Repaint

When elements in the render tree need to update properties that affect only their appearance and not the layout, such as background-color, the operation is called a repaint.

Reflow (relayout)

When part or all of the render tree has to be rebuilt because elements change in size, layout, or visibility, the layout is affected. This operation is called a reflow.
Any operation that changes an element's geometry (its position and size) triggers a reflow:

  1. Adding or removing visible DOM elements (append, removeChild, etc.)
  2. Changing an element's size: margin, padding, border, width, and height
  3. Changing content, such as the user typing text into an input box
  4. Resizing the browser window (when the resize event fires)
  5. Reading the offsetWidth and offsetHeight properties (the browser has to compute the layout immediately to return an up-to-date value)
  6. Setting values on the style attribute
  7. Changing the default font of the page

Avoiding reflow

A reflow always triggers a repaint, but a repaint does not necessarily involve a reflow. Reflow is relatively expensive, so it should be avoided where possible.

  1. Use a documentFragment or an element such as a div as a buffer: add all the new elements to it first, then append it to the body in one go, as follows
 var ul = document.createElement('ul');
 var fragment = document.createDocumentFragment();
 for (var i = 1; i < 101; i++) {
     var li = document.createElement('li');
     var liText = document.createTextNode(i);
     li.appendChild(liText);
     fragment.appendChild(li);
 }
 // Append the fragment once, so all 100 items reach the list in a single operation
 ul.appendChild(fragment);
 document.body.appendChild(ul);
 // Using a div as the buffer works the same way, at the cost of one extra wrapper div
let div = document.createElement('div')
  2. Hide the element with display:none first, perform all the operations on it, and then show it again, because operations on display:none elements do not cause reflow or repaint.
  3. Cache the values of properties that trigger reflow (offsetWidth, offsetHeight) in variables, and use the variables directly when needed.
  4. For complex animations, use absolute positioning (position:absolute) to take the element out of the document flow

Blocking loading

  1. When the browser receives the HTML file, it loads it from top to bottom, parsing and rendering as it goes.
  2. Loading means fetching resource files. When external CSS files or images are encountered, the browser sends additional requests for them. These requests are asynchronous and do not affect the loading of the HTML file.
  3. When a JavaScript file is encountered, however, HTML parsing and rendering are suspended until the JavaScript file has been loaded and executed.
    Why does HTML have to wait for JavaScript? Because JavaScript may modify the DOM, which could make the work done on the HTML that follows wasted. This is why script tags are usually written at the bottom, just before the closing body tag.

Browser processes and threads

The concept of a process

  • A process is the smallest unit of resource allocation (the smallest unit that can own resources and run independently; processes do not share resources with each other). Every app has at least one process, for example a browser, QQ, or WeChat
  • A thread is the smallest unit of CPU scheduling (a thread is a unit of execution inside a process; a process can contain multiple threads, and they share the resources of that process)
  • Different processes can also communicate with each other, but the cost is relatively high

The browser is multi-process. It consists of the following processes, and by default each tab gets its own rendering process:

  • Browser process: mainly responsible for displaying the interface, user interaction, and managing child processes; it also provides storage and other functions.
  • Rendering process (multi-threaded): its core task is to turn HTML, CSS, and JavaScript into a web page that users can interact with. Both the layout engine Blink and the JavaScript engine V8 run in this process. By default, Chrome creates a rendering process for each tab. For security reasons, the rendering process runs in a sandbox.
  • GPU process: originally introduced to render 3D CSS effects; later, both web pages and Chrome's own UI came to be drawn with the GPU
  • Network process: mainly responsible for loading the page's network resources
  • Plug-in process: mainly responsible for running plug-ins
  • Audio process
Browser threads usually refer to the threads inside the rendering process, which include:

  • GUI rendering thread
  • JS engine thread
  • Event trigger thread
  • Timer thread
  • Asynchronous HTTP request thread (IO thread)

Characteristics of single-threaded JS:

  • The JS thread and the GUI rendering thread are mutually exclusive, so while JS is executing, the rendering thread is suspended
  • Although event triggering, timers, and asynchronous HTTP requests all run on independent threads, JS itself is single-threaded, so their callbacks have to wait until the currently running JS finishes

Web Workers make multithreading available to JS. They can take over work that would otherwise block the JS thread and thereby improve performance.

event loop

Mechanism: JS is single-threaded and runs code in order. Once the current synchronous code has finished, the microtask queue is checked and any microtasks are executed; then the macrotask queue is checked and the next macrotask is executed.
Consider the following three questions:
(1) If a timer expires while JS is executing, does the timer callback run first, or does the current code keep running?
(2) If a timer expires while a click handler is executing, does the timer callback run first, or the code in the click handler?
(3) If a click event is triggered while JS is executing, does the click handler run first, or does the current code keep running?
There is no doubt: as long as JS is executing, it cannot be interrupted by any event or task.
In fact, the page is frozen while JS is executing, so a click cannot even be handled while other code is running. We normally do not write code that freezes the page on purpose (unless the code is badly written). If you are not convinced, press F12, run the code below, and see whether you can still click or scroll the page

for (var i = 0; i < 1000000000; i++) {
	console.log(1)
}
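
A gentler way to observe the same thing is to let a zero-delay timer compete with a short busy loop. This is a minimal sketch; the 200 ms duration and the names `start` and `firedAfterMs` are arbitrary choices for illustration.

```javascript
const start = Date.now();
let firedAfterMs = null;

// Ask for the callback to run as soon as possible.
setTimeout(() => {
  firedAfterMs = Date.now() - start;
  console.log(`timer fired after about ${firedAfterMs} ms`);
}, 0);

// Busy-wait for roughly 200 ms. Nothing else can run on this thread meanwhile,
// so the timer callback has to wait even though its delay has long expired.
while (Date.now() - start < 200) {
  // intentionally empty
}
console.log('synchronous code finished');
```

The log line from the timer always appears after "synchronous code finished", and the measured delay is at least the length of the busy loop rather than the requested 0 ms.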

Macrotasks and Microtasks

  • What are macrotasks and microtasks? Simply put, both are kinds of asynchronous tasks.

  • Macrotasks include
    setTimeout, setInterval, setImmediate (Node only), ajax
    requestAnimationFrame (browser only), I/O, UI rendering (page rendering)

  • Microtasks include
    Promise callbacks (then, catch, finally), async/await

  • The difference between macrotasks and microtasks (the author's personal opinion; readers should judge for themselves):
    (1) The work behind macrotasks is handled on separate threads, while microtasks all stay on the JS thread
    (2) Two threads consume more resources than one, so microtasks are cheaper

  • Execution order
    Both queues are first-in, first-out. After the current synchronous code finishes, all queued microtasks run first and then the next macrotask runs; the microtask queue is drained again after each macrotask.
    See the code below for a concrete example

console.log('1');
setTimeout(function() {
    console.log('10');
    new Promise(function(resolve) {
        console.log('11');
        resolve();
    }).then(function() {
        console.log('12')
    })
},10)
setTimeout(function() {
    console.log('7');
    new Promise(function(resolve) {
        console.log('8');
        resolve();
    }).then(function() {
        console.log('9')
    })
})
var a = new Promise(function(resolve) {
    console.log('2');
    resolve();
}).then(function() {
    console.log('5')
})
var b = new Promise(function(resolve) {
    console.log('3');
    resolve();
}).then(function() {
    console.log('6')
})
console.log('4')
// Output: 1,2,3,4,5,6,7,8,9,10,11,12
// The body of an async function starts executing immediately, so there is not much to say about it
// Think of await in terms of Promise: the awaited expression corresponds to the code before resolve, and the code after it corresponds to then. For example, replacing a with the version below leaves the execution order unchanged; no further detail is given here
async function a(){
    // This function expression is invoked immediately
    await function(){
        console.log('2')
    }()
    console.log('5');
}
a();

event flow

Background knowledge
Event bubbling and event capture were proposed by Microsoft and Netscape respectively to define the event flow in a page, that is, the order in which elements receive an event. After a heated debate between the two, the W3C's compromise was finally adopted: capture first, then bubble.
Glossary

  • Events: an event is the moment at which a particular interaction occurs in the document or the browser window.
  • Event flow: the order in which elements in a page receive an event
  • Event capture: from top to bottom; the root element receives the event first and the target element receives it last.
  • Event bubbling: from bottom to top; the target element receives the event first, then the event travels up level by level until the root element receives it.

By default, JS listens in the bubbling phase (the third parameter of addEventListener defaults to false). When the third parameter of addEventListener is true, the listener runs in the capture phase.

The event flow is divided into the following three phases
(1) event capture
(2) the target receives the event
(3) event bubbling
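
The order of the three phases can be simulated without a DOM. This is a deliberately simplified model: the `dispatch` helper, the path of element names, and the listener table are all hypothetical, and a real browser does much more (for example stopPropagation handling).

```javascript
// Walk a path from the root to the target and back, recording which
// listeners would run and in which phase.
function dispatch(path, listeners) {
  const order = [];
  const target = path[path.length - 1];
  const ancestors = path.slice(0, -1);

  // Phase 1, capture: from the root down to the target's parent.
  for (const node of ancestors) {
    if (listeners[node]?.capture) order.push(`${node} (capture)`);
  }

  // Phase 2, target: the target element receives the event.
  order.push(`${target} (target)`);

  // Phase 3, bubble: from the target's parent back up to the root.
  for (const node of [...ancestors].reverse()) {
    if (listeners[node]?.bubble) order.push(`${node} (bubble)`);
  }
  return order;
}

const order = dispatch(['html', 'body', 'ul', 'li'], {
  html: { capture: true, bubble: true },
  ul: { capture: true, bubble: true },
});
console.log(order);
// [ 'html (capture)', 'ul (capture)', 'li (target)', 'ul (bubble)', 'html (bubble)' ]
```

In a real page, the `capture` entries correspond to listeners registered with true as the third parameter of addEventListener, and the `bubble` entries to listeners registered with the default false.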

Event delegation

Also called event proxying. It relies on event bubbling: events triggered on child elements are handled by a listener bound to the parent.
Advantages of event delegation:
(1) Fewer listeners need to be bound, which improves performance
(2) Dynamically added child elements are covered automatically

 <ul id='list'>
 	<li>1</li>
 	<li>2</li>
 	<li>3</li>
 </ul>
const ul = document.querySelector('#list')
ul.addEventListener('click',function(e){
  const target = e.target
  // nodeName is uppercase for HTML elements, so compare against 'LI'
  if(target.nodeName === 'LI'){
		console.log(target.nodeName)
	}
},false)
  • Stop event capture and bubbling in JS: e.stopPropagation()
  • Prevent the default action in JS: e.preventDefault()
Garbage collection

What is garbage?

Whatever is no longer needed is garbage.

How is garbage collected?

Mark-and-sweep algorithm

  1. Mark
    Starting from the root node (Root), traverse all objects.
    Objects that the traversal can reach are reachable.
    Objects that the traversal did not reach are unreachable.
  2. Reclaim the unreachable objects
  3. Tidy up (defragment) the memory
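
The mark and reclaim steps can be illustrated on a toy heap. This is only a sketch of the idea, not how a JavaScript engine is implemented: the `heap` map, the object ids, and the `markAndSweep` function are all invented for the example, and the memory-tidying step is left out.

```javascript
// Toy heap: each object id maps to the ids it references.
const heap = new Map([
  ['root', ['a', 'b']],
  ['a', ['c']],
  ['b', []],
  ['c', ['a']], // a and c reference each other, but are reachable from root
  ['d', ['e']], // d and e reference each other, but nothing reachable points to them
  ['e', ['d']],
]);

function markAndSweep(objects, rootId) {
  // Mark: traverse everything reachable from the root.
  const marked = new Set();
  const stack = [rootId];
  while (stack.length > 0) {
    const id = stack.pop();
    if (marked.has(id)) continue;
    marked.add(id);
    for (const ref of objects.get(id) ?? []) stack.push(ref);
  }

  // Sweep: delete every object that was never marked.
  const collected = [];
  for (const id of [...objects.keys()]) {
    if (!marked.has(id)) {
      objects.delete(id);
      collected.push(id);
    }
  }
  return collected;
}

const collected = markAndSweep(heap, 'root');
console.log(collected); // [ 'd', 'e' ]
```

Note that `d` and `e` are collected even though they reference each other: what matters is reachability from the root, not whether something still points at an object.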

When to collect garbage

When the browser performs garbage collection, it pauses JavaScript execution and resumes only after the collection completes.
For ordinary applications this is not a problem, but for JS games and animations that need smooth continuity, a long pause makes the page stutter.
So the timing of garbage collection matters.

  • Generational collection
    Objects are grouped by lifetime: short-lived objects belong to the young generation and long-lived objects to the old generation. Temporary variables are collected soon after they are used, while long-lived objects such as window and the DOM are collected later (for example, when the browser tab is closed)
  • Incremental collection
    Splits the garbage collection work into smaller chunks and processes one chunk at a time over several passes.
  • Idle-time collection
    Tries to run when the CPU is idle to reduce the possible impact on code execution.

Closure

The definition of closure (MDN)
A closure is the combination of a function bundled together (enclosed) with references to its surrounding state. In other words, a closure lets an inner function access the scope of an outer function. In JavaScript, a closure is created every time a function is created, at function creation time.
Why bring up closures here? First, we need to know what closures are for:

  • Hiding variables to avoid polluting the global scope
  • Providing indirect access to local variables
  • Keeping variables from being garbage collected (the key point here)
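
All three purposes show up in a classic counter example. This is a minimal sketch; `createCounter` and its method names are chosen for illustration.

```javascript
function createCounter() {
  // Hidden variable: it cannot be reached from outside this function,
  // but it stays alive as long as the returned methods are referenced.
  let count = 0;
  return {
    increment() {
      count += 1;
      return count;
    },
    current() {
      return count;
    },
  };
}

const counter = createCounter();
counter.increment();
counter.increment();
console.log(counter.current()); // 2
console.log(typeof count);      // "undefined": count is not a global variable
```

Because `counter` still references the inner functions, `count` is reachable and will not be collected; once `counter` itself becomes unreachable, `count` can be collected along with it.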

Application scenarios of closures
The author has never used them, because object-oriented programming rarely needs closures. Judging from the three purposes listed above, object encapsulation, inheritance, and polymorphism already provide the same capabilities. (personal opinion)

WeakMap and WeakSet

References held by WeakSet and WeakMap are weak: the garbage collector does not count them. WeakSet and WeakMap are therefore suitable for temporarily storing a group of objects and for storing information bound to objects. As soon as these objects are no longer referenced elsewhere, their entries in the WeakSet or WeakMap disappear automatically. Both are used to solve memory leak problems. (The author has not used them yet; they should be very useful in Node development)
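
A typical use is attaching extra information to an object without keeping that object alive. This is a small sketch; the `metadata` map and the `user` object are hypothetical, and because garbage collection timing is not observable from code, the example only shows the entry while the object is still referenced.

```javascript
// Side table that associates extra data with objects.
const metadata = new WeakMap();

let user = { name: 'example user' };
metadata.set(user, { visits: 1 });

// While `user` is still referenced, the entry is available as usual.
const hadEntry = metadata.has(user);
const visits = metadata.get(user).visits;
console.log(hadEntry, visits); // true 1

// Drop the only strong reference. The WeakMap entry does not keep the object
// alive, so the object and its metadata become eligible for collection.
user = null;
```

With an ordinary Map, the entry would keep the object alive until it was deleted explicitly, which is exactly the kind of forgotten reference that leads to leaks.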

What should you pay attention to once you know how garbage collection works?

1. Reduce the use of global variables, for example by minimizing the number of objects mounted on window.
2. Remember to release objects you are done with to avoid memory leaks (garbage that is not collected in time is a memory leak), for example:

  • Global listeners such as addEventListener and removeEventListener should be used in pairs
  • Objects mounted on window need to be released manually (assigned null)
  • In frameworks such as Vue, clean up global state in beforeDestroy, such as $on/$off listeners, Vuex's $store, third-party libraries, etc.
  • Use the weak references WeakMap and WeakSet
  • Closures do not cause memory leaks in themselves, because keeping the variable alive is exactly what you want; they become a problem only when abused.
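
The pairing of addEventListener and removeEventListener can be tried with the standard EventTarget class, which exists in browsers and in recent Node.js versions. This is a minimal sketch; the event name 'ping' and the names `target`, `onPing`, and `calls` are made up.

```javascript
const target = new EventTarget();
let calls = 0;

// Keep a reference to the handler so the same function can be removed later.
function onPing() {
  calls += 1;
}

target.addEventListener('ping', onPing);
target.dispatchEvent(new Event('ping')); // handler runs, calls is now 1

// Remove the listener when it is no longer needed, so the target
// no longer holds on to the handler and whatever the handler references.
target.removeEventListener('ping', onPing);
target.dispatchEvent(new Event('ping')); // handler no longer runs

console.log(calls); // 1
```

Removal only works when the very same function reference is passed, which is why an inline anonymous handler cannot be removed this way.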

3. Memory leaks make the page stutter or even crash, and when a page crashes, a memory leak is a likely cause. To track one down, check which garbage has not been collected, starting with global variables that were needed once but were never released.

Several key browser events

  1. load and unload: fired when the page loads and when it closes. JS is generally run after onload. Some annoying websites pop up an alert on unload to keep you from closing the page
  2. onpopstate, hashchange, statechange: fired when the window history (URL) changes. They can be used to implement routing (statechange is not a native browser event; it is provided by some history libraries)

Optimization from the perspective of browser principles

  1. Reduce HTTP requests
    Merge images
    Use caching
    Compress and merge JS and CSS
    Reduce back-end API calls by merging APIs that can be combined into one
    Use keep-alive for requests
  2. Reduce static file size
    Compress JS and CSS
    Use base64 for images
    Split code
  3. Reduce blocking
    Put JS after the HTML
    Use img for large pictures; if you want a picture to show up earlier, put it in the background
    Avoid iframes
    Use lazy loading
  4. Code level
    Reduce reflow: use position flexibly, and prefer class over style
    Reduce code size (remove redundant code)
    Use Web Workers for long-running computation
    Execute code asynchronously
    Release global variables promptly to avoid memory leaks

Origin blog.csdn.net/qq_38217940/article/details/125472202