Understand stack memory and heap memory in seconds (down to the lowest level)

general understanding

Many people do not have a thorough understanding of memory.

They know only that primitive data types (value types) are stored in "stack memory" and that reference data types (object types) are stored in "heap memory".

memory concept

First of all, both the stack and the heap are memory models defined by a runtime environment such as a JS engine or the JVM. The runtime lays them out in memory provided by the operating system, and the CPU executes the instructions that read and write them.

Because it is a model, it is an abstraction: a way of organizing and managing memory rather than a physical component.

This model is what we most need to know. The real physical memory only needs to be understood in outline.

Besides stack memory and heap memory, the memory model also includes a constant pool and a method area.

About Stack

A "stack" belongs to a thread and is "first in, last out". Each stack frame generally saves the address of the frame that called it (the previous frame), so the frames are linked hand in hand into a chain. The frame pushed last sits on top of the stack, so it runs and is popped (destroyed after execution) first; the frame pushed first is popped last.

A thread is the smallest unit of execution that the operating system schedules onto a CPU, and it is a single sequential flow of control. Each thread has its own call stack.

The stack can be divided into the thread stack of the CPU (which saves the "execution context" for task scheduling) and the space stack of memory (which saves resources such as variables for the CPU to use during execution).

What we call stack memory generally refers to the space stack of memory.
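
The "first in, last out" order can be seen with a plain array used as a stack. This is only a sketch: the names enter, leave, callStack and finishOrder are made up for the example and are not how an engine implements its call stack.

```javascript
// Model each call as a frame pushed onto an array used as a stack.
const callStack = []
const finishOrder = []

function enter(frameName) { callStack.push(frameName) }
function leave() { finishOrder.push(callStack.pop()) }

enter('main')   // pushed first
enter('fn')
enter('bar')    // pushed last, so it is on top
const topFrame = callStack[callStack.length - 1]   // 'bar'
leave()         // bar finishes first
leave()         // then fn
leave()         // main, pushed first, is popped last
console.log(topFrame, finishOrder.join(' -> '))    // bar bar -> fn -> main
```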

Heap memory (Heap)

The heap is usually the largest area of memory, and it is shared by all threads. All object instances (or complex type information) are stored in heap memory.

The relationship between stack and heap

Objects are stored in heap memory as reference types, but are accessed through pointers stored on the stack.

Objects may contain properties and methods, and these may in turn hold values of primitive types. Therefore, when such a property is read or a method is called, some values that live in heap memory are copied into stack memory for the duration of execution.
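
A minimal sketch of the practical difference, using made-up variable names: copying a primitive copies the value itself, while copying an object variable copies only the pointer to the same heap object.

```javascript
let count = 1
let countCopy = count      // the value itself is copied
countCopy = 2              // changing the copy leaves count alone

const original = { balance: 1 }
const alias = original     // only the pointer (reference) is copied
alias.balance = 2          // both names see the change: there is one object

console.log(count, original.balance, alias === original)   // 1 2 true
```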

pointer

A "memory address" is commonly, and vividly, called a "pointer".

Both stack memory and heap memory locate values (primitive data, objects, function bodies, and so on) through pointers.

The operating system gives each process a virtual address space and maintains a page table (a map or table structure) that maps virtual addresses to physical addresses; the CPU's memory management unit performs the translation. This article calls that table the "pointer page table".

pointer page table

The Map page table for managing memory is created by the operating system according to certain rules. Each process (application) manages its own private user space within virtual memory, and the table records the mappings for all of them.

In the simplified model used here, every time data is modified the table is updated to record which physical memory is in use and how many references there are to each piece of data.

The count increases by 1 for each new reference and decreases by 1 each time a reference is released. When nothing refers to the data any more, that is, when the count is 0, the memory space may be overwritten.

In other words, memory never needs to have its data explicitly erased. A region that nobody uses is equivalent to a blank area: new data simply overwrites it, and the start address and end address of the new data are then recorded in the page table to mark the region as in use.
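
The counting idea can be sketched with a toy table. This is an illustration of the description above under simplifying assumptions, not how a real page table or JS engine works; ToyTable and its methods are invented for the example.

```javascript
// Toy table: a fixed number of cells plus a reference count per cell.
class ToyTable {
    constructor(cellCount) {
        this.cells = new Array(cellCount).fill(null)    // the "memory"
        this.refCounts = new Array(cellCount).fill(0)   // references per cell
    }
    // Write a value into the first cell nobody references; return its address.
    store(value) {
        const address = this.refCounts.indexOf(0)
        if (address === -1) throw new Error('out of memory')
        this.cells[address] = value   // overwrite; old data is never erased first
        this.refCounts[address] = 1
        return address
    }
    addRef(address) { this.refCounts[address] += 1 }
    release(address) { this.refCounts[address] -= 1 }
}

const table = new ToyTable(2)
const first = table.store('old')       // address 0
table.addRef(first)                    // count 2
table.release(first)                   // count 1: still in use
const second = table.store('other')    // address 1, because cell 0 is taken
table.release(first)                   // count 0: free, but still holds 'old'
const staleValue = table.cells[first]
const reused = table.store('new')      // overwrites the freed cell 0
console.log(first, second, staleValue, reused, table.cells.join(','))
```

Note that releasing a cell does not clear it: the stale value stays until a later store overwrites it.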

Every time the computer restarts, the pointer page table is rebuilt.

variable

The variable name is an alias for the stack memory pointer.

This is why you can declare a variable first and assign a value to it later.

Declaring a variable creates an entry for it in the pointer page table. Assignment is what actually allocates memory. To save memory, the system first checks whether the same value already exists in memory. If it does, the address that was found is recorded in the variable's entry. If not, memory is allocated, the data is written, and the new address is recorded in the pointer page table.

So the variable name and the value data are stored separately.

Because variables and values are stored in two separate places, "stack memory data sharing" becomes possible: different variable names can point to the same memory, and the same variable name can point to different memory at different times.

The memory address that holds the variable name is called a pointer variable.

pointer variable

The pointer variable is the unique primary key stored in the pointer page table. Besides the pointer variable, each entry also saves the "variable name", the "data type" (on assignment, the amount of stack memory to allocate is determined by the type), the "pointer" (the address of the physical memory where the value lives), the "scope" (a complex tree structure that governs reading and reclaiming the variable), and other information.

These pointer variables are loaded into one of two objects: the GO (Global Object, belonging to the global execution context) or an AO (Activation Object, belonging to a function execution context).

GO and AO are collectively referred to as the VO (variable object), although this concept is outdated. In recent versions of the ECMAScript specification, each execution context is instead associated with a Variable Environment (VE for short).

That is to say, information such as variables and scopes in memory is created and managed through the "execution context" (also called the execution environment) and its VE while the program runs.

Execution Context

An execution context is the environment set up in the CPU and memory when program code is compiled and executed. It holds everything the code needs in order to run, such as variable objects, the scope chain, and the this pointer.

Once an "execution context" is created, it is pushed onto the "execution context" stack, and the order of execution is managed by the thread stack.

Execution contexts are generally divided into the "global execution context" (there is only one; in a browser it is created by the browser, and its global object is the window object) and "function execution contexts" (each time a function is called, an execution context is created for that call, and it is destroyed once the call finishes).

An execution context can therefore be seen as an object stored in heap memory, with a pointer to it placed on the execution stack. During execution, the CPU processes the data in heap memory and creates many stack and heap references according to the data types involved, forming a complex web of execution relationships.

scope

Scope determines where a variable can be used, ensuring that it can be read while it is needed and destroyed promptly afterwards. Scope in JS is divided into global scope and function scope (ES6 added block scope for let and const), and function scopes can be nested.

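var a = 1                  // global scope
function fn() {
    var a = 2              // fn's scope shadows the global a
    function bar() {
        console.log(a)     // bar has no a of its own, so it finds fn's a: logs 2
    }
    bar()
}
fn()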

The nesting of scopes forms a "scope chain". In the example, the scope chain of the bar function is bar -> fn -> global.

The scope chain is mainly used to look up identifiers (variables and functions). Identifier resolution is the process of searching the scope chain level by level, and the scope chain guarantees orderly access to variables and functions.

(1) If the variable is declared in its own scope, there is no need to use the scope chain.

(2) If the variable is not declared in its own scope, you need to use the scope chain to find it.

If the variable is not found anywhere in the scope chain, a ReferenceError: xxx is not defined exception is thrown, meaning that the variable was never declared. All variables should be declared before use (in non-strict mode, assigning to an undeclared name silently creates a global variable, which should be avoided as much as possible).

If a variable is found but has not yet been assigned a value, its value is undefined, meaning uninitialized. This is not the same as assigning null, which is a value set deliberately to mean "no object". (The ASCII NUL character, by comparison, is the single byte 0000 0000.)
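
The three outcomes of a lookup can be checked in a few lines. This is a small sketch; outer, unassigned and notDeclaredAnywhere are names invented for the example.

```javascript
var outer = 1

function lookup() {
    var unassigned            // declared, but no value yet
    var results = {}

    results.fromOuterScope = outer         // found one level up the scope chain
    results.unassigned = unassigned        // found, but still undefined
    results.isNull = unassigned === null   // undefined is not the same value as null

    try {
        notDeclaredAnywhere   // declared nowhere, so the lookup fails at every level
    } catch (err) {
        results.errorName = err.name       // 'ReferenceError'
    }
    return results
}

var lookupResults = lookup()
console.log(lookupResults)
```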

garbage collection

When a variable is assigned null, the value address recorded for it in the pointer page table points to null, which signals that the old value can be released. Whether the value is actually released or destroyed depends on whether it is still referenced elsewhere.

Release is not directly determined by whether a variable is null. Setting a variable to null merely removes one reference, so that the value can be found and released sooner.

The garbage collector scans the pointer page table at intervals, examines every variable, checks its reference count, and also checks whether the variable is still reachable through the scope tree. If the reference count is 0, or the scan finds no reference to the variable, the variable is released, that is, deleted from the pointer page table.
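
The "scan and check for references" step can be sketched as a toy mark-and-sweep over a small object graph. This is a simplified illustration under stated assumptions, not V8's actual collector; collect, toyHeap and the entry names are invented for the example.

```javascript
// Toy heap: each entry lists the ids of the entries it references.
function collect(heap, rootIds) {
    const reachable = new Set()
    const pending = [...rootIds]

    // Mark: follow references outward from the roots.
    while (pending.length > 0) {
        const id = pending.pop()
        if (reachable.has(id) || !(id in heap)) continue
        reachable.add(id)
        pending.push(...heap[id].refs)
    }

    // Sweep: delete every entry that was never reached.
    const freed = []
    for (const id of Object.keys(heap)) {
        if (!reachable.has(id)) {
            delete heap[id]
            freed.push(id)
        }
    }
    return freed.sort()
}

const toyHeap = {
    user: { refs: ['profile'] },
    profile: { refs: [] },
    orphanA: { refs: ['orphanB'] },   // A and B reference each other,
    orphanB: { refs: ['orphanA'] },   // but no root reaches them
}

const freedIds = collect(toyHeap, ['user'])
console.log(freedIds.join(','))   // orphanA,orphanB
```

The two orphans show why scanning for reachability matters: each still has a reference count of 1, so counting alone would never free them.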

Graphical Memory Model

shared and exclusive

The method area, constant pool, and heap memory are shared by all threads of a process.

Stack memory and the program counter are private to each thread.

Counter and method area

The program counter stores the address of the next instruction, and the CPU reads instructions through it to keep execution in order. Separately, the runtime counts how often code is executed. Code that runs many times (and so has a higher weight) is compiled and kept in the method area, so that next time it runs directly without being parsed and compiled again, which increases speed.

In addition to compiled code, the method area also holds native methods, such as the Object constructor and the toString() method of V8's built-in classes such as Object, Function, and Array.

constant pool

The constant pool is dedicated to storing strings. A large number of strings are processed while code runs, and constantly creating and collecting them would cost too much, so a separate region is set aside for them. A dedicated region that holds a single data type reduces memory overhead and improves speed at the same time.

All variable names are strings. Therefore, variable names are stored in the constant pool.

user space

User space is a portion of virtual memory that is private to each process.

The operating system allocates a separate address space for each process, so that processes do not interfere with one another.

Each user space contains its own stack memory and program counters, one set per thread.

cache

A cache is like a small, fast block of memory. Finding a needle in a washbasin is bound to be faster than finding one in the sea.

There are many kinds of cache. The word usually refers to the CPU cache, but there are also hard disk caches, memory caches, graphics card caches, and so on.

The role of a cache (improve hit rate, reduce latency, reduce memory overhead):

(1) Store frequently accessed data in the cache.

(2) Temporarily store the results of time-consuming calculations.

(3) Provide a dedicated IO channel to improve read and write speed.
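
Point 2 is the easiest to try out in code. The following is a minimal memoization sketch; memoize, hitCount and square are names invented for the example, and a real cache would also need a size limit and an eviction policy.

```javascript
// A small cache in front of a slow function.
function memoize(slowFn) {
    const cache = new Map()
    let hits = 0
    const cached = (key) => {
        if (cache.has(key)) {
            hits += 1                // found in the "washbasin"
            return cache.get(key)
        }
        const value = slowFn(key)    // fall back to the "sea"
        cache.set(key, value)
        return value
    }
    cached.hitCount = () => hits
    return cached
}

let slowCalls = 0
const square = memoize((n) => {
    slowCalls += 1                   // counts how often the slow path runs
    return n * n
})

square(12)
square(12)
square(12)
console.log(slowCalls, square.hitCount())   // 1 2
```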

data storage

binary

All information in a computer is represented in binary, as 0s and 1s.

bit (bit)

The smallest unit of memory is a "bit", which is like a switch, either 0 or 1.

Byte (byte)

The most basic storage unit of memory is the "byte". A byte is composed of 8 bits, like a group of 8 switches.

It is the basic unit that can represent meaningful characters.

ASCII code

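A byte has 2^8 (256) possible values, and ASCII assigns characters to the first 128 of them. Looking a byte's value up in the ASCII table yields the most basic characters we know, such as a, A, 1, 2, +, -, !, ?, and so on.

In ASCII, 0000 0000 (one byte is exactly 8 bits) represents NUL (null), 0011 0000 represents 0, 0011 0001 represents 1, 0011 0010 represents 2, 0100 0001 represents A, 0100 0010 represents B, 0110 0001 represents a, 0110 0010 represents b, 0010 1011 represents +, 0100 0000 represents @, 0010 0101 represents %, and so on.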
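
The bit patterns listed above can be reproduced with built-in string methods. A minimal sketch; toBits is a helper name made up for the example.

```javascript
// Convert a single ASCII character to its 8-bit pattern, grouped 4 + 4.
function toBits(char) {
    const code = char.charCodeAt(0)                   // e.g. 'A' -> 65
    const bits = code.toString(2).padStart(8, '0')    // e.g. 65 -> '01000001'
    return bits.slice(0, 4) + ' ' + bits.slice(4)
}

const samples = ['0', 'A', 'a', '+', '@', '%'].map((c) => c + ' = ' + toBits(c))
console.log(samples.join('\n'))
```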

Chinese character encodings such as GB2312 and GBK extend ASCII and represent a Chinese character with two bytes (two groups of switches). In UTF-8, by contrast, a Chinese character usually takes three bytes.

byte addressing

The pointer page table uses a string of 0s and 1s as the identifier of each byte. This is called byte addressing, and the string of 0s and 1s is called the address of the byte, or the pointer to the byte.

On a 32-bit system, a pointer occupies 4 bytes.

On a 64-bit system, a pointer occupies 8 bytes.

(4 bytes at 8 bits per byte is 32 bits, and each bit can be 0 or 1, so there are 2^(4*8) = 2^32 possible values, which is 4G.)

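1 bit can represent 0 or 1, 2 values in total (2^1).
2 bits can represent 00, 01, 10, or 11, 4 values in total (2^2).
3 bits can represent 000, 001, 010, 011, 100, 101, 110, or 111, 8 values in total (2^3).
By the same reasoning, 1 byte = 8 bits can represent 256 values (2^8).
So 4 bytes = 32 bits can represent 2^(8*4) = 2^32 values, that is, 4G values.
The byte is the basic unit of memory, so one unit is 1 B.
Memory sizes are therefore counted in bytes rather than bits, and 4 GB of memory must not be read as 4G * 8 b.

Therefore, on a 32-bit system the pointer page table can hand out at most 4G pointers. There are only 4G numbers in total, so memory beyond 4G has no numbers left to identify it.

In other words, the largest amount of memory a 32-bit system can manage is 4 GB.

Now you know why a 32-bit XP system recognizes only 4 GB of memory even when 8 GB, 16 GB, or more is installed.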
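
The 4 GB limit is plain arithmetic and can be checked directly. A short sketch; addressableBytes is a made-up helper, and BigInt is used because 2^64 is larger than a JS Number can represent exactly.

```javascript
// Number of distinct byte addresses a pointer of the given width can express.
function addressableBytes(pointerBytes) {
    return 2n ** BigInt(pointerBytes * 8)
}

const GiB = 2n ** 30n
const limit32 = addressableBytes(4) / GiB   // 4n, i.e. 4 GB
const limit64 = addressableBytes(8) / GiB   // 17179869184n GB in theory
console.log(limit32, limit64)
```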

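difference between MB and Mb

A byte is written with an uppercase B. A bit is written with a lowercase b.

Have you ever been fooled by your broadband speed? With 100M broadband you expect to download at 100 MB/s, but the speed is actually 100 Mb/s, so downloads top out at 12.5 MB/s, 8 times less. Surprised?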
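
The conversion is a division by 8. A one-function sketch, with megabitsToMegabytes as an illustrative name:

```javascript
// 1 byte = 8 bits, so Mb/s divided by 8 gives MB/s.
function megabitsToMegabytes(megabitsPerSecond) {
    return megabitsPerSecond / 8
}

const advertised = 100                                      // "100M" broadband, in Mb/s
const bestCaseDownload = megabitsToMegabytes(advertised)    // 12.5 MB/s
console.log(bestCaseDownload)
```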

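physical memory structure

The hardware memory architecture does not distinguish between stack and heap. As far as the hardware is concerned, all stacks and heaps live in main memory. Parts of the stack and heap may at times be in the CPU caches and the CPU's internal registers. The stack and the heap are abstract concepts: a memory model created dynamically by a virtual machine (such as the JVM or the V8 engine) following rules laid down by the operating system.

In reality, memory is a contiguous run of "byte" cells, and the size of a piece of data is determined by its start address and end address. Memory that has never been allocated, or that has been reclaimed, forms blank regions. Some blank regions are too small to hold large data, so a suitable blank region has to be found for each piece of data.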

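memory security vulnerabilities

Single-player games can often be cheated with "memory editor" software, for example by repeatedly searching for the money value as it changes until the memory address holding it is found, and then modifying it.

So memory is not completely sealed off: given sufficient privileges, one program can read and modify memory that belongs to another.

For example, searching memory for the value "123" might turn up dozens or hundreds of pointers, such as 0x00400000, 0x037F0000, 0x052E0000, and 0x07061E68. These are not physical memory addresses but virtual addresses of the program's parameters (that is, its variables).
Suppose 0x052E0000 is confirmed to be the pointer that holds the money. Its entry in the pointer page table has the variable name money and a value that is the memory pointer for 123. When we change it to 9999, the operating system's memory read/write mechanism finds or creates the memory address for 9999 and then updates the 0x052E0000 pointer variable in the pointer page table.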

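diagram of the memory data area

Reading or modifying a variable is the process by which the CPU, memory, and other hardware address and modify memory according to the "execution context".

How I used to misunderstand the way memory is stored:

I used to think memory data was like an HTTP request, with a request header and a request body: the first half of each piece of memory data holding the pointer and the second half holding the actual value.

The correct picture of how physical memory stores data is as follows: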

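1. The pointer 0001H is the same as 0x0001; both are hexadecimal notation.

Hexadecimal (abbreviated H) uses the digits 0-9 and the letters a-f (or uppercase A-F) to represent 0 to 15, carrying to the next digit at 16.

2. Memory itself has no pointers. Pointers are virtual: they are the operating system's way of numbering the bytes of memory. Saving those numbers in a lookup table produces what we call pointers, that is, memory addresses.

Memory is just byte cells, group after group of switches, and the value each group represents can keep changing. A memory address is the number the operating system gives to each byte.

The value of each byte in memory can change, but its pointer, that is, its address number, does not.

Because a piece of data may need several bytes, we speak of a start address and an end address. The end address minus the start address, plus 1, is the number of bytes, which is the size (or length) of the data.

For example, if a piece of memory data has start address 3001H and end address 7000H, how much storage does it represent? 7000H - 3001H + 1 = 4000H (hexadecimal) = 16384 (decimal) = 16K bytes, so the size of the data is 16 KB.

3. If 4 bytes of memory are assigned an 8-byte value, there turns out not to be enough room when the data is written, and a "memory overflow" exception is thrown.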
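
The address arithmetic from point 2 can be checked with parseInt. A small sketch; regionSize is a made-up helper that accepts addresses written in the "H" notation used above.

```javascript
// Size in bytes of a region given inclusive start and end addresses such as '3001H'.
function regionSize(startH, endH) {
    const start = parseInt(startH.replace(/H$/i, ''), 16)
    const end = parseInt(endH.replace(/H$/i, ''), 16)
    return end - start + 1   // both ends are inclusive, hence the + 1
}

const regionBytes = regionSize('3001H', '7000H')         // 16384
const regionHex = regionBytes.toString(16).toUpperCase() + 'H'   // '4000H'
const regionKB = regionBytes / 1024                      // 16
console.log(regionBytes, regionHex, regionKB)
```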

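memory overflow

Memory overflow means that when a program requests memory, there is not enough memory available for it to use, and an out of memory error occurs.

For example, memory may simply be used up, or a program may request space for an int and then store a value that needs the space of a double, which overflows the space that was allocated.

What causes stack memory to overflow?

(1) Too many stack frames. For example, a method calls itself recursively without a correct termination condition. Every call allocates another stack frame, until stack memory overflows.

(2) Stack frames that are too large. As mentioned above, this is like using the space of an int to store the value of a double.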
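
Case (1) is easy to reproduce in JS. The sketch below assumes an engine that reports stack exhaustion as a catchable error; in V8 (Node.js, Chrome) it is a RangeError, and other engines may use a different error type. recurseForever is a name made up for the example.

```javascript
// Recursion with no termination condition: every call adds a stack frame.
function recurseForever(depth) {
    return recurseForever(depth + 1) + 1   // the + 1 keeps each frame alive until the inner call returns
}

let overflowError = null
try {
    recurseForever(0)
} catch (err) {
    overflowError = err    // the engine gives up once the stack limit is reached
}
console.log(overflowError.name)
```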

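memory leak

A memory leak means that after a program has requested memory, it is unable to release it. A single leak does negligible harm, but leaks accumulate with serious consequences: however much memory there is, sooner or later it will all be used up.

For example, the address in stack memory that points to heap memory is lost, so the heap memory cannot be reclaimed in time.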

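memory characteristics

1. Characteristics of stack memory

As soon as execution finishes, the variables are released immediately, saving memory space.

Advantage: access is faster than the heap, second only to the registers located directly in the CPU.

Disadvantage: the size and lifetime of data stored on the stack must be known in advance, so it lacks flexibility.

2. Characteristics of heap memory

Every entity in heap memory has a memory address, and memory is released by the garbage collector, which runs at unspecified times.

Advantage of the heap: memory can be allocated dynamically.

Disadvantage: because memory is allocated dynamically at run time, access is slower.

3. Data sharing in stack memory:

The stack has one very important special property: data stored on the stack can be shared. Suppose we define both: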

int a = 3;

int b = 3;

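The compiler first processes int a = 3;. It creates a reference for the variable a on the stack, then checks whether the value 3 is already on the stack. If not, it stores 3 and points a at it. It then processes int b = 3;. After creating the reference variable b, because the value 3 is already on the stack, it points b directly at 3. So a and b both point to 3.

Now, if we set a = 4;, the compiler searches the stack again for the value 4. If it is not there, it stores 4 and points a at it; if it is already there, it points a directly at that address. Changing the value of a therefore does not affect the value of b.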
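
The lookup described above can be modelled in a few lines. This is a toy model of the description only; the real layout is up to the compiler or engine, and valueSlots, bindings, assign and read are names invented for the sketch.

```javascript
const valueSlots = []         // stands in for the values stored on the stack
const bindings = new Map()    // variable name -> index of the slot it points to

function assign(variableName, value) {
    let slot = valueSlots.indexOf(value)     // is the value already stored?
    if (slot === -1) {
        slot = valueSlots.push(value) - 1    // no: store it in a new slot
    }
    bindings.set(variableName, slot)         // point the name at the slot
}
function read(variableName) { return valueSlots[bindings.get(variableName)] }

assign('a', 3)
assign('b', 3)
const sharedAtFirst = bindings.get('a') === bindings.get('b')   // both point at the one 3
assign('a', 4)                // a is re-pointed; the slot holding 3 is untouched
console.log(sharedAtFirst, read('a'), read('b'))   // true 4 3
```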

Origin blog.csdn.net/qq_21379779/article/details/128695080