In-depth analysis of data storage in memory

Classification of data types

To understand how data is stored in memory, we must first understand data types.
There are many data types, and different types are stored in different ways.

What a data type means:

  1. It determines how much memory is allocated for the object (the size determines the range of values it can represent).
  2. It determines how the bits in that memory space are interpreted.

Basic classification of data


 - Integer types
    - char
       - unsigned char
       - signed char
    - short
       - unsigned short [int]
       - signed short [int]
    - int
       - unsigned int
       - signed int
    - long
       - unsigned long [int]
       - signed long [int]
 - Floating-point types
   - float
   - double
 - Constructed (aggregate) types
   - Array types
   - Structure types: struct
   - Enumeration types: enum
   - Union types: union
 - Pointer types
   - int *pi;
   - char *pc;
   - float* pf;
   - void* pv;
 - Void type (void)

Different types have different storage methods.

Integer storage in memory

The amount of memory an integer occupies is determined by its type.

How exactly is it stored in memory?
Let's take char as an example (apart from the size of the allocated space, all integer types are stored in the same way).

Further reading: how are negative numbers represented in computers? https://blog.csdn.net/qq_40893595/article/details/104439660
Integer promotion: https://blog.csdn.net/qq_40893595/article/details/104441826

Sign-magnitude, one's complement, and two's complement

Computers store data in binary form, and there are three ways to represent a signed number: sign-magnitude (原码), one's complement (反码), and two's complement (补码).

All three representations consist of a sign bit and value bits. The sign bit uses 0 for "positive" and 1 for "negative"; the three representations differ in their value bits.

Sign-magnitude
The sign bit followed by the absolute value of the number written directly in binary.
One's complement
Keep the sign bit of the sign-magnitude form unchanged and invert the remaining bits one by one.
Two's complement
One's complement plus 1.

For positive numbers, all three representations are identical.

For integers: what memory actually stores is the two's complement.

Why?
In computer systems, values are always represented and stored as two's complement. The reason is that with two's complement, the sign bit and the value bits can be processed uniformly; likewise, addition and subtraction can be handled uniformly (the CPU needs only an adder). Moreover, converting between two's complement and sign-magnitude uses the same operation in both directions, so no additional hardware circuitry is required.

Unsigned types are also stored directly this way (for them, sign-magnitude, one's complement, and two's complement are all identical). The difference is that for signed negative numbers, the stored two's complement must be converted back to sign-magnitude before the value is displayed.

Big-endian and little-endian

https://blog.csdn.net/qq_40893595/article/details/104660426
What are big-endian and little-endian?

Big-endian (storage) mode means that the low-order bytes of a value are stored at high memory addresses and the high-order bytes at low memory addresses; little-endian (storage) mode means that the low-order bytes are stored at low memory addresses and the high-order bytes at high memory addresses.

Why are there big-endian and little-endian modes?
In computer systems, memory is addressed in units of bytes: each address unit corresponds to one byte, and a byte is 8 bits. However, C has types wider than 8 bits, such as the 16-bit short and the 32-bit long (the exact width depends on the compiler). Furthermore, for processors wider than 8 bits, such as 16-bit or 32-bit processors, the register width is larger than one byte, so there must be some way of ordering the multiple bytes. This is what gives rise to the big-endian and little-endian storage modes.
For example, take a 16-bit short x at memory address 0x0010 with value 0x1122; 0x11 is the high byte and 0x22 is the low byte. In big-endian mode, 0x11 is placed at the low address 0x0010 and 0x22 at the high address 0x0011; little-endian mode is exactly the opposite. The commonly used x86 architecture is little-endian, while KEIL C51 is big-endian. Many ARM and DSP chips are little-endian, and some ARM processors can also be configured in hardware as either big-endian or little-endian.

(Figure: byte layout on a little-endian machine)

We can also use the code to determine whether it is big-endian or little-endian:

Baidu 2015 System Engineer Written Test Questions:
Please briefly describe the concepts of big-endian and little-endian, and design a small program to determine the endian of the current machine. (10 points)

// Code 1
#include <stdio.h>

int check_sys()
{
    int i = 1;
    return (*(char *)&i);  // read the byte at i's lowest address
}

int main()
{
    int ret = check_sys();
    if (ret == 1)
    {
        printf("little-endian\n");
    }
    else
    {
        printf("big-endian\n");
    }
    return 0;
}

// Code 2
int check_sys()
{
    union
    {
        int i;
        char c;
    } un;
    un.i = 1;
    return un.c;  // c overlaps the lowest-addressed byte of i
}

Storage of floating-point numbers in memory

int main()
{
    int n = 9;
    float *pFloat = (float *)&n;
    printf("value of n: %d\n", n);
    printf("value of *pFloat: %f\n", *pFloat);
    *pFloat = 9.0;
    printf("value of n: %d\n", n);
    printf("value of *pFloat: %f\n", *pFloat);
    return 0;
}

Output:

value of n: 9
value of *pFloat: 0.000000
value of n: 1091567616
value of *pFloat: 9.000000
From this code we can see that although int and float are both 32 bits, they are stored very differently.

So how should floating-point numbers with decimal points be stored?

The international standard: IEEE (Institute of Electrical and Electronics Engineers) 754

Recall standard scientific notation, which we learned in school for writing a long decimal or a very large number compactly.

Standard scientific notation requirements:

  1. The number is written in the form x * 10^y
  2. x lies in the range [1, 10)

So we can record floating-point numbers the same way.
Just as a decimal number is scaled by a factor of 10^y, a binary number can be scaled by a factor of 2^y.

According to the international standard IEEE (Institute of Electrical and Electronic Engineering) 754, any binary floating point number V can be expressed in the following form:

  • (-1)^S * M * 2^E
  • (-1)^S is the sign bit: when S=0, V is positive; when S=1, V is negative.
  • M is the significand, with 1 ≤ M < 2.
  • 2^E is the exponent factor.

For example:
Decimal 5.0 is 101.0 in binary, which is 1.01 × 2^2. In the form of V above, S=0, M=1.01, and E=2.
Decimal -5.0 is -101.0 in binary, which is -1.01 × 2^2, so S=1, M=1.01, and E=2.

IEEE 754 stipulates that for a 32-bit floating-point number, the highest bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M.
For a 64-bit floating-point number, the highest bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M.
We can see how S and E are stored, but M still contains a binary point. How is M represented?

From the requirement above, the digit before the point in M is always 1; that is, M can always be written as 1.xxxxxx, where xxxxxx is the fractional part.
So we only need to store the digits after the point, and add the leading "1." back when the value is read out.

As for the exponent E, the situation is more complicated.
First, E is stored as an unsigned integer. This means that if E is 8 bits, its value range is 0~255; if E is 11 bits, its value range is 0~2047.

However, E in scientific notation can be negative, so IEEE 754 stipulates that a bias must be added to the true exponent before it is stored: for an 8-bit E the bias is 127; for an 11-bit E it is 1023. For example, the true E of 2^10 is 10, so in a 32-bit float it must be stored as 10 + 127 = 137, i.e. 10001001.

When the exponent E is read back from memory, there are three cases:

  1. E is neither all 0s nor all 1s

In this case the floating-point number is interpreted by the usual rule: subtract 127 (or 1023) from the stored exponent to get the true exponent, then prepend the implicit leading 1 to the significand M. For example: 0.5 (1/2) in binary is 0.1. Since the part before the point must be 1, the point is shifted right by one place, giving 1.0 × 2^(-1). The stored exponent is therefore -1 + 127 = 126, i.e. 01111110, and the significand 1.0 drops its integer part, leaving 0, padded with zeros to 23 bits: 00000000000000000000000. The full binary representation is:
0 01111110 00000000000000000000000

  2. E is all 0s

In this case, the true exponent of the floating-point number is taken to be 1 - 127 (or 1 - 1023), and the significand M no longer gets the implicit leading 1; instead it is read as the fraction 0.xxxxxx. This is done to represent ±0 and very small numbers close to zero.

  3. E is all 1s

In this case, if the significand M is all 0s, the value is ±infinity (positive or negative depending on the sign bit S); if M is not all 0s, the value is NaN (not a number).

Remember the result of the code at the beginning: why does the integer come out as 1091567616?

Now, let us return to the question at the beginning: why does 0x00000009, interpreted as a floating-point number, become 0.000000? First, split 0x00000009: the sign bit S = 0, the next 8 exponent bits E = 00000000, and the final 23 significand bits M = 000 0000 0000 0000 0000 1001.

9 -> 0000 0000 0000 0000 0000 0000 0000 1001

Since the exponent E is all 0s, this matches the second case above. Therefore the floating-point value is V = (-1)^0 × 0.00000000000000000001001 × 2^(-126) = 1.001 × 2^(-146). Clearly, V is a tiny positive number close to 0, so printed as a decimal with %f it appears as 0.000000.
Now look at the second part of the example: how is the floating-point number 9.0 represented in binary, and what integer does it become? First, 9.0 in binary is 1001.0, which is 1.001 × 2^3.

9.0 -> 1001.0 -> (-1)^0 × 1.001 × 2^3 -> S=0, M=1.001, E=3+127=130

So the sign bit S = 0; the significand M is 001 followed by twenty 0s to fill 23 bits; and the exponent field is 3 + 127 = 130, i.e. 10000010. Written in binary as S + E + M, this is:

0 10000010 001 0000 0000 0000 0000 0000

This 32-bit binary number, read as a decimal integer, is exactly 1091567616.


Origin blog.csdn.net/qq_40893595/article/details/105149835