[C Language] Data Storage in Memory

1. Data type introduction

The basic built-in types in C language are as follows:

char        // character type
short       // short integer
int         // integer
long        // long integer
long long   // even longer integer
float       // single-precision floating point
double      // double-precision floating point

What a type means:

1. It determines how much memory is allocated for an object of that type (the size determines the range of values it can represent).

2. It determines the perspective from which that memory is viewed, i.e. how the stored bits are interpreted.

1.1 Basic classification of types

Integer family:

char
    unsigned char
    signed char
short
    unsigned short [int]
    signed short [int]
int
    unsigned int
    signed int
long
    unsigned long [int]
    signed long [int]

Note: whether plain char is signed or unsigned is not specified by the C standard; it is implementation-defined, so different compilers may treat it differently.

Floating point family:

float
double

Constructed (aggregate) types:

> array types
> structure types: struct
> enumeration types: enum
> union types: union

Pointer type:

int *pi;
char *pc;
float* pf;
void* pv;

Empty type:

void denotes the empty type (no type).

It is usually used as the return type of a function, as a function's parameter list, or in pointer types (void *).

2. Storage of integers in memory

Integers are stored in binary. An integer variable is allocated a fixed amount of memory determined by its type; for example, the int type usually occupies 4 bytes (32 bits).

We know that four bytes are allocated for an int, but how exactly is the value stored in them?

2.1 Sign-magnitude, one's complement, two's complement

In computers, sign-magnitude (原码), one's complement (反码), and two's complement (补码) are different encodings used to represent signed integers.

Sign-magnitude: the simplest representation. The highest (leftmost) bit is the sign (0 for positive, 1 for negative), and the remaining bits hold the absolute value in binary. With 8 bits, the sign-magnitude form of -5 is 10000101, and of +5 is 00000101.

One's complement: for a positive number, identical to the sign-magnitude form; for a negative number, keep the sign bit and invert every other bit (0 becomes 1, 1 becomes 0). With 8 bits, the one's complement of -5 is 11111010, and of +5 is 00000101.

Two's complement: the most common representation of signed integers in computers. For a positive number it equals the sign-magnitude form; for a negative number it is the one's complement plus 1. With 8 bits, the two's complement of -5 is 11111011, and of +5 is 00000101.

Summary:

For positive numbers, the sign-magnitude, one's complement, and two's complement forms are identical.

Negative integers have three distinct representations:

Sign-magnitude: write the absolute value in binary and set the sign bit.

One's complement: keep the sign bit of the sign-magnitude form and invert the remaining bits.

Two's complement: the one's complement plus 1.

For integer types, what is actually stored in memory is the two's complement.

Why?

Why use two's complement for signed integers? Its key property is that addition and subtraction can be carried out by the same hardware circuit (the CPU needs only an adder), which simplifies processor design. Two's complement also avoids the problems that sign-magnitude and one's complement have with addition, such as the extra end-around carry and the two representations of zero.

To illustrate, take 8-bit two's complement and compute +5 (00000101) + (-3, 11111101):

  00000101  (+5)
+ 11111101  (-3)
----------
  00000010  (+2)

Adding the two's-complement patterns gives the correct result, +2. The carry out of the most significant bit is simply discarded, and subtraction can be performed with exactly the same rules as addition.

Let's look at the storage in memory:

[Figure: debugger memory view showing the stored bytes of a and b]

We can see that what is stored for a and b are the hexadecimal two's-complement values, but the bytes appear in an unexpected order.

Why? The answer is endianness.

2.2 Big-endian and little-endian

In computer architecture, endianness is the order in which the bytes of a multi-byte value (e.g. an integer or a floating-point number) are stored in memory; specifically, whether the least significant byte or the most significant byte is placed first.

Big-endian: In big-endian storage, the most significant byte (high byte) of multibyte data is stored at the low address and the least significant byte (low byte) is stored at the high address.

Little-endian: In little-endian storage, the least significant byte (low byte) of a multi-byte value is stored at the lower address and the most significant byte (high byte) at the higher address, exactly the opposite of big-endian.

Why do big-endian and little-endian both exist?

Historical background: The development of computer architecture began in the 1950s and 1960s, when different computer manufacturers developed their own computer systems. These computer systems differ in many ways in their hardware design, including how data is stored in memory. Initially, there was no unified standard to define the storage order of multi-byte data, so different byte order methods appeared.

Hardware design differences: The big/little distinction comes from how multi-byte data is laid out in memory. Memory is addressed in bytes, while values such as integers and floating-point numbers span several bytes; the question is simply in what order those bytes should be stored.

Big-endian: Some early computers used big-endian, storing the most significant byte of multibyte data at a lower address and the least significant byte at a higher address. A computer designed in this way can operate in the same order as people read numbers when processing multi-byte data, which is more intuitive. For example, the 16-bit integer 0x1234 is stored as 12 34 in memory.

Little-endian: With the development of computer technology, some computer manufacturers have adopted the little-endian method, which stores the least significant byte of multi-byte data at a low address and the most significant byte at a high address. Little-endian is the opposite of big-endian, but equally reasonable. When dealing with multi-byte data, the little-endian method can directly use the low address to represent the low-order part of the data, which is helpful for the realization of some specific operations. For example, the 16-bit integer 0x1234 is stored as 34 12 in memory.

Visual Studio (on x86) uses little-endian storage:

[Figure: Visual Studio memory window showing the bytes stored in little-endian order]

A question from Baidu's 2015 systems-engineer written test:

Briefly describe the concepts of big-endian and little-endian, and write a small program that determines the byte order of the current machine. (10 points)

// Solution 1
#include <stdio.h>

int check_sys(void) {
    int i = 1;
    // Read the first byte of i: 1 on a little-endian machine, 0 on big-endian.
    return *(char *)&i;
}

int main(void) {
    int ret = check_sys();
    if (ret == 1) {
        printf("little-endian\n");
    } else {
        printf("big-endian\n");
    }
    return 0;
}

// Solution 2: the members of a union share the same starting address.
int check_sys(void) {
    union {
        int i;
        char c;
    } un;
    un.i = 1;
    return un.c;   // first byte of i: 1 => little-endian, 0 => big-endian
}

2.3 Exercises

// Exercise 1
#include <stdio.h>

int main() {
    char a = -1;   // -1 is truncated to one byte and stored in a
    // 10000000000000000000000000000001 - sign-magnitude of -1
    // 11111111111111111111111111111110 - one's complement
    // 11111111111111111111111111111111 - two's complement
    // 11111111 - truncated to the low byte
    signed char b = -1;
    // 11111111111111111111111111111111 - two's complement
    // 11111111 - after truncation; decoded back, still -1
    unsigned char c = -1;
    // the same pattern 11111111 read as unsigned is 255,
    // the largest value 8 bits can hold
    printf("a=%d,b=%d,c=%d", a, b, c);   // a=-1,b=-1,c=255
    // Since c is printed with %d, integer promotion happens first:
    // 11111111 -> 00000000000000000000000011111111
    // signed types are promoted by extending the sign bit;
    // unsigned types are promoted by padding the high bits with 0
    return 0;
}
// Exercise 2
// %u prints an unsigned integer: the two's-complement pattern in memory
// is reinterpreted as an unsigned value.
#include <stdio.h>

int main() {
    char a = -128;
    // 10000000000000000000000010000000 - sign-magnitude
    // 11111111111111111111111101111111 - one's complement
    // 11111111111111111111111110000000 - two's complement
    // 10000000 - truncated
    // 11111111111111111111111110000000 - promoted: sign bit extended
    printf("%u\n", a);   // 4294967168
    char b = 128;
    // 00000000000000000000000010000000 - positive: all three forms agree
    // 10000000 - truncated (now the sign bit of the char is 1)
    // 11111111111111111111111110000000 - promoted: sign bit extended
    printf("%u\n", b);   // 4294967168
    return 0;
}
// Exercise 3
#include <stdio.h>

int main() {
    int i = -20;
    // 10000000000000000000000000010100 - sign-magnitude
    // 11111111111111111111111111101011 - one's complement
    // 11111111111111111111111111101100 - two's complement
    unsigned int j = 10;
    // 00000000000000000000000000001010
    printf("%d\n", i + j);   // -10
    // 11111111111111111111111111101100 - two's complement of i
    // 00000000000000000000000000001010 - two's complement of j
    // 11111111111111111111111111110110 - sum of the two patterns
    // to decode: invert -> 00000000000000000000000000001001, add 1 -> 10;
    // the sign bit of the sum is 1, so the value is -10
    return 0;
}
// Exercise 4
#include <stdio.h>

int main() {
    unsigned int i;   // unsigned: can never be negative
    for (i = 9; i >= 0; i--)   // an unsigned value is always >= 0,
    {                          // so the condition never fails
        printf("%u\n", i);     // infinite loop
    }
    return 0;
}
// Exercise 5
#include <stdio.h>
#include <string.h>

int main() {
    char a[1000];   // char range: -128 to 127
    // stored values: -1 -2 ... -128 127 126 ... 1 0 ...
    // a wraparound cycle with 255 nonzero values before the first 0
    int i;
    for (i = 0; i < 1000; i++)
    {
        a[i] = -1 - i;
    }
    printf("%zu", strlen(a));   // 255
    return 0;
}
// Exercise 6
#include <stdio.h>

unsigned char i = 0;   // range: 0 to 255
int main() {
    for (i = 0; i <= 255; i++)   // after i reaches 255, i++ wraps to 0;
    {                            // an unsigned char can never exceed 255,
                                 // so the condition is always true
        printf("hello world\n"); // infinite loop
    }
    return 0;
}

3. Storage of floating point numbers in memory

Common floating-point literals: 3.14159, 1E10

The floating-point family includes the float, double, and long double types.

The ranges of the floating-point types are defined in float.h.

The ranges of the integer types are defined in limits.h.

Example of floating point storage:

#include <stdio.h>
int main() {
    int n = 9;
    float *pFloat = (float *)&n;
    printf("n = %d\n", n);
    printf("*pFloat = %f\n", *pFloat);
    *pFloat = 9.0;
    printf("n = %d\n", n);
    printf("*pFloat = %f\n", *pFloat);
    return 0;
}

Output result:

n = 9
*pFloat = 0.000000
n = 1091567616
*pFloat = 9.000000

Let's explain each line of output:

n = 9: this is correct; n holds the integer 9, and it is printed as one.

*pFloat = 0.000000: here is the surprise. Casting &n to float * makes the program reinterpret n's memory (an integer bit pattern) as a float. Because integers and floating-point numbers use different internal representations, the bit pattern of 9 decodes as a float extremely close to 0, so 0.000000 is printed.

n = 1091567616: the statement *pFloat = 9.0; wrote the bit pattern of the floating-point value 9.0 into n's memory. Read back as an integer, that pattern is 1091567616.

*pFloat = 9.000000: the memory now holds the float encoding of 9.0, so reading it through pFloat prints 9.000000.

3.1 Floating-point number storage rules

According to the international standard IEEE 754 (from the Institute of Electrical and Electronics Engineers), any binary floating-point number V can be expressed in the following form:

(-1)^S × M × 2^E

(-1)^S is the sign: when S = 0, V is positive; when S = 1, V is negative.

M is the significand, with 1 ≤ M < 2 (for normalized numbers).

2^E is the exponent factor.

For example:

5.0 in decimal is 101.0 in binary, which is equivalent to 1.01×2^2.

Then, according to the format of V above, it can be concluded that S=0, M=1.01, and E=2.

Decimal -5.0, written in binary is -101.0, which is equivalent to -1.01×2^2. Then, S=1, M=1.01, E=2.

IEEE 754 states:

For 32-bit floating-point numbers, the highest 1 bit is the sign bit s, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M.

[Figure: 32-bit float layout: 1 sign bit, 8 exponent bits, 23 significand bits]

For a 64-bit floating-point number, the highest bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M.
[Figure: 64-bit double layout: 1 sign bit, 11 exponent bits, 52 significand bits]

IEEE 754 places some additional rules on the significand M and the exponent E.

As mentioned earlier, 1 ≤ M < 2, so M can always be written in the form 1.xxxxxx, where xxxxxx is the fractional part.

IEEE 754 stipulates that when M is stored, the leading 1 is implicit: it is discarded and only the fractional xxxxxx part is saved. For example, when storing 1.01, only 01 is saved, and the leading 1 is added back when the value is read. The point is to gain one bit of precision: a 32-bit float has only 23 bits left for M, but with the implicit leading 1 it effectively stores 24 significant bits.

The exponent E is more involved.

E is stored as an unsigned integer. If E has 8 bits, its stored range is 0 to 255; with 11 bits, 0 to 2047. But exponents in scientific notation can be negative, so IEEE 754 adds a bias to the true exponent before storing it: 127 for an 8-bit E, and 1023 for an 11-bit E. For example, the true exponent of 2^10 is 10, so in a 32-bit float it is stored as 10 + 127 = 137, i.e. 10001001.

When E is read back from memory, there are three cases:

E is neither all 0s nor all 1s

The value is decoded by the normal rule: subtract 127 (or 1023) from the stored E to get the true exponent, and prepend the implicit leading 1 to the significand M.

For example: 0.5 (1/2) in binary is 0.1. Normalizing so that the integer part is 1 shifts the point right by one place: 1.0 × 2^(-1). The stored exponent is therefore -1 + 127 = 126, i.e. 01111110; the significand 1.0 drops its leading 1, leaving 0, padded with 0s to 23 bits (00000000000000000000000). So 0.5 is stored as:

0 01111110 00000000000000000000000

E is all 0s

The true exponent is fixed at 1 - 127 (or 1 - 1023), and the significand M no longer gets the implicit leading 1; instead it is a denormal of the form 0.xxxxxx. This representation covers ±0 and numbers very close to 0.

E is all 1s

If the significand M is all 0s, the value is ±infinity (positive or negative according to the sign bit S). If M is nonzero, the value is NaN ("not a number").

Back to the earlier question:

Why does 0x00000009 decode to 0.000000 when read as a floating-point number?

Splitting 0x00000009: the sign bit S = 0, the next 8 bits give the exponent E = 00000000, and the last 23 bits give the significand M = 000 0000 0000 0000 0000 1001.

9 -> 0000 0000 0000 0000 0000 0000 0000 1001

Since the exponent E is all 0s, this is the denormal case described above. Therefore:

V = (-1)^0 × 0.00000000000000000001001 × 2^(-126) = 1.001 × 2^(-146)

V is a tiny positive number close to 0, which prints as 0.000000 in decimal.

Now the second part of the example: how is the floating-point number 9.0 represented in binary, and what integer does that bit pattern decode to?

First, 9.0 in binary is 1001.0, i.e. 1.001 × 2^3.

9.0 -> 1001.0 -> (-1)^0 × 1.001 × 2^3 -> S = 0, M = 1.001, E = 3 + 127 = 130

So the sign bit S = 0; the stored significand is 001 followed by twenty 0s, making up 23 bits; and the stored exponent is 130, i.e. 10000010.

Written out as S + E + M:

0 10000010 001 0000 0000 0000 0000 0000

Read back as a 32-bit integer, this pattern is exactly 1091567616.


Origin blog.csdn.net/ikun66666/article/details/131861729