Data storage in the computer - [C language]

In the previous post we covered the data types of the C language. Let us review them first.


Table of contents

Basic built-in types of the C language

Basic Classification of Types

Integer storage in memory

Original code, inverse code, complement code

Endianness in storage

Practice

Storage of floating point types in memory

Storage rules for floating point numbers

In-Depth Analysis of the Introductory Example


Basic built-in types of the C language

char // character data type

short // short integer data type

int // integer

long // long integer

long long // long long integer

float // single-precision floating-point type

double // double-precision floating-point type

#include <stdio.h>

int main(void)
{
	// sizeof yields a size_t, so the matching conversion specifier is %zu
	printf("%zu\n", sizeof(char));
	printf("%zu\n", sizeof(short));
	printf("%zu\n", sizeof(int));
	printf("%zu\n", sizeof(long));
	printf("%zu\n", sizeof(long long));
	printf("%zu\n", sizeof(float));
	printf("%zu\n", sizeof(double));

	return 0;
}

The maximum and minimum values of the built-in types are available as macros in the headers <limits.h> (integer types) and <float.h> (floating-point types).

We are now familiar with the basic built-in data types and their sizes. Next, we will roughly classify the data types of the C language.


Basic Classification of Types

 Integer family:

char

        signed char

        unsigned char

short

        signed short [int]

        unsigned short [int]

int

        signed int

        unsigned int

long

        signed long [int]

        unsigned long [int]

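Let's look at char in detail. C has three character types: char, signed char, and unsigned char. A char occupies one byte, which is 8 bits on practically every platform. In a signed char the highest bit is the sign bit; in an unsigned char all 8 bits are value bits. The two types therefore cover different ranges: signed char holds -128 to 127, and unsigned char holds 0 to 255. Whether plain char behaves like signed char or unsigned char is left to the implementation.

When a value goes past the end of the range, it wraps around to the other end and the cycle continues: for an unsigned char, 255 + 1 gives 0. For unsigned types this wraparound is guaranteed by the standard. Storing an out-of-range value in a signed type gives an implementation-defined result, which on two's complement machines is usually the same kind of wraparound (127 + 1 stored in a signed char typically becomes -128).

The same reasoning applies to short, int, and long, only with more bits.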

Floating point family:

 float

double

Pointer family:

int *p;

char *p;

float *p;

void *p;

......

Constructed types:

Array type: int arr[10]

Structure type: struct

Union type: union

Enumeration type: enum

Empty type: void means the empty type (no type).

It is typically used for a function's return type, a function's parameter list, and pointer types (void *).

The above is our classification and arrangement of the data types in the entire C language. Next, let's learn a little underlying stuff to deepen our knowledge and understanding of data!


Integer storage in memory

As mentioned in the previous post, creating a variable allocates a block of memory, and the size of that block depends on the variable's type.

So how is an integer variable stored in the computer?

When we write int a = 10, the computer sets aside 4 bytes (the usual size of int) and stores the value there in binary, that is, as a sequence of 32 bits, each 0 or 1. The memory window in VS displays these bytes in hexadecimal.

Original code, inverse code, complement code

Computers have three binary representations for signed integers: original code (sign-magnitude), inverse code (ones' complement), and complement code (two's complement). Each has a sign bit and value bits; the sign bit is 0 for "positive" and 1 for "negative". For a positive integer the three representations are identical. For a negative integer they differ as follows.

Original code: translate the value directly into binary, with the sign bit set according to the sign.

Inverse code: keep the sign bit of the original code unchanged and invert all the other bits.

Complement code: add 1 to the inverse code.

Integers are stored in memory in complement code (two's complement). The reason is that with the complement code the sign bit and the value bits can be processed uniformly, and addition and subtraction can be handled by the same circuit (the CPU only has an adder). In addition, converting from complement code back to original code uses the same operation as the forward conversion (invert the value bits and add 1), so no additional hardware circuitry is required.

Let's take a look at the original code, inverse code, and complement code of 10 and -10

#include <stdio.h>
int main(void)
{
    int a = 10;
    // 00000000000000000000000000001010 - original, inverse, and complement code (identical for a positive number)
    int b = -10;
    // 10000000000000000000000000001010 - original code
    // 11111111111111111111111111110101 - inverse code
    // 11111111111111111111111111110110 - complement code
    return 0;
}

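If we compare these bit patterns with what is actually stored in memory (for example, in the debugger's memory window), the bytes appear in the reverse order: on an x86 machine, 10 shows up as 0a 00 00 00 rather than 00 00 00 0a. Why is that?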

 


Endianness in storage

What are big-endian and little-endian?

Big-endian (storage) mode means that the low-order bytes of a value are stored at the high addresses of memory, and the high-order bytes are stored at the low addresses.

Little-endian (storage) mode means that the low-order bytes of a value are stored at the low addresses of memory, and the high-order bytes are stored at the high addresses.

Why do big-endian and little-endian modes exist?

Memory is addressed in bytes: each address corresponds to one byte, and a byte is 8 bits. But besides the 8-bit char, C also has the 16-bit short and the 32-bit long (the exact widths depend on the compiler). Likewise, on processors wider than 8 bits, such as 16-bit or 32-bit processors, a register is wider than one byte, so there has to be a rule for how the bytes of a multi-byte value are arranged in memory. That rule is what distinguishes big-endian storage from little-endian storage.

For example, take a 16-bit short x stored at address 0x0010 with the value 0x1122. Here 0x11 is the high byte and 0x22 is the low byte. In big-endian mode, 0x11 goes at the low address 0x0010 and 0x22 at the high address 0x0011. Little-endian mode is just the opposite.

The common x86 architecture is little-endian, while KEIL C51 is big-endian. Many ARM processors and DSPs are little-endian, and some ARM processors let the hardware select big-endian or little-endian mode.

How can we write a program to determine whether our machine is big-endian or little-endian?

We can exploit the fact that the two modes store bytes in opposite orders. Store the value 1 in an int, then read the byte at its lowest address through a pointer. If that byte is 1, the machine is little-endian; otherwise it is big-endian. (The variable is an int but we only want one byte, so we cast its address to char *.) With the theory in place, let's put it into practice!

#include <stdio.h>

// Returns the byte at the lowest address of a: 1 on little-endian, 0 on big-endian
int check_sys(int a)
{
	return (*(char*)&a);
}

int main(void)
{
	int i = 1;
	if (check_sys(i))
	{
		printf("little-endian\n");
	}
	else
	{
		printf("big-endian\n");
	}
	return 0;
}

Practice

That covers the basics of integer storage. Let's consolidate it with a couple of exercises:

#include <stdio.h>
unsigned char i = 0;
int main()
{
    for(i = 0;i<=255;i++)
   {
        printf("hello world\n");
   }
    return 0;
}

Let's analyze this one. An unsigned char can represent 0 to 255. The loop exits only when i exceeds 255, but 255 is the largest value i can hold: adding 1 wraps i back to 0. The condition i <= 255 is therefore always true, and the program loops forever.

#include <stdio.h>
int main()
{
    char a= -1;
    signed char b=-1;
    unsigned char c=-1;
    printf("a=%d,b=%d,c=%d",a,b,c);
    return 0;
}
// original, inverse, and complement code of -1
// 10000000000000000000000000000001 - original code
// 11111111111111111111111111111110 - inverse code
// 11111111111111111111111111111111 - complement code

This exercise is about the ranges of char, signed char, and unsigned char. Whether plain char is signed is implementation-defined, but with most compilers (for example, on x86) char behaves like signed char, so a and b print the same value, -1. For unsigned char c, the complement code of -1 is truncated to its low 8 bits, 11111111. Because the format is %d, c is promoted to int; since it is unsigned, the high bits are filled with 0, giving 00000000000000000000000011111111, which is 255. The final output is therefore a=-1,b=-1,c=255.

That wraps up the storage of integers. So how are floating-point types stored?


 Storage of floating point types in memory

Floating-point numbers are also frequent visitors in C, and there are plenty of them around us: 3.14, 13.14... As mentioned above, the floating-point family includes float and double.

The ranges of the floating-point types are defined in the header <float.h>.

We use an example to introduce the storage of floating point types in memory:

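#include <stdio.h>
int main(void)
{
	int n = 9;
	// View the bytes of n as a float (for illustration only; this cast breaks C's aliasing rules)
	float* pFloat = (float*)&n;
	printf("n = %d\n", n);
	printf("*pFloat = %f\n", *pFloat);

	*pFloat = 9.0;
	printf("n = %d\n", n);
	printf("*pFloat = %f\n", *pFloat);

	return 0;
}

On a typical platform (32-bit int, IEEE 754 float) the output is:

n = 9
*pFloat = 0.000000
n = 1091567616
*pFloat = 9.000000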

Reading an integer through a float pointer gives a value very different from the integer, and after storing a float through that pointer, reading the same memory as an integer is just as surprising. The bytes in memory are the same in each case, so the rules for storing and reading floating-point numbers must differ from those for integers. So how are floating-point numbers stored in memory?


 Storage rules for floating point numbers

According to the international standard IEEE (Institute of Electrical and Electronics Engineers) 754, any binary floating-point number V can be expressed in the following form:

(-1)^S * M * 2^E

(-1)^S is the sign: when S=0, V is positive; when S=1, V is negative.

M is the significand, greater than or equal to 1 and less than 2.

2^E is the exponent part.

For example:

5.0 in decimal is 101.0 in binary, which is equivalent to 1.01×2^2. Then, according to the format of V above, it can be concluded that S=0, M=1.01, and E=2. Decimal -5.0, written in binary is -101.0, which is equivalent to -1.01×2^2. Then, S=1, M=1.01, E=2.

IEEE 754 specifies that for a 32-bit floating-point number, the highest bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M.

For a 64-bit floating-point number, the highest bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M.

IEEE 754 also has some special rules for the significand M and the exponent E.

As mentioned earlier, 1≤M<2, so M can be written as 1.xxxxxx, where xxxxxx is the fractional part. Since the first digit is always 1, IEEE 754 specifies that it is not stored: only the xxxxxx part is saved. For example, for 1.01 only 01 is saved, and the leading 1 is added back when the number is read. This gains one bit of precision. For a 32-bit floating-point number, M has 23 bits, and with the implicit leading 1 that amounts to 24 significant bits.

The exponent E is more complicated.

First, E is stored as an unsigned integer: with 8 bits its range is 0~255, and with 11 bits it is 0~2047. But the exponent in scientific notation can be negative, so IEEE 754 specifies that a bias must be added to the real value of E before it is stored. The bias is 127 for an 8-bit E and 1023 for an 11-bit E. For example, the E of 2^10 is 10, so in a 32-bit floating-point number it is stored as 10+127=137, which is 10001001.

When E is read back from memory, there are three cases:

E is neither all 0s nor all 1s:

In this case, subtract 127 (or 1023) from the stored value of E to get the real exponent, and put the leading 1 back in front of the significand M.

For example: the binary form of 0.25 (1/4) is 0.01. The integer part must be 1, so the binary point moves right by 2 places, giving 1.0×2^(-2). The stored exponent is -2+127=125, which is 01111101. The significand 1.0 with its integer part removed is 0, padded to 23 bits as 00000000000000000000000. The binary representation (sign, exponent, significand) is therefore: 0 01111101 00000000000000000000000

E is all 0s:

In this case, the real exponent is 1-127 (or 1-1023).

The significand M no longer gets the leading 1; it is read as the fraction 0.xxxxxx. This makes it possible to represent ±0 and very small numbers close to 0.

E is all 1s:

In this case, if the significand M is all 0s, the value is ±infinity (the sign is given by the sign bit S); if M is not all 0s, the value is NaN (not a number).


In-Depth Analysis of the Introductory Example

Now let's go back and analyze the previous example:

The binary value of 9 is 00000000000000000000000000001001. Why does it print as 0.000000 when read as a floating-point number? Splitting the 32 bits gives S=0, E=00000000, and M=00000000000000000001001. Since the exponent E is all 0s, this is the second case in the previous section, so the floating-point number V is: V=(-1)^0 × 0.00000000000000000001001×2^(-126)=1.001×2^(-146). Clearly V is a tiny positive number very close to 0, so with six decimal places it prints as 0.000000.

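Now the other direction: why does n print as a large integer after we store 9.0 through the float pointer? 9.0 in binary is 1001.0, that is, 1.001×2^3, so S=0, E=3+127=130 (10000010), and M is 001 followed by 20 zeros to fill 23 bits. The stored pattern (sign, exponent, significand) is therefore: 0 10000010 00100000000000000000000

Read as an integer, this 32-bit pattern is 1091567616 in decimal, which is exactly the result of our program.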

That is everything on how data is stored in the computer. I hope you got something out of this article. Your support is my biggest motivation!!!


Origin blog.csdn.net/m0_74755811/article/details/131613077