Storage of C language data in memory

We covered the data types of C earlier (integer, floating-point, character, and so on), but we never looked at how they are actually stored in memory. And when we inspected memory before, the bytes seemed to be stored backwards. Why is that? In this issue, I will take you through the rules for how data is stored in memory!

Contents of this issue

1. Detailed introduction of data types

2. Storage of integers in memory

3. Big-endian and little-endian byte order: introduction and detection

4. Detailed explanation of several written test questions

5. Detailed storage of floating point numbers in memory

1. Introduction of data types

We have already met the basic built-in types of C. With their typical sizes, they are:

char: character type, 1 byte

short: short integer, 2 bytes

int: integer, 4 bytes

long: long integer, 4 or 8 bytes

long long: longer integer, 8 bytes

float: single-precision floating-point type, 4 bytes

double: double-precision floating-point type, 8 bytes

Note that char occupies 1 byte in C, but not necessarily in other languages. In Java, for example, char is 2 bytes! (In encodings such as GBK a Chinese character takes two bytes, so a single Java char can hold one Chinese character, for example the one-character value of a gender field.)

In addition, long is listed as 4/8 bytes because the C standard does not fix its size. In practice the only rule is:

sizeof(int) <= sizeof(long) <= sizeof(long long)

The specific size depends on the compiler and the platform.

You may wonder what the point of all these types is. Don't we just use integers and decimals in daily life? The types really do matter!

1.1 Significance of types:

A computer is not like daily life: it has to manage a huge amount of data. A type determines how much memory is set aside for a value (and therefore what range it can hold) and how the bits in that memory are interpreted!

For example: an int accesses 4 bytes at a time, while a char accesses 1 byte. If you assign a floating-point number to an int without an explicit cast, the compiler typically warns about a possible loss of data, and the fractional part is discarded. This shows that different types are treated differently.

1.2 Is there a string type in C language?

The answer is: no! As we said in the last issue, a string in C is stored in a character array, and a constant string can be referred to through a character pointer (which holds the address of the first character of the string)!

1.3 Basic Classification of Types

Integer family:

char, unsigned char, signed char

short, unsigned short, signed short

int, unsigned int, signed int

long, unsigned long, signed long

long long, unsigned long long, signed long long

You may be wondering why char is classified into the integer family.

In fact, a character is stored as its character code (its ASCII value), which is just a number! So char belongs to the integer family!

In addition, short, int, long and long long are signed by default; if you want an unsigned type, you have to write unsigned! The exception is plain char: whether it is signed or unsigned is up to the compiler (it is signed on VS).

Floating-point (real) family:

float

double

Construct (custom) types:

Array type: type [const num];

Structure type: struct

Enumeration type: enum

Union (shared) type: union

If the array type is unfamiliar: remove the array name from the declaration and what is left is the array type! For example, int arr[10] has the type int [10]. Structures, enumerations, and unions will be introduced in the next few issues!

Pointer type:

int *p;

char *pc;

float *pf;

void * pv;

These were covered in the previous two issues, so I won't say more here. The only thing to note is the void* pointer: it can receive a pointer to any type of object, and you cast it to the type you need before using it!

Empty type:

void represents the empty type (no type).

It is usually used for function return types, function parameter lists, and so on!

Note that a function returning void has no return value, so no return statement is needed. If you want to return early, you can write: return;

2. Storage of integers in memory

We have seen that creating a variable sets aside space in memory, and that the size of the space is determined by the type! But how is the data stored once the space exists? Let's explore together below!

2.1 Original code, inverse code, complement code

Integers are stored in a computer in binary form, and a signed integer has three binary representations: the original code (sign-magnitude), the inverse code (ones' complement), and the complement code (two's complement)!

Each of the three representations consists of two parts, the sign bit and the value bits! The highest bit is the sign bit and the remaining bits are value bits! A sign bit of 0 means positive and 1 means negative!

For positive integers, the original, inverse and complement codes are all the same!

For negative integers, the three forms differ and have to be calculated:

Original code: translate the decimal value directly into binary, with the sign bit set to 1!

Inverse code: keep the sign bit of the original code and invert the other bits one by one!

Complement code: add 1 to the inverse code!

For example, take -1 as a 32-bit int:

Original code: 10000000 00000000 00000000 00000001

Inverse code: 11111111 11111111 11111111 11111110

Complement code: 11111111 11111111 11111111 11111111

So far we know that integers are stored in binary form in memory, and we know the three binary forms! So which of them is actually stored in memory? When we introduced the operators earlier, we said that the bitwise operators work on the complement code, so our guess is that the complement code is what gets stored. Let's verify it with an example:

int main()
{
	int a = -1;
	return 0;
}

The original, inverse and complement codes of a positive integer are identical, so a positive number cannot tell us which one is stored. Let's test with a negative number instead! The complement code of -1 is 32 ones. The memory window shows the bytes in hexadecimal, and one hexadecimal digit corresponds to 4 binary digits, so if memory shows ff ff ff ff, the complement code is stored. If the original code were stored, we would see the bytes 80, 00, 00 and 01 instead!

Sure enough, memory shows ff ff ff ff: the complement code is stored! This example also shows that what sits in memory is the complement code, while the value we see when printing is the one expressed by the original code!

We have learned above that integers are stored in memory as the complement code. So are calculations done on the original code or on the complement code? Here is an example to find out:


int main()
{
	int a = -3;
	int b = 5;
	int c = a + b;
	return 0;
}

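Assuming that the calculation is done on the original code:

original code of -3: 10000000 00000000 00000000 00000011

original code of 5: 00000000 00000000 00000000 00000101

sum: 10000000 00000000 00000000 00001000, which reads as -8

This is obviously wrong! Therefore the calculation is not done on the original code!

Assuming that the calculation is done on the complement code:

complement code of -3: 11111111 11111111 11111111 11111101

complement code of 5: 00000000 00000000 00000000 00000101

sum: 00000000 00000000 00000000 00000010 (the carry out of the highest bit is discarded), which is 2

The result is correct! So integers are calculated in their complement code!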

Summary:

In fact, in a computer system, integer values are always represented and stored in complement code. The reason is that with the complement code the sign bit and the value bits can be processed uniformly, and addition and subtraction can be handled uniformly too (the CPU only has an adder). In addition, converting between the complement code and the original code uses the same operation in both directions, so no additional hardware circuit is required!

That explains everything above! Pay attention to this sentence: "converting between the complement code and the original code uses the same operation in both directions, so no additional hardware circuit is required!" What does it mean? We know that for a negative integer, going from the original code to the complement code means inverting (the sign bit stays unchanged and the other bits are inverted) and then adding 1, and going back from the complement code to the original code means subtracting 1 and then inverting! The sentence points to another way back: from the complement code to the original code, you can also invert and then add 1!

Take -1 as an example: its complement code is 11111111 11111111 11111111 11111111. Invert it (sign bit unchanged) to get 10000000 00000000 00000000 00000000, then add 1 to get 10000000 00000000 00000000 00000001, which is exactly the original code of -1.

2.2 Introduction to big-endian and little-endian byte order

When we observed the memory, we found that the bytes of the data are stored in reverse order:

For example:


int main()
{
	int a = 15;
	return 0;
}

The binary of 15 is 00000000 00000000 00000000 00001111, which is 00 00 00 0f in hexadecimal. You would expect memory to hold the bytes in this order!

But look at the memory: the bytes appear as 0f 00 00 00.

Memory holds them in exactly the reverse order. Why is that? This is a matter of big-endian and little-endian byte order! Let's introduce what these are!

Byte order: the order in which the bytes of a multi-byte value are stored in memory!

2.3 What are big-endian and little-endian?

Big-endian (storage) mode: the low-order bytes of the data are stored at high addresses, and the high-order bytes are stored at low addresses!

Little-endian (storage) mode: the low-order bytes of the data are stored at low addresses, and the high-order bytes are stored at high addresses!

OK! For example, the int 15 (0x0000000f) is laid out from low address to high address as 00 00 00 0f in big-endian mode and as 0f 00 00 00 in little-endian mode.

Our machine uses little-endian storage mode!

2.4 Why are there big-endian and little-endian modes?

This is because in a computer system memory is addressed in bytes: each address corresponds to one byte, and a byte is 8 bits. But in C, besides the 8-bit char, there are also the 16-bit short and the 32-bit int or long (depending on the compiler). A value wider than one byte has to be spread over several addresses, so the question arises of which byte goes first. That is how big-endian and little-endian byte order came about!

I don't know if that was clear, so let me put it in plain words:

A one-byte variable has no ordering problem: there is only one way to store it! But anything larger than one byte can be stored in more than one order! For example, the four bytes of a hypothetical value 0x11223344 could be laid out, from low address to high address, as:

11 22 33 44

44 33 22 11

11 33 22 44

22 11 44 33

Only the first two are kept, because they make the value easy to read back after it is stored. They are big-endian and little-endian respectively. The last two are discarded because they follow no regular order!

At this point, I have to bring up a written test question that Baidu set for system engineers in 2015:

Please briefly describe the concepts of big-endian and little-endian, and design a program to determine whether the current machine is big-endian or little-endian.

The concept was just covered above, so I won't repeat it! The focus here is the second part: how do we determine whether the machine is big-endian or little-endian?

There are two ways to solve this: 1. a pointer 2. a union. I will introduce both below:

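#include <stdio.h>

// Returns 1 on a little-endian machine and 0 on a big-endian one.
int check_sys(void)
{
	int i = 1;
	// Read only the byte at the lowest address of i.
	return (*(char*)&i);
}

int main(void)
{
	if (check_sys() == 1)
	{
		printf("little-endian\n");
	}
	else
	{
		printf("big-endian\n");
	}

	return 0;
}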

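This works because the cast to char* reads only the byte of i at the lowest address: it is 1 on a little-endian machine and 0 on a big-endian one! Now let's look at the union version: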

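#include <stdio.h>

// Returns 1 on a little-endian machine and 0 on a big-endian one.
int check_sys(void)
{
	union
	{
		int i;
		char c;
	}un; // un is a union object with two members, int i and char c, sharing the same memory
	un.i = 1;
	return un.c; // the byte at the lowest address of i
}

int main(void)
{
	if (check_sys() == 1)
	{
		printf("little-endian\n");
	}
	else
	{
		printf("big-endian\n");
	}

	return 0;
}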

What I want to point out here is that the size of a union is at least the size of its largest member, which here is int i! The members of a union share the same memory, so they cannot hold separate values at the same time. After un.i = 1, un.c reads the byte at the lowest address of i: it is 1 on a little-endian machine and 0 otherwise!

It doesn't matter if unions are unclear for now; I will introduce them in a later issue!

Both programs report little-endian on our machine.

This is consistent with what we saw above!

4. Detailed explanation of several written test questions

（1）

#include <stdio.h>
int main()
{
	char a = -1;
	signed char b = -1;
	unsigned char c = -1;
	printf("a = %d b = %d c = %d ", a, b, c);
	return 0;
}

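Think about it first: what does this program print?

The result (on VS):

a = -1 b = -1 c = 255

Is that what you expected? Let me analyze it:

First a and b: on VS, char == signed char, so a and b behave the same. -1 is stored in 8 bits as 11111111. When a and b are passed to printf they undergo integer promotion: a signed type is extended with its sign bit, which gives 32 ones, and %d prints that as -1.

If integer promotion is unfamiliar, take a look at this article I wrote before:

C language integer promotion http://t.csdn.cn/bCGN6

Now let's analyze c: it also stores 11111111, but c is an unsigned char, so integer promotion pads the high bits with 0. That gives 00000000 00000000 00000000 11111111, which %d prints as 255.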

（2）

#include <stdio.h>
int main()
{
	char a = -128;
	printf("%u\n", a);
	return 0;
}

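Think about it first: what does this program print?

The result (on VS, where char is signed and int is 32 bits):

4294967168

This one may be puzzling at first, so let me explain: -128 is stored in a as 10000000. a is a signed char, so integer promotion extends the sign bit and gives 11111111 11111111 11111111 10000000. %u reads these 32 bits as an unsigned number, which is 4294967168.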

（3）


#include <stdio.h>
int main()
{
	char a = 128;
	printf("%u\n", a);
	return 0;
}

What is the answer to this question?

The result:

4294967168

After the previous question this one is easy, and no step-by-step analysis is needed! 128 is 10000000 in binary, so what is stored in a is the same 8 bits as for -128 above, and it is also printed with %u, so the result is the same!

（4）

#include <stdio.h>
int main()
{
	int i = -20;
	unsigned int j = 10;
	printf("%d\n", i + j);
	return 0;
}

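What is the answer to this question?

The result:

-10

Analysis: i is an int and j is an unsigned int, so i is converted to unsigned int before the addition. The complement code of -20 is 11111111 11111111 11111111 11101100 and 10 is 00000000 00000000 00000000 00001010. Their sum is 11111111 11111111 11111111 11110110. %d reads these bits as a signed number, which is -10.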

（5）

#include <stdio.h>
int main()
{
	unsigned int i;
	for (i = 9; i >= 0; i--)
	{
		printf("%u\n", i);
	}
	return 0;
}

What is the result of this code?

The result: it prints 9 down to 0, then 4294967295, 4294967294, and so on without ever stopping.

Yes, it is an endless loop. Why? Let me analyze it for you:

In fact, the reason is very simple. i is an unsigned number, which is always greater than or equal to 0. The loop condition here is i >= 0, so it is always true, and we are stuck in an endless loop! But why does it keep printing such large numbers? To answer that, we need to talk about unsigned integers in detail!

Unsigned integer details:

We know that on VS an integer type written without unsigned is signed by default. But what range can each type represent? We have not covered that yet, so below I introduce the ranges of the signed and unsigned types in detail!

char occupies one byte (8 bits, the highest bit is the sign bit), so the range of its positive values is

0 ~ 2^7 - 1 = 0 ~ 127

Its negative values are -1 ~ -2^7 = -1 ~ -128, so the value range of char is: -2^7 ~ 2^7 - 1

What about unsigned char? Since the highest bit is no longer a sign bit but a value bit, its range is 0 ~ 2^8 - 1

short occupies two bytes (16 bits, the highest bit is the sign bit), so the range of its positive values is

0 ~ 2^15 - 1

Its negative values are -1 ~ -2^15, so the value range of short is: -2^15 ~ 2^15 - 1

What about unsigned short? Since the highest bit is no longer a sign bit but a value bit, its range is 0 ~ 2^16 - 1

The other integer types are analyzed in the same way: int is -2^31 ~ 2^31 - 1 and unsigned int is 0 ~ 2^32 - 1, so I will not list them all here! Let me explain why the ranges look like this, taking char as an example (the others are the same):

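signed char:

00000000 is 0, and counting up reaches 01111111, which is 127. One more step gives 10000000, which is -128, and counting on from there reaches 11111111, which is -1. Adding 1 to that wraps around to 00000000 again.

Once you understand this, you can picture the values as a circle: 0, 1, ..., 127, -128, -127, ..., -1, and back to 0.

unsigned char:

All 8 bits are value bits: 00000000 is 0 and 11111111 is 255. Adding 1 to 255 wraps around to 0, and subtracting 1 from 0 wraps around to 255.

Knowing this, you can explain why the question above prints huge numbers in its endless loop!

Because i is an unsigned int, it is always greater than or equal to 0. When it is 0 and 1 is subtracted, it wraps around to the maximum unsigned int value, 4294967295 (more than four billion)! From there it keeps decreasing and is still always greater than or equal to 0! The second and third questions above rely on the same wraparound!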

（6）

#include <stdio.h>
#include <string.h>
int main()
{
	char a[1000];
	int i;
	for (i = 0; i < 1000; i++)
	{
		a[i] = -1 - i;
	}
	// strlen returns a size_t, so it is printed with %zu
	printf("%zu ", strlen(a));
	return 0;
}

This code looks simple, but it is easy to get wrong! What do you think it prints?

Let's take a look together:

255

Many people probably did not expect that! The first time I did this question, I was completely stumped too! Haha, let me analyze it below:

a[i] is a char, so the stored values run -1, -2, ..., -128, then wrap around to 127, 126, ..., 1, 0. When a[i] is 0, the element is the character '\0', and we know that strlen counts the characters before the first '\0'. There are 128 + 127 = 255 elements before it, so the answer is 255. Isn't this question tricky! Ha ha!

（7）

#include <stdio.h>
unsigned char i = 0;
int main()
{
	for (i = 0; i <= 255; i++)
	{
		printf("hello world\n");
	}
	return 0;
}

What is the result of this question? Think for a while!

The result: it prints hello world forever.

The endless loop in this question should be easy to understand, so I won't draw any pictures here! An unsigned char is always less than or equal to 255 (after 255 it wraps around to 0), so the condition i <= 255 is always true and the loop never ends!

5. Detailed storage of floating point numbers in memory

Common floating-point numbers:

3.14

1E10, which means 1.0 * 10^10

The floating-point family consists of float, double and long double.

How are floating-point numbers stored in memory? Let's discuss it together. First look at an example:


#include <stdio.h>
int main()
{
	int n = 9;
	float* pf = (float*)&n;
	printf("n = %d\n", n);
	printf("*pf = %f\n", *pf);

	*pf = 9.0;
	printf("num = %d\n", n);
	printf("*pf = %f\n", *pf);

	return 0;
}

Again, think about it first: what do you think this code prints?

The result:

n = 9

*pf = 0.000000

num = 1091567616

*pf = 9.000000

I guess the first and the last lines are what you expected, but the other two are not! n and *pf are clearly the same bytes in memory, so why do they print so differently? To answer this, we need to talk about the storage rules of floating-point numbers in memory!

Storage rules for floating point numbers

According to the international standard IEEE (Institute of Electrical and Electronics Engineers) 754, any binary floating-point number V can be expressed in the following form:

V = (-1)^S * M * 2^E

1. (-1)^S gives the sign: when S = 0, V is a positive number; when S = 1, V is a negative number;

2. M is the significand (the significant digits), greater than or equal to 1 and less than 2;

3. 2^E is the exponent part.

What does that mean? Let's take an example to understand it intuitively:

int main()
{
	float f = 5.5;
	return 0;
}

We can work through it with the standard above: 5.5 in binary is 101.1, which is 1.011 * 2^2. So S = 0, M = 1.011 and E = 2.

IEEE 754 stipulates:

For 32-bit floating-point numbers (float), the highest bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M!

For 64-bit floating-point numbers (double), the highest bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M!

IEEE 754 also has some special rules for the significand M and the exponent E:

We said before: 1 <= M < 2, that is to say, M can be written in the form 1.xxxxxxxx, where xxxxxxxx is the fraction part!

IEEE 754 stipulates: since the first digit of M is always 1, it is not stored. Only the fraction part xxxxxxxx is saved (for example, for 1.01 only 01 is saved), and the leading 1 is added back when the number is read. This saves one bit: with 23 stored bits, a float effectively keeps 24 significant bits! (64-bit floating-point numbers work the same way)

As for the exponent E, the situation is more complicated:

Note: IEEE 754 treats E as an unsigned integer:

This means that if E is 8 bits, its value range is 0 ~ 255, and if E is 11 bits, its value range is 0 ~ 2047

But we know that the exponent can actually be negative! E is unsigned, that is, always greater than or equal to 0, so to solve this problem IEEE 754 stipulates that a fixed middle value (the bias) must be added to the true value of E when it is stored in memory! For the 8-bit E the middle value is 127, and for the 11-bit E it is 1023!

For example: the E of 2^10 is 10. When it is saved (assuming 32 bits), it must be stored as 10 + 127 = 137, that is: 10001001

Then there are three situations when E is read:

(1) E is neither all 0s nor all 1s

In this case, the floating-point number is read by the following rule: subtract 127 (or 1023) from the stored value of E to get the true exponent, and put the leading 1 back in front of the stored fraction to get the significand M;

For example: 0.5 (32-bit as an example)

The binary form of 0.5 is 0.1. Since the integer part must be 1, it is written as 1.0 * 2^-1,

When saving, E = -1 is stored as -1 + 127 = 126 (now a non-negative number), that is, 01111110. The leading 1 of 1.0 is dropped when saving and added back when reading, so only the fraction 0 is stored, padded with zeros to 23 bits: 00000000000000000000000

This is how it is stored in memory:

0 01111110 00000000000000000000000

When reading, 126 (01111110) - 127 gives the true E! In the same way, putting the 1 back in front of the stored fraction gives the true significand, and the number is restored!

Verify it:

0 01111110 00000000000000000000000 is in binary, while the memory window shows hexadecimal, so let's convert it:

0011 1111 0000 0000 0000 0000 0000 0000 -> 3f 00 00 00. We verified earlier that the current machine is little-endian, so the bytes are stored in reverse order, i.e. 00 00 00 3f

Take a look at the memory: it shows 00 00 00 3f.

Exactly the same, which shows that our analysis above is correct!

(2) E is all 0s

In this case, the true exponent of the floating-point number is 1-127 (or 1-1023), and the significand M no longer gets the leading 1 added but is read as a decimal of the form 0.xxxxxx. This is done to represent +/-0 and very small numbers close to 0!

Here is the idea: a stored E of all 0s would correspond to a true exponent of -127 (or -1023), that is, a number of the form +/-1.xxxxxx * 2^-127.

That is an extremely small number, almost 0, so IEEE 754 handles it with the special rule above instead of adding the leading 1 to M!

(3) E is all 1s

In this case, if the fraction bits of M are all 0, the value is positive or negative infinity (the sign depends on S); if they are not all 0, the value is NaN (not a number);

This situation is also quite special. A stored E of all 1s is 255 (or 2047), so the true exponent would be 255-127 = 128 (or 2047-1023 = 1024), and 2^128 (or 2^1024) is a very large number! We treat it as infinity!

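With all this, we can explain the example at the beginning of this section!

n = 9 is stored as 00000000 00000000 00000000 00001001. Read as a float: S = 0, E = 00000000 and the fraction is 00000000000000000001001. E is all 0s, so the value is 0.00000000000000000001001 * 2^-126, which is so close to 0 that %f prints 0.000000.

After *pf = 9.0: 9.0 is 1001.0 in binary, that is, 1.001 * 2^3. So S = 0, E = 3 + 127 = 130 = 10000010 and the fraction is 001 followed by 20 zeros, which gives 0 10000010 00100000000000000000000. Read as an int, these bits are 1091567616.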

OK! That's the end of this issue! See you next time!