Advanced C language - data storage

Hello, time has passed so fast. It has been more than a month since I wrote my first article on CSDN, and I have gained some followers along the way. Your likes are my motivation, and I hope we all keep getting stronger. Now to the topic: the basic C series is finished, and this article starts the advanced C series. Let's dive in and start learning.

Key points of this chapter

  1. Data types in detail
  2. Storage of integers in memory: original code (sign-magnitude), inverse code (ones' complement), complement code (two's complement)
  3. Big-endian and little-endian byte order: introduction and detection
  4. Storage of floating-point types in memory

1. Introduction to data types

char        // character type
short       // short integer
int         // integer
long        // long integer
long long   // longer integer
float       // single-precision floating point
double      // double-precision floating point

We have already met the built-in types above and know roughly how many bytes each one occupies. The sizeof operator reports the exact size on a given platform.

What a type means

  1. The type determines how much memory is set aside for a value, and that size determines the range of values it can hold.
  2. The type determines how the bytes in that memory are interpreted.

Different types occupy different numbers of bytes, so we pick the type that fits the data we need to store.

1.1 Basic classification of types

integer family

char
 unsigned char
 signed char
short
 unsigned short [int]
 signed short [int]
int
 unsigned int
 signed int
long
 unsigned long [int]
 signed long [int]

You may wonder why char is in the integer family. A character is stored as its ASCII code, which is an integer, so char is classified as an integer type.

unsigned means the type has no sign bit, so it can only hold zero and positive values.

floating point family

float
double

Constructed types:

> array types
> structure types: struct
> enumeration types: enum
> union types: union

pointer type

int *pi;
char *pc;
float* pf;
void* pv;

empty type

void represents the empty type (no type).
It is usually used as a function's return type, as a function's parameter list, and in pointer types (void*).

2. Storage of integers in memory
We said earlier that creating a variable allocates space in memory, and that the size of the space depends on the type. Next, let's look at how the data is laid out in that space.

int a = 20;
int b = -10;

We know that four bytes are allocated for a.
How is the value stored in them?
To answer that, we first need three concepts: the original code, the inverse code, and the complement code. They have come up before, so here is a brief review.

2.1 Original code, inverse code, and complement code

All three representations of a signed integer have two parts, a sign bit and value bits. A sign bit of 0 means "positive" and 1 means "negative".
For positive integers the three representations are identical; they differ only for negative integers.

Original code (sign-magnitude)
Write the magnitude of the value in binary and set the sign bit according to whether the value is positive or negative.

Inverse code (ones' complement)
For a negative number, keep the sign bit of the original code and invert every other bit.

Complement code (two's complement)
For a negative number, add 1 to the inverse code.

Supplement
To turn a complement code back into the original code, keep the sign bit, invert the value bits, and add one; subtracting one first and then inverting the value bits gives the same result. All of these operations work on the binary form of signed integers, and what the computer actually stores for an integer is its complement code.

You may wonder why the complement code is what gets stored. The following explanation should help.

In computer systems, integers are expressed and stored in two's complement. With the complement code, the sign bit and the value bits can be processed uniformly, and addition and subtraction can be handled by the same circuit (the CPU only needs an adder). In addition, converting from the complement code to the original code uses the same operation as converting from the original code to the complement code, so no extra hardware is required.

In other words, the CPU can get by with an adder alone: subtraction is carried out as the addition of a complement.

We can observe this in the debugger.
[Figures: debugger memory windows at the addresses of a and b]

#include <stdio.h>
int main()
{
	int a = 10;
	int b = -20;
	return 0;
}

The first picture shows the memory at the address of a, and the second shows the memory at the address of b. The contents are displayed in hexadecimal (0x is the hexadecimal prefix).

Looking at these bytes, you may ask in what order they are stored. The bytes do not appear in the order in which we would write the number, and that leads to a new topic: big-endian and little-endian byte order.

2.2 Introduction to big-endian and little-endian

What are big-endian and little-endian?
Big-endian (storage) mode stores the low-order bytes of a value at high memory addresses and the high-order bytes at low addresses. Little-endian (storage) mode stores the low-order bytes of a value at low memory addresses and the high-order bytes at high addresses.

Why are there big-endian and little-endian modes?

Memory is addressed in bytes: each address corresponds to one byte, and a byte is 8 bits. C, however, has not only the 8-bit char but also the 16-bit short and the 32-bit long (the exact widths depend on the compiler). In addition, on processors wider than 8 bits, such as 16-bit or 32-bit processors, a register is wider than one byte, so the bytes of a multi-byte value have to be arranged in some order. That choice is what gives rise to the big-endian and little-endian storage modes.
For example, take a 16-bit short x stored at address 0x0010 with the value 0x1122, so 0x11 is the high byte and 0x22 is the low byte. In big-endian mode, 0x11 goes at the low address 0x0010 and 0x22 at the high address 0x0011. Little-endian mode is the opposite. The common x86 architecture is little-endian, while KEIL C51 is big-endian. Many ARM processors and DSPs are little-endian, and some ARM processors let the hardware select big-endian or little-endian mode.

The machine we compile for here stores data in little-endian byte order. A picture makes this easier to see:
[Figure: little-endian byte layout of an integer in memory]

Put simply: if the low-order byte of the data sits at the low address, the machine is little-endian; if the high-order byte sits at the low address, it is big-endian.

Now that we know what big-endian and little-endian mean, can we write code that shows which one our machine uses?

The idea is to store the integer 1 in an int, take its address, and cast that address to char* so that dereferencing it reads only the single byte at the lowest address. If that byte is 1, the machine is little-endian; if it is 0, the machine is big-endian.

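#include <stdio.h>
// Returns 1 on a little-endian machine and 0 on a big-endian machine.
int check_sys()
{
	int i = 1;
	// Read only the byte at the lowest address of i.
	return *((char*)&i);
}
int main()
{
	int ret = check_sys();
	if (1 == ret)
	{
		printf("little-endian\n");
	}
	else
	{
		printf("big-endian\n");
	}
	return 0;
}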

Running this code on our machine under VS shows that it stores data in little-endian byte order.

practice questions

#include <stdio.h>
int main()
{
    char a = -1;
    signed char b = -1;
    unsigned char c = -1;
    printf("a=%d,b=%d,c=%d", a, b, c);
    return 0;
}

char is signed by default in VS, but the C standard leaves this to the implementation, and some compilers make plain char unsigned. With a signed char, a and b both print -1, the value that was stored. c prints 255: the complement code of -1 truncated to one byte is 11111111, and since an unsigned char has no sign bit, those eight bits are read as the value 255.



#include <stdio.h>
int main()
{
	char a = -128;
	printf("%u\n", a);
	return 0;
}

[Figures: program output and the binary form of the result]
We can see from these two pictures that the printed result is exactly the binary pattern shown, interpreted as an unsigned number.

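Analysis
-128 is first a four-byte integer, and storing it in the one-byte char truncates it.
The original code of -128 is 1000 0000 0000 0000 0000 0000 1000 0000
The inverse code is 1111 1111 1111 1111 1111 1111 0111 1111
The complement code is 1111 1111 1111 1111 1111 1111 1000 0000
After truncation, a holds 1000 0000
To print it as an unsigned integer, integer promotion happens first. Because char is signed here, the promotion fills the upper bits with the sign bit, which gives
1111 1111 1111 1111 1111 1111 1000 0000
%u reads this pattern as an unsigned number, so bit 32 is a value bit rather than a sign bit, and the output is 4294967168.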

3.
#include <stdio.h>
int main()
{
    char a = 128;
    printf("%u\n", a);
    return 0;
}

This question is even simpler. The binary form of the positive number 128 is 0000 0000 0000 0000 0000 0000 1000 0000. Truncating it to one byte leaves 1000 0000, the same bits as in the previous question. Integer promotion of the signed char then gives 1111 1111 1111 1111 1111 1111 1000 0000. %u prints this pattern as an unsigned number, for which the complement code is the same as the original code, so the result is that pattern converted to decimal, the same answer as in the previous question.

#include <stdio.h>
int main()
{
	int i = -20;
	unsigned int j = 10;
	printf("%d\n", i + j);
	// The addition is carried out on the complement codes, and %d then formats the result as a signed integer.
	return 0;
}


#include <stdio.h>
int main()
{
	unsigned int i;
	for (i = 9; i >= 0; i--)
	{
		printf("%u\n", i);
	}
	return 0;
}

We can analyze this one directly. Printing 9 down to 0 works as expected. The problem comes when i is 0 and is decremented again: an unsigned int cannot hold -1, so the value wraps around to 2^32 - 1 (4294967295). The condition i >= 0 is therefore always true, and the loop counts down from 2^32 - 1 to 0, wraps again, and never ends.

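#include <stdio.h>
#include <string.h>
int main()
{
    char a[1000];
    int i;
    for (i = 0; i < 1000; i++)
    {
        a[i] = -1 - i;
    }
    // strlen returns a size_t, which is printed with %zu.
    printf("%zu", strlen(a));
    return 0;
}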

We can reason about this one directly as well. The range of signed char is -128 to 127, and strlen counts the characters before the first \0. Here a[0] is -1, a[1] is -2, and so on down to -128 at a[127]; the next value wraps around to 127 at a[128], and the values keep decreasing until a[255] is 0. strlen stops at that 0, so the result is 255.

#include <stdio.h>
unsigned char i = 0;
int main()
{
	for (i = 0; i <= 255; i++)
	{
		printf("hello world\n");
	}
	return 0;
}

This is an infinite loop. The range of unsigned char is 0 to 255, so when i is 255 and is incremented, it cannot become 256 and wraps around to 0. The condition i <= 255 is always true, and the loop never ends.

3. Storage of floating-point types in memory
Common floating-point numbers:

3.14159
1E10
The floating-point family includes the float, double, and long double types.
The ranges of the floating-point types are defined in float.h.

3.1 An example

#include <stdio.h>
int main()
{
    int n = 9;
    float *pFloat = (float *)&n;
    printf("n = %d\n", n);
    printf("*pFloat = %f\n", *pFloat);
    *pFloat = 9.0;
    printf("n = %d\n", n);
    printf("*pFloat = %f\n", *pFloat);
    return 0;
}

The output is a bit unexpected: read through the float pointer, the 9 stored in n prints as 0.000000, and after 9.0 is stored through the pointer, n prints as 1091567616. Why? The reason is that floating-point numbers are stored differently from integers. Let's explore.

3.2 Floating-point number storage rules
n and *pFloat are the same bytes in memory, so why do the floating-point and integer interpretations differ so much?
To understand this result, we need to understand how floating-point numbers are represented inside the computer.
Detailed explanation:
According to the international standard IEEE (Institute of Electrical and Electronics Engineers) 754, any binary floating-point number V can be expressed in the following form:

(-1)^S * M * 2^E
(-1)^S is the sign: when S=0, V is positive; when S=1, V is negative.
M is the significand, which is greater than or equal to 1 and less than 2.
2^E is the exponent part; E is positive when the binary point is moved to the left to normalize the number and negative when it is moved to the right.

For example:
5.0 in decimal is 101.0 in binary, which is equivalent to 1.01×2^2.
Then, according to the format of V above, it can be concluded that S=0, M=1.01, and E=2.
Decimal -5.0, written in binary is -101.0, which is equivalent to -1.01×2^2. Then, S=1, M=1.01, E=2.
IEEE 754 stipulates:
For a 32-bit floating-point number, the highest bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M.

[Figure: 32-bit layout: S (1 bit), E (8 bits), M (23 bits)]
For a 64-bit floating-point number, the highest bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M.
[Figure: 64-bit layout: S (1 bit), E (11 bits), M (52 bits)]
IEEE 754 also has some special rules for the significand M and the exponent E.
As mentioned earlier, 1≤M<2, so M can be written in the form 1.xxxxxx, where xxxxxx is the fractional part.
IEEE 754 stipulates that, because the first digit of M is always 1, it is not stored; only the xxxxxx part is saved. For example, when saving 1.01, only 01 is stored, and the leading 1 is put back when the number is read. This gains one significant bit: in a 32-bit floating-point number only 23 bits are available for M, but with the leading 1 left out they carry 24 significant bits.

The exponent E is more complicated.
First, E is stored as an unsigned integer (unsigned int), which means that if E has 8 bits its range is 0 to 255, and if E has 11 bits its range is 0 to 2047. However, the exponent in scientific notation can be negative, so IEEE 754 stipulates that a bias (a middle value) is added to the real value of E before it is stored. For an 8-bit E the bias is 127; for an 11-bit E it is 1023. For example, the E of 2^10 is 10, so in a 32-bit floating-point number it is stored as 10+127=137, which is 10001001.

When the exponent E is read back from memory, there are three cases.

E is neither all 0s nor all 1s

In this case the floating-point number follows the normal rule: subtract 127 (or 1023) from the stored exponent to get the real value of E, and put the leading 1 back in front of the significand M.
For example:
the binary form of 0.5 (1/2) is 0.1. Since the integer part must be 1, the binary point moves one place to the right, giving 1.0*2^(-1). The stored exponent is -1+127=126, which is 01111110. The significand 1.0 without its integer part is 0, padded to the 23 bits 00000000000000000000000. The binary representation is therefore:
0 01111110 00000000000000000000000

E is all 0s

In this case the real exponent is 1-127 (or 1-1023), and the significand M no longer gets the leading 1 but is read as the fraction 0.xxxxxx. This makes it possible to represent ±0 and very small numbers close to 0.

E is all 1s

In this case, if the significand M is all 0s, the value is ±infinity (the sign depends on the sign bit S); if M is not all 0s, the value is NaN (not a number).
That is all for the representation rules of floating-point numbers.

So let's now explain the previous topic

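#include <stdio.h>
int main()
{
	int n = 9;
	// 00000000000000000000000000001001 -- the complement code of 9

	float* pFloat = (float*)&n;
	printf("n = %d\n", n);
	printf("*pFloat = %f\n", *pFloat);
	// When these bits are read as a floating-point number:
	// the leading 0 is the sign bit
	// the next eight bits (the exponent) are all 0, so E is taken as 1-127 = -126
	// written as a floating-point number this is (-1)^0 * 0.00000000000000000001001 * 2^-126
	// which prints as 0.000000
	*pFloat = 9.0;
	printf("n = %d\n", n);
	printf("*pFloat = %f\n", *pFloat);
	// 9.0 stored as a floating-point number is (-1)^0 * 1.001 * 2^3
	// 0 10000010 001 0000 0000 0000 0000 0000
	// read as a decimal integer, these bits are 1091567616
	return 0;
}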

The explanations are all written in the code comments. That's all for today's sharing. Thank you, everyone!


Origin blog.csdn.net/2301_76895050/article/details/131754442