C language data storage

Supplementary notes for a holiday.
There may be a dozen or twenty articles published this month.

Storage of integers in memory

Original code, inverse code, and complement code

For signed numbers, the most significant bit is used as the sign bit: 0 means "positive" and 1 means "negative".

Computers have three ways to represent signed numbers: original code (sign-magnitude), inverse code (ones' complement), and complement code (two's complement).

Any data in the computer must be converted into binary, because the computer only understands binary.

If a value is negative, it is converted by the following rules:

  • Original code: write the value directly in binary, with the sign bit set according to whether the number is positive or negative.
  • Inverse code: keep the sign bit of the original code unchanged and invert every other bit.
  • Complement code: add 1 to the inverse code.

For positive numbers, the original code, inverse code, and complement code are identical.

  • Unsigned numbers: no conversion and no sign bit; the original code, inverse code, and complement code are the same.
  • For integers: what is actually stored in memory is the complement code.

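Conversion method (from complement code back to original code)

  • Method 1: subtract 1 first, then keep the sign bit unchanged and invert the other bits.
  • Method 2: repeat the same process used to convert the original code to the complement code: keep the sign bit, invert the other bits, then add 1.
    demo
   int a = 20;
//20 is a positive integer
//0000 0000 0000 0000 0000 0000 0001 0100   original code = inverse code = complement code
   int b = -10;
//-10 is a negative integer
//1000 0000 0000 0000 0000 0000 0000 1010   original code
//1111 1111 1111 1111 1111 1111 1111 0101   inverse code
//1111 1111 1111 1111 1111 1111 1111 0110   complement code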

unsigned and signed keywords

Signed and unsigned integers

signed int a=-10;
unsigned int b=10u; //the u suffix is a reminder that the literal is unsigned; it is optional

Ranges

  • Take char as an example
signed char 
0000 0000
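//the most significant bit is the sign bit
//which leaves 7 value bits
1 111 1111 -> -127   //can represent [-127,0]
0 111 1111 -> 127    //can represent [0,127]

//so
//1000 0000 and 0000 0000 can both represent 0
//with two representations of 0, which one should be used?
  • To solve this problem, 0000 0000 represents 0 and 1000 0000 is used to represent -128.
  • Proof:
1 1000 0000 //original code of -128 (needs 9 bits)
1 0111 1111 //inverse code
1 1000 0000 //complement code
//storing 9 bits into an 8-bit char truncates the top bit.
//what is finally stored is 1000 0000

//when reading, 1000 0000 is simply defined to be -128.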

So the value range of the signed char type is -128~127.
By the same reasoning, the range of unsigned char is 0~255.

**Summary rule: the value range of integers**

  • Unsigned: [0,2^n-1]
  • Signed: [-2^(n-1), 2^(n-1)-1]

How many values a data type can represent depends on the number of distinct bit patterns its bits can form: n bits give 2^n patterns.


Storage and retrieval of variable contents

  signed int b = -10;  
  unsigned int d = -10;  //???

Space first, content second

By the time data is saved into the space, it has already been converted to binary. When an integer is stored, the space does not care what the content means.

  • We first convert the content to binary:

Complement code of -10: 1111 1111 1111 1111 1111 1111 1111 0110

  • Then save it into the space.
  • When a variable is read, the type determines how the binary sequence stored inside the space is interpreted.

Data only has meaning once it has a type!

   signed int b = -10;    //-10
   unsigned int d = -10;  //4294967286

In conclusion:

  • Store: literal data is first converted to complement code and then placed in the space. The so-called sign bit therefore depends entirely on whether the data itself carries a ± sign. It does not matter whether the variable is signed or unsigned!
  • Fetch: to read data, first look at the type of the variable, then decide whether the highest bit is a sign bit. If it is not (unsigned), convert the binary directly to decimal. If it is (signed), convert the complement code back to the original code before reading the value. (Of course, locating the highest bit also requires knowing the byte order, that is, big endian or little endian.)

Why is everything stored as complement code?
In computer systems, integer values are represented and stored as complement code (two's complement). The reason is that with complement code the sign bit and the value bits can be processed uniformly, and addition and subtraction can be handled by the same circuit (the CPU only needs an adder). In addition, converting complement code to original code uses the same operation as converting original code to complement code, so no extra hardware circuit is required.


Big-endian and little-endian storage

  • Assume the hexadecimal number 0x12 34 56 78.
  • The data is divided into bytes, and the bytes have high and low weights (significance).
  • There are high and low addresses in memory.
  • Here, the weight of byte 12 > the weight of byte 78.
  • How to store this number in memory?

No matter how the bytes are stored, it works as long as they are read back under the same convention.

Therefore, there are two storage schemes: big endian and little endian

  • Big endian: byte by byte, the low-weight (least significant) byte is stored at the high address.
  • Little endian: byte by byte, the low-weight (least significant) byte is stored at the low address.

Take 0x11223344 as an example:

address         0xA0  0xA1  0xA2  0xA3
big endian       11    22    33    44
little endian    44    33    22    11

No matter how you save it, it will not affect the user's use!

Why there are big endian and little endian:
In computer systems memory is addressed in bytes: each address corresponds to one byte, and a byte is 8 bits. But in C, besides the 8-bit char there are also the 16-bit short and the 32-bit long (the exact widths depend on the compiler). In addition, on processors wider than 8 bits, such as 16-bit or 32-bit processors, the register width is greater than one byte, so the question of how to arrange multiple bytes necessarily arises. This leads to the big-endian and little-endian storage modes.
For example: a 16-bit short x is at address 0x0010 and has the value 0x1122, so 0x11 is the high byte and 0x22 is the low byte. In big-endian mode, 0x11 is placed at the low address, 0x0010, and 0x22 at the high address, 0x0011. Little-endian mode is just the opposite. The commonly used x86 architecture is little endian, while KEIL C51 is big endian. Many ARM processors and DSPs are little endian, and some ARM processors let the hardware select big or little endian.

Supplement: 1


Storage of floating point numbers in memory

A piece of code:

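#include <stdio.h>

int main(void)
{
    int n = 9;
    float *pFloat = (float *)&n;   //view the bytes of n as a float
    printf("value of n: %d\n", n);
    printf("value of *pFloat: %f\n", *pFloat);
    *pFloat = 9.0;
    printf("value of n: %d\n", n);
    printf("value of *pFloat: %f\n", *pFloat);
    return 0;
}

Result on a typical platform with a 32-bit int and a 32-bit IEEE 754 float:
value of n: 9
value of *pFloat: 0.000000
value of n: 1091567616
value of *pFloat: 9.000000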

From the results we can speculate that there is a difference between the storage of floating point numbers and integers in memory.

storage rules

According to the international standard IEEE 754 (IEEE: Institute of Electrical and Electronics Engineers), any binary floating-point number V can be represented in the following form:

  • V = (-1)^s * M * 2^E
  • (-1)^s is the sign. When s=0, V is positive; when s=1, V is negative.
  • M is the significand, greater than or equal to 1 and less than 2.
  • 2^E is the exponent part.

For example:
5.5 in decimal is 101.1 in binary, that is, 2^2+2^0+2^(-1),
which is equivalent to 1.011*2^2.
According to the format of V above, we get s=0, M=1.011, E=2.

Similarly, for decimal -5.5 the format gives s=1, M=1.011, E=2.

**IEEE 754 stipulates:**
For a 32-bit floating-point number, the highest 1 bit is the sign bit s, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M.

For a 64-bit floating-point number, the highest 1 bit is the sign bit s, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M.

IEEE 754 also has some special provisions for the significand M and the exponent E.
As mentioned earlier, 1≤M<2, that is, M can be written in the form of 1.xxxxxx, where xxxxxx represents the fractional part.
IEEE 754 stipulates that when M is stored, its first digit is always 1 by default, so it can be dropped and only the xxxxxx part is saved. For example, when saving 1.01 only 01 is saved, and the leading 1 is added back when the value is read. The purpose is to gain one significant bit. Taking a 32-bit floating-point number as an example, only 23 bits are available for M, but with the leading 1 dropped, 24 significant bits can effectively be stored.

As for the exponent E, the situation is more complicated.
First, E is stored as an unsigned integer (unsigned int),
which means that if E is 8 bits its value range is 0~255, and if E is 11 bits its value range is 0~2047. However, E in scientific notation can be negative, so IEEE 754 stipulates that an intermediate number (the bias) is added to the real value of E when it is stored in memory.
For an 8-bit E the intermediate number is 127; for an 11-bit E it is 1023. For example, the E of 2^10 is 10, so when it is stored in a 32-bit floating-point number it must be saved as 10+127=137, which is 10001001.

Then, when the exponent E is read back from memory, there are three cases:

  1. E is neither all 0s nor all 1s

In this case the floating-point number is decoded by the following rule: subtract 127 (or 1023) from the stored value of E to get the real exponent, and then put the leading 1 back in front of the significand M.

*Example: the binary form of 0.5 (1/2) is 0.1. Since the integer part must be 1, the binary point is moved one place to the right, giving 1.0\*2^(-1). The stored exponent is -1+127=126, written as 01111110. The significand 1.0 with its integer part dropped is 0, padded with 0s to 23 bits: 00000000000000000000000. The binary representation is therefore:*`   0 01111110 00000000000000000000000`

  2. E is all 0s

In this case the real exponent of the floating-point number is 1-127 (or 1-1023), and the significand M no longer gets the leading 1 added; it is read as the fraction 0.xxxxxx. This is done to represent ±0 and very small numbers close to 0,
namely numbers whose magnitude is smaller than 2^(-126) (for a 32-bit float).

  3. E is all 1s

In this case, if the significand M is all 0s, the value is ±infinity (positive or negative depends on the sign bit s); if M is not all 0s, the value is NaN (not a number).

Practice

Let's analyze the question from the beginning of this section.
Interpreting 0x00000009 as a floating-point number:
First, split 0x00000009 into its fields:
the sign bit s=0, the next 8 bits of exponent E=00000000, and the last 23 bits of significand M=000 0000 0000 0000 0000 1001.

E is all 0s, so by the second case:
V=(-1)^0 × 0.00000000000000000001001×2^(-126)=1.001×2^(-146)
Obviously, V is a very small positive number close to 0, so printed in decimal with %f it is 0.000000.

Representing the floating-point number 9.0 in memory:
First, 9.0 is 1001.0 in binary, which is 1.001×2^3.
So the sign bit s=0, the significand M is 001 followed by 20 0s to make up 23 bits, and the exponent E is 3+127=130, that is, 10000010.
Written in binary form: 0 10000010 001 0000 0000 0000 0000 0000
Read as an integer and converted to decimal: 1091567616.


  1. From: C language Chinese network
