In-depth understanding of computer systems (2): representation and processing of information

How is the information represented?

Most computers use 8-bit blocks, or bytes, as the smallest addressable unit of memory, rather than accessing individual bits. A machine-level program treats memory as a very large array of bytes, called virtual memory. Each byte of memory is identified by a unique number, called its address, and the set of all possible addresses is called the virtual address space.

Hexadecimal notation

A byte consists of 8 bits. In binary notation its value ranges from 00000000 to 11111111; in decimal, from 0 to 255. Neither notation is convenient for describing bit patterns: binary is too verbose, and converting between decimal and binary is tedious. Hexadecimal notation is the compromise: each hexadecimal digit stands for exactly 4 bits, so one byte is written as two hexadecimal digits, from 00 to FF.

The following figure shows the decimal and binary values that correspond to each hexadecimal digit.

[Figure: hexadecimal digits 0 to F with their decimal and binary values]

In C, numeric constants starting with 0x or 0X are hexadecimal. Converting between hexadecimal, binary, and decimal is basic knowledge that you must master, so I will not cover it in detail here.

What is the word data size

When we download software installation packages, we often see separate 32-bit and 64-bit versions. What do 32-bit and 64-bit mean here? They refer to machines with a 32-bit word size and machines with a 64-bit word size.
Every computer has a word size, which indicates the nominal size of pointer data. Because a virtual address is encoded by such a word, the most important system parameter determined by the word size is the maximum size of the virtual address space. For a machine with a w-bit word size, virtual addresses range from 0 to 2^w - 1. For example, my computer has a 64-bit word size, so virtual addresses range from 0 to 2^64 - 1 and a program can address at most 2^64 bytes.

Most 64-bit machines can also run programs compiled for 32-bit machines, but 64-bit programs can only run on 64-bit machines.

Addressing and byte order

On almost all machines, a multi-byte object is stored as a contiguous sequence of bytes, and the address of the object is the smallest address of the bytes used. For example, if a variable x of type int has address 0x100 (that is, the C expression &x evaluates to 0x100), then the 4 bytes of x are stored at memory locations 0x100, 0x101, 0x102, and 0x103.

How exactly are the bytes ordered? There are two conventions. Big-endian stores the most significant byte first, at the lowest address; little-endian stores the least significant byte first. Continuing the example above, suppose x has the hexadecimal value 0x01234567. The bytes in the address range 0x100 to 0x103 are then arranged as follows:

Address        0x100  0x101  0x102  0x103
Big-endian     01     23     45     67
Little-endian  67     45     23     01

Whether big-endian or little-endian is used depends on the type of machine. Most Intel-compatible machines are little-endian only, and many newer microprocessors are bi-endian, meaning they can be configured to operate in either mode.

For most application programmers the byte order of the machine is invisible, but in some situations it really matters. For example, when a big-endian machine sends binary data over a network to a little-endian machine, the bytes within each word arrive in reverse order from the receiving program's point of view. To avoid this problem, the data must be converted: typically the sender converts to an agreed network byte order and the receiver converts from it to its own internal representation.

How to represent strings and codes

Strings are represented using a standard encoding, most commonly ASCII. Any system that uses ASCII as its character code produces the same bytes for the same text, independent of byte order and word size. As a result, text data is more platform-independent than binary data.

How is code represented? In In-depth understanding of computer systems (1), we briefly saw that a file like hello.c consists of ASCII characters and is a text file. Compiling it produces machine code, which is a sequence of bytes. Different machine types use different instruction encodings, and even the same processor running different operating systems follows different coding conventions, so the resulting binaries are not compatible. For example, compiling the following sum function on different platforms produces different byte sequences:

int sum(int x,int y){
    return x+y;
}

[Figure: machine-code bytes generated for the sum function on different platforms]

Common operations in C language

Boolean operations

The operators ~, &, |, and ^ correspond to the logical operations NOT, AND, OR, and EXCLUSIVE-OR, respectively; their truth tables are not covered here. The operations ~, &, and | form a Boolean algebra. Boolean algebra has many similarities with integer arithmetic. For example, & distributes over |, so a & (b | c) = (a & b) | (a & c). Unlike integer arithmetic, | also distributes over &, so a | (b & c) = (a | b) & (a | c).

If you consider ^, &, and ~ on bit vectors instead, you get a different mathematical structure, called a Boolean ring. It has a very useful property that often helps when solving algorithm practice problems: (a ^ b) ^ a = b.

Bit vectors can also represent finite sets: bit i is 1 exactly when element i is in the set. The Boolean operations | and & then correspond to set union and intersection, and ~ corresponds to set complement, so set operations become very convenient when sets are stored as bit vectors.

C supports these bitwise Boolean operations directly.

Logical operations

The logical operators ||, &&, and ! correspond to the OR, AND, and NOT operations of propositional logic. They differ from the bitwise Boolean operations in two ways:

First, logical operators treat any nonzero argument as true and 0 as false, and they return either 1 or 0. A bitwise operation behaves the same as its logical counterpart only when the arguments are restricted to 0 or 1.
Second, && and || short-circuit, unlike & and |: if the result of the expression can be determined by evaluating the first argument, the logical operator does not evaluate the second argument.

Shift operation

A shift moves a bit pattern to the left or to the right. For a left shift x << k, the k most significant bits are dropped and the right end is filled with k zeros. Right shifts x >> k generally come in two forms, logical and arithmetic:

A logical right shift fills the left end with k zeros. An arithmetic right shift fills the left end with k copies of the most significant bit. The figure below shows an example; the digits in italics are the values filled in by the left or right shift.
[Figure: examples of left shift, logical right shift, and arithmetic right shift]
The C standard does not specify which form of right shift is used for signed numbers, but almost all compiler/machine combinations use arithmetic right shifts for signed data; for unsigned data, right shifts are always logical. In Java, x >> k is always an arithmetic right shift and x >>> k is a logical right shift.

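Consider this question: a data type is w = 32 bits wide, so what happens when we shift by k >= 32 bits, for example a left shift by 40? The C standard leaves the result undefined. On many machines the shift instruction looks only at the low-order log2(w) bits of the shift amount, so the shift amount is effectively k mod w: with w = 32 and k = 40, the value is actually shifted by 40 % 32 = 8 bits. C programs cannot rely on this behavior, whereas Java requires shift amounts to be reduced in exactly this modular way.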

When an expression mixes addition or subtraction with shifts, watch the operator precedence: addition and subtraction bind more tightly than shifts. For example, 1 << 2 + 3 << 4 is evaluated as (1 << (2 + 3)) << 4, which is 512, not as (1 << 2) + (3 << 4), which is 52.

How to represent integers

We describe two different ways to encode integers with bits: one can only represent non-negative numbers, and the other can represent negative, zero, and positive numbers.

C supports many integer data types. Among the typical ranges shown in the figure below, long is the only type whose range depends on the machine: on 32-bit machines it is usually [-2147483648, 2147483647], and on most 64-bit machines it is [-9223372036854775808, 9223372036854775807].

[Figure: typical value ranges of the C integer data types on 32-bit and 64-bit machines]
We can see that the ranges are asymmetric: the negative range extends one value further than the positive range. Why is that? Negative values are represented in two's complement. Half of the bit patterns (those with the sign bit set to 1) represent negative numbers, and the other half (those with the sign bit set to 0) represent non-negative numbers. Because 0 is also a non-negative number, there is one fewer positive number than there are negative numbers.