Numerical data is stored in the computer

reference:

Yuan MOOC- spring: a computer-based system (a) the second week

Fixed-point and floating-point numbers

Integers, floating point numbers stored in the computer


Numerical data typically includes an integer and a real number . Numerical data representing three major concern in the computer: binary notation, fixed / floating-point representation, a binary coding.

Binary notation:

  • Decimal: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
  • Binary: 0, 1
  • Octal: 0, 1, 2, 3, 4, 5, 6, 7
  • Hex: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F

Binary conversion (binary R): the integer part, in addition modulo R, right lower left; fractional part, rounding by R, the lower left and right.

Easy way:

835=512+256+64+2+1=>1101000011

= 0.6875 + 0.125 + 0.5 = 0.0625> 0.1011 (how to represent the decimal point in the computer)

Fixed, floating point representation (solving the decimal problem)

Fixed points : the position of the decimal point in the machine agreed all fixed data, the fixed point is typically expressed as a decimal fraction (fixed decimal point before the most significant bit portion value) or pure integer (values in decimal fixed rearmost portion). Decimal point to represent the mantissa part of the floating-point number; fixed-point integer used to represent an integer. Integer is divided into a signed integer and unsigned integers . It determines the presence or absence of the sign bit indicates the range of values.

Float : using the index of the effect of floating-point, it will be expressed as a real number X \ (X = (-. 1) ^ S \ Times M \ E ^ R & lt Times \) . S 0 or 1, X is decided symbol ; M is a binary decimal point, called the X mantissa (mantissa); E is a fixed point binary integer, called the X stage or exponential (exponent); R is a radix (base ), take the computer 2 (formerly 4 or 16).

Floating-point in the computer that generally follows the IEEE 754 standard:

Using single-precision (32bit): 1 sign bit, eight exponent, 23 mantissa;

Progress using bis (64 bit): 1 sign bit, 11 exponent, 52 mantissa.

As mentioned earlier, the mantissa portion of floating decimal point to represent the use, floating-point and fixed-point integer order frameshift expressed (i.e., plus offset).

  • Fixed-point integer, fixed point decimal
  • Float: it can be a decimal point and fixed-point integer to represent a

Binary-coded (to solve the sign problem)

The original code will be described below, complement, and the inverted shift (less).

Original code

The first one is the sign bit, the positive number represented by 0, 1 indicates a negative sign, the other bits directly converted from the decimal number to a binary number. The following table is a correspondence relationship with the 4-bit binary numbers of decimal numbers.

Decimal Binary Decimal Binary
0 0000 -0 1000
1 0001 -1 1001
2 0010 -2 1010
3 0011 -3 1011
4 0100 -4 1100
5 0101 -5 1101
6 0110 -6 1110
7 0111 -7 1111

problem:

  • It can be found not unique represent 0, + 0 ( 0 000) and -0 ( . 1 000) represents not the same;

  • +/- operation mode is not uniform: \ (1 + 2 = 0001 + 0010 = 0011 \) (right); \ (1-2 = 0001-0010 = 0001 + 1010 = 1011 \) (error), the need for sign bit processing

Complement

A negative integer corresponding to the complement of you negated, plus one end.

Deformation complement (4's comlement): double symbol, for storing intermediate results may overflow.

Decimal Complement Deformation complement Decimal Binary negated Complement Deformation complement
0 0000 00000 -0 1111 0000 00000
1 0001 00001 -1 1110 1111 11111
2 0010 00010 -2 1101 1110 11110
3 0011 00011 -3 1100 1101 11101
4 0100 00100 -4 1011 1100 11100
5 0101 00101 -5 1010 1101 11101
6 0110 00110 -6 1001 1110 11110
7 0111 00111 -7 1000 1111 11111
8 1000 01000 -8 0111 1000 11000

Note decimal number such as 8, occupies too large value of the sign bit (overflow), with complement 4 can not be represented, but for complement can be modified with retention of the sign bit and the highest numerical value.

We can see from the table, +0 -0 complement and represents a unique, to analyze problems subtraction below.

Prior to this, first look at an example of modulo operation.

Timepiece hour hand points to 10 points, in order to position it to 6:00, there are two methods: dial 4 down grid (10-4 = 6) or cis dial 8 (10 + 8 = 18 18mod12 = 6, the mold used here operation).

Another example: a four decimal modulo operation system, with only the addition, calculated 9828-1928

\ (9828-1928 \\ = 9828+ (10 ^ 4-1928) (negative inverted) \\ = 9828 + 8072 \\ = 17900 \ mod 10 ^ 4 (high discard 1) = 7900 \\ \)

Computer arithmetic modulo operation system is such (because the computer operator is only a finite bit, assuming n bits, the result of the operation can retain lower n bits).

Calculated as:

\(0111\;1111-0100\;0000\\=0111\;1111+(2^8-0100\;0000)\\=0111\;1111+1100\;0000\\=10011\;1111(\mod 2^8)\\=0011\;1111\)

Namely the use of complement subtraction can be processed as a summation.

Frameshift

The offset value plus a constant (Excess / bias). When the number of coded bits is n, bias usually takes 2 ^ (n-1) or 2 ^ (n-1) -1 (such as IEEE 754). For example, n = 4 when:

Decimal Binary frameshift
-8(+8) 0000
... ....
0(+8) 1000
... ...
+7(+8) 1111

Noting: 0 frameshift representation unique. When the bias is \ (2 ^ {n-1 } \) , the complement and shift only the first one is not the same.

Order is represented by floating-point shift . why? When the floating-point subtraction Dui facilitate operation order (size comparison). Such as: \ (1.01 \ Times 2 ^ {-} + 1.11. 1 \ 2 ^ Times. 3 \) , Dui order when -1 is the complement of the complement 111,3 011, this time comparing the two Size: \ (111 <011? \) , can not be directly compared. Plus a constant and the order are, i.e. 4 = -1 + 3 + 4 011,3 complement is the complement of 7 = 111, then comparison: 011 <111, the result is directly established.

Guess you like

Origin www.cnblogs.com/xfy-learning/p/11519784.html