ICS mid-term review notes

Preface

Multiple choice

  • Be careful with options that use absolute wording
  • If you cannot decide, look for the traps hidden in the options

Assembly

  • When filling in blanks, pay attention to operator precedence and add parentheses wherever they are needed
  • When filling in machine code, remember the little-endian byte order
  • Pay attention to the instruction suffix and the register width!
  • When filling in a stack frame, pay attention to the address scale of the memory diagram

Processor

  • Pipeline warm-up (fill) cycles
  • Potential data hazards between loop iterations
  • When counting cycles, consider the first/last loop iteration separately
  • Pay attention to whether the condition is Cnd or !Cnd

Optimization

  • Watch for memory aliasing that creates a critical path
  • Data dependency graph

Cache

  • Pay attention to the order, the default is advanced first

  • Read the question very carefully!

  • Find enough blank space, list all the data neatly, and convert first to hexadecimal and then to binary

Chp2 data

1 integer

Positive overflow wraps around to a negative number, and negative overflow wraps around to a positive number

assert(0x80000000 > 0); // a hex literal that does not fit in int becomes unsigned first, then long; here it is unsigned
int x = 0x80000000;
assert(x < 0);

Classic example: INT_MIN == -INT_MIN
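
The wraparound behind INT_MIN == -INT_MIN can be sketched in C. Negating INT_MIN directly with -x is undefined behavior in C, so this minimal sketch does the arithmetic in unsigned, where wraparound is well defined. The helper names negate_wrapping and add_wrapping are made up for illustration, and the result assumes a two's-complement machine with a compiler (such as GCC or Clang) that wraps when converting back to int.

```c
#include <assert.h>
#include <limits.h>

/* Two's-complement negation carried out in unsigned arithmetic.
 * Converting the out-of-range unsigned result back to int is
 * implementation-defined; GCC and Clang wrap modulo 2^32. */
int negate_wrapping(int x) {
    return (int)(0u - (unsigned)x);
}

/* Addition with the same wraparound behavior as the hardware add. */
int add_wrapping(int a, int b) {
    return (int)((unsigned)a + (unsigned)b);
}
```

Under those assumptions, negate_wrapping(INT_MIN) returns INT_MIN again, and add_wrapping(INT_MAX, 1) lands on INT_MIN.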

2 Floating point

Space allocation (bits)
type    sign  exp  normalized exponent range  frac
float   1     8    -126 ~ 127                 23
double  1     11   -1022 ~ 1023               52
Special values
  1. NaN

    float nan1 = u2f(0xffc00000u);
    float nan2 = u2f(0xffc00001u);
    printf("%d %d\n", nan1==nan1, nan1!=nan1);
    printf("%d %d\n", nan1==nan2, nan1!=nan2);
    

    result:

    0 1
    0 1
    

    Simply put: NaN != NaN always holds

  2. Denormalized

  3. INF
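
The three special cases can be told apart from the exp and frac fields alone. The following is a minimal sketch for single precision; u2f here is a hypothetical stand-in for the helper used in the NaN snippet above, and classify is a name invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float. memcpy avoids
 * strict-aliasing problems. Assumes float is IEEE 754 single. */
float u2f(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Classify a single-precision bit pattern by its exp and frac fields. */
const char *classify(uint32_t bits) {
    uint32_t exp  = (bits >> 23) & 0xFF;   /* 8 exponent bits */
    uint32_t frac = bits & 0x7FFFFF;       /* 23 fraction bits */
    if (exp == 0xFF) return frac ? "NaN" : "INF";
    if (exp == 0)    return frac ? "denormalized" : "zero";
    return "normalized";
}
```

For example, classify(0x7f800000u) reports INF, and classify(0x00000001u) reports the smallest positive denormalized value.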

3 float and int

  1. float to int: the value is truncated (rounded toward zero)
  2. int/double to float: the value is rounded (to nearest even by default)
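
Both rules can be checked with two one-line helpers. This is a sketch with made-up function names; it assumes a 32-bit int, IEEE 754 float, and the default round-to-nearest-even mode. Converting a float that is out of range for int is undefined behavior and is not covered here.

```c
#include <assert.h>

/* float -> int: the fractional part is discarded (round toward zero). */
int float_to_int(float f) { return (int)f; }

/* int -> float: float has a 24-bit significand, so integers with a
 * magnitude above 2^24 may have to be rounded. */
float int_to_float(int i) { return (float)i; }
```

With these, float_to_int(-2.9f) gives -2 rather than -3, and int_to_float(16777217), which is 2^24 + 1, rounds to 16777216.0f.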

4 About TMIN

cout << (-2147483648 > 0) << endl;	// 0
cout << (0x80000000 > 0) << endl;	// 1

Explanation:

A hexadecimal literal takes the first type it fits in, in the order int -> unsigned -> long; a decimal literal uses the order int -> long

So the first one is treated as long and the second one is treated as unsigned
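
The types chosen for the two literals can be inspected with C11 _Generic. This is a sketch that assumes a 32-bit int and a C99-or-later compiler; TYPE_NAME and the two helper functions are names invented for this example. On an LP64 system the decimal literal comes out as long.

```c
#include <assert.h>
#include <string.h>

/* C11 _Generic picks a string based on the static type of x. */
#define TYPE_NAME(x) _Generic((x),            \
    int: "int",                               \
    unsigned int: "unsigned int",             \
    long: "long",                             \
    unsigned long: "unsigned long",           \
    long long: "long long",                   \
    unsigned long long: "unsigned long long", \
    default: "other")

/* 2147483648 does not fit in int, so the decimal literal gets a wider
 * signed type; the unary minus is applied afterwards. */
int decimal_tmin_is_positive(void) { return -2147483648 > 0; }

/* 0x80000000 fits in unsigned int, so the comparison is unsigned. */
int hex_tmin_is_positive(void) { return 0x80000000 > 0; }
```

TYPE_NAME(0x80000000) yields "unsigned int", which is why the second comparison is true.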

5 Big-endian and little-endian

Big-endian: the most significant byte comes first (at the lowest address)

Little-endian: the least significant byte comes first (at the lowest address)
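
When writing out machine code by hand, the little-endian layout of an immediate can be reproduced explicitly. The sketch below uses hypothetical helper names: store_le32 always produces little-endian bytes, while host_is_little_endian only reports what the current machine does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write v into out[] least significant byte first, the order x86-64
 * uses for immediates and displacements in machine code. */
void store_le32(uint32_t v, unsigned char out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Report whether the machine running this code is little-endian by
 * looking at the byte stored at the lowest address of a known value. */
int host_is_little_endian(void) {
    uint32_t probe = 0x01234567;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x67;
}
```

For 0x01234567, store_le32 fills the buffer with 67 45 23 01, lowest address first.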

6 other

Pay attention to operator precedence

In a type conversion that changes both size and signedness, change the size first, then perform the signed/unsigned conversion
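
The order matters when a negative short is converted to unsigned. The following sketch spells out both orders with explicit intermediate casts; the function names are made up for this example.

```c
#include <assert.h>
#include <limits.h>

/* Size first: widen to int with sign extension, then reinterpret
 * as unsigned. This is the order C actually uses. */
unsigned size_then_sign(short s) { return (unsigned)(int)s; }

/* The opposite order, written out for contrast: reinterpret as
 * unsigned short first, then widen with zero extension. */
unsigned sign_then_size(short s) { return (unsigned)(unsigned short)s; }

/* What the compiler does for a plain cast. */
unsigned plain_cast(short s) { return (unsigned)s; }
```

For s = -1, the plain cast gives UINT_MAX, matching size_then_sign, whereas the other order would give USHRT_MAX.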

Chp3 Assembly

1 Basic

Format       Value
$Imm         Imm
Imm          M[Imm]
(r1,r2)      M[R[r1]+R[r2]]
Imm(r1,r2)   M[Imm+R[r1]+R[r2]]
(,r1,s)      M[s*R[r1]]
Imm(,r1,s)   M[Imm+s*R[r1]]

The scale factor s must be 1, 2, 4, or 8
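
All of these formats are special cases of the general form Imm(rb,ri,s), whose address is Imm + R[rb] + s*R[ri]. A minimal sketch of that computation follows; effective_address is a hypothetical helper, and a missing base or index is passed as 0.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of Imm(rb,ri,s): Imm + R[rb] + s*R[ri].
 * Returns 1 and stores the address on success, or 0 if the scale
 * factor is not one of 1, 2, 4, 8. */
int effective_address(int64_t imm, uint64_t base, uint64_t index,
                      unsigned scale, uint64_t *addr) {
    if (scale != 1 && scale != 2 && scale != 4 && scale != 8)
        return 0;
    *addr = (uint64_t)imm + base + scale * index;
    return 1;
}
```

For example, 8(%rax,%rcx,4) with R[%rax] = 0x100 and R[%rcx] = 3 refers to address 0x114.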

Registers

%r12 ~ %r15, like %rbx and %rbp, are callee-saved

%r8 and %r9 are the 5th and 6th parameters respectively

Suffixes

q, l, w, b (8, 4, 2, and 1 bytes)

lea can only be used with q

Operands

The operand of a unary arithmetic/logical operation can be in memory

Either operand of a binary arithmetic/logical operation can be in memory (but not both at once); the first operand can also be an immediate value

For MOVS/MOVZ and CMOV instructions, the first operand can be in memory, but the second operand must be a register

The operands of CMOV instructions cannot be single bytes; the operand length is given by the register name rather than by an instruction suffix (compare with the meaning of the suffixes in movb and movl!)

Condition code
Instruction          Condition codes
leaq                 Does not change any condition codes
inc/dec              Set ZF and OF, but leave CF unchanged
Logical operations   Set CF and OF to 0
Shift operations     Set CF to the last bit shifted out and OF to 0

Instruction   Condition   Remarks
setl          SF^OF       Signed less than
setb          CF          Unsigned below
sets          SF          Result is negative
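
The reason setl needs SF^OF rather than SF alone shows up when the subtraction overflows. The sketch below models the flags produced by a 32-bit cmp in plain C; the Flags struct and all function names are invented for this example, and it is a model of the behavior, not a description of how the hardware is built.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int cf, zf, sf, of; } Flags;

/* Flags produced by "cmpl b, a", which computes a - b and discards it. */
Flags cmp_flags(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;
    Flags f;
    f.cf = ua < ub;                                 /* unsigned borrow */
    f.zf = diff == 0;
    f.sf = (int)(diff >> 31);                       /* sign bit of the result */
    f.of = (int)(((ua ^ ub) & (ua ^ diff)) >> 31);  /* signed overflow */
    return f;
}

int setl(Flags f) { return f.sf ^ f.of; }  /* signed a < b */
int setb(Flags f) { return f.cf; }         /* unsigned a < b */
int sets(Flags f) { return f.sf; }         /* result is negative */
```

Comparing INT32_MIN with 1 overflows: the result looks positive (SF = 0), yet OF = 1, so setl still reports that INT32_MIN is less than 1.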

2 Rare assembly instructions

Data transfer related
Instruction   Effect                                         Remarks
cltq          Sign-extend %eax to %rax                       convert long to quad
movabsq       Move an immediate of up to 64 bits             The destination must be a register; an ordinary movq can only take a 32-bit immediate
Arithmetic related

Shift operations: the shift amount k can be held in %cl (a single byte); only its low m bits are used, where 2^m is the operand width in bits

sar: with only one operand, k is 1
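
The masking of the shift amount can be mimicked in C, where shifting by the full width or more is undefined and the mask has to be written explicitly. This is a sketch with hypothetical helper names; right-shifting a negative value is implementation-defined in C, and the comment notes the assumption that the compiler uses an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* shlq %cl, x: for a 64-bit operand only the low 6 bits of %cl count. */
uint64_t shlq_cl(uint64_t x, uint8_t cl) { return x << (cl & 63); }

/* sarl %cl, x: for a 32-bit operand only the low 5 bits of %cl count.
 * Assumes the compiler shifts negative values arithmetically, as GCC
 * and Clang do, which matches sar. */
int32_t sarl_cl(int32_t x, uint8_t cl) { return x >> (cl & 31); }
```

So a count of 65 in %cl shifts a 64-bit operand by 1, and a count of 33 shifts a 32-bit operand by 1.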

Instruction   Effect                          Remarks
(i)mulq S     rdx:rax ← S * rax               i means signed; otherwise unsigned
cqto          rdx:rax ← sign-extended rax     convert quad to oct; use cqto before idivq, clear rdx before divq
(i)divq S     rax ← rdx:rax / S               Note that S is the divisor
              rdx ← rdx:rax % S
Jump related

jmp *%rax

jmp *(%rax)

Process related

leave is equivalent to:

movq %rbp, %rsp
popq %rbp

Followed by ret

3 Logic

See review materials P6~7

Pay attention to switch: in jmp *JUMP_LIST(,index,8), the * is important!

4 Procedures

Stack frame layout

  • Saved %rbp (optional)
  • Saved registers (callee-saved)
  • Local variables (not necessarily 8-byte aligned; alignment follows the same rules as for structs)
  • Arguments 7 and beyond (all 8-byte aligned; on entry to the callee, argument k is at %rsp+8*(k-6))
  • Return address RA

call and jmp generally use PC-relative encoding

5 floating point

%xmm0: return value

%xmm0 ~ %xmm7: the first 8 floating-point arguments

Chp4 Y86 processor

There is no CF condition code

There is no %r15 register

call uses absolute addressing only, not PC-relative addressing

pushq/popq %rsp behavior: both work with the original value of %rsp

Logic gates

The gate with the rounded end is AND, and the one with the pointed end is OR (exactly the reverse of what the letter shapes suggest)

CISC vs RISC

Aspect                        CISC                                              RISC
Instruction latency           Varies                                            All short
Encoding length               Variable length                                   Fixed length (4 bytes)
Memory addressing             Many formats                                      Base address plus offset only
Memory access                 Arithmetic/logical operations can access memory   load/store architecture
Arithmetic/logical operands   Can be in memory                                  Registers only
Degree of abstraction         Abstract                                          Implementation details are visible
Condition codes               Yes                                               No
Procedures                    Stack-intensive                                   Register-intensive

Concepts

Latency: the time it takes to process one instruction from start to finish

Throughput: the total number of instructions processed per unit time (unit: GIPS, i.e. instructions per ns)

$$\text{throughput} = \dfrac{1}{\text{maximum stage delay} + \text{register delay (ps)}} \times 1000$$
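
As a quick check of the throughput formula, here is a small sketch with delays given in picoseconds. The function names are made up for this example, and the latency helper assumes every stage is clocked at the period set by the slowest stage.

```c
#include <assert.h>

/* Throughput in GIPS for a pipeline whose clock period is the slowest
 * stage delay plus the pipeline register delay, both in picoseconds. */
double throughput_gips(double max_stage_delay_ps, double register_delay_ps) {
    return 1000.0 / (max_stage_delay_ps + register_delay_ps);
}

/* Latency in picoseconds: an instruction passes through every stage,
 * and each stage takes one full clock period. */
double latency_ps(int stages, double max_stage_delay_ps,
                  double register_delay_ps) {
    return stages * (max_stage_delay_ps + register_delay_ps);
}
```

With a 100 ps slowest stage and a 20 ps register delay, the throughput is about 8.33 GIPS; for three such stages the latency is 360 ps.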

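Other

  • Remember that the pipeline must be filled before it runs at full rate; a 5-stage pipeline takes 4 cycles to fill
  • Loops can introduce potential data hazards
  • When counting cycles, consider the first/last loop iteration separately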
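
The cycle count for a straight run of instructions can be sketched as fill cycles plus one cycle per instruction plus any bubbles. The function below is a hypothetical helper; the number of bubbles depends on the hazards in the specific question and must be counted by hand.

```c
#include <assert.h>

/* Cycles until the last of n instructions leaves a k-stage pipeline:
 * k-1 cycles to fill the pipeline, then one instruction completes per
 * cycle, plus any stall/bubble cycles inserted for hazards. */
long pipeline_cycles(int stages, long instructions, long bubbles) {
    if (instructions <= 0)
        return 0;
    return (stages - 1) + instructions + bubbles;
}
```

For a 5-stage pipeline, 10 instructions with no hazards take 14 cycles; 3 bubbles raise that to 17.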

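Chp5 Optimization

A higher loop unrolling factor is not always better; consider capacity misses (registers count too)

Chp6 Cache

RAM

       Transistors per bit   Access time   Cost    Application   Sensitive
SRAM   6                     x1            x1000   Cache         No
DRAM   1                     x10           x1      Main memory   Yes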

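Conventional DRAM

Supercell: made up of $\omega$ cells

A DRAM chip has $d = r \times c$ supercells

To access DRAM contents, a RAS request is sent first, and the DRAM copies the corresponding row into a buffer. Then a CAS request is sent, and the corresponding $\omega$ bits are copied out. RAS and CAS share the same pins. The address is sent in two steps to reduce the number of pins on the chip

In total, $\omega + \max(\log_2 r, \log_2 c)$ pins are needed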
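
The pin count can be worked out with a few lines of C. This sketch assumes the count is w data pins plus max(log2 r, log2 c) shared address pins, and it ignores control pins; the helper names are made up for this example.

```c
#include <assert.h>

/* Smallest m with 2^m >= n, i.e. the address bits needed to select
 * one of n rows or columns (capped at 31 bits). */
unsigned address_bits(unsigned n) {
    unsigned bits = 0;
    while (bits < 31 && (1u << bits) < n)
        bits++;
    return bits;
}

/* Pins for a DRAM with r rows, c columns and w bits per supercell:
 * w data pins plus max(log2 r, log2 c) shared address pins. */
unsigned dram_pins(unsigned w, unsigned r, unsigned c) {
    unsigned row_bits = address_bits(r);
    unsigned col_bits = address_bits(c);
    return w + (row_bits > col_bits ? row_bits : col_bits);
}
```

A 16 x 8 DRAM organized as 4 rows by 4 columns needs 8 + 2 = 10 pins under this count.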

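Enhanced DRAM

Name        Features
FPM         Accesses to the same row can be served directly from the buffer, so only one RAS request is needed
EDO         An enhancement of FPM
SDRAM       Faster than asynchronous DRAM
DDR SDRAM   Twice the speed of SDRAM
VRAM        Optimized for graphics systems

ROM

         Erase/write cycles   Application
PROM     1
EPROM    1000
EEPROM   10^5                 Flash memory, SSD

Firmware: programs stored in ROM, such as the BIOS and driver programs

BUS

Bus transactions: read transactions and write transactions

Buses: system bus, memory bus

DISK

Note the difference between the units GB and GiB

Platter -> surface -> track -> sector

Cylinder: the number of cylinders equals the number of tracks per surface

Computing disk capacity: remember that each platter has two surfaces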

Computing access time (in ms):

$$T_{access} = T_{avg\ seek} + \dfrac{60 \times 1000}{RPM} \times \left(\dfrac{1}{2} + \dfrac{1}{\text{average sectors per track}}\right)$$

Seek time and rotational latency are roughly equal, so seek time * 2 can be used to estimate the access time
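
The access time formula splits into seek time, half a rotation on average, and the time for one sector to pass under the head. The following sketch uses a made-up function name and takes the seek time in milliseconds.

```c
#include <assert.h>

/* Average time in ms to access one sector: average seek time, plus
 * half a rotation on average, plus the transfer time of one sector. */
double disk_access_ms(double avg_seek_ms, double rpm,
                      double avg_sectors_per_track) {
    double ms_per_rotation = 60.0 * 1000.0 / rpm;
    double avg_rotation_ms = ms_per_rotation / 2.0;
    double transfer_ms = ms_per_rotation / avg_sectors_per_track;
    return avg_seek_ms + avg_rotation_ms + transfer_ms;
}
```

For a 7200 RPM disk with a 9 ms average seek and 400 sectors per track, this gives roughly 13.19 ms, almost all of it seek time and rotational latency.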

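The disk controller maintains the mapping between the physical disk and the logical disk

Concept: memory-mapped I/O

Concept: DMA (direct memory access)

SSD

Reads are faster than writes

Data is read and written in units of pages

A page can be written only after it has been erased

Why writes are slow

  • Erasing is slow, on the order of 1 ms (reads are on the order of 50 us)
  • If the block already contains data, that data must be copied first

Cache

One way contains many lines

Kinds of cache misses

  • Cold miss / compulsory miss
  • Conflict miss
  • Capacity miss

Computing cache size

$$\text{cache size} = \text{data size} + (\text{valid bit size} + \text{tag size}) \times \text{number of blocks}$$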
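
The cache size formula can be evaluated from the usual parameters: S sets, E lines per set, B bytes per block, and m address bits. This is a sketch under the assumption that each line carries one valid bit and a tag of m - log2(S) - log2(B) bits, with no dirty or replacement bits; the function names are invented for this example.

```c
#include <assert.h>

/* log2 of n, assuming n is a power of two. */
static unsigned log2_exact(unsigned long n) {
    unsigned bits = 0;
    while ((1ul << bits) < n)
        bits++;
    return bits;
}

/* Total cache size in bits: data bits plus one valid bit and one tag
 * per line. S = sets, E = lines per set, B = bytes per block,
 * m = address bits. */
unsigned long cache_total_bits(unsigned m, unsigned long S,
                               unsigned long E, unsigned long B) {
    unsigned tag_bits = m - log2_exact(S) - log2_exact(B);
    unsigned long lines = S * E;
    unsigned long data_bits = lines * B * 8;
    return data_bits + (1 + tag_bits) * lines;
}
```

With m = 32, S = 64, E = 4 and B = 64, the tag is 20 bits, so the 256 lines hold 131072 data bits plus 5376 bits of overhead, 136448 bits in total.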

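Effects of cache parameters

  • Larger cache
    • Hit time: increases (make sure you understand why)
    • Hit rate: increases (fewer capacity misses)
    • Miss penalty: no direct effect
    • Fraction of useful data: no direct effect
  • Larger blocks
    • Hit time: no direct effect
    • Hit rate: larger blocks improve spatial locality, but the smaller number of lines hurts temporal locality
    • Miss penalty: increases (copying cost)
    • Fraction of useful data: increases
  • Higher associativity
    • Hit time: increases (make sure you understand why)
    • Hit rate: fewer conflict misses and less thrashing, but the effect of capacity misses may be amplified
    • Miss penalty: increases (cost of choosing a victim line)
    • Fraction of useful data: decreases (the tag gets longer)

The lower a level sits in the memory hierarchy, the less it can tolerate misses, so it is worth giving up a little hit time; lower levels therefore choose higher associativity

Effects of the write policy

  • Write-through & no-write-allocate
  • Write-back & write-allocate
    • Reduces bus traffic but increases complexity (needs a dirty bit)
    • Used mostly at the lower levels, which cannot tolerate repeated misses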

Origin blog.csdn.net/w112348/article/details/109742106