computer arithmetic operations
Subword parallelism (skimmed; the PDF is about 170 pages)
Floating-point addition is not associative:
Because results are rounded to a fixed number of bits, the order in which operands are grouped can change the result. For this reason, the parallel-execution strategies that apply to integer data types do not carry over directly to floating-point data types.
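A minimal Python demonstration of non-associativity: with IEEE 754 doubles, adding a tiny result of one grouping loses information to rounding, so the two groupings below give different answers.

```python
# Demonstrate that floating-point addition is not associative.
# Adding 1.0 to a value of magnitude 1e308 loses the 1.0 to rounding,
# so the grouping of the operands changes the final result.
big = 1.0e308
tiny = -1.0e308

left = (big + tiny) + 1.0    # (x + -x) + 1.0  ->  0.0 + 1.0  ->  1.0
right = big + (tiny + 1.0)   # the 1.0 is absorbed by rounding  ->  0.0

print(left, right)           # the two groupings disagree
```

This is exactly why a compiler cannot freely reorder floating-point sums the way it can integer sums.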
processor
Pipelining
Pipelining is an implementation technique that overlaps the execution of multiple instructions.
Pipelining improves performance by increasing instruction throughput rather than reducing the execution time of individual instructions
Forwarding (also called bypassing)
Inter-instruction parallelism
Pipelining exploits the potential parallelism among instructions, which is called instruction-level parallelism.
There are two main ways to improve instruction-level parallelism:
Increase the number of stages in the pipeline
Increasing the number of functional units inside the pipeline so that multiple instructions can be issued per cycle is a technique called multiple issue
There are two main ways to implement a multi-issue processor:
If the decision about whether an instruction is issued is made at compile time, this is called static multiple issue.
If the decision is made by hardware during execution, it is called dynamic multiple issue.
Register renaming: the goal of register renaming is to eliminate dependences between instructions other than true data dependences. For example, in ld x30,0(x20); add x31,x31,x21; sd x31,8(x20), apart from the reuse of x31, these instructions are actually independent of each other. This situation is called an antidependence, or name dependence.
Antidependence: also called name dependence, an ordering forced by the reuse of a name. It is not a true data dependence between instructions
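A toy Python sketch of how renaming removes name dependences (illustrative only, not a real pipeline model): every write to an architectural register is assigned a fresh physical register, so later instructions that merely reuse the same name no longer conflict. The instruction encoding and register names here are assumptions for the example.

```python
# Sketch of register renaming: each destination register gets a fresh
# physical register, eliminating antidependences (name dependences).

def rename(instrs):
    """instrs: list of (opcode, dest_or_None, [source registers])."""
    mapping = {}           # architectural register -> current physical register
    next_phys = 0
    renamed = []
    for op, dest, srcs in instrs:
        # Sources read the current mapping (or the architectural name
        # if it has not yet been written in this window).
        new_srcs = [mapping.get(s, s) for s in srcs]
        new_dest = dest
        if dest is not None:
            new_dest = f"p{next_phys}"   # allocate a fresh physical register
            next_phys += 1
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed

prog = [
    ("ld",  "x30", ["x20"]),
    ("add", "x31", ["x31", "x21"]),
    ("sd",  None,  ["x31", "x20"]),
]
print(rename(prog))
```

After renaming, the `sd` reads the physical register produced by the `add`, while any later, unrelated writer of x31 would get a different physical register and could proceed independently.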
Dynamic multi-issue processor
Dynamic multiple-issue processors are also known as superscalar processors
Dynamically scheduled pipeline: the hardware selects which instructions to execute next and reorders them to avoid pipeline stalls. In such a processor, the pipeline is divided into three main units: an instruction fetch and issue unit, multiple functional units, and a commit unit
Acceleration: Instruction-level parallelism and matrix multiplication
Memory hierarchy
Temporal locality: if a data item is accessed, it is likely to be accessed again in the near future
Spatial locality: If a data item is accessed, data items adjacent to its address may also be accessed soon.
The data in the layer closest to the processor is a subset of the data in the farther layer. All data is stored in the furthest layer.
The smallest unit of information exchanged between two adjacent levels is called a block or line
Handle write operations
Write-through: a write policy in which writes always update both the cache and the next level of the memory hierarchy, keeping the two consistent.
Write-back: a write policy in which a write updates only the corresponding block in the cache; the modified block is written to the next level of the hierarchy only when it is replaced.
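A small Python sketch contrasting the two policies (a deliberately simplified single-block model; the class names and counters are illustrative, not a real cache). Write-through touches memory on every store; write-back defers the memory update until the dirty block is evicted.

```python
# Contrast write-through and write-back by counting memory writes.

class WriteThrough:
    def __init__(self):
        self.cache = {}
        self.mem_writes = 0
    def store(self, addr, val):
        self.cache[addr] = val
        self.mem_writes += 1          # memory is updated on every store

class WriteBack:
    def __init__(self):
        self.cache = {}
        self.dirty = set()
        self.mem_writes = 0
    def store(self, addr, val):
        self.cache[addr] = val
        self.dirty.add(addr)          # defer the memory update
    def evict(self, addr):
        if addr in self.dirty:
            self.mem_writes += 1      # one write-back covers many stores
            self.dirty.discard(addr)

wt, wb = WriteThrough(), WriteBack()
for v in range(4):                    # four stores to the same address
    wt.store(0x100, v)
    wb.store(0x100, v)
wb.evict(0x100)
print(wt.mem_writes, wb.mem_writes)   # write-through: 4, write-back: 1
```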
Performance evaluation and improvement of cache
Direct mapped: each data block has exactly one possible location in the cache
Fully associative: Data blocks can be stored anywhere in the cache. To find a given data block in the fully associative cache, all entries must be compared.
Set-associative cache: each data block can be stored in a fixed number of locations in the cache (at least 2). A set-associative cache with n possible locations for each block is called an n-way set-associative cache. An n-way set-associative cache consists of a number of sets, each containing n blocks.
Find data block in cache
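To locate a block, the address is split into a block offset, a set index, and a tag; the tag is compared against every way in the selected set. A minimal Python sketch, where the block size (64 bytes) and number of sets (128) are assumptions for illustration:

```python
# Split an address into tag / set index / block offset, as a
# set-associative cache does when looking up a block.

BLOCK_BITS = 6    # 64-byte blocks (assumed)
INDEX_BITS = 7    # 128 sets (assumed)

def split_address(addr):
    offset = addr & ((1 << BLOCK_BITS) - 1)           # byte within the block
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)  # which set
    tag = addr >> (BLOCK_BITS + INDEX_BITS)           # compared in every way
    return tag, index, offset

tag, index, offset = split_address(0x12345)
print(tag, index, offset)
```

In an n-way set-associative cache, only the n tags in set `index` are compared; in a fully associative cache the index field disappears and all tags are compared.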
Reliable memory hierarchy
The best way to increase reliability is redundancy
Definition of failure: the transition from service accomplishment to service interruption, i.e., the system's delivered behavior deviates from its specified behavior
Three ways to improve MTTF:
- Fault avoidance: preventing faults by construction
- Fault tolerance: using redundancy to allow service to continue despite faults
- Fault forecasting: predicting faults so components can be replaced before they fail
Hamming SEC/DED coding: corrects single-bit errors and detects double-bit errors
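A Python sketch of the single-error-correction half of this scheme, using the classic Hamming(7,4) layout (parity bits at positions 1, 2, 4); full SEC/DED adds one more overall parity bit so that double-bit errors are detected rather than miscorrected.

```python
# Hamming(7,4): encode 4 data bits into a 7-bit codeword and
# correct a single flipped bit via the parity-check syndrome.

def encode(d):
    """d: list of 4 data bits -> 7-bit codeword (positions 1..7)."""
    c = [0] * 8                       # index 0 unused, even parity
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]         # parity over positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]         # parity over positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]         # parity over positions with bit 2 set
    return c[1:]

def correct(code):
    """Recompute the parities; the syndrome names the bad bit position."""
    c = [0] + list(code)
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    syndrome = s1 + 2 * s2 + 4 * s4   # 0 means no error
    if syndrome:
        c[syndrome] ^= 1              # flip the erroneous bit back
    return c[1:], syndrome

word = encode([1, 0, 1, 1])
corrupted = list(word)
corrupted[4] ^= 1                     # inject an error at position 5
fixed, pos = correct(corrupted)
print(pos, fixed == word)
```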
virtual machine
The software that supports virtual machines is called a virtual machine monitor (VMM) or hypervisor. The underlying hardware platform is called the host, and its resources are shared by the guest virtual machines.
A hypervisor provides a software interface to guest software, isolates the state of each guest, and must protect itself from guest software, including guest operating systems.
Qualitative requirements:
- Except for performance-related behavior or limitations of fixed resources shared among multiple VMs, guest software should run on a virtual machine exactly as it would on native hardware
- Guest software must not be able to directly change the allocation of real system resources
To "virtualize" the processor, the VMM must control access to almost everything: privileged state, I/O, exceptions, and interrupts
virtual memory
Main memory can act as a cache for secondary storage, usually implemented with disks. This technique is called virtual memory.
In virtual memory, addresses are divided into a virtual page number and a page offset
Since we cannot know in advance when a page in memory will be replaced, the operating system usually reserves space on flash or disk for all of a process's pages when it creates the process. This region is called the swap space (the disk space reserved for the process's entire virtual address space). At the same time, it also creates a data structure recording where each virtual page is stored on disk.
Speed up address translation: TLB
Translation-lookaside buffer (TLB): a cache that holds recently used address translations, avoiding a page-table access on every memory reference
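A minimal software model of translation with a TLB in front of the page table (page size, VPN-to-PPN contents, and variable names are assumptions for illustration): a hit skips the page-table access entirely, a miss walks the table and fills the TLB.

```python
# Address translation with a TLB: VPN -> PPN, hit avoids the page table.

PAGE_BITS = 12                        # 4 KiB pages (assumed)

page_table = {0x2: 0xA1, 0x3: 0xB7}   # VPN -> PPN (illustrative contents)
tlb = {}                              # small cache of recent translations

def translate(vaddr):
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    if vpn in tlb:                    # TLB hit: no page-table access
        ppn, hit = tlb[vpn], True
    else:                             # TLB miss: walk the page table
        ppn, hit = page_table[vpn], False
        tlb[vpn] = ppn                # fill the TLB for next time
    return (ppn << PAGE_BITS) | offset, hit

paddr1, hit1 = translate(0x2345)      # first touch of the page: miss
paddr2, hit2 = translate(0x2678)      # same page again: TLB hit
print(hex(paddr1), hit1, hex(paddr2), hit2)
```

A real TLB miss where the page is also absent from memory would instead raise a page fault and transfer control to the operating system, as described below.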
Protection in virtual storage
The most important function of virtual memory is to allow multiple processes to share one main memory while providing memory protection for those processes and the operating system. The protection mechanism must ensure that although multiple processes share the same main memory, no process, whether malicious or merely errant, can write to the address space of another user process or of the operating system.
Context switch: changing the state inside the processor, and saving the state the current process will need on return, so that a different process can use the processor
Handling TLB misses and page faults
A TLB miss indicates one of two possibilities:
- The page is in memory, and only the missing TLB entry needs to be created
- The page is not in memory, and control must be transferred to the operating system to handle the page fault
Handling a TLB miss or a page fault uses the exception mechanism to interrupt the active process, transfer control to the operating system, and later resume execution of the interrupted process.
Once the operating system knows the virtual address that caused the page fault, it must complete the following three steps:
1. Look up the page-table entry for that virtual address to find the location of the referenced page on disk
2. Choose a physical page to replace; if the chosen page is dirty, it must first be written back to disk
3. Read the referenced page from disk into the chosen physical page
Summary
Virtual memory is the level of the memory hierarchy that manages caching between main memory and secondary storage. Virtual memory allows a single program to expand its address space beyond the limits of main memory, and it supports sharing main memory among multiple simultaneously active processes in a protected manner
A general framework for storage hierarchies
where blocks can be placed
How to find blocks
In a memory hierarchy, the choice among direct-mapped, set-associative, and fully associative placement depends on the trade-off between the miss penalty and the cost of implementing associativity, in both time and extra hardware.
Which block to replace on a cache miss
- Random: Randomly select candidate blocks, possibly using some hardware-assisted implementation
- Least Recently Used (LRU): The block being replaced is the block that has not been used for the longest time
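The LRU policy can be sketched in a few lines of Python using an ordered dictionary (a fully associative toy cache; the capacity and tags are illustrative): the entry touched least recently sits at the front and is evicted when the cache is full.

```python
# LRU replacement: evict the block that has gone unused the longest.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # least recently used entry first

    def access(self, tag):
        if tag in self.blocks:        # hit: mark as most recently used
            self.blocks.move_to_end(tag)
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)   # evict the LRU block
        self.blocks[tag] = True
        return "miss"

cache = LRUCache(2)
results = [cache.access(t) for t in ["A", "B", "A", "C", "B"]]
print(results)
```

Accessing C evicts B (A was touched more recently), so the final access to B misses again; true LRU bookkeeping like this gets expensive in hardware, which is why real caches often approximate it.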
How to implement write operations:
- Write-through: information is written both to the block in the cache and to the block in the lower level of the hierarchy (main memory, in the case of a cache)
- Write-back: information is written only to the block in the cache; a modified block is written to the lower level only when it is replaced
Advantages of write-back:
- The processor can write individual words at the rate the cache, rather than main memory, can accept them
- Multiple writes to the same block require only one write to the lower level of the hierarchy
- When a block is written back, the system can make effective use of high-bandwidth transfers, since the entire block is written
Advantages of write-through:
- Misses are simpler and cheaper, because they never require a block to be written back to the lower level
- It is easier to implement than write-back
The 3Cs (compulsory, capacity, and conflict misses): an intuitive model for understanding memory hierarchies
Control a simple cache using finite state automata
Finite-state machine: a sequential logic function consisting of a set of inputs and outputs, a next-state function (mapping the current state and the inputs to a new state), and an output function (mapping the current state, and possibly the inputs, to the outputs)
Next-state function: a combinational function that, given the inputs and the current state, determines the next state of the finite-state machine
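A tiny Python sketch of such a machine, with states loosely modeled on a simple blocking cache controller (the state and input names are assumptions for illustration): the next-state function is just a table lookup on (current state, input).

```python
# A finite-state machine driven by a next-state table. States resemble
# a simple cache controller: Idle, CompareTag, WriteBack, Allocate.

NEXT_STATE = {
    ("Idle",       "request"):    "CompareTag",
    ("CompareTag", "hit"):        "Idle",
    ("CompareTag", "miss_clean"): "Allocate",
    ("CompareTag", "miss_dirty"): "WriteBack",
    ("WriteBack",  "mem_ready"):  "Allocate",
    ("Allocate",   "mem_ready"):  "CompareTag",
}

def step(state, inp):
    """Next-state function: map (current state, input) to the next state."""
    return NEXT_STATE.get((state, inp), state)   # otherwise stay put

state = "Idle"
trace = []
for inp in ["request", "miss_dirty", "mem_ready", "mem_ready", "hit"]:
    state = step(state, inp)
    trace.append(state)
print(trace)
```

The trace walks a miss to a dirty block: the old block is written back, the new one is allocated, the tag comparison then hits, and the controller returns to Idle.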