Computer Organization and Design: The Hardware/Software Interface, study notes 1

computer arithmetic operations

Subword parallelism (skimmed); the PDF is about 170 pages

Floating-point addition is not associative:

Because of this, the parallel execution strategies that apply to integer data types cannot be applied directly to floating-point data types.
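A quick demonstration of the non-associativity (the constants are chosen deliberately to force rounding):

```python
# The classic counterexample: 1.0 is far below the rounding granularity
# of 1e20, so the order of the additions changes the result.
a, b, c = 1e20, -1e20, 1.0

left = (a + b) + c    # (1e20 - 1e20) + 1.0  ->  0.0 + 1.0  ->  1.0
right = a + (b + c)   # -1e20 + 1.0 rounds back to -1e20, so the sum is 0.0

print(left, right)    # 1.0 0.0
```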

processor

pipelining

Pipelining is an implementation technique that allows multiple instructions to execute in an overlapped fashion.

Pipelining improves performance by increasing instruction throughput, rather than by reducing the execution time of any individual instruction.
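As a rough sketch (the 5-stage pipeline and the 200 ps stage time are assumed numbers, not from the text), each instruction's latency is unchanged, but throughput approaches one instruction per stage time:

```python
# Back-of-the-envelope comparison: a 5-stage pipeline with a 200 ps stage
# time vs. an unpipelined datapath taking the full 1000 ps per instruction.
stages = 5
stage_time_ps = 200
n = 1_000_000                                   # number of instructions

unpipelined = n * stages * stage_time_ps        # one instruction at a time
pipelined = (stages + n - 1) * stage_time_ps    # fill once, then 1 per cycle

speedup = unpipelined / pipelined
print(f"speedup = {speedup:.5f}")               # approaches 5 for large n
```

Note that a single instruction still takes 1000 ps to finish; only the rate of completion improves.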

Forwarding (also called bypassing): resolving a data hazard by routing a result directly from an internal pipeline buffer to the unit that needs it, rather than waiting for it to be written back to the register file.

Parallelism among instructions

Pipelining exploits the potential parallelism among instructions; this parallelism is called instruction-level parallelism (ILP).

There are two main ways to improve instruction-level parallelism:

Increase the number of stages in the pipeline  

Replicate functional units inside the pipeline so that multiple instructions can be issued in every cycle, a technique called multiple issue.

There are two main ways to implement a multi-issue processor:

Static multiple issue: the decision about which instructions to issue together is made by the compiler at compile time.

Dynamic multiple issue: the decision about which instructions to issue is made by the hardware during execution.

Register renaming: the goal of register renaming is to eliminate the name dependences between instructions that are not true data dependences. For example, in the sequence ld x30, 0(x20); add x31, x31, x21; sd x31, 8(x20), the instructions are actually independent of one another apart from their reuse of x31. This situation is called an antidependence, or name dependence.

Antidependence (also called name dependence): an ordering forced by the reuse of a name rather than by a true data dependence between instructions.
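A toy sketch of the idea (the instruction tuples and the p-register names are invented for illustration): give every new destination write a fresh physical register, so reuse of an architectural name no longer orders independent instructions.

```python
# Toy register renaming: every write to an architectural register gets a
# fresh physical register, breaking WAR/WAW (name) dependences while
# preserving true data dependences through the source mapping.
def rename(instrs, num_arch_regs=32):
    mapping = {f"x{i}": f"p{i}" for i in range(num_arch_regs)}
    next_phys = num_arch_regs
    out = []
    for op, dest, srcs in instrs:
        new_srcs = [mapping[s] for s in srcs]   # read the current mapping first
        new_dest = None
        if dest is not None:
            new_dest = f"p{next_phys}"          # fresh name for every write
            mapping[dest] = new_dest
            next_phys += 1
        out.append((op, new_dest, new_srcs))
    return out

prog = [("ld",  "x30", ["x20"]),          # ld  x30, 0(x20)
        ("add", "x31", ["x31", "x21"]),   # add x31, x31, x21
        ("sd",  None,  ["x31", "x20"])]   # sd  x31, 8(x20)

renamed = rename(prog)
print(renamed)
# the add now writes p33 instead of reusing x31's old name p31, and the
# sd reads p33: the name dependence on x31 is gone
```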

Dynamic multi-issue processor

A dynamic multiple-issue processor is also known as a superscalar processor.

Dynamically scheduled pipeline: the hardware chooses which instructions execute next, reordering instructions to avoid pipeline stalls. In such a processor the pipeline is divided into three major units: an instruction fetch and issue unit, multiple functional units, and a commit unit.

Acceleration: Instruction-level parallelism and matrix multiplication

memory hierarchy

Temporal locality: if a data item is referenced, it is likely to be referenced again soon.

Spatial locality: if a data item is referenced, items with nearby addresses are likely to be referenced soon.
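A small illustration of spatial locality (Python is used only as a sketch; the cache itself is invisible at this level): a row-major 2-D array is best traversed in row order, because consecutive elements share cache blocks.

```python
# Both loops compute the same sum; only the access pattern differs.
# The row-order walk touches memory sequentially (good spatial locality);
# the column-order walk strides through memory (poor spatial locality).
N = 256
a = [[i * N + j for j in range(N)] for i in range(N)]

row_major = sum(a[i][j] for i in range(N) for j in range(N))  # sequential
col_major = sum(a[i][j] for j in range(N) for i in range(N))  # strided

print(row_major == col_major)  # True: same result, different locality
```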

The data held in a level closer to the processor is a subset of the data held in any level farther away; all of the data is stored in the farthest level.

The minimum unit of information that can be transferred between two adjacent levels is called a block (or a line).

Handling writes

Write-through: a write policy in which writes always update both the cache and the next lower level of the hierarchy, keeping the two consistent.

Write-back: a write policy in which a write updates only the block in the cache; the modified block is written to the next lower level only when it is replaced.
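A minimal model contrasting the two policies (the single-block cache is a deliberate simplification): count the writes that reach the next level of the hierarchy for a burst of writes to one cached block.

```python
# mem_writes counts traffic to the next level of the memory hierarchy.
class OneBlockCache:
    def __init__(self, policy):
        self.policy = policy        # "write-through" or "write-back"
        self.value = None
        self.dirty = False
        self.mem_writes = 0

    def write(self, value):
        self.value = value
        if self.policy == "write-through":
            self.mem_writes += 1    # update cache and next level together
        else:
            self.dirty = True       # write-back: only mark the block modified

    def evict(self):
        if self.policy == "write-back" and self.dirty:
            self.mem_writes += 1    # modified block written down on replacement
            self.dirty = False

wt, wb = OneBlockCache("write-through"), OneBlockCache("write-back")
for v in range(10):
    wt.write(v)
    wb.write(v)
wb.evict()
print(wt.mem_writes, wb.mem_writes)  # 10 1
```

Ten writes to the same block cost ten memory writes under write-through but only one under write-back.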

Measuring and improving cache performance

Direct mapped: each block has exactly one possible location in the cache.

Fully associative: a block can be placed anywhere in the cache; finding a given block requires comparing against every entry.

Set-associative cache: each block can be placed in a fixed number (at least 2) of locations in the cache. A cache in which each block has n possible locations is called an n-way set-associative cache; it consists of a number of sets, each containing n blocks.
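A sketch of where a block may be placed under each scheme, assuming a toy cache with 8 block frames and 2 ways (the sizes are chosen for illustration):

```python
# Candidate frames for a block under the three placement policies.
# block_addr is assumed to be a block-aligned block number.
def candidate_frames(block_addr, num_frames=8, ways=2):
    direct = [block_addr % num_frames]               # direct mapped: one frame
    num_sets = num_frames // ways
    s = block_addr % num_sets
    set_assoc = [s * ways + w for w in range(ways)]  # n-way: one set of n frames
    fully = list(range(num_frames))                  # fully associative: anywhere
    return direct, set_assoc, fully

d, s, f = candidate_frames(12)
print(d, s, f)  # [4] [0, 1] [0, 1, 2, 3, 4, 5, 6, 7]
```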

Finding a block in the cache

Reliable memory hierarchy

The best way to increase reliability is redundancy

Definition of failure: a system fails when its actual behavior deviates from its specified behavior.

Three ways to improve MTTF: fault avoidance, fault tolerance, and fault forecasting.

Hamming code that corrects single-bit errors and detects double-bit errors (SEC/DED)
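A sketch of the idea for 4 data bits, as an (8,4) extended Hamming code (the bit layout below is one common convention, not the only one): positions 1, 2, 4 hold parity bits, position 0 holds an overall parity bit that enables double-error detection.

```python
def encode(d):                       # d: four data bits
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]        # parity over positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]        # parity over positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]        # parity over positions with bit 2 set
    c[0] = sum(c[1:]) % 2            # overall parity for double detection
    return c

def decode(c):
    syndrome = sum(p for p in (1, 2, 4)
                   if sum(c[i] for i in range(1, 8) if i & p) % 2)
    overall = sum(c) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:               # single error; syndrome 0 means bit 0 itself
        c = c[:]
        c[syndrome] ^= 1             # the syndrome is the flipped bit's position
        status = "corrected"
    else:                            # syndrome set but overall parity consistent
        status = "double error detected"
    return status, [c[3], c[5], c[6], c[7]]

word = encode([1, 0, 1, 1])
flip1 = word[:]; flip1[6] ^= 1                  # one flipped bit: corrected
flip2 = word[:]; flip2[3] ^= 1; flip2[6] ^= 1   # two flipped bits: detected
print(decode(flip1))     # ('corrected', [1, 0, 1, 1])
print(decode(flip2)[0])  # double error detected
```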

virtual machines

The software that supports virtual machines is called a virtual machine monitor (VMM) or hypervisor. The underlying hardware platform is called the host, and its resources are shared among the guest virtual machines.

A hypervisor presents a software interface to the guest software, isolates the state of each guest, and must protect itself from guest software, including guest operating systems.

Qualitative requirements:

  • Except for performance-related behavior or limitations caused by fixed resources being shared among multiple VMs, guest software should behave on a virtual machine exactly as if it were running on native hardware.
  • Guest software must not be able to change the allocation of real system resources directly.

To "virtualize" the processor, the VMM must intercept (trap) nearly everything: privileged accesses, I/O, exceptions, and interrupts.

virtual memory

Main memory can act as a cache for secondary storage, which is usually implemented with disks; this technique is called virtual memory.

In virtual memory, an address is divided into a virtual page number and a page offset.
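For example, with 4 KiB pages (a common but assumed page size, giving 12 offset bits), the split is a shift and a mask:

```python
# Split a virtual address into virtual page number and page offset,
# assuming 4 KiB pages (12 offset bits).
PAGE_BITS = 12

def split(vaddr):
    vpn = vaddr >> PAGE_BITS              # upper bits: virtual page number
    offset = vaddr & ((1 << PAGE_BITS) - 1)  # lower bits: offset within page
    return vpn, offset

vpn, offset = split(0x12ABC)
print(hex(vpn), hex(offset))  # 0x12 0xabc
```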

Because we cannot know in advance when a page in memory will be replaced, the operating system usually reserves space on flash or disk for all of a process's pages when it creates the process. This space is called the swap space (the disk space reserved for the full virtual address space of the process). At the same time, the OS creates a data structure recording where each virtual page is stored on disk.

Speeding up address translation: the TLB

Translation-lookaside buffer (TLB): a cache that records recently used address translations, so that the page table does not have to be accessed on every memory reference.
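A toy model of the lookup path (the page-table contents and the mappings are invented for illustration): a TLB hit skips the page-table access entirely.

```python
# Toy TLB in front of a page table: on a miss we walk the page table
# and cache the translation, so repeated references hit in the TLB.
page_table = {0x12: 0x7F, 0x13: 0x80}   # vpn -> physical page number
tlb = {}
stats = {"hit": 0, "miss": 0}

def translate(vpn):
    if vpn in tlb:
        stats["hit"] += 1
        return tlb[vpn]
    stats["miss"] += 1                  # TLB miss: access the page table,
    ppn = page_table[vpn]               # then cache the mapping in the TLB
    tlb[vpn] = ppn
    return ppn

for v in (0x12, 0x12, 0x13, 0x12):
    translate(v)
print(stats)  # {'hit': 2, 'miss': 2}
```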

Protection in virtual memory

The most important function of virtual memory is to allow multiple processes to share one main memory while providing memory protection for those processes and for the operating system. The protection mechanism must ensure that although multiple processes share the same main memory, one process, whether by accident or by design, cannot write into the address space of another user process or of the operating system.

Context switch: changing the internal state of the processor so that a different process can use it, while saving the state needed to resume the current process later.

Handling TLB misses and page faults

A TLB miss indicates one of two possibilities:

  • The page is in memory, and only the missing TLB entry needs to be created.
  • The page is not in memory, and control must be transferred to the operating system to handle a page fault.

Handling a TLB miss or a page fault uses the exception mechanism to interrupt the active process, transfer control to the operating system, and later resume execution of the interrupted process.

Once the operating system knows the virtual address that caused the page fault, it must complete three steps:

  • Look up the page table entry for the virtual address to find the location of the referenced page in secondary storage.
  • Choose a physical page to replace; if the chosen page is dirty, it must be written back to secondary storage first.
  • Start a read to bring the referenced page from secondary storage into the chosen physical page.

Summary

Virtual memory is the level of the memory hierarchy that manages caching between main memory and secondary storage. It allows a single program to expand its address space beyond the limits of main memory, and it supports sharing of main memory among multiple simultaneously active processes in a protected way.

A common framework for memory hierarchies

where blocks can be placed

How to find blocks

In a memory hierarchy, the choice among direct-mapped, set-associative, and fully associative placement depends on the trade-off between the miss penalty and the cost of implementing associativity, in both time and extra hardware.

Which block should be replaced on a cache miss?

  • Random: candidate blocks are selected at random, possibly with some hardware assistance.
  • Least recently used (LRU): the block replaced is the one that has been unused for the longest time.
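A sketch of LRU for a single cache set (using Python's OrderedDict to track recency; real hardware typically only approximates LRU):

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with `ways` block frames and LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()          # front = least recently used

    def access(self, tag):
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # hit: now most recently used
            return "hit"
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)  # evict the LRU block
        self.blocks[tag] = True
        return "miss"

s = LRUSet(ways=2)
results = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
print(results)  # ['miss', 'miss', 'hit', 'miss', 'miss']
```

In the trace, C evicts B rather than A because A was touched more recently.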

How writes are handled:

  • Write-through: the information is written both to the block in the cache and to the block in the lower level of the hierarchy (main memory, for a cache).
  • Write-back: the information is written only to the block in the cache; the modified block is written to the lower level only when it is replaced.

Advantages of write-back:

  • The processor can write individual words at the rate that the cache, rather than the memory, can accept them.
  • Multiple writes within a block require only one write to the lower level of the hierarchy.
  • When a block is written back, the system can make effective use of a high-bandwidth transfer, since the whole block is written.

Advantages of write-through:

  • Misses are simpler and cheaper, because they never require a block to be written back to the lower level.
  • It is easier to implement than write-back.

The three Cs: an intuitive model for memory hierarchies, in which all cache misses are classified as compulsory (cold-start), capacity, or conflict (collision) misses

Controlling a simple cache with a finite-state machine

Finite-state machine: a sequential logic function consisting of a set of inputs and outputs, a next-state function that maps the current state and the inputs to a new state, and an output function that maps the current state (and possibly the inputs) to the asserted outputs.

Next-state function: a combinational function that, given the inputs and the current state, determines the next state of a finite-state machine.
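The next-state function can be written as a simple table. This sketch loosely follows a four-state cache controller (Idle, CompareTag, WriteBack, Allocate); the exact state and input names here are assumptions for illustration:

```python
# Next-state table for a simplified cache-controller FSM.
NEXT = {
    ("Idle",       "request"):    "CompareTag",
    ("CompareTag", "hit"):        "Idle",
    ("CompareTag", "miss_clean"): "Allocate",   # old block unmodified: just fetch
    ("CompareTag", "miss_dirty"): "WriteBack",  # old block modified: write it first
    ("WriteBack",  "done"):       "Allocate",
    ("Allocate",   "done"):       "CompareTag", # re-check the tag after the fill
}

def run(state, inputs):
    """Apply the next-state function once per input symbol."""
    for inp in inputs:
        state = NEXT[(state, inp)]
    return state

# A dirty miss walks Idle -> CompareTag -> WriteBack -> Allocate
# -> CompareTag -> Idle.
path = run("Idle", ["request", "miss_dirty", "done", "done", "hit"])
print(path)  # Idle
```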

Parallelism and memory hierarchies: cache coherence

Basic schemes for enforcing coherence: snooping protocols, in which every cache controller monitors a shared broadcast medium to track the state of shared blocks, and directory-based protocols, in which the sharing status of each block is kept in a single directory.


Origin: blog.csdn.net/zaizai1007/article/details/132775168