Understanding cache lines, tuple-at-a-time, and clock cycles

  • Overview

    Those who cannot remember the past are condemned to repeat it - George Santayana

  • Cache

    A cache is a hardware or software component that stores data so that future requests for that data can be served faster.
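
    As a minimal sketch of a software cache, the snippet below uses Python's standard `functools.lru_cache`. The function `slow_square` and the counter `calls` are hypothetical names chosen only for illustration; the point is that the second request for the same input is served from the cache instead of being recomputed.

```python
from functools import lru_cache

calls = 0  # counts how often the slow path actually runs


@lru_cache(maxsize=128)
def slow_square(n: int) -> int:
    """Stand-in for an expensive computation (hypothetical example)."""
    global calls
    calls += 1
    return n * n


first = slow_square(12)   # miss: computed, then stored in the cache
second = slow_square(12)  # hit: served from the cache, body not re-run
info = slow_square.cache_info()
print(first, second, calls, info.hits, info.misses)
```

    The body of `slow_square` runs only once, and `cache_info()` reports one miss followed by one hit.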

  • CPU cache

    A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory.

    When trying to read from or write to a location in the main memory, the processor checks whether the data from that location is already in the cache. If so, the processor will read from or write to the cache instead of the much slower main memory.

    Most modern desktop and server CPUs have at least three independent caches:

    • instruction cache to speed up executable instruction fetch
    • data cache to speed up data fetch and store
    • translation lookaside buffer (TLB) to speed up virtual-to-physical address translation for both executable instructions and data

  • Cache line

    Data is transferred between memory and cache in blocks of fixed size, called cache lines or cache blocks.

    When a cache line is copied from memory into the cache, a cache entry is created.

    The cache entry will include the copied data as well as the requested memory location (called a tag).
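
    The idea of lines and tags can be sketched with a toy direct-mapped cache. Everything here is an assumption made for illustration, not a description of any specific CPU: a 64-byte line, 64 cache entries, and the names `split_address`, `DirectMappedCache`, and `simulate`. The sketch stores only the tag per entry (no data) and counts hits and misses for two access patterns.

```python
from typing import Iterable, List, Optional, Tuple

LINE_SIZE = 64  # bytes per cache line (assumed for this sketch)
NUM_SETS = 64   # entries in the toy direct-mapped cache (assumed)


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a byte address into (tag, entry index, offset within the line)."""
    offset = addr % LINE_SIZE
    line_number = addr // LINE_SIZE
    index = line_number % NUM_SETS
    tag = line_number // NUM_SETS
    return tag, index, offset


class DirectMappedCache:
    """Tracks only the tag of each entry; the cached bytes are omitted."""

    def __init__(self) -> None:
        self.tags: List[Optional[int]] = [None] * NUM_SETS
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        tag, index, _ = split_address(addr)
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: the whole line is brought in and the entry's tag is replaced.
        self.tags[index] = tag
        self.misses += 1
        return False


def simulate(addresses: Iterable[int]) -> Tuple[int, int]:
    """Return (hits, misses) for a sequence of byte addresses."""
    cache = DirectMappedCache()
    for addr in addresses:
        cache.access(addr)
    return cache.hits, cache.misses


# Touch every byte of a 4 KiB region in order: one miss per line, then hits.
sequential = simulate(range(0, 4096))
# Touch one byte in each of 4096 different lines: every access misses.
strided = simulate(range(0, 4096 * LINE_SIZE, LINE_SIZE))
print(sequential, strided)
```

    Both patterns perform 4096 accesses, but the sequential one reuses each fetched line 63 more times, while the strided one pays for a new line on every access.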

  • Query Processing

    The vectorization model aims to improve on the efficiency of the materialization model by making better use of the CPU caches.

    There are three ways for a DBMS to execute a query plan:

    • Tuple-at-a-time: Each operator calls next on its child to get the next tuple to process. Also known as the Volcano iterator model.

    • Operator-at-a-time: Each operator materializes its entire output for its parent operator. It is ideal for in-memory OLTP engines.

    • Vector-at-a-time: Each operator calls next on its child to get the next batch of data to process.
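
    The three models above can be sketched for a simple scan-then-filter plan. This is an illustrative sketch, not how any particular DBMS implements them: the operator classes (`Scan`, `Filter`, `VectorScan`, `VectorFilter`), the helper `drain`, the sample table, and the batch size of 4 are all assumptions made for the example.

```python
from typing import Callable, List, Optional

Row = dict  # a tuple is modelled as a plain dict (assumption for this sketch)


class Scan:
    """Leaf operator: returns one row per next() call, None at end of input."""

    def __init__(self, table: List[Row]) -> None:
        self.table = table
        self.pos = 0

    def next(self) -> Optional[Row]:
        if self.pos >= len(self.table):
            return None
        row = self.table[self.pos]
        self.pos += 1
        return row


class Filter:
    """Tuple-at-a-time: pulls rows from its child until one qualifies."""

    def __init__(self, child: Scan, predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def next(self) -> Optional[Row]:
        row = self.child.next()
        while row is not None and not self.predicate(row):
            row = self.child.next()
        return row


class VectorScan:
    """Leaf operator: returns a batch of up to batch_size rows per call."""

    def __init__(self, table: List[Row], batch_size: int) -> None:
        self.table = table
        self.batch_size = batch_size
        self.pos = 0

    def next(self) -> Optional[List[Row]]:
        if self.pos >= len(self.table):
            return None
        batch = self.table[self.pos:self.pos + self.batch_size]
        self.pos += len(batch)
        return batch


class VectorFilter:
    """Vector-at-a-time: filters a whole batch per call, skipping empty ones."""

    def __init__(self, child: VectorScan, predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def next(self) -> Optional[List[Row]]:
        batch = self.child.next()
        while batch is not None:
            kept = [row for row in batch if self.predicate(row)]
            if kept:
                return kept
            batch = self.child.next()
        return None


def drain(op) -> list:
    """Call next() on the root operator until the stream is exhausted."""
    out = []
    item = op.next()
    while item is not None:
        out.append(item)
        item = op.next()
    return out


table = [{"id": i, "price": i * 10} for i in range(10)]
expensive = lambda row: row["price"] >= 50

tuple_result = drain(Filter(Scan(table), expensive))
vector_result = drain(VectorFilter(VectorScan(table, 4), expensive))
# Operator-at-a-time: the filter materializes its entire output in one step.
materialized = [row for row in table if expensive(row)]
```

    All three produce the same rows. The difference is how many next() calls cross operator boundaries: one per tuple, one per batch, or a single hand-off of the whole result.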

  • Volcano iterator model

    OceanBase: The Evolution of Database Query Engines (数据库查询引擎的进化之路)

    Volcano–An Extensible and Parallel Query Evaluation System

  • Clock cycles

    A clock signal oscillates between a high and a low state and is used like a metronome to coordinate actions of digital circuits.

    A clock cycle is a single electronic pulse of a CPU. During each cycle, a CPU can perform a basic step such as fetching an instruction or reading or writing a register; an access that has to go all the way to main memory takes many cycles, which is why caches matter.

    In physics, the frequency of a signal is measured in cycles per second, or "hertz". Similarly, the frequency of a processor is measured in clock cycles per second.

    The speed of a computer processor, or CPU, is determined by the clock cycle, which is the amount of time between two pulses of an oscillator.
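
    The relation between frequency and cycle time is simple arithmetic: the cycle time is the reciprocal of the frequency. The sketch below works it through with assumed numbers, a 3 GHz clock and a 100 ns main-memory latency, both chosen only for illustration; the function names are hypothetical.

```python
def cycle_time_ns(frequency_hz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / frequency_hz


def cycles_for(latency_ns: float, frequency_hz: float) -> float:
    """Number of clock cycles that elapse during the given latency."""
    return latency_ns * frequency_hz / 1e9


FREQ_HZ = 3.0e9            # 3 GHz clock (assumed)
MEMORY_LATENCY_NS = 100.0  # main-memory access latency (assumed)

period = cycle_time_ns(FREQ_HZ)                # roughly a third of a nanosecond
stall = cycles_for(MEMORY_LATENCY_NS, FREQ_HZ)  # cycles spent waiting on memory
print(f"{period:.3f} ns per cycle, {stall:.0f} cycles per memory access")
```

    Under these assumptions a single trip to main memory costs on the order of hundreds of cycles, which is the gap that CPU caches are meant to hide.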

Reposted from blog.csdn.net/The_Time_Runner/article/details/115333506