draft document
This is a draft document and will undergo significant changes before the official release. Don't rely on its current content.
1. Scope
This document specifies the Alliance for Open Media (AOMedia) AV1 codec bitstream format and decoding process.
2. Terms and Definitions
AC coefficient
Any transform coefficient whose frequency-domain index is non-zero in at least one dimension
Altref
(Alternative reference frame) A frame that can be used as a reference in inter-frame coding
Base layer
The layer with both spatial_id and temporal_id equal to 0
Bitstream
The sequence of bits that forms the representation of a sequence of coded frames
Bit string
An ordered string of a finite number of bits, in which the leftmost bit is the most significant bit (MSB) and the rightmost bit is the least significant bit (LSB)
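A minimal Python sketch of reading an n-bit unsigned value MSB-first from a byte buffer, illustrating the bit ordering above. The function name `read_bits` and its `(value, new_position)` return convention are illustrative assumptions, not syntax defined by this document.

```python
def read_bits(data: bytes, bit_pos: int, n: int) -> tuple[int, int]:
    """Read n bits starting at bit_pos, most significant bit first.

    Returns the unsigned value and the bit position after the last bit read.
    """
    value = 0
    for _ in range(n):
        byte = data[bit_pos >> 3]                  # byte containing the current bit
        bit = (byte >> (7 - (bit_pos & 7))) & 1    # bit offset 0 within a byte is its MSB
        value = (value << 1) | bit                 # earlier bits become more significant
        bit_pos += 1
    return value, bit_pos


data = bytes([0b10110000])
print(read_bits(data, 0, 3))  # (5, 3): the bits 1, 0, 1
```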
Block
A square or rectangular area of pixels, consisting of one luma matrix and two chroma matrices
Block scan
The order in which the quantized coefficients of a transform block are coded
Byte
A string of 8 bits
Byte alignment
A bit is byte aligned if its position, measured in bits from the start of the bitstream, is an integer multiple of 8
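The alignment rule reduces to a modulo check on the bit position. A short Python sketch; both helper names are hypothetical.

```python
def is_byte_aligned(bit_pos: int) -> bool:
    # Byte aligned: the offset in bits from the start of the bitstream is a multiple of 8.
    return bit_pos % 8 == 0


def next_byte_aligned(bit_pos: int) -> int:
    # Smallest byte-aligned position >= bit_pos, e.g. after skipping padding bits.
    return (bit_pos + 7) // 8 * 8
```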
CDEF
(Constrained Directional Enhancement Filter) An adaptive in-loop filter that filters each block along its detected edge direction
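The "constrained" part of CDEF limits how far any single neighboring sample can pull the filtered value. The sketch below is modeled on the `constrain()` helper of the released AV1 specification as recalled here. Treat the names and the exact arithmetic as assumptions to check against the normative text. Direction search and the filter taps are not shown.

```python
def floor_log2(x: int) -> int:
    # Index of the highest set bit; x must be positive.
    return x.bit_length() - 1


def constrain(diff: int, threshold: int, damping: int) -> int:
    """Shrink a neighbor difference; large differences (likely real edges) go to 0."""
    if threshold == 0:
        return 0
    damping_adj = max(0, damping - floor_log2(threshold))
    # The allowed magnitude falls as |diff| grows and is never negative.
    val = min(abs(diff), max(0, threshold - (abs(diff) >> damping_adj)))
    return val if diff >= 0 else -val
```

For example, `constrain(2, 4, 3)` returns 2, while `constrain(-10, 4, 3)` returns 0. The large difference is suppressed entirely, which keeps the filter from smearing across genuine edges.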
CDF
(Cumulative distribution function) The probability that a symbol has a value less than or equal to a given level, multiplied by 32768
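Following the definition above, a table of symbol counts can be turned into a CDF scaled by 32768. A Python sketch; the count-based input and the round-to-nearest choice are assumptions, and a real decoder may store its CDF tables in a different layout.

```python
def build_cdf(counts: list[int]) -> list[int]:
    """Return cdf[i] = round(32768 * P(symbol <= i)) from per-symbol counts."""
    total = sum(counts)
    cdf = []
    running = 0
    for c in counts:
        running += c
        # Integer rounding to nearest; the last entry is always exactly 32768.
        cdf.append((running * 32768 + total // 2) // total)
    return cdf


print(build_cdf([1, 1, 2]))  # [8192, 16384, 32768]
```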
Chroma
A matrix of samples, or a single sample, representing one of the two color-difference signals; the chroma symbols are U and V
Coded frame
The representation of a frame in the bitstream, before it is decoded
Component
One of the three matrices (luma or one of the two chroma matrices) that make up a frame, or a single sample of one of them
Compound prediction
A type of inter prediction that computes sample values by blending predictions from two reference frames
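In its simplest form, the blend is a rounded per-sample average of the two predictions. A Python sketch; the function name is hypothetical. AV1 also defines weighted and masked compound modes, and an actual decoder may use higher intermediate precision. Neither is modeled here.

```python
def compound_average(pred0: list[list[int]], pred1: list[list[int]]) -> list[list[int]]:
    """Rounded average of two same-sized prediction blocks, sample by sample."""
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]


print(compound_average([[10, 20]], [[11, 20]]))  # [[11, 20]]
```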
DC coefficient
A transform coefficient whose frequency domain index is zero in both dimensions
Decoded frame
A frame reconstructed from the bitstream by the decoder
Decoder
A concrete realization of the decoding process
Decoding process
The process of deriving decoded frames from syntax elements
Dequantization
The process of obtaining transform coefficients by scaling the quantized coefficients
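A minimal sketch of uniform scalar dequantization in Python, using separate step sizes for the DC coefficient and the AC coefficients. The step sizes are passed in directly as hypothetical parameters. How they are derived from the quantization parameter, and any extra rounding or shifts, are not modeled.

```python
def dequantize(quantized: list[list[int]], dc_step: int, ac_step: int) -> list[list[int]]:
    """Scale each quantized coefficient by its step size (DC at index (0, 0))."""
    out = []
    for r, row in enumerate(quantized):
        out_row = []
        for c, q in enumerate(row):
            step = dc_step if (r == 0 and c == 0) else ac_step
            out_row.append(q * step)
        out.append(out_row)
    return out


print(dequantize([[2, -1], [0, 3]], dc_step=10, ac_step=4))  # [[20, -4], [0, 12]]
```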
Encoder
A concrete realization of the encoding process
Encoding process
A process that generates a bitstream conforming to this document; the process itself is not specified in this document
Enhancement layer
A layer with spatial_id or temporal_id greater than 0
Flag
A variable, or a single-bit syntax element, that can take only the values 0 and 1
Frame
The representation of a video signal in the spatial domain, consisting of a luma matrix (Y) and two chroma matrices (U and V)
Frame context
A set of probabilities used in the decoding process
Frame buffer
An area to store decoded frames and related information
Golden frame
A frame that can be used as a reference in inter-frame coding. Golden frames are usually encoded at higher quality and used as a reference by many inter frames
Inter coding
Coding a block or a frame using inter prediction
Inter frame
A compressed frame that may use inter prediction, referencing previously decoded frames, as well as intra prediction
Inter prediction
The process of deriving prediction values for the current frame from previously decoded reference frames
Intra coding
Coding a block or a frame using intra prediction
Intra frame
A frame that uses only intra-frame prediction and can be decoded independently
Intra prediction
The process of deriving prediction values for the current block from previously decoded samples in the same frame
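One simple intra mode fills the block with the rounded mean of the already-decoded row above and column to the left. The Python sketch below assumes both edges are available. Edge-availability rules and the other intra modes are not modeled, and the function name is hypothetical.

```python
def dc_predict(above: list[int], left: list[int]) -> list[list[int]]:
    """Fill a len(left) x len(above) block with the rounded mean of its neighbors."""
    w, h = len(above), len(left)
    total = sum(above) + sum(left)
    avg = (total + ((w + h) >> 1)) // (w + h)   # round to nearest
    return [[avg] * w for _ in range(h)]


print(dc_predict([10, 10, 10, 10], [20, 20, 20, 20])[0])  # [15, 15, 15, 15]
```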
Inverse transform
The process of converting a matrix of transform coefficients into a matrix of spatial-domain values (the residual)
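AV1 specifies integer approximations of several transform types. The floating-point orthonormal inverse DCT below is a stand-in, not the normative transform. It only illustrates the separable structure: a 1D inverse transform is applied to every row, then to every column.

```python
import math


def idct_1d(coeffs: list[float]) -> list[float]:
    """Orthonormal inverse DCT-II (i.e. DCT-III) of one row or column."""
    n = len(coeffs)
    out = []
    for x in range(n):
        s = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        out.append(s)
    return out


def idct_2d(block: list[list[float]]) -> list[list[float]]:
    """Separable 2D inverse transform: rows first, then columns."""
    rows = [idct_1d(r) for r in block]
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order
```

A block with only a DC coefficient inverse-transforms to a flat residual. For example, a 4x4 block holding 8 at (0, 0) yields 2.0 everywhere.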
Key frame
An intra frame that resets the decoding process when it occurs
Layer
A set of tile group OBUs with the same spatial_id and temporal_id
Level
A set of constraints defined on syntax elements and variable values
Loop filter
A filtering process applied to reconstructed frames to reduce blocking artifacts
Luma
A matrix of sample values, or a single sample value, representing the monochrome signal related to the primary colors, symbolized by Y
Mode info
Syntax elements, decoded for each block, that indicate how the block is predicted
Mode info block
A block of luma samples of size 4x4 or larger, and its two corresponding blocks of chroma samples (if present)
Motion vector
A two-dimensional vector used for inter prediction, giving the coordinate offset from a position in the current frame to the corresponding position in the reference frame
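In the released AV1 specification, luma motion vector components are expressed in units of 1/8 sample. The Python sketch below does two things: it splits a component into an integer offset and a fractional phase, and it fetches a block displaced by an integer offset, clamping coordinates at the frame edges. Edge clamping is an assumption of this sketch. Fractional positions would need an interpolation filter, which is omitted. All function names are hypothetical.

```python
def split_mv_component(mv: int) -> tuple[int, int]:
    """Split a 1/8-sample component into (integer offset, fractional phase 0..7)."""
    return mv >> 3, mv & 7   # arithmetic shift floors, so -3 -> (-1, 5), i.e. -1 + 5/8


def fetch_block(ref: list[list[int]], x: int, y: int, w: int, h: int,
                mv_row: int, mv_col: int) -> list[list[int]]:
    """Copy a w x h block at (x, y) displaced by integer offsets, clamped to the frame."""
    ref_h, ref_w = len(ref), len(ref[0])
    out = []
    for r in range(h):
        sy = min(max(y + r + mv_row, 0), ref_h - 1)
        out.append([ref[sy][min(max(x + c + mv_col, 0), ref_w - 1)] for c in range(w)])
    return out
```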
OBU
(Open Bitstream Unit) All syntax structures are packed into OBUs. Each OBU has a header that provides identification information for the data it contains (the payload)
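A Python sketch of decoding the OBU header fields. The bit layout follows the released AV1 specification, which may differ from this draft. In that layout, a 1-byte header is optionally followed by an extension byte carrying temporal_id and spatial_id. The function name and dict representation are illustrative. When the extension byte is absent, the IDs are simply reported as not present.

```python
def parse_obu_header(data: bytes) -> dict:
    """Decode the 1- or 2-byte OBU header (released AV1 layout, assumed)."""
    b0 = data[0]
    header = {
        "obu_forbidden_bit": (b0 >> 7) & 0x1,
        "obu_type": (b0 >> 3) & 0xF,            # 4 bits
        "obu_extension_flag": (b0 >> 2) & 0x1,
        "obu_has_size_field": (b0 >> 1) & 0x1,
        "temporal_id": None,                    # only present with the extension byte
        "spatial_id": None,
    }
    if header["obu_extension_flag"]:
        b1 = data[1]
        header["temporal_id"] = (b1 >> 5) & 0x7  # 3 bits
        header["spatial_id"] = (b1 >> 3) & 0x3   # 2 bits
    return header


print(parse_obu_header(bytes([0x36, 0x30]))["spatial_id"])  # 2
```

With this layout, the layer of an OBU can be read directly from its extension byte. An OBU with temporal_id and spatial_id both equal to 0 belongs to the base layer.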
Parse
The process of obtaining syntax elements from a bitstream
Prediction
An application of the prediction process, either intra prediction or inter prediction
Prediction process
The process of using a predictor to estimate decoded sample values or data elements
Prediction value
A value, derived from previously decoded sample values or data elements, that is used in decoding the next sample value or data element
Profile
A subset of the syntax, semantics, and algorithms defined in this document
Quantization parameter
A variable used to scale quantized coefficients during decoding
Quantized coefficient
A transform coefficient before dequantization
Raster scan
A mapping of a 2D rectangular array to a 1D array, in which the 1D array starts with the first row of the 2D array, followed by the second row, then the third, and so on. Each row is scanned from left to right
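In a raster scan, the 1D position of the sample at row `row` and column `col` in an array `width` samples wide is `row * width + col`. A Python sketch with hypothetical helper names:

```python
def raster_index(row: int, col: int, width: int) -> int:
    """1D position of (row, col) under a raster scan of a width-wide array."""
    return row * width + col


def raster_scan(matrix: list[list[int]]) -> list[int]:
    """Flatten row by row, each row left to right."""
    return [v for row in matrix for v in row]


print(raster_scan([[1, 2, 3], [4, 5, 6]]))  # [1, 2, 3, 4, 5, 6]
```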
Reconstruction
The process of adding the decoded residual to the corresponding prediction values to form the reconstructed sample values
Reference frame
A previously decoded frame used in the inter prediction process
Reserved
A syntax element value reserved for future extensions of this document
Residual
The difference between the reconstructed value and the corresponding predicted value
Sample
The basic element that makes up a frame
Sample value
The value of a sample: an integer from 0 to 255 for 8-bit frames, from 0 to 1023 for 10-bit frames, and from 0 to 4095 for 12-bit frames
Segmentation map
A map that gives, for each 4x4 block in the frame, a 3-bit segment ID. Each reference buffer stores a segmentation map so that later frames can reuse a previously decoded map
Sequence
The highest-level syntactic structure of a coded bitstream, consisting of one or more consecutive coded frames
Superblock
The highest level of the block quadtree in a tile. All superblocks in a frame are square and of the same size, either 128x128 or 64x64 pixels. A superblock can contain one or two mode info blocks, or it can be split in half in each direction to create four sub-blocks, which can themselves be further subdivided to form the block quadtree
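The recursive four-way split can be sketched in Python as follows. The `should_split` callback is a hypothetical stand-in for the decoded partition syntax. Only the square four-way split is modeled, not the one- or two-block cases, and the 4x4 minimum size is an assumption of this sketch.

```python
from typing import Callable

Block = tuple[int, int, int]  # (x, y, size) of a square block


def quadtree_leaves(x: int, y: int, size: int,
                    should_split: Callable[[int, int, int], bool],
                    min_size: int = 4) -> list[Block]:
    """Recursively split a square block into four quadrants while should_split says so."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: list[Block] = []
        # Visit quadrants top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(quadtree_leaves(x + dx, y + dy, half, should_split, min_size))
        return leaves
    return [(x, y, size)]


# Hypothetical decisions: split the 64x64 superblock, then only its top-left 32x32.
blocks = quadtree_leaves(0, 0, 64, lambda x, y, s: s == 64 or (s == 32 and x == 0 and y == 0))
print(len(blocks))  # 7
```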
Switch Frame
An inter-coded frame that can be used as a point at which to switch between sequences. A switch frame overwrites all frame buffers without forcing the use of intra coding. The intent is to allow a streaming use case in which video is encoded in small chunks (say, 1 second in duration), each starting with a switch frame. If the available bandwidth drops, the server can start sending chunks from a lower-bitrate encoding. The decoded image after the switch may be slightly incorrect, but this approach allows a switch without the cost of a full key frame
Syntax element
A data element represented in the bitstream
Temporal delimiter OBU
An OBU indicating that the OBUs following it have a different presentation/decoding timestamp from the last frame before the temporal delimiter
Temporal unit
A temporal delimiter OBU and all the OBUs that follow it, up to but not including the next temporal delimiter
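Grouping a stream of OBUs into temporal units is a split at each temporal delimiter. In this Python sketch, OBU types are represented by hypothetical string labels. The requirement that the stream begin with a temporal delimiter is an assumption of the sketch.

```python
def split_temporal_units(obu_types: list[str]) -> list[list[str]]:
    """Start a new unit at every temporal delimiter; other OBUs join the current unit."""
    units: list[list[str]] = []
    for t in obu_types:
        if t == "TEMPORAL_DELIMITER":
            units.append([t])
        elif units:
            units[-1].append(t)
        else:
            raise ValueError("stream must start with a temporal delimiter (sketch assumption)")
    return units


stream = ["TEMPORAL_DELIMITER", "SEQUENCE_HEADER", "FRAME", "TEMPORAL_DELIMITER", "FRAME"]
print(len(split_temporal_units(stream)))  # 2
```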
Temporal group
A set of frames whose temporal prediction structure repeats periodically in a video sequence
Tile
A rectangular region of the frame that can be decoded and encoded independently, although loop filtering across tile edges still applies
Transform block
A square or rectangular matrix of transform coefficients used as input to the inverse transform process
Transform coefficient
A scalar value in the frequency domain, contained in a transform block
Uncompressed header
A high-level description of the frame to be decoded, coded without the use of arithmetic coding