ISO/IEC 23001: Encryption standard for digital rights management (3)

Media data encryption


Field semantics

This section describes the semantics of the sample group and sample auxiliary information fields used by the common encryption schemes. The fields have the following meanings:

- isProtected identifies the protection status of the samples in a track or sample group. The flag takes the following values:
- - 0x0: not protected;
- - 0x1: protected (as signaled by the scheme_type field in the scheme type box; e.g. for scheme_type "cenc", the track defaults to AES-CTR encryption under the "cenc" scheme);
- - 0x02-0xFF: reserved.
- Per_Sample_IV_Size is the size in bytes of the InitializationVector field. The following values are supported:
- - 0, if the isProtected flag is 0x0 (unprotected) or a constant IV is used;
- - 8, which specifies 64-bit Initialization Vectors;
- - 16, which specifies 128-bit Initialization Vectors.
- constant_IV_size is the size in bytes of the constant_IV field. The following values are supported:
- - 8, which specifies 64-bit Initialization Vectors;
- - 16, which specifies 128-bit Initialization Vectors.
- KID is a key identifier that uniquely identifies the key needed to decrypt the associated samples, so that the KID alone is sufficient to locate the separately stored license containing the key used to encrypt the content. This allows multiple encryption keys to be identified within a file or track. Unprotected samples in a protected track should be identified by an isProtected flag of 0x0, a Per_Sample_IV_Size of 0x0, and a KID value of 0x0. Using a UUID[2] as the KID is strongly recommended to meet the uniqueness requirement across all applications.
- InitializationVector specifies the Initialization Vector (IV) required to decrypt the sample. When the isProtected flag is 0x0, no Initialization Vector is needed and the auxiliary information should have size 0, i.e. be absent. When the isProtected flag is 0x1:
- - The IV should be supplied either as a per-sample IV or as a constant IV;
- - If the Per_Sample_IV_Size field is 16, the InitializationVector specifies the entire 128-bit IV value;
- - If the Per_Sample_IV_Size field is 8, its value is copied into bytes 0 to 7 of the Initialization Vector and bytes 8 to 15 of the Initialization Vector are set to zero.
- subsample_count specifies the number of subsample encryption entries present for this sample. If present, this field should be greater than 0.
- BytesOfClearData specifies the number of clear data bytes at the beginning of this subsample encryption entry. This value may be zero if the subsample contains no clear bytes.
- BytesOfProtectedData specifies the number of bytes of protected data following the clear data. This value may be zero if the subsample contains no protected bytes. An entry shall not have zero in both the BytesOfClearData field and the BytesOfProtectedData field. The sum of all BytesOfClearData and BytesOfProtectedData values in a sample should equal the length of the sample. Subsample encryption entries should be represented as compactly as possible: for example, use one entry of {32 clear, 500 protected} instead of two entries of {15 clear, 0 protected}, {17 clear, 500 protected}. If pattern encryption is used, the pattern applies to the protected byte range given by BytesOfProtectedData; otherwise, all protected bytes are encrypted.
- crypt_byte_block should be zero unless pattern encryption is enabled.
- skip_byte_block should be zero unless pattern encryption is enabled.
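
The subsample entry rules above can be checked mechanically. The following is a minimal sketch, not taken from the standard: the function names are hypothetical, entries are modeled as (clear, protected) tuples, and the 16-bit limit on BytesOfClearData is an assumption about the field width.

```python
MAX_CLEAR = 0xFFFF  # assumption: BytesOfClearData is a 16-bit field


def validate_subsamples(entries, sample_size):
    """Raise ValueError if the (clear, protected) entries break the rules above."""
    for clear, protected in entries:
        if clear == 0 and protected == 0:
            raise ValueError("entry with zero clear and zero protected bytes")
    if sum(c + p for c, p in entries) != sample_size:
        raise ValueError("entries do not add up to the sample size")


def compact_subsamples(entries):
    """Fold clear-only entries into the following entry where the field size allows."""
    out = []
    carry = 0  # clear bytes waiting to be merged into the next entry
    for clear, protected in entries:
        if carry + clear <= MAX_CLEAR:
            clear += carry
        else:
            out.append((carry, 0))  # too large to merge, keep as its own entry
        carry = 0
        if protected == 0:
            carry = clear
        else:
            out.append((clear, protected))
    if carry:
        out.append((carry, 0))
    return out


print(compact_subsamples([(15, 0), (17, 500)]))
```

With the example from the text, the two entries {15 clear, 0 protected}, {17 clear, 500 protected} collapse into the single entry {32 clear, 500 protected}.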

Initialization vectors

The initialization vector (IV) of each sample is either a constant IV located in the sample entry or sample group description, or a per-sample IV signaled in the sample auxiliary information of each protected sample. See the previous section for details on how initialization vectors are formed and stored. It is recommended that applications use a random number for the first initialization vector of a sequence.

A per-sample IV can be 8 or 16 bytes long. For 8-byte IVs, it is recommended to start from a random value and increment the 8-byte IV for each subsequent sample, so that every combination of 16-byte IV and CTR counter value is unique. If the maximum 8-byte value is exceeded, the 8-byte IV can wrap around from the maximum value (0xFFFFFFFFFFFFFFFF) to the minimum value (0x0).
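
As a rough illustration of the 8-byte IV recommendation, the sketch below draws a random first IV and increments it with wrap-around. The function names are hypothetical, and treating the IV as a big-endian integer is an assumption.

```python
import secrets

MASK64 = (1 << 64) - 1


def first_iv8() -> bytes:
    """Random 8-byte IV for the first sample of a sequence."""
    return secrets.token_bytes(8)


def next_iv8(iv: bytes) -> bytes:
    """Increment an 8-byte IV, wrapping from 0xFFFFFFFFFFFFFFFF to 0x0."""
    value = (int.from_bytes(iv, "big") + 1) & MASK64
    return value.to_bytes(8, "big")


iv = first_iv8()
for _ in range(3):
    print(iv.hex())
    iv = next_iv8(iv)
```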

For 16-byte IVs, the IV of each subsequent sample can optionally be generated by adding the previous sample's cipher block count to the previous sample's IV. For CBC mode, IVs can be generated randomly or sequentially and do not need to be unique for each sample or subsample. Storing a unique IV for each sample increases cryptographic entropy and allows random access and error recovery at every sample. CTR mode requires a unique counter value for every encrypted block that shares the same key.

Some schemes use a constant IV as a default value or an IV that maps to a group of samples. For segmented files, the constant IV usually requires a sample group box and a sample group description box containing the sample group IV.

AES-CTR mode counter operation

This encryption method uses the Advanced Encryption Standard (AES) published by the National Institute of Standards and Technology (NIST) with a 128-bit key in counter mode (AES-CTR). AES operates on 16-byte blocks, but in CTR mode it can encrypt byte streams of any size without padding and without leaving unencrypted remainders. Counter mode works by encrypting a counter block with the AES block cipher, using the key identified by the KID, and then XORing the result with the data to be encrypted or decrypted. The CTR mode counter block should be constructed from the IV of each sample and incremented as described below and in clause 9.2 of the standard.

When a Per_Sample_IV_Size of 8 bytes is specified, the least significant 8 bytes (bytes 8 to 15) of the 16-byte IV should be set to zero and used as a 64-bit block counter that is incremented by 1 for each 16-byte cipher block encrypted. When a Per_Sample_IV_Size of 16 bytes is specified, the least significant 8 bytes (bytes 8 to 15) are likewise incremented as a 64-bit counter; when this counter reaches its maximum value (0xFFFFFFFFFFFFFFFF), the next increment should wrap it to zero without affecting the other 8 bytes of the counter block (bytes 0 to 7).
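
The counter block rules can be sketched as follows. This is an illustrative helper with a hypothetical name, assuming the counter in bytes 8 to 15 is interpreted as a big-endian integer.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def counter_block(iv: bytes, block_index: int) -> bytes:
    """Counter block for the block_index-th 16-byte cipher block of a sample."""
    if len(iv) == 8:
        iv = iv + bytes(8)  # bytes 8 to 15 start at zero
    elif len(iv) != 16:
        raise ValueError("IV must be 8 or 16 bytes")
    high = iv[:8]  # bytes 0 to 7 are never changed by incrementing
    low = (int.from_bytes(iv[8:], "big") + block_index) & MASK64
    return high + low.to_bytes(8, "big")


# A 16-byte IV whose low half is at the maximum wraps to zero on the next block.
print(counter_block(b"\x01" * 8 + b"\xff" * 8, 1).hex())
```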

Within each sample, the encrypted data should be treated as a logically contiguous sequence of 16-byte blocks, regardless of any clear data physically interleaved with it by subsample encryption or pattern encryption. Only the last cipher block in the sample can be a partial cipher block (less than 16 bytes). The counter should be incremented once after each encrypted cipher block and restarted for the next sample from the InitializationVector stored in the sample auxiliary information.

CENC and CENS

"cenc" and "cens" are two of the four protection scheme types defined by Common Encryption (the other two are "cbc1" and "cbcs"). They are not separate encryption algorithms: both use AES-128 in CTR mode and differ only in which bytes are encrypted.

"cenc" encrypts either the whole sample or, with subsample encryption, all bytes of each protected range. A file or track can use several keys, each identified by its KID, and the player obtains the key it needs from a license delivered by the DRM system.

"cens" uses the same AES-CTR counter construction but applies pattern encryption inside each protected range of a subsample: a repeating pattern of encrypted and skipped 16-byte blocks. Only part of the video data is encrypted, which reduces the decryption cost.

Full sample encryption

1-General

All encrypted media types can use full sample encryption except NAL structured video, which must use subsample encryption.

2-Use AES-CTR mode for full sample encryption

AES-CTR mode encryption should use an IV unique to each sample and encrypt all bytes in the sample.

In AES-CTR (counter) mode, the IV (initialization vector) is a fixed-length value used to initialize the counter block. Its length is at most the block size of the cipher (16 bytes for AES); a shorter IV is extended with zero bytes to form the counter block.
The role of the IV is to make each encryption use a different keystream, so that the same plaintext produces different ciphertext. The cipher encrypts the counter block and the result is XORed with the data, which prevents repeated patterns in the plaintext from showing up in the ciphertext.
The IV does not need to be kept secret and can be transmitted along with the ciphertext; it only has to be unique for each encryption under a given key. When decrypting, the receiver uses the same IV to initialize the counter in order to decrypt the ciphertext correctly.
Note that an IV, and more generally any counter block value, must never be reused under the same key. If it is, two ciphertexts share the same keystream, and XORing them reveals the XOR of the two plaintexts. IVs can be generated randomly or sequentially as long as they remain unique.
The IV is the initial value of the counter block. The counter is incremented automatically each time a block is encrypted, so the IV itself is used only for the first block and subsequent blocks use the incremented counter value.


AES-CTR mode can encrypt whole samples even when the sample size is not a multiple of 16 bytes. The figure draws the cipher blocks to show how the underlying block cipher processes the sample. Block 7 is shorter than 16 bytes to illustrate that CTR mode can encrypt partial cipher blocks. Each sample starts with a unique IV.
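
A minimal sketch of full sample CTR encryption follows. To stay within the Python standard library, it uses a hash-based stand-in (`toy_block_encrypt`, a hypothetical helper) in place of AES-128 single-block encryption; real code would pass an AES block encryption function from a cryptographic library instead. The 100-byte sample gives six full blocks and a 4-byte seventh block, as in the figure.

```python
import hashlib

KEY = b"demo key, not a real content key"


def toy_block_encrypt(block: bytes) -> bytes:
    """Stand-in for AES-128 single-block encryption. NOT AES, for demonstration only."""
    return hashlib.sha256(KEY + block).digest()[:16]


def ctr_crypt_sample(sample: bytes, iv: bytes, encrypt_block=toy_block_encrypt) -> bytes:
    """Encrypt or decrypt a whole sample in CTR mode (the operation is symmetric)."""
    counter0 = iv + bytes(16 - len(iv))  # an 8-byte IV is extended with zeros
    high, low = counter0[:8], int.from_bytes(counter0[8:], "big")
    out = bytearray()
    for index, off in enumerate(range(0, len(sample), 16)):
        counter = high + ((low + index) & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "big")
        keystream = encrypt_block(counter)
        chunk = sample[off:off + 16]  # the last chunk may be shorter than 16 bytes
        out += bytes(a ^ b for a, b in zip(chunk, keystream))
    return bytes(out)


sample = bytes(range(100))
encrypted = ctr_crypt_sample(sample, bytes(8))
print(len(encrypted), ctr_crypt_sample(encrypted, bytes(8)) == sample)
```

Because the keystream is simply truncated for the last block, the encrypted sample has exactly the same length as the clear sample.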

3-Full sample encryption using AES-CBC mode

AES-CBC full sample encryption should use the Advanced Encryption Standard with a 128-bit key in Cipher Block Chaining mode (AES-128-CBC), as specified in AES [FIPS197] and Block Cipher Modes [NIST 800-38A].

Each sample should be encrypted using the IV stored in its sample auxiliary information. Encrypted NAL structured video tracks shall be protected using subsample encryption, which is defined later.

All other types of encrypted tracks should be protected using whole-block full sample encryption.

Each sample shall be encrypted as a continuous chain of cipher blocks, starting with an initialization vector (IV) that is either specified per sample in the sample auxiliary information, or specified by the sample group and sample group description as a constant common to multiple samples.

According to ISO/IEC 23001, AES-CBC mode requires every encrypted cipher block to be exactly 16 bytes, so the scheme defined in this part leaves some bytes unencrypted. To avoid padding, which would change the file size, CBC mode does not encrypt blocks smaller than 16 bytes. The 7th block in the example is drawn shorter than 16 bytes to illustrate this point. With 'cbc1' full sample encryption, the initialization vector (IV) is applied at the start of each sample.
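
The whole-block behavior can be sketched as below. The block cipher is again a hash-based stand-in (`toy_block_encrypt`, hypothetical and not invertible), so the sketch only illustrates which bytes are chained and which stay clear; real code would use AES-128 block encryption from a cryptographic library.

```python
import hashlib

KEY = b"demo key, not a real content key"


def toy_block_encrypt(block: bytes) -> bytes:
    """Stand-in for AES-128 single-block encryption. NOT AES, layout demo only."""
    return hashlib.sha256(KEY + block).digest()[:16]


def cbc_encrypt_whole_blocks(sample: bytes, iv: bytes, encrypt_block=toy_block_encrypt) -> bytes:
    """CBC-encrypt every complete 16-byte block; leave a trailing partial block clear."""
    full = len(sample) - len(sample) % 16
    out = bytearray()
    prev = iv  # the chain starts from the sample IV
    for off in range(0, full, 16):
        mixed = bytes(a ^ b for a, b in zip(sample[off:off + 16], prev))
        prev = encrypt_block(mixed)
        out += prev
    out += sample[full:]  # 0 to 15 trailing bytes stay unencrypted
    return bytes(out)


sample = bytes(range(100))
encrypted = cbc_encrypt_whole_blocks(sample, bytes(16))
print(len(encrypted), encrypted[96:] == sample[96:])
```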

Subsample encryption

1-Definition (Normative)

Subsample encryption divides each sample into one or more contiguous subsamples. Each subsample consists of an unprotected part followed by a protected part; at most one of the two can be zero bytes long (usually both are non-zero).

The subsamples should not overlap, and their total length (the sum of BytesOfClearData + BytesOfProtectedData over all subsamples of the sample) should equal the size of the sample itself.

Except in the "cbcs" scheme, the protected bytes of a sample should be treated as one logically contiguous chain of 16-byte cipher blocks, even where they are separated by the BytesOfClearData of a subsample or by skipped blocks (skip_byte_block).
In the "cbcs" scheme, each subsample should be treated as a separate chain of cipher blocks, starting with the initialization vector associated with the sample. The CTR mode counter should be incremented after each complete encrypted cipher block, ignoring subsample boundaries.

In the "cbc1" scheme, CBC cipher block chaining should continue without interruption after the IV has been applied to the first cipher block of the sample. When using CTR mode, all cipher blocks should be 16 bytes, except the last cipher block of the sample, which may be shorter. When the protected data range ends with the subsample, a partial CTR cipher block can be encrypted as the last block of the sample.

For the "cenc" and "cens" protection schemes, the size of the protected data should be a multiple of 16 bytes so that no partial block occurs at the end of a subsample. Application specifications may prohibit partial CTR cipher blocks and require subsample ends to be block-aligned to reduce decryption complexity.

For the "cbc1" protection scheme, the size of the protected data should be a multiple of 16 bytes so that no partial block occurs at the end of a subsample.

For the "cbcs" protection scheme, a partial block at the end of a subsample should remain unencrypted.

In the "cbcs" scheme, CBC cipher block chaining should run continuously within each subsample, with the IV applied to the first encrypted cipher block of each subsample. Application specifications may require the protected data to start at the first complete byte of the video slice data, in which case it may not be possible to size the protected range as a multiple of 16 bytes and partial blocks in subsamples cannot always be avoided.

Figure 3 is an example of subsample encryption. It shows two samples, each containing two subsamples, with a per-sample initialization vector and a logically contiguous sequence of 16-byte cipher blocks interspersed with unencrypted byte ranges. Block 2 of sample 1 continues into the second subsample; this is possible but not recommended in the "cenc" scheme, and it is not used in the "cens", "cbc1" and "cbcs" schemes.

The cipher blocks of a subsample continue the chain of cipher blocks and counter values from the previous subsample of the same sample.

All cipher blocks in the first sample are 16 bytes, except the last one, which may be smaller than 16 bytes.

This illustrates that in CTR mode a cipher block can be smaller than 16 bytes without changing the file size: the partial block is XORed with the corresponding part of the encrypted counter block.

If CBC mode is used, a block smaller than 16 bytes cannot be encrypted on its own: "cbc1" sizes the protected data as a multiple of 16 bytes, and "cbcs" leaves the trailing partial block unencrypted.
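
To make the "logically contiguous" rule for CTR mode concrete, the sketch below runs one keystream across all protected ranges of a sample and skips the clear ranges. The helper names are hypothetical and the block cipher is a hash-based stand-in for AES-128, so this shows the byte layout rather than real encryption.

```python
import hashlib

KEY = b"demo key, not a real content key"
MASK64 = 0xFFFFFFFFFFFFFFFF


def toy_block_encrypt(block: bytes) -> bytes:
    """Stand-in for AES-128 single-block encryption. NOT AES, for demonstration only."""
    return hashlib.sha256(KEY + block).digest()[:16]


def keystream(iv16: bytes, nbytes: int, encrypt_block=toy_block_encrypt) -> bytes:
    """CTR keystream; only bytes 8 to 15 of the counter block are incremented."""
    high, low = iv16[:8], int.from_bytes(iv16[8:], "big")
    out = bytearray()
    index = 0
    while len(out) < nbytes:
        out += encrypt_block(high + ((low + index) & MASK64).to_bytes(8, "big"))
        index += 1
    return bytes(out[:nbytes])


def ctr_crypt_subsamples(sample: bytes, iv16: bytes, subsamples) -> bytes:
    """Encrypt or decrypt only the protected ranges, as one contiguous CTR stream."""
    ks = keystream(iv16, sum(protected for _, protected in subsamples))
    out = bytearray(sample)
    pos = used = 0
    for clear, protected in subsamples:
        pos += clear  # clear bytes are copied through untouched
        for i in range(protected):
            out[pos + i] ^= ks[used + i]
        pos += protected
        used += protected
    return bytes(out)


sample = bytes(range(60))
subs = [(10, 20), (5, 25)]  # cipher block 2 spans both subsamples
enc = ctr_crypt_subsamples(sample, bytes(16), subs)
print(enc[:10] == sample[:10], ctr_crypt_subsamples(enc, bytes(16), subs) == sample)
```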

2-Subsample encryption of NAL structured video tracks

2.1 Structure of NAL video samples and use of subsamples (informative)

The Network Abstraction Layer (NAL) structured video specifications define NAL unit syntax elements that can be sequenced into elementary streams and access units, which can be decoded into pictures. ISO/IEC 14496-15 specifies how NAL structured video is stored in an ISO base media file and how each access unit is stored as a sample in a track. Each sample consists of multiple NAL units, each preceded by a length field indicating the length of the NAL unit. Each NAL unit contains a NAL type header, and video NAL units also contain a slice header.

Secure video processors typically do not hand decrypted video stream data back to applications, in order to protect the decrypted video. As a result, presentation applications that need the information stored in the video slice header or in SEI NAL units (such as captions and framing information) cannot access it if it is protected. To protect the video encryption key, the audio track should not be encrypted with the same key, because audio generally does not receive the same level of key protection as video.

Some video slice data may be left unencrypted in order to align the encrypted bytes or to remove the need for partial cipher block decryption in the device. Because NAL structured video is usually compressed with spatial and temporal prediction followed by entropy coding (e.g. CABAC), missing even part of the data still makes it almost impossible to reconstruct the picture, and pictures predicted from it are affected as well.

The protected range may not encrypt all of the video data. The start of the protected range may leave some video data unencrypted so that the range is aligned to a byte or 16-byte block boundary. Protected ranges can also be partially encrypted using the 'cens' and 'cbcs' schemes, which apply a pattern of encrypted and clear blocks within the protected range.

Not all decoders are designed to decode ISO media format streams (such as the "avc1" sample entry format), which contain NAL size headers and do not carry the decoding parameter NALs (such as sequence parameter set (SPS) and picture parameter set (PPS) NALs) in the stream.

Some decoders are designed to decode video elementary streams in the ISO/IEC 14496-10:2014 Annex B byte stream format, in which NAL units are separated by start codes and the SPS/PPS parameter NALs are present at each access point in the stream. Common encrypted ISO media elementary streams may need to be reformatted into byte stream format for decoding.

It may also be necessary to reformat common encrypted elementary streams to packetize NAL units for network protocols such as RTP, or to repackage them between ISO media and MPEG-2 transport stream containers. Leaving non-video NAL units and all NAL size and type headers unencrypted allows the elementary stream to be reformatted without decryption. Full sample encryption prevents both reformatting of the video stream and access to its information until the sample has been decrypted.

However, if the NAL headers and all NAL units other than video NAL units are left unencrypted, an application can convert ISO media video samples (e.g. "avc1", "avc3", "hev1", etc.) to an ISO/IEC 14496-10:2014 Annex B byte stream without decrypting them. It does so by replacing each unencrypted NAL size header with a start code appropriate to the NAL type indicated in the NAL type header, and by inserting PPS/SPS NAL units after the access unit delimiter NAL in each access unit.
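
The size-header replacement can be sketched as follows. This is a simplified illustration with a hypothetical function name: it writes a 4-byte start code before every NAL unit rather than choosing the start code per NAL type, and it does not insert SPS/PPS NAL units. Since the length fields are in the clear, the same loop works on a sample whose slice data is still encrypted.

```python
START_CODE = b"\x00\x00\x00\x01"  # simplification: 4-byte start code for every NAL


def to_annex_b(sample: bytes, length_size: int = 4) -> bytes:
    """Replace each NAL length field with a start code, leaving payloads untouched."""
    out = bytearray()
    off = 0
    while off < len(sample):
        nal_len = int.from_bytes(sample[off:off + length_size], "big")
        off += length_size
        out += START_CODE + sample[off:off + nal_len]
        off += nal_len
    return bytes(out)


# Two tiny length-prefixed NAL units (an SPS-like byte 0x67 and a slice-like byte 0x65).
sample = b"\x00\x00\x00\x02\x67\xaa" + b"\x00\x00\x00\x03\x65\x01\x02"
print(to_annex_b(sample).hex())
```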

Since NAL start codes are always unencrypted in common encryption, any start code pattern that appears inside encrypted data is not a real start code and can be ignored by the processor. An ISO base media file parser ignores all start codes. Before encryption and after decryption, NAL units include emulation prevention as required by the NAL structured video specification, so decoders can detect start codes reliably.

Common Encryption specifies subsample encryption for NAL structured video: only the video data is encrypted, while other NAL types, all NAL size and type headers, and the video slice headers are left unencrypted. Encryptors need to be aware of the NAL structure, but decryptors may be video format agnostic and simply decrypt the byte ranges indicated by the subsample information stored in the sample auxiliary information.

Encrypting only the video slice data allows applications to access the information in SEI NALs as well as the picture information in the video slice headers. Access to the slice header information may be critical for presentation applications, which read it before secure video decryption in order to manage picture buffers, layers, tiles, parallel slice decoding, etc.

2.2 Subsample encryption applied to NAL structured video (normative)

NAL structured video samples must be completely covered by one or more contiguous subsamples. The slice data in a video NAL can be spanned by multiple subsamples, either to create multiple clear and protected ranges, or to cover protected slice data that is larger than the maximum size of a single BytesOfProtectedData field, in which case the additional subsamples have a BytesOfClearData size of zero.

Multiple unprotected NALs should be covered by a single subsample clear range, but large clear ranges can be covered by multiple subsamples with a BytesOfProtectedData size of zero. For AVC video using the "avc1" sample description stream format, the NAL length field (whose size is given by lengthSizeMinusOne) and the nal_unit_type field (the first byte after the length) of each NAL unit must be unencrypted, and only the video data in slice NALs should be encrypted.

Note 1: In the first edition of ISO/IEC 23001, encrypted slice headers were not prohibited, but were prohibited by application specifications. For 'avc1', the 'SHOULD' requirement to leave slice headers unencrypted allows possible legacy content with encrypted slice headers to conform to this new version. However, new content should not encrypt the slice header, otherwise it may not be decoded correctly in a secure video decoder.

Note 2: The length field has a variable size. It can be 1, 2, or 4 bytes long, as specified by the lengthSizeMinusOne field of the AVCDecoderConfigurationRecord in the sample entry.

For other NAL structured video sample description stream formats (e.g. "avc3", "hvc1", "hev1", etc.), only the video slice data should be protected.

For the avoidance of doubt: the slice, size and type headers of video NALs must be unencrypted, and all other NAL types must be entirely unencrypted. There may be multiple subsamples per NAL, and there may be multiple NALs per subsample, for example when several unencrypted NALs are contained within one clear byte range for efficient representation.

Partial video encryption can be achieved by using multiple subsamples per video NAL to indicate multiple clear and protected byte ranges per video slice; however, pattern encryption (e.g. the "cens" and "cbcs" schemes) should be used to represent partial encryption more efficiently.
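
The following sketch shows how an encryptor might derive subsample entries for one AVC sample under these rules. It is illustrative only: `nal_subsamples` and the `slice_header_len` callback are hypothetical (finding the real slice header length requires bitstream parsing), AVC slice NALs are taken to be nal_unit_type 1 to 5, and clear ranges larger than a single BytesOfClearData field are not split.

```python
def nal_subsamples(sample: bytes, length_size: int, slice_header_len):
    """Return (clear, protected) entries that protect only whole 16-byte blocks of slice data."""
    entries = []
    clear = 0
    off = 0
    while off < len(sample):
        nal_len = int.from_bytes(sample[off:off + length_size], "big")
        nal = sample[off + length_size:off + length_size + nal_len]
        total = length_size + nal_len
        nal_type = nal[0] & 0x1F  # AVC: low 5 bits of the first NAL byte
        if 1 <= nal_type <= 5:  # coded slice NAL units
            header = length_size + 1 + slice_header_len(nal)
            # Keep the protected range a multiple of 16 bytes, ending at the NAL end.
            protected = max(0, total - header) // 16 * 16
            clear += total - protected
            if protected:
                entries.append((clear, protected))
                clear = 0
        else:
            clear += total  # non-video NALs stay entirely clear
        off += total
    if clear:
        entries.append((clear, 0))
    return entries


def prefixed(nal: bytes) -> bytes:
    return len(nal).to_bytes(4, "big") + nal


sps = bytes([0x67]) + bytes(9)          # 10-byte parameter set NAL
slice_nal = bytes([0x65]) + bytes(43)   # 44-byte slice NAL
sample = prefixed(sps) + prefixed(slice_nal)
print(nal_subsamples(sample, 4, lambda nal: 3))  # assume a 3-byte slice header
```

Here the clear SPS and the clear start of the slice NAL share one entry, which keeps the representation compact.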

2.3 Subsample encryption of AES-CTR mode applied to video NAL

Figure 6 details the IVs used, the clear data ranges, the protected data ranges, and the NAL unit and sample boundaries. This diagram applies to the "cenc" and "cens" protection schemes.

AES-CTR mode can encrypt partial cipher blocks. The cipher blocks are drawn to illustrate the chain of cipher blocks within each sample. The last cipher block in the example (block 6) is less than 16 bytes in both sample 1 and sample 2 to illustrate that CTR mode allows partial cipher blocks to be encrypted.

Also note that cipher block 2 of sample 1 continues in the next subsample, where it completes one 16-byte cipher block under a single counter value. This example shows subsamples matching the size of each video NAL unit, but this is not a general constraint of ISO/IEC 23001.

The "cens" protection scheme may apply a pattern of encrypted and clear cipher blocks within the "encrypted data" ranges.

2.4 Subsample encryption using "cbc1" AES-CBC mode applied to video NAL

NOTE The "cbc1" AES-CBC scheme starts each sample with the sample IV and then forms a chain of 16-byte cipher blocks that continues across the BytesOfClearData ranges between subsamples. The clear data is sized so that the last block in each subsample is a full 16 bytes (blocks 2 and 6 in this example).

2.5 Subsample encryption using "cbcs" AES-CBC applied to video NAL

The "cbcs" scheme uses AES-CBC mode. In this scheme, the protected range of each subsample starts again from the initialization vector (IV), and only complete 16-byte cipher blocks are encrypted; anything less than 16 bytes at the end is left unencrypted.

The protection pattern consists of a sequence of encrypted cipher blocks followed by skipped plaintext blocks, repeated until the end of the BytesOfProtectedData range. If the last block in the range is incomplete, it is not encrypted.
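
A rough sketch of "cbcs"-style encryption of one protected range is shown below. It assumes the CBC chain runs over the encrypted blocks only (skipped blocks do not feed the chain), uses a hypothetical function name, and substitutes a hash-based stand-in for AES-128 block encryption, so it demonstrates the layout rather than real encryption.

```python
import hashlib

KEY = b"demo key, not a real content key"


def toy_block_encrypt(block: bytes) -> bytes:
    """Stand-in for AES-128 single-block encryption. NOT AES, layout demo only."""
    return hashlib.sha256(KEY + block).digest()[:16]


def cbcs_encrypt_range(data: bytes, iv: bytes, crypt: int, skip: int,
                       encrypt_block=toy_block_encrypt) -> bytes:
    """Encrypt one protected range: `crypt` chained blocks, then `skip` clear blocks."""
    if crypt <= 0:
        raise ValueError("crypt must be positive for pattern encryption")
    out = bytearray(data)
    prev = iv  # the chain restarts from the IV for every subsample
    off = 0
    while off + 16 <= len(data):
        for _ in range(crypt):
            if off + 16 > len(data):
                break  # a trailing partial block stays unencrypted
            mixed = bytes(a ^ b for a, b in zip(data[off:off + 16], prev))
            prev = encrypt_block(mixed)
            out[off:off + 16] = prev
            off += 16
        off += skip * 16
    return bytes(out)


data = bytes(200)  # 12 full blocks plus 8 bytes
enc = cbcs_encrypt_range(data, bytes(16), crypt=1, skip=9)
print(enc[16:160] == data[16:160], enc[176:] == data[176:])
```

With a 1:9 pattern over 200 bytes, only blocks 0 and 10 are encrypted; the nine skipped blocks after each of them and the final 8 bytes remain clear.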

AES is a symmetric block cipher, and CBC stands for Cipher Block Chaining. In CBC mode, each plaintext block is XORed with the previous ciphertext block (the IV for the first block) before being encrypted. This chaining makes identical plaintext blocks produce different ciphertext blocks.
"cbc1" and "cbcs" are not new cipher modes. They are the two Common Encryption protection schemes that use standard AES-128-CBC, and they differ only in which bytes are encrypted and where the chain restarts. Neither scheme uses padding or ciphertext stealing, so the encrypted sample has the same size as the clear sample.

cbc1: the cipher block chain starts at the sample IV and runs continuously through all protected bytes of the sample, across subsample boundaries. With subsample encryption, protected ranges are sized as a multiple of 16 bytes so that no partial block occurs; with whole-block full sample encryption, a trailing partial block is left unencrypted.

cbcs: the cipher block chain restarts from the IV at the start of every subsample, and pattern encryption may be applied within each protected range. A partial block at the end of a subsample is left unencrypted.

In summary, the two schemes differ in the scope of the chain (per sample for cbc1, per subsample for cbcs) and in the use of pattern encryption (cbcs only). Because the cbcs chain restarts at each subsample, each subsample can be decrypted independently of the others.

Pattern encryption

1-Definition

Pattern encryption applies a repeating pattern of encrypted and plaintext ("skipped") 16-byte blocks over the protected range of each subsample. Note that subsample encryption protects only the video slice data, leaving the NAL size, NAL type, video slice header, and other NAL types in the clear. Pattern encryption is applied when the fields default_crypt_byte_block and default_skip_byte_block in version 1 of the Track Encryption Box ('tenc') are non-zero.

The pattern shall include the number of encrypted cipher blocks indicated by the field default_crypt_byte_block or crypt_byte_block (if present in the sample group description), followed by the number of unencrypted sample data blocks indicated by the field default_skip_byte_block or skip_byte_block (if present in the sample group description).

If the last pattern in the subsample is incomplete, the partial pattern should be followed until it is truncated by the BytesOfProtectedData size, and any partial crypt_byte_block (a trailing block shorter than 16 bytes) should be left unencrypted.

When using AES-CTR mode, the IV is applied to the first encrypted cipher block of each sample. When using AES-CBC mode, the IV is applied to the first encrypted cipher block of each subsample.
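
The pattern rule can be illustrated by computing which byte ranges of a protected range are actually encrypted. This is a sketch with a hypothetical function name; it assumes crypt_byte_block is at least 1 and reports offsets relative to the start of the protected range.

```python
def encrypted_ranges(protected_len: int, crypt: int, skip: int):
    """Return (offset, length) ranges encrypted by a crypt:skip pattern of 16-byte blocks."""
    if crypt <= 0 or skip < 0:
        raise ValueError("expected crypt >= 1 and skip >= 0")
    ranges = []
    off = 0
    while off + 16 <= protected_len:
        # Encrypt up to `crypt` complete blocks; a partial block is never encrypted.
        blocks = min(crypt, (protected_len - off) // 16)
        ranges.append((off, blocks * 16))
        off += (crypt + skip) * 16
    return ranges


print(encrypted_ranges(200, 1, 9))  # [(0, 16), (160, 16)]
print(encrypted_ranges(24, 2, 1))   # [(0, 16)]
```

In the second call the pattern asks for two encrypted blocks, but only one complete block fits in 24 bytes, so the remaining 8 bytes stay in the clear.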

2-Example of pattern encryption applied to video NAL unit

In the example figure, pattern encryption appears as vertical black and white stripes: encrypted cipher blocks followed by plaintext blocks. The pattern spans the protected range of the subsample specified by BytesOfProtectedData, which roughly covers the video data after the slice header.

Byte or block alignment may require that the starting position of BytesOfProtectedData is not at the first bit of the slice data, but at some bits or bytes thereafter.

In the "cens" scheme, multiple subsamples can be mapped to a single NAL, and multiple plaintext NALs can be mapped to a single subsample, but in the "cbcs" scheme, each VCL NAL may require a separate subsample.

Whole-block full sample encryption

In whole-block full sample encryption, the entire sample is protected. Each sample is encrypted starting at offset 0 (with no unencrypted leading bytes) up to the last 16-byte boundary, leaving any trailing 0-15 bytes unencrypted. The initialization vector (IV) is reset for every sample.