MP4 container format

  Abstract: MP4 is a widely used video container format. This article describes the structure of the MP4 format and lists tools for analyzing it.
  Keyword: MP4

1 Introduction to MP4 format

  ISO/IEC 14496-12:2004:Information technology — Coding of audio-visual objects — Part 12:ISO base media file format
  ISO/IEC 14496-14:2003: Information technology — Coding of audio-visual objects — Part 14:MP4 file format

  The MP4 file format is defined in ISO/IEC 14496-14:2003. It was developed from the QuickTime file format and is a specific instance of ISO/IEC 14496-12:2004 (ISO Base Media File Format). There are two versions of MP4: the first was defined in ISO/IEC 14496-1:2001 (MPEG-4 Part 1 (Systems)), released in 2001; the second, ISO/IEC 14496-14:2003 (MPEG-4 Part 14 (MP4 file format)), was released in 2003 as a revision of the former.

  The MP4 file format is a standard digital multimedia container format. It is mainly used to store digital audio and video, supports a variety of audio and video codecs plus other embedded data, and can also store still images and subtitles. MP4 files can contain metadata as defined by the format standard, and can also contain Extensible Metadata Platform (XMP) metadata. The usual file extension is .mp4; .m4a, .m4p, .m4b, .m4r and .m4v are also MP4 extensions, used in different scenarios (for example, .m4a is usually used for files that contain only audio streams).

  Strictly speaking, the file structure described below is that of the ISO Base Media File Format, of which MP4 is just one implementation. Related formats such as 3GP and mov also store their data in boxes. MP4 itself has two versions, identified by the brands mp41 and mp42.

2 MP4 file format basic unit

  All data in an MP4 file is organized in Boxes (also called Atoms), and Boxes can be nested. A Box consists of a Box Header and a Box Body. The Box Header gives the size and type of the Box plus some descriptors (all integers are in network byte order, i.e. big endian). The Box Body is the data contained in the Box. The standard defines Box with the following pseudocode:

aligned(8) class Box (unsigned int(32) boxtype, optional unsigned int(8)[16] extended_type) {
    unsigned int(32) size;
    unsigned int(32) type = boxtype;
    if (size==1) {
        unsigned int(64) largesize;
    } else if (size==0) {
        // box extends to end of file
    }
    if (boxtype=='uuid') {
        unsigned int(8)[16] usertype = extended_type;
    }
}
  • size: a 32-bit integer giving the size of the Box, size = sizeof(Box Header) + sizeof(Box Body). Two special values extend it:
    • size==1: the real size is stored in the 64-bit largesize field;
    • size==0: the current Box is the last Box in the file, and the Box Body is the rest of the file;
  • type: the type of the current Box (a reader ignores Boxes whose type it does not recognize). A standard type is 4 characters, for example moov. A user-defined Box always has the type uuid, and its actual type is given by usertype.
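
  The header rules above can be sketched in a few lines of Python. This is only a minimal illustration, not a complete demuxer: the function names parse_box_header and iter_boxes are my own, and the sample bytes are built by hand rather than taken from a real file.

```python
import struct


def parse_box_header(buf, offset=0):
    """Return (box_type, header_size, total_size) for the box at offset.

    total_size is None when size == 0 (the box runs to the end of the file).
    """
    size, box_type = struct.unpack_from(">I4s", buf, offset)  # big endian
    header_size = 8
    if size == 1:
        # The real size is the 64-bit largesize that follows the type.
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_size = 16
    elif size == 0:
        size = None
    if box_type == b"uuid":
        # A 16-byte usertype follows for user-defined boxes.
        header_size += 16
    return box_type.decode("latin-1"), header_size, size


def iter_boxes(buf, start=0, end=None):
    """Yield (box_type, body_start, body_end) for sibling boxes in buf[start:end]."""
    end = len(buf) if end is None else end
    offset = start
    while offset + 8 <= end:
        box_type, header_size, size = parse_box_header(buf, offset)
        if size is not None and size < header_size:
            raise ValueError(f"invalid box size {size} at offset {offset}")
        box_end = end if size is None else offset + size
        yield box_type, offset + header_size, box_end
        offset = box_end


# Hand-built sample: a 'free' box with a 4-byte body, then an empty 'mdat' box.
sample = struct.pack(">I4s4s", 12, b"free", b"abcd") + struct.pack(">I4s", 8, b"mdat")
boxes = list(iter_boxes(sample))
large = parse_box_header(struct.pack(">I4sQ", 1, b"mdat", 16))
```

  Because container Boxes such as moov simply hold further Boxes in their body, the same iter_boxes call can be applied again to the (body_start, body_end) range of a container to walk the nested structure.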

  FullBox extends Box by adding a version field and a flags field:

aligned(8) class FullBox(unsigned int(32) boxtype, unsigned int(8) v, bit(24) f) extends Box(boxtype) {
    unsigned int(8) version = v;
    bit(24) flags = f;
}
  • version: the version of the format of the current Box;
  • flags: a 24-bit field of flags; the meaning of each bit is defined by the specific Box type.
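
  As a small sketch of how these two fields are read (the helper name parse_fullbox_fields is hypothetical), the first four bytes of a FullBox body can be unpacked as one big-endian 32-bit word and then split:

```python
import struct


def parse_fullbox_fields(body):
    """Split the first 4 bytes of a FullBox body into (version, flags)."""
    (word,) = struct.unpack_from(">I", body, 0)
    version = word >> 24        # upper 8 bits
    flags = word & 0x00FFFFFF   # lower 24 bits
    return version, flags


# version 0 with the three lowest flag bits set
version, flags = parse_fullbox_fields(bytes([0x00, 0x00, 0x00, 0x07]))
```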

  Registered box types are listed at mp4ra.org. In addition, tools such as MP4Info, mp4explorer and Online MP4 file parser can be used to inspect the Boxes of an MP4 file.

3 Some important boxes

  ISO media files index and store data through Boxes, and the index is stored separately from the data: trak stores the index information, while the actual media data is stored in mdat.
  Each Box described below builds on the basic Box format; the descriptions cover the content of the Box Body.

3.1 ftyp

// Type: ftyp
// Container: file
// Mandatory: yes
// Quantity: exactly one
aligned(8) class FileTypeBox extends Box('ftyp') {
    unsigned int(32) major_brand;
    unsigned int(32) minor_version;
    unsigned int(32) compatible_brands[]; // to end of the box
}

  ftyp (File Type Box) should appear as early as possible in the file to help the demuxer identify the file type and the compatible versions. ftyp declares the major specification (brand) of the file, and the demuxer tries to demux the file according to that specification. The minor version is informative only and should not be used to determine whether the file conforms to a standard; it may allow more precise identification of the major brand for inspection, debugging, or improved decoding.

  • major_brand: the main specification the file conforms to; the demuxer should preferably use this specification to demultiplex the file;
  • minor_version: minor version;
  • compatible_brands: A list of specifications that the current file is compatible with.

  In the example below, the ftyp Box is 32 bytes long (8-byte header plus 24-byte body), the major brand is isom, the minor version is 512 (printed as hexadecimal 200), and the compatible brands are isom, iso2, avc1 and mp41:

[ftyp] size=8+24
  major_brand = isom
  minor_version = 200
  compatible_brand = isom iso2 avc1 mp41
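
  To make the layout concrete, here is a minimal sketch that decodes an ftyp body. The function name parse_ftyp is hypothetical, and the 24-byte body is rebuilt by hand from the values in the dump above rather than read from the actual file.

```python
import struct


def parse_ftyp(body):
    """Decode an ftyp box body into (major_brand, minor_version, compatible_brands)."""
    major, minor = struct.unpack_from(">4sI", body, 0)
    # Everything after the first 8 bytes is a list of 4-character brands.
    brands = [body[i:i + 4].decode("ascii") for i in range(8, len(body) - 3, 4)]
    return major.decode("ascii"), minor, brands


# 24-byte body equivalent to the dump above (0x200 == 512).
body = b"isom" + struct.pack(">I", 0x200) + b"isomiso2avc1mp41"
major_brand, minor_version, compatible_brands = parse_ftyp(body)
```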

3.2 moov

// Type: moov
// Container: file
// Mandatory: yes
// Quantity: exactly one
aligned(8) class MovieBox extends Box('moov') {
}

  moov (Movie Box) stores the metadata of the media file. It is a container Box, i.e. the actual information is carried by the sub-Boxes inside it. It is usually located near the beginning or the end of the file, and in most cases it directly follows ftyp.

3.3 mdat

// Type: mdat
// Container: file
// Mandatory: no
// Quantity: any number
aligned(8) class MediaDataBox extends Box('mdat') {
    bit(8) data[];
}

  mdat (Media Data Box) contains the actual encoded media stream data. How that data is laid out is described by the index Boxes elsewhere in the file, which means the media data can be parsed normally even without the mdat Box Header.

3.4 mvhd

// Type: mvhd
// Container: moov
// Mandatory: yes
// Quantity: exactly one
aligned(8) class MovieHeaderBox extends FullBox('mvhd', version, 0) {
    if (version==1) {
        unsigned int(64) creation_time;
        unsigned int(64) modification_time;
        unsigned int(32) timescale;
        unsigned int(64) duration;
    } else { // version==0
        unsigned int(32) creation_time;
        unsigned int(32) modification_time;
        unsigned int(32) timescale;
        unsigned int(32) duration;
    }
    template int(32) rate = 0x00010000; // typically 1.0
    template int(16) volume = 0x0100; // typically, full volume
    const bit(16) reserved = 0;
    const unsigned int(32)[2] reserved = 0;
    template int(32)[9] matrix = { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 }; // Unity matrix
    bit(32)[6] pre_defined = 0;
    unsigned int(32) next_track_ID;
}

  mvhd (Movie Header Box) is a FullBox that stores information about the media file as a whole, independent of any specific media data.

  • version: the version of the Box, 0 or 1;
  • creation_time: an integer (64-bit in version 1, 32-bit in version 0) giving the creation time in seconds since midnight, Jan. 1, 1904, UTC;
  • modification_time: an integer (64-bit in version 1, 32-bit in version 0) giving the modification time in seconds since midnight, Jan. 1, 1904, UTC;
  • timescale: the number of time units in one second; for example, 60 means that one time unit is $\frac{1}{60}$ s;
  • duration: the duration of the presentation in units of timescale, derived from the track information in the file and equal to the duration of the longest track;
  • rate: recommended playback rate, a 32-bit fixed-point number whose upper 16 bits are the integer part and lower 16 bits the fractional part; 1.0 is the normal playback rate;
  • volume: recommended volume, a 16-bit fixed-point number whose upper 8 bits are the integer part and lower 8 bits the fractional part; 1.0 is full volume;
  • matrix: the transformation matrix of the video; the default is the identity matrix;
  • next_track_ID: a value to use for the track ID of the next track added to this presentation. Zero is not a valid track ID. The value of next_track_ID must be greater than the largest track ID in use. If the value is all 1s (0xffffffff, the 32-bit maximum) and a new media track is to be added, the file must be searched for an unused track ID.
  For example:
  [mvhd] size=12+96
    version = 0
    flags = 0 
    creation_time = 2022-11-06T03:40:48.000Z
    modification_time = 2022-11-06T03:40:48.000Z
    timescale = 600
    duration = 63250
    rate = 1
    volume = 1
    next_track_ID = 3
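
  The raw mvhd fields need a little arithmetic before they become readable values like those in the dump above. The following sketch shows the conversions; the helper names are my own, and the constants are taken from the field descriptions and the example dump (timescale = 600, duration = 63250).

```python
from datetime import datetime, timedelta, timezone

# MP4 timestamps count seconds from midnight, Jan. 1, 1904, UTC.
MP4_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def mp4_time_to_datetime(seconds):
    """Convert creation_time / modification_time to a datetime."""
    return MP4_EPOCH + timedelta(seconds=seconds)


def fixed_16_16(value):
    """Decode a 32-bit 16.16 fixed-point number (used by rate)."""
    return value / 65536


def fixed_8_8(value):
    """Decode a 16-bit 8.8 fixed-point number (used by volume)."""
    return value / 256


# 2082844800 seconds after the MP4 epoch is the Unix epoch.
unix_epoch = mp4_time_to_datetime(2082844800)

duration_seconds = 63250 / 600     # duration / timescale from the dump
rate = fixed_16_16(0x00010000)     # default rate
volume = fixed_8_8(0x0100)         # default volume
```

  With the values from the dump, the presentation is a little over 105 seconds long, and the default rate and volume both decode to 1.0.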

3.5 trak

// Type: trak
// Container: moov
// Mandatory: yes
// Quantity: one or more
aligned(8) class TrackBox extends Box('trak') {
}

  trak (Track Box) is a container Box that has no content of its own; its meaning is defined by the Boxes it contains. A media file may contain multiple tracks. Each track is independent and has its own temporal and spatial information. For example, audio and video can be stored in two separate tracks, and a single video stream can also be split across multiple tracks.

3.6 tkhd

// Type: tkhd
// Container: trak
// Mandatory: yes
// Quantity: exactly one
aligned(8) class TrackHeaderBox extends FullBox('tkhd', version, flags){
    if (version==1) {
        unsigned int(64) creation_time;
        unsigned int(64) modification_time;
        unsigned int(32) track_ID;
        const unsigned int(32) reserved = 0;
        unsigned int(64) duration;
    } else { // version==0
        unsigned int(32) creation_time;
        unsigned int(32) modification_time;
        unsigned int(32) track_ID;
        const unsigned int(32) reserved = 0;
        unsigned int(32) duration;
    }
    const unsigned int(32)[2] reserved = 0;
    template int(16) layer = 0;
    template int(16) alternate_group = 0;
    template int(16) volume = {if track_is_audio 0x0100 else 0};
    const unsigned int(16) reserved = 0;
    template int(32)[9] matrix= { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 }; // unity matrix
    unsigned int(32) width;
    unsigned int(32) height;
}

  tkhd (Track Header Box) stores the description of the current track, and each track has exactly one tkhd. The default value of the track header flags for media tracks is 7 (track_enabled, track_in_movie, track_in_preview). If neither track_in_movie nor track_in_preview is set on any track in the presentation, all tracks shall be treated as if both flags were set. Hint tracks should have their track header flags set to 0 so that they are ignored for local playback and preview.

  • version: the version of the Box, 0 or 1;
  • flags: a 24-bit field with the following flags:
    • Track_enabled (0x000001): the track is enabled; for a disabled track this bit is 0;
    • Track_in_movie (0x000002): the track is used for playback;
    • Track_in_preview (0x000004): the track is used for preview;
  • creation_time: an integer (64-bit in version 1, 32-bit in version 0) giving the creation time in seconds since midnight, Jan. 1, 1904, UTC;
  • modification_time: an integer (64-bit in version 1, 32-bit in version 0) giving the modification time in seconds since midnight, Jan. 1, 1904, UTC;
  • track_ID: the ID of the current track; it is unique, i.e. no two tracks in a file may have the same track_ID;
  • reserved: reserved;
  • duration: the duration of the current track. If the track has an edit list, the value is the sum of the durations of all edits; otherwise it is the sum of the sample durations. If the duration cannot be determined, the field is set to all 1s (0xffffffff in version 0). The value is expressed in the timescale of mvhd;
  • layer: the stacking order of video tracks; the smaller the number, the closer to the viewer, so 1 is above 2 and 0 is above 1;
  • alternate_group: the group ID of the current track. 0 means the track is not grouped with any other track; only one track of a group is played at any time;
  • volume: recommended volume, a 16-bit fixed-point number whose upper 8 bits are the integer part and lower 8 bits the fractional part; 1.0 is full volume;
  • matrix: the transformation matrix of the video; the default is the identity matrix;
  • width: a fixed-point number giving the video width of the current track (the upper 16 bits are the integer part and the lower 16 bits the fractional part); the image is scaled to this size before any transformation is applied to the track;
  • height: a fixed-point number giving the video height of the current track (the upper 16 bits are the integer part and the lower 16 bits the fractional part); the image is scaled to this size before any transformation is applied to the track.
// This sample file has two tracks: one video stream and one audio stream
[tkhd] size=12+80, flags=3
    id = 1
    duration = 63250
    width = 640.000000
    height = 360.000000

[tkhd] size=12+80, flags=1
    id = 2
    duration = 63223
    width = 0.000000
    height = 0.000000
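
  The flags and the fixed-point width and height in the dump above can be decoded as follows. This is a small sketch with hypothetical helper names; the raw values 0x02800000 and 0x01680000 are what I would expect the 640 x 360 track to store, not bytes read from the sample file.

```python
TRACK_ENABLED = 0x000001
TRACK_IN_MOVIE = 0x000002
TRACK_IN_PREVIEW = 0x000004


def describe_tkhd_flags(flags):
    """Return the names of the tkhd flags that are set."""
    names = []
    if flags & TRACK_ENABLED:
        names.append("track_enabled")
    if flags & TRACK_IN_MOVIE:
        names.append("track_in_movie")
    if flags & TRACK_IN_PREVIEW:
        names.append("track_in_preview")
    return names


def fixed_16_16(value):
    """Decode a 32-bit 16.16 fixed-point number (width and height)."""
    return value / 65536


video_flags = describe_tkhd_flags(3)   # flags=3 of the video track above
audio_flags = describe_tkhd_flags(1)   # flags=1 of the audio track above
width = fixed_16_16(0x02800000)
height = fixed_16_16(0x01680000)
```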

3.7 edts

// Type: edts
// Container: trak
// Mandatory: no
// Quantity: zero or one
aligned(8) class EditBox extends Box('edts') {
}

  edts (Edit Box) is a container Box that describes how the media timeline maps to the presentation timeline. If it is absent or empty, the samples are played in order according to the track's own timeline; if it is present, playback follows the times given in elst.

3.8 elst

// Type: elst
// Container: edts
// Mandatory: no
// Quantity: zero or one
aligned(8) class EditListBox extends FullBox('elst', version, 0) {
    unsigned int(32) entry_count;
    for (i=1; i <= entry_count; i++) {
        if (version==1) {
            unsigned int(64) segment_duration;
            int(64) media_time;
        } else { // version==0
            unsigned int(32) segment_duration;
            int(32) media_time;
        }
        int(16) media_rate_integer;
        int(16) media_rate_fraction = 0;
    }
}

  elst (Edit List Box) is an array in which each entry stores one display rule: start time, duration and rate. Suppose we want the first 10 s of the presentation to show nothing, and then play the media from 0 s for 30 s. The elst would be:

Entry-count = 2
Segment-duration = 10 seconds Media-Time = -1 Media-Rate = 1
Segment-duration = 30 seconds Media-Time =  0 Media-Rate = 1
  • version: the version of the Box, 0 or 1;
  • entry_count: the number of entries in the list;
  • segment_duration: the duration of this edit segment, expressed in the timescale of mvhd;
  • media_time: the start time within the media of this edit segment, expressed in the timescale of mdhd. A value of -1 marks an empty edit. The last edit in a track must never be an empty edit;
  • media_rate: if the rate of the edit segment is 0, the picture dwells: the media at media_time is shown for segment_duration. Otherwise this value is always 1.

  Note that although the structure above contains the fields media_rate_integer and media_rate_fraction, the text of the standard only describes media_rate. Comparing against the binary layout, I believe media_rate in the standard corresponds to media_rate_integer. The description in the standard is:
media_rate specifies the relative rate at which to play the media corresponding to this edit segment. If this value is 0, then the edit is specifying a 'dwell': the media at media-time is presented for the segment-duration. Otherwise this field shall contain the value 1.
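
  To show how a player might apply these rules, here is a minimal sketch that maps a presentation time to a media time using the two-entry example above. The function name presentation_to_media_time and the timescales (600 for mvhd, 12288 for mdhd) are assumptions chosen for illustration; a real player has more cases to handle.

```python
def presentation_to_media_time(edits, t, movie_timescale, media_timescale):
    """Map presentation time t (movie timescale units) to a media time
    (media timescale units).

    edits is a list of (segment_duration, media_time, media_rate) tuples.
    Returns None inside an empty edit or past the end of the edit list.
    """
    segment_start = 0
    for segment_duration, media_time, media_rate in edits:
        if t < segment_start + segment_duration:
            if media_time == -1:
                return None           # empty edit: nothing is shown
            if media_rate == 0:
                return media_time     # dwell: hold the media at media_time
            elapsed = t - segment_start
            # segment times use the movie timescale, media_time the media timescale
            return media_time + elapsed * media_timescale // movie_timescale
        segment_start += segment_duration
    return None


# 10 s empty edit, then 30 s of media starting at media time 0.
edits = [(10 * 600, -1, 1), (30 * 600, 0, 1)]
at_5s = presentation_to_media_time(edits, 5 * 600, 600, 12288)
at_10s = presentation_to_media_time(edits, 10 * 600, 600, 12288)
at_11s = presentation_to_media_time(edits, 11 * 600, 600, 12288)
```

  Here at_5s falls inside the empty edit, at_10s is the first instant of the media, and at_11s is one second (12288 media units) into it.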

3.9 mdia

// Type: mdia
// Container: trak
// Mandatory: yes
// Quantity: exactly one
aligned(8) class MediaBox extends Box('mdia') {
}

  Media Box is a container Box that stores the media information of the current track.

3.10 mdhd

// Type: mdhd
// Container: mdia
// Mandatory: yes
// Quantity: exactly one
aligned(8) class MediaHeaderBox extends FullBox('mdhd', version, 0) {
    if (version==1) {
        unsigned int(64) creation_time;
        unsigned int(64) modification_time;
        unsigned int(32) timescale;
        unsigned int(64) duration;
    } else { // version==0
        unsigned int(32) creation_time;
        unsigned int(32) modification_time;
        unsigned int(32) timescale;
        unsigned int(32) duration;
    }
    bit(1) pad = 0;
    unsigned int(5)[3] language; // ISO-639-2/T language code
    unsigned int(16) pre_defined = 0;
}

  mdhd (Media Header Box) stores metadata such as the duration of the media.

  • version: the version of the Box, 0 or 1;
  • creation_time: a 64-bit (version 1) or 32-bit (version 0) integer giving the creation time in seconds since midnight, Jan. 1, 1904, UTC;
  • modification_time: a 64-bit (version 1) or 32-bit (version 0) integer giving the modification time in seconds since midnight, Jan. 1, 1904, UTC;
  • timescale: the number of time units in one second for the current media, i.e. one unit is $\frac{1}{timescale}$ s;
  • duration: the duration of the media, in units of timescale;
  • language: the language code of the media of the current track. See ISO 639-2/T for the set of three-character codes. Each character is packed as the difference between its ASCII value and 0x60. Since the codes are limited to three lowercase letters, these values are strictly positive.
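
  The packing of the language field is easier to see in code. The sketch below (helper names are my own) treats pad and the three 5-bit characters as one 16-bit integer, as they are stored in the file:

```python
def decode_language(packed):
    """Decode the packed ISO 639-2/T code of mdhd (the pad bit is ignored)."""
    chars = [((packed >> shift) & 0x1F) + 0x60 for shift in (10, 5, 0)]
    return "".join(chr(c) for c in chars)


def encode_language(code):
    """Pack a three-letter lowercase code into the 15-bit language field."""
    packed = 0
    for ch in code:
        packed = (packed << 5) | (ord(ch) - 0x60)
    return packed


language = decode_language(0x55C4)   # 'und' (undetermined)
packed_eng = encode_language("eng")
```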

3.11 hdlr

// Type: hdlr
// Container: mdia or meta
// Mandatory: yes
// Quantity: exactly one
aligned(8) class HandlerBox extends FullBox('hdlr', version = 0, 0) {
    unsigned int(32) pre_defined = 0;
    unsigned int(32) handler_type;
    const unsigned int(32)[3] reserved = 0;
    string name;
}

  hdlr(Handler Reference) declares the process of rendering the media data in the track, and therefore also declares the nature of the media in the track. For example, a video track would be handled by a video handler. When present within a Meta Box, this box declares the structure or format of the "Meta Box" content.

  • version: the version of the current Box;
  • handler_type:
    • when hdlr is inside mdia, four characters indicating the type of the current track; the possible values vide, soun and hint indicate a video, audio and hint track respectively;
    • when hdlr is inside a meta Box, it indicates the format of the meta Box content;
  • name: a null-terminated UTF-8 string that describes the track.

3.12 minf

// Type: minf
// Container: mdia
// Mandatory: yes
// Quantity: exactly one
aligned(8) class MediaInformationBox extends Box('minf') {
}

  minf (Media Information Box) is a container Box that contains information such as the index of the track data.

3.13 vmhd,smhd,hmhd,nmhd

// Type: vmhd, smhd, hmhd, nmhd
// Container: minf
// Mandatory: yes
// Quantity: exactly one

  The four Boxes vmhd, smhd, hmhd and nmhd all store header information about the current media data; a different Box is used for each type of data, for example vmhd for video and smhd for audio.

vmhd

aligned(8) class VideoMediaHeaderBox extends FullBox('vmhd', version = 0, 1) {
    template unsigned int(16) graphicsmode = 0; // copy, see below
    template unsigned int(16)[3] opcolor = {0, 0, 0};
}

smhd

aligned(8) class SoundMediaHeaderBox extends FullBox('smhd', version = 0, 0) {
    template int(16) balance = 0;
    const unsigned int(16) reserved = 0;
}

hmhd

aligned(8) class HintMediaHeaderBox extends FullBox('hmhd', version = 0, 0) {
    unsigned int(16) maxPDUsize;
    unsigned int(16) avgPDUsize;
    unsigned int(32) maxbitrate;
    unsigned int(32) avgbitrate;
    unsigned int(32) reserved = 0;
}

nmhd

aligned(8) class NullMediaHeaderBox extends FullBox('nmhd', version = 0, flags) {
}

3.14 stbl

// Type: stbl
// Container: minf
// Mandatory: yes
// Quantity: exactly one
aligned(8) class SampleTableBox extends Box('stbl') {
}

  The data in a media file is accessed through an index: the actual data is stored in mdat, and the index used to access it is stbl (Sample Table Box). stbl is a container Box, and its content is described by the Boxes inside it.

3.15 stsd

  stsd gives the description of the samples, including any initialization information needed at the decoding stage, such as the codec. The description that is stored differs between track types.

// Type: stsd
// Container: stbl
// Mandatory: yes
// Quantity: exactly one
aligned(8) abstract class SampleEntry (unsigned int(32) format) extends Box(format){
    const unsigned int(8)[6] reserved = 0;
    unsigned int(16) data_reference_index;
}

class HintSampleEntry() extends SampleEntry (protocol) {
    unsigned int(8) data [];
}

// Visual Sequences
class VisualSampleEntry(codingname) extends SampleEntry (codingname){
    unsigned int(16) pre_defined = 0;
    const unsigned int(16) reserved = 0;
    unsigned int(32)[3] pre_defined = 0;
    unsigned int(16) width;
    unsigned int(16) height;
    template unsigned int(32) horizresolution = 0x00480000; // 72 dpi
    template unsigned int(32) vertresolution = 0x00480000; // 72 dpi
    const unsigned int(32) reserved = 0;
    template unsigned int(16) frame_count = 1;
    string[32] compressorname;
    template unsigned int(16) depth = 0x0018;
    int(16) pre_defined = -1;
}

// Audio Sequences
class AudioSampleEntry(codingname) extends SampleEntry (codingname){
    const unsigned int(32)[2] reserved = 0;
    template unsigned int(16) channelcount = 2;
    template unsigned int(16) samplesize = 16;
    unsigned int(16) pre_defined = 0;
    const unsigned int(16) reserved = 0 ;
    template unsigned int(32) samplerate = {timescale of media}<<16;
}

aligned(8) class SampleDescriptionBox (unsigned int(32) handler_type) extends FullBox('stsd', 0, 0){
    int i ;
    unsigned int(32) entry_count;
    for (i = 1 ; i <= entry_count ; i++){
        switch (handler_type){
        case 'soun': // for audio tracks
            AudioSampleEntry(); break;
        case 'vide': // for video tracks
            VisualSampleEntry(); break;
        case 'hint': // Hint track
            HintSampleEntry(); break;
        }
    }
}

  stsd (Sample Description Box) stores the coding information of the corresponding track. Each track type has its own kind of SampleEntry (the meaning of the fields is evident from their names, so it is not repeated here), and the kind is selected by the handler type in hdlr above. The concrete Box is derived from the SampleEntry of its track type; for example, avc1 is the Box that describes AVC coding, so a video track coded with AVC carries an avc1 Box. The structure definition also shows that a track can carry multiple descriptions, each identified by its format (codingname or protocol), such as avc1 or mp4a.

3.16 stts

  

// Type: stts
// Container: stbl
// Mandatory: yes
// Quantity: exactly one
aligned(8) class TimeToSampleBox extends FullBox('stts', version = 0, 0) {
    unsigned int(32) entry_count;
    int i;
    for (i=0; i < entry_count; i++) {
        unsigned int(32) sample_count;
        unsigned int(32) sample_delta;
    }
}

  stts (Decoding Time to Sample Box) contains a mapping table from decoding time (DTS) to sample number, and is mainly used to derive the duration of each frame. As the definition above shows, the content of stts is a list, and each entry consists of:

  • sample_count: the number of consecutive samples that have the duration sample_delta;
  • sample_delta: the duration of each of these samples, in the timescale of mdhd.

  For a video with a constant frame rate there is generally only one entry. For example, with "sample_count":2530, "sample_delta":512 and timescale=12288, the duration of the whole video stream is $\frac{2530 \times 512}{12288} \approx 105.4$ s and the frame rate is $\frac{12288}{512} = 24$ frames per second. For videos with a variable frame rate there are generally multiple entries.
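
  The arithmetic in this example can be checked with a short sketch that expands the stts runs into one decoding time per sample. The function name decoding_times is hypothetical; the numbers are the ones from the example.

```python
def decoding_times(stts_entries):
    """Expand stts (sample_count, sample_delta) runs into one DTS per sample.

    Returns (list_of_dts, total_duration), both in mdhd timescale units.
    """
    times = []
    dts = 0
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            times.append(dts)
            dts += sample_delta
    return times, dts


timescale = 12288
times, total = decoding_times([(2530, 512)])
duration_seconds = total / timescale   # about 105.4 s
frame_rate = timescale / 512           # constant frame rate: one delta per frame
```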

3.17 stss

// Type: stss
// Container: stbl
// Mandatory: no
// Quantity: zero or one
aligned(8) class SyncSampleBox extends FullBox('stss', version = 0, 0) {
    unsigned int(32) entry_count;
    int i;
    for (i=0; i < entry_count; i++) {
        unsigned int(32) sample_number;
    }
}

  stss (Sync Sample Box) stores the sample numbers of the key frames in the track. If stss is absent, every sample is a key frame.

3.18 ctts

// Type: ctts
// Container: stbl
// Mandatory: no
// Quantity: zero or one
aligned(8) class CompositionOffsetBox extends FullBox('ctts', version = 0, 0) {
    unsigned int(32) entry_count;
    int i;
    for (i=0; i < entry_count; i++) {
        unsigned int(32) sample_count;
        unsigned int(32) sample_offset;
    }
}

  ctts (Composition Time to Sample Box) stores the offset between decoding time (DTS) and presentation time (PTS). For videos with only I frames and P frames, the decoding order and the presentation order are the same, so ctts is not needed. For videos with B frames, PTS and DTS differ and ctts is required. The relation is $CT(n) = DT(n) + CTTS(n)$, where $DT(n)$ can be obtained from stts.
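
  A minimal sketch of this formula, combining stts and ctts, is shown below. The four-sample stream (decode order I, P, B, B with a delta of 512) is a made-up example, and the function names are my own.

```python
def expand_runs(entries):
    """Expand (count, value) runs into one value per sample."""
    return [value for count, value in entries for _ in range(count)]


def composition_times(stts_entries, ctts_entries):
    """Compute CT(n) = DT(n) + CTTS(n) for every sample, in decode order."""
    deltas = expand_runs(stts_entries)
    offsets = expand_runs(ctts_entries)
    cts = []
    dts = 0
    for delta, offset in zip(deltas, offsets):
        cts.append(dts + offset)
        dts += delta
    return cts


# Hypothetical stream in decode order: I, P, B, B
stts = [(4, 512)]
ctts = [(1, 512), (1, 1536), (2, 0)]
cts = composition_times(stts, ctts)

# Sorting sample indices by composition time gives the display order.
display_order = sorted(range(len(cts)), key=lambda n: cts[n])
```

  In this made-up stream the P frame is decoded second but displayed last, after the two B frames that depend on it.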

3.19 stsc

// Type: stsc
// Container: stbl
// Mandatory: yes
// Quantity: exactly one
aligned(8) class SampleToChunkBox extends FullBox('stsc', version = 0, 0) {
    unsigned int(32) entry_count;
    for (i=1; i <= entry_count; i++) {
        unsigned int(32) first_chunk;
        unsigned int(32) samples_per_chunk;
        unsigned int(32) sample_description_index;
    }
}

  Samples are grouped into chunks. Chunks can contain different numbers of samples, and the samples within a chunk can have different sizes. The content of stsc is an array in which each entry describes a run of chunks:

  • first_chunk: the index of the first chunk of the run (indices start at 1);
  • samples_per_chunk: the number of samples in each chunk of the run;
  • sample_description_index: the index of the corresponding description in stsd.

  Note that all chunks in the range [entry[i].first_chunk, entry[i+1].first_chunk) have the same samples_per_chunk and sample_description_index.
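
  This run-length rule can be sketched as follows. The function name samples_per_chunk_table and the three-entry table are hypothetical; the total number of chunks is not stored in stsc itself and is assumed to come from the chunk offset table (stco).

```python
def samples_per_chunk_table(stsc_entries, chunk_count):
    """Return the number of samples in each chunk, for chunks 1..chunk_count.

    stsc_entries is a list of
    (first_chunk, samples_per_chunk, sample_description_index) tuples.
    """
    table = []
    for i, (first_chunk, samples_per_chunk, _) in enumerate(stsc_entries):
        if i + 1 < len(stsc_entries):
            next_first = stsc_entries[i + 1][0]   # run ends before the next entry
        else:
            next_first = chunk_count + 1          # last run extends to the final chunk
        table.extend([samples_per_chunk] * (next_first - first_chunk))
    return table


# Chunks 1-2 hold 3 samples each, chunks 3-4 hold 2, chunk 5 holds 1.
table = samples_per_chunk_table([(1, 3, 1), (3, 2, 1), (5, 1, 1)], chunk_count=5)
```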

3.20 stsz

// Type: stsz
// Container: stbl
// Mandatory: yes
// Quantity: exactly one
aligned(8) class SampleSizeBox extends FullBox('stsz', version = 0, 0) {
    unsigned int(32) sample_size;
    unsigned int(32) sample_count;
    if (sample_size==0) {
        for (i=1; i <= sample_count; i++) {
            unsigned int(32) entry_size;
        }
    }
}

  stsz (Sample Size Box) stores the size of each sample; for video this is the size of each frame (there is also a compact variant, stz2):

  • sample_size: the default sample size in bytes, usually 0. If sample_size is not 0, all samples have this size. If sample_size is 0, the samples may have different sizes, which are listed in the table that follows;
  • sample_count: the number of samples in the current track. If sample_size==0, sample_count is also the number of entries in the table that follows;
  • entry_size: the size of a single sample (present only if sample_size==0).
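
  The two cases (one shared size, or one size per sample) can be handled as in the sketch below. The function name parse_stsz is hypothetical, and it assumes body starts right after the Box Header, i.e. at the version and flags bytes.

```python
import struct


def parse_stsz(body):
    """Return the list of sample sizes stored in an stsz body.

    body starts at the FullBox version/flags (4 bytes), followed by
    sample_size, sample_count and the optional entry_size table.
    """
    sample_size, sample_count = struct.unpack_from(">II", body, 4)
    if sample_size != 0:
        # All samples share the same size; no table follows.
        return [sample_size] * sample_count
    return list(struct.unpack_from(f">{sample_count}I", body, 12))


# Hand-built bodies: three samples of different sizes, then two samples of 1024 bytes.
variable = bytes(4) + struct.pack(">II", 0, 3) + struct.pack(">3I", 100, 200, 50)
constant = bytes(4) + struct.pack(">II", 1024, 2)
variable_sizes = parse_stsz(variable)
constant_sizes = parse_stsz(constant)
```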

3.21 stco

// Type: stco
// Container: stbl
// Mandatory: yes
// Quantity: exactly one
aligned(8) class ChunkOffsetBox extends FullBox('stco', version = 0, 0) {
    unsigned int(32) entry_count;
    for (i=1; i <= entry_count; i++) {
        unsigned int(32) chunk_offset;
    }
}

  stco (Chunk Offset Box) stores the offset of each chunk in the file. There are two box types, stco for small files and co64 for large files; their structure is the same, but the offset field is 32 bits in stco and 64 bits in co64. chunk_offset is an offset from the start of the file itself, not an offset inside any Box. When constructing an MP4 file, pay special attention to the position of moov, because it affects the values of chunk_offset. In some MP4 files moov is at the end of the file. To shorten the time to the first frame, moov has to be moved to the front of the file, and every chunk_offset then has to be rewritten.
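
  The sketch below shows how the chunk offsets combine with the per-chunk sample counts (from stsc) and the sample sizes (from stsz) to locate every sample, and how the offsets change when moov is moved in front of mdat. All names and numbers are made up for illustration; a real tool would also have to switch to co64 if a shifted offset no longer fits in 32 bits.

```python
def sample_offsets(chunk_offsets, samples_in_chunk, sample_sizes):
    """Compute the absolute file offset of every sample.

    chunk_offsets comes from stco/co64, samples_in_chunk is the per-chunk
    sample count derived from stsc, sample_sizes comes from stsz.
    """
    offsets = []
    sample_index = 0
    for chunk_offset, count in zip(chunk_offsets, samples_in_chunk):
        position = chunk_offset
        for _ in range(count):
            offsets.append(position)
            # Samples inside a chunk are stored back to back.
            position += sample_sizes[sample_index]
            sample_index += 1
    return offsets


def shift_chunk_offsets(chunk_offsets, moov_size):
    """New chunk offsets after a moov of moov_size bytes is moved from the
    end of the file to a position before mdat."""
    return [offset + moov_size for offset in chunk_offsets]


offsets = sample_offsets([48, 1000], [2, 1], [100, 200, 50])
shifted = shift_chunk_offsets([48, 1000], moov_size=500)
```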


Origin blog.csdn.net/GrayOnDream/article/details/127815260