The integrated GPU can do hardware encoding too: Intel Media SDK, plus assorted audio and video codec study notes

Reposted from: https://www.shuzhiduo.com/A/6pdDplDDdw/
http://blog.sina.com.cn/s/blog_4155bb1d0100soq9.html

Intel Media SDK is a codec technology from Intel built on its integrated graphics core. We benefit from its hardware decoding when playing high-definition video, which greatly reduces CPU usage. Besides decoding, it also provides encoding, both hardware encoding (SDK Hardware) and software encoding (SDK Software). So how capable is this Intel technology?

The encoding engine of the latest Japanese release of TMPGEnc Video Mastering Works 5 can call either Intel Media SDK or x264, so let's test it.

1. Software: TMPGEnc Video Mastering Works 5, version 5.0.6.38 (its built-in x264 is already the latest build, 114).

2. Test computer: i3-2100, H67 motherboard with integrated graphics, 64-bit Windows 7 Professional.

3. Source video: the "Wildlife" sample video that ships with Windows 7; its specifications are shown below.

4. Encoding settings: x264, SDK hardware encoding, and SDK software encoding all output H.264+AAC MKV files, as shown below.

5. Test results:

x264: encoding time 55 seconds; file size 14.4 MB

SDK hardware encoding: 16 seconds; file size 14.8 MB

SDK software encoding: 2 minutes 26 seconds; file size 14.7 MB

Playing the encoded MKV files back on the computer, there is no obvious difference in video quality.

Intel's SDK hardware encoding technology is clearly very powerful: it greatly improves encoding efficiency and is well worth adopting. Credit also goes to TMPGEnc Video Mastering Works 5 for making full use of its performance.
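For readers who want to run a similar comparison today without TMPGEnc, the sketch below drives ffmpeg from Python and times a software (libx264) encode against a hardware (h264_qsv, Intel Quick Sync) encode. It assumes an ffmpeg build with the h264_qsv encoder enabled; the file names and target bit rate are placeholders, not values from the test above.

```python
import subprocess
import time

def encode(encoder: str, src: str, dst: str) -> float:
    """Run one ffmpeg encode and return the elapsed wall-clock time in seconds."""
    cmd = [
        "ffmpeg", "-y", "-i", src,
        "-c:v", encoder,          # "libx264" (software) or "h264_qsv" (Intel Quick Sync)
        "-b:v", "2M",             # roughly comparable target bit rate
        "-c:a", "aac",
        dst,
    ]
    start = time.time()
    subprocess.run(cmd, check=True)
    return time.time() - start

if __name__ == "__main__":
    # "input.mp4" and the output names are placeholders for your own test clip.
    for enc in ("libx264", "h264_qsv"):
        seconds = encode(enc, "input.mp4", f"out_{enc}.mp4")
        print(f"{enc}: {seconds:.1f} s")
```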

Codec study notes (1): basic concepts

Media services are among the main services on the network. Especially with the rise of mobile Internet, media plays a very important role for operators and application developers, and media codec work involves requirements analysis, application development, licensing fees for release, and so on. Recently a project required me to sort out media codecs. Interestingly, when reading operators' specifications and standards on Douding (a document-sharing site), I found that the same service from the same operator has different requirements in different documents, and some requirements look to me like historical leftovers that are rarely used now. Since I could not find the reasons on Douding, I checked the wiki instead. The Chinese wiki has limited and very short material, while the English wiki has plenty of content; the abridged versions are too thin. I also came across a copycat Chinese wiki online that looks very similar, shown in red, called "World Wiki". The Chinese wiki is still quite good, but I recommend reading the English one afterwards.

I have sorted out and summarized material on media codecs. The information comes mainly from the wiki, with a small part collected from online blogs; for material from other netizens I give the source where I can. If the information has already changed hands several times there is no other way, and I can only indicate a rough trail.

basic concept

codec

A codec refers to a device or program that can transform a signal or a data stream. The transformation includes both encoding a signal or data stream (usually for transmission, storage, or encryption) and decoding, that is, recovering from the encoded stream a form suitable for viewing or further processing. Codecs are often used in applications such as video conferencing and streaming media.

container

Many multimedia data streams need to contain both audio and video data, and usually also some auxiliary data such as subtitles and the metadata used to synchronize audio and video. These kinds of data may be handled by different programs, processes, or hardware, but when they are transmitted or stored they are usually encapsulated together. This encapsulation is realized by the video file format, for example the common *.mpg, *.avi, *.mov, *.mp4, *.rm, *.ogg or *.tta. Some of these formats can only hold certain codecs, while others can hold many different codecs in a container fashion.

FourCC is short for Four-Character Code: a four-character (4-byte) identifier that independently marks the format of a video data stream. WAV and AVI files contain FourCC fields that describe which codec was used to encode the data, so WAV and AVI files carry many such FourCC tags (for example "IDP3").
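To make the FourCC idea concrete, here is a minimal Python sketch (not a full RIFF parser) that walks the chunk tree of a well-formed AVI file and prints the codec FourCC stored in the 'strh' header of the first video ('vids') stream.

```python
import struct
import sys

def walk_chunks(data, offset, end):
    """Walk RIFF chunks between offset and end, yielding (fourcc, payload)."""
    while offset + 8 <= end:
        fourcc = data[offset:offset + 4]
        size = struct.unpack("<I", data[offset + 4:offset + 8])[0]
        payload = data[offset + 8:offset + 8 + size]
        if fourcc == b"LIST":
            # A LIST chunk nests further chunks after its 4-byte list type.
            yield from walk_chunks(data, offset + 12, offset + 8 + size)
        else:
            yield fourcc, payload
        offset += 8 + size + (size & 1)  # chunks are padded to even (word) boundaries

def video_fourcc(path):
    """Return the codec FourCC of the first video stream in an AVI file."""
    with open(path, "rb") as f:
        data = f.read()
    assert data[:4] == b"RIFF" and data[8:12] == b"AVI ", "not an AVI file"
    for fourcc, payload in walk_chunks(data, 12, len(data)):
        # A 'strh' stream header starts with fccType ('vids' for video)
        # followed by fccHandler, the codec FourCC (e.g. 'XVID', 'DX50').
        if fourcc == b"strh" and payload[:4] == b"vids":
            return payload[4:8].decode("ascii", "replace")
    return None

if __name__ == "__main__":
    print(video_fourcc(sys.argv[1]))
```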

Video is now an important part of a computer's multimedia system. To meet the need to store video, different video file formats have been defined that put video and audio into one file so they can be played back together. A video file is really a container wrapping several tracks, and the container format used determines how extensible the video file is.

Parameter introduction

Sampling Rate

The sampling rate (also called the sampling speed or sampling frequency) defines the number of samples per second extracted from a continuous signal to form a discrete signal, expressed in hertz (Hz). The reciprocal of the sampling frequency is the sampling period or sampling time, i.e., the time interval between samples. Be careful not to confuse the sampling rate with the bit rate.

The sampling theorem states that the sampling frequency must be greater than twice the bandwidth of the signal being sampled. Another equivalent statement is that the Nyquist frequency must be greater than the bandwidth of the signal being sampled. If the bandwidth of the signal is 100Hz, the sampling frequency must be greater than 200Hz to avoid aliasing. In other words, the sampling frequency must be at least twice the frequency of the largest frequency component in the signal, otherwise the original signal cannot be recovered from the signal samples.
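A small numerical illustration of the sampling theorem: the sketch below samples a 300 Hz sine at 1000 Hz (above the 600 Hz Nyquist rate) and at 400 Hz (below it) and reports the strongest frequency an FFT finds; in the under-sampled case the tone shows up as an alias at 100 Hz.

```python
import numpy as np

def dominant_frequency(f_signal: float, f_sample: float, duration: float = 1.0) -> float:
    """Sample a sine of frequency f_signal at f_sample and return the FFT peak frequency."""
    n = int(f_sample * duration)
    t = np.arange(n) / f_sample
    x = np.sin(2 * np.pi * f_signal * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(n, d=1.0 / f_sample)
    return freqs[np.argmax(spectrum)]

if __name__ == "__main__":
    # 1000 Hz > 2 * 300 Hz: the 300 Hz tone is recovered correctly.
    print(dominant_frequency(300, 1000))   # ~300.0
    # 400 Hz < 2 * 300 Hz: the tone aliases to |400 - 300| = 100 Hz.
    print(dominant_frequency(300, 400))    # ~100.0
```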

For speech samples:

8,000 Hz - used by telephone; adequate for human speech
11,025 Hz - used by radio broadcasting
32,000 Hz - used by miniDV digital camcorders, DAT (LP mode)
44,100 Hz - audio CD, also commonly used for MPEG-1 audio (VCD, SVCD, MP3)
47,250 Hz - the world's first commercial PCM recorder, developed by Nippon Columbia (Denon)
48,000 Hz - miniDV, digital TV, DVD, DAT, film and professional audio
50,000 Hz - the first commercial digital recorders, developed by 3M and Soundstream in the late 1970s
50,400 Hz - the Mitsubishi X-80 digital recorder
96,000 or 192,000 Hz - DVD-Audio, some LPCM DVD audio tracks, Blu-ray Disc audio tracks, and HD DVD audio tracks
2.8224 MHz - SACD; the 1-bit sigma-delta modulation process called Direct Stream Digital, jointly developed by Sony and Philips
  In analog video, the sampling rate is defined as the frame rate and field rate, rather than a notional pixel clock. The image sampling frequency is the repetition rate of the sensor integration period. Since the integration period can be much shorter than the repetition period, the sampling frequency may differ from the reciprocal of the sampling time.

50 Hz - PAL video
60 / 1.001 Hz - NTSC video
  When converting analog video to digital, a different sampling process occurs, this time using the pixel frequency. Some common pixel sampling rates are:

13.5 MHz - CCIR 601, D1 video
resolution

Resolution generally refers to the ability of a measurement or display system to distinguish detail. The concept applies to domains such as time and space. In everyday language resolution mostly refers to the clarity of images: the higher the resolution, the better the image quality and the more detail can be shown, but correspondingly, the more information is recorded, the larger the file becomes. On a personal computer, image-processing software such as Photoshop or PhotoImpact can be used to resize images, edit photos, and so on.

Image Resolution:

Image resolution describes the ability to distinguish detail in an image and applies to digital images, film images, and other types of image. It is commonly measured in lines per millimeter or lines per inch. Usually, "resolution" is expressed as the number of pixels in each direction, such as 640x480. In some cases it can also be expressed as pixels per inch (ppi) together with the width and height of the image, for example 72 ppi and 8x6 inches.
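As a quick worked example of the ppi notation just mentioned:

```python
def pixel_dimensions(ppi: float, width_in: float, height_in: float):
    """Convert a print size in inches plus a ppi value into pixel dimensions."""
    return round(ppi * width_in), round(ppi * height_in)

# 72 ppi over an 8 x 6 inch image corresponds to 576 x 432 pixels.
print(pixel_dimensions(72, 8, 6))
```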

Video resolution:

The size of the video picture is called its "resolution" (the accompanying figure compares the resolutions of various TV standards). Digital video is measured in pixels, while analog video is measured in horizontal scan lines. The resolution of SDTV is 720/704/640x480i60 (NTSC) or 768/720x576i50 (PAL/SECAM). The new high-definition television (HDTV) reaches 1920x1080p60, that is, 1920 pixels per horizontal scan line and 1080 scan lines per picture, displayed at 60 frames per second.

frame rate fps

Frame rate, often rendered in Chinese as "picture update rate" or "frame rate", is the number of still pictures shown per second in a video format. Typical frame rates range from 6 or 8 frames per second (fps) in the early days to 120 fps today. PAL (the television broadcast format of Europe, Asia, Australia, etc.) and SECAM (the format of France, Russia, parts of Africa, etc.) specify an update rate of 25 fps, while NTSC (the format of the United States, Canada, Japan, etc.) specifies 29.97 fps. Movie film is shot at the slightly slower 24 fps, which requires somewhat complicated conversion (see telecine) when television stations broadcast films. About 10 fps is the minimum needed to produce the basic illusion of continuous motion.

compression method

Lossy and lossless compression

The concepts of lossy and lossless compression for video are essentially the same as for still images. Lossless compression means the data after decompression are exactly identical to the data before compression; most lossless compression uses run-length encoding (RLE). Lossy compression means the decompressed data differ from the original: during compression, some image or audio information that human eyes and ears are not sensitive to is discarded, and the lost information cannot be recovered. Almost all high-compression algorithms are lossy in order to reach low data rates. The amount of loss is tied to the degree of compression: the harder the data are compressed, the more information is lost and, generally, the worse the result after decompression. In addition, repeatedly re-compressing with a lossy algorithm causes additional, cumulative loss.
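Since the paragraph above names run-length encoding as the typical lossless technique, here is a minimal byte-oriented RLE sketch; real formats (PCX, BMP RLE, TGA, etc.) use their own packet layouts, so this only illustrates the idea.

```python
def rle_encode(data: bytes):
    """Encode a byte string as (count, value) pairs."""
    out = []
    for b in data:
        if out and out[-1][1] == b:
            out[-1] = (out[-1][0] + 1, b)
        else:
            out.append((1, b))
    return out

def rle_decode(pairs) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for count, value in pairs)

raw = b"AAAABBBCCDAAAA"
packed = rle_encode(raw)
print(packed)                      # [(4, 65), (3, 66), (2, 67), (1, 68), (4, 65)]
assert rle_decode(packed) == raw   # lossless: the round trip is exact
```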

Lossless formats such as WAV, PCM, TTA, FLAC, AU, APE, TAK, WavPack (WV)
Lossy formats such as MP3, Windows Media Audio (WMA), Ogg Vorbis (OGG), AAC
Intra and Inter compression

Intraframe compression is also called spatial compression. When a frame is compressed, only the data of that frame are considered, without exploiting redundancy between neighboring frames, so it is essentially the same as still-image compression. Intraframe compression generally uses lossy algorithms. Because intraframe compression creates no dependencies between frames, the compressed video can still be edited frame by frame. Intraframe compression alone generally cannot achieve very high compression.

Interframe compression exploits the fact that consecutive frames of most video or animation are strongly correlated, i.e., very little changes from one frame to the next. Because adjacent frames carry redundant information, removing that redundancy further increases the amount of compression achievable. Interframe compression is also called temporal compression; it works by comparing data between different frames along the time axis, and it is generally lossless. Frame differencing is a typical temporal compression method: it compares the current frame with its neighbors and records only the differences, which greatly reduces the amount of data.
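To make frame differencing concrete, the sketch below (NumPy, with tiny synthetic frames) stores the first frame in full and only the pixel differences for the next frame, then reconstructs the sequence exactly.

```python
import numpy as np

def encode_sequence(frames):
    """Temporal (inter-frame) compression by frame differencing:
    keep the first frame, then store only frame[i] - frame[i-1]."""
    key = frames[0].copy()
    diffs = [frames[i].astype(np.int16) - frames[i - 1].astype(np.int16)
             for i in range(1, len(frames))]
    return key, diffs

def decode_sequence(key, diffs):
    """Rebuild every frame by accumulating the stored differences."""
    frames = [key.copy()]
    for d in diffs:
        frames.append((frames[-1].astype(np.int16) + d).astype(np.uint8))
    return frames

# Two nearly identical 4x4 "frames": only one pixel changes between them.
f0 = np.zeros((4, 4), dtype=np.uint8)
f1 = f0.copy()
f1[1, 2] = 200
key, diffs = encode_sequence([f0, f1])
assert np.array_equal(decode_sequence(key, diffs)[1], f1)   # lossless reconstruction
print(np.count_nonzero(diffs[0]), "changed pixel(s) stored")
```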

Symmetric and asymmetric coding

Symmetry is a key characteristic of compression coding. Symmetric means that compression and decompression take the same processing power and time; symmetric algorithms suit real-time compression and transmission of video, so applications such as video conferencing are best served by symmetric codecs. In electronic publishing and other multimedia applications, the video is generally compressed in advance and played back later, so asymmetric coding can be used. Asymmetric means that compression takes a great deal of processing power and time while decompression can still play back in real time, i.e., compression and decompression run at different speeds. Generally, compressing a piece of video takes much longer than playing it back (decompressing it): compressing a three-minute clip may take ten minutes or more, while playing it back in real time takes only three minutes.

Sources other than the wiki: http://tech.lmtw.com/csyy/Using/200411/3142.html

Codec study notes (2): codec types
Data compression (Hong Kong and Taiwan translate "information" as 资料) is achieved by removing redundant information from the data. For video data, the redundancy can be divided into four categories:

Temporal redundancy
  In video data, adjacent frames are usually strongly correlated; this correlation is the temporal redundancy. It is what the interframe compression covered in the previous note removes.

Spatial redundancy
  Within a single frame, neighboring pixels are usually strongly correlated; this correlation is the spatial redundancy. It is what the intraframe compression covered in the previous note removes.

Statistical redundancy
  Statistical redundancy means that the probability distribution of the symbols to be coded is non-uniform.

Perceptual redundancy
  Perceptual redundancy refers to information in the video that the human eye cannot perceive while watching it.

Video compression means using data compression techniques to remove the redundant information from digital video data, reducing the amount of data needed to represent the original video so that it can be transmitted and stored. In practice the raw video data volume is usually far too large: uncompressed television-quality video, for example, has a bit rate as high as 216 Mbit/s, which the vast majority of applications cannot handle, so video compression is essential. The newest video coding standard at the time of writing is H.264/AVC, produced by the Joint Video Team (JVT) formed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).

A typical video encoder works as follows. When encoding the current signal, the encoder first generates a prediction of it, called the predicted signal. The prediction can be temporal (inter prediction), i.e., based on the signal of previous frames, or spatial (intra prediction), i.e., based on neighboring pixels within the same frame. The encoder then subtracts the predicted signal from the current signal to obtain the residual signal and encodes only the residual; this removes part of the temporal or spatial redundancy. Next, rather than coding the residual directly, the encoder transforms it (usually with a discrete cosine transform) and quantizes the transform coefficients, further removing spatial and perceptual redundancy. Finally, the quantized coefficients are entropy coded to remove statistical redundancy.
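The transform-and-quantize step described above can be sketched in a few lines: build an 8×8 DCT-II matrix, transform a made-up residual block, quantize with a uniform step, and see how many coefficients collapse to zero (those zero runs are what the entropy coder later exploits). The block values and the quantization step are arbitrary illustrations, not values from any particular standard.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix, the transform used (in spirit) by MPEG/JPEG codecs."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

rng = np.random.default_rng(0)
residual = rng.integers(-20, 20, size=(8, 8)).astype(float)   # a fake residual block

d = dct_matrix()
coeffs = d @ residual @ d.T           # 2-D DCT of the block
step = 16.0                           # arbitrary quantization step, for illustration only
quantized = np.round(coeffs / step)   # most small coefficients collapse to zero
reconstructed = d.T @ (quantized * step) @ d   # inverse transform (lossy)

print("non-zero coefficients:", np.count_nonzero(quantized), "of 64")
print("max reconstruction error:", np.abs(reconstructed - residual).max())
```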

Development of Video Coding Standards

Year | Standard | Developing organization | Royalty-free | Main applications
1984 | H.120 | ITU-T | yes | -
1990 | H.261 | ITU-T | yes | video conferencing, video telephony
1993 | MPEG-1 Part 2 | ISO/IEC | yes | Video CD (VCD)
1995 | H.262 / MPEG-2 Part 2 | ISO/IEC, ITU-T | no | DVD-Video, Blu-ray, DVB, SVCD
1996 | H.263 | ITU-T | - | video conferencing, video telephony, 3G mobile video (3GP)
1999 | MPEG-4 Part 2 | ISO/IEC | no | -
2003 | H.264 / MPEG-4 AVC | ISO/IEC, ITU-T | no | Blu-ray Disc, digital video broadcasting (DVB), iPod video, HD DVD

See the table below for common codecs, which will be discussed in the following categories:

Video codecs
  ISO/IEC: MJPEG · Motion JPEG 2000 · MPEG-1 · MPEG-2 (Part 2) · MPEG-4 (Part 2/ASP · Part 10/AVC) · HEVC
  ITU-T: H.120 · H.261 · H.262 · H.263 · H.264 · H.265
  Others: AMV · AVS · Bink · CineForm · Cinepak · Dirac · DV · Indeo · Microsoft Video 1 · OMS Video · Pixlet · RealVideo · RTVideo · SheerVideo · Smacker · Sorenson Video & Sorenson Spark · Theora · VC-1 · VP3 · VP6 · VP7 · VP8 · WMV

Audio codecs
  ISO/IEC MPEG: MPEG-1 Layer III (MP3) · MPEG-1 Layer II · MPEG-1 Layer I · AAC · HE-AAC · MPEG-4 ALS · MPEG-4 SLS · MPEG-4 DST
  ITU-T: G.711 · G.718 · G.719 · G.722 · G.722.1 · G.722.2 · G.723 · G.723.1 · G.726 · G.728 · G.729 · G.729.1
  Others: AC-3 · AMR · AMR-WB · AMR-WB+ · Apple Lossless · ATRAC · DRA · DTS · FLAC · GSM-HR · GSM-FR · GSM-EFR · iLBC · Monkey's Audio · TTA (True Audio) · MT9 · μ-law · Musepack · Nellymoser · OptimFROG · OSQ · RealAudio · RTAudio · SD2 · SHN · SILK · Siren · Speex · TwinVQ · Vorbis · WavPack · WMA

Image compression
  ISO/IEC/ITU-T: JPEG · JPEG 2000 · JPEG XR · Lossless JPEG · JBIG · JBIG2 · PNG · WBMP
  Others: APNG · BMP · DjVu · EXR · GIF · ICER · ILBM · MNG · PCX · PGF · TGA · TIFF

Media containers
  General: 3GP · ASF · AVI · Bink · BXF · DMF · DPX · EVO · FLV · GXF · M2TS · Matroska · MPEG-PS · MPEG-TS · MP4 · MXF · Ogg · QuickTime · RealMedia · RIFF · Smacker · VOB
  Audio only: AIFF · AU · WAV

For the table above you can look up a specific codec in the Chinese wiki, but the English wiki has richer coverage; see the table below.

Multimedia compression formats

Video compression
  ISO/IEC: MJPEG · Motion JPEG 2000 · MPEG-1 · MPEG-2 (Part 2) · MPEG-4 (Part 2/ASP · Part 10/AVC) · HEVC
  ITU-T: H.120 · H.261 · H.262 · H.263 · H.264 · HEVC
  Others: AMV · AVS · Bink · CineForm · Cinepak · Dirac · DV · Indeo · Microsoft Video 1 · OMS Video · Pixlet · RealVideo · RTVideo · SheerVideo · Smacker · Sorenson Video & Sorenson Spark · Theora · VC-1 · VP3 · VP6 · VP7 · VP8 · WMV

Audio compression
  ISO/IEC: MPEG-1 Layer III (MP3) · MPEG-1 Layer II · MPEG-1 Layer I · AAC · HE-AAC · MPEG-4 ALS · MPEG-4 SLS · MPEG-4 DST · MPEG-4 HVXC · MPEG-4 CELP
  ITU-T: G.711 · G.718 · G.719 · G.722 · G.722.1 · G.722.2 · G.723 · G.723.1 · G.726 · G.728 · G.729 · G.729.1
  Others: AC-3 · AMR · AMR-WB · AMR-WB+ · Apple Lossless · ATRAC · DRA · DTS · FLAC · GSM-HR · GSM-FR · GSM-EFR · iLBC · Monkey's Audio · TTA (True Audio) · MT9 · μ-law · Musepack · Nellymoser · OptimFROG · OSQ · RealAudio · RTAudio · SD2 · SHN · SILK · Siren · Speex · TwinVQ · Vorbis · WavPack · WMA

Image compression
  ISO/IEC/ITU-T: JPEG · JPEG 2000 · JPEG XR · Lossless JPEG · JBIG · JBIG2 · PNG · WBMP
  Others: APNG · BMP · DjVu · EXR · GIF · ICER · ILBM · MNG · PCX · PGF · TGA · QTVR · TIFF

Media containers
  ISO/IEC: MPEG-PS · MPEG-TS · MPEG-4 Part 12 / JPEG 2000 Part 12 · MPEG-4 Part 14
  ITU-T: H.222.0
  Others: 3GP and 3G2 · ASF · AVI · Bink · DivX Media Format · DPX · EVO · Flash Video · GXF · M2TS · Matroska · MXF · Ogg · QuickTime · RealMedia · REDCODE RAW · RIFF · Smacker · MOD and TOD · VOB · WebM
  Audio only: AIFF · AU · WAV

Codec study notes (3): Mpeg series - Mpeg 1 and Mpeg 2
MPEG is the abbreviation of Moving Picture Experts Group. The original meaning of this name refers to a group that studies video and audio coding standards. Now what we call MPEG generally refers to a series of video coding standards formulated by this group. The group was formed in 1988, and has so far formulated multiple standards such as MPEG-1, MPEG-2, MPEG-3, MPEG-4, and MPEG-7, and MPEG-21 is being formulated.

MPEG has developed and is developing the following video-related standards so far:

MPEG-1: the first official video and audio compression standard, later adopted for Video CD. Its Layer 3 audio coding (MPEG-1 Audio Layer III), better known as MP3, became a hugely popular audio compression format.
MPEG-2: broadcast-quality video, audio, and transport. Used in over-the-air digital TV (ATSC, DVB, ISDB), digital satellite TV (e.g. DirecTV), digital cable TV, and DVD-Video.
MPEG-3: originally intended for high-definition television (HDTV); it was then found that MPEG-2 was sufficient for HDTV, so development of MPEG-3 was discontinued.
MPEG-4: a video and audio compression standard first released in 1998, which extends MPEG-1 and MPEG-2 to support coding of video/audio "objects", 3D content, low-bit-rate encoding, and Digital Rights Management. Its Part 10, issued jointly by ISO/IEC and ITU-T, is called H.264/MPEG-4 Part 10; see H.264.
MPEG-7: not a video compression standard but a multimedia content description standard.
MPEG-21: a standard still under development whose goal is to provide a complete platform for future multimedia applications.
  The standards relevant to media codecs are MPEG-1, MPEG-2, and MPEG-4, as shown in the figure above.

A note on the names in the figure: everyone on earth knows what DVD is, but what is DVB?

DVB: Digital Video Broadcasting (DVB, Digital Video Broadcasting), is a series of internationally recognized digital television open standards maintained by the "DVB Project". DVB system transmission methods are as follows:

· Satellite TV (DVB-S and DVB-S2)

· Cable TV (DVB-C)

· Wireless TV (DVB-T)

· Handheld terrestrial wireless (DVB-H)

These standards define the physical layer and data link layer of the transmission system. The device interacts with the physical layer through a synchronous parallel interface (SPI), a synchronous serial interface (SSI), or an asynchronous serial interface (ASI). The data is transported as an MPEG-2 Transport Stream and requires compliance with stricter restrictions (DVB-MPEG). A standard (DVB-H) for real-time compressed transmission of data to mobile terminals is currently being tested.

The main difference between these transmission methods is the modulation method used, because the requirements for the frequency bandwidth of different applications are different. DVB-S using high-frequency carriers uses QPSK modulation, DVB-C using low-frequency carriers uses QAM-64 modulation, and DVB-T using VHF and UHF carriers uses COFDM modulation.

In addition to audio and video transmission, DVB also defines a data communication standard (DVB-DATA) with a return channel (DVB-RC).

DVB codec, video: MPEG-2, MPEG-4 AVC; audio: MP3, AC-3, AAC, HE-AAC.

MPEG-1

MPEG-1 was officially released as ISO/IEC 11172.

Early MPEG-1 video coding has relatively poor quality and was mainly used to store video on CD-ROM. The best-known example in China is VCD (Video CD), whose video is encoded with MPEG-1. MPEG-1 is a video and audio compression format designed for CD media; a CD sustains roughly 1.4 Mbit/s over about 70 minutes. MPEG-1 uses block-based motion compensation, the discrete cosine transform (DCT), quantization, and related techniques, and is optimized for a transmission rate of about 1.2 Mbit/s. It was subsequently adopted by Video CD as its core technology. The output quality of MPEG-1 is roughly comparable to that of a traditional VCR, which may be why Video CD never became successful in developed countries.

MPEG-1 audio is divided into three layers, MPEG-1 Layer I, II, and III; the Layer III coding, abbreviated MP3, became a very widely used audio compression technology.

MPEG-1 has the following parts:

Part 1: Systems;
Part 2: Video;
Part 3: Audio; defines Layer 1, Layer 2, and Layer 3, with extensions later defined in MPEG-2;
Part 4: Conformance testing;
Part 5: Reference software.

Shortcomings of MPEG-1:

The audio compression system is limited to two channels (stereo).
There is no standardized support for interlaced video, and compression is poor when it is used for interlaced material.
There is only one standardized "profile" (the Constrained Parameters Bitstream), which is unsuited to higher-resolution video. MPEG-1 could in principle carry 4k video, but there was no practical way to encode higher resolutions or to signal the required hardware capability.
Only one color space, 4:2:0, is supported.
MPEG-2

Overview of MPEG-2

MPEG-2 was officially released as ISO/IEC 13818. It is typically used to provide video and audio coding for broadcast signals, including satellite and cable television. With minor modifications, MPEG-2 also became the core technology of DVD.

MPEG-2 has 11 parts, as follows:

Part 1: Systems - describes the synchronization and multiplexing of video and audio.

Its formal name is ISO/IEC 13818-1, also published as ITU-T H.222.0.

The systems part of MPEG-2 (Part 1) defines the transport stream, a mechanism for carrying digital video and audio signals over unreliable media; it is used mainly in broadcast television.

Two different but related container formats are defined: the MPEG transport stream and the MPEG program stream, the TS and PS in the figure. The MPEG transport stream (TS) is used to carry lossy digital video and audio where the beginning and end of the media stream may not be marked, as in broadcasting or tape; examples include ATSC, DVB, SBTVD, and HDV. The MPEG-2 systems part also defines the MPEG program stream (PS), a container format designed for file-based media such as hard drives, optical discs, and flash memory.

MPEG-2 PS (Program Stream) was developed for storing video information on storage media, while MPEG-2 TS (Transport Stream) was developed for transmitting video information over networks; today the most widespread use of MPEG-2 TS is the DVB system. The difference between them is that the TS packet structure has a fixed length, while PS packets are variable length. This structural difference gives them different resistance to transmission errors, and hence different application environments. Because the TS stream uses fixed-length packets, when a transmission error destroys the synchronization information of one TS packet, the receiver can find the synchronization information of the following packet at a fixed position, restore synchronization, and limit the information loss. Because PS packets vary in length, once the synchronization information of a PS packet is lost the receiver cannot determine where the next packet starts; it loses synchronization and suffers serious information loss. The TS stream is therefore generally used when the channel is poor and transmission errors are frequent, while the PS stream is used when the channel is good and errors are rare. Because of the TS stream's strong resistance to transmission errors, the MPEG-2 bitstreams carried over transmission media today almost all use the TS packet format.
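Because TS packets have the fixed 188-byte structure described above, with a 0x47 sync byte and a 13-bit PID, a few lines of Python suffice to illustrate the layout; this is a toy scanner for a clean, unencrypted capture, and the file name is a placeholder.

```python
from collections import Counter

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def count_pids(ts_path: str) -> Counter:
    """Scan an MPEG transport stream and count packets per 13-bit PID."""
    pids = Counter()
    with open(ts_path, "rb") as f:
        while True:
            packet = f.read(TS_PACKET_SIZE)
            if len(packet) < TS_PACKET_SIZE:
                break
            if packet[0] != SYNC_BYTE:
                # A real demuxer would resynchronize here; we simply skip the packet.
                continue
            pid = ((packet[1] & 0x1F) << 8) | packet[2]   # 13-bit PID in bytes 1-2
            pids[pid] += 1
    return pids

if __name__ == "__main__":
    for pid, n in count_pids("capture.ts").most_common(5):
        print(f"PID 0x{pid:04X}: {n} packets")
```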

Part 2: Video - video compression

The official name is ISO/IEC 13818-2, also published as ITU-T H.262.

It provides compression coding for interlaced and non-interlaced (progressive) video signals.

The second part of MPEG-2, the video part, is similar to MPEG-1, but it provides support for interlaced video display modes (interlaced scanning is widely used in broadcast television). MPEG-2 video is not optimized for low bit rates (less than 1Mbps), and MPEG-2 is significantly better than MPEG-1 at bit rates of 3Mbit/s and above. MPEG-2 is backward compatible, that is, all MPEG-2 decoders that conform to the standard can also play MPEG-1 video streams normally.

MPEG-2 technology is also applied in HDTV transmission system. MPEG-2 is not only used in DVD-Video, but now most HDTV (High Definition Television) also use MPEG-2 encoding, with a resolution of 1920x1080. Due to the popularity of MPEG-2, MPEG-3 originally prepared for HDTV was finally abandoned.

MPEG-2 video usually contains multiple GOPs (GroupOf Pictures), and each GOP contains multiple frames (frame). The frame type of the frame usually includes I-frame (I-frame), P-frame (P-frame) and B-frame (B-frame). Among them, the I-frame adopts intra-frame coding, the P-frame adopts forward estimation, and the B-frame adopts two-way estimation. Generally speaking, the input video format is 25 (CCIR standard) or 29.97 (FCC) frames per second.

MPEG-2 supports both interlaced and progressive scans. In progressive scan mode, the basic unit of encoding is the frame. In interlaced scanning mode, the basic code can be a frame or a field.

The original input image is first converted to the YCbCr color space, where Y is the luma and Cb and Cr are the two chroma channels (Cb carries the blue-difference and Cr the red-difference information). Each channel is partitioned into blocks, which are grouped into "macroblocks", the basic unit of coding. Each macroblock is subdivided into 8×8 blocks. How many blocks the chroma channels contribute depends on the chroma format: in the commonly used 4:2:0 format, for example, each chroma channel contributes only one 8×8 block per macroblock, so the three channels together make up 4 + 1 + 1 = 6 blocks per macroblock.

For I-frames, the whole image goes directly into the encoding process. For P-frames and B-frames, motion compensation is performed first. Because adjacent frames are strongly correlated, each macroblock searches for a similar area near the corresponding position in the previous frame (and, for B-frames, the following frame); the offset to that area is recorded as a motion vector. The error between the macroblock and the region reconstructed by motion estimation is then sent on to be encoded.

For each 8×8 block, the discrete cosine transform converts the image from the spatial domain to the frequency domain. The resulting transform coefficients are quantized and rearranged (zig-zag scanned) to increase the likelihood of long runs of zeros, then run-length coded, and finally Huffman coded.

I frame encoding is to reduce redundancy in the space domain, and P and B frames are to reduce redundancy in the time domain.

GOP is composed of a series of I frames, P frames, and B frames in a fixed pattern. A commonly used structure consists of 15 frames and has the form IBBPBBPBBPBBPBB. The selection of the ratio of each frame in the GOP has a certain relationship with the bandwidth and image quality requirements. For example, because the compression time of a B frame may be three times that of an I frame, it may be necessary to reduce the proportion of a B frame for some real-time systems with weak computing power.

The MPEG-2 output bit stream can be constant bit rate or variable bit rate; the maximum bit rate, e.g. in DVD applications, is 10.4 Mbit/s. To produce a constant bit rate, the quantization scale must be adjusted continuously, but increasing the quantization scale can introduce visible distortion such as blocking (mosaic) artifacts.

Part 3: Audio - audio compression

Part 3 of MPEG-2 defines the audio compression standard MPEG-2 BC (Backwards Compatible), which is backward compatible with MPEG-1 audio. This part improves on MPEG-1 audio compression and supports more than two channels, up to 5.1 multi-channel. It keeps backward compatibility (hence "MPEG-2 BC"), allowing MPEG-1 audio decoders to decode the two main stereo components, and it defines additional bit rates and sampling frequencies for MPEG-1 Layer I, II, and III audio.

For example, MP2 is MPEG-1 Audio Layer II; the relevant standards are ISO/IEC 11172-3 and ISO/IEC 13818-3: MPEG-1 Layer II is defined in ISO/IEC 11172-3, which is Part 3 of MPEG-1, and extended in ISO/IEC 13818-3, which is Part 3 of MPEG-2.

Part Four (Part 4): Test Specifications

Describe the test procedure.

Part 5 (Part 5): Simulation software

Describe the software simulation system.

Part 6 (Part 6): DSM-CC (Digital Storage Media Command and Control) extension

Describes the DSM-CC (Digital Storage Media Command and Control) extension.

Part 7: Advanced Audio Coding (AAC)

Part 7 of MPEG-2 defines an audio compression scheme that is not backward compatible with MPEG-1 audio, also known as MPEG-2 NBC (Not Backwards Compatible). This part provides enhanced audio capability; what we usually call MPEG-2 AAC refers to it. AAC stands for Advanced Audio Coding. AAC is more efficient than previous MPEG audio standards and in some respects less complex than its predecessor MPEG-1 Layer 3 (MP3), since it does not need MP3's hybrid filter bank. It supports 1 to 48 channels, sampling rates from 8 to 96 kHz, and multi-channel, multi-language, and multi-program capability. AAC is also described in Part 3 of the MPEG-4 standard.

Part 8:

Cancelled.

Part 9 (Part 9): Real-time interface extension

Real-time interface extension.

Part 10 (Part 10): DSM-CC Conformance Extension

DSM-CC conformance extension.

Part 11: Intellectual Property Management and Protection (IPMP)

Intellectual Property Management and Protection (IPMP); its XML representation is defined in ISO/IEC 23001-3. The core MPEG-2 technology involves roughly 640 patents, concentrated mainly in about 20 companies and one university.

MPEG-2 audio

MPEG-2 provides new audio coding methods, defined in Parts 3 and 7.

Part 3

MPEG-2 BC (backward compatible with the MPEG-1 audio formats): low-bit-rate coding using half the original sampling rates (MPEG-1 Layer 1/2/3 LSF) and multi-channel coding with up to 5.1 channels.

Part 7

MPEG-2 NBC (Non-Backward Compatible): provides MPEG-2 AAC, which is not backward compatible and supports multi-channel coding with up to 48 channels.

MPEG-2 profiles and levels

MPEG-2 covers a very wide range of applications. For most applications it is impractical and too expensive to support the entire standard, so usually only a subset is implemented, and the standard defines profiles and levels to describe these subsets. A profile defines feature-related constraints such as the compression algorithms and chroma format, while a level defines performance-related constraints such as the maximum bit rate and maximum frame size. An application expresses its capability through a profile and a level; the combination of the two selects the subset of the MPEG-2 video coding standard for that application, i.e., for pictures in a given input format, a specific set of compression tools is used to generate a coded stream within a specified rate range. For example, a DVD player can state that it supports Main Profile at Main Level (often written MP@ML).

The main MPEG-2 profiles:

Abbreviation | Profile | Picture coding types | Chroma format (YCbCr) | Aspect ratios | Scalable modes
SP | Simple Profile | I, P | 4:2:0 | 4:3 or 16:9 | none
MP | Main Profile | I, P, B | 4:2:0 | 4:3 or 16:9 | none
SNR | SNR Scalable Profile | I, P, B | 4:2:0 | 4:3 or 16:9 | SNR scalable
Spatial | Spatially Scalable Profile | I, P, B | 4:2:0 | 4:3 or 16:9 | SNR or spatially scalable
422P | 4:2:2 Profile | I, P, B | 4:2:2 | - | none
HP | High Profile | I, P, B | 4:2:0 or 4:2:2 | 4:3 or 16:9 | SNR or spatially scalable

MPEG-2 main levels:

Abbreviation | Level | Frame rates (Hz) | Max width × height | Max luminance samples per second (≈ width × height × frame rate) | Max bit rate (Mbit/s)
LL | Low Level | 23.976, 24, 25, 29.97, 30 | 352×288 | 3,041,280 | 4
ML | Main Level | 23.976, 24, 25, 29.97, 30 | 720×576 | 10,368,000 (exceptions: 14,475,600 for 4:2:0 and 11,059,200 for 4:2:2 in HP) | 15
H-14 | High-1440 Level | 23.976, 24, 25, 29.97, 30, 50, 59.94, 60 | 1440×1152 | 47,001,600 (exception: 62,668,800 for 4:2:0 in HP) | 60
HL | High Level | 23.976, 24, 25, 29.97, 30, 50, 59.94, 60 | 1920×1152 | 62,668,800 (exception: 83,558,400 for 4:2:0 in HP) | 80
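The "maximum luminance samples per second" column is essentially width × height × frame rate; for example, Main Level's 10,368,000 is exactly 720 × 576 × 25. A quick check:

```python
# Main Level: 720 x 576 at 25 frames/s gives exactly the 10,368,000 luminance samples/s above.
print(720 * 576 * 25)              # 10368000
# High Level's 62,668,800 limit corresponds to roughly 30 fps at 1920 x 1080.
print(62668800 / (1920 * 1080))    # ~30.2
```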

Combination example

Profile @ Level | Resolution (px) | Max frame rate (Hz) | Sampling | Bit rate (Mbit/s) | Example applications
SP@LL | 176×144 | 15 | 4:2:0 | 0.096 | Wireless handsets
SP@ML | 352×288; 320×240 | 15; 24 | 4:2:0 | 0.384 | PDAs
MP@LL | 352×288 | 30 | 4:2:0 | 4 | Set-top boxes (STB)
MP@ML | 720×480; 720×576 | 30; 25 | 4:2:0 | 15 (DVD: 9.8) | DVD, SD-DVB
MP@H-14 | 1440×1080; 1280×720 | 30; 30 | 4:2:0 | 60 (HDV: 25) | HDV
MP@HL | 1920×1080; 1280×720 | 30; 60 | 4:2:0 | 80 | ATSC 1080i, 720p60, HD-DVB (HDTV); bit rate for terrestrial transmission is limited to 19.39 Mbit/s
422P@LL | - | - | 4:2:2 | - | -
422P@ML | 720×480; 720×576 | 30; 25 | 4:2:2 | 50 | Sony IMX (I-frame only), broadcast "contribution" video (I & P only)
422P@H-14 | 1440×1080; 1280×720 | 30; 60 | 4:2:2 | 80 | Potential future MPEG-2-based HD products from Sony and Panasonic
422P@HL | 1920×1080; 1280×720 | 30; 60 | 4:2:2 | 300 | Potential future MPEG-2-based HD products from Panasonic

Application of MPEG-2 on DVD

DVD adopts the MPEG-2 standard and introduces the following technical parameter restrictions:

  • Resolution
    o 720 x 480, 704 x 480, 352 x 480, 352 x 240 pixels (NTSC)
    o 720 x 576, 704 x 576, 352 x 576, 352 x 288 pixels (PAL)
  • Aspect ratio
    o 4:3
    o 16:9
  • Frame rate (frame playback speed)
    o 59.94 fields/sec, 23.976 frames/sec, 29.97 frames/sec (NTSC)
    o 50 fields/sec, 25 frames/sec (PAL)
  • Video+Audio bitrate
    o average maximum buffer 9.8 Mbit/s
    o peak 15 Mbit/s
    o minimum 300 Kbit/s
  • YUV 4:2:0
  • Subtitle support
  • Closed Caption support (NTSC only)
  • Audio
    o LPCM: 48 kHz or 96 kHz; 16- or 24-bit; up to 6 channels
    o MPEG Layer 2 (MP2): 48 kHz, up to 5.1 channels
    o Dolby Digital (DD, also known as AC-3): 48 kHz, 32-448 kbit/s, up to 5.1 channels
    o Digital Theater Systems (DTS): 754 kbit/s or 1510 kbit/s
    o An NTSC DVD must contain at least one LPCM or Dolby Digital audio track
    o A PAL DVD must contain at least one MPEG Layer 2, LPCM, or Dolby Digital audio track
  • GOP structure
    o A sequence header must be provided for each GOP
    o Maximum number of frames per GOP: 18 (NTSC) / 15 (PAL)

Application of MPEG-2 on DVB

DVB-MPEG related technical parameters:

  • Must use one of the following resolutions:
    o 720 × 480 pixels, at 24/1.001, 24, 30/1.001 or 30 frames/s
    o 640 × 480 pixels, at 24/1.001, 24, 30/1.001 or 30 frames/s
    o 544 × 480 pixels, at 24/1.001, 24, 30/1.001 or 30 frames/s
    o 480 × 480 pixels, at 24/1.001, 24, 30/1.001 or 30 frames/s
    o 352 × 480 pixels, at 24/1.001, 24, 30/1.001 or 30 frames/s
    o 352 × 240 pixels, at 24/1.001, 24, 30/1.001 or 30 frames/s
    o 720 × 576 pixels, at 25 frames/s
    o 544 × 576 pixels, at 25 frames/s
    o 480 × 576 pixels, at 25 frames/s
    o 352 × 576 pixels, at 25 frames/s
    o 352 × 288 pixels, at 25 frames/s

MPEG-2 and NTSC

Must use one of the following resolutions:
o 1920 × 1080 pixels, up to 60 fields/s (1080i)
o 1280 × 720 pixels, up to 60 frames/s (720p)
o 720 × 576 pixels, up to 50 fields/s or 25 frames/s (576i, 576p)
o 720 × 480 pixels, up to 60 fields/s or 30 frames/s (480i, 480p)
o 640 × 480 pixels, up to 60 frames/s
Note: 1080i is encoded as 1920 × 1088 pixels, but the last 8 lines are discarded at display time.

Supplementary notes on YCbCr

YCbCr is not an absolute color space; it is a scaled and offset version of YUV. The figure at right shows the UV color plane.

Y (luma, luminance) carries the grayscale value; U and V together are treated as C (chrominance, or chroma). The main subsampling formats are YCbCr 4:2:0, YCbCr 4:2:2, YCbCr 4:1:1, and YCbCr 4:4:4. This YUV notation is called the A:B:C notation:

  • 4:4:4 means full sampling.
  • 4:2:2 means 2:1 horizontal sampling without vertical downsampling.
  • 4:2:0 means 2:1 horizontal sampling, 2:1 vertical down sampling.
  • 4:1:1 means 4:1 horizontal sampling without vertical downsampling.

The most commonly used Y:UV recording ratios are 1:1 and 2:1. DVD-Video is recorded in YUV 4:2:0, which is what we commonly call I420. YUV 4:2:0 does not mean that the U (Cb) and V (Cr) components are zero; rather, the U and V samples alternate between rows, so each row carries only one of the two chroma components: if one row is effectively 4:2:0, the next is 4:0:2, the next 4:2:0 again, and so on.
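One concrete consequence of 4:2:0 subsampling is that an I420 frame stores a full-resolution Y plane plus quarter-resolution Cb and Cr planes, i.e. 1.5 bytes per pixel at 8 bits. The sketch below computes the buffer size and plane offsets for an I420 frame; the frame dimensions are just an example.

```python
def i420_layout(width: int, height: int):
    """Return (total_size, y_offset, u_offset, v_offset) for an 8-bit I420 frame.
    Y is full resolution; Cb (U) and Cr (V) are subsampled 2:1 both horizontally and vertically."""
    y_size = width * height
    c_size = (width // 2) * (height // 2)
    return y_size + 2 * c_size, 0, y_size, y_size + c_size

total, y_off, u_off, v_off = i420_layout(720, 576)   # a PAL DVD-sized frame
print(total)                   # 622080 bytes = 720 * 576 * 1.5
print(total / (720 * 576))     # 1.5 bytes per pixel
```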

The above is compiled from wiki data.

Codec study notes (4): Mpeg series - Mpeg 4
A question was left over from the last set of notes on MPEG-2: two channels are easy to understand as left and right stereo, but what are 5.1 channels? We often see "Dolby 5.1"; what exactly does the ".1" channel refer to? I looked it up on the wiki today, and the relevant material is organized into these study notes. Sources for this document:

wiki
http://baike.baidu.com/view/190268.htm
http://baike.baidu.com/view/25047.htm
5.1-channel

With Dolby Digital, a 5.1 channel setup is most commonly used, but Dolby Digital allows a range of different channel options. All available channels are listed below:

Mono (center)
Two-channel stereo (left, right), optionally with Dolby Surround matrix encoding
Three-channel stereo (left, center, right)
Two-channel stereo plus mono surround (left, right, surround)
Three-channel stereo plus mono surround (left, center, right, surround)
Four-channel surround (front left, front right, rear left, rear right)
Five-channel surround (front left, center, front right, rear left, rear right)
  All of these configurations may optionally add the low-frequency effects channel, and Dolby Digital EX matrix encoding can add extra rear surround channels. Dolby coding is downward compatible: many Dolby players/decoders provide downmixing, which distributes the available channels to the speakers that are actually present. This includes, for example, playing surround information through the front speakers where applicable, and routing the center channel to the left and right speakers when no center speaker is available; if the user has only 2.0 speakers, the Dolby decoder can downmix a multi-channel signal to 2.0 stereo.

In terms such as 5.1 and 7.1, the ".1" refers to the low-frequency effects (LFE) channel.

In fact, 5.1-channel sound uses five speakers plus one subwoofer to create an immersive way of playing music. It was developed by Dolby, hence the name "Dolby 5.1". A 5.1 system outputs sound in five directions, left (L), center (C), right (R), left surround (LS), and right surround (RS), giving the listener the feeling of sitting in a concert hall. The five channels are independent of one another, and the ".1" channel is a specially designed low-frequency (subwoofer) channel. Because there are speakers in front, behind, to the left and to the right, the music genuinely surrounds the listener, as shown in the figure at right.
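The stereo downmix mentioned above is commonly implemented with coefficients along the lines of ITU-R BS.775 (Lo = L + 0.707·C + 0.707·Ls, Ro = R + 0.707·C + 0.707·Rs, with the LFE usually dropped); actual decoders differ in the details, so the sketch below is only illustrative.

```python
import numpy as np

def downmix_5_1_to_stereo(l, r, c, lfe, ls, rs, center_gain=0.707, surround_gain=0.707):
    """Fold a 5.1 signal (arrays of samples) down to 2.0 stereo.
    The LFE channel is dropped, as many consumer downmixes do."""
    lo = l + center_gain * c + surround_gain * ls
    ro = r + center_gain * c + surround_gain * rs
    # Normalize so the downmix cannot clip regardless of input levels.
    peak = max(np.abs(lo).max(), np.abs(ro).max(), 1.0)
    return lo / peak, ro / peak

# Tiny example: one second of silence everywhere except a 440 Hz tone in the center channel.
t = np.linspace(0, 1, 48000, endpoint=False)
silence = np.zeros_like(t)
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
lo, ro = downmix_5_1_to_stereo(silence, silence, tone, silence, silence, silence)
print(lo.max())   # the centered tone appears equally in both stereo channels
```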

MPEG-4

Overview

MPEG-4 is a set of compression coding standards for audio and video information, formulated by the Moving Picture Experts Group (MPEG) under the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). The first edition was adopted in October 1998 and the second edition in December 1999. The main uses of the MPEG-4 format are Internet streaming, optical discs, voice transmission (video telephony), and television broadcasting. MPEG-4 was officially released as ISO/IEC 14496, "Coding of audio-visual objects" (AV object coding).

Traditional MPEG-1/2 could no longer cope with environments such as network transmission, and this prompted the creation of MPEG-4. Compared with MPEG-1 and MPEG-2, MPEG-4 is better suited to interactive AV services and remote monitoring. It is the first moving-picture standard that turns the viewer from passive to active (no longer just watching, but joining in, i.e., interacting). Another characteristic is its comprehensiveness: at its root, MPEG-4 attempts to blend natural objects with man-made objects (in the sense of visual effects). MPEG-4 was also designed for wider adaptability and more flexible scalability, and it adopts a series of new techniques to meet the demand of transmitting higher-quality video over low bandwidth. DivX, XviD, and MS MPEG4 all use MPEG-4 video coding. Besides its use for DVD rips, 3GPP has also accepted MPEG-4 as a video coding scheme.

Originally the main purpose of MPEG-4 was for video communication at low bit rates, but its scope was eventually extended as a multimedia coding standard. In terms of technology, MPEG-4 allows different software/hardware developers to create multimedia objects to provide better adaptability and flexibility, and provide better quality for digital TV, dynamic images, Internet and other services.

MPEG-4 covers data rates from a few kbit/s to tens of Mbit/s and has the following features:

Improved coding efficiency over MPEG-2: at the same bit rate it delivers higher visual and auditory quality than other existing or forthcoming standards, making it possible to transmit video and audio over low-bandwidth channels. MPEG-4 can also encode several simultaneous data streams, so the multi-view or multi-channel streams of one scene can be efficiently and synchronously combined into the final stream; this can be used for virtual 3D games, 3D movies, flight-simulation training, and so on.
Coding of mixed media data (video, audio, speech).
Error resilience for stable delivery of content: when bit errors or packet loss occur during transmission, MPEG-4 is less affected and recovers quickly.
Interactivity with the audio-visual scene: MPEG-4 presents different objects to the end user to support a variety of interaction requirements.
Content-based multimedia access tools, such as indexing, hyperlinks, upload, download, and delete. With these tools users can easily and selectively retrieve the object-related content they need from a multimedia database, and content manipulation and bitstream editing are supported, which can be applied to interactive home shopping, digital fade-in/fade-out effects, and so on. MPEG-4 provides efficient coding of both natural and synthetic multimedia data and can combine natural scenes or objects into synthetic multimedia data.
Transparency to the transport network: MPEG-4 can work over a variety of networks.
Robustness in error-prone environments, allowing its use over many wireless and wired networks and storage media. MPEG-4 also supports content-based scalability: content, quality, and complexity can be divided into many small pieces to satisfy different users, supporting transmission channels and receivers with different bandwidths and storage capacities.
These features will undoubtedly accelerate the development of multimedia applications. Application fields that benefit include: Internet multimedia; broadcast television; interactive video games; real-time visual communication; interactive storage-media applications; studio technology and television post-production; virtual conferencing with facial animation; multimedia mail; multimedia over mobile networks; remote video surveillance; and remote database services over ATM networks.

The core ideas of MPEG-4 video coding

Before MPEG-4, the standards MPEG-1, MPEG-2, H.261, and H.263 all used first-generation compression techniques: the encoder is designed around the statistical properties of the image signal, i.e., waveform coding. First-generation schemes split the video sequence into frames in time order and divide each frame into macroblocks for motion compensation and coding. This approach has the following drawbacks:

The image is rigidly divided into blocks of equal size, which at high compression ratios produces severe blocking (mosaic) artifacts;
The image content cannot be accessed, edited, or played back selectively;
The characteristics of the human visual system (HVS) are not fully exploited.

MPEG-4, by contrast, represents second-generation, model/object-based compression coding. It makes full use of the characteristics of human vision, captures the essence of image-information transmission, starts from contours and textures, and supports interaction based on visual content. This matches the trend of multimedia applications moving from simple playback toward content-based access, retrieval, and manipulation.

AV object (AVO, AudioVisual Object) is an important concept proposed by MPEG-4 to support content-based coding. Objects are entities that can be accessed and manipulated in a scene, and objects can be classified based on their unique texture, motion, shape, model, and high-level semantics. The video and audio seen in MPEG-4 is no longer the concept of image frames in MPEG-1 and MPEG-2 in the past, but audio-visual scenes (AV scenes), these different AV scenes are composed of different AV objects . An AV object is a representation unit of auditory, visual, or audio-visual content, and its basic unit is an original AV object, which can be natural or synthetic sound or image. The original AV object has the characteristics of high-efficiency coding, high-efficiency storage and transmission, and interoperability, and it can further form a compound AV object. Therefore, the basic content of the MPEG-4 standard is to efficiently encode, organize, store and transmit AV objects. The introduction of AV object enables multimedia communication to have highly interactive and high-efficiency coding capabilities. AV object coding is the core coding technology of MPEG-4.

The primary task of MPEG-4 to achieve content-based interaction is to divide the video/image into different objects or separate the moving object from the background, and then use corresponding encoding methods for different objects to achieve efficient compression. Therefore, video object extraction, that is, video object segmentation, is the key technology of MPEG-4 video coding, and it is also a research hotspot and difficulty in the new generation of video coding.

MPEG-4 not only provides a high compression ratio but also enables better interactivity with multimedia content and all-round access. It adopts an open coding framework to which new coding algorithm modules can be added at any time, and the decoder can be configured on site according to application requirements to support a wide variety of multimedia applications.

Parts of MPEG-4

MPEG-4 consists of a series of sub-standards called parts, including the parts listed below. For media codecs, the important ones are Part 2, Part 3, and Part 10.

Part 1 (ISO/IEC 14496-1): Systems

Describe the synchronization and mixing of video and audio (Multiplexing, abbreviated as MUX). Defines the MP4 container format, supports intuitive and interactive features like DVD menus, etc.

Part 2 (ISO/IEC 14496-2): Video

Defines a codec for visual information of various kinds (video, still textures, computer-generated graphics, etc.). For video, one of the most commonly used profiles is the Advanced Simple Profile (ASP); XviD encoding, for example, belongs to MPEG-4 Part 2, as do 3ivx, DivX 4/Project Mayo, DivX 5, Envivio, ffmpeg/ffds, mpegable, Nero Digital, QuickTime, Sorenson, and other common implementations. Note that DivX 3.11, MS MPEG-4, RV9/10, VP6, and WMV9 are not part of the MPEG-4 standard.

Part 3 (ISO/IEC 14496-3): Audio

Defines a collection of codecs for encoding various audio signals. Includes several variants of Advanced Audio Coding (AAC) and other audio/speech coding tools. That is, the AAC audio standard, including LCAAC, HE AAC, etc., supports 5.1-channel encoding, and can achieve better results with a lower bit rate (compared to MP3, OGG, etc.).

Part 4 (ISO/IEC 14496-4): Conformity

Defines procedures for conformance testing to other parts of this standard.

Part 5 (ISO/IEC 14496-5): Reference software

Software is provided to demonstrate the functionality and to illustrate the functionality of other parts of this standard.

Part 6 (ISO/IEC 14496-6): Delivery Multimedia Integration Framework

That is, DMIF: Delivery Multimedia Integration Framework.

Part 7 (ISO/IEC 14496-7): Optimized reference software

Provides examples of optimizations for implementations (here the implementations refer to Section 5).

Part 8 (ISO/IEC 14496-8): Transmission over IP networks

Defines the way to transport MPEG-4 content over IP networks.

Part 9 (ISO/IEC 14496-9): Reference hardware

A hardware design scheme is provided to demonstrate how to implement the functions of other parts of this standard on hardware.

Part 10 (ISO/IEC 14496-10): Advanced Video Coding, also known as ITU-T H.264, often written H.264/AVC

Advanced Video Coding (AVC) defines a video codec. Both AVC and XviD are MPEG-4 encodings, but AVC belongs to MPEG-4 Part 10 and is technically more advanced than XviD, which belongs to MPEG-4 Part 2. Technically it is identical to the ITU-T H.264 standard, hence the full name MPEG-4 AVC/H.264.

Part 11 (ISO/IEC 14496-11): Scene description and application engine

Interactive media available in multiple profiles, including 2D and 3D versions. It revises MPEG-4 Part 1:2001 and two of its amendments. At the system level it defines the application engine (the delivery, life cycle, format, and behavior of downloadable Java byte-code applications), the Binary Format for Scenes (BIFS), and the Extensible MPEG-4 Textual format (XMT, an XML-based textual representation of MPEG-4 multimedia content), i.e., BIFS, XMT, and MPEG-J (see also Part 21).

Part 12 (ISO/IEC 14496-12): ISO-based media file formats

Defines a file format for storing media content.

Part 13 (ISO/IEC 14496-13): IPMP

Intellectual Property Management and Protection (IPMP) extensions.

Part 14 (ISO/IEC 14496-14): MPEG-4 file format

Defines a video file format for storing MPEG-4 content based on Part XII.

Part 15 (ISO/IEC 14496-15): AVC file format

A file format for storing the video content of the tenth part based on the twelfth part is defined.

Part 16 (ISO/IEC 14496-16): Animation framework extension

Animation framework extension (AFX: Animation Framework eXtension).

Part 17 (ISO/IEC 14496-17): Synchronized text subtitle format

Not yet finished at the time of writing; it reached Final Committee Draft (FCD) in January 2005.

Part 18 (ISO/IEC 14496-18): Font compression and streaming (for public font formats).

Part 19 (ISO/IEC 14496-19): Synthesized texture stream.

Part 20 (ISO/IEC 14496-20): Simple scene representation

LASeR, Lightweight Application Scene Representation; not yet finished at the time of writing, having reached Final Committee Draft (FCD) in January 2005.

Part 21 (ISO/IEC 14496-21): MPEG-J extension for rendering

Not yet finished at the time of writing; it reached Committee Draft (CD) in January 2005.

Profile and Level

MPEG-4 provides a large number of coding tools and a rich set of options. As with MPEG-2, it is generally impossible for one application to support the whole of MPEG-4, so subsets are described by profiles and levels. A profile states which tools a decoder must support; to keep computational complexity bounded, each profile has one or more levels. An effective combination of profile and level lets an implementation support only the subset required by the standard while remaining interoperable with other MPEG-4 equipment (the range supported for decoding is usually larger than that supported for encoding), and it allows checking whether other MPEG-4 devices conform to the standard, i.e., conformance testing.

For H.264/AVC (that is, MPEG-4 Part 10), the following profiles are provided:

Feature support in particular profiles:

Feature | CBP | BP | XP | MP | HiP | Hi10P | Hi422P | Hi444PP
B slices | No | No | Yes | Yes | Yes | Yes | Yes | Yes
SI and SP slices | No | No | Yes | No | No | No | No | No
Flexible macroblock ordering (FMO) | No | Yes | Yes | No | No | No | No | No
Arbitrary slice ordering (ASO) | No | Yes | Yes | No | No | No | No | No
Redundant slices (RS) | No | Yes | Yes | No | No | No | No | No
Data partitioning | No | No | Yes | No | No | No | No | No
Interlaced coding (PicAFF, MBAFF) | No | No | Yes | Yes | Yes | Yes | Yes | Yes
CABAC entropy coding | No | No | No | Yes | Yes | Yes | Yes | Yes
8×8 vs. 4×4 transform adaptivity | No | No | No | No | Yes | Yes | Yes | Yes
Quantization scaling matrices | No | No | No | No | Yes | Yes | Yes | Yes
Separate Cb and Cr QP control | No | No | No | No | Yes | Yes | Yes | Yes
Monochrome (4:0:0) | No | No | No | No | Yes | Yes | Yes | Yes
Chroma formats | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0/4:2:2 | 4:2:0/4:2:2/4:4:4
Sample depths (bits) | 8 | 8 | 8 | 8 | 8 | 8 to 10 | 8 to 10 | 8 to 14
Separate color plane coding | No | No | No | No | No | No | No | Yes
Predictive lossless coding | No | No | No | No | No | No | No | Yes

A level indicates the range of performance required of a decoder for a given profile, such as the maximum picture size, frame rate, and bit rate. A decoder that conforms to a given level is required to be capable of decoding all bitstreams encoded for that level and for all lower levels. (Source: http://en.wikipedia.org/wiki/H.264/MPEG-4_AVC)

Levels with maximum property values:

Level | Max MBs/s | Max MBs/frame | Max video bit rate (VCL), BP/XP/MP (kbit/s) | HiP (kbit/s) | Hi10P (kbit/s) | Hi422P/Hi444PP (kbit/s) | Example resolutions (max stored frames)
1 | 1,485 | 99 | 64 | 80 | 192 | 256 | 128×96 (8); 176×144 (4)
1b | 1,485 | 99 | 128 | 160 | 384 | 512 | 128×96 (8); 176×144 (4)
1.1 | 3,000 | 396 | 192 | 240 | 576 | 768 | 176×144 (9); 320×240 (3); 352×288 (2)
1.2 | 6,000 | 396 | 384 | 480 | 1,152 | 1,536 | 320×240 (7); 352×288 (6)
1.3 | 11,880 | 396 | 768 | 960 | 2,304 | 3,072 | 320×240 (7); 352×288 (6)
2 | 11,880 | 396 | 2,000 | 2,500 | 6,000 | 8,000 | 320×240 (7); 352×288 (6)
2.1 | 19,800 | 792 | 4,000 | 5,000 | 12,000 | 16,000 | 352×480 (7); 352×576 (6)
2.2 | 20,250 | 1,620 | 4,000 | 5,000 | 12,000 | 16,000 | 352×480 (10); 352×576 (7); 720×480 (6); 720×576 (5)
3 | 40,500 | 1,620 | 10,000 | 12,500 | 30,000 | 40,000 | 352×480 (12); 352×576 (10); 720×480 (6); 720×576 (5)
3.1 | 108,000 | 3,600 | 14,000 | 17,500 | 42,000 | 56,000 | 720×480 (13); 720×576 (11); 1280×720 (5)
3.2 | 216,000 | 5,120 | 20,000 | 25,000 | 60,000 | 80,000 | 1280×720 (5); 1280×1024 (4)
4 | 245,760 | 8,192 | 20,000 | 25,000 | 60,000 | 80,000 | 1280×720 (9); 1920×1080 (4); 2048×1024 (4)
4.1 | 245,760 | 8,192 | 50,000 | 62,500 | 150,000 | 200,000 | 1280×720 (9); 1920×1080 (4); 2048×1024 (4)
4.2 | 522,240 | 8,704 | 50,000 | 62,500 | 150,000 | 200,000 | 1920×1080 (4); 2048×1080 (4)
5 | 589,824 | 22,080 | 135,000 | 168,750 | 405,000 | 540,000 | 1920×1080 (13); 2048×1024 (13); 2048×1080 (12); 2560×1920 (5); 3680×1536 (5)
5.1 | 983,040 | 36,864 | 240,000 | 300,000 | 720,000 | 960,000 | 1920×1080 (16); 4096×2048 (5); 4096×2304 (5)
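Using the "max MBs per second / per frame" columns above, one can check whether a given resolution and frame rate fit a level. The sketch below does this for a few of the levels listed (it checks only the two macroblock limits, not the bit-rate or decoded-picture-buffer constraints).

```python
import math

# (level, max macroblocks per second, max macroblocks per frame) taken from the table above.
LEVEL_MB_LIMITS = [
    ("3",   40500,  1620),
    ("3.1", 108000, 3600),
    ("4",   245760, 8192),
    ("4.2", 522240, 8704),
    ("5.1", 983040, 36864),
]

def minimum_level(width: int, height: int, fps: float):
    """Return the lowest listed level whose macroblock limits cover width x height @ fps."""
    mbs_per_frame = math.ceil(width / 16) * math.ceil(height / 16)
    mbs_per_second = mbs_per_frame * fps
    for level, max_mbps, max_frame_size in LEVEL_MB_LIMITS:
        if mbs_per_frame <= max_frame_size and mbs_per_second <= max_mbps:
            return level
    return None

print(minimum_level(1280, 720, 30))    # 3.1
print(minimum_level(1920, 1080, 30))   # 4
print(minimum_level(3840, 2160, 30))   # 5.1
```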

Codec study notes (5): Mpeg series - AAC audio
 The following information comes from the wiki. AAC is defined in MPEG-2 and MPEG-4.

Extensions: .m4a, .m4b, .m4p, .m4v, .m4r, .3gp, .mp4, .aac
Internet media types: audio/aac, audio/aacp, audio/3gpp, audio/3gpp2, audio/mp4, audio/MP4A-LATM, audio/mpeg4-generic
formats: Lossy data compression
Extended from: MPEG-2 Audio
standards: ISO/IEC 13818-7 (MPEG-2 Part 7), ISO/IEC 14496-3 (MPEG-4 Part 3)

AAC (Advanced Audio Coding), known as "Advanced Audio Coding" in Chinese, appeared in 1997, based on MPEG-2 audio coding technology. Jointly developed by Fraunhofer IIS, Dolby Laboratories, AT&T, Sony (Sony) and other companies, the purpose is to replace the MP3 format. In 2000, after the emergence of the MPEG-4 standard, AAC re-integrated its features and added SBR technology and PS technology. In order to distinguish it from the traditional MPEG-2 AAC, it is also called MPEG-4 AAC.

There are three main extensions of the AAC format:

AAC - uses the MPEG-2 Audio Data Transport Stream (ADTS; see MPEG-2) container, as opposed to the MP4/M4A formats that use the MPEG-4 container; this is the traditional AAC encoding (the default FAAC packaging, although FAAC can also output MPEG-4-wrapped AAC);
MP4 - AAC packaged in a simplified version of MPEG-4 Part 14, namely 3GPP Media Release 6 Basic (3gp6; see 3GP); the Nero AAC encoder can only output MPEG-4-wrapped AAC;
M4A - an extension used by Apple to distinguish audio-only MP4 files from MP4 files containing video; Apple iTunes names audio-only MP4 files ".M4A". M4A is essentially the same as audio-only MP4, so an audio MP4 file can simply be renamed to M4A.
  As a high-compression audio algorithm, AAC typically achieves a compression ratio around 18:1 (some sources say 20:1), far better than MP3. In terms of sound quality, thanks to its multi-channel support and low-complexity description, it beats almost all traditional codecs at the same specification. Nevertheless, as of 2006 not much music was stored in this format and very few portable players could play it; those known at the time included the Apple iPod, Sony Walkman (NWZ-A, NWZ-S, NWZ-E, NWZ-X series), Nintendo DSi, and Meizu M8. On computers, many music players support AAC (provided an AAC decoder is installed), for example Apple iTunes. In mobile phones AAC support is already widespread: Nokia, Sony Ericsson, Motorola and other brands support AAC in their mid-range and high-end products (initially mainly LC-AAC; as phone performance improved, HE-AAC support also became common).

AAC supports up to 48 audio tracks and 15 low-frequency (LFE) tracks, 5.1 multi-channel audio, higher sampling rates (up to 96 kHz, versus 44.1 kHz for audio CD), higher sample precision (8, 16, 24, or 32 bits, versus 16 bits for audio CD), multi-language capability, and more efficient decoding. In general, AAC can provide better sound quality than MP3 while producing files about 30% smaller.

Compared with traditional LC-AAC, High Efficiency AAC (HE-AAC, also written AAC-HE), also known as "aacPlus v1" or "AAC+", combines SBR (Spectral Band Replication) with AAC and is intended for low bit rates (below 64 kbps).
HE-AAC v2, also known as "aacPlus v2", combines Parametric Stereo (PS) with the SBR technique of HE-AAC.

Because "AAC" is a large family, it is divided into nine profiles to suit different use cases; it is precisely this abundance of profiles that ordinary computer users find confusing:

MPEG-2 AAC LC - Low Complexity profile
MPEG-2 AAC Main - Main profile
MPEG-2 AAC SSR - Scalable Sample Rate profile
MPEG-4 AAC LC - Low Complexity profile; the audio in the MP4 files commonly found on today's mobile phones uses this profile
MPEG-4 AAC Main - Main profile
MPEG-4 AAC SSR - Scalable Sample Rate profile
MPEG-4 AAC LTP - Long Term Prediction profile
MPEG-4 AAC LD - Low Delay profile
MPEG-4 AAC HE - High Efficiency profile

Of the profiles above, the Main profile includes every tool except gain control and gives the best quality; the Low Complexity (LC) profile is simpler, dropping gain control but improving coding efficiency; SSR is largely the same as LC but adds gain control. The MPEG-4 LTP, LD and HE profiles are all aimed at low-bit-rate encoding; HE in particular is supported by the Nero AAC encoder and has become a commonly used option. In practice the quality difference between Main and LC is small, so LC is by far the most widely used AAC profile today, not least because current handset memory and processing power are still limited.
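For a concrete picture of the container distinction described above (raw ADTS .aac versus MPEG-4 .m4a), here is a minimal sketch that shells out to FFmpeg; it assumes an FFmpeg build with its native AAC-LC encoder is installed, and the file names are placeholders:

```python
import subprocess

# Encode a WAV source to AAC-LC in an MP4/M4A container (the MPEG-4 style).
subprocess.run(
    ["ffmpeg", "-i", "input.wav", "-c:a", "aac", "-b:a", "128k", "output.m4a"],
    check=True,
)

# The same audio wrapped as a raw ADTS stream (the traditional .aac style).
subprocess.run(
    ["ffmpeg", "-i", "input.wav", "-c:a", "aac", "-b:a", "128k",
     "-f", "adts", "output.aac"],
    check=True,
)
```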

Codec Study Notes (6): H.26x Series
Some of the information comes from the wiki and from http://www.365pr.net/tech_view.asp?id=315.

H.26x includes H.261, H.262, H.263, H.263v2 and H.264; H.261 is basically no longer used. H.262 and H.264 were already covered in the MPEG series notes, since they correspond to MPEG-2 Part 2 and MPEG-4 Part 10 respectively, so that material is not repeated here.

H.261

The bit rate of H.261 is an integer multiple (1 to 30 times) of 64 kbps. It was originally designed for two-way audiovisual services (especially video telephony and video conferencing) over ISDN (Integrated Services Digital Network).

H.261 is the earliest moving-picture compression standard. It handles only the CIF and QCIF picture formats. Each frame is divided into a picture layer, a group-of-blocks (GOB) layer, a macroblock (MB) layer and a block layer, and every part of the coding process is specified in detail, including motion-compensated inter-frame prediction, the DCT (discrete cosine transform), quantization, entropy coding, and rate control adapted to fixed-rate channels. The actual coding algorithm is similar to the MPEG algorithms but not compatible with them, and H.261 needs far less CPU power than MPEG for real-time encoding. To optimize bandwidth usage, the algorithm trades picture quality against the amount of motion: scenes with intense motion are delivered at lower quality than relatively still scenes. This method is therefore constant-bit-rate, variable-quality coding.

H.261 is the first practical digital video coding standard. Its design was very successful, and subsequent international video coding standards, including MPEG-1, MPEG-2/H.262, H.263 and even H.264, are essentially built on the same design framework. Likewise, the basic working mode of the H.261 development committee (led by Sakae Okubo) was inherited by later video coding standardization groups. H.261 uses a hybrid coding framework comprising motion-compensated inter-frame prediction, spatial transform coding based on the discrete cosine transform, quantization, zig-zag scanning and entropy coding.

In fact, the H.261 standard only specifies how to decode the video (later video coding standards inherited this approach). Developers therefore have considerable freedom in designing the encoding algorithm, as long as the bitstream their encoder produces can be decoded by any decoder built to the H.261 specification. Encoders may apply whatever preprocessing they like to the input video, and decoders are free to apply any post-processing to the output before display. The deblocking filter is an effective post-processing technique: it noticeably reduces the blocking artifacts (the mosaic effect) caused by block-based, motion-compensated coding, the annoying effect everyone notices when watching low-bit-rate video such as news clips on websites. Later video coding standards such as H.264 therefore made the deblocking filter part of the standard (and even with H.264, adding a further, out-of-standard deblocking filter after decoding can still improve subjective quality).

Later video coding standards can all be seen as step-by-step improvements on H.261, obtained by adding new features. Today's standards outperform H.261 in every respect, which has made it obsolete: apart from some videoconferencing systems and network video that keep H.261 for backward compatibility, products using it are essentially no longer seen. That does not prevent H.261 from being an important milestone in the field of video coding.

H.263

H.263 was originally designed for transmission over H.324-based systems (video conferencing and video telephony over the public switched telephone network and other circuit-switched networks). It was later found to work well with H.323 (video conferencing over RTP/IP networks), H.320 (video conferencing over ISDN), RTSP (streaming media) and SIP (Internet-based video conferencing).

Building on the earlier international video coding standards (H.261, MPEG-1 and H.262/MPEG-2), H.263 delivered a dramatic improvement in performance. Its first version was completed in 1995 and outperformed H.261 at all bit rates. A second version with new features, H.263+ or H.263v2, followed in 1998, and a third version, H.263++ or H.263v3, was completed in 2000.

H.263v2 (also commonly called H.263+ or H.263 1998) is the informal name of the second edition of the ITU-T H.263 video coding standard. It retains all the technology of the original H.263 but significantly improves coding efficiency and adds several capabilities through a set of new annexes, such as greater robustness against data loss in the transmission channel. The H.263+ project was completed in late 1997/early 1998 (depending on how "complete" is defined).

H.263v3: the follow-on project, called "H.263++", was started immediately afterwards and added further features on top of H.263+; it was completed at the end of 2000. For reference, the annexes defined for H.263 include:

Annex A - Inverse transform accuracy specification
Annex B - Hypothetical Reference Decoder
Annex C - Considerations for Multipoint
Annex D - Unrestricted Motion Vector mode
Annex E - Syntax-based Arithmetic Coding mode
Annex F - Advanced Prediction mode
Annex G - PB-frames mode
Annex H - Forward Error Correction for coded video signal
After H.263, the next-generation video codec from the ITU-T (in cooperation with MPEG) is H.264, also known as AVC and MPEG-4 Part 10. Since H.264 outperforms H.263 by a wide margin, H.263 is now generally regarded as an obsolete standard (even though its development was completed not long ago). Most new videoconferencing products already support the H.264 video codec, just as they previously supported H.263 and H.261.

Having said that, H.263 still occupies a very high position in 3GPP. Subsequent revised versions, including operators' standards, have always retained H.263. As a mandatory requirement, its status is much higher than H.264. It's a strange phenomenon. An important possible reason is that the encoding of H.263 is lighter than that of H.264. The modem of the mobile phone provides the encoding and decoding capability of H.263, but does not provide the encoding and decoding capability of H.264, or only provides H.264 The decoding ability does not provide the encoding ability. If it is not that the smartphone cannot provide the H.264 encoding and decoding ability on other chips (such as the CPU) on the motherboard, developers have nothing to do. H.263 can be provided through software, and H.264 is The requirements for processing power are very high, and currently need to rely on hardware capabilities to provide. Therefore, H.263 still has a large market, especially for small-sized handheld devices, the screen resolution is limited, and high-definition is meaningless.

H.264

H.264 is equivalent to MPEG-4 Part 10; the material is collected here as a further study record.


H.264/AVC can operate over a wide range of bit rates and is widely used for multimedia streaming over the Internet and intranets, video on demand, video games, low-bit-rate mobile multimedia communication (video phones and the like), interactive multimedia applications, real-time multimedia surveillance, digital TV, studio TV and virtual video conferencing. It is coming to dominate these fields and has very broad development and application prospects.

H.264 is a high-compression video technology, also known as MPEG-4 AVC or MPEG-4 Part 10. In 1998 the ITU-T split its video coding work into two tracks, H.26L and H.26S: H.26L pursued high-compression coding on a long-term schedule, while H.26S was the short-term standardization track. The earlier H.263 came out of the H.26S short-term work, and the H.264 standard was developed from H.26L. To avoid misunderstanding, the ITU-T recommends H.264 as the official name of the standard. H.264 embodies the latest achievements in international video codec technology: at the same reconstructed image quality it offers a higher compression ratio and better adaptability to IP and wireless network channels than other video compression codecs.

First, H.264 has a very high compression rate, twice that of MPEG-2 and 1.5 times that of MPEG-4. This is bought with a large amount of computation at encoding time: H.264 encoding requires more than ten times the computation of MPEG-2, although the decoding load has not grown nearly as much. Seen against the rapid growth of CPU clock speeds and memory, when MPEG-2 was launched in 1995 the mainstream CPU was a Pentium 100 and memory was tiny; today's mainstream CPUs run roughly 30 times faster and memory has grown more than 50-fold, so the heavy computation of H.264 encoding is no longer a big problem.

The high compression rate reduces the amount of image data, which makes storage and transmission easier. Together with an open international standard for the baseline specification and a fair licensing system, the three big industries of television broadcasting, consumer electronics and communications have all put H.264 at the centre of their application R&D. Both the ATSC (Advanced Television Systems Committee) in the United States and ARIB (the Association of Radio Industries and Businesses) in Japan are preparing to use H.264 as the coding method for portable terrestrial digital television broadcasting, and the European DVB digital TV standardization group is also adopting H.264 as a coding method for digital TV.

Manufacturers of video storage equipment in the consumer electronics industry have also taken a liking to H.264. Toshiba and NEC's next-generation blue-laser disc, HD DVD-ROM, has a smaller capacity than the Blu-ray Disc backed by the nine major companies led by Sony, so its video compression was switched to H.264 to bring total recording time close to Blu-ray's. H.264 also makes HDTV recording and long-duration SDTV recording feasible, so LSI chip makers attach great importance to it: a D9 DVD disc holds only 8.5 GB, not enough for two hours of HDTV programming, but compressing with H.264 makes it possible. Meanwhile, in the communications field, the Internet Engineering Task Force has begun standardizing H.264 as a payload format for the Real-time Transport Protocol, and video transmission on the Internet and on mobile phones will also use H.264 as the coding method.

One change compared with earlier MPEG compression coding is that, for intra-coded I-pictures, H.264 adds intra-frame prediction: the decoder can reconstruct the picture from the differences with neighbouring data. For motion-predicted blocks, the combination of full motion prediction and intra prediction in I-pictures reduces the amount of coded data but increases the processing load on the LSI; for this reason H.264 introduces a simplified DCT-like transform to lighten the load while improving picture quality. H.264 also differs from MPEG-2 and MPEG-4 in its entropy coding: CAVLC (Context-Adaptive Variable Length Coding) and CABAC (Context-Adaptive Binary Arithmetic Coding) improve the codec's error resilience, whereas MPEG-2 and MPEG-4 use Huffman coding. In addition, a deblocking filter has been added, which reduces blocking noise. H.264's integer transform operates on 4×4 pixel blocks, which produces less block noise than the traditional 8×8 blocks and further improves image quality.
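As an illustration of the 4×4 integer transform mentioned above, the sketch below applies the forward core transform to one residual block; it is a simplified example that deliberately omits the scaling and quantization stages defined by the standard:

```python
import numpy as np

# H.264 4x4 forward core transform matrix (integer approximation of the DCT).
CF = np.array([
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
])

def forward_core_transform(block_4x4):
    """Apply the 4x4 integer core transform: W = Cf * X * Cf^T.

    block_4x4 is a 4x4 array of residual samples; the scaling and
    quantization that the standard applies afterwards are omitted here.
    """
    x = np.asarray(block_4x4, dtype=np.int64)
    return CF @ x @ CF.T

# Example: a flat residual block transforms to a single DC coefficient (48),
# with all other coefficients zero.
print(forward_core_transform(np.full((4, 4), 3)))
```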

The H.264 standard is divided into three profiles: the Baseline profile; the Main profile (usable for SDTV, HDTV, DVD and so on); and the Extended profile (for network video streaming). The Baseline profile is royalty-free and can be used at no charge, and it has the backing of Apple, Cisco Systems, Lenovo, Nokia, On2 Technologies, Siemens, TI and others. Its licensing system is simpler than MPEG-4's and treats users and patent holders fairly and without discrimination. Calls for H.264 to replace MPEG-4 are strong: besides its high performance, the low patent fees and the fair, non-discriminatory licensing system are also crucial. As the technology matures, semiconductor manufacturers are already developing H.264 encoding/decoding LSIs, and H.264 is being adopted in equipment such as HDD and DVD video recorders, which has drawn the attention of chip makers. With its flexible choice of video and audio coding options, H.264 is set to become one of the main specifications for almost all manufacturers in the future.

Coding efficiency comparison (average bit rate saving of the codec in each row relative to the codec in each column, at equal quality):

Codec | vs. MPEG-4 | vs. H.263 | vs. MPEG-2
H.264 | 39% | 49% | 64%
MPEG-4 | - | 17% | 43%
H.263 | - | - | 31%
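To make the percentages concrete, a saving is expressed relative to the reference codec's bit rate at equal quality; a small illustrative sketch:

```python
def equivalent_bitrate(reference_kbps, saving_percent):
    """Bit rate needed for the same quality, given the average saving."""
    return reference_kbps * (1 - saving_percent / 100)

# A 10,000 kbit/s MPEG-2 stream needs only about 3,600 kbit/s in H.264
# if the 64% average saving from the table holds.
print(equivalent_bitrate(10_000, 64))  # 3600.0
```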

Codec Study Notes (7): Microsoft Windows Media Series
Information comes from wiki and http://chaoqunz.blog.163.com/blog/static/6154877720084493941186/.

This is the audio/video codec family led by Microsoft. It originally appeared mainly for network video delivery and has since moved into HDTV with the WMV HD application. WMV (Windows Media Video) is the general name for a set of digital video codec formats developed by Microsoft and is part of the Windows Media framework. It was initially developed as a proprietary codec for low-bit-rate streaming, but in 2003 Microsoft drafted a video codec specification based on Windows Media Video version 9 and submitted it to SMPTE for standardization. The standard was formally approved in March 2006 as SMPTE 421M, so the Windows Media Video 9 codec is no longer a proprietary technology. The earlier codec versions (7 and 8) are still considered proprietary because they are not covered by the SMPTE 421M standard.

The Microsoft media family is divided into WMV (Windows Media Video) and WMA (Windows Media Audio); in plain terms, Microsoft's video and audio formats.

Container

The video stream is usually combined with a Windows Media Audio stream and wrapped in an Advanced Streaming Format (ASF) file with the .wmv or .asf extension. WMV is normally packaged in ASF, but it can also be packaged in AVI or Matroska containers: an AVI-wrapped file ends in .avi, an ASF-wrapped file in .wmv or .asf, and an MKV-wrapped file in .mkv. WMV can be stored in an AVI file when it is encoded with the VirtualDub encoder and the WMV9 VCM codec implementation. Microsoft's media player for the Mac does not support all WMV-encoded files, because it only supports the ASF container; Flip4Mac with QuickTime, or MPlayer for Mac OS X, can play more of them.

WMV

Extension: .wmv
Internet media type: video/x-ms-wmv
Uniform type identifier: com.microsoft.windows-media-wmv
Developer: Microsoft
Format: digital video

WMV (Windows Media Video), as noted above, is the general name for the set of digital video codec formats that Microsoft developed as part of the Windows Media framework. Initially a proprietary codec for low-bit-rate streaming, it was submitted for standardization in 2003 on the basis of the version 9 codec and formally approved as SMPTE 421M in March 2006, so Windows Media Video 9 is no longer a proprietary technology; the earlier codec versions (7 and 8) are still considered proprietary because they fall outside SMPTE 421M.

WMV was not developed solely on Microsoft's own technology. Starting with version 7 (WMV1), Microsoft used its own non-standard variant of MPEG-4 Part 2. However, since WMV version 9 is now an independent SMPTE standard (421M, also known as VC-1), there is reason to believe WMV's development is no longer the purely proprietary codec effort it used to be. As of April 2006, sixteen companies share the VC-1 patent pool, and Microsoft is itself one of the companies in the MPEG-4 AVC/H.264 patent pool.

Formal name | FourCC | Codec version | Description
Windows Media Video v7 | WMV1 | 0 |
Microsoft MPEG-4 Video Codec v3 | MP43 | 1 |
Windows Media Video v8 | WMV2 | 2 |
Microsoft MPEG-4 Video Codec v2 | MP42 | 3 |
Microsoft ISO MPEG-4 Video Codec v1 | MP4S | 4 |
Windows Media Video v9 | WMV3 | 5 |
Windows Media Video v9 Advanced Profile | WMVA | 6 | Deprecated, as it is not fully VC-1 compatible
Windows Media Video v9 Advanced Profile | WVC1 | 7 | Full VC-1 support

The full name of FourCC is Four-Character Code: an identifier made up of 4 characters (4 bytes) that independently tags the format of a video data stream. WAV and AVI files contain a FourCC field that describes which codec was used to encode the file, which is why codes such as "IDP3" can be found inside WAV and AVI files.
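Because a FourCC is simply four ASCII bytes stored in the file, it can be packed and unpacked in a couple of lines; a minimal sketch (the codes used are just examples):

```python
import struct

def fourcc(code: str) -> int:
    """Pack a four-character code such as 'WMV3' or 'WVC1' into a 32-bit
    little-endian integer (the in-memory form typically used on Windows)."""
    assert len(code) == 4
    return struct.unpack("<I", code.encode("ascii"))[0]

def fourcc_to_str(value: int) -> str:
    """Recover the four characters from the packed 32-bit value."""
    return struct.pack("<I", value).decode("ascii")

print(hex(fourcc("WMV3")))             # 0x33564d57
print(fourcc_to_str(fourcc("WVC1")))   # 'WVC1'
```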

Microsoft MPEG-4 v1/v2/v3

There are three common versions, 1.0, 2.0, and 3.0, which are based on MPEG-4 technology. Among them, 3.0 cannot be used for AVI encoding, but can only be used to generate ASF files that support "video streaming" technology.

Microsoft MPEG-4 version 1
Microsoft's basic video codec, a non-standard MPEG-4 that is not compatible with MPEG-4 Part 2. FourCC: MPG4


The basic VFW codec for Microsoft MPEG-4 version 2; it is not compatible with MPEG-4 Part 2. VFW (Video for Windows) is a software development kit for digital video launched by Microsoft. The core of VFW is the AVI file standard: the audio and video data frames in an AVI (Audio Video Interleave) file are interleaved. Around AVI files, VFW provides a complete set of application programming interfaces (APIs) for video capture, compression, decompression, playback and editing. Because the AVI format was introduced early and is widely used in digital video work, VFW still has great practical value and continues to be developed. Calling VFW from a VC++ project is no different from using other development kits, except that VFW32.lib must be added to the project and some additional software and hardware setup is needed when opening the video capture and compression managers. VFW offers rich processing functions and macro definitions for AVI files. An AVI file is a typical stream file, consisting of a video stream, an audio stream and a text stream, so processing AVI files is mainly a matter of handling file streams. FourCC: MP42


The basic VFW codec for Microsoft MPEG-4 version 3; it is not compatible with MPEG-4 Part 2 and was ultimately only used for ASF files. FourCC: MP43

In addition:
Microsoft ISO MPEG-4 version 1
A DirectX Media Objects (DMO)-based codec, compatible with MPEG-4 SP (Simple Profile). FourCC: MP4S.

Microsoft ISO MPEG-4 version 1.1
Compatible with MPEG-4 ASP (Advanced Simple Profile). FourCC: M4S2

At present, the more practical MPEG-4 video codecs on the Windows platform mainly include: Microsoft MPEG-4 Codec v1/v2/v3, developed by Microsoft and mainly used together with Microsoft's streaming media technology; DivX Codec, developed by DivXNetworks on the basis of Microsoft MPEG-4 v3; and the open-source XviD Codec, developed from OpenDivX under the GPL.
On Windows these codecs are provided as DLLs.

Windows Media Video 7

A DirectX Media Objects (DMO)-based codec, and the first Windows Media Video developed entirely by Microsoft; it broke away from MPEG-4 and is not compatible with it, which shows Microsoft's ambition (Microsoft began using its own non-standard MPEG-4 Part 2 here). Unfortunately the compression quality of this version was poor, dashing Microsoft's hopes of an immediate breakthrough, but it compresses very quickly, and many WMV files on the Internet are still compressed in this format. FourCC: WMV1

Windows Media Video 8

An improved version of WMV7; the quality did not improve much. It is a DMO-based codec. FourCC: WMV2.

Windows Media Video 9

Microsoft's real highlight is not just this codec: the V9 series is a platform, which gives Microsoft enough clout to challenge standardization organizations such as MPEG and the ITU. Although this version is not as impressive as Microsoft claims, especially at low bit rates, it is still a big step forward from the previous version, and the WMV HD application in particular has carried Microsoft into the field of video standards.
It is a DMO-based codec; a Video for Windows (VfW/VCM) version is also available. FourCC: WMV3

Windows Media Video 9 Advanced Profile

The Simple and Main profiles of the VC-1 standard are the same as those in WMV9. The Advanced Profile in VC-1 is implemented by a new WMV codec called Windows Media Video 9 Advanced Profile. It improves compression of interlaced content and is independent of the underlying transport, so it can be encapsulated in an MPEG transport stream (TS) or carried over RTP. It is not compatible with the earlier WMV9 codec.

With the encoder introduced alongside Windows Media Player 10 it became possible to control WMV9 quality more finely, but the result cannot be played on the old WMP9, that is, it is not backward compatible with WMP9. I really don't know what Microsoft was thinking.

FourCC WVC1 is the VC-1-compatible variant; FourCC WMVA, the variant that is not VC-1 compatible, is deprecated. We can therefore regard WMV9 Advanced Profile as compatible with VC-1.

Windows Media Video 9 Screen

A lossless compression codec for static screen content, with very good quality and a high compression rate, intended only for environments with very little change, such as screen captures. WMV Screen is a screencast codec that can capture dynamic screen content or convert the output of third-party screen-capture programs into WMV9 Screen files; a typical use is step-by-step computer demonstration videos. The first version was WMV7 Screen; the current, second version is WMV9 Screen, which supports CBR and VBR.

Windows Media Video 9 Image

A still-image compression codec. WMV Image is a video slideshow codec that applies pans and transition effects over time when playing back a set of images. Compared with WMV9 it offers a high compression rate and high image quality. Because the codec relies on the decoder (player) to generate the video frames, playing WMV Image files (even at an ordinary 1024×768, 30 fps) demands considerable processing power. The latest version, WMV 9.1 Image, used by Photo Story 3 for improved transition effects, is not compatible with the original WMV9 Image.

video quality

Microsoft claims that WMV9 provides twice the compression rate of MPEG-4 and three times that of MPEG-2. Microsoft also claims that the compression efficiency of WMV9 is 15% to 50% higher than that of WMV8. However, in a test report in 2005, it was shown that the compression efficiency of WMV9 was worse than that of WMV8.

Windows Media Player 10 Mobile

On the wiki, we notice "Windows Media Player 10 Mobile", showing that WMV10 will be used for mobile, probably Windows Mobile. But we did not find any further information.

WMA

Extension: .wma
Internet media type: audio/x-ms-wma
Uniform type identifier: com.microsoft.windows-media-wma
Developer: Microsoft
Format: digital audio

WMA (Windows Media Audio) is a digital audio compression format developed by Microsoft Corporation. Some audio-only ASF files that encode all of their content using the Windows Media Audio encoding format also use WMA as an extension. The WMA format is proprietary to Microsoft, but with Apple's iTunes supporting it, this format is becoming a competitor to the MP3 format. It is compatible with MP3's ID3 metadata tags, while supporting additional tags.

WMA encoding is used in files of various formats. Applications can use the Windows Media Format SDK to encode and decode WMA. Common applications that support WMA include Windows Media Player, Windows Media Encoder, RealPlayer, Winamp and many more; other platforms such as Linux, and hardware and software on mobile devices, also support the format.

From WMA 7 onwards, WMA supports certificate-based encryption: without the corresponding licence certificate, a file cannot be played even if it is copied locally. Microsoft's original claim, that WMA matches MP3 quality at half the file size, has also largely been fulfilled, and in WMA 9 the engine was improved further: at the same quality it can shave roughly another third off the size of an MP3, which makes it well suited to streaming over the network.

Compared with MP3, WMA's rendering of the high end is noticeably weaker, sometimes even worse than MP3, and like MP3 the usual WMA is a lossy compression format, so WMA is not a suitable choice for demanding users. Starting with WMA 9, however, lossless compression is supported as Windows Media Audio 9 Lossless (upgraded to 9.1 after installing WMP11 or Windows Media Format 11; the lossless version supports up to 5.1-channel encoding). Like MP3, WMA is also a patented, copyrighted format, and supporting devices must purchase a licence to use it.

Windows Media Audio v1/v2

Microsoft's earliest audio coding technology, used in ASF; it was later cracked and reused in DivX Audio. The quality is relatively poor.

Windows Media Audio 7/8/9

Introduced alongside the various WMV versions, these audio codecs improved steadily in quality, but never reached the much-touted goal of CD sound quality at 64 kbps.

Windows Media Audio 9 Professional

The new encoding that appears in WMA9 is mainly used for multi-channel encoding and encoding of high-sampling rate audio, and the quality is good.

Windows Media Audio 9 Voice

For speech encoding, the maximum is 20kbps, but compared with AMR, the effect is too poor.

Windows Media Audio 9 Lossless

Lossless audio coding that perfectly preserves the original quality of a CD, making it a good choice for CD backup, although the cost in file size is high.

VC-1

VC-1, in full Video Codec 1, is based on Microsoft's WMV9 and was promoted as an industry standard. The standardization request was filed in 2003, originally under the name VC-9, and the standard was formally approved in April 2006. VC-1 is the informal name of the SMPTE 421M video codec standard. Both HD DVD and Blu-ray Disc support VC-1, and Microsoft stated that Windows Vista would support the HD DVD specification using VC-1 video compression. The Society of Motion Picture and Television Engineers (SMPTE) has adopted VC-1 as a video compression standard.

VC-1 is a video compression standard based on the Windows Media Video 9 compression technology. It is made up of three codec components, each with its own FourCC code.

WMV3:

This is WMV9. VC-1's Simple and Main profiles correspond to WMV3, keeping compatibility with WMV 9 and supporting progressive-scan coding. An interlaced codec was also provided but, soon after Microsoft introduced the WMV Advanced Profile, its use was no longer recommended. The progressive codec uses YUV 4:2:0; the (deprecated) interlaced codec uses YUV 4:1:1.

WMV3 is used for high-quality video and streaming. At the same quality it needs only one half to one third of the bandwidth of MPEG-2. Commercially it is used for WMV HD high-definition movies and video, encoded as WMV3 Main Profile @ High Level (MP@HL).

WMVA:

It appeared in the interval between the WMV Advanced Profile and its absorption by SMPTE into the VC-1 draft. There are subtle differences between it and WVC1, so the decoders also differ; since 2006 WMVA has been regarded as an obsolete codec because it is not fully VC-1 compatible.

WVC1:

This is WMV 9 Advanced Profile, implementing a newer, fully conformant Advanced Profile of the VC-1 coding standard. It supports interlaced content and is independent of the underlying transport.

The compression technology combines the strengths of MPEG and H.264, using bilinear and bicubic interpolation with sub-pixel motion accuracy down to 1/4 pixel. VC-1 has only four motion-compensation modes, so its compression ratio cannot surpass H.264's; on the other hand, VC-1's encoding time is significantly shorter than H.264's, its complexity is only about 50% of H.264's, and it performs outstandingly on special-effects movies, because H.264's smaller transform size and non-adjustable quantization matrix mean that high-frequency image detail cannot be fully preserved.

There is an article comparing VC-1 and H.264 on the wiki, which is worth reading. I see a segment like this:

VC-1: Fees apply. Reference decoder is not free, but comes with external files

H.264: Free. Reference encoder and decoder are also free. In addition, the Verification Team and the M4IF mailing list are available where one may receive answers to AVC-related questions.

In addition, searching Google for "H.264 license" also turns up the word "free". But is that really so?

Copyright issue

I had always thought H.264 required payment, so I am not sure the statement above is correct. Checking online, I also found wording such as "the H.264 baseline system requires no royalties, is open in nature, and adapts well to IP and wireless networks." With these questions in mind I kept searching. Intellectual property is always a troublesome issue; ideally someone provides the platform, the rights clearance and the product maintenance, as with Android, and leaves the rest to the handset manufacturers.

MPEG LA is the world's leading one-stop technology licensing provider: it lets users acquire, in a single transaction, the worldwide patent rights needed for a given technical standard or platform from multiple patent holders, instead of negotiating each licence separately. Wherever an independently managed one-stop patent licence can open the door to convenience and help users advance a technology, the licensing model pioneered by MPEG LA can provide a solution. One of MPEG LA's licence programs is MPEG-2 digital video compression, a technology behind the most widely used standard in the history of consumer electronics. The MPEG-2 patent portfolio licence, which covers more than 870 MPEG-2 essential patents in 57 countries, has at least 1,500 licensees and covers most of the MPEG-2 products on the global market, including set-top boxes, DVD players, digital TVs, personal computers and DVD-Video discs. As an independent licence administrator, MPEG LA is not affiliated with any standards body or with any patent holder. For more information, please visit http://www.mpegla.com. (http://www.dvbcn.com/2010-01/28-44547.html)

I went to the MPEG LA site to check and found that there is indeed an AVC/H.264 licence program, which means payment is required; the material included a slide summarizing the terms.

I do not fully understand it. For example, for an H.264 movie, is it the content provider offering the download who pays, or the device manufacturer who ships the decoder? Or take H.264 video calls: no charge below 12 minutes and a charge above 12 minutes? It is confusing, so intellectual-property matters are best left to a professional legal specialist.

Codec Study Notes (8): Real Series
The following information is obtained from the wiki.

The Real series is provided by RealNetworks and is divided into RealVideo and RealAudio.

RealVideo

RealVideo is a video format developed by RealNetworks in 1997, and it reached RealVideo version 10 by 2006. It has been positioned as an application format for video playback on the web from the beginning of its development. Multiple playback platforms are supported, including Windows, Mac, Linux, Solaris, and some mobile phones. Compared with other video codecs, RealVideo can usually compress video data smaller. Therefore, it can realize uninterrupted video playback under the condition of dial-up Internet access with 56Kbps MODEM.

The usual file extensions are .rm and .rmvb; the RMVB format, that is, RealVideo with a dynamic (variable) bit rate, is now widely popular.

RealVideo used H.263 in its early versions; from RealVideo 8 onwards the company adopted proprietary, undisclosed video formats. The official player is RealNetworks RealPlayer SP, currently at version 12 and available on multiple platforms including Windows, Macintosh and Linux. RealNetworks also developed the open-source Helix player, but RealVideo support is not included in the Helix project because the Real codecs themselves are still not public.

RealMedia files can be transmitted on the network through RTSP, but RTSP is only used to establish and manage connections, and real video data is transmitted through Real's own proprietary RDT (Real Data Transport) protocol. This method has caused great criticism, because it is difficult to use RealVideo in other players and servers, and now some open source projects, such as MPlayer, can already play RDT streams. In order to support real-time streaming, RealVideo and RealAudio usually adopt CBR (Constant Bit Rate) encoding, so that the data delivered per second is equal. Later, the company developed a variable bit rate, which became RealMedia Variable Bitrate (RMVB) to provide higher level data, but this format is not suitable as a stream, because it is difficult to predict how much network resources a specific media stream will require. Videos with fast movement and scene changes require a higher bit rate, and if the bit rate exceeds the rate that the network can provide, it will cause interruption.

RealNetworks says the source code of the RealVideo and RealAudio codecs is not available under the RPSL; it is licensed commercially for ports of the codecs to otherwise unsupported processors and operating systems. While the company owns most of the intellectual property, it allows third parties to hold the rights to certain features.

RealVideo 1.0

The first version of RealVideo was released in 1997 and was based on the H.263 format. Available in RealPlayer 5; FourCCs are RV10 and RV13.

RealVideo G2 and RealVideo G2+SVT

Also based on H.263 and provided in RealPlayer 6. The quality is relatively poor. FourCC is RV20.

RealVideo 8

The video format launched with RealPlayer 8 became one of the mainstream network video codecs. Encoding is slow and the quality only average. It is believed to be based on an early H.264 draft; available in RealPlayer 8, FourCC RV30.

RealVideo 9

The new generation of encoding developed by RealNetworks improved quality considerably, especially at low bit rates, while encoding very quickly, achieving a good balance of speed and quality.
It is believed to be based on H.264; available in RealPlayer 9. FourCC is RV40.

RealVideo 10

On the basis of RealVideo 9, some parameters are added, such as EHQ, etc., to control the bit rate more precisely, and it is compatible with RealVideo 9. Available in RealPlayer10, FourCC is rv40 (same as RealVideo9)

RealAudio

File name extension: .ra, .ram

Internet media type:audio/vnd.rn-realaudio,audio/x-pn-realaudio

RealAudio is a proprietary audio codec format from RealNetworks, first released in 1995. It includes a range of audio codecs, from low-bit-rate formats for old dial-up modems up to high-quality music. It can be used for media streaming; in the past many Internet radio stations used RealAudio for the real-time audio streams of their programmes. In recent years it has been used less and has given way to other popular formats.

The suffix of RealAudio files is .ra. In 1997 the company began offering video in a format known as RealVideo; containers that combine audio and video use the .rm suffix. The latest versions, however, use .ra for audio files, .rv for video files with or without audio, and .rmvb for variable-bit-rate video files.

The .ram (Real Audio Metadata) and .smil (Synchronized Multimedia Integration Language) file formats are used for links to streaming media. In many cases a web page does not link directly to a RealAudio file but to a .ram or .smil file: a tiny text file containing a link to the audio stream. When the user clicks the link, the browser downloads the .ram or .smil file and launches the user's media player, which reads the pnm or rtsp URL from the file and starts playing the stream.
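As a sketch of that indirection, a .ram metafile is just a tiny text file whose first line is the stream URL; the host and path below are placeholders:

```python
# Write a minimal .ram "metafile"; the player, not the browser, opens the
# rtsp:// URL found inside it. The host and path here are placeholders.
with open("show.ram", "w") as f:
    f.write("rtsp://media.example.com/archive/show.rm\n")

# Reading it back is equally simple: the first non-empty line is the stream URL.
with open("show.ram") as f:
    stream_url = next(line.strip() for line in f if line.strip())
print(stream_url)
```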

RealAudio files include multiple audio codecs, each codec is represented by FourCC (Four Character Code), as follows:

lpcJ: IS-54 VSELP (RealAudio 1)
28_8: G.728 LD-CELP (RealAudio 2)
dnet: Dolby AC3 (RealAudio 3)
sipr: Sipro Lab Telecom ACELP-NET (RealAudio 4/5)
cook: G2/Cook Codec (RealAudio 6)
atrc: Sony ATRAC3 (RealAudio 8)
raac: MPEG-4 LC-AAC (RealAudio 9)
racp: MPEG-4 HE-AAC (RealAudio 10)
ralf: RealAudio Lossless Format (RealAudio 10)
Codec Study Notes (9): QuickTime Series
Extensions: .mov, .qt
Internet media type: video/quicktime
Type code: MooV
Uniform type identifier: com.apple.quicktime-movie
Developer: Apple Inc.
Format: media container
Container for: audio, video, text

Or the title could be changed to Apple Series. QuickTime is not an encoding, but a multimedia platform. There are many encodings on it, and here are only a few mainstream encoders.
  QuickTime technology has three main components:

The media player, which Apple offers as a free download from its website and builds into its computers.
The QuickTime file format, which is public and available to anyone royalty-free.
Software development tools for the Macintosh and Windows platforms, which let people develop their own software to manipulate QuickTime and other media files; these are free for registered developers (registration is free).
Apple releases its free official media player for Mac OS and Windows under the name "QuickTime Player" (earlier versions simply used the name "MoviePlayer"). The player also includes some media editing and creation features, but users must purchase a serial key from Apple to enable them, turning the player into "QuickTime Pro".

QuickTime History: 1991 to 1998: 1.x- 2.x

Apple Computer released the first version of QuickTime on December 2, 1991, as a multimedia add-on for System 7. QuickTime's lead developer, Bruce Leak, gave its first public demonstration at the Worldwide Developers Conference in May 1991; playing Apple's famous 1984 TV commercial on the Mac was an impressive breakthrough for the time. Microsoft's competing technology, Video for Windows, did not appear until November 1992.

The basic architecture formulated by the first version of QuickTime has basically remained unchanged until now, including multiple movie tracks, scalable media form support, an open file format, and complete editing functions. The original video codec contains:

Apple video codec (also known as "Road Pizza"), suitable for general live action images.
The animation codec uses a simple run-length graphics compression method, which is suitable for the large area color of the cartoon shape.
Graphics codec, optimized for 8-bit-per-pixel images, including graphics with dithering en:dithering.
Apple Computer released QuickTime version 1.5 for the Mac operating system in late 1992.

Apple Computer released QuickTime 2.0 for Mac OS in February 1994, the only version that was not free. This version added support for music tracks, which are equivalent to MIDI data and can drive either a sound-synthesis engine built into QuickTime (using sounds licensed from Roland) or any external MIDI device, so the resulting music occupies only a tiny fraction of the movie's data.

In the following versions 2.1 and 2.5, QuickTime continued to be free. Engineers have improved support for music and added sprite tracks, which enable the creation of complex animations with file sizes only slightly larger than static images.

QuickTime 2.0 for Windows was released in November 1994.

QuickTime History: 1998 - 2001: 3.0 and 4.0

QuickTime 3.0 for Mac OS was released on March 30, 1998. Its existing functions are free, but if you want to get Apple's QuickTime Player and Picture Viewer programs with more features, end users need to purchase a QuickTime Pro license to remove the restrictions on the software.

QuickTime 3.0 adds components that support image import, allowing images to be read from GIF, JPEG, TIFF, and other file formats. The video output component, which is mainly used as video data output through FireWire, also adds visual effects, allowing programmers to apply real-time technology to video tracks. Some effects can even respond to the user's mouse clicks, much like the interactive support of the movie itself.

Apple released QuickTime 4.0 for Mac OS on June 10, 1999. It added a graphics export component that can write most of the formats the importer reads, with the exception of GIF (perhaps because of LZW licensing). It also added the first version of the Sorenson video codec and support for streaming.

QuickTime 4.1 was released at the beginning of 2000; it added the ability to play movies larger than 2 GB in Mac OS 9 and later, and dropped support for 68K Macs. Users also gained the ability to control QuickTime Player via AppleScript.

QuickTime History: 2001 - present 5.0 and later
  QuickTime 5.0 for Mac OS appeared on April 23, 2001. It adds "skinning" functionality and multiprocessing image compression support. Only users with QuickTime Pro licenses can use full-screen mode in this version, which has caused controversy and has not been resolved to this day.

QuickTime History: QuickTime 6.x
QuickTime 6.0 for Mac OS was released on July 15, 2002, and for the first time included a version for Mac OS X.

Updates to QuickTime 6:

Release date | Version | Platform | Features
July 15, 2002 | QuickTime 6 | Mac OS 8.6 – Mac OS X, Windows | MPEG-2, MPEG-4 and AAC
January 14, 2003 | QuickTime 6.1 | Mac OS X | Quality and performance improvements
March 31, 2003 | QuickTime 6.1 | Windows | Fixed the CAN-2003-0168 security vulnerability
April 29, 2003 | QuickTime 6.2 | Mac OS X | Support for iTunes 4, enhanced AAC support
June 3, 2003 | QuickTime 6.3 | Mac OS X, Windows | 3GPP and AMR
October 16, 2003 | QuickTime 6.4 | Mac OS X, Windows | Pixlet codec, integrated 3GPP
December 18, 2003 | QuickTime 6.5 | Mac OS X, Windows | 3GPP2 and AMC mobile multimedia formats
April 28, 2004 | QuickTime 6.5.1 | Mac OS X, Windows | Apple Lossless
October 27, 2004 | QuickTime 6.5.2 | Mac OS X, Windows (last version for Windows 98/Me) | Bug fixes, security updates, and quality and performance improvements
October 12, 2005 | QuickTime 6.5.3 | Mac OS X v10.2.8 |

QuickTime History: QuickTime 7.x

Updates to QuickTime 7:

Release date | Version | Platform | Features
May 31, 2005 | QuickTime 7.0.1 | Mac OS X | Fixed a security issue in the Quartz Composer plug-in module
July 15, 2005 | QuickTime 7.0.2 | Mac OS X | Bug fixes and compatibility improvements
September 7, 2005 | QuickTime 7.0.2 | Windows 2000/XP | First non-preview release
October 12, 2005 | QuickTime 7.0.3 | Mac OS X & Windows 2000/XP | Streaming and H.264 bug fixes; required for purchasing videos through the iTunes Music Store
October 29, 2005 | QuickTime 7.0.3.50 | Windows 2000/XP |
January 10, 2006 | QuickTime 7.0.4 | Mac OS X & Windows 2000/XP | The first universal binary release; numerous bug fixes and H.264 performance improvements
May 11, 2006 | QuickTime 7.1 | Mac OS X & Windows 2000/XP | Numerous bug fixes, iLife '06 support, and H.264 performance improvements
May 31, 2006 | QuickTime 7.1.1 | Mac OS X |
June 28, 2006 | QuickTime 7.1.2 | Mac OS X | Addresses an issue previewing iDVD projects
September 12, 2006 | QuickTime 7.1.3 | Mac OS X & Windows 2000/XP | Bug fixes and fixes for serious security issues

At present the latest version of QuickTime is QuickTime 7.6, but the highest version that runs on Windows 2000 is 7.1.6. Versions before 7.5.5 have a cross-site scripting security problem.
History of QuickTime: QuickTime X

QuickTime X (pronounced QuickTime Ten, "X" being the Roman numeral for ten) is the next generation of QuickTime, announced at WWDC on June 9, 2008, and expected to ship with Mac OS X v10.6 in mid-2009. It will use the same media technology as the iPhone OS and support newer codecs and more efficient media playback.
Sorenson Video

Sorenson Video 2:

The encoder developed by Sorenson Media is mainly used for QuickTime 4 video encoding, and the quality is poor.

Sorenson Video 3:

The codec released by Sorenson Media Company with QuickTime 5 has very good quality and has become the standard video codec for QuickTime. Most movie trailers on the Internet use this codec.

Apple MPEG-4

Apple's own MPEG-4 encoder, released with QuickTime 6, is of poor quality.

Apple H.264

Apple's own H.264 encoder, released with QuickTime 7, supports HDTV.

Audio QDesign Music

QDesign Music 1

The audio encoder developed by QDesign, this version is no longer available.

QDesign Music 2

The second and last version of QDesign Music; it cannot compete with today's advanced audio codecs and is mainly seen in movie trailers on the Internet.

Audio Apple MPEG-4 AAC

The AAC encoder developed by Apple itself is of very good quality and is one of the best AAC encoders released with QuickTime 6.

Apple Lossless

Lossless audio coding developed by Apple, mainly used in iTunes for ripping CDs. Apple Lossless (Apple Lossless Audio Codec, ALAC) is Apple's lossless audio compression format and is labelled Apple Lossless in iTunes.

It can compress non-compressed audio formats (WAV, AIFF) to about 40% to 60% of the original capacity, and the encoding and decoding speed is very fast. Also because it is lossless compression, it sounds exactly the same as the original file and will not be changed by decompression and compression.

It was released on April 28, 2004 as part of iTunes 4.5 and QuickTime 6.5.1. Among portable digital media players, currently only the iPod can play it.

Although not free software or open source software, an open source codec for Apple Lossless has been released.

Note: The above materials come from the collation of the wiki.

Codec Study Notes (10): Ogg Series
Ogg is a free and open standard container format maintained by the Xiph.Org Foundation. The Ogg format is not restricted by software patents and is designed for efficient streaming and processing of high-quality digital multimedia.

Ogg refers to a file format that can incorporate a variety of free and open source codecs, including audio, video, text (like subtitles) and metadata processing.

oggTheora

Theora is a royalty-free, open-format lossy video compression technology developed by the Xiph.Org Foundation, which also developed the well-known Vorbis audio codec and the Ogg multimedia container. Theora is derived, via open source, from On2 Technologies' proprietary VP3 codec, and is named after Theora Jones, a character in the TV series Max Headroom.

Theora is a variable-bit-rate, DCT-based video compression format. Like most video coding formats it uses chroma subsampling, block-based motion compensation and an 8×8 DCT block. It supports intra-coded and inter-coded pictures, but not the bi-predictive frames (B-frames) used by H.264 and VC-1; Theora also does not support interlacing, variable frame rates, or bit depths greater than 8 bits per component.

Theora's video stream can be stored in any container file format; most commonly it is stored in the Ogg format together with Vorbis audio, giving a completely open, royalty-free multimedia file. Theora video can also be stored in Matroska files.
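As a practical illustration of packing Theora video and Vorbis audio into a single Ogg container, the sketch below shells out to FFmpeg and assumes a build with libtheora and libvorbis; the file names and quality settings are arbitrary:

```python
import subprocess

# Transcode any input into an Ogg container holding Theora video and
# Vorbis audio (-q:v and -q:a are the encoders' quality scales).
subprocess.run(
    [
        "ffmpeg", "-i", "input.mp4",
        "-c:v", "libtheora", "-q:v", "7",
        "-c:a", "libvorbis", "-q:a", "5",
        "output.ogv",
    ],
    check=True,
)
```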

According to Google's official blog, there is currently no standard for Web video. Some websites use Flash, but this requires users to have a Flash player; some use Java players, but that requires a powerful machine; and so forth.

The good news is that the next-generation HTML 5 standard introduces the video element, and web developers can specify the appearance of the video in a standard way. Now the question becomes which video format to use.

Google believes an open, standard format could draw a bottom line in the current disorganized battle of video formats. The format that ultimately wins does not need to be the most complex, the most hyped, or even a formal industry standard, so they decided to support the widely used open-source Ogg Theora format.

Theora is an open-source derivative of On2 Technologies' VP3 encoder, which Google acquired last year.

Source: http://tech.it168.com/a2010/0412/872/000000872493.shtml

In March 2002 On2 changed the licence to the LGPL, and in June 2002 On2 released VP3 under a BSD-like open-source licence under the Xiph.Org umbrella. On2 also made an irrevocable, royalty-free declaration allowing anyone to use the software and any derivative works for any purpose. In March 2002 On2 signed an agreement with Xiph.Org to introduce VP3 as the basis of a new, free video codec, which became Theora; On2 considers Theora the successor to VP3. On October 3, 2002, On2 and Xiph announced the release of Theora's first alpha code.

The bitstream format was frozen in 2004 (version 1.0 alpha3), and after a few years of beta versions, Theora's first stable version (v1.0) was released in November 2008. Any version of Theora video encoding is compatible with future players after the format freeze. Current work is focused on bugfixes for the "Thusnelda" branch, currently in beta, which will eventually be released as Theora 1.1.

The Theora video compression format is essentially compatible with the VP3 format: Theora is a backward-compatible superset of VP3 (with minor syntax revisions). VP3 streams can be converted into Theora streams without recompression, but not vice versa: VP3 video can be decoded by Theora decoders, whereas Theora streams generally cannot be decoded by the old VP3 decoder.

Theora puts the video format on an open-source footing and is the encoding format of choice for Wikipedia video content. However, Theora lacks commercial backing and has struggled to gain acceptance from vendors, especially online video vendors.

Mozilla uses this technology to serve HTML5 video on Firefox. Both Apple and Microsoft are preparing to adopt H.264 managed by MPEG LA for HTML5 video. Members of the group include Microsoft and Apple, among many technology companies.

The key to the controversy lies in the issue of the license. H.264 requires a license.

Mozilla issued the following statement: "We believe HTML5 video serves the public interest only if it is supported by a multi-party, open and royalty-free codec, in the same way as a W3C-licensed standard. If MPEG LA were willing to make H.264 available under conditions consistent with the open web as defined by W3C standards, we would certainly consider adopting the technology. Until then, we stand by our position on Theora."

Hakon Wium Lie, CTO of Opera, also provided a statement: "For the open web to thrive, all media, including video, must be usable without paying codec licence fees. Browser makers that truly support an open web must work toward a baseline AV codec that is free of licensing fees."

Microsoft's corporate blog wrote: "The availability of source code and the availability of intellectual-property rights are different things, and the latter is absolutely necessary. Today the intellectual-property rights for H.264 can be obtained through a clearly defined program administered by MPEG LA; the rights situation for other codecs is often far less clear."

Ogg Vorbis

Ogg's audio encoding has excellent quality, especially at low bit rates, and supports multi-channel. The highest bit rate can reach 500kbps, which is a strong competitor of AAC.

The term "Ogg" usually refers to the audio file format of Ogg Vorbis, that is, the format in which Vorbis-encoded sound effects are contained in an Ogg container. In the past, the extension .ogg was used for content in any Ogg-supported format, but in 2007, the Xiph.Org Foundation made a request to leave .ogg only for the Vorbis format for backward compatibility considerations. use. The Xiph.Org Foundation decided to create some new extensions and media formats to describe different types of content, like .oga for sound effects only, .ogv for movies with or without sound (covering Theora) and programs .ogx.

Vorbis is an open-source free-software project directed by the Xiph.Org Foundation. The project produces a specification for a digital audio format and a software implementation (codec) for lossy audio compression. Vorbis is most commonly used together with the Ogg container format and is therefore often referred to as Ogg Vorbis.

Vorbis is the continuation of an audio compression effort started by Chris Montgomery in 1993. Intensive development began in September 1998, after a letter from the Fraunhofer Society announced that licence fees would be charged for the MP3 audio format. The Vorbis project began as part of the Xiph.Org Foundation's Ogg project (also known as the OggSquish multimedia project). Chris Montgomery started the work and was assisted by a growing number of other developers. The source code was refined until the Vorbis file format was frozen at 1.0 in May 2000, and a stable version (1.0) of the reference software was released on 19 July 2002.

OggSpeex

Ogg's voice coding, specially for low bit rate voice coding.

Ogg FLAC

Ogg's lossless audio encoding.

On2 VPX Series

On2 has developed a series of excellent video codecs; the most widely seen use is Nullsoft Video (NSV), which uses the VP3, VP5 and VP6 video codecs.

VP3

Already released as open source, it is now the Ogg Theora project, and of course Theora's quality is much better than VP3's.

VP4

On2 boasted the best video encoding in the world, but the quality turned out to be mediocre.

VP5

It remains rather mysterious: On2 never released it publicly, and it has only been seen in Nullsoft Video.

VP6

From the beginning, On2 provided this encoder for download, and the quality is still good. But it seems to be closed again recently, and there is only one decoder on the home page. On2 TrueMotion VP6 is a proprietary lossy video codec format and video codec. It is a concrete embodiment of the TrueMotion video codec, a series of video codecs developed by On2, commonly used in Adobe flash, Flash Video and JavaFX media files.

VP7

On2's newest encoder has made many improvements over VP6. In January 2005, On2 announced the launch of a new codec VP7 with a better compression ratio than VP6. In April 2005, On2 licensed On2 Video Encoder 9 (including VP6 and VP7) for Macromedia Flash. In August 2005, Macromedia announced that they selected VP6 as the flagship codec for video playback in the new Flash Player8.

VP8

Google acquired On2 Technologies in 2009 and announced at the Google I/O conference on May 19, 2010 that VP8 would be open-sourced under a BSD licence. VP8 is the second codec that On2 Technologies has released as open source, after VP3 (the Xiph.Org Foundation took over VP3 in 2002, named it Theora, and later released Theora under a BSD-style licence). The loudest call for Google to open-source VP8 came from the Free Software Foundation, which on March 12, 2010 sent an open letter to Google asking it to gradually replace Adobe Flash and H.264 on YouTube with open-source VP8 and HTML5.

On May 19, 2010, WebM was launched. WebM includes contributions from Mozilla, Opera, Google, and more than forty other publishers and computer hardware and software suppliers (including AMD, NVIDIA), and aims to vigorously advocate the use of VP8 in HTML5. Internet Explorer 9 also supports VP8 when the appropriate codec is installed.

Note: The above materials come from the collation of the wiki.

Codec Study Notes (11): Flash Video Series
Flash Video is used to compress video for Flash. The FLV streaming format effectively solved the problem that importing video files into Flash made the exported SWF files bulky and hard to use on the Internet. FLV files are generally played inside a SWF player shell; FLV hides the original address well and is not easy to download directly, which helps protect copyright.

File extensions: .flv, .f4v, .f4p, .f4a, .f4b
Media types: video/x-flv, video/mp4, video/x-m4v, audio/mp4a-latm, video/3gpp, video/quicktime, audio/mp4
Developer: Adobe Systems (originally developed by Macromedia)
Type of format: media container
Container for: audio, video, text, data
Extended from: FLV: SWF; F4V: MPEG-4 Part 12
Flash Introduction

Flash Video is a file container format used by Adobe Flash Player versions 6-10 to deliver video over the Internet; Flash video content can also be embedded in SWF files. Flash video comes in two different file formats, FLV and F4V. In FLV files, audio and video data are encoded in the same way as in SWF files. F4V, which appeared later, is based on the ISO base media file format and has been supported since Flash Player 9 Update 3. Both formats are supported by Adobe's Flash players and are developed by Adobe; FLV was originally developed by Macromedia.

The media contained in Flash Video FLV files is usually encoded with the Sorenson Spark or VP6 video compression formats. The most recently released Flash players also support H.264 video and HE-AAC audio. All of these codecs are currently encumbered by patents.

"Sorenson codec" may refer to either of two proprietary video codecs: Sorenson Video or Sorenson Spark. Sorenson Video is also known as the Sorenson Codec, Sorenson Video Quantizer or SVQ; Sorenson Spark is also called Sorenson H.263. Both were designed by Sorenson Media (formerly Sorenson Vision). Sorenson Video is used in Apple's QuickTime, while Sorenson Spark is used in Adobe Flash (formerly Macromedia Flash).

Thanks to the widely deployed Adobe Flash Player, browser plug-ins and other third-party programs, Flash video can be used on the vast majority of operating systems.

The video bitstream in a Flash Video FLV file is usually a proprietary variant of the H.263 video standard, with the FourCC FLV1 (Sorenson Spark). Sorenson Spark is an old codec for FLV files but is widely used and widely compatible; it was the first video codec supported by Flash Player and is the required video compression format for Flash Player 6 and 7. Flash Player 8 and newer versions support playback of On2 TrueMotion VP6 video bitstreams (FourCC VP6F or FLV4), and On2 VP6 is the preferred video compression format for Flash Player 8 and later. On2 VP6 can deliver higher visual quality than Sorenson Spark, especially at low bit rates, but it is computationally more complex and therefore does not run well on some older system configurations.

Flash Player 9 Update 3, released on December 3, 2007, introduced the new Flash video file format F4V and support for the H.264 video standard (MPEG-4 Part 10, AVC). H.264 requires more complex technology but offers a markedly better quality-to-bit-rate ratio. Specifically, Flash Player now supports H.264 video compression (MPEG-4 Part 10), AAC audio compression (MPEG-4 Part 3), the F4V, MP4 (MPEG-4 Part 14), M4V, 3GP and MOV multimedia container formats, and the 3GPP Timed Text standard (MPEG-4 Part 17), a standardized subtitle format; it can also partially parse the ID3-like 'ilst' atom, the metadata store used by iTunes. It does not support MPEG-4 Part 2 video (for example video created by DivX or Xvid). Jonathan Gay, one of Flash's lead programmers, told BBC News that the company initially wanted to use H.264 in Flash but was put off by a prohibitive annual patent licence fee of $5 million (£3.5 million).

The Flash Video FLV file format supports two versions of the "Screen Video" codec, an encoding format intended for desktop presentations (screencasts). Both versions are based on bitmap tiling; they can be encoded lossily by reducing the color depth, and the data is compressed with zlib. The second version is only supported in Flash Player 8 and newer.

In Flash video files, MP3 is usually used as the audio codec. However, in Flash Video FLV files, the dedicated Nellymoser Asao codec is used for recording through the microphone (Flash Player 10, released in 2008, also supports the open source Speex codec). FLV files additionally support uncompressed audio and ADPCM audio. Flash Player 9 and later also support AAC (HE-AAC/AAC SBR, AAC Main Profile, and AAC-LC).

Encoding to Flash Video files is supported by a number of tools, including Adobe's Flash Professional and Creative Suite products, On2's Flix encoding tool, Sorenson Squeeze, FFmpeg, and other third-party tools.
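
As an illustration (my own sketch, not from the original article), the snippet below drives FFmpeg from Python to produce a classic FLV file with Sorenson Spark (FLV1) video and MP3 audio. It assumes an FFmpeg build that includes the flv and libmp3lame encoders; file names and bitrates are placeholders.

```python
# Minimal sketch: transcode a source file into a traditional FLV
# (Sorenson Spark video + MP3 audio) by invoking FFmpeg.
import subprocess

def encode_flv(src: str, dst: str) -> None:
    """Transcode `src` into an FLV container with FLV1 video and MP3 audio."""
    cmd = [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", src,             # input file (placeholder name)
        "-c:v", "flv",         # FFmpeg's Sorenson Spark / FLV1 encoder
        "-c:a", "libmp3lame",  # MP3 audio, the traditional FLV audio codec
        "-b:v", "700k",        # example video bitrate
        "-b:a", "128k",        # example audio bitrate
        dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    encode_flv("input.mp4", "output.flv")
```

Swapping the codec arguments for libx264 and aac (with an .mp4/.f4v output) would, under the same assumptions, give the H.264 + AAC combination described above for F4V.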

Container

Flash Player 6, released in 2002, added support for video in the SWF file format; in 2003, Flash Player 7 added direct support for the FLV file format. Because of the limitations of the FLV format, Adobe Systems introduced the new file formats listed below in 2007, based on the ISO base media file format (MPEG-4 Part 12). Flash Player does not rely on the file extension; it inspects the file itself to determine which format it is.

File extension    MIME type    Description
.f4v              video/mp4    Video for Adobe Flash Player
.f4p              video/mp4    Protected video for Adobe Flash Player
.f4a              video/mp4    Audio for Adobe Flash Player
.f4b              video/mp4    Audio book for Adobe Flash Player

The support for SWF files in Flash Player 6 and later makes it possible to exchange audio, video and data with Adobe Flash Media Server over RTMP. Flash Media Server streams data in the FLV file format (MIME type video/x-flv). Starting with SWF files created for Flash Player 9 Update 3, Flash Player can also stream the new F4V file format.

Media formats

Media types supported in FLV files:

Video: On2 VP6, Sorenson Spark (Sorenson H.263), Screen Video, H.264
Audio: MP3, ADPCM, Linear PCM, Nellymoser, Speex, AAC, G.711 (reserved for interoperability requirements)

Media types supported in F4V files:

Video: H.264
Image (still frames of video data): GIF, PNG, JPEG
Audio: AAC, HE-AAC, MP3

Audio and video compression formats supported in Flash Player and Flash Video:

Flash Player 6 (2002)
  File formats: SWF
  Video: Sorenson Spark, Screen Video
  Audio: MP3, ADPCM, Nellymoser

Flash Player 7 (2003)
  File formats: SWF, FLV
  Video: Sorenson Spark, Screen Video
  Audio: MP3, ADPCM, Nellymoser

Flash Player 8 (2005)
  File formats: SWF, FLV
  Video: On2 VP6, Sorenson Spark, Screen Video, Screen Video 2
  Audio: MP3, ADPCM, Nellymoser

Flash Player 9.0.115.0 (2007)
  File formats: SWF, FLV
  Video: On2 VP6, Sorenson Spark, Screen Video, Screen Video 2, H.264[*]
  Audio: MP3, ADPCM, Nellymoser, AAC[*]
  File formats: SWF, F4V (ISO base media file format)
  Video: H.264
  Audio: AAC, MP3

Flash Player 10 (2008)
  File formats: SWF, FLV
  Video: On2 VP6, Sorenson Spark, Screen Video, Screen Video 2, H.264[*]
  Audio: MP3, ADPCM, Nellymoser, Speex, AAC[*]
  File formats: SWF, F4V (ISO base media file format)
  Video: H.264
  Audio: AAC, MP3

[*] There are some limitations on using H.264 and AAC compression in the FLV file format, so the authors of Flash Player strongly recommend using the new F4V file format instead.

Several ways of delivering Flash video

1. As a standalone .flv file.

2. Embedded within an SWF file using the Flash authoring tool (supported in Flash Player 6 and later).

3. Progressive download via HTTP. This approach uses ActionScript to load an externally hosted Flash Video file on the client side for playback. Unlike streaming over RTMP, however, HTTP "streaming" does not support real-time broadcasting. HTTP streaming requires a custom player and the addition of specific Flash Video metadata containing the exact starting byte position and timecode of each keyframe; using this information, a custom Flash Video player can request that playback begin at any specified keyframe. For example, Google Video, YouTube and BitGravity support progressive streaming, allowing any part of the video to be viewed before the buffer is full. On the server side, this "pseudo HTTP streaming" method is quite simple to implement, for example with Apache's PHP module or with lighttpd.

4. Streaming over the RTMP protocol, which can be provided by Flash Media Server (formerly known as Flash Communication Server), VCS, ElectroServer, Helix Universal Server, Wowza Pro, WebORB for .NET, WebORB for Java, and the open source Red5 server. As of April 2008, this protocol can be used to stream video without the need for re-encoding screencast software.

RTMP (Real Time Messaging Protocol) is a proprietary protocol developed by Adobe Systems for streaming audio, video and data over the Internet between the Flash player and a server. The RTMP protocol comes in three variants (a minimal sketch of the handshake used by the plain TCP variant follows below):
1. The "plain" protocol, which runs over TCP on port 1935.
2. RTMPT, which encapsulates RTMP in HTTP requests so it can traverse firewalls.
3. RTMPS, which runs over a secure HTTPS connection.
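
For illustration only (a minimal sketch of my own, not part of the original notes): the plain RTMP variant starts with a simple handshake on TCP port 1935 in which the client sends a one-byte version (C0) and a 1536-byte block (C1), reads S0/S1/S2 from the server, and echoes S1 back as C2. The server host name below is a placeholder.

```python
# Minimal sketch of the RTMP handshake over plain TCP (port 1935).
import os
import socket
import struct
import time

RTMP_PORT = 1935
HANDSHAKE_SIZE = 1536

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the connection closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed during handshake")
        buf += chunk
    return buf

def rtmp_handshake(host: str) -> None:
    with socket.create_connection((host, RTMP_PORT), timeout=5) as sock:
        # C0: one byte, RTMP protocol version 3
        c0 = b"\x03"
        # C1: 4-byte timestamp, 4 zero bytes, 1528 bytes of random filler
        c1 = struct.pack(">I", int(time.time()) & 0xFFFFFFFF) + b"\x00" * 4
        c1 += os.urandom(HANDSHAKE_SIZE - 8)
        sock.sendall(c0 + c1)

        # S0 + S1 + S2 from the server
        s0 = _recv_exact(sock, 1)
        s1 = _recv_exact(sock, HANDSHAKE_SIZE)
        _s2 = _recv_exact(sock, HANDSHAKE_SIZE)
        print("server RTMP version:", s0[0])

        # C2: echoing S1 completes the handshake
        sock.sendall(s1)

if __name__ == "__main__":
    rtmp_handshake("rtmp.example.com")  # placeholder server name
```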

Note: The above materials come from the collation of the wiki.

Codec Study Notes (12): Other Codecs
M-JPEG

M-JPEG (Motion JPEG, Motion Joint Photographic Experts Group) is a frame-by-frame compression technique for moving images. It is widely used in nonlinear editing, where it allows frame-accurate cuts and multi-layer image processing. Because each frame is compressed completely and independently, frames can be accessed randomly during editing and edits can be made with frame accuracy. In addition, M-JPEG compression and decompression are symmetrical and can be implemented by the same hardware and software.

Unlike MPEG video compression, M-JPEG does not use inter-frame compression. Its compression ratio is therefore relatively low, but encoding and decoding are easy and do not require much computing power, which also makes Motion JPEG very easy to edit in software or on a chip. For this reason, some mobile devices, such as digital cameras, use Motion JPEG to encode video clips.

Motion JPEG 2000

JPEG 2000 is an image compression standard based on the wavelet transform, created and maintained by the Joint Photographic Experts Group. JPEG 2000 is generally regarded as the next-generation image compression standard that may eventually replace JPEG (which is based on the discrete cosine transform). JPEG 2000 files usually have the extension .jp2 and the MIME type image/jp2. Motion JPEG 2000 applies JPEG 2000 compression to each video frame independently, in the same spirit as M-JPEG.

Although JPEG 2000 has certain technical advantages, as of 2006 the number of JPEG 2000 images on the Internet was still very small, and most browsers still do not display JPEG 2000 images by default. However, because JPEG 2000 achieves a relatively good compression ratio even with lossless compression, it has been widely adopted for the analysis and processing of medical images, where image-quality requirements are relatively high.

DivX

File name extension: .divx
Type: DIVX
Developer: DivX, Inc.
Format type: Media container for MPEG-4 Part 2–compliant video
Extended from: AVI

DivX is another video encoding (compression) standard derived from MPEG-4, commonly known as the DVDrip format. It uses the MPEG-4 compression algorithm and combines MPEG-4 and MP3 technologies: the video is compressed at high quality with MPEG-4, the audio is compressed with MP3 or AC3, and the video and audio are then combined, often with external subtitle files, to form a complete video. Its picture quality is close to that of DVD while its size is only a fraction of a DVD's. This kind of encoding does not place high demands on the machine, so DivX can be said to be the new video compression format that poses the greatest threat to DVD; it has been called the DVD killer or DVD terminator.

DivX, the well-known brand of DivX, Inc. (formerly DivXNetworks), is a video codec based on MPEG-4 technology. In the fall of 2007 the company acquired Germany's MainConcept for $22 million.

ISO organized MPEG-4 as a standard for very-low-bit-rate compression of moving images and audio, approving the first version in October 1998 and following it with a second version and its verification model (VM). MPEG-4's official designation is ISO/IEC International Standard 14496, a new kind of multimedia standard. An important difference from earlier standards is that it is an object-based visual coding standard, and its defined rate-control goal is to obtain the best quality at a given bit rate. This provides a good technical platform for transmitting high-quality multimedia video over the Internet.

In 1998, Microsoft developed the first MPEG-4 encoders for the PC, the MS MPEG4 V1, V2 and V3 series of encoding kernels. V1 and V2 were used to create AVI files and have shipped as default Windows components ever since, but their encoding quality was not very good; only with MS MPEG4 V3 did picture quality improve noticeably. For whatever reason, Microsoft kept the V3 video encoding kernel closed and used it only in its Windows Media streaming technology, that is, the familiar ASF streaming files. ASF files have some advantages, but they are too closed and cannot be edited, so they never became widespread. Microsoft's video encoder was nevertheless picked up by others, and after their modifications a new encoder was born: the widely circulated MPEG-4 encoder DivX 3.11.

DivX 3.11 took Microsoft's MPEG4 V3 kernel, improved it and added features of its own, and became one of the MPEG-4 encoders most commonly used on the Internet. DivX soon became so popular that it was almost the de facto industry standard. Just as quickly, however, it emerged that the core technology of DivX had been taken from Microsoft without authorization. Rota, one of the creators of the DivX technology, then set about legitimizing DivX completely, on the grounds that although DivX had been born on Windows, it did not actually use any Microsoft technology. The company DivXNetworks promoted DivX in earnest, and the boom in DivX (commonly known as compressed movies) seemed unstoppable.

As with any eye-catching story, there was a twist at the critical moment, and the development of DivX did not escape this cliché. While DivX was developing smoothly, its technology maturing and its business prospects looking unlimited, a drama played out. DivXNetworks' original intention had been to get away from Microsoft's closed technology, so it launched a completely open-source project named Project Mayo, whose goal was to develop a new open-source MPEG-4 encoder that fully complied with the ISO MPEG-4 standard. The OpenDivX codec attracted many software and video experts, and a higher-performance encoder, Encore2, was soon developed. Then, at the most glorious moment of DivX, DXN suddenly closed the DivX source code and released its own product, DivX4, built on Encore2. It turned out that DXN had left itself a back door: DivX was licensed under the LGPL rather than the GPL. Both are public licenses that guarantee the right to use and modify the software or source code freely, but the LGPL permits proprietary use, and DXN exploited this to make its surprise move.

Afterwards, many of the software and video groups that had been burned by DXN struck out on their own, gradually regrouped their development effort, raised the banner of revenge and developed a new MPEG-4 encoder based on the OpenDivX code: XviD, whose name is simply DivX reversed. The name alone makes the spirit of revenge plain.

DivX is the video compression standard that has dominated online video over the past year or two. It was initially modified and developed from Microsoft's MPEG-4 video codec and released free of charge. Its hallmark is a very good compression ratio: a whole DVD-quality movie can be compressed onto a single CD-R disc. DivX now comes in a standard version and a Pro version; the latter is available either as a paid version or as an ad-supported (adware) version, and ships with the DivX Player program for playback. Users who install the free DivX codec can also watch DivX videos in Windows Media Player.

Note: The above materials come from the collation of the wiki.

Codec Study Notes (13): Containers (Part 1)
Video is an important part of a computer's multimedia system. To meet the need to store video, people have defined various video file formats that put video and audio into one file so they can be played back together. A video file is really a container with different tracks wrapped inside it, and the container format used determines how extensible the video file is.

FourCC stands for Four-Character Code: an identifier made up of 4 characters (4 bytes) that independently marks the format of a video data stream. WAV and AVI files contain FourCC fields describing which codec was used to encode the file, which is why FourCC tags (such as "IDP3") appear throughout WAV and AVI files.
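
As a rough illustration (my own sketch, not from the article), the snippet below digs the video codec FourCC out of an AVI file by locating the 'strh' stream-header chunk whose stream type is 'vids'; the handler FourCC follows the stream type directly. The file name is a placeholder, and the scan is a simplification of full RIFF parsing.

```python
# Minimal sketch: read the video codec FourCC from an AVI (RIFF) file.
def read_video_fourcc(path: str) -> str | None:
    with open(path, "rb") as f:
        data = f.read(64 * 1024)  # headers live near the start of the file

    if data[:4] != b"RIFF" or data[8:12] != b"AVI ":
        raise ValueError("not an AVI/RIFF file")

    # Look for a 'strh' chunk whose stream type is 'vids' (a video stream);
    # the 4 bytes right after the stream type are the codec handler FourCC.
    pos = data.find(b"strh")
    while pos != -1:
        chunk_data = data[pos + 8:]           # skip chunk id + chunk size
        if chunk_data[:4] == b"vids":
            return chunk_data[4:8].decode("ascii", errors="replace")
        pos = data.find(b"strh", pos + 4)
    return None

if __name__ == "__main__":
    print(read_video_fourcc("sample.avi"))  # e.g. 'XVID', 'DX50', 'H264'
```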

ISO/IEC: MPEG-PS · MPEG-TS · MPEG-4 Part 12 / JPEG 2000 Part 12 · MPEG-4 Part 14

ITU-T: H.222.0

Others: 3GP and 3G2 · ASF · AVI · Bink · DivX Media Format · DPX · EVO · Flash Video · GXF · M2TS · Matroska · MXF · Ogg · QuickTime File Format · RealMedia · REDCODE RAW · RIFF · Smacker · MOD and TOD · VOB · WebM

Audio only: AIFF · AU · WAV

3GP and 3G2 containers

3GP (3GPP file format) is a multimedia container defined by the Third Generation Partnership Project (3GPP) for 3G UMTS multimedia services. It is used on 3G mobile phones but can also be played on some 2G and 4G phones. 3GP is defined in the ETSI 3GPP technical specifications as a container for video, speech/audio media types and timed text, used for IMS, MMS, multimedia broadcast/multicast services (MBMS) and the transparent end-to-end Packet-switched Streaming Service (PSS).

3G2 (3GPP2 file format) is a multimedia container defined by 3GPP2 for 3G CDMA2000 multimedia services. It is very similar to the 3GP file format, but has some extensions and limitations compared with it. 3G2 is defined in the 3GPP2 technical specifications.

Both the 3GP and 3G2 file formats are based on the ISO base media file format defined in ISO/IEC 14496-12 (MPEG-4 Part 12), although the older 3GP releases lacked some of its features. 3GP and 3G2 are similar to MP4 (MPEG-4 Part 14), which is also based on MPEG-4 Part 12. 3GP and 3G2 were designed to reduce storage and bandwidth requirements on mobile phones. They are very similar standards, but with the following differences:

The 3GPP file format is used on GSM-based phones and uses the file extension .3gp.
The 3GPP2 file format is used on CDMA-based phones and uses the file extension .3g2.

3GP files can store video streams in MPEG-4 Part 2, H.263 and MPEG-4 Part 10 (AVC/H.264), and audio streams in AMR-NB, AMR-WB, AMR-WB+, AAC-LC, HE-AAC v1 and Enhanced aacPlus (HE-AAC v2). 3GPP allowed the use of the AMR and H.263 codecs in the ISO base media file format (MPEG-4 Part 12) because 3GPP specified the use of sample entries and template fields in that format and allowed new boxes to be defined for codecs; these extensions are registered as code points with the registration authority for the ISO base media file format ("MP4 family" files). For storing MPEG-4 media in 3GP files, the 3GP specification refers to the MP4 and AVC file format specifications, which are likewise based on the ISO base media file format and describe how MPEG-4 content is carried in it. Some phones use .mp4 as the extension for 3GP video.

The 3G2 file format can store the same video streams and some of the same audio streams as the 3GP file format. In addition, 3G2 can carry audio streams in EVRC, EVRC-B, EVRC-WB, 13K (QCELP), SMV and VMR-WB, and it also reuses the timed text defined by 3GPP. The 3G2 file format does not support Enhanced aacPlus (HE-AAC v2) or AMR-WB+ audio streams. For storing MPEG-4 media (AAC audio, MPEG-4 Part 2 video, MPEG-4 Part 10/H.264/AVC) in 3G2 files, the 3G2 specification refers to the MP4 file format and AVC file format specifications, which describe how these are used in the ISO base media file format. For storing H.263 and AMR content in 3G2, the 3G2 specification refers to the 3GP file format specification.

3GP video is commonly available at two resolutions:

176×144, suitable for virtually all mobile phones on the market that support the 3GP format.
320×240, which is clearer and suited to high-end mobile phones, MP4 players, the PSP and Apple's iPod.
ANIM

ANIM is the standard multimedia file format for digital animation on the classic Commodore Amiga. It follows the IFF/ILBM master specification and was the first animation format officially adopted by an operating system.

ASF

Standard container for Microsoft WMA and WMV.

WMV (Windows Media Video) is the general name for a set of digital video codecs developed by Microsoft, and ASF (Advanced Systems Format) is its encapsulation format. WMV files wrapped in ASF can carry digital rights management ("digital copyright protection"). Extensions: wmv/asf, wmvhd.

ASF (Advanced Systems Format, formerly Advanced Streaming Format) is a file format developed by Microsoft to compete with RealPlayer; it allows video programs to be watched directly over the Internet. ASF uses MPEG-4 compression, and both its compression rate and image quality are very good. Because ASF is a video "stream" format that can be watched instantly over the Internet, it is not surprising that its image quality is slightly worse than VCD, but it is better than the RAM format, which is also a streaming video format.

File extensions: .asf, .wma, .wmv
Internet media types: video/x-ms-asf, application/vnd.ms-asf
Type code: 'ASF_'
Uniform Type Identifier: com.microsoft.advanced-systems-format
Magic number: 30 26 b2 75
Developer: Microsoft
Format type: Container format
Container for: WMA, WMV, MPEG-4, etc.
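
To illustrate the magic number above (my own sketch, not from the article): every ASF/WMA/WMV file starts with the 16-byte ASF Header Object GUID, whose first four bytes are exactly 30 26 B2 75. The file name below is a placeholder.

```python
# Minimal sketch: detect an ASF container by its Header Object GUID.
ASF_HEADER_GUID = bytes.fromhex("3026b2758e66cf11a6d900aa0062ce6c")

def is_asf(path: str) -> bool:
    """Return True if the file begins with the ASF Header Object GUID."""
    with open(path, "rb") as f:
        return f.read(16) == ASF_HEADER_GUID

if __name__ == "__main__":
    print(is_asf("sample.wmv"))  # placeholder file name
```
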
AVI

AVI is the standard Microsoft Windows container, also based on RIFF. AVI stands for Audio Video Interleave ("audio-video interleaving"), a multimedia file format launched by Microsoft in November 1992 to counter Apple's QuickTime technology. When people say "AVI" today they usually mean the container (packaging) format.

AVI was developed by Microsoft early on; the name reflects the fact that audio and video data are stored interleaved in the same file. AVI is also the longest-lived of these formats: it has existed for more than 10 years and, although a revision was released (V2.0 in 1996), it is showing its age. The AVI format has many restrictions: there can be only one video track and one audio track (non-standard plug-ins can now add up to two audio tracks), plus a few additional tracks such as text. The AVI format does not provide any control functions. Extension: avi.

Codecs that AVI can use:

Video (the FourCC for each codec is shown in parentheses)
o MPEG-1/-2 (MPEG/MPG1/MPG2)
o MPEG-4 (MP4V/XVID/DX50/DIVX/DIV5/3IVX/3IV2/RMP4)
o MS-MPEG4 (MPG4/MP42/MP43)
o WMV7/WMV8/WMV9 (WMV1/WMV2/WMV3)
o DV(DVSD/DVIS)
o Flash Video (FLV1/FLV4)
o Motion JPEG (MJPG)
o LossLess JPEG (LJPG)
o H.264 (AVC1/DAVC/H264/X264)
o H.263 (H263/S263)
o H.261 (H261)
o Huffyuv (HFYU)
o AVIzlib (ZLIB)
o AVImszh (MSZH)
o Theora (THEO)
o Indeo Video (IV31/IV32)
o Cinepak (cvid)
o Microsoft Video 1 (CRAM)
o On2VP3 (VP30/VP31)
o On2VP4 (VP40)
o On2 VP6 (VP60/VP61/VP62)
o VC-1 (WVC1)
Audio
o PCM
o MP3 (0x0055)
o AC-3 (0x0092)
o AAC
  - HE-AAC
  - LC-AAC
o FLAC
o Indeo Audio
o TrueSpeech
o WMA
o Vorbis
  The codec combination can be freely chosen, for example:

(DivX或XviD+MP3).avi,
(H.264+MP3).avi
(WMV9+MP3).avi
  AVI files made up of XviD + MP3 are the most common.

DVR-MS

DVR-MS (Microsoft Digital Video Recording) is a proprietary video and audio file container format developed by Microsoft for storing television content recorded by Windows XP Media Center Edition, Windows Vista and Windows 7. Multiple data streams (video and audio) are wrapped in an ASF container with the DVR-MS extension. Video is encoded with the MPEG-2 standard, and audio with MPEG-1 Layer II or Dolby Digital AC-3 (ATSC A/52). The format also carries metadata for the content and for digital rights management. Files in this format are generated by the Stream Buffer Engine (SBE.dll), a DirectShow component introduced in Windows XP Service Pack 1.

MPEG/MPG/DAT

MPEG format: MPEG (Moving Picture Experts Group) is a media packaging format recognized by the International Organization for Standardization (ISO) and supported by most players. It can be stored in several different ways to suit different application environments. The container formats for MPEG-4 files are specified in Part 1 (Systems), Part 14 (MP4 file format), Part 15 (AVC file format) and so on. MPEG is rich in control functions and can carry multiple video streams (i.e. angles), audio tracks, subtitles (bitmap subtitles), and more. 3GP, a simplified variant of MPEG-4, is also widely used on 3G mobile phones. Extensions: dat (for VCD), vob, mpg/mpeg, 3gp/3g2 (for mobile phones), etc.

MPEG is also the abbreviation of Motion Picture Experts Group. The family includes several video formats, notably MPEG-1, MPEG-2 and MPEG-4. MPEG-1 is probably the one most people have encountered, since it is widely used for VCDs and for downloadable video clips on the network. Most VCDs are compressed in the MPEG-1 format (burning software automatically converts MPEG-1 into the .DAT format); with the MPEG-1 compression algorithm, a 120-minute movie can be compressed to roughly 1.2 GB. MPEG-2 is used for DVD production and also sees considerable use in HDTV (high-definition television broadcasting) and in demanding video editing and processing. With MPEG-2 compression, a 120-minute movie ends up at around 5-8 GB (and the image quality of MPEG-2 is far beyond what MPEG-1 can achieve).

MPEG-PS: the MPEG program stream, the standard container for MPEG-1 and MPEG-2 elementary streams, used on reliable media such as disks; it is also used on DVD-Video discs.

MPEG-TS: the MPEG transport stream, the standard container for digital broadcasting and for transmission over unreliable media; it is also used on Blu-ray Discs and usually carries several video and audio streams plus an electronic program guide.
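
As an illustration (my own sketch, not from the article), a transport stream can be recognized and inspected very simply: TS packets are 188 bytes long, begin with the sync byte 0x47, and carry a 13-bit PID in the two bytes that follow. The file name is a placeholder.

```python
# Minimal sketch: count how many packets each PID contributes in an MPEG-TS file.
from collections import Counter

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def count_pids(path: str, max_packets: int = 10000) -> Counter:
    pids: Counter = Counter()
    with open(path, "rb") as f:
        for _ in range(max_packets):
            packet = f.read(TS_PACKET_SIZE)
            if len(packet) < TS_PACKET_SIZE:
                break
            if packet[0] != SYNC_BYTE:
                raise ValueError("lost TS sync (0x47) -- not a clean transport stream")
            # PID: low 5 bits of byte 1 plus all 8 bits of byte 2
            pid = ((packet[1] & 0x1F) << 8) | packet[2]
            pids[pid] += 1
    return pids

if __name__ == "__main__":
    for pid, n in count_pids("broadcast.ts").most_common(5):  # placeholder file
        print(f"PID 0x{pid:04X}: {n} packets")
```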

nAVI

If your usual playback software suddenly cannot open an AVI file, consider whether you have run into nAVI. nAVI is short for New AVI, a new video format developed by an underground group called Shadow Realm. It is derived from a modification of Microsoft's ASF compression algorithm (not, as one might imagine, from AVI). What a video format chases is compression rate and image quality, and in pursuit of those goals nAVI improves on some of the shortcomings of the original ASF format and can achieve higher frame rates. nAVI can be thought of as an improved ASF format with the streaming characteristics removed.

Note: The above materials come from the collation of the wiki.

Codec Study Notes (14): Containers (Part 2)
Matroska (MKV)

MKV is not a codec or a system standard; it is a container that can in practice encapsulate almost anything. It is an open, open-source container format.

Extensions: .mkv, .mka, .mks
Internet media types: video/x-matroska, audio/x-matroska
Developer: Matroska.org
Format type: Video file format
Container for: Multimedia
Free file format? Yes: GNU LGPL

Matroska is often equated with MKV, but MKV is in fact only one of the file types in the Matroska media family. Matroska is a new multimedia packaging format that can wrap video in a variety of different encodings, 16 or more audio tracks in different formats, and subtitles in different languages into a single Matroska Media file. It is also one of the open-source multimedia packaging formats.

Matroska grew out of the Multimedia Container Format (MCF) and, like it, is an open (unrestricted, free) format for storing data. The developers promise that everyone can freely use the format and the software developed for it, and that it will not become a commercial project once the format is in common use.

Matroska Media defines three types of files:

MKV (Matroska Video File): a video file, which can also contain audio and subtitles;
MKA (Matroska Audio File): an audio-only file, which can contain multiple audio tracks of multiple types;
MKS (Matroska Subtitles): a subtitle file.

Of these three, MKV is by far the most common.

The biggest feature of Matroska is that it can hold many different types of video, audio and subtitle streams; it can even encapsulate RealMedia and QuickTime files and remultiplex their audio and video, so as to achieve a better and cleaner result.

The development of Matroska is a big challenge to many traditional media formats, but Matroska has also been developed as a multi-functional multimedia container.
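
One practical aside (my own, not from the article): Matroska and WebM files are built on EBML, so a file can be recognized as Matroska-family by the 4-byte EBML magic number 0x1A45DFA3 at its start. The file name below is a placeholder.

```python
# Minimal sketch: check whether a file starts with the EBML magic number
# used by Matroska (.mkv/.mka/.mks) and WebM containers.
EBML_MAGIC = b"\x1a\x45\xdf\xa3"

def looks_like_matroska(path: str) -> bool:
    with open(path, "rb") as f:
        return f.read(4) == EBML_MAGIC

if __name__ == "__main__":
    print(looks_like_matroska("movie.mkv"))  # placeholder file name
```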

MP4

MP4 is a standard audio and video container defined by MPEG-4, based on the ISO basic media file format (defined in MPEG-4 Part 12 and JPEG 2000 Part 12), and described in MPEG-4 Part 14. It is a multimedia computer file format using MPEG-4, with the extension .mp4, mainly used to store digital audio and digital video.

Extension: .mp4
Internet media types: video/mp4, audio/mp4, application/mp4
Type code: mpg4
Developer: ISO
Format type: Video file format
Container for: Audio, video, text
Extended from: QuickTime .mov and MPEG-4 Part 12
Standard: ISO/IEC 14496-14
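
As an illustration of the ISO base media file format structure (my own sketch, not from the article), an MP4 file is a sequence of "boxes", each starting with a 4-byte big-endian size and a 4-byte type; a size of 1 means a 64-bit size follows. The file name is a placeholder.

```python
# Minimal sketch: list the top-level boxes ("atoms") of an MP4 / ISO BMFF file.
import struct

def list_top_level_boxes(path: str) -> list[tuple[str, int]]:
    boxes = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, box_type = struct.unpack(">I4s", header)
            if size == 1:
                # 64-bit "largesize" stored right after the box type
                size = struct.unpack(">Q", f.read(8))[0]
                header_len = 16
            else:
                header_len = 8
            boxes.append((box_type.decode("ascii", errors="replace"), size))
            if size == 0:                     # box extends to the end of the file
                break
            f.seek(size - header_len, 1)      # skip the rest of this box
    return boxes

if __name__ == "__main__":
    # Typical output: [('ftyp', 32), ('moov', 123456), ('mdat', 98765432)]
    print(list_top_level_boxes("movie.mp4"))  # placeholder file name
```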

MOD

The MOD format is the name of the storage format adopted by the hard disk camcorders produced by JVC.

MOV

MOV is Apple's standard QuickTime video container. QuickTime Movie is a container developed by Apple Inc. Owing to Apple's dominance in professional graphics, the QuickTime format has essentially become a common format in the film production industry. On February 11, 1998, the International Organization for Standardization (ISO) adopted the QuickTime file format as the basis of the MPEG-4 standard. QuickTime can store a great deal of content: in addition to video and audio, it also supports images, text (text subtitles) and so on. Extension: mov.

Anyone who has used a Mac will have had some exposure to QuickTime. QuickTime was originally image- and video-processing software that Apple shipped on Mac computers. QuickTime provides two standard image and digital video formats: it supports the static PIC and JPG image formats, and dynamic video in the Indeo-based MOV format and the MPEG-based MPG format.

Ogg

Ogg is the standard container for the Xiph.org audio codec Vorbis and the video codec Theora. Ogg Media is a completely open multimedia system project, and OGM (Ogg Media File) is its container format. OGM can carry multiple tracks such as video, audio and subtitles (text subtitles). Extension: ogg.

OGM

OGM (Ogg Media), the video container associated with Xiph.org's Ogg, is no longer supported and its use is discouraged.

RealMedia

RealMedia is the standard container for RealVideo and RealAudio. A RealVideo or RealMedia (RM) file is a file container developed by RealNetworks; it can normally hold only RealVideo- and RealAudio-encoded media. The format has some interactive features and allows scripting to control playback. RM files, especially in the variable-bit-rate RMVB flavor, are very small and therefore very popular with Internet downloaders. Extensions: rm/rmvb.

RM

RM is part of the Real Media family of audio/video compression specifications defined by RealNetworks. RealPlayer can use Internet resources to stream live audio/video that conforms to the Real Media technical specifications. The Real Media family mainly comprises three kinds of files: RealAudio, RealVideo and RealFlash (a highly compressed animation format jointly launched by RealNetworks and Macromedia). The RealVideo (RA, RAM) format was positioned for streaming video from the very beginning and can be considered a pioneer of video streaming technology. It can play video without interruption even over a 56K modem dial-up connection, but its image quality is worse than VCD; anyone who has seen a DVD compressed to RM can make the comparison clearly.

RMVB

RMVB is a newer video format extended from the RM format. Its advance is that it breaks away from the constant-bit-rate sampling of the original RM format and allocates bit-rate resources sensibly while maintaining the average compression ratio: still scenes and scenes with little motion are encoded at a lower bit rate, leaving spare bandwidth that is then used when fast-moving scenes occur. In this way the quality of moving scenes is greatly improved while the quality of still scenes is preserved, striking a fine balance between picture quality and file size. RMVB also has obvious advantages over the DVDrip format: a DVD movie of about 700 MB ripped to RMVB at comparable audio-visual quality is at most about 400 MB. In addition, this format offers conveniences such as built-in subtitles and playback without extra plug-ins. To play it, use RealOne Player 2.0, or RealPlayer 8.0 together with the RealVideo 9.0 (or later) decoder.

VOB

A VOB file (Video Object) is a container format for DVD-Video media. A VOB can contain video, audio, subtitles and menus multiplexed into a single stream. VOB is based on the MPEG program stream (MPEG-PS) format, but with additional restrictions and specifications for its private streams. MPEG-PS allows non-standard data to be carried in so-called private streams. VOB files are a very strict subset of the MPEG-PS standard: every VOB file is a valid MPEG-PS stream, but not every MPEG-PS stream complies with the definition of a VOB file.

Like MPEG-PS, VOB files can contain H.262/MPEG-2 Part 2 or MPEG-1 Part 2 video and MPEG-1 Audio Layer II or MPEG-2 Audio Layer II audio, although there are additional restrictions on how these compression formats may be used in a VOB compared with plain MPEG-PS. In addition, VOBs may include linear PCM, AC-3 or DTS audio as well as subtitles. VOB files cannot contain AAC audio (MPEG-2 Part 7), MPEG-4 compression formats or other formats that the MPEG-PS standard would otherwise allow.

File extension: .VOB
Developer: DVD Forum
Type: Media container
Container for: Audio, video, subtitles
Used for: DVD-Video
Extended from: MPEG program stream (ISO/IEC 13818-1)
Standard: DVD-Video Book
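
As an illustration (my own sketch, not from the article), because a VOB is an MPEG program stream, its packs can be counted by scanning for the 4-byte pack start code 00 00 01 BA. The file name is a placeholder.

```python
# Minimal sketch: count MPEG-PS pack headers in a .vob/.mpg file.
PACK_START_CODE = b"\x00\x00\x01\xba"

def count_ps_packs(path: str, chunk_size: int = 1 << 20) -> int:
    """Approximate scan: count pack start codes, handling chunk boundaries."""
    count = 0
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            data = tail + chunk
            count += data.count(PACK_START_CODE)
            tail = data[-3:]  # keep overlap so codes split across chunks are found
    return count

if __name__ == "__main__":
    print(count_ps_packs("VTS_01_1.VOB"), "packs found")  # placeholder file name
```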

Origin blog.csdn.net/hmwz0001/article/details/130213080