Android audio and video codec (2) -- MediaCodec decoding (synchronous and asynchronous)

In the previous article, Android audio and video codec (1) -- MediaCodec Introduction, we got a first impression of MediaCodec. In this chapter, we study its decoding function.

The effect of this chapter is as follows:

(demo image)

1. Synchronous decoding

To better understand how MediaCodec works and the steps involved, we first decode a local video using the synchronous API.

1.1 Get video parameters

First of all, we need an already-encoded video, for example in MP4 format. We can obtain the video's MediaFormat through MediaExtractor. If you are not familiar with MediaExtractor, see Android audio and video development (5) -- Use MediaExtractor to separate audio and video, and use MediaMuxer to synthesize a new video (audio and video synchronization).

Define a MyExtractor class that wraps MediaExtractor to parse the video and expose its track data:

    public MyExtractor(String path) {
        try {
            mediaExtractor = new MediaExtractor();
            // Set the data source
            mediaExtractor.setDataSource(path);
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Walk all the tracks
        int count = mediaExtractor.getTrackCount();
        for (int i = 0; i < count; i++) {
            // Get the MediaFormat for this track index
            MediaFormat format = mediaExtractor.getTrackFormat(i);
            // Get the mime type
            String mime = format.getString(MediaFormat.KEY_MIME);
            if (mime.startsWith("video")) {
                // Found the video track
                videoTrackId = i;
                videoFormat = format;
            } else if (mime.startsWith("audio")) {
                // Found the audio track
                audioTrackId = i;
                audioFormat = format;
            }
        }
    }

    public void selectTrack(int trackId) {
        mediaExtractor.selectTrack(trackId);
    }

    /**
     * Read one sample (frame) of data.
     *
     * @param buffer the buffer to fill
     * @return the number of bytes read, or -1 at end of stream
     */
    public int readBuffer(ByteBuffer buffer) {
        // Clear any previous data first
        buffer.clear();
        // Read the current sample's data
        int bufferCount = mediaExtractor.readSampleData(buffer, 0);
        if (bufferCount < 0) {
            return -1;
        }
        // Record the current sample's timestamp
        curSampleTime = mediaExtractor.getSampleTime();
        // Record the current sample's flags
        curSampleFlags = mediaExtractor.getSampleFlags();
        // Advance to the next sample
        mediaExtractor.advance();
        return bufferCount;
    }

First, call selectTrack() to choose whether the video or the audio track will be parsed. Then readBuffer() uses mediaExtractor.readSampleData(buffer, 0) to fill the buffer with the current frame's data, and mediaExtractor.advance() to step to the next frame.
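To make the read loop concrete, here is a minimal standalone sketch of driving MyExtractor (the getters and Constants.VIDEO_PATH match their usage later in this article; the 1 MiB buffer size is an assumption, ideally it would come from the track's MediaFormat.KEY_MAX_INPUT_SIZE):

    // Hypothetical standalone usage of MyExtractor; not part of the sample app.
    MyExtractor extractor = new MyExtractor(Constants.VIDEO_PATH);
    extractor.selectTrack(extractor.getVideoTrackId());

    // 1 MiB is an assumed size; prefer KEY_MAX_INPUT_SIZE from the MediaFormat.
    ByteBuffer buffer = ByteBuffer.allocate(1024 * 1024);
    int size;
    while ((size = extractor.readBuffer(buffer)) >= 0) {
        Log.d(TAG, "frame: " + size + " bytes, pts=" + extractor.getSampleTime()
                + "us, flags=" + extractor.getSampleFlags());
    }
    extractor.release();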

1.2 Decoding process

The previous chapter mentioned that MediaCodec decoding revolves around the following two diagrams:

(MediaCodec working principle diagram)

(MediaCodec state diagram)

If these two diagrams are not yet familiar, please read Android audio and video codec (1) -- MediaCodec Introduction first.

To make audio decoding easier later, we define a base class for decoding both video and audio. Since synchronous decoding blocks, it should run on a worker thread, so the class implements Runnable:

    /**
     * Base decoder class, used to decode audio and video.
     */
    abstract class BaseDecode implements Runnable {
        final static int VIDEO = 1;
        final static int AUDIO = 2;
        // Timeout used when dequeueing buffers, in microseconds
        final static int TIME_US = 1000;
        MediaFormat mediaFormat;
        MediaCodec mediaCodec;
        MyExtractor extractor;

        private boolean isDone;

        public BaseDecode() {
            try {
                // Create the MediaExtractor wrapper
                extractor = new MyExtractor(Constants.VIDEO_PATH);
                // Decide whether this decoder handles audio or video
                int type = decodeType();
                // Get the audio or video MediaFormat accordingly
                mediaFormat = (type == VIDEO ? extractor.getVideoFormat() : extractor.getAudioFormat());
                String mime = mediaFormat.getString(MediaFormat.KEY_MIME);
                // Select the track to parse
                extractor.selectTrack(type == VIDEO ? extractor.getVideoTrackId() : extractor.getAudioTrackId());
                // Create the MediaCodec decoder
                mediaCodec = MediaCodec.createDecoderByType(mime);
                // Configuration is left to the subclass
                configure();
                // Start working: enter the Executing state
                mediaCodec.start();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

As you can see, the appropriate MediaFormat is obtained depending on whether we are decoding video or audio, and the MediaCodec is created from its mime type.

(MediaCodec state diagram)

With that, the first step is done. According to the state diagram above, the codec is now in the Uninitialized state; next, configure() must be called to move it into the Configured state. This step is completed by the subclasses. For video, for example:

 @Override
 void configure() {
     mediaCodec.configure(mediaFormat, new Surface(mTextureView.getSurfaceTexture()), null, 0);
 }

As shown, mediaCodec.configure() is given the current MediaFormat and the Surface on which the video will be rendered; a TextureView's SurfaceTexture is used here. Then calling MediaCodec's start() moves it into the Executing state and decoding begins.
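Putting the states together, a decoder's lifecycle can be sketched as follows (a simplified outline of the calls, following the state diagram; error handling omitted):

    MediaCodec codec = MediaCodec.createDecoderByType(mime); // Uninitialized
    codec.configure(format, surface, null, 0);               // Configured
    codec.start();                                           // Executing
    // ... queue input buffers / release output buffers in a loop ...
    codec.stop();                                            // back to Uninitialized
    codec.release();                                         // Released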

1.3 Video decoding

The decoding process follows the MediaCodec working principle diagram shown above.

1.3.1 Input

As mentioned above, BaseDecode implements Runnable, so the decoding loop lives in the run() method.

    @Override
    public void run() {
        try {
            MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
            // Decode until end of stream
            while (!isDone) {
                /**
                 * Wait up to TIME_US (microseconds) for an empty input buffer index.
                 * -1 means wait indefinitely until a buffer is available; 0 means return immediately.
                 */
                int inputBufferId = mediaCodec.dequeueInputBuffer(TIME_US);

                // Note: 0 is a valid index, so test >= 0 rather than > 0
                if (inputBufferId >= 0) {
                    // Get the available, empty input buffer
                    ByteBuffer inputBuffer = mediaCodec.getInputBuffer(inputBufferId);
                    if (inputBuffer != null) {
                        /**
                         * mediaExtractor.readSampleData(buffer, 0) fills the buffer with
                         * the current frame; mediaExtractor.advance() steps to the next frame.
                         */
                        int size = extractor.readBuffer(inputBuffer);
                        if (size >= 0) {
                            // Hand the frame to MediaCodec for decoding
                            mediaCodec.queueInputBuffer(
                                    inputBufferId,
                                    0,
                                    size,
                                    extractor.getSampleTime(),
                                    extractor.getSampleFlags()
                            );
                        } else {
                            // End of stream: pass the end-of-stream flag
                            mediaCodec.queueInputBuffer(
                                    inputBufferId,
                                    0,
                                    0,
                                    0,
                                    MediaCodec.BUFFER_FLAG_END_OF_STREAM
                            );
                            isDone = true;
                        }
                    }
                }
                // Output handling is delegated to the subclass
                boolean isFinish = handleOutputData(info);
                if (isFinish) {
                    break;
                }
            }

            done();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    protected void done() {
        try {
            isDone = true;
            // Stop and release the MediaCodec
            mediaCodec.stop();
            mediaCodec.release();

            // Release the MediaExtractor
            extractor.release();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    abstract boolean handleOutputData(MediaCodec.BufferInfo info);

Decoding runs continuously inside the while loop. The code above performs the following steps:

1. Dequeue an idle input buffer from MediaCodec.
2. Read the current frame's data from the video and fill it into that buffer.
3. Hand the buffer back to MediaCodec for decoding with mediaCodec.queueInputBuffer().

1.3.2 Output

The output step above is implemented by handleOutputData(), which is overridden in the VideoDecodeSync subclass:

@Override
boolean handleOutputData(MediaCodec.BufferInfo info) {
    // Wait for an output buffer index
    int outputId = mediaCodec.dequeueOutputBuffer(info, TIME_US);

    if (outputId >= 0) {
        // Release the buffer and render it to the Surface
        mediaCodec.releaseOutputBuffer(outputId, true);
    }

    // Once all decoded frames have been rendered, playback can stop
    if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
        Log.e(TAG, "zsr OutputBuffer BUFFER_FLAG_END_OF_STREAM");
        return true;
    }

    return false;
}

The above code also does two things:

  1. Get the output buffer

  2. Release the buffer and render the frame to the Surface; whether it is rendered is controlled by the second parameter of releaseOutputBuffer().

With that, the video decoding part is complete. The effect is as follows:

(demo image)

However, you will notice that the video seems to play at double speed.

1.3.3 Correcting the display timestamp

Why does this happen? A typical video plays at about 30 fps, that is, one frame every 33.33 ms; but decoding a frame takes only a few milliseconds, so if each frame is shown on the Surface as soon as it is decoded, the video looks as if it is playing at double speed. You might say: then just delay each frame by about 33 ms, isn't that the standard? 30 fps is indeed common, but not every video uses it. To handle this properly, you need two audio/video fundamentals: DTS and PTS.


DTS (Decoding Time Stamp) tells the player when to decode a frame's data; PTS (Presentation Time Stamp) tells the player when to display it. Note that although DTS and PTS guide the player's behavior, they are generated by the encoder at encoding time. Without B-frames, the DTS and PTS orders are the same; once B-frames appear, the orders differ. For example, frames in display order I B B P are decoded in order I P B B, so decode order and display order no longer match.

Here we only need to care about PTS, the display timestamp. The current pts can be read from presentationTimeUs of MediaCodec.BufferInfo; its unit is microseconds, and it is the playback time relative to 0. We can therefore use the elapsed system time to imitate the interval between two frames: if a frame's decoded pts is ahead of the elapsed time, delay before sending it to the Surface; otherwise, render it immediately.

Since this runs on a worker thread, we can simply call Thread.sleep() before rendering to the Surface:

 // Used to align the video timestamps
 private long startMs = -1;

if (outputId >= 0) {
    if (startMs == -1) {
        startMs = System.currentTimeMillis();
    }
    // Correct the pts
    sleepRender(info, startMs);
    // Release the buffer and render it to the Surface
    mediaCodec.releaseOutputBuffer(outputId, true);
}
#sleepRender
    /**
     * Align the frame's timestamp with the wall clock.
     **/
    private void sleepRender(MediaCodec.BufferInfo info, long startMs) {
        /**
         * Note that pts is relative to 0; info.presentationTimeUs is in microseconds.
         * The elapsed system time is used to imitate the interval between two frames.
         */
        long ptsTimes = info.presentationTimeUs / 1000;
        long systemTimes = System.currentTimeMillis() - startMs;
        long timeDifference = ptsTimes - systemTimes;
        // If the current frame is ahead of the elapsed system time, sleep for the difference
        if (timeDifference > 0) {
            try {
                Thread.sleep(timeDifference);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

Playback now runs at normal speed.

(demo image)

1.4 Decoding audio

With video decoding covered, audio decoding is comparatively simple. Create a new AudioDecodeSync class that extends BaseDecode and configures MediaCodec in its configure() method. Since no Surface is needed, simply pass null:

 @Override
 void configure() {
     mediaCodec.configure(mediaFormat, null, null, 0);
 }

Although we do not need a Surface, we do need to play the audio, and for that we use AudioTrack. If you are not familiar with it, refer to Android audio and video development (1) -- use AudioRecord to record PCM, and AudioTrack to play audio.

Therefore, in the constructor of AudioDecodeSync, the AudioTrack is configured as follows:

    class AudioDecodeSync extends BaseDecode {

        private int mPcmEncode;
        // Minimum buffer size for one write
        private final int minBufferSize;
        private AudioTrack audioTrack;

        public AudioDecodeSync() {
            // Get the PCM encoding (sample size)
            if (mediaFormat.containsKey(MediaFormat.KEY_PCM_ENCODING)) {
                mPcmEncode = mediaFormat.getInteger(MediaFormat.KEY_PCM_ENCODING);
            } else {
                // Default to 16-bit PCM
                mPcmEncode = AudioFormat.ENCODING_PCM_16BIT;
            }

            // Audio sample rate
            int sampleRate = mediaFormat.getInteger(MediaFormat.KEY_SAMPLE_RATE);
            // Number of audio channels
            int channelCount = mediaFormat.getInteger(MediaFormat.KEY_CHANNEL_COUNT);

            // Channel configuration; playback requires the CHANNEL_OUT_* constants
            int channelConfig = channelCount == 1 ? AudioFormat.CHANNEL_OUT_MONO : AudioFormat.CHANNEL_OUT_STEREO;
            minBufferSize = AudioTrack.getMinBufferSize(sampleRate, channelConfig, mPcmEncode);

            /**
             * Set the audio attributes:
             * 1. the usage, e.g. media
             * 2. the content type, e.g. music
             */
            AudioAttributes attributes = new AudioAttributes.Builder()
                    .setUsage(AudioAttributes.USAGE_MEDIA)
                    .setContentType(AudioAttributes.CONTENT_TYPE_MUSIC)
                    .build();
            /**
             * Describe the audio data:
             * 1. sample rate
             * 2. PCM encoding (sample size)
             * 3. channel mask
             */
            AudioFormat format = new AudioFormat.Builder()
                    .setSampleRate(sampleRate)
                    .setEncoding(mPcmEncode)
                    .setChannelMask(channelConfig)
                    .build();

            // Configure the AudioTrack
            audioTrack = new AudioTrack(
                    attributes,
                    format,
                    minBufferSize,
                    AudioTrack.MODE_STREAM, // streaming mode
                    AudioManager.AUDIO_SESSION_ID_GENERATE
            );
            // Start playback; sound is produced as data is written
            audioTrack.play();
        }
    }

Once the AudioTrack is obtained, calling play() puts it into the playing state; it starts producing sound as soon as data is written to it.

In handleOutputData:

    @Override
    boolean handleOutputData(MediaCodec.BufferInfo info) {
        // Get an output buffer index
        int outputIndex = mediaCodec.dequeueOutputBuffer(info, TIME_US);
        ByteBuffer outputBuffer;
        if (outputIndex >= 0) {
            outputBuffer = mediaCodec.getOutputBuffer(outputIndex);
            // Write the data to the AudioTrack to play the audio
            audioTrack.write(outputBuffer, info.size, AudioTrack.WRITE_BLOCKING);
            mediaCodec.releaseOutputBuffer(outputIndex, false);
        }
        // Once all decoded frames have been rendered, playback can stop
        if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
            Log.e(TAG, "zsr OutputBuffer BUFFER_FLAG_END_OF_STREAM");
            return true;
        }
        return false;
    }

You will find that the audio plays at normal speed, with no fast-forward effect: the audio timestamps are closely and continuously spaced, and audioTrack.write() with WRITE_BLOCKING does not return until the data has been queued, which naturally paces the loop, so no correction is needed.

1.5 Audio and video synchronization

So how do we synchronize audio and video? It is actually not difficult: start two threads and let both decoders play at the same time:

if (mExecutorService.isShutdown()) {
    mExecutorService = Executors.newFixedThreadPool(2);
}
mVideoSync = new VideoDecodeSync();
mAudioDecodeSync = new AudioDecodeSync();
mExecutorService.execute(mVideoSync);
mExecutorService.execute(mAudioDecodeSync);
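When playback should end early, for example when the page is destroyed, the two decoders and the thread pool need tearing down too. A possible sketch (the stop() method on the decoders is hypothetical, not part of the sample's BaseDecode; it assumes isDone is made settable so the run() loop exits and done() releases the codec):

    // Hypothetical teardown; stop() is assumed to set BaseDecode.isDone = true.
    @Override
    protected void onDestroy() {
        super.onDestroy();
        if (mVideoSync != null) mVideoSync.stop();
        if (mAudioDecodeSync != null) mAudioDecodeSync.stop();
        // Let the running run() methods finish, then release the threads
        mExecutorService.shutdown();
    }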

2. Asynchronous decoding

Since Android 5.0, Google has recommended driving MediaCodec in asynchronous mode. It is also very simple to use: just call the setCallback() method. To decode the same video as above, the steps are as follows:

1. Use MediaExtractor to parse the video and obtain its MediaFormat.
2. Register a callback with MediaCodec.setCallback().
3. Call mediaCodec.configure() and mediaCodec.start() to begin decoding.

The code is as follows:

    class AsyncDecode {
        MediaFormat mediaFormat;
        MediaCodec mediaCodec;
        MyExtractor extractor;

        public AsyncDecode() {
            try {
                // Parse the video and get its MediaFormat
                extractor = new MyExtractor(Constants.VIDEO_PATH);
                mediaFormat = extractor.getVideoFormat();
                String mime = mediaFormat.getString(MediaFormat.KEY_MIME);
                extractor.selectTrack(extractor.getVideoTrackId());
                mediaCodec = MediaCodec.createDecoderByType(mime);

            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        private void start() {
            // Asynchronous decoding
            mediaCodec.setCallback(new MediaCodec.Callback() {
                @Override
                public void onInputBufferAvailable(@NonNull MediaCodec codec, int index) {
                    ByteBuffer inputBuffer = codec.getInputBuffer(index);
                    int size = extractor.readBuffer(inputBuffer);
                    if (size >= 0) {
                        codec.queueInputBuffer(
                                index,
                                0,
                                size,
                                extractor.getSampleTime(),
                                extractor.getSampleFlags()
                        );
                        // Notify the UI through a Handler defined elsewhere in the sample
                        handler.sendEmptyMessage(1);
                    } else {
                        // End of stream
                        codec.queueInputBuffer(
                                index,
                                0,
                                0,
                                0,
                                MediaCodec.BUFFER_FLAG_END_OF_STREAM
                        );
                    }
                }

                @Override
                public void onOutputBufferAvailable(@NonNull MediaCodec codec, int index, @NonNull MediaCodec.BufferInfo info) {
                    mediaCodec.releaseOutputBuffer(index, true);
                }

                @Override
                public void onError(@NonNull MediaCodec codec, @NonNull MediaCodec.CodecException e) {
                    codec.reset();
                }

                @Override
                public void onOutputFormatChanged(@NonNull MediaCodec codec, @NonNull MediaFormat format) {

                }
            });
            // configure() must be called after setCallback()
            mediaCodec.configure(mediaFormat, new Surface(mTextureView.getSurfaceTexture()), null, 0);
            // Start decoding
            mediaCodec.start();
        }

    }

Asynchronous decoding follows essentially the same flow as synchronous decoding. The difference is that in the synchronous code we wait for a free input buffer index with

int inputBufferId = mediaCodec.dequeueInputBuffer(TIME_US);

whereas in asynchronous mode the input buffer index is delivered through the callback

void onInputBufferAvailable(@NonNull MediaCodec codec, int index)

If nothing went wrong, your video is already playing, but it hits the same problem as before: it plays at double speed, because the PTS has not yet been corrected.

Since this code runs on the main thread, calling Thread.sleep() for the correction would freeze the UI. Instead, the work can be moved to a HandlerThread or another worker thread; that code is not repeated here, see the sample source for details.
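For reference, since API 23 setCallback() has an overload that takes a Handler, so all the codec callbacks can be delivered on a dedicated thread, where sleeping is safe. A minimal sketch, assuming the callback object from the code above:

    // Deliver codec callbacks on a HandlerThread (API 23+), so that a
    // sleepRender()-style correction inside the callbacks won't block the UI.
    HandlerThread codecThread = new HandlerThread("codec-callback");
    codecThread.start();
    Handler codecHandler = new Handler(codecThread.getLooper());

    mediaCodec.setCallback(callback, codecHandler); // 'callback' is the MediaCodec.Callback above
    mediaCodec.configure(mediaFormat, new Surface(mTextureView.getSurfaceTexture()), null, 0);
    mediaCodec.start();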

3. References

[Android audio and video development monster upgrade: audio and video hard decoding articles] 3. Audio and video playback: audio and video synchronization - Jianshu
Android video processing: MediaCodec (3) - play video
MediaCodec | Android Developers

Original link: Android audio and video codec (2) -- MediaCodec decoding (synchronous and asynchronous), Summer Solstice's Rice Ear's blog, CSDN (blog.csdn.net/irainsa/article/details/130020115)
