Audio and Video Learning: AudioTrack and OpenSL ES Audio Rendering

Preface

Before diving into audio rendering, you need to understand the basics of audio. This article is therefore divided into two parts: basic audio concepts, followed by AudioTrack and OpenSL ES demo examples, which together make audio rendering on Android easier to understand.

The basic concepts of audio cover many topics. The first half of this article introduces them in detail; later articles will almost all involve audio development, and with this foundation the following content will be easier to pick up.

Basic knowledge of audio


The physical properties of sound

Sound is a wave

Anyone with normal hearing has heard sound, but how is sound produced? Recall the description from junior high school physics: sound is produced by the vibration of an object. In fact, sound is a kind of pressure wave. When you strike an object or play an instrument, its vibration causes the surrounding air to vibrate rhythmically, compressing and rarefying it and forming a longitudinal wave of alternating dense and sparse regions. This is a sound wave, and it continues until the vibration dies out.

The three elements of sound waves

The three elements of a sound wave are frequency, amplitude, and waveform. Frequency determines pitch, amplitude determines loudness, and waveform determines timbre.

Sound transmission medium

Sound can travel through a wide range of media: air, liquids, and solids. The speed of transmission depends on the medium; for example, sound travels at about 340 m/s in air, about 1497 m/s in distilled water, and up to about 5200 m/s in an iron rod. Sound cannot propagate in a vacuum.

Echo

When we shout in the mountains or in an open area, we often hear an echo. An echo occurs because sound bounces back when it meets an obstacle during propagation, so we hear it a second time.

However, if the time difference between the two sounds reaching our ears is less than 80 milliseconds, we cannot distinguish them as separate sounds. In daily life our ears are in fact constantly picking up echoes, but because the environment is noisy and the echoes are relatively quiet, our ears cannot separate them out; more precisely, the brain receives them but cannot distinguish them.

Resonance

In nature and daily life there is light energy, water energy, mechanical energy, electrical energy, and so on. Sound also carries energy. For example, if two objects have the same natural frequency, striking one can set the other vibrating. This phenomenon is called resonance, and it shows that the transmission of sound can drive another object to vibrate; in other words, the propagation of sound is also a process of energy transfer.

Digital audio

The previous section introduced the physical phenomena of sound and some common concepts, and the same terminology will be used consistently in later explanations. This section introduces the concept of digital audio.

To digitize an analog signal, three concepts are involved: sampling, quantization, and encoding. First, the analog signal must be sampled. Sampling means digitizing the signal along the time axis. According to the Nyquist theorem (also called the sampling theorem), the signal must be sampled at a rate more than twice its highest frequency. High-quality audio covers the range 20 Hz to 20 kHz, so a sampling rate of 44.1 kHz is commonly used; this guarantees that frequencies up to 20 kHz can still be represented after digitization, so the perceived quality of the sound is not degraded by the digitization process. A sampling rate of 44.1 kHz means that 44100 samples are taken every second.

So how is each individual sample represented? This involves the second concept: quantization. Quantization digitizes the signal along the amplitude axis. For example, if a 16-bit integer is used to represent one sample, the representable range is [-32768, 32767], i.e., 65536 possible values, so the amplitude of the analog signal is ultimately divided into 65536 levels.
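To make quantization concrete, here is a minimal sketch in Java (the same language used for the rendering examples later) that maps a normalized analog sample in the range [-1.0, 1.0] to a 16-bit PCM value. The class and method names are purely illustrative and not part of any particular API.

public class QuantizeDemo {
    // Quantize a normalized sample (-1.0 .. 1.0) to a signed 16-bit PCM value.
    static short quantize16(double sample) {
        // Clamp to the valid analog range first.
        if (sample > 1.0) sample = 1.0;
        if (sample < -1.0) sample = -1.0;
        // Scale to the 16-bit range and round to the nearest quantization level.
        return (short) Math.round(sample * 32767.0);
    }

    public static void main(String[] args) {
        System.out.println(quantize16(0.5));   // 16384: half of full scale
        System.out.println(quantize16(-1.0));  // -32767: near the negative limit
    }
}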

Now that each sample has a value, how are all these samples stored? This involves the third concept: encoding. Encoding means recording the sampled and quantized digital data in a certain format, for example storing it sequentially or storing it in compressed form.

Many formats are involved here. Raw audio data is usually PCM (Pulse Code Modulation) data. Describing a piece of PCM data generally requires three attributes: the quantization format (sampleFormat), the sampling rate (sampleRate), and the number of channels (channel). Take CD quality as an example: the quantization format is 16 bit (2 bytes), the sampling rate is 44100 Hz, and there are 2 channels; this information describes CD sound quality. Besides these attributes, another concept is used to describe the size of an audio format: the data bit rate, i.e., the number of bits per second, which measures the volume of audio data per unit time. For CD-quality data, the bit rate is calculated as follows:

44100 * 16 * 2 = 1,411,200 bit/s = 1378.125 Kbit/s

How much storage space does one minute of such CD-quality data occupy? The calculation is as follows:

1378.125 Kbit/s * 60 s / 8 / 1024 ≈ 10.09 MB

Of course, if the sampleFormat is more precise (for example, 4 bytes per sample) or the sampleRate is higher (for example, 48 kHz), the data takes up more storage space and can describe the sound in finer detail. The stored binary data is the analog signal converted into a digital signal, and from then on it can be stored, played back, copied, or processed in any other way.
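As a quick sanity check of the arithmetic above, here is a small sketch that computes the bit rate and per-minute storage of CD-quality PCM. The constants are the CD parameters from the text; everything else is illustrative only.

public class PcmSizeDemo {
    public static void main(String[] args) {
        int sampleRate = 44100;      // samples per second
        int bitsPerSample = 16;      // quantization format
        int channels = 2;            // stereo

        long bitRate = (long) sampleRate * bitsPerSample * channels;  // bits per second
        double kbitRate = bitRate / 1024.0;                           // Kbit/s
        double mbPerMinute = kbitRate * 60 / 8 / 1024;                // MB per minute

        System.out.printf("bit rate: %d bit/s (%.3f Kbit/s)%n", bitRate, kbitRate);
        System.out.printf("one minute: %.2f MB%n", mbPerMinute);
    }
}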

Audio encoding

The CD-quality sampling format above requires about 10.1 MB of storage per minute. If the data is only stored on a CD or a hard disk, that may be acceptable, but for real-time transmission over a network the amount of data is too large, so it must be compressed and encoded. A basic metric of compression coding is the compression ratio, which is usually less than 1. Compression algorithms fall into lossy compression and lossless compression. Lossy compression means the decompressed data cannot fully restore the original; some information is lost, and the smaller the compressed result, the more information is lost and the greater the distortion after the signal is restored. Depending on the application scenario (storage device, transmission network, playback device, and so on), different compression encodings can be chosen, such as PCM, WAV, AAC, MP3, and Ogg.

WAV encoding

WAV encoding adds a 44-byte header in front of the PCM data; the header stores the sampling rate, number of channels, data format, and other information about the PCM data. A small sketch of writing such a header is shown at the end of this subsection.

Features: Good sound quality; supported by a great deal of software.

Scenario: Intermediate files in multimedia development; storing music and sound-effect material.
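To illustrate what those 44 bytes contain, here is a minimal sketch that writes a canonical 44-byte WAV header followed by the PCM bytes. The class and method names are made up for this example, and a real project should handle errors and other formats more carefully.

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class WavWriter {
    // Write a 44-byte WAV header followed by the raw PCM bytes.
    static void writeWav(String path, byte[] pcm, int sampleRate,
                         int channels, int bitsPerSample) throws IOException {
        int byteRate = sampleRate * channels * bitsPerSample / 8;
        ByteBuffer h = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        h.put("RIFF".getBytes(StandardCharsets.US_ASCII));   // chunk id
        h.putInt(36 + pcm.length);                            // chunk size
        h.put("WAVE".getBytes(StandardCharsets.US_ASCII));   // format
        h.put("fmt ".getBytes(StandardCharsets.US_ASCII));   // subchunk1 id
        h.putInt(16);                                         // subchunk1 size (PCM)
        h.putShort((short) 1);                                // audio format: 1 = PCM
        h.putShort((short) channels);                         // number of channels
        h.putInt(sampleRate);                                 // sample rate
        h.putInt(byteRate);                                   // byte rate
        h.putShort((short) (channels * bitsPerSample / 8));   // block align
        h.putShort((short) bitsPerSample);                    // bits per sample
        h.put("data".getBytes(StandardCharsets.US_ASCII));   // subchunk2 id
        h.putInt(pcm.length);                                 // subchunk2 size
        try (FileOutputStream out = new FileOutputStream(path)) {
            out.write(h.array());
            out.write(pcm);
        }
    }
}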

MP3 encoding

MP3 has a good compression ratio. MP3 files encoded with LAME (a popular implementation of the MP3 encoder) at medium and high bit rates sound very close to the source WAV file. Of course, the parameters should be tuned appropriately for different application scenarios to achieve the best results.

Features: Good sound quality at 128 Kbit/s and above, a relatively high compression ratio, broad software and hardware support, and good compatibility.

Scenario: Music playback at relatively high bit rates where broad compatibility is required.

AAC encoding

AAC is a newer generation of lossy audio compression technology. Through additional coding tools (such as SBR and PS) it derives three main profiles: LC-AAC, HE-AAC, and HE-AAC v2. LC-AAC is the more traditional AAC and is mainly used at medium and high bit rates (>= 80 Kbit/s). HE-AAC, equivalent to AAC + SBR, is mainly used at medium and low bit rates (<= 80 Kbit/s). The newer HE-AAC v2, equivalent to AAC + SBR + PS, is mainly used at low bit rates (<= 48 Kbit/s). In practice, most encoders are configured so that PS is enabled automatically at <= 48 Kbit/s; above 48 Kbit/s PS is not used, which is equivalent to ordinary HE-AAC.

Features: Excellent performance at bit rates below 128 Kbit/s; mostly used for audio encoding in video.

Scenario: Audio encoding at bit rates below 128 Kbit/s, typically the audio tracks in video.

Ogg encoding

Ogg is a very promising codec with excellent performance at all bit rates, especially low to medium ones. Besides good sound quality, Ogg is completely free, which lays a solid foundation for gaining wider support. Ogg has very good algorithms and can achieve better sound quality at a smaller bit rate: 128 Kbit/s Ogg often sounds better than MP3 at 192 Kbit/s or even higher. However, because media server software support is lacking, Ogg-based digital broadcasting is not yet practical, and overall support for Ogg, in both software and hardware, still cannot compare with MP3.

Features: Achieves better sound quality than MP3 at a lower bit rate, and performs well at high, medium, and low bit rates. However, compatibility is not good enough and streaming is not well supported.

Scenario: Voice-message scenarios in chat applications.


Audio rendering under the Android platform

Use of AudioTrack

AudioTrack is the lowest-level audio playback API provided by the Android SDK layer, and it only accepts raw PCM data as input. Compared with MediaPlayer, playing a compressed audio file (MP3, AAC, etc.) with AudioTrack requires you to implement the decoding and buffer management yourself. Since this article only covers the audio rendering side, encoding and decoding will be explained later; this section only introduces how to use AudioTrack to render raw PCM audio data.

Configure AudioTrack

public AudioTrack(int streamType, int sampleRateInHz, int channelConfig, int audioFormat,
            int bufferSizeInBytes, int mode)

streamType: Android provides multiple audio stream management strategies; when several processes in the system need to play audio at the same time, the management strategy determines the final result. The possible values of this parameter are defined as constants in the AudioManager class, mainly the following:

    /** In-call voice */
    public static final int STREAM_VOICE_CALL = AudioSystem.STREAM_VOICE_CALL;
    /** System sounds */
    public static final int STREAM_SYSTEM = AudioSystem.STREAM_SYSTEM;
    /** Ringtone */
    public static final int STREAM_RING = AudioSystem.STREAM_RING;
    /** Music */
    public static final int STREAM_MUSIC = AudioSystem.STREAM_MUSIC;
    /** Alarm */
    public static final int STREAM_ALARM = AudioSystem.STREAM_ALARM;
    /** Notification */
    public static final int STREAM_NOTIFICATION = AudioSystem.STREAM_NOTIFICATION;

sampleRateInHz: the sampling rate, i.e., the number of samples per second of the audio to be played. Commonly available sampling rates include 8000, 16000, 22050, 24000, 32000, 44100, and 48000 Hz; choose a suitable one for your application scenario.

channelConfig: the channel configuration. The possible values are defined as constants in the AudioFormat class; for AudioTrack playback the commonly used ones are CHANNEL_OUT_MONO (mono) and CHANNEL_OUT_STEREO (stereo), and the value should match the channel count of your PCM data. (The CHANNEL_IN_* constants are used for recording with AudioRecord; since most phone microphones only capture pseudo-stereo, mono is usually recommended on the capture side for performance reasons.)

audioFormat: configures the "data bit width", i.e., the sampling format. The possible values are defined as constants in the AudioFormat class: ENCODING_PCM_16BIT (compatible with all phones) and ENCODING_PCM_8BIT.

bufferSizeInBytes: the size of the internal audio buffer. The AudioTrack class provides a helper function for determining bufferSizeInBytes, with the following prototype:

static public int getMinBufferSize(int sampleRateInHz, int channelConfig, int audioFormat)

In actual development, it is strongly recommended to let this function calculate the buffer size to pass in rather than calculating it manually.

mode: AudioTrack provides two playback modes; the possible values are defined as constants in the AudioTrack class. One is MODE_STATIC, where all the data is written into the playback buffer at once; it is simple and efficient and is usually used for ringtones and short system notification sounds. The other is MODE_STREAM, where audio data must be written continuously at certain intervals; in theory it can be used in any audio playback scenario.
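Putting the parameters above together, here is a minimal sketch of creating an AudioTrack for 44.1 kHz stereo 16-bit PCM in streaming mode. It only uses the SDK calls described above (assuming the android.media classes are imported); the variable names are illustrative.

int sampleRate = 44100;
int channelConfig = AudioFormat.CHANNEL_OUT_STEREO;
int audioFormat = AudioFormat.ENCODING_PCM_16BIT;

// Let the SDK compute the minimum safe buffer size instead of calculating it by hand.
int bufferSize = AudioTrack.getMinBufferSize(sampleRate, channelConfig, audioFormat);

AudioTrack mAudioTrack = new AudioTrack(
        AudioManager.STREAM_MUSIC,  // streamType
        sampleRate,                 // sampleRateInHz
        channelConfig,              // channelConfig
        audioFormat,                // audioFormat
        bufferSize,                 // bufferSizeInBytes
        AudioTrack.MODE_STREAM);    // mode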

Play

// If the player instance was initialized successfully and is not already playing, call play()
if (null != mAudioTrack && mAudioTrack.getState() != AudioTrack.STATE_UNINITIALIZED
        && mAudioTrack.getPlayState() != AudioTrack.PLAYSTATE_PLAYING)
    mAudioTrack.play();
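In MODE_STREAM, calling play() alone is not enough; PCM data must be written to the track continuously. Here is a minimal sketch of a write loop that reads PCM from a file and feeds it to AudioTrack, assuming the mAudioTrack and bufferSize from the configuration sketch above; the file path is only an example.

// Feed PCM data to the track; write() blocks until there is room in the buffer.
byte[] buffer = new byte[bufferSize];
try (FileInputStream in = new FileInputStream("/sdcard/test.pcm")) {  // example path
    int read;
    while ((read = in.read(buffer)) > 0) {
        mAudioTrack.write(buffer, 0, read);
    }
} catch (IOException e) {
    Log.e(TAG, "write pcm failed", e);
}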

Destroy resources

public void release() {
    Log.d(TAG, "==release===");
    mStatus = Status.STATUS_NO_READY;
    if (mAudioTrack != null) {
        // Release the native resources held by the track and drop the reference.
        mAudioTrack.release();
        mAudioTrack = null;
    }
}

For a concrete example, see the AudioTracker part of the AudioPlay project. You need to copy the pcm file from the project's raw directory to the sdcard root directory.

Use of OpenSL ES

OpenSL ES official document

The full name of OpenSL ES is Open Sound Library for Embedded Systems, an embedded audio acceleration standard. OpenSL ES is a royalty-free, cross-platform, hardware-accelerated audio API optimized for embedded systems. It gives native application developers on embedded mobile multimedia devices a standardized, high-performance, low-latency way to implement audio functionality, and it enables direct cross-platform deployment of software and hardware audio capabilities, which lowers implementation difficulty and promotes the development of the advanced audio market.

[Figure: OpenSL ES architecture]
The figure above shows the architecture of OpenSL ES. On Android, the High Level Audio Libs are the Java-layer audio APIs for input and output, which are high-level APIs; by contrast, OpenSL ES is a low-level API in C. In normal development the high-level APIs are used directly; only when performance bottlenecks appear, such as real-time voice chat, 3D audio, certain effects, and so on, do developers build the audio part of the application directly on OpenSL ES in C/C++.

Before using the OpenSL ES API you need to include the OpenSL ES headers (and link against the NDK's OpenSLES library), as follows:

// The standard OpenSL ES library
#include <SLES/OpenSLES.h>
// Android-specific extensions; be careful with these if you need cross-platform code
#include <SLES/OpenSLES_Android.h>

Create engine and get engine interface

void createEngine() {
    // Audio playback with OpenSL ES.
    // Step 1: create the engine and obtain the engine interface.

    // 1.1 Create the engine object: SLObjectItf engineObj
    SLresult result = slCreateEngine(&engineObj, 0, NULL, 0, NULL, NULL);
    if (SL_RESULT_SUCCESS != result) {
        return;
    }

    // 1.2 Realize (initialize) the engine
    result = (*engineObj)->Realize(engineObj, SL_BOOLEAN_FALSE);
    if (SL_RESULT_SUCCESS != result) {
        return;
    }

    // 1.3 Obtain the engine interface: SLEngineItf engine
    result = (*engineObj)->GetInterface(engineObj, SL_IID_ENGINE, &engine);
    if (SL_RESULT_SUCCESS != result) {
        return;
    }
}

Set up the mixer

// Step 2: set up the output mix (mixer)
// 2.1 Create the output mix: SLObjectItf outputMixObj
result = (*engine)->CreateOutputMix(engine, &outputMixObj, 0, 0, 0);
if (SL_RESULT_SUCCESS != result) {
    return;
}

// 2.2 Realize (initialize) the output mix
result = (*outputMixObj)->Realize(outputMixObj, SL_BOOLEAN_FALSE);
if (SL_RESULT_SUCCESS != result) {
    return;
}

Create player

// Step 3: create the audio player
// 3.1 Configure the input (source) audio format
// Buffer-queue locator with 2 buffers
SLDataLocator_AndroidSimpleBufferQueue locBufq = {SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, 2};

// PCM data format:
// SL_DATAFORMAT_PCM: the data is PCM
// mChannels: number of channels (2 = stereo)
// mSampleRate: e.g. SL_SAMPLINGRATE_44_1 (44100 Hz, the most widely used and most compatible)
// bitsPerSample: e.g. SL_PCMSAMPLEFORMAT_FIXED_16 (16 bit, 2 bytes)
// containerSize: same as bitsPerSample here
// channelMask: speaker layout, e.g. SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT for stereo
//              (here 0 lets stereo use the default layout, mono uses SL_SPEAKER_FRONT_CENTER)
// SL_BYTEORDER_LITTLEENDIAN: little-endian byte order
SLDataFormat_PCM formatPcm = {SL_DATAFORMAT_PCM, (SLuint32) mChannels, mSampleRate,
                              (SLuint32) mSampleFormat, (SLuint32) mSampleFormat,
                              mChannels == 2 ? 0 : SL_SPEAKER_FRONT_CENTER,
                              SL_BYTEORDER_LITTLEENDIAN};
/*
 * Enable fast audio when possible: once we set the rate to the native rate,
 * the fast audio path will be triggered.
 */
if (mSampleRate) {
    formatPcm.samplesPerSec = mSampleRate;
}

// Data source: combine the locator and the format
SLDataSource audioSrc = {&locBufq, &formatPcm};

// 3.2 Configure the output (audio sink)
// Route the output to the output mix created above
SLDataLocator_OutputMix locOutpuMix = {SL_DATALOCATOR_OUTPUTMIX, mAudioEngine->outputMixObj};
SLDataSink audioSink = {&locOutpuMix, nullptr};

/*
 * Create the audio player:
 *     fast audio is not supported when SL_IID_EFFECTSEND is required,
 *     so skip it for the fast audio case.
 */
// Interfaces we need, including the buffer-queue interface
const SLInterfaceID ids[3] = {SL_IID_BUFFERQUEUE, SL_IID_VOLUME, SL_IID_EFFECTSEND};
const SLboolean req[3] = {SL_BOOLEAN_TRUE, SL_BOOLEAN_TRUE, SL_BOOLEAN_TRUE};

// 3.3 Create the player
result = (*mAudioEngine->engine)->CreateAudioPlayer(mAudioEngine->engine, &mPlayerObj,
                                                    &audioSrc, &audioSink,
                                                    mSampleRate ? 2 : 3, ids, req);
if (result != SL_RESULT_SUCCESS) {
    LOGE("CreateAudioPlayer failed: %d", result);
    return false;
}

// 3.4 Realize (initialize) the player: mPlayerObj
result = (*mPlayerObj)->Realize(mPlayerObj, SL_BOOLEAN_FALSE);
if (result != SL_RESULT_SUCCESS) {
    LOGE("mPlayerObj Realize failed: %d", result);
    return false;
}

// 3.5 Obtain the play interface: SLPlayItf mPlayer
result = (*mPlayerObj)->GetInterface(mPlayerObj, SL_IID_PLAY, &mPlayer);
if (result != SL_RESULT_SUCCESS) {
    LOGE("mPlayerObj GetInterface failed: %d", result);
    return false;
}

Set playback callback function

// Step 4: set up the playback callback
// 4.1 Obtain the buffer-queue interface: SLAndroidSimpleBufferQueueItf mBufferQueue
result = (*mPlayerObj)->GetInterface(mPlayerObj, SL_IID_BUFFERQUEUE, &mBufferQueue);
if (result != SL_RESULT_SUCCESS) {
    LOGE("mPlayerObj GetInterface failed: %d", result);
    return false;
}

// 4.2 Register the callback: void playerCallback(SLAndroidSimpleBufferQueueItf bq, void *context)
result = (*mBufferQueue)->RegisterCallback(mBufferQueue, playerCallback, this);
if (result != SL_RESULT_SUCCESS) {
    LOGE("mPlayerObj RegisterCallback failed: %d", result);
    return false;
}

mEffectSend = nullptr;
if (mSampleRate == 0) {
    result = (*mPlayerObj)->GetInterface(mPlayerObj, SL_IID_EFFECTSEND, &mEffectSend);
    if (result != SL_RESULT_SUCCESS) {
        LOGE("mPlayerObj GetInterface failed: %d", result);
        return false;
    }
}

result = (*mPlayerObj)->GetInterface(mPlayerObj, SL_IID_VOLUME, &mVolume);
if (result != SL_RESULT_SUCCESS) {
    LOGE("mPlayerObj GetInterface failed: %d", result);
    return false;
}

Set player status

// Step 5: set the player state to playing
result = (*mPlayer)->SetPlayState(mPlayer, SL_PLAYSTATE_PLAYING);
if (result != SL_RESULT_SUCCESS) {
    LOGE("mPlayerObj SetPlayState failed: %d", result);
    return false;
}

Manually activate the callback function

void OpenSLAudioPlay::enqueueSample(void *data, size_t length) {
    // One frame of audio must finish playing before the next frame can be enqueued,
    // so take the mutex before touching the buffers.
    pthread_mutex_lock(&mMutex);
    if (mBufSize < length) {
        // Grow the double buffers if the incoming sample is larger than before.
        mBufSize = length;
        if (mBuffers[0]) {
            delete[] mBuffers[0];
        }
        if (mBuffers[1]) {
            delete[] mBuffers[1];
        }
        mBuffers[0] = new uint8_t[mBufSize];
        mBuffers[1] = new uint8_t[mBufSize];
    }
    memcpy(mBuffers[mIndex], data, length);
    // Step 6: manually activate the callback by enqueuing this buffer
    (*mBufferQueue)->Enqueue(mBufferQueue, mBuffers[mIndex], length);
    mIndex = 1 - mIndex;
}

Release resources

extern "C"
JNIEXPORT void JNICALL
Java_com_devyk_audioplay_AudioPlayActivity_nativeStopPcm(JNIEnv *env, jclass type) {
    
    
    isPlaying = false;
    if (slAudioPlayer) {
    
    
        slAudioPlayer->release();
        delete slAudioPlayer;
        slAudioPlayer = nullptr;
    }
    if (pcmFile) {
    
    
        fclose(pcmFile);
        pcmFile = nullptr;
    }
}

For the complete code, please refer to the OpenSL ES section in the repository. Note: You need to put the pcm file in raw into the sdcard root directory.

Summary

This article mainly introduced some basic audio knowledge and how to use AudioTrack and OpenSL ES to render raw PCM audio data. You can deepen your understanding with the accompanying source code.


Origin: blog.csdn.net/weixin_52622200/article/details/113648341