Encoding audio to aac on the iOS platform

When Xiao Cheng previously introduced aac decoding, he used faad, and mentioned that to encode into the aac format you can use faac, fdk-aac, and so on. However, faac, fdk-aac and the like are all software encoders, and their CPU consumption is significantly higher than that of hardware encoding.

The advantage of hardware encoding is that it uses a dedicated hardware chip to complete the encoding task at high speed and with low power consumption.

The iOS platform also provides hardware-encoding capability; during app development you only need to call the corresponding SDK interface.

This SDK interface is AudioConverter.

This article introduces how to call AudioConverter to implement hardware encoding of aac on the iOS platform.

As the name suggests, AudioConverter is a format converter. Here Xiao Cheng uses it to convert data in pcm format into data in aac format.

For background on media formats (encoding formats and container formats), readers can follow the "Guangzhou Xiaocheng" official account and check the related articles under the "Audio Video -> Basic Concepts and Processes" menu.

AudioConverter performs the conversion in memory and does not need to write files, while the ExtAudioFile interface operates on files and uses AudioConverter internally to convert formats; in certain scenarios readers can use the ExtAudioFile interface instead.

How do you use AudioConverter? As with most system interfaces, you read the corresponding header file and work out how to call it from the documentation comments.

Xiao Cheng will now demonstrate how to convert data in pcm format into data in aac format.

After the demo code there is only a brief explanation, so readers who need it should read the code patiently in order to understand it and apply it to their own development scenarios.

The following example demonstrates the conversion from pcm to aac (for example, saving recording data as aac).

typedef struct
{
    void *source;
    UInt32 sourceSize;
    UInt32 channelCount;
    AudioStreamPacketDescription *packetDescriptions;
}FillComplexInputParam;

// Supply the source data, i.e. the pcm data
OSStatus audioConverterComplexInputDataProc(  AudioConverterRef               inAudioConverter,
                                            UInt32*                         ioNumberDataPackets,
                                            AudioBufferList*                ioData,
                                            AudioStreamPacketDescription**  outDataPacketDescription,
                                            void*                           inUserData)
{
    FillComplexInputParam* param = (FillComplexInputParam*)inUserData;
    if (param->sourceSize == 0) {
        *ioNumberDataPackets = 0;
        return -1;
    }
    ioData->mBuffers[0].mData = param->source;
    ioData->mBuffers[0].mNumberChannels = param->channelCount;
    ioData->mBuffers[0].mDataByteSize = param->sourceSize;
    *ioNumberDataPackets = 1;
    param->sourceSize = 0;
    param->source = NULL;
    return noErr;
}

typedef struct _tagConvertContext {
    AudioConverterRef converter;
    int samplerate;
    int channels;
}ConvertContext;

// init
// Ultimately uses AudioConverterNewSpecific to create the converter in the ConvertContext, and sets properties such as the bit rate
void* convert_init(int sample_rate, int channel_count)
{
    AudioStreamBasicDescription sourceDes;
    memset(&sourceDes, 0, sizeof(sourceDes));
    sourceDes.mSampleRate = sample_rate;
    sourceDes.mFormatID = kAudioFormatLinearPCM;
    sourceDes.mFormatFlags = kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsSignedInteger;
    sourceDes.mChannelsPerFrame = channel_count;
    sourceDes.mBitsPerChannel = 16;
    sourceDes.mBytesPerFrame = sourceDes.mBitsPerChannel/8*sourceDes.mChannelsPerFrame;
    sourceDes.mBytesPerPacket = sourceDes.mBytesPerFrame;
    sourceDes.mFramesPerPacket = 1;
    sourceDes.mReserved = 0;

    AudioStreamBasicDescription targetDes;
    memset(&targetDes, 0, sizeof(targetDes));
    targetDes.mFormatID = kAudioFormatMPEG4AAC;
    targetDes.mSampleRate = sample_rate;
    targetDes.mChannelsPerFrame = channel_count;
    UInt32 size = sizeof(targetDes);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &targetDes);

    AudioClassDescription audioClassDes;
    memset(&audioClassDes, 0, sizeof(AudioClassDescription));
    AudioFormatGetPropertyInfo(kAudioFormatProperty_Encoders, sizeof(targetDes.mFormatID), &targetDes.mFormatID, &size);
    int encoderCount = size / sizeof(AudioClassDescription);
    AudioClassDescription descriptions[encoderCount];
    AudioFormatGetProperty(kAudioFormatProperty_Encoders, sizeof(targetDes.mFormatID), &targetDes.mFormatID, &size, descriptions);
    // Pick a matching encoder. kAppleSoftwareAudioCodecManufacturer selects the software
    // codec; to request the hardware codec, match kAppleHardwareAudioCodecManufacturer instead.
    for (int pos = 0; pos < encoderCount; pos ++) {
        if (targetDes.mFormatID == descriptions[pos].mSubType && descriptions[pos].mManufacturer == kAppleSoftwareAudioCodecManufacturer) {
            memcpy(&audioClassDes, &descriptions[pos], sizeof(AudioClassDescription));
            break;
        }
    }

    ConvertContext *convertContex = malloc(sizeof(ConvertContext));
    convertContex->samplerate = sample_rate;
    convertContex->channels = channel_count;
    OSStatus ret = AudioConverterNewSpecific(&sourceDes, &targetDes, 1, &audioClassDes, &convertContex->converter);
    if (ret == noErr) {
        AudioConverterRef converter = convertContex->converter;

        UInt32 tmp = kAudioConverterQuality_High;
        AudioConverterSetProperty(converter, kAudioConverterCodecQuality, sizeof(tmp), &tmp);

        UInt32 bitRate = 96000;
        UInt32 size = sizeof(bitRate);
        ret = AudioConverterSetProperty(converter, kAudioConverterEncodeBitRate, size, &bitRate);
    }
    else {
        free(convertContex);
        convertContex = NULL;
    }

    return convertContex;
}

// converting
void convert(void* convertContext, void* srcdata, int srclen, void** outdata, int* outlen)
{
    ConvertContext* convertCxt = (ConvertContext*)convertContext;
    if (convertCxt && convertCxt->converter) {
        UInt32 theOutputBufSize = srclen;
        UInt32 packetSize = 1;
        void *outBuffer = malloc(theOutputBufSize);
        memset(outBuffer, 0, theOutputBufSize);

        AudioStreamPacketDescription *outputPacketDescriptions = NULL;
        outputPacketDescriptions = (AudioStreamPacketDescription*)malloc(sizeof(AudioStreamPacketDescription) * packetSize);

        FillComplexInputParam userParam;
        userParam.source = srcdata;
        userParam.sourceSize = srclen;
        userParam.channelCount = convertCxt->channels;
        userParam.packetDescriptions = NULL;

        OSStatus ret = noErr;

        AudioBufferList outputBuffers;
        outputBuffers.mNumberBuffers = 1;
        outputBuffers.mBuffers[0].mNumberChannels = convertCxt->channels;
        outputBuffers.mBuffers[0].mData = outBuffer;
        outputBuffers.mBuffers[0].mDataByteSize = theOutputBufSize;
        ret = AudioConverterFillComplexBuffer(convertCxt->converter, audioConverterComplexInputDataProc, &userParam, &packetSize, &outputBuffers, outputPacketDescriptions);
        if (ret == noErr) {
            if (outputBuffers.mBuffers[0].mDataByteSize > 0) {

                NSData* rawAAC = [NSData dataWithBytes:outputBuffers.mBuffers[0].mData length:outputBuffers.mBuffers[0].mDataByteSize];
                *outdata = malloc([rawAAC length]);
                memcpy(*outdata, [rawAAC bytes], [rawAAC length]);
                *outlen = (int)[rawAAC length];
// For testing: save the converted aac data as an adts-aac file
#if 1
                int headerLength = 0;
                char* packetHeader = newAdtsDataForPacketLength((int)[rawAAC length], convertCxt->samplerate, convertCxt->channels, &headerLength);
                NSData* adtsPacketHeader = [NSData dataWithBytes:packetHeader length:headerLength];
                free(packetHeader);
                NSMutableData* fullData = [NSMutableData dataWithData:adtsPacketHeader];
                [fullData appendData:rawAAC];

                NSFileManager *fileMgr = [NSFileManager defaultManager];
                NSString *filepath = [NSHomeDirectory() stringByAppendingFormat:@"/Documents/test%p.aac", convertCxt->converter];
                NSFileHandle *file = nil;
                if (![fileMgr fileExistsAtPath:filepath]) {
                    [fileMgr createFileAtPath:filepath contents:nil attributes:nil];
                }
                file = [NSFileHandle fileHandleForWritingAtPath:filepath];
                [file seekToEndOfFile];
                [file writeData:fullData];
                [file closeFile];
#endif
            }
        }

        free(outBuffer);
        if (outputPacketDescriptions) {
            free(outputPacketDescriptions);
        }
    }
}

// uninit (omitted): AudioConverterDispose the converter and free the ConvertContext
// ...

int freqIdxForAdtsHeader(int samplerate)
{
    /**
     0: 96000 Hz
     1: 88200 Hz
     2: 64000 Hz
     3: 48000 Hz
     4: 44100 Hz
     5: 32000 Hz
     6: 24000 Hz
     7: 22050 Hz
     8: 16000 Hz
     9: 12000 Hz
     10: 11025 Hz
     11: 8000 Hz
     12: 7350 Hz
     13: Reserved
     14: Reserved
     15: frequency is written explicitly
     */
    int idx = 4;
    if (samplerate >= 7350 && samplerate < 8000) {
        idx = 12;
    }
    else if (samplerate >= 8000 && samplerate < 11025) {
        idx = 11;
    }
    else if (samplerate >= 11025 && samplerate < 12000) {
        idx = 10;
    }
    else if (samplerate >= 12000 && samplerate < 16000) {
        idx = 9;
    }
    else if (samplerate >= 16000 && samplerate < 22050) {
        idx = 8;
    }
    else if (samplerate >= 22050 && samplerate < 24000) {
        idx = 7;
    }
    else if (samplerate >= 24000 && samplerate < 32000) {
        idx = 6;
    }
    else if (samplerate >= 32000 && samplerate < 44100) {
        idx = 5;
    }
    else if (samplerate >= 44100 && samplerate < 48000) {
        idx = 4;
    }
    else if (samplerate >= 48000 && samplerate < 64000) {
        idx = 3;
    }
    else if (samplerate >= 64000 && samplerate < 88200) {
        idx = 2;
    }
    else if (samplerate >= 88200 && samplerate < 96000) {
        idx = 1;
    }
    else if (samplerate >= 96000) {
        idx = 0;
    }

    return idx;
}
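The chain of range checks above can equivalently be written as a scan over a table of the standard rates. The sketch below is a hypothetical alternative, not part of the original code; it returns the index of the highest standard rate at or below the given sample rate, falling back to index 4 (44100 Hz) as the original does:

```c
#include <stddef.h>

/* Table-driven version of freqIdxForAdtsHeader: the rates are listed in the
   same order as the ADTS frequency-index table, so the array position is the
   index to write into the header. */
int freq_idx_lookup(int samplerate)
{
    static const int rates[] = {96000, 88200, 64000, 48000, 44100, 32000,
                                24000, 22050, 16000, 12000, 11025, 8000, 7350};
    for (size_t i = 0; i < sizeof(rates) / sizeof(rates[0]); i++) {
        if (samplerate >= rates[i])
            return (int)i;
    }
    return 4;   /* below 7350 Hz: fall back to 44100, as the original does */
}
```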

int channelIdxForAdtsHeader(int channelCount)
{
    /**
     0: Defined in AOT Specific Config
     1: 1 channel: front-center
     2: 2 channels: front-left, front-right
     3: 3 channels: front-center, front-left, front-right
     4: 4 channels: front-center, front-left, front-right, back-center
     5: 5 channels: front-center, front-left, front-right, back-left, back-right
     6: 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel
     7: 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel
     8-15: Reserved
     */
    int ret = 2;
    if (channelCount == 1) {
        ret = 1;
    }
    else if (channelCount == 2) {
        ret = 2;
    }

    return ret;
}

/**
 *  Add an ADTS header at the beginning of each and every AAC packet.
 *  This is needed because the encoder generates packets of raw AAC data.
 *
 *  Note: packetLength here is the length of the raw AAC packet; the 13-bit
 *  frame-length field written into the header includes the 7-byte header itself.
 *  See: http://wiki.multimedia.cx/index.php?title=ADTS
 *  Also: http://wiki.multimedia.cx/index.php?title=MPEG-4_Audio#Channel_Configurations
 **/
char* newAdtsDataForPacketLength(int packetLength, int samplerate, int channelCount, int* ioHeaderLen) {
    int adtsLength = 7;
    char *packet = malloc(sizeof(char) * adtsLength);
    int profile = 2;  // AAC LC
    int freqIdx = freqIdxForAdtsHeader(samplerate);
    int chanCfg = channelIdxForAdtsHeader(channelCount);  // MPEG-4 audio channel configuration
    NSUInteger fullLength = adtsLength + packetLength;
    // fill in ADTS data
    packet[0] = (char)0xFF;
// 11111111 = first 8 bits of the syncword
    packet[1] = (char)0xF9;
// 1111 1 00 1 = rest of syncword, MPEG-2 ID, layer 00, protection_absent (no CRC)
    packet[2] = (char)(((profile-1)<<6) + (freqIdx<<2) +(chanCfg>>2));
    packet[3] = (char)(((chanCfg&3)<<6) + (fullLength>>11));
    packet[4] = (char)((fullLength&0x7FF) >> 3);
    packet[5] = (char)(((fullLength&7)<<5) + 0x1F);
    packet[6] = (char)0xFC;
    *ioHeaderLen = adtsLength;
    return packet;
}
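To sanity-check the bit layout, the header can be rebuilt and its frame-length field decoded back out in plain C. The helper names below (build_adts_header, adts_frame_length) are illustrative, not part of the original code; the builder mirrors newAdtsDataForPacketLength byte for byte:

```c
#include <stdint.h>

/* Build a 7-byte ADTS header exactly as newAdtsDataForPacketLength does
   (AAC LC profile, CRC absent). freqIdx/chanCfg are the table indexes above. */
void build_adts_header(uint8_t h[7], int packetLength, int freqIdx, int chanCfg)
{
    int profile = 2;                    /* AAC LC */
    int fullLength = 7 + packetLength;  /* frame length includes the header */
    h[0] = 0xFF;
    h[1] = 0xF9;
    h[2] = (uint8_t)(((profile - 1) << 6) + (freqIdx << 2) + (chanCfg >> 2));
    h[3] = (uint8_t)(((chanCfg & 3) << 6) + (fullLength >> 11));
    h[4] = (uint8_t)((fullLength & 0x7FF) >> 3);
    h[5] = (uint8_t)(((fullLength & 7) << 5) + 0x1F);
    h[6] = 0xFC;
}

/* Read the 13-bit frame-length field (header + payload) back out of a header.
   Returns -1 if the 12-bit syncword is missing. */
int adts_frame_length(const uint8_t h[7])
{
    if (((h[0] << 4) | (h[1] >> 4)) != 0xFFF)
        return -1;
    return ((h[3] & 0x3) << 11) | (h[4] << 3) | (h[5] >> 5);
}
```

For a 300-byte raw AAC packet the decoded frame length should come back as 307, confirming that the header stores header-plus-payload.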

In the code above there are two important functions: the initialization function, which creates the AudioConverterRef, and the conversion function, which is called repeatedly to convert successive chunks of pcm data.
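A note on buffer sizing for the conversion function: an AAC-LC packet always encodes 1024 PCM frames (this is fixed by the AAC specification, not stated in the code above), so for 16-bit interleaved pcm the natural per-call input size can be computed as in this minimal sketch (hypothetical helper, for illustration only):

```c
/* Bytes of interleaved 16-bit PCM consumed by one AAC-LC packet.
   1024 PCM frames per packet is fixed by the AAC-LC specification. */
int pcm_bytes_per_aac_packet(int channels)
{
    const int kFramesPerAacPacket = 1024;
    const int kBytesPerSample = 2;   /* 16-bit samples */
    return kFramesPerAacPacket * channels * kBytesPerSample;
}
```

For 16-bit stereo this is 4096 bytes per call, which keeps each AudioConverterFillComplexBuffer call producing one complete output packet.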

In addition, the example saves the aac data converted from the pcm input, and the saved file can be used for playback.

Note that what AudioConverter produces is raw audio data; whether to combine it into adts-aac or encapsulate it into Apple's m4a file is decided by the program.

To explain: adts-aac is one representation of aac data. Before each frame of raw aac data, a frame header is added (containing the frame length, sampling rate, channel count, etc.); with this header, each aac frame can be played individually. Moreover, adts-aac is not a container format, i.e. there is no overall file header or file structure.

adts is the abbreviation of Audio Data Transport Stream.

Of course, readers can also encapsulate the converted aac data in the m4a format. In this container format the file header, including the packet table, comes first, followed by the raw audio data: {header with packet-table}{audio_data}{trailer}. The audio data itself does not carry per-packet information.

This concludes the introduction to converting pcm to aac data on the iOS platform.


To sum up, this article described how to use the AudioConverter interface provided by the iOS platform to convert data in pcm format into aac format. It also introduced how to save the result as an adts-aac file, which readers can use to check whether the converted aac data is correct.
