Crop m4a on Android platform

Setting ringtones on Android phones is relatively flexible. When users hear a song they like, they can cut a segment out of it right away and then set that segment as a ringtone (incoming call, alarm clock, and so on) through the system interface.

The premise is that the APP that plays the song needs to provide the function of cropping the song.

So, how to realize the function of intercepting a segment of an audio file?

Xiaocheng naturally thought of using the FFmpeg command to achieve this. The earlier article "Extracting pictures from videos" showed how to extract segments, for example:

ffmpeg -ss 10 -i audio.mp3 -t 5 out.mp3

The above command, starting from the 10th second, extracts a 5 second segment.

Readers can follow the WeChat public account of "Guangzhou Xiaocheng", and check the contents of the menu item "Audio Video -> FFmpeg Structure & Application".

However, while the FFmpeg command is easy to use on a PC, it cannot be used directly in a mobile app.

Xiaocheng therefore introduces another method for cropping audio files on the Android platform, assuming that the original audio file uses the m4a container format.

This article describes how to crop m4a audio files on the Android platform and get an audio clip.

There are basically two schemes to achieve this function:

  • One is to decode the original audio file, then extract the corresponding segment, and then encode the segment.
  • The second is to directly locate the starting point of the crop, extract the segment, and then save it as a new audio file.

In comparison, the first scheme costs noticeably more in performance, but it can accept all kinds of audio formats (as long as the input can be decoded and the result is encoded into one fixed format).

The second scheme needs to consider the implementation of different formats (including the original audio and the final audio format), but it is superior in performance and saves more time than the first scheme.

Xiaocheng introduces the implementation of the second scheme here, and only considers the interception and generation of m4a files.

The second scheme, in a nutshell, consists of parsing the m4a format and generating a new m4a file.

(1) Introduction to m4a

An m4a file is actually an mp4 file that generally stores only audio streams. The m4a name was introduced by Apple to distinguish such files from general mp4 files that contain video frames.

Parsing an m4a file therefore means parsing the mp4 file format, and the same holds for writing one.

To crop an m4a clip, the file must first be parsed to obtain the relevant information (sample rate, number of channels, number of samples in one frame, total number of frames, length of each frame, offset of each frame, and so on). Parsing the file in turn requires an understanding of the mp4 file format.

An MP4 file is composed of atoms (also called boxes), and all data (both the descriptive information and the raw audio data) is stored in atoms.

Each atom consists of three fields:

len (the length of the entire atom, 4 bytes),
type (the type of the atom, 4 bytes),
data (the data saved by the atom).

Atoms can be nested.
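
The len/type/data layout can be illustrated with a short sketch that lists the top-level atoms in a buffer. This is not ringdroid code: AtomScanner and listAtoms are hypothetical names chosen for illustration, and the sketch assumes plain 32-bit lengths (it stops at the special len values 0 and 1, which mean "extends to end of file" and "64-bit length follows").

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of ringdroid): lists the top-level atoms in a buffer. */
final class AtomScanner {
    static final int HEADER_SIZE = 8; // 4-byte len + 4-byte type

    /** Returns entries such as "ftyp@0+12" (type, byte offset, total atom length). */
    static List<String> listAtoms(byte[] file) {
        List<String> atoms = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(file); // ByteBuffer is big-endian by default, like MP4
        int pos = 0;
        while (pos + HEADER_SIZE <= file.length) {
            long len = buf.getInt(pos) & 0xFFFFFFFFL; // read len as an unsigned 32-bit value
            String type = new String(file, pos + 4, 4, StandardCharsets.ISO_8859_1);
            if (len < HEADER_SIZE || pos + len > file.length) {
                break; // malformed, or a special len (0 or 1) that this sketch does not handle
            }
            atoms.add(type + "@" + pos + "+" + len);
            pos += (int) len; // len includes the 8-byte header, so this lands on the next atom
        }
        return atoms;
    }
}
```

Because atoms can be nested, the same loop could be run again over the data part of a container atom such as moov to reach the atoms inside it.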

There are many types of atoms, and not all types must exist to form a valid mp4 file. But there are several types of atoms that must be present:

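ftyp (identifies the file format),
stts (time-to-sample table: the number of samples per frame),
stsz (sample size table: the length of each frame),
stsc (sample-to-chunk table: the relationship between frames and chunks),
mvhd (movie header: information such as duration),
mdat (the raw media data),
moov (the container for the metadata atoms), and so on.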

The specific structure (including the meaning of each atom, and the size and meaning of each field) can be looked up in resources online (ideally ones that include a field table for each atom).

For example:

  • all atoms
  • atom explained
  • duration information

(2) Implementation of the plan

The implementation of the second scheme can use ringdroid, an open source project.

Ringdroid is maintained on git, and its latest version uses a decode-then-re-encode scheme (the first scheme described above).

You can check out an earlier version of ringdroid, which includes CheapAAC, CheapMP3, and similar classes. Each of these handles one audio format and cuts the segment directly, without re-encoding.

CheapAAC's ReadFile completes the parsing of m4a files, and WriteFile completes the writing of new m4a files.

CheapAAC also implements a gain calculation that can be used to display audio waveforms.

For cropping, two pieces of per-frame information matter most: the frame length (in bytes) and the frame offset. With these two sets of values, a segment can be cut out directly.

The length of the frame (and thus the total number of frames) is determined when parsing stsz, and the offset of the frame is determined when parsing mdat.
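
As a minimal sketch of how those two tables drive the cut (this is not CheapAAC's WriteFile, which also writes new header atoms before the frame data), the following copies a range of whole frames out of the source bytes. FrameCutter, copyFrames, and the parameter names are hypothetical, and the file is held in memory only to keep the example short.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper: copies whole frames out of the source file bytes. */
final class FrameCutter {
    /**
     * @param file         the complete source file
     * @param frameOffsets absolute byte offset of each frame (found while parsing mdat)
     * @param frameLens    byte length of each frame (found while parsing stsz)
     * @param startFrame   index of the first frame to keep
     * @param numFrames    number of frames to keep
     * @return the raw bytes of the selected frames, back to back
     */
    static byte[] copyFrames(byte[] file, int[] frameOffsets, int[] frameLens,
                             int startFrame, int numFrames) {
        if (startFrame < 0 || numFrames < 0 || startFrame + numFrames > frameLens.length) {
            throw new IllegalArgumentException("frame range out of bounds");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = startFrame; i < startFrame + numFrames; i++) {
            // Each frame is copied unchanged; no decoding or re-encoding takes place.
            out.write(file, frameOffsets[i], frameLens[i]);
        }
        return out.toByteArray();
    }
}
```

Since the frames are copied byte for byte, the cost is essentially that of a file copy, which is why this scheme is faster than decoding and re-encoding.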

Readers can read the CheapAAC code in detail to understand the cropping process. Xiaocheng only covers the problems in CheapAAC here, which are problems that readers may run into as well.

(1) Incompatible m4a files encoded by neroAacEnc

For m4a files encoded by neroAacEnc, CheapAAC's parseMdat cannot parse the raw data correctly. The reason is that neroAacEnc adds 8 bytes before the raw data. These 8 bytes make the calculated offset of every frame wrong, so the frame data written later in WriteFile is incorrect as well.

Consider skipping 8 bytes to solve this problem (when judging to be nero-encoded m4a):

        if (mMdatOffset > 0 && mMdatLength > 0) {
            final int neroAACFrom = 570;
            int neroSkip = 0;
            if (mMdatOffset - neroAACFrom > 0) {
                FileInputStream cs = new FileInputStream(mInputFile);
                cs.skip(mMdatOffset - neroAACFrom);
                final int flagSize = 14;
                byte[] buffer = new byte[flagSize];
                cs.read(buffer, 0, flagSize);
                if (buffer[0] == 'N' && buffer[1] == 'e' && buffer[2] == 'r' && buffer[3] == 'o' && buffer[5] == 'A'
                        && buffer[6] == 'A' && buffer[7] == 'C' && buffer[9] == 'c' && buffer[10] == 'o'
                        && buffer[11] == 'd' && buffer[12] == 'e' && buffer[13] == 'c') {
                    neroSkip = 8;
                }
                cs.close();
            }

            stream = new FileInputStream(mInputFile);
            mMdatOffset += neroSkip; // skip 8 bytes if needed
            stream.skip(mMdatOffset);
            mOffset = mMdatOffset;
            parseMdat(stream, mMdatLength);
        } else {
            throw new java.io.IOException("Didn't find mdat");
        }

(2) The duration of the cropped segment is incorrect

The duration of the cropped clip is not reset, so the new file still carries the duration of the original file.

The duration of the segment can be reset in WriteFile. Note, however, that if the clip will be played with MediaPlayer, the following code should not be added, because MediaPlayer's decoding process differs from FFmpeg's. If the clip is ultimately decoded by FFmpeg, the duration of the segment needs to be reset.

        // After writing stco, add:
        long time = System.currentTimeMillis() / 1000;
        time += (66 * 365 + 17) * 24 * 60 * 60;  // seconds between 1904-01-01 and 1970-01-01 (66 years, 17 leap days)
        byte[] createTime = new byte[4];
        createTime[0] = (byte)((time >> 24) & 0xFF);
        createTime[1] = (byte)((time >> 16) & 0xFF);
        createTime[2] = (byte)((time >> 8) & 0xFF);
        createTime[3] = (byte)(time & 0xFF);
        long numSamples = 1024 * numFrames;
        long durationMS = (numSamples * 1000) / mSampleRate;
        if ((numSamples * 1000) % mSampleRate > 0) {  // round the duration up.
            durationMS++;
        }
        byte[] numSaplesBytes = new byte[] {
                (byte)((numSamples >> 24) & 0XFF),
                (byte)((numSamples >> 16) & 0XFF),
                (byte)((numSamples >> 8) & 0XFF),
                (byte)(numSamples & 0XFF)
        };
        byte[] durationMSBytes = new byte[] {
                (byte)((durationMS >> 24) & 0XFF),
                (byte)((durationMS >> 16) & 0XFF),
                (byte)((durationMS >> 8) & 0XFF),
                (byte)(durationMS & 0XFF)
        };

        int type = kMDHD;
        Atom atom = mAtomMap.get(type);
        if (atom == null) {
            atom = new Atom();
            mAtomMap.put(type, atom);
        }
        atom.data = new byte[] {
                0, // version, 0 or 1
                0, 0, 0,  // flag
                createTime[0], createTime[1], createTime[2], createTime[3],  // creation time.
                createTime[0], createTime[1], createTime[2], createTime[3],  // modification time.
                0, 0, 0x03, (byte)0xE8,  // timescale = 1000 => duration expressed in ms.
                durationMSBytes[0], durationMSBytes[1], durationMSBytes[2], durationMSBytes[3],  // duration in ms.
                0, 0,     // languages
                0, 0      // pre-defined;
        };
        atom.len = atom.data.length + 8;

        type = kMVHD;
        atom = mAtomMap.get(type);
        if (atom == null) {
            atom = new Atom();
            mAtomMap.put(type, atom);
        }
        atom.data = new byte[] {
                0, // version, 0 or 1
                0, 0, 0, // flag
                createTime[0], createTime[1], createTime[2], createTime[3],  // creation time.
                createTime[0], createTime[1], createTime[2], createTime[3],  // modification time.
                0, 0, 0x03, (byte)0xE8,  // timescale = 1000 => duration expressed in ms.
                durationMSBytes[0], durationMSBytes[1], durationMSBytes[2], durationMSBytes[3],  // duration in ms.
                0, 1, 0, 0,  // rate = 1.0
                1, 0,        // volume = 1.0
                0, 0,        // reserved
                0, 0, 0, 0,  // reserved
                0, 0, 0, 0,  // reserved
                0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  // unity matrix for video, 36bytes
                0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0x40, 0, 0, 0,
                0, 0, 0, 0,  // pre-defined
                0, 0, 0, 0,  // pre-defined
                0, 0, 0, 0,  // pre-defined
                0, 0, 0, 0,  // pre-defined
                0, 0, 0, 0,  // pre-defined
                0, 0, 0, 0,  // pre-defined
                0, 0, 0, 2   // next track ID, 4bytes
        };
        atom.len = atom.data.length + 8;
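
The header fields written above (timestamps, timescale, duration) are all 32-bit values stored most significant byte first. As a hedged illustration of that packing, the hypothetical helper below (BigEndian.uint32 is not a ringdroid method) does the same conversion in one place.

```java
/** Hypothetical helper: packs a value into 4 bytes, most significant byte first. */
final class BigEndian {
    /** Serializes the low 32 bits of value in big-endian order. */
    static byte[] uint32(long value) {
        return new byte[] {
            (byte) ((value >> 24) & 0xFF), // bits 31..24
            (byte) ((value >> 16) & 0xFF), // bits 23..16
            (byte) ((value >> 8) & 0xFF),  // bits 15..8
            (byte) (value & 0xFF)          // bits 7..0
        };
    }
}
```

For instance, BigEndian.uint32(1000) yields the bytes 0, 0, 0x03, 0xE8, which is the timescale written in the mdhd and mvhd data above.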

(3) Other concepts

CheapAAC involves a few audio concepts, which Xiaocheng briefly explains here. Readers can also follow the WeChat public account of "Guangzhou Xiaocheng" and check the articles under the "Audio and Video" menu.

track: one audio or video track, also called a stream;
sample: in mp4 terms, a frame (not the same thing as a single sampling point); for AAC, the number of sampling points in one frame is fixed at 1024;
chunk: a block, that is, a collection of frames.
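
Given the fixed 1024 sampling points per frame, a time position can be converted to a frame index, and a frame count back to a duration. The sketch below shows that arithmetic; FrameMath and its method names are hypothetical and do not come from CheapAAC.

```java
/** Hypothetical helper: converts between time positions and AAC frame counts. */
final class FrameMath {
    static final int SAMPLES_PER_FRAME = 1024; // fixed for AAC, as described above

    /** Index of the frame that contains the given time position. */
    static int secondsToFrame(double seconds, int sampleRate) {
        return (int) (seconds * sampleRate / SAMPLES_PER_FRAME);
    }

    /** Duration in milliseconds of numFrames frames, rounded up. */
    static long framesToMillis(int numFrames, int sampleRate) {
        long numSamples = (long) SAMPLES_PER_FRAME * numFrames;
        return (numSamples * 1000 + sampleRate - 1) / sampleRate;
    }
}
```

At 44100 Hz, for example, the 10 second mark falls in frame 430 and the 15 second mark in frame 645, so a cut between them keeps 215 whole frames, slightly less than 5 seconds. Cropping at frame boundaries always rounds the requested times in this way.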

Example of using the neroAacEnc command:

ffmpeg -i "1.mp3" -f wav - | neroAacEnc -br 32000 -ignorelength -if - -of "1.m4a"

-br              bit rate
-lc/-he/-hev2    encoding method, the default is he
-if              input file
-of              output file
-ignorelength    used when taking the output of another program (such as ffmpeg) as input

So far, the implementation of cropping m4a on the Android platform has been introduced.


To sum up, this article has introduced how to use CheapAAC to crop an m4a file into a clip on the Android platform, along with the structure of the m4a format and the problems that may come up.
