Java MP3 to WAV/PCM audio data: decoding analyzed in detail, extracting every frame as a data collection or bit stream, and playback, in one line of code!

Preface

Hello everyone! I am Atom Jun.
1. Java's built-in audio API supports WAV but ships no MP3 decoder, so it cannot process MP3, the lossy audio compression format defined by the MPEG-* standards, let alone parse it with Java's own audio formats and audio streams.
2. The conversion therefore uses colorful1.1, a pure-Java, cross-platform tool framework for PCs.
Note: colorful only supports Java 19. It was developed to solve various problems encountered in Java, so it can help a lot in development.
3. Put simply, MP3 is a compression technology. Its advantage is that compressed audio takes up little space, which suits storage and use on mobile devices, while still preserving the original sound quality well.
4. Now we can start. Novice installation tutorial -> click me to view. The framework is completely open source, free, and available for commercial use.

Decoding process

MP3 compression of PCM data:
Encapsulation: every 1152 PCM sample values are encapsulated into one MP3 data frame; the frame is the smallest unit an MP3 file is made of.
Using the data frame: when decoding, the information in the data frame is used to restore the 1152 PCM sample values.
Granules: the 1152 sample values are divided into 2 granules of 576 sample values each.
Data frame: an MP3 data frame has 5 parts: frame header, CRC check value, side information, main data, and ancillary data.

MP3 structure

An MP3 file is generally divided into three parts: ID3V2, Frame, and ID3V1. ID3V2 and ID3V1 are also frames, called tag frames, and the Frame part consists of data frames. An MP3 file may contain no tag frames, but it must contain data frames.

ID3V2:
The first part of the MP3 file. It contains information such as author, composer, and album. Its length is not fixed, and it extends the amount of information ID3V1 can hold.
Audio data Frame:
1. Consists of a series of data frames; the number of frames is determined by the file size and the frame length.
2. Frames may or may not all have the same length, depending on the bit rate.
3. Each frame is divided into two parts: frame header and data entity.
4. Frame header: records the bit rate, sampling rate, version, and other information of the MP3.
5. Frames are independent of each other. If the CRC check is enabled, the frame header is followed by a 2-byte CRC, which may in turn be followed by 32 bytes of additional information.
ID3V1: contains information such as author, composer, and album. Its length is fixed at 128 bytes. The standard is limited: it stores little information and cannot hold lyrics, album covers, pictures, and so on.
ID3 V2.0 is a fairly complete standard, but it makes software harder to write. Although many people favor this format, few programs actually implement it, and most MP3 files still use the ID3 V1.0 standard, which stores the ID3 information in the last 128 bytes of the MP3 file.
Descriptive information:
Some MP3 files may carry additional descriptive information.

ID3V2 analysis

The tag starts with a 10-byte tag header, structured as follows:

Tag header

Header identifier: 3 bytes, the characters "ID3", indicating that this is an ID3v2 tag.
Major version number: 1 byte; for ID3V2.3 it records 3.
Minor version number: 1 byte; here it records 0.
Flags: 1 byte.
Tag size: 4 bytes, the total size of all tag frames that follow. According to the ID3v2 standard, each byte uses only 7 bits; the highest bit is unused and always 0. For example, if the total size of the following tag frames is 257, the value written must be 513.
The value must be written in big-endian format, and when it is read back the highest bit (0) of each byte has to be discarded to obtain the real size. To convert a size into this form when writing:
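    // Spreads a size across bytes that use only 7 bits each (for example 257 -> 513).
    public static int discard(int num)
    {
        int result = 0, mask = 0x7F;
        // Each pass keeps the bits under the mask and shifts the bits above it
        // left by one, leaving a 0 in the top bit of that byte. The loop ends
        // once the mask has grown to 0x7FFFFFFF.
        while ((mask ^ 0x7FFFFFFF) != 0)
        {
            result = num & ~mask;
            result <<= 1;
            result |= num & mask;
            mask = ((mask + 1) << 8) - 1;
            num = result;
        }
        return result;
    }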

Recovering the original value:

    // Reverses discard(): drops the unused top bit of each byte (for example 513 -> 257).
    public static int recovery(int num) {
        int b0 = num & 0x7F;          // lowest byte, 7 useful bits
        int b1 = (num >> 8) & 0x7F;
        int b2 = (num >> 16) & 0x7F;
        int b3 = (num >> 24) & 0x7F;  // highest byte
        return b0 | (b1 << 7) | (b2 << 14) | (b3 << 21);
    }
If big-endian and little-endian byte order are unfamiliar, you can read my article on the topic: click me to view.
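To tie the header layout and the 7-bit size together, here is a minimal sketch that reads a 10-byte ID3v2 tag header from a byte array. The class and method names are made up for illustration and are not part of colorful; the layout assumed is the one listed above (identifier, two version bytes, one flags byte, four size bytes).

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only: reads the 10-byte ID3v2 tag header. */
class Id3v2HeaderSketch {

    /** Returns the tag size in bytes (header excluded), or -1 if this is not an ID3v2 header. */
    static int tagSize(byte[] h) {
        if (h.length < 10) {
            return -1;
        }
        String id = new String(h, 0, 3, StandardCharsets.ISO_8859_1);
        if (!id.equals("ID3")) {
            return -1;
        }
        // Bytes 6..9 hold the size, most significant byte first, 7 useful bits per byte.
        return ((h[6] & 0x7F) << 21) | ((h[7] & 0x7F) << 14) | ((h[8] & 0x7F) << 7) | (h[9] & 0x7F);
    }

    public static void main(String[] args) {
        // "ID3", version 3, revision 0, flags 0, size bytes 00 00 02 01 (513 as an int)
        byte[] header = {'I', 'D', '3', 3, 0, 0, 0, 0, 2, 1};
        System.out.println(tagSize(header)); // prints 257
    }
}
```

Adding 10 to the returned size gives the offset at which the first audio data frame can be expected, assuming the tag has no footer.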

Tag frame

Each tag frame starts with a 10-byte header: a 4-character frame ID, a 4-byte size, and 2 bytes of flags.
Data structure definition:
TIT2 = title, the title of the song
TPE1 = author
TALB = album
TRCK = track, in the format N/M, where N is the number of the song in the album and M is the total number of songs in the album; N and M are numbers expressed in ASCII
TYER = year, a number expressed in ASCII
TCON = genre, represented directly by a string
COMM = comment, in the format "eng\0comment text", where eng is the natural language used in the comment
Size = the size of the frame content; note that it must also be written in big-endian format.
Flags = the frame flags; their actual meaning is not clear.
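As a rough illustration of the tag frame layout, the sketch below reads the frame ID and the big-endian size from an ID3v2.3 frame header. It assumes the v2.3 layout (4-byte ID, 4-byte plain big-endian size, 2-byte flags); the class and method names are hypothetical and unrelated to colorful.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only: reads an ID3v2.3 tag frame header. */
class Id3v2FrameSketch {

    /** The 4-character frame ID, for example "TIT2". */
    static String frameId(byte[] data, int offset) {
        return new String(data, offset, 4, StandardCharsets.ISO_8859_1);
    }

    /** The size of the frame content; ByteBuffer reads big-endian by default. */
    static int frameSize(byte[] data, int offset) {
        return ByteBuffer.wrap(data).getInt(offset + 4);
    }

    public static void main(String[] args) {
        // TIT2, size 5, flags 00 00, then one encoding byte (0) and the text "demo"
        byte[] raw = {'T', 'I', 'T', '2', 0, 0, 0, 5, 0, 0, 0, 'd', 'e', 'm', 'o'};
        int size = frameSize(raw, 0);
        // Skip the 10-byte header and the encoding byte to reach the text.
        String text = new String(raw, 11, size - 1, StandardCharsets.ISO_8859_1);
        System.out.println(frameId(raw, 0) + " (" + size + " bytes): " + text);
    }
}
```

The next frame header starts at offset + 10 + size, so the same two methods can be called in a loop until the end of the tag is reached.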

Frame analysis - data frame

Frame header: 4 bytes long. The frame header may be followed by a 2-byte CRC check; whether these two bytes exist depends on the 16th bit of the frame header. If it is 0, a 2-byte checksum follows the frame header; if it is 1, there is no checksum.
(After that comes additional information of variable length; for standard MP3 files the length is 32 bytes. The content of this parenthesis is open to discussion, as I have not seen such a file so far.) Then comes the compressed sound data, which the decoder decodes when it reads this far.

For MPEG-1 Layer 3 files, the first two bytes of every data frame are "FF FA" (with CRC) or "FF FB" (without CRC).

Name | Bit length | Position | Description
Synchronization information | 11 | 1st and 2nd byte | All bits are 1; the first byte is always FF.
Version | 2 | 2nd byte | 00-MPEG 2.5, 01-undefined, 10-MPEG 2, 11-MPEG 1
Layer | 2 | 2nd byte | 00-undefined, 01-Layer 3, 10-Layer 2, 11-Layer 1
CRC check | 1 | 2nd byte | 0-check, 1-no check
Bit rate | 4 | 3rd byte | Bit rate in kbps, see the bit rate table below. For example, MPEG-1 Layer 3 at 64 kbps has the value 0101.
Sampling frequency | 2 | 3rd byte | MPEG-1: 00-44.1 kHz, 01-48 kHz, 10-32 kHz, 11-undefined. MPEG-2: 00-22.05 kHz, 01-24 kHz, 10-16 kHz, 11-undefined. MPEG-2.5: 00-11.025 kHz, 01-12 kHz, 10-8 kHz, 11-undefined.
Frame length adjustment (padding) | 1 | 3rd byte | Adjusts the frame length: 0-no adjustment, 1-adjustment
Reserved | 1 | 3rd byte | Not used
Channel mode | 2 | 4th byte | 00-stereo, 01-joint stereo (decided frame by frame), 10-dual channel, 11-mono
Mode extension | 2 | 4th byte | Used when the channel mode is 01, see the mode extension table below
Copyright | 1 | 4th byte | 0-not copyrighted, 1-copyrighted
Original flag | 1 | 4th byte | 0-copy, 1-original
Emphasis | 2 | 4th byte | Classifies the compensation applied after noise-reduction compression; rarely used, and may not be used in the future either. 00-none, 01-50/15 µs, 10-reserved, 11-CCITT J.17

Bit rate table (kbps):

bits | V1,L1 | V1,L2 | V1,L3 | V2,L1 | V2,L2 | V2,L3
0000 | free | free | free | free | free | free
0001 | 32 | 32 | 32 | 32 | 8 | 8
0010 | 64 | 48 | 40 | 48 | 16 | 16
0011 | 96 | 56 | 48 | 56 | 24 | 24
0100 | 128 | 64 | 56 | 64 | 32 | 32
0101 | 160 | 80 | 64 | 80 | 40 | 40
0110 | 192 | 96 | 80 | 96 | 48 | 48
0111 | 224 | 112 | 96 | 112 | 56 | 56
1000 | 256 | 128 | 112 | 128 | 64 | 64
1001 | 288 | 160 | 128 | 144 | 80 | 80
1010 | 320 | 192 | 160 | 160 | 96 | 96
1011 | 352 | 224 | 192 | 176 | 112 | 112
1100 | 384 | 256 | 224 | 192 | 128 | 128
1101 | 416 | 320 | 256 | 224 | 144 | 144
1110 | 448 | 384 | 320 | 256 | 160 | 160
1111 | bad | bad | bad | bad | bad | bad

V1 - MPEG 1, V2 - MPEG 2 and MPEG 2.5
L1 - Layer 1, L2 - Layer 2, L3 - Layer 3
"free": free format, the bit rate is not taken from the table; "bad": value not allowed

Mode extension table:

Value | Intensity stereo | MS stereo
00 | off | off
01 | on | off
10 | off | on
11 | on | on
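The bit positions in the frame header table can be read with shifts and masks. The following is a minimal sketch under the layout described above; the class name, method name, and output format are illustrative choices, not colorful's API.

```java
/** Illustrative sketch only: decodes the fields of a 4-byte MPEG audio frame header. */
class Mp3FrameHeaderSketch {
    static final String[] VERSIONS = {"MPEG 2.5", "undefined", "MPEG 2", "MPEG 1"};
    static final String[] LAYERS = {"undefined", "Layer 3", "Layer 2", "Layer 1"};
    static final String[] MODES = {"stereo", "joint stereo", "dual channel", "mono"};

    static String describe(byte[] h) {
        // Assemble the four bytes into one big-endian 32-bit value.
        int header = (h[0] & 0xFF) << 24 | (h[1] & 0xFF) << 16 | (h[2] & 0xFF) << 8 | (h[3] & 0xFF);
        // The top 11 bits are the synchronization information and must all be 1.
        if ((header >>> 21) != 0x7FF) {
            return "not a frame header";
        }
        String version = VERSIONS[(header >>> 19) & 0x3];
        String layer = LAYERS[(header >>> 17) & 0x3];
        boolean crc = ((header >>> 16) & 0x1) == 0; // 0 means a CRC follows the header
        int bitrateIndex = (header >>> 12) & 0xF;   // row in the bit rate table
        int sampleRateIndex = (header >>> 10) & 0x3;
        int padding = (header >>> 9) & 0x1;
        String mode = MODES[(header >>> 6) & 0x3];
        return version + ", " + layer + ", crc=" + crc + ", bitrateIndex=" + bitrateIndex
                + ", sampleRateIndex=" + sampleRateIndex + ", padding=" + padding + ", " + mode;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xFF, (byte) 0xFB, (byte) 0x90, 0x64};
        // prints: MPEG 1, Layer 3, crc=false, bitrateIndex=9, sampleRateIndex=0, padding=0, joint stereo
        System.out.println(describe(header));
    }
}
```

In the example, bit rate index 9 (1001) in the V1,L3 column corresponds to 128 kbps, and sampling index 0 for MPEG-1 corresponds to 44.1 kHz.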

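Frame length calculation

1. The formula depends on the bit rate and the sampling frequency.
Layer 1 uses the formula:
frame length (bytes) = samples per frame / sampling frequency * bit rate / 8 + padding * 4
Layer 2 and Layer 3 use the formula:
frame length (bytes) = samples per frame / sampling frequency * bit rate / 8 + padding
Here the bit rate is in bits per second and the result is rounded down. For example, MPEG-1 Layer 3 at 128 kbps and 44.1 kHz without padding gives 1152 / 44100 * 128000 / 8 = 417 bytes.

2. The padding of a frame is the frame length adjustment bit at bit 23, which is either 0 or 1.
3. Samples per frame: for the different specifications MPEG 1, 2 and 2.5 and the different Layers 1 to 3 within one specification, the number of samples per frame is fixed; the values are listed in the table below (unit: samples per frame):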

MPEG samples per frame

Layer | MPEG 1 | MPEG 2 (LSF) | MPEG 2.5 (LSF)
Layer 1 | 384 | 384 | 384
Layer 2 | 1152 | 1152 | 1152
Layer 3 | 1152 | 576 | 576
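The frame length formulas can be sketched with integer arithmetic as follows. This is an illustration, not colorful's implementation: the names are made up, the bit rate is expected in bits per second, and for Layer 1 the commonly used form (12 * bit rate / sampling rate + padding) * 4 is assumed, which rounds to whole 4-byte slots.

```java
/** Illustrative sketch only: frame length in bytes from the header fields. */
class Mp3FrameLengthSketch {

    static int frameLength(int layer, int samplesPerFrame, int bitrateBps, int sampleRateHz, int padding) {
        if (layer == 1) {
            // Layer 1 counts in 4-byte slots: 384 / 32 = 12 slots' worth of samples per unit.
            return (samplesPerFrame / 32 * bitrateBps / sampleRateHz + padding) * 4;
        }
        // Layer 2 and 3 count in single bytes: 1152 / 8 = 144, 576 / 8 = 72.
        return samplesPerFrame / 8 * bitrateBps / sampleRateHz + padding;
    }

    public static void main(String[] args) {
        System.out.println(frameLength(3, 1152, 128_000, 44_100, 0)); // prints 417
        System.out.println(frameLength(3, 1152, 128_000, 44_100, 1)); // prints 418
        System.out.println(frameLength(1, 384, 128_000, 44_100, 0));  // prints 136
    }
}
```

Because 128 kbps at 44.1 kHz does not divide evenly, an encoder sets the padding bit on some frames so that the average frame length matches the bit rate.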

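Playback duration of each frame

Playback duration of each frame (seconds) = samples per frame / sampling rate. For example, an MPEG-1 Layer 3 frame at 44.1 kHz lasts 1152 / 44100 ≈ 0.026 s, about 26 ms.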
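The duration formula is small enough to check in a few lines. A sketch with a hypothetical helper name:

```java
/** Illustrative sketch only: playback duration of one frame. */
class FrameDurationSketch {

    /** Duration in milliseconds = samples per frame / sampling rate * 1000. */
    static double millis(int samplesPerFrame, int sampleRateHz) {
        return 1000.0 * samplesPerFrame / sampleRateHz;
    }

    public static void main(String[] args) {
        System.out.printf("%.2f ms%n", millis(1152, 44_100)); // MPEG-1 Layer 3 at 44.1 kHz
        System.out.printf("%.2f ms%n", millis(576, 22_050));  // MPEG-2 Layer 3 at 22.05 kHz
    }
}
```

Multiplying this value by the number of data frames gives an estimate of the total playing time of a file.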

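ID3V1 tail description

Byte | Length (bytes) | Content
1-3 (A) | 3 | The characters "TAG", indicating the ID3V1 standard; the song information follows.
4-33 (B) | 30 | Song title
34-63 (C) | 30 | Author name
64-93 (D) | 30 | Album name
94-97 (E) | 4 | Year
98-125 (F) | 28 | Comment
126 (G) | 1 | Reserved
127 (H) | 1 | Track number
128 (I) | 1 | MP3 music genre, 147 kinds in total

The fields are stored in order with no markers separating them. If a field, for example the title, is shorter than 30 bytes, it is padded with "\0".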
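The fixed offsets make the ID3V1 tail easy to read. Below is a minimal sketch that parses a 128-byte tag held in memory; the names are illustrative, and ISO-8859-1 is an assumption, since the tag itself does not state a text encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only: parses the 128-byte ID3V1 tag at the end of an MP3 file. */
class Id3v1Sketch {

    /** Reads a fixed-width field and cuts it at the first "\0" padding byte. */
    private static String text(byte[] tag, int start, int length) {
        int end = start;
        while (end < start + length && tag[end] != 0) {
            end++;
        }
        return new String(tag, start, end - start, StandardCharsets.ISO_8859_1).trim();
    }

    /** Returns the fields in order, or an empty map if the "TAG" marker is missing. */
    static Map<String, String> parse(byte[] tag) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (tag.length != 128 || tag[0] != 'T' || tag[1] != 'A' || tag[2] != 'G') {
            return fields;
        }
        fields.put("title", text(tag, 3, 30));
        fields.put("artist", text(tag, 33, 30));
        fields.put("album", text(tag, 63, 30));
        fields.put("year", text(tag, 93, 4));
        fields.put("comment", text(tag, 97, 28));
        fields.put("track", String.valueOf(tag[126] & 0xFF));
        fields.put("genre", String.valueOf(tag[127] & 0xFF));
        return fields;
    }

    /** Builds a small sample tag for the demonstration. */
    static byte[] sampleTag(String title, String year, int track) {
        byte[] tag = new byte[128];
        byte[] marker = "TAG".getBytes(StandardCharsets.ISO_8859_1);
        byte[] t = title.getBytes(StandardCharsets.ISO_8859_1);
        byte[] y = year.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(marker, 0, tag, 0, 3);
        System.arraycopy(t, 0, tag, 3, Math.min(t.length, 30));
        System.arraycopy(y, 0, tag, 93, Math.min(y.length, 4));
        tag[126] = (byte) track;
        return tag;
    }

    public static void main(String[] args) {
        System.out.println(parse(sampleTag("Demo song", "2023", 7)));
    }
}
```

For a real file, the 128 bytes would first be read from the end of the file, for example by seeking to length - 128 with a RandomAccessFile.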

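MP3 decoding and restoration process

MP3 decoding is the process of restoring audio data compressed with MP3 encoding to the original PCM data.

The overall flow is as follows. After the preprocessing step has decoded the frame header and side information of an MP3 frame, the decoder performs scale factor decoding and Huffman decoding on the preprocessed information. The result then goes through requantization, reordering, stereo decoding, alias reduction, the inverse modified discrete cosine transform (IMDCT), frequency inversion, and subband synthesis filtering, which yields the PCM audio data of the left and right channels and completes the decoding process.

Decoding introduction

I have already wrapped these complicated decoding steps for you; you can call them directly to export the left and right PCM data and to play the MP3.

The code

Preface
I provide several methods for you to use.

The quickest, best suited to beginners

If you need to play the file directly, we have wrapped that for you. In this approach, __response is the framework's shortcut response class.
The code is as follows: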
import IOS_SHOGUN_Component.__response;
import IOS_SHOGUN_Component.decodeAean.Mp3DecodeException;

import javax.sound.sampled.SourceDataLine;
import java.io.*;
public class Java {

    public static void main(String[] X) throws IOException {
        try {
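            // Replace the argument with the path of the MP3 file to play.
            try (SourceDataLine Mp3 = __response.Debug_PlayMp3("path to the MP3 file")) {
                // The line is closed automatically when this block ends.
            }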
        } catch (Mp3DecodeException e) {
            throw new RuntimeException(e);
        }
    }
}

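Exporting the data directly

Use this if you want to export the PCM directly for caching or other persistent storage.

Two storage methods

One is Base64. The encoded text is about one third larger than the raw binary data, but a Base64 string is plain text, which makes it better suited to transfer between different platforms and languages.
The other is stream storage. It is probably only suitable for temporary caching, and converting everything into byte arrays is not recommended because there is a risk of data loss; put simply, your audio turns into noise.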
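To see what Base64 does to the size of PCM data, the sketch below encodes one frame's worth of silence with the JDK's java.util.Base64 and decodes it again. It only illustrates the encoding itself; whether colorful uses exactly this encoder internally is not something the article states.

```java
import java.util.Arrays;
import java.util.Base64;

/** Illustrative sketch only: size of PCM data before and after Base64 encoding. */
class Base64SizeSketch {

    static String encode(byte[] pcm) {
        return Base64.getEncoder().encodeToString(pcm);
    }

    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // 1152 samples * 2 channels * 2 bytes per sample = 4608 bytes of 16-bit stereo PCM
        byte[] pcm = new byte[1152 * 2 * 2];
        String encoded = encode(pcm);
        System.out.println(pcm.length + " bytes -> " + encoded.length() + " characters"); // 4608 bytes -> 6144 characters
        System.out.println(Arrays.equals(pcm, decode(encoded))); // prints true
    }
}
```

Every 3 bytes become 4 characters, so the text form is one third larger, in exchange for being safe to embed in JSON, XML, or other text protocols.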

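The TaskList<String> method

Content:

It stores the decoded binary data of every frame, which can be persisted at any time.
The audio data can also be clipped or processed in other ways, such as voice changing.
It can be exported to other list collections and offers a rich set of APIs.
Its basic API matches Java's built-in thread-safe collections.

Note:

When each frame is extracted and cached, it has to be converted back into audio data: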
import IOS_SHOGUN_Component.TaskList;
import IOS_SHOGUN_Component.__response;
import IOS_SHOGUN_Component.decodeAean.Mp3DecodeException;
import java.io.*;
public class Java {

    public static void main(String[] X) throws IOException {
        try {
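            // Holds the decoded PCM data of every frame, one Base64 string per frame.
            TaskList<String> Data=__response._mp3_extract_mode_Base64("path");
            // Convert the first frame back into raw PCM bytes.
            byte[] PCM=__response._base64_T_X2(Data.get(0));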
        } catch (Mp3DecodeException e) {
            throw new RuntimeException(e);
        }
    }
}
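If all frames are needed as one continuous PCM buffer, the per-frame strings can be decoded and concatenated. The sketch below uses a plain List<String> as a stand-in for TaskList<String> and assumes the frames are standard Base64; both are assumptions for illustration, and the helper name is made up.

```java
import java.io.ByteArrayOutputStream;
import java.util.Base64;
import java.util.List;

/** Illustrative sketch only: joins per-frame Base64 strings into one PCM byte array. */
class JoinFramesSketch {

    static byte[] join(List<String> base64Frames) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String frame : base64Frames) {
            byte[] pcm = Base64.getDecoder().decode(frame);
            out.write(pcm, 0, pcm.length); // append this frame's samples
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Two tiny fake "frames" standing in for decoded PCM data.
        List<String> frames = List.of(
                Base64.getEncoder().encodeToString(new byte[]{1, 2}),
                Base64.getEncoder().encodeToString(new byte[]{3}));
        System.out.println(join(frames).length); // prints 3
    }
}
```

Clipping then amounts to joining only a sub-list of the frames, since each frame covers a fixed number of samples.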

The stream method

There are two variants: ByteArrayOutputStream and ByteArrayInputStream.
import IOS_SHOGUN_Component.__response;
import IOS_SHOGUN_Component.decodeAean.Mp3DecodeException;
import java.io.*;
public class Java {

    public static void main(String[] X) throws IOException {
        try {
            ByteArrayInputStream I=__response._mp3_extract_mode_IStream("path");
            ByteArrayOutputStream O=__response._mp3_extract_mode_OStream("path");
        } catch (Mp3DecodeException e) {
            throw new RuntimeException(e);
        }
    }
}

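There are many more shortcuts

You can also export directly to a PCM file

Shortcuts are available here as well.
You can use the usual local export, the cached TaskList<String>, or the stream method.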
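A raw PCM file carries no header, so ordinary players cannot tell its sampling rate or channel count. If a playable file is wanted, the exported PCM can be wrapped in a WAV container with the JDK's own javax.sound.sampled classes. The sketch below assumes 16-bit signed little-endian samples, the same format used for playback elsewhere in this article; the class and method names are illustrative.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative sketch only: wraps raw 16-bit PCM bytes in a WAV container. */
class PcmToWavSketch {

    static byte[] toWav(byte[] pcm, float sampleRate, int channels) {
        // 16 bits per sample, signed, little-endian.
        AudioFormat format = new AudioFormat(sampleRate, 16, channels, true, false);
        long frames = pcm.length / format.getFrameSize();
        try (AudioInputStream in = new AudioInputStream(new ByteArrayInputStream(pcm), format, frames)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Writing to a stream needs a known length, which the frame count above provides.
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] pcm = new byte[44_100 * 2 * 2]; // one second of stereo silence
        byte[] wav = toWav(pcm, 44_100f, 2);
        System.out.println("PCM " + pcm.length + " bytes, WAV " + wav.length + " bytes");
    }
}
```

The resulting bytes can be written to a .wav file with java.nio.file.Files.write.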

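The MP3 is cached data rather than a local file: how can it be extracted and converted to PCM data?

Colorful1.1 supports reading from streams, so it can be used, for example, for audio data sent by a client.

Converting to ByteArrayInputStream

Converting to ByteArrayOutputStream

The non-shortcut way

Non-shortcut interfaces are provided as well. The shortcuts above are just a complete wrapper around the APIs that were already open:
_mp3_extract_mode_Decode
Debug_PlayMp3
We can also copy the code directly.
Parameters: Audio (the audio output class), the storage method, and whether the input is local or cached (CacheData). With cached input, only the conversion can be performed.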
import IOS_SHOGUN_Component.decodeAean.AudioBuffer;
import IOS_SHOGUN_Component.decodeAean.DecodeSuperclasses;
import IOS_SHOGUN_Component.decodeAean.Header;
import IOS_SHOGUN_Component.decodeAean.Mp3DecodeException;
import IOS_SHOGUN_Component.mp3_Decode;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import java.io.*;
public class Java {

    public static void main(String[] X) throws IOException, Mp3DecodeException {

        // Arguments: the audio output class, the storage method, and local or cached input.
        mp3_Decode Create = new mp3_Decode(new mp3_Decode.Audio(), AudioBuffer.STREAM, mp3_Decode.LocalData);
        Create.open("path or stream", false);
        
        if (Create.onCreateAndStart()) {

            DecodeSuperclasses DECODE = Create.getPCM_DecodeSuperclasses();
            Header Head = DECODE.getRecording();
            AudioFormat af = new AudioFormat((float) Head.getSamplingRate(), 16, Head.getChannels(), true, false);

            SourceDataLine DataLineSource;
            try {
                DataLineSource = AudioSystem.getSourceDataLine(af);
            } catch (LineUnavailableException var8) {
                throw new RuntimeException(var8);
            }

            try {
                DataLineSource.open(af, 8 * Head.getPcmSize());
            } catch (LineUnavailableException var7) {
                throw new RuntimeException(var7);
            }

            DataLineSource.start();
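            ByteArrayInputStream D = DECODE.getAudioBuffer().getPcmDataExportIStream();
            byte[] DD = new byte[DECODE.getAudioBuffer().getOffset()];

            // Write only the bytes actually read on each pass.
            int n;
            while ((n = D.read(DD)) != -1) {
                DataLineSource.write(DD, 0, n);
            }
            // Let the buffered audio finish playing, then release the line.
            DataLineSource.drain();
            DataLineSource.close();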
        }
    }
}
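Why is there no TaskList<String> here? Because the TaskList data input is only an interface we prepared in advance; in the end the data still enters mp3_Decode as a stream for decoding. It is also thread-safe.
Actually, I made this more complicated than necessary; it can be simpler.
Parameters: Audio (the audio output class), the storage method, and whether the input is local or cached (CacheData). With cached input, only the conversion can be performed.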
import IOS_SHOGUN_Component.*;
import IOS_SHOGUN_Component.decodeAean.AudioBuffer;
import IOS_SHOGUN_Component.decodeAean.Mp3DecodeException;
import java.io.*;
public class Java {

    public static void main(String[] X) throws IOException, Mp3DecodeException {
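        // Parameters: Audio (the audio output class), the storage method, and local or cached input (CacheData). With cached input, only the conversion can be performed.
        mp3_Decode.Audio a=new mp3_Decode.Audio();
        mp3_Decode Create = new mp3_Decode(a, AudioBuffer.STREAM, mp3_Decode.LocalData);
        Create.open("stream/path", true);
        if (Create.onCreateAndStart()){
            // create the decoding wizard
        }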
    }
}
Let's add one more step, because the program blocks while it is being created (onCreateAndStart).
import IOS_SHOGUN_Component.*;
import IOS_SHOGUN_Component.decodeAean.AudioBuffer;
import IOS_SHOGUN_Component.decodeAean.Mp3DecodeException;
import java.io.*;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

public class Java {

    public static void main(String[] X) throws IOException, Mp3DecodeException {
        SequenceCachedPool C=new SequenceCachedPool(1,2,100, TimeUnit.MILLISECONDS,
                new LinkedBlockingDeque<>(10));// create a pool to control the operation
        // Parameters: Audio (the audio output class), the storage method, and local or cached input (CacheData). With cached input, only the conversion can be performed.
        mp3_Decode.Audio a=new mp3_Decode.Audio();
        mp3_Decode Create = new mp3_Decode(a, AudioBuffer.STREAM, mp3_Decode.LocalData);
        Create.open("stream/path", true);
        C.submit(()->{
            try {
                TimeUnit.SECONDS.sleep(5);
                Create.close();
                // exit after 5 seconds
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        if (Create.onCreateAndStart()){
            // create the decoding wizard
        }
    }
}
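The same "stop a blocking call after a delay" pattern can be sketched with the JDK's own ScheduledExecutorService. BlockingTask below is a hypothetical stand-in for a decoder whose start call blocks until close() is called; it is not colorful's class, and the timings are arbitrary.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative sketch only: closes a blocking task from a timer thread. */
class StopAfterDelaySketch {

    /** Hypothetical stand-in for a blocking decoder: run() loops until close() is called. */
    static class BlockingTask implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean();

        int run() {
            int steps = 0;
            while (!closed.get()) {
                try {
                    Thread.sleep(10); // pretend to decode one chunk
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
                steps++;
            }
            return steps;
        }

        @Override
        public void close() {
            closed.set(true);
        }
    }

    /** Runs the blocking task and closes it from another thread after the given delay. */
    static int runFor(long millis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try (BlockingTask task = new BlockingTask()) {
            timer.schedule(task::close, millis, TimeUnit.MILLISECONDS);
            return task.run(); // blocks here until the timer closes the task
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("chunks processed before stop: " + runFor(200));
    }
}
```

Scheduling the close before starting the blocking call matters: once the blocking call has begun, the calling thread can no longer schedule anything itself.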

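A more recommended approach

I recommend treating it as a conversion program, because playback is not its main task;
that is an extra.
getBase64Statistics (statistics specifically for PCM Base64 data) is one of many such specialized APIs.

Code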

import IOS_SHOGUN_Component.*;
import IOS_SHOGUN_Component.decodeAean.AudioBuffer;
import IOS_SHOGUN_Component.decodeAean.Mp3DecodeException;
import java.io.*;
public class Java {

    public static void main(String[] X) throws IOException, Mp3DecodeException {
        // Note: this must be AudioBuffer.BASE64; otherwise the Base64 PCM audio data will be empty no matter how it is fetched!
        mp3_Decode Create = new mp3_Decode(new mp3_Decode.Audio(), AudioBuffer.BASE64, mp3_Decode.LocalData);
        Create.open("D:\\WindowsDataStorageFolder\\CSDN2.mp3", false);// false: convert only
        if (Create.onCreateAndStart()){
            TaskList<String> Data=Create.getPCM_dataLine().getPcmDataTaskList();
            // or
            //TaskList<String> Data=Create.getPCM_DecodeSuperclasses().getAudioBuffer().getPcmDataTaskList();
            console.success("Total data length %s".formatted("total length of PCM data segments->"+Data.getBase64Statistics()));
        }
    }
}

Ending

If you like it, please give it a like.


Origin blog.csdn.net/m0_61267721/article/details/129345078