Talking about Sound Files

【Preface】

Some time ago, a colleague ran into a very difficult client. The difficulty was that the client asked her to provide song files of 100MB-200MB or more each. My colleague does not know much about audio formats, so an endless back-and-forth began about FLAC, WAV, and file sizes. In the end, she was never able to explain clearly to the client what was going on.

Afterwards, a few other things happened that made me realize how many practitioners around me in the music industry understand extremely little about music, including even basic music-related knowledge. What is more, this knowledge is not valued at all, and practitioners do not even feel the urge to learn it, which makes me very sad. It seems that music is nothing more than a product, and that we practitioners only need to stock the shelves, assign codes to the products, and recommend them to users based on big data from their purchase records, without caring at all why users like a particular brand or what distinguishes these products, serving users with nothing but cold data.

Therefore, I think it is necessary to write something. I don't expect practitioners to become people who truly love music. I just hope that even if you still think of "her" only as a commodity, you can at least know what you are selling.

PS: This first installment is about media files. Because it involves a fair number of technical points it may seem a bit dry, but if you read it patiently you will find it is actually easy to understand, and this basic knowledge will do a lot to improve your skills. Please look forward to more interesting content on records, music styles, and so on, which I will publish soon.

【Main Text】

Bitrate, Samplerate, Lossless, MP3, FLAC, APE, 320kbps, 192kbps, 128kbps, 44.1kHz, CBR, VBR. Does this pile of terms feel both familiar and strange?

The higher the bit rate, the better the sound quality, and lossless music is the highest quality of all: is that really true? Let's start with how sound is captured.

【Audio Composition】

What we call audio today is digital audio. Digital audio is described by three parameters: sampling frequency, sampling precision, and number of channels.

Sampling frequency: the number of samples taken per second when recording sound, expressed in hertz (Hz).

Sampling precision: the resolution with which the amplitude of each sample is recorded, which determines the dynamic range of the recording; expressed in bits.

Channels: the number of sound channels (1-8).

<img src="https://pic1.zhimg.com/50/7745e85fed03c093addc424d5b437e9a_hd.jpg" data-rawwidth="587" data-rawheight="217" class="origin_image zh-lightbox-thumb" width="587" data-original="https://pic1.zhimg.com/7745e85fed03c093addc424d5b437e9a_r.jpg">

In layman's terms, we can think of a sound wave as a curve. A curve is made up of points: the sampling rate is the number of points taken along each second of the horizontal (time) axis in the figure above, and the sampling precision is the number of levels available along the vertical (dynamic range) axis. The finer the resolution in these two dimensions, the more faithfully the sound is reproduced, the better the sound quality, and, of course, the larger the audio file. What my colleague's client was asking for was Hi-Res Audio, the latest audio format released by SONY: 192kHz/24bit, 6-channel audio files, which in a lossless format will of course exceed 200 megabytes.
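
To make the link between these three parameters and file size concrete, here is a minimal Python sketch. The function name `pcm_size_bytes` is made up for illustration, and the calculation assumes raw, uncompressed audio with no file header.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size in bytes of raw, uncompressed audio (whole seconds only)."""
    bits = sample_rate_hz * bit_depth * channels * seconds
    return bits // 8  # 8 bits per byte

# One minute of CD-style audio: 44.1 kHz, 16-bit, 2 channels.
cd_minute = pcm_size_bytes(44_100, 16, 2, 60)

# One minute at the Hi-Res figures quoted above: 192 kHz, 24-bit, 6 channels.
hires_minute = pcm_size_bytes(192_000, 24, 6, 60)

for label, size in (("CD", cd_minute), ("Hi-Res", hires_minute)):
    print(f"{label}: {size} bytes, about {size / 1024 / 1024:.1f} MB per minute")
```

Under these assumptions a single uncompressed minute at 192kHz/24bit/6 channels is already close to 200 MB, roughly twenty times the CD figure.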

Typical sampling rates by use are roughly as follows (k means thousand, 1kHz = 1000Hz):

8kHz: Used for telephones and the like; sufficient for recording the human voice.

22.05kHz: Used for radio broadcasting.

44.1kHz: Audio CD.

48kHz: Used in DVD and digital TV.

96kHz-192kHz: DVD-Audio, Blu-ray high-definition, etc.

Sampling precision commonly ranges from 8bit to 32bit; CDs generally use 16bit.

At this point some readers may be getting confused: if it is not the bit rate that determines sound quality, why does everyone say that 320kbps sounds better than 128kbps?

【Audio Compression】

In fact, the bit rate belongs to another dimension altogether: it describes how an audio file has been compressed.

At present, most of the audio formats we commonly use are derived from the "WAV" files that hold the original audio of a CD (sampling rate 44.1kHz, sampling precision 16bit, 2 channels). The originally recorded sound data is stored as an array of samples, which is the PCM format. WAV is a file format developed by Microsoft whose job is to wrap PCM data so that it can be played back.

Since the data in a WAV file is essentially a complete copy of the PCM data, other encoding formats such as the lossless formats, MP3, and AAC are basically produced by recompressing WAV files. We can therefore simply think of WAV as the original audio format and of the other audio formats as compressed formats.
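
As a rough illustration of "WAV wraps PCM", the sketch below uses Python's standard `wave` module to wrap one second of raw samples in a WAV file held in memory. The silent samples are made-up test data, and the 44-byte header holds for the plain PCM files this module writes, not necessarily for every WAV file in the wild.

```python
import io
import struct
import wave

SAMPLE_RATE = 44_100  # CD sampling rate in Hz
SAMPLE_WIDTH = 2      # 16 bits = 2 bytes per sample
CHANNELS = 2

# Hypothetical raw PCM data: one second of silence, left/right interleaved.
frames = struct.pack("<h", 0) * (SAMPLE_RATE * CHANNELS)

buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(CHANNELS)
    wav.setsampwidth(SAMPLE_WIDTH)
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(frames)

wav_bytes = buf.getvalue()
header_size = len(wav_bytes) - len(frames)

print(wav_bytes[:4], wav_bytes[8:12])          # container identifiers
print("header bytes:", header_size)            # everything else is PCM
print("PCM unchanged:", wav_bytes[header_size:] == frames)
```

The PCM samples come back out byte for byte; the only thing the file adds is a small header describing the sampling rate, precision, and channel count.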

Compression cannot be separated from storage and transmission: the whole purpose of compression is to store and transmit data more efficiently. So before we talk about compression, we need some understanding of the basic units computers use.

We all know that computers work in binary, and that files stored in a computer are made up of two digits, 0 and 1. Data is therefore transmitted digit by digit, and each digit is called 1 "bit". For example, a piece of audio might consist of the data "0,1,1,1,0,1,1,0", and when it is transmitted these digits are sent one after another. The sampling precision mentioned above is measured in this unit.

The storage unit of a computer is the "byte". A byte consists of 8 bits, that is, 8b (bit) = 1B (Byte). By convention, transmission rates are expressed with decimal prefixes (1kb = 1000b), while file sizes are usually counted in binary multiples, so 1KB = 1024B = 1024×8b. Hard drive manufacturers, however, label capacity in decimal units, which is part of the reason the capacity printed on a hard drive does not match what the computer reports.

Coming back to audio compression: the bit rate is really a measure of how heavily the audio has been compressed. Strictly speaking, the bit rate only determines the size of the file, but under normal conditions a larger file has lost less data, so its sound quality is relatively higher. The bit rate by itself has no direct effect on quality. For example, if we take a 128kbps file as the source and convert it to 320kbps, the result will still sound no better than the 128kbps source.

So what exactly do the numbers and letters in a bit rate mean? Take 128k, whose full name is "128kbps", and break it down: 128 is the number, k means thousand, b is the unit, s is seconds, and ps is simply "/s". So 128kbps is 128kb/s, that is, 128 kilobits per second.

Note that the b here is a lowercase b, meaning bit. Knowing this, we can work out how much storage a 128kbps file occupies: 128 × 1000 = 128,000 b/s; ÷ 8 = 16,000 B/s; ÷ 1024 = 15.625 KB/s; × 60 = 937.5 KB/min; ÷ 1024 ≈ 0.9155 MB/min. A 128kbps audio file therefore takes up about 0.92MB (roughly 938KB) per minute, which is why a 128kbps mp3 comes to about 1MB per minute. You can verify this with files on your own machine.
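
The same arithmetic can be written as a small Python sketch. The function name `size_per_minute_mb` is made up for illustration; the units follow the convention used in the calculation above (1 kb = 1000 bits, 1 KB = 1024 bytes, 1 MB = 1024 KB).

```python
def size_per_minute_mb(kbps):
    """Approximate size in MB of one minute of audio at the given bit rate."""
    bits_per_second = kbps * 1000             # k = 1000 for bit rates
    bytes_per_second = bits_per_second / 8    # 8 bits per byte
    kb_per_second = bytes_per_second / 1024   # 1 KB = 1024 bytes
    kb_per_minute = kb_per_second * 60
    return kb_per_minute / 1024               # 1 MB = 1024 KB

for rate in (128, 192, 320):
    print(f"{rate} kbps: about {size_per_minute_mb(rate):.2f} MB per minute")
```

Multiplying the result by the length of a song in minutes gives a quick estimate to compare against the real file size.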

Before talking about lossy and lossless, there are two more terms to explain, CBR and VBR, which we see when encoding MP3. CBR is Constant Bit Rate; VBR is Variable Bit Rate. In theory, VBR adjusts the bit rate automatically according to the actual frequency content of the source audio, so that a smaller file achieves the same effect as a given constant bit rate.

Now for lossy and lossless. Put simply, lossy compression shrinks the data by deleting the less important parts of it, while lossless compression shrinks the data by arranging it more efficiently. These methods involve deeper technical knowledge, so we will not go into detail, but you can look at it roughly this way: lossy compression is like deleting the unimportant particles from an article, and once deleted that content cannot be restored after decompression; lossless compression works like re-typesetting the article, and after decompression the complete WAV data is recovered, just as with the WinZip and WinRAR tools we all use.
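
The difference can be sketched in a few lines of Python. This is only an analogy under loud assumptions: `zlib` is a general-purpose compressor in the spirit of the WinZip comparison above, not an audio codec, and throwing away the low 8 bits of each sample is a crude stand-in for what a real lossy encoder does. The sample values are made-up test data.

```python
import struct
import zlib

# Hypothetical 16-bit samples: a short repeating ramp.
samples = [((i % 200) - 100) * 300 for i in range(2000)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

# "Lossless": compress, decompress, and get exactly the same bytes back.
packed = zlib.compress(pcm)
restored = zlib.decompress(packed)
print("lossless:", len(pcm), "->", len(packed), "bytes, identical:", restored == pcm)

# "Lossy": discard the low 8 bits of every sample to save space.
lossy = [(s >> 8) << 8 for s in samples]
max_error = max(abs(a - b) for a, b in zip(samples, lossy))
print("lossy: identical:", lossy == samples, "max error:", max_error)
```

Storing the lossy values in a bigger container afterwards does not bring the discarded bits back, which is the same reason converting a low bit rate file to a higher bit rate cannot improve it.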

Among the lossless formats, APE (Monkey's Audio) and FLAC (Free Lossless Audio Codec) are the most commonly used at present. APE achieves a somewhat lower bit rate, while FLAC is easier to distribute. The difference is that FLAC data can be used as it arrives, even if the transfer is interrupted. For example, if we download a piece of music in APE format, we must wait for all of the data to arrive before playing it. FLAC is different: if only 1/3 has been downloaded, you can already play that 1/3.

Having read this far, you may already have wondered: since WAV is also an encoding format, does it have a bit rate too? That's right. The bit rate of a standard WAV file is 1411kbps, and lossless compression brings it down to roughly 900-1000kbps depending on the content of the source file. You can work out the corresponding file sizes yourself.
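
The 1411 figure follows directly from the CD parameters, as this short Python sketch shows. The helper name `pcm_bitrate_kbps` is made up for illustration, and 900 and 1000 are simply the two ends of the range quoted above.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed audio in kbps (k = 1000)."""
    return sample_rate_hz * bit_depth * channels / 1000

wav_kbps = pcm_bitrate_kbps(44_100, 16, 2)
print(f"WAV (44.1 kHz, 16-bit, stereo): {wav_kbps} kbps")

# How much of the original size is left after lossless compression.
for lossless_kbps in (900, 1000):
    ratio = lossless_kbps / wav_kbps
    print(f"{lossless_kbps} kbps is {ratio:.0%} of the WAV bit rate")
```

In other words, a lossless file in that range is around two thirds of the size of the WAV it was made from.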

【Differences Between the Encoding Formats on the Market】

We often see claims that 64kbps AAC (the audio format used by Apple) sounds about as good as 128kbps MP3 at only half the size. Microsoft's WMA files are also relatively small. So why is MP3 still the mainstream audio format?

I have not studied this question in depth, but pulling together what can be found online, the reasons may be as follows:

1. MP3 was the earliest audio coding standard to become popular on the Internet. Ingrained user habits and near-universal decoder support give it an advantage.

2. Different encoding methods have their strengths at different bit rates. In the 192kbps-224kbps range, MP3 still holds a strong advantage in sound quality.

3. From Napster's free MP3 downloads to support in every major portable player, MP3 spread very widely. AAC, which came later, never had a comparable opportunity to spread on a large scale, which is why it has not become mainstream in more than ten years.

PS: AAC and MP3 actually both come from the MPEG family of standards. AAC was designed from the start as the successor to MP3.

In addition, the following test results from netizens can serve as a reference:

Advantage range of OGG: above 96K

Advantage range of AAC: AAC LC at 256K and above (inclusive); AAC HE at 48K-96K

Advantage range of Mp3: 192K and above (inclusive)

Advantage range of WMA: 128K and below (inclusive)



Personal impressions of lossy formats:
For any lossy format, the higher the bit rate the better. At a high bit rate, not only is waveform distortion small, frequency attenuation is small too.
If the lossy audio formats must be ranked by how much of the spectrum they lose (Mp3 means CBR, AAC means LC), then at the same bit rate (CBR):
Above 320K: OGG and AAC are almost lossless
320K: OGG=AAC>Mp3>WMA
256K: OGG>AAC>Mp3>WMA
224K: OGG>Mp3>AAC>WMA
192K: OGG>Mp3>WMA>AAC
128K: OGG>WMA>AAC>Mp3
96K: AAC(HE)>OGG>WMA>Mp3 (OGG is severely distorted at this point, so by ear it may be AAC(HE)>WMA>OGG>Mp3)
64K: AAC(HE)>OGG>WMA>Mp3 (OGG is severely distorted at this point, so by ear it may be AAC(HE)>WMA>OGG>Mp3)
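| No. | Format | Specification | Actual bit rate (kbps) | Actual size (MB) | Highest frequency (kHz) | Distortion | Degree of distortion |
|---|---|---|---|---|---|---|---|
| 1 | Wave | Wave | 1411 | 46.1 | 22+ | none | none |
| 2 | APE | Fast | 960 | 31.3 | 22+ | none | none |
| 3 | APE | Insane | 936 | 30.6 | 22+ | none | none |
| 4 | FLAC | V0 | 1030 | 33.5 | 22+ | none | none |
| 5 | FLAC | V8 | 969 | 31.6 | 22+ | none | none |
| 6 | WavPack | Normal | 970 | 31.6 | 22+ | none | none |
| 7 | WavPack | Ultra | 953 | 31.1 | 22+ | none | none |
| 8 | Mp3 | CBR | 320 | 10.4 | 21.5 | yes | small |
| 9 | Mp3 | CBR | 256 | 8.36 | 20.9 | yes | medium |
| 10 | Mp3 | CBR | 224 | 7.31 | 19.6 | yes | medium |
| 11 | Mp3 | CBR | 192 | 6.27 | 19.6 | yes | medium |
| 12 | Mp3 | CBR | 128 | 4.18 | 15.5 | yes | large |
| 13 | Mp3 | CBR | 96 | 3.13 | 12.2 | yes | large |
| 14 | Mp3 | CBR | 64 | 2.08 | 8.8 | yes | large |
| 15 | Mp3 | VBR V0 | 273 | 8.93 | 19.2 | yes | small |
| 16 | Mp3 | VBR V2 | 221 | 7.23 | 18.5 | yes | large |
| 17 | Mp3 | VBR V4 | 159 | 5.22 | 16.9 | yes | medium |
| 18 | Mp3 | VBR V6 | 130 | 4.26 | 15.5 | yes | large |
| 19 | Mp3 | VBR V8 | 100 | 3.29 | 12.8 | yes | very large |
| 20 | Mp3-pro | CBR | 96 | 3.13 | 22+ | yes | medium |
| 21 | Mp3-pro | CBR | 64 | 2.09 | 18.1 | yes | large |
| 22 | WMA | CBR | 320 | 10.4 | 20.2 | yes | small |
| 23 | WMA | CBR | 256 | 8.39 | 20.3 | yes | small |
| 24 | WMA | CBR | 224 | no data | no data | no data | no data |
| 25 | WMA | CBR | 192 | 6.3 | 18.7 | yes | medium |
| 26 | WMA | CBR | 128 | 4.2 | 16.1 | yes | large |
| 27 | WMA | CBR | 96 | 3.16 | 13.6 | yes | large |
| 28 | WMA | CBR | 64 | 2.11 | 11.7 | yes | large |
| 29 | AAC | LC448 | 443 | 14.5 | 22+ | yes | small |
| 30 | AAC | LC320 | 316 | 10.4 | 22+ | yes | small |
| 31 | AAC | LC256 | 253 | 8.35 | 21.4 | | |
| 32 | AAC | LC224 | 221 | 7.31 | 18.4 | | |
| 33 | AAC | LC192 | 190 | 6.27 | 18.2 | | |
| 34 | AAC | LC128 | 126 | 4.19 | 15.9 | | |
| 35 | AAC | HE96 | 94 | 3.13 | 20.4 | | |
| 36 | AAC | HE64 | 63 | 2.1 | 20.4 | | |
| 37 | OGG | Q10 | 499 | 16.3 | 22+ | | |
| 38 | OGG | Q9 | 334 | 10.9 | 22+ | | |
| 39 | OGG | Q8 | 257 | 8.41 | 22+ | | |
| 40 | OGG | Q7 | 225 | 7.38 | 22+ | | |
| 41 | OGG | Q6 | 194 | 6.35 | 21.6 | | |
| 42 | OGG | Q4 | 133 | 4.35 | 19.2 | | |
| 43 | OGG | Q2 | 93 | 3.04 | 16.8 | | |
| 44 | OGG | Q0 | 64 | 2.05 | 15.6 | | extremely large |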
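For variable bit rates (VBR), please cross-compare using the table above.
In addition, each format has its own advantageous bit rate range:
Advantage range of OGG: above 96K
Advantage range of AAC: AAC LC at 256K and above (inclusive); AAC HE at 48K-96K (AAC HE is really strong)
Advantage range of Mp3: 192K and above (inclusive)
Advantage range of WMA: 128K and below (inclusive)
If your device supports them, use OGG at 128K and above (inclusive) and AAC (HE) at 64K-96K.
If your device supports only WMA and Mp3, use Mp3 at 192K and above (inclusive) and WMA at 128K and below (inclusive).
Below 64K every format sounds poor, so please keep the bit rate at 64K or higher.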
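
Read as a decision rule, the netizen's advice above could be sketched like this in Python. The function name `suggest_format` and the format labels are made up for illustration, and the order in which the rules are tried is an assumption; where the advice says nothing (for example 160K on a device that only plays Mp3 and WMA), the sketch returns `None` rather than guessing.

```python
def suggest_format(kbps, supported):
    """Suggest a lossy format for a target bit rate, or None if no advice applies.

    `supported` is the set of formats the device can play, using the
    hypothetical labels "OGG", "AAC-HE", "MP3" and "WMA".
    """
    if kbps < 64:
        return None  # the advice is to stay at 64K or above
    if kbps >= 128 and "OGG" in supported:
        return "OGG"
    if 64 <= kbps <= 96 and "AAC-HE" in supported:
        return "AAC-HE"
    if kbps >= 192 and "MP3" in supported:
        return "MP3"
    if kbps <= 128 and "WMA" in supported:
        return "WMA"
    return None

everything = {"OGG", "AAC-HE", "MP3", "WMA"}
basic = {"MP3", "WMA"}
for rate in (64, 128, 192):
    print(rate, suggest_format(rate, everything), suggest_format(rate, basic))
```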
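In terms of frequency range alone:
For a theoretically normal person, the hearing range is roughly 50Hz-20KHz. In that case your choices are:
Mp3 CBR: 224K and above (inclusive)
WMA: 224K and above (inclusive)
AAC LC: 256K and above (inclusive); AAC HE: 48K and above (inclusive)
OGG: 192K and above (inclusive)
Mp3-pro: 80K and above (inclusive)
Mp3 VBR: fails on high frequencies!
For an ordinary music fan, the hearing range is roughly 1KHz~16KHz. In that case your choices are:
Mp3 CBR: 192K and above (inclusive)
Mp3 VBR: above V6 encoding, that is, above 128K
WMA: 128K and above (inclusive)
AAC LC: 128K and above (inclusive); AAC HE: 48K and above (inclusive)
OGG: 96K and above (inclusive)
Mp3-Pro: 56K and above (inclusive)
Then of course there are the golden ears, whose hearing range is roughly 20Hz~22KHz. In that case your choices are:
First of all lossless, of course; among the lossy formats you can try:
Mp3 CBR: 224K and above (inclusive)
WMA: 224K and above (inclusive)
AAC LC: 256K and above (inclusive); AAC HE: 48K and above (inclusive)
OGG: 192K and above (inclusive)
Mp3-Pro: 80K and above (inclusive)
Mp3 VBR: fails on high frequencies!
Of course, everyone has their own impressions.
For example, the low frequencies of Mp3 CBR are a bit weak, so classical music sounds stiff, and so on.
Impressions like these are something each of you will have to work out slowly for yourselves!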
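Appendix: how each frequency band stimulates the human ear, that is, the listening impression
16K~20KHz:
This frequency range can in fact no longer be heard by the ear itself, because the highest frequency the human ear can hear is 15.1KHz. However, sound waves of 16~20KHz can be conducted through the body and the bones of the head and skull to the auditory area of the brain, so their presence is still felt. This range affects the charm, color, and emotional feel of the timbre. If the frequency response of an audio system does not reach this range, the charm of the timbre is lost; if this range is too strong, it gives a kind of "cosmic sound" impression, an illusion, something mysterious and unfathomable, and leaves the listener feeling unsettled. Because most of these frequencies are inharmonic partials of the fundamental, they produce a sense of instability. This range has very low intensity within the timbre, but it is important: it is the expressive part of the timbre and also the part people most often overlook, and some people cannot sense its presence at all.
12K~16KHz:
These are high frequencies that the ear can hear, and they are the most expressive part of the timbre. This is the high-frequency overtone band of high-pitched instruments and high-pitched percussion, such as cymbals, bells, tambourines, maracas, wire brushes, and triangles, and it can give a "golden, radiant" impression that strongly conveys the character of each instrument. If this band is lacking, the timbre loses its color and character; if it is too strong, for example when an exciter is driven too hard, the timbre picks up burr-like, harsh, piercing high-frequency noise, so this band should be attenuated appropriately.
10K~12KHz:
This is the high-frequency overtone band of high-pitched woodwind and brass instruments; the metallic sound of high wind instruments such as the flute, oboe, trumpet, and piccolo is very strong here. If this band is lacking, the timbre loses its luster and character; if it is too strong, it produces a sharp, piercing sensation.
8K~10KHz:
The "s" sound is very prominent in this band, which affects the clarity and transparency of the timbre. If this band is lacking, the timbre becomes flat and dull; if there is too much of it, the timbre becomes shrill.
6K~8KHz:
This band affects the brightness of the timbre. The ear is sensitive to these frequencies, and they affect clarity. If this band is lacking, the timbre becomes dim; if it is too strong, sibilance becomes severe.
5K~6KHz:
This band has the greatest effect on the clarity and intelligibility of speech. If it is insufficient, the timbre sounds muffled and indistinct; if it is too strong, the timbre becomes sharp and easily causes listening fatigue.
4K~5KHz:
This band affects the apparent loudness of instruments. If the amplitude in this band is larger, the instrument sounds louder; if the intensity in this band is reduced, the instrument seems to move farther from the listener; if it is raised, the instrument seems to move closer.
4KHz:
This frequency is highly penetrating. The resonant frequency of the ear canal is 1K~4KHz, so the ear is very sensitive to this frequency. If there is too little of it, audibility suffers and speech becomes indistinct. If it is too strong, it produces a coughing sound, like the cough-like noise an announcer's voice makes when a radio is not tuned precisely to the station.
2K~3KHz:
This is the band to which the brightness of the sound is most sensitive. If this band is rich, the brightness of the timbre is enhanced; if its amplitude is insufficient, the timbre becomes hazy; if it is too strong, the timbre sounds rigid, hard, and unnatural.
1K~2KHz:
This range gives a clear sense of openness and smoothness. If it is lacking, the timbre becomes loose and disjointed; if it is too strong, the timbre has a jumpy feel.
800Hz:
The amplitude at this frequency affects the strength of the timbre. If this frequency is full, the timbre sounds strong and powerful; if it is insufficient, the timbre sounds slack, that is, the content below 800Hz stands out and the low frequencies become prominent; if there is too much of it, it produces a throaty sound. Everyone has a throat cavity and a certain amount of throatiness in their voice, but if the throaty component of the timbre is too strong, the voice loses its individuality and the timbre loses its beauty. Sound engineers therefore call this the "dangerous frequency" and use it with caution.
500Hz~1KHz:
This is the fundamental frequency region of the human voice and an important range. If this band is full, the contour of the voice is clear and the overall impression is good; if its amplitude is insufficient, speech sounds shrunken; if it is too strong, speech seems to protrude forward, as if it were reaching the ear early.
300Hz~500Hz:
This is the main register of speech. When this band is full, speech has strength. If its amplitude is insufficient, the sound is hollow and not solid; if it is too strong, the timbre becomes monotonous: relatively speaking there is less low-frequency and less high-frequency content, and speech takes on the flat timbre of a voice on the telephone.
150Hz~300Hz:
This band affects the strength of the sound, especially of male voices. It is the low fundamental range of the male voice and also the root-note range of chords in music. If this band is lacking, the timbre sounds soft and floaty, and speech becomes limp; if it is too strong, the sound becomes stiff, unnatural, and characterless.
100Hz~150Hz:
This band affects the fullness of the timbre. If it is boosted, it produces a sense of space, like room resonance, and a sense of richness; if it is lacking, the timbre becomes thin and pale; if it is too strong, the timbre becomes muddy and the clarity of speech deteriorates.
60Hz~100Hz:
This band affects the richness of the sound and is the fundamental region of the bass. If this band is full, the timbre sounds thick and rich. If it is insufficient, the timbre becomes weak; if it is too strong, low-frequency resonance appears and the sound booms.
20Hz~60Hz:
This band affects the sense of space of the timbre, because most musical fundamentals lie above it. It is the resonant frequency range of rooms and halls. If this band is well represented, it gives the feeling of being in a large hall; if it is lacking, the timbre becomes hollow; if it is too strong, it produces a humming low-frequency resonance that seriously impairs the clarity and intelligibility of speech.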

Origin blog.csdn.net/tianhai110/article/details/79213496