Pitfalls when adjusting audio volume with ffmpeg

A while ago, I used Flutter together with ffmpeg to build an audio and video merging feature, and I recorded the problems I ran into.

Merge method

First, the audio and video merge command:

ffmpeg -i input.mp4 -i input.mp3 -filter_complex "[1:a]adelay=0s:all=1[a1];[a1]amix=inputs=1[amixout]" -map 0:v:0 -map "[amixout]" -c:v copy -c:a aac output.mp4

Explanation:

  • -i input.mp4: Specifies the input video file.
  • -i input.mp3: Specifies the input audio file.
  • -filter_complex "[1:a]adelay=0s:all=1[a1];[a1]amix=inputs=1[amixout]": Uses the filter_complex option for audio processing. First, adelay delays the audio stream of the input audio file ([1:a]) by 0 seconds (adelay=0s); all=1 applies that delay to all channels. The result is labeled [a1]. Then the amix filter turns [a1] into the output stream [amixout]. Note that with inputs=1 only [a1] is passed through, so the original audio of the video is not included. To actually mix it in, feed both streams to amix, for example [0:a][a1]amix=inputs=2.
  • -map 0:v:0: Maps the first video stream of the input video file to the output file.
  • -map "[amixout]": Maps the processed audio stream to the output file.
  • -c:v copy: Copies the video stream with its original encoding, that is, without re-encoding.
  • -c:a aac: Re-encodes the audio stream as AAC.
  • output.mp4: Specifies the output file.
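
As a minimal sketch of how an app might assemble this command programmatically, the argument list can be built as below. Python is used only for illustration; `build_merge_args` and its `keep_original_audio` flag are hypothetical names, not part of ffmpeg or of the original tool. The `inputs=2` variant assumes the video file actually contains an audio stream.

```python
def build_merge_args(video, audio, output, keep_original_audio=False):
    """Return an ffmpeg argument list that merges `audio` into `video`.

    Hypothetical helper for illustration; it only builds the arguments
    and does not run ffmpeg.
    """
    if keep_original_audio:
        # Assumption: the video file has an audio stream ([0:a]) to mix with.
        graph = "[1:a]adelay=0s:all=1[a1];[0:a][a1]amix=inputs=2[amixout]"
    else:
        # Same filter graph as the command above: only the external audio is used.
        graph = "[1:a]adelay=0s:all=1[a1];[a1]amix=inputs=1[amixout]"
    return [
        "ffmpeg", "-i", video, "-i", audio,
        "-filter_complex", graph,
        "-map", "0:v:0", "-map", "[amixout]",
        "-c:v", "copy", "-c:a", "aac",
        output,
    ]


args = build_merge_args("input.mp4", "input.mp3", "output.mp4")
print(" ".join(args))
```

Passing the list to something like `subprocess.run(args, check=True)` avoids shell quoting problems with the filter graph.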

Volume problem

Because the volume of the audio files varies, the tool also offers volume adjustment: adjust the volume first, then merge. This is where I ran into trouble. The command I started with was:

ffmpeg -i input.mp3 -af loudnorm=i=-16 output.mp3
  • -af loudnorm: Applies the loudnorm audio filter via the -af option. The loudnorm filter automatically adjusts the gain so that the audio reaches the specified target loudness.
  • i=-16: Sets the target integrated loudness to -16 LUFS, a value commonly recommended for music.

LUFS (Loudness Units relative to Full Scale) is an absolute unit for the overall perceived loudness of audio. It is computed from a frequency-weighted average of the signal level that approximates how the human ear perceives different frequency ranges. A LUFS value expresses loudness relative to digital full scale, and a difference of 1 LU corresponds to 1 dB.

To inspect the loudness of a file:

ffmpeg -nostats -i test.mp3 -af "ebur128=peak=true:framelog=verbose" -f null -

The result is as follows:

[Parsed_ebur128_0 @ 0000021ebcf9a880] Summary:

  Integrated loudness:
    I:         -33.1 LUFS
    Threshold: -43.5 LUFS

  Loudness range:
    LRA:         5.4 LU
    Threshold: -55.1 LUFS
    LRA low:   -38.1 LUFS
    LRA high:  -32.7 LUFS

  True peak:
    Peak:      -14.2 dBFS
  • I: Integrated (overall) loudness.
  • LRA: Loudness range.
  • True peak: The true peak level, abbreviated TP.
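
To make the numbers concrete, here is a small sketch that pulls the three values out of a summary like the one above and compares them with the -16 LUFS target. The function name `parse_ebur128_summary` is hypothetical, and the regular expressions assume the summary layout shown in this article.

```python
import re

# Summary text copied from the ebur128 output above.
SUMMARY = """\
  Integrated loudness:
    I:         -33.1 LUFS
    Threshold: -43.5 LUFS

  Loudness range:
    LRA:         5.4 LU
    Threshold: -55.1 LUFS
    LRA low:   -38.1 LUFS
    LRA high:  -32.7 LUFS

  True peak:
    Peak:      -14.2 dBFS
"""


def parse_ebur128_summary(text):
    """Extract I (LUFS), LRA (LU) and true peak (dBFS) from an ebur128 summary."""
    patterns = {
        "I": r"^\s*I:\s*(-?\d+(?:\.\d+)?) LUFS",
        "LRA": r"^\s*LRA:\s*(-?\d+(?:\.\d+)?) LU",
        "peak": r"^\s*Peak:\s*(-?\d+(?:\.\d+)?) dBFS",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            result[key] = float(match.group(1))
    return result


values = parse_ebur128_summary(SUMMARY)
target_lufs = -16.0
# Distance from the measured integrated loudness to the target, in LU (1 LU = 1 dB).
gap = round(target_lufs - values["I"], 1)
# Where the peak would land if that gap were closed with a plain, constant gain.
peak_after = round(values["peak"] + gap, 1)
print(values, gap, peak_after)
```

For this file the gap to the target is larger than the headroom below full scale, so a plain constant gain would push the peak above 0 dBFS. Reaching -16 LUFS therefore needs some form of limiting or dynamic processing.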

However, after this adjustment, some files came out with noise, some passages were too quiet, and the overall sound felt flattened. Adjusting loudness is more involved than it looks: simply specifying a target loudness does not give good results.

Loudness normalization

First, measure the audio:

ffmpeg -i test.mp3 -af loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json -f null -

Output result:

{
    "input_i" : "-33.13",
    "input_tp" : "-14.19",
    "input_lra" : "5.20",
    "input_thresh" : "-43.60",
    "output_i" : "-16.42",
    "output_tp" : "-2.00",
    "output_lra" : "4.90",
    "output_thresh" : "-28.41",
    "normalization_type" : "dynamic",
    "target_offset" : "0.42"
}

Then feed the measured values back into loudnorm:

ffmpeg -i input.mp3 -af "loudnorm=I=-16:measured_I=-33.13:measured_TP=-14.19:measured_LRA=5.20:measured_thresh=-43.6:offset=0.42:print_format=summary" output.mp3

Output result:

Input Integrated:    -33.1 LUFS
Input True Peak:     -14.2 dBTP
Input LRA:             5.2 LU
Input Threshold:     -43.6 LUFS

Output Integrated:   -15.6 LUFS
Output True Peak:     -2.0 dBTP
Output LRA:            5.5 LU
Output Threshold:    -27.9 LUFS

Normalization Type:   Dynamic
Target Offset:        -0.4 LU
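
Copying the measured values into the second pass by hand can be scripted. A minimal sketch, assuming the JSON block has already been extracted from ffmpeg's log output (loudnorm prints it among other log lines); `second_pass_filter` is a hypothetical helper name, and it only builds the filter string without running ffmpeg:

```python
import json

# First-pass measurement, copied from the JSON output above.
FIRST_PASS = """
{
    "input_i" : "-33.13",
    "input_tp" : "-14.19",
    "input_lra" : "5.20",
    "input_thresh" : "-43.60",
    "output_i" : "-16.42",
    "output_tp" : "-2.00",
    "output_lra" : "4.90",
    "output_thresh" : "-28.41",
    "normalization_type" : "dynamic",
    "target_offset" : "0.42"
}
"""


def second_pass_filter(measure_json, target_i=-16):
    """Build the loudnorm filter string for the second pass."""
    m = json.loads(measure_json)
    options = [
        f"I={target_i}",
        f"measured_I={m['input_i']}",
        f"measured_TP={m['input_tp']}",
        f"measured_LRA={m['input_lra']}",
        f"measured_thresh={m['input_thresh']}",
        f"offset={m['target_offset']}",
        "print_format=summary",
    ]
    return "loudnorm=" + ":".join(options)


filter_string = second_pass_filter(FIRST_PASS)
print(filter_string)
```

The resulting string is what goes after -af in the second command. The values are kept as strings so they are passed through exactly as ffmpeg reported them.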

Check volume:

[Parsed_volumedetect_0 @ 00000198003cfe00] n_samples: 2734011
[Parsed_volumedetect_0 @ 00000198003cfe00] mean_volume: -17.9 dB
[Parsed_volumedetect_0 @ 00000198003cfe00] max_volume: -1.5 dB
[Parsed_volumedetect_0 @ 00000198003cfe00] histogram_1db: 6
[Parsed_volumedetect_0 @ 00000198003cfe00] histogram_2db: 512
[Parsed_volumedetect_0 @ 00000198003cfe00] histogram_3db: 1871
[Parsed_volumedetect_0 @ 00000198003cfe00] histogram_4db: 4690

After this, the problem audio sounds normal. The method used above is loudness normalization: it adjusts each file so that its perceived loudness matches a common target, as shown on the right side of the figure below.
[Figure: peak normalization (left) compared with loudness normalization (right)]

Peak normalization

Now for peak (level) normalization, shown on the left side of the figure above. It scales the audio so that its maximum (peak) level reaches a specific target, and applies the same increase or decrease to the rest of the audio.

First, get the maximum volume of the audio:

ffmpeg -i input.mp3 -af volumedetect -vn -f null -

The result is as follows:

[Parsed_volumedetect_0 @ 000002422f31d580] n_samples: 2511872
[Parsed_volumedetect_0 @ 000002422f31d580] mean_volume: -35.2 dB
[Parsed_volumedetect_0 @ 000002422f31d580] max_volume: -14.2 dB
[Parsed_volumedetect_0 @ 000002422f31d580] histogram_14db: 3
[Parsed_volumedetect_0 @ 000002422f31d580] histogram_15db: 10
[Parsed_volumedetect_0 @ 000002422f31d580] histogram_16db: 15
[Parsed_volumedetect_0 @ 000002422f31d580] histogram_17db: 67
[Parsed_volumedetect_0 @ 000002422f31d580] histogram_18db: 161
[Parsed_volumedetect_0 @ 000002422f31d580] histogram_19db: 421
[Parsed_volumedetect_0 @ 000002422f31d580] histogram_20db: 1147
[Parsed_volumedetect_0 @ 000002422f31d580] histogram_21db: 3595

The maximum volume is -14.2 dB, so raise the volume by about 14 dB:

ffmpeg -i input.mp3 -af volume=14dB output.mp3
  • -af volume=14dB: The volume filter adjusts the gain of the audio to raise or lower its volume. Here it raises the volume by 14 dB.

dB is a relative unit that expresses the ratio between two signal levels or powers. In audio, it is usually used to express a gain or attenuation in volume.
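
As a quick worked example of what a dB gain means, the standard conversion from dB to a linear amplitude ratio is 10^(dB/20). The function name below is just illustrative:

```python
def db_to_amplitude_ratio(db):
    """Convert a gain in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


ratio = db_to_amplitude_ratio(14)
print(round(ratio, 2))  # about 5.01
```

So volume=14dB multiplies every sample by roughly 5.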

Adjusted volume:

[Parsed_volumedetect_0 @ 0000027108d9f240] n_samples: 2511872
[Parsed_volumedetect_0 @ 0000027108d9f240] mean_volume: -21.6 dB
[Parsed_volumedetect_0 @ 0000027108d9f240] max_volume: -1.0 dB
[Parsed_volumedetect_0 @ 0000027108d9f240] histogram_1db: 9
[Parsed_volumedetect_0 @ 0000027108d9f240] histogram_2db: 14
[Parsed_volumedetect_0 @ 0000027108d9f240] histogram_3db: 35
[Parsed_volumedetect_0 @ 0000027108d9f240] histogram_4db: 122
[Parsed_volumedetect_0 @ 0000027108d9f240] histogram_5db: 253
[Parsed_volumedetect_0 @ 0000027108d9f240] histogram_6db: 693
[Parsed_volumedetect_0 @ 0000027108d9f240] histogram_7db: 2260

Peak normalization is simple: it adjusts the volume based on the maximum level. But quiet passages stay relatively quiet and loud passages stay loud, just like turning up the volume on your phone.

That makes it a good fit for plain volume adjustment, and for our material files as well, so this is the method I chose in the end.
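
The two steps of this section (read max_volume, then apply a fixed gain) can be sketched as follows. The helper name `peak_gain_db` and the -1.0 dB default target are assumptions for illustration; the article itself simply rounded to 14 dB.

```python
import re

# Two lines copied from the volumedetect output above.
LOG = """\
[Parsed_volumedetect_0 @ 000002422f31d580] mean_volume: -35.2 dB
[Parsed_volumedetect_0 @ 000002422f31d580] max_volume: -14.2 dB
"""


def peak_gain_db(volumedetect_log, target_peak_db=-1.0):
    """Return the gain (dB) that moves max_volume to target_peak_db."""
    match = re.search(r"max_volume:\s*(-?\d+(?:\.\d+)?) dB", volumedetect_log)
    if match is None:
        raise ValueError("max_volume not found in log")
    max_volume = float(match.group(1))
    return round(target_peak_db - max_volume, 1)


gain = peak_gain_db(LOG)
# Filter string to pass to ffmpeg's -af option.
volume_filter = f"volume={gain}dB"
print(volume_filter)
```

Leaving a little headroom below 0 dB, instead of targeting 0 dB exactly, is a cautious choice because lossy encoding can raise peaks slightly.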

Optimization

If the volume is raised too much, the audio clips and distorts, which is why measuring the maximum volume first is so important. Alternatively, adjust the volume with dynamic range compression:

ffmpeg -i input.mp3 -af "compand=0|0:1|1:-90/-90|-80/-80|-70/-70|-60/-60|-50/-50|-40/-30|-30/-20|-20/-10|-10/-1:6:0:-90:0.2" output.mp3
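  • compand: Audio filter for dynamic range compression and expansion.
  • The positional parameters are, in order, attacks (0|0), decays (1|1), points, soft-knee (6), gain (0), volume (-90) and delay (0.2). The part to focus on is the points list, -90/-90|-80/-80|-70/-70|-60/-60|-50/-50|-40/-30|-30/-20|-20/-10|-10/-1.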

Each pair of values is one mapping point (points are separated by a vertical bar): the first value is the input volume and the second is the output volume. In this example there are 9 mapping points, namely (-90dB, -90dB), (-80dB, -80dB), (-70dB, -70dB), (-60dB, -60dB), (-50dB, -50dB), (-40dB, -30dB), (-30dB, -20dB), (-20dB, -10dB), (-10dB, -1dB).

For example, -40/-30 means that an input level of -40dB is raised to an output level of -30dB. Levels between two points are interpolated, so after the compand filter, signals from about -40dB to -20dB come out roughly 10dB louder. Quiet sounds at or below -50dB are left unprocessed, so their volume is not raised. In effect, this adds about 10dB.
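
To see what the points list does, the mapping can be approximated as a piecewise-linear curve. This is a simplified sketch, not ffmpeg's implementation: it ignores the soft knee and the attack and decay times, it assumes an implicit 0/0 end point, and below the first point it simply keeps that point's gain. The names `parse_points` and `transfer` are hypothetical.

```python
POINTS = "-90/-90|-80/-80|-70/-70|-60/-60|-50/-50|-40/-30|-30/-20|-20/-10|-10/-1"


def parse_points(spec):
    """Turn 'in/out|in/out|...' into a sorted list of (in_db, out_db) pairs."""
    pairs = [tuple(float(v) for v in item.split("/")) for item in spec.split("|")]
    # Assumption: add an implicit 0/0 end point if the list stops below 0 dB.
    if max(p[0] for p in pairs) < 0:
        pairs.append((0.0, 0.0))
    return sorted(pairs)


def transfer(in_db, points):
    """Piecewise-linear output level (dB) for a given input level (dB)."""
    if in_db <= points[0][0]:
        # Assumption: below the first point, keep the first point's gain.
        return in_db + (points[0][1] - points[0][0])
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= in_db <= x1:
            # Linear interpolation between two neighbouring points.
            return y0 + (y1 - y0) * (in_db - x0) / (x1 - x0)
    # Above the last point, clamp to the last output level.
    return points[-1][1]


points = parse_points(POINTS)
for level in (-60, -45, -40, -20, -5):
    print(level, "->", transfer(level, points))
```

Under these assumptions the gain ramps up between -50dB and -40dB, stays near 10dB through the middle of the range, and shrinks again close to 0dB, which is what keeps the loudest parts from clipping.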


Finally, a screenshot of the finished tool running:

[Screenshot: the final tool running]

Origin blog.csdn.net/qq_17766199/article/details/131330432