Audio output volume adjustment: principles and methods

Brief description

Typically, volume can be adjusted at two main stages of the audio pipeline: the digital domain and the analog domain. When we adjust the volume in the system settings, we are usually adjusting it at the audio device level, often in the analog domain. In this case, all audio output (including audio from different applications) is affected.

If you only want to change the volume of a specific audio stream, you can adjust it in the digital domain. This is usually done while the audio data is still in memory, before it is sent to the audio device.
At this stage, the volume is changed by scaling the amplitude of the audio samples. This affects only the modified audio stream and leaves other audio and the system volume setting untouched. This processing is usually performed in audio processing software or audio player software.

In short, by modifying the audio data you can change the volume of one specific stream without affecting the system volume or other applications.

Does volume adjustment require resampling?

Not necessarily. Volume adjustment and resampling are two separate processes.

Volume adjustment is a simple mathematical operation performed by changing the amplitude of audio data. This process does not require changing the audio sample rate. In other words, changing the volume does not involve resampling the audio signal.

However, resampling is the process of converting an audio signal from one sampling rate to another. This is usually required when you need to match the sample rate of a device or audio format, or perform some kind of frequency analysis. For example, if your audio device can only support a 44.1kHz sample rate, but your audio file is a 48kHz sample rate, then you may need to resample the audio file in order to play on your device. This process is not directly related to changes in volume.

In general, adjusting the volume of the audio output does not require resampling. However, in some cases, you may need to adjust the volume and resample at the same time, and the two processes can be performed separately.
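To make the distinction concrete, here is a minimal sketch. The function names applyGain and resampleLinear are hypothetical, and linear interpolation is used only because it is short; real resamplers normally apply proper low-pass filtering. The point is that a gain change keeps the number of samples, while resampling changes it.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Scale every sample by a gain factor. The sample count (and therefore
// the sample rate) is unchanged.
std::vector<float> applyGain(const std::vector<float>& in, float gain) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * gain;
    }
    return out;
}

// Naive mono resampler using linear interpolation (illustration only).
// The output length changes with the ratio dstRate / srcRate.
std::vector<float> resampleLinear(const std::vector<float>& in,
                                  int srcRate, int dstRate) {
    if (in.empty() || srcRate <= 0 || dstRate <= 0) return {};
    const std::size_t outLen = static_cast<std::size_t>(
        static_cast<double>(in.size()) * dstRate / srcRate);
    std::vector<float> out(outLen);
    for (std::size_t i = 0; i < outLen; ++i) {
        // Position of output sample i expressed in input samples
        const double pos = static_cast<double>(i) * srcRate / dstRate;
        const std::size_t i0 =
            std::min(static_cast<std::size_t>(pos), in.size() - 1);
        const std::size_t i1 = std::min(i0 + 1, in.size() - 1);
        const double frac = pos - static_cast<double>(i0);
        out[i] = static_cast<float>(in[i0] * (1.0 - frac) + in[i1] * frac);
    }
    return out;
}
```

With a 10 ms block of 480 samples at 48 kHz, applyGain still returns 480 samples, while resampleLinear to 44.1 kHz returns 441.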

Digital domain

The term "digital domain" refers to the environment or stage in digital signal processing where signals or data are processed in digital form. This is usually the stage before the signal or data is converted to analog form and fed into a hardware device such as speakers or headphones.

In audio processing, operations in the digital domain include, but are not limited to:

Digital filtering: Use digital filters to change the frequency response of a signal, such as to remove noise or enhance audio signals at specific frequencies.
Resampling: Changing the sampling rate of an audio signal to match the needs of a specific device or application.
Volume adjustment: Change the volume by changing the amplitude of audio data. This does not change the audio's sample rate.
Compression and encoding: Compress audio data into smaller files, or encode it into a specific format for easy storage or transmission.

These operations are completed in memory before the audio device plays the signal. Therefore, they can be performed on each audio stream independently without affecting other audio streams or the device's global settings. That's why you can change the volume of one audio in the digital domain without affecting the system's volume setting or the volume of other audio.
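The per-stream independence described above can be sketched as follows. This is an illustrative example, not a specific library's mixer: the function name mixStreams is hypothetical, and samples are assumed to be floats in the nominal range [-1.0, 1.0]. Each stream gets its own gain before the streams are summed into one output buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Mix two float streams with independent gains. Changing gainA has no
// effect on how loud stream b is, and vice versa.
std::vector<float> mixStreams(const std::vector<float>& a, float gainA,
                              const std::vector<float>& b, float gainB) {
    const std::size_t n = std::max(a.size(), b.size());
    std::vector<float> out(n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        float s = 0.0f;
        if (i < a.size()) s += a[i] * gainA;
        if (i < b.size()) s += b[i] * gainB;
        // Keep the sum inside the nominal float sample range
        out[i] = std::max(-1.0f, std::min(1.0f, s));
    }
    return out;
}
```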

Adjust audio data

As long as you know the format and sample type of the audio data, you can perform mathematical operations directly on the decoded audio data to adjust the volume, and then pass this data to the audio output device.

Note that the operation must take the sample type of the audio data into account. For example, if your audio data consists of 16-bit signed integers (int16_t), you can multiply or divide each sample value to change the volume, taking care that the result stays within the int16_t range. If your audio data is floating point (float or double), you perform the corresponding floating-point operations instead.

In addition, you also need to consider the situation of multi-channel audio. If your audio data has multiple channels (such as stereo or 5.1 surround), you need to perform the same operation on the sample values of each channel.

In general, as long as you can handle the audio data correctly, you can directly perform operations on the decoded audio data to adjust the volume without relying on other functions of the FFmpeg library. However, please note that this method requires you to have a sufficient understanding of the format and processing of audio data.
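For the 16-bit integer case mentioned above, a minimal sketch might look like this. The function name adjustVolumeS16 is hypothetical. The scaled value is clamped to the int16_t range so that a gain above 1.0 saturates instead of wrapping around. Because every channel receives the same gain, the function works unchanged on interleaved multi-channel data.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scale signed 16-bit samples in place, saturating at the int16_t limits.
void adjustVolumeS16(std::vector<int16_t>& samples, float gain) {
    for (std::size_t i = 0; i < samples.size(); ++i) {
        float scaled = static_cast<float>(samples[i]) * gain;
        if (scaled > 32767.0f) scaled = 32767.0f;
        if (scaled < -32768.0f) scaled = -32768.0f;
        // The conversion truncates toward zero
        samples[i] = static_cast<int16_t>(scaled);
    }
}
```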

A basic C++ example is provided below. Of course, the prerequisite is to obtain the decoded audio data.
Assuming that your audio data format is a 32-bit floating point number, you can write a C++ function as follows to adjust the audio volume:

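#include <cstddef>
#include <vector>

// The audio sample type is assumed to be 32-bit float
typedef float SampleType;

// Adjusts the volume of the audio data in place.
// audioData: audio samples with left and right channels interleaved, e.g. L1 R1 L2 R2 ...
// volumePercent: volume as a fraction, where 1.0 means 100%
void adjustVolume(std::vector<SampleType>& audioData, float volumePercent) {
    // Iterate over all audio samples
    for (std::size_t i = 0; i < audioData.size(); ++i) {
        // Multiply each sample by the volume factor
        audioData[i] *= volumePercent;
    }
}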

This function takes a vector storing audio data (assuming left and right channel samples are stored interleaved) and a percentage of the volume (1.0 means 100%), and then adjusts the value of each audio sample by multiplying by the percentage of the volume.

Note that if the volume percentage is greater than 1.0, the volume will be increased; if the volume percentage is less than 1.0, the volume will be decreased. If the volume percentage is 0.0, then the audio will be muted.

This function assumes that the audio data is in the format of a 32-bit floating point number, so we can directly use floating point multiplication to adjust the volume. If your audio data is in a different format, you may need to modify this function to fit your data format.
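One possible way to adapt the function to other formats is a template over the sample type. This is a sketch under stated assumptions: the name adjustVolumeGeneric is hypothetical, and only signed formats (such as int16_t, int32_t, float, double) are handled, because unsigned 8-bit audio is centered on 128 rather than 0 and would need an offset first. Integer samples are clamped to their type's range; floating-point samples are only scaled.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>
#include <vector>

template <typename Sample>
void adjustVolumeGeneric(std::vector<Sample>& audioData, double gain) {
    static_assert(std::is_signed<Sample>::value,
                  "this sketch only handles signed sample formats");
    for (std::size_t i = 0; i < audioData.size(); ++i) {
        double scaled = static_cast<double>(audioData[i]) * gain;
        if (std::is_integral<Sample>::value) {
            // Saturate integer formats instead of letting them wrap around
            const double lo =
                static_cast<double>(std::numeric_limits<Sample>::min());
            const double hi =
                static_cast<double>(std::numeric_limits<Sample>::max());
            if (scaled < lo) scaled = lo;
            if (scaled > hi) scaled = hi;
        }
        audioData[i] = static_cast<Sample>(scaled);
    }
}
```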

Adjust the volume of decoded audio data in ffmpeg

Direct manipulation of audio data

In the FFmpeg library, you can process the decoded audio samples to adjust the volume.

Here's a basic example:

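// Assume frame is a decoded audio frame (filled by avcodec_receive_frame)
AVFrame *frame;

// This loop assumes the decoder outputs planar signed 16-bit samples
// (AV_SAMPLE_FMT_S16P), where frame->data[ch] holds one channel.
// For interleaved AV_SAMPLE_FMT_S16, all channels live in frame->data[0].
// frame->channels is the older field; newer FFmpeg versions use
// frame->ch_layout.nb_channels instead.
// To halve the volume, divide every sample by 2.
for (int ch = 0; ch < frame->channels; ++ch) {
    int16_t *samples = (int16_t *)frame->data[ch];
    for (int i = 0; i < frame->nb_samples; ++i) {
        samples[i] /= 2;
    }
}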

In this example, we assume that the decoder outputs int16_t samples with a maximum value of 32767 per sample (the maximum value of a 16-bit signed integer), stored in planar layout with one data plane per channel, as the data[ch] indexing implies. To halve the volume, we loop through each audio sample and divide its value by 2.

Please note that you will need to modify this code according to the specific format and sample type of your audio data. In practical applications, you may also need to consider more complex situations, such as overflow processing, different sample types, multi-channel audio, etc.

Additionally, while this example divides all samples by 2 to halve the volume, you can also use other mathematical operations to change the volume. For example, you can multiply by a factor to adjust the volume, or use more complex algorithms to achieve more sophisticated audio effects.
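The layout and overflow concerns above can be sketched without any FFmpeg dependency. The helper below is hypothetical (scaleS16 is not an FFmpeg function); it follows the AVFrame data convention in which planar audio has one plane per channel and interleaved audio keeps all channels in the first plane. It multiplies by a gain factor and saturates at the int16_t limits.

```cpp
#include <cstdint>

// data:      array of plane pointers, as in AVFrame::data
// nbSamples: number of samples per channel
// planar:    true for one plane per channel, false for interleaved
void scaleS16(uint8_t* const* data, int nbSamples, int channels,
              bool planar, float gain) {
    const int planes = planar ? channels : 1;
    const int perPlane = planar ? nbSamples : nbSamples * channels;
    for (int p = 0; p < planes; ++p) {
        int16_t* s = reinterpret_cast<int16_t*>(data[p]);
        for (int i = 0; i < perPlane; ++i) {
            float v = static_cast<float>(s[i]) * gain;
            if (v > 32767.0f) v = 32767.0f;    // saturate instead of overflowing
            if (v < -32768.0f) v = -32768.0f;
            s[i] = static_cast<int16_t>(v);
        }
    }
}
```

With a real frame, the planar flag could be obtained from av_sample_fmt_is_planar applied to the frame's sample format, and the plane pointers from frame->extended_data.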

Audio filter

In fact, you can also use FFmpeg's audio filters to adjust the volume. FFmpeg has an audio filter called "volume", which can adjust the volume in the audio filter chain. Here's an example:

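// Sketch of the graph: abuffer (source) -> volume=0.5 -> abuffersink (sink).
// dec_ctx is the opened decoder's AVCodecContext. Error checks are omitted.
// Assumes FFmpeg 5.1 or later (ch_layout API) and the headers
// libavfilter/avfilter.h, buffersrc.h and buffersink.h.
AVFilterGraph* graph = avfilter_graph_alloc();
AVFilterContext* src_ctx = NULL;
AVFilterContext* sink_ctx = NULL;

// Describe the decoded audio to the "abuffer" source filter.
// time_base=1/sample_rate assumes frame timestamps are counted in samples.
char layout[64];
char args[256];
av_channel_layout_describe(&dec_ctx->ch_layout, layout, sizeof(layout));
snprintf(args, sizeof(args),
         "time_base=1/%d:sample_rate=%d:sample_fmt=%s:channel_layout=%s",
         dec_ctx->sample_rate, dec_ctx->sample_rate,
         av_get_sample_fmt_name(dec_ctx->sample_fmt), layout);
avfilter_graph_create_filter(&src_ctx, avfilter_get_by_name("abuffer"),
                             "in", args, NULL, graph);
avfilter_graph_create_filter(&sink_ctx, avfilter_get_by_name("abuffersink"),
                             "out", NULL, NULL, graph);

// The source's output pad feeds the start of the parsed chain
AVFilterInOut* outputs = avfilter_inout_alloc();
outputs->name = av_strdup("in");
outputs->filter_ctx = src_ctx;
outputs->pad_idx = 0;
outputs->next = NULL;

// The sink's input pad receives the end of the parsed chain
AVFilterInOut* inputs = avfilter_inout_alloc();
inputs->name = av_strdup("out");
inputs->filter_ctx = sink_ctx;
inputs->pad_idx = 0;
inputs->next = NULL;

// Insert the "volume" filter (half volume) between source and sink, then configure
avfilter_graph_parse_ptr(graph, "volume=0.5", &inputs, &outputs, NULL);
avfilter_graph_config(graph, NULL);
avfilter_inout_free(&inputs);
avfilter_inout_free(&outputs);

// Per decoded frame: av_buffersrc_add_frame(src_ctx, frame), then
// av_buffersink_get_frame(sink_ctx, filtered_frame)

In this example, we create an audio filter graph with an "abuffer" source that receives the decoded frames, a "volume" filter that cuts the volume in half, and an "abuffersink" from which the filtered frames are read. A decoder has no filter context of its own, so decoded frames are pushed into the source filter, and the frames pulled from the sink can then be passed to the encoder or to the audio output.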

This way you can use FFmpeg's audio filters to adjust the volume. This is potentially more convenient and powerful, as FFmpeg's audio filters provide many predefined audio processing functions and can be easily combined in filter chains.
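Filter chains are described by a comma-separated string, so combining filters mostly means building that string. The helper below is a hypothetical sketch (buildAudioFilterDesc is not part of FFmpeg) that produces a description using the "volume" filter and, optionally, the "aresample" filter. The resulting string is the kind of text that avfilter_graph_parse_ptr accepts.

```cpp
#include <cstdio>
#include <string>

// Build "volume=<gain>" or "volume=<gain>,aresample=<rate>".
// Pass targetSampleRate <= 0 to leave the sample rate untouched.
std::string buildAudioFilterDesc(double gain, int targetSampleRate) {
    if (gain < 0.0) gain = 0.0;  // negative gains make no sense here
    char buf[64];
    std::snprintf(buf, sizeof(buf), "volume=%.2f", gain);
    std::string desc(buf);
    if (targetSampleRate > 0) {
        desc += ",aresample=" + std::to_string(targetSampleRate);
    }
    return desc;
}
```

For example, a gain of 0.5 with a target rate of 44100 yields "volume=0.50,aresample=44100".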

Analog domain

In audio processing, the analog domain usually refers to the stage where the sound signal has been converted into an analog signal that is used to drive a physical device such as speakers or headphones.

We know that the computer processes digital signals internally, which is binary data based on 0 and 1. However, what our ears hear is a continuous, analog sound signal. Therefore, between the audio output from the computer and our ears, there must be a process of converting the digital signal into an analog signal. This process is accomplished by a device called a DAC (Digital-to-Analog Converter).

The stage after the DAC is usually called the analog domain. At this stage, the signal is a continuous voltage or current rather than a binary number. This signal can be received by an audio device (such as a speaker or headphones) and converted into sound.

When we adjust the volume in the operating system settings, we are usually changing the signal amplitude in the analog domain. The advantage of this method is that it is very straightforward to change the volume of all audio outputs. However, it also has the disadvantage that it affects all audio output, not just a specific audio stream.

In contrast, if we change the volume in the digital domain (i.e. the audio is still in the computer's memory and has not yet been converted to an analog signal by the DAC), we have more fine-grained control over which audio stream's volume is changed.

In the analog domain, audio signals have become analog electrical signals and can no longer be directly manipulated by software. At this stage, the volume can be changed by adjusting the volume control of the device (such as an audio interface or speakers).

FFmpeg is a software library for processing digital audio and video. It operates while the audio is still in the digital domain, that is, before the audio signal has been converted to an analog signal. You can use FFmpeg to change the volume of audio data, resample it, encode and decode it, and so on.

Therefore, the analog domain and the FFmpeg library belong to two different stages. FFmpeg processes audio and video data in the digital domain, while in the analog domain the volume is mainly changed by operating hardware devices.

Adjust audio device volume in SDL library

First, I'll show you how to use the SDL library to change the volume of an audio stream. In SDL, you can use the SDL_MixAudioFormat function to adjust the audio volume. The fifth parameter of this function is the volume, which ranges from 0 (silent) to SDL_MIX_MAXVOLUME (maximum volume). If you want to express the volume as a percentage, you can first convert the percentage into a fraction of SDL_MIX_MAXVOLUME.

Here is a simple function that takes an audio buffer and a volume percentage, and then adjusts the buffer's volume to the specified percentage:

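#include <SDL2/SDL.h>
#include <vector>

void adjustVolume(Uint8 *buffer, int length, int volumePercent) {
    if (buffer == NULL || length <= 0) return;

    // Ensure volumePercent is between 0 and 100
    if (volumePercent < 0) volumePercent = 0;
    if (volumePercent > 100) volumePercent = 100;

    // Convert volumePercent to SDL volume range
    int volume = (volumePercent * SDL_MIX_MAXVOLUME) / 100;

    // SDL_MixAudioFormat adds the scaled source to the destination, so keep
    // a copy of the original samples and silence the buffer before mixing
    std::vector<Uint8> original(buffer, buffer + length);
    SDL_memset(buffer, 0, length);  // 0 is silence for signed formats

    // Mix the copy back in at the requested volume
    SDL_MixAudioFormat(buffer, original.data(), AUDIO_S16LSB, length, volume);
}

In this function, buffer is the audio data to adjust, length is the length of the data in bytes, and volumePercent is the volume percentage. We first make sure volumePercent is between 0 and 100, then convert it to the SDL volume range. Because SDL_MixAudioFormat mixes (adds) the source into the destination, we copy the original samples, clear the buffer to silence, and then mix the copy back in at the requested volume.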

Note that this function assumes that the audio data consists of 16-bit signed integers stored in little-endian format (AUDIO_S16LSB). If your audio data is in a different format, you'll need to modify this function accordingly.

A final note is that this function only changes the volume of the provided audio data and does not affect the overall volume setting of the SDL audio device. If you need to change the volume of an audio device, you need to do it through the operating system or hardware device.
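Since SDL_MixAudioFormat performs addition, volume adjustment, and clipping, it helps to see what that means for a single sample. The function below is a simplified model for illustration, not SDL's actual implementation; the names kMixMaxVolume and mixSampleModel are hypothetical, and the constant mirrors the value of SDL_MIX_MAXVOLUME (128).

```cpp
#include <cstdint>

const int kMixMaxVolume = 128;  // same value as SDL_MIX_MAXVOLUME

// Simplified per-sample model: dst + src * volume / 128, clipped to int16
int16_t mixSampleModel(int16_t dst, int16_t src, int volume) {
    int mixed = dst + (src * volume) / kMixMaxVolume;
    if (mixed > 32767) mixed = 32767;
    if (mixed < -32768) mixed = -32768;
    return static_cast<int16_t>(mixed);
}
```

In this model, mixing a sample of 1000 at volume 64 into silence gives 500, whereas mixing it into a destination that already holds 1000 gives 1500. This is why the destination should be silent when the goal is to scale a stream rather than add to it.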

Adjust audio device volume in Qt framework

In Qt 5, you can use the setVolume method of the QMediaPlayer class to adjust the audio volume. This method accepts a value from 0 to 100, representing the volume as a percentage. (In Qt 6, volume is set on a QAudioOutput object instead, using a floating-point value from 0.0 to 1.0.) Here's an example function that takes a QMediaPlayer object and a volume percentage and adjusts the player's volume to the specified percentage:

#include <QMediaPlayer>

void adjustVolume(QMediaPlayer* player, int volumePercent) {
    // Ensure volumePercent is between 0 and 100
    if (volumePercent < 0) volumePercent = 0;
    if (volumePercent > 100) volumePercent = 100;

    // Adjust the volume
    player->setVolume(volumePercent);
}

In this function, player is the QMediaPlayer object whose volume you wish to adjust, and volumePercent is the volume percentage. We first make sure volumePercent is between 0 and 100 and then use the setVolume method to adjust the player's volume.

This function changes the volume of the provided QMediaPlayer object, but does not affect the volume of other QMediaPlayer objects or the overall volume setting of the system.
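The integer percentage API shown here belongs to Qt 5. In Qt 6 the volume is, to my knowledge, set through QAudioOutput::setVolume with a float between 0.0 and 1.0, so a small conversion helper may be useful when porting. The helper below is a hypothetical sketch in plain C++; the Qt call appears only in a comment and is not compiled here.

```cpp
// Convert a 0-100 percentage to a 0.0-1.0 linear volume, clamping
// out-of-range input.
float percentToLinearVolume(int volumePercent) {
    if (volumePercent < 0) volumePercent = 0;
    if (volumePercent > 100) volumePercent = 100;
    return static_cast<float>(volumePercent) / 100.0f;
}

// Assumed Qt 6 usage, where audioOutput is a QAudioOutput* attached to
// the player:
//   audioOutput->setVolume(percentToLinearVolume(50));
```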

Note that the QMediaPlayer class is part of Qt's multimedia module, which you need to add to your project. If you are using qmake, you can add the following line to your .pro file:

QT += multimedia

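If you are using CMake, you can use the following code (shown for Qt 5, which provides the QMediaPlayer::setVolume method used above; for Qt 6, replace Qt5 with Qt6):

find_package(Qt5 COMPONENTS Multimedia REQUIRED)
target_link_libraries(your_project_name PRIVATE Qt5::Multimedia)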

Audio adjustment upper limit

Whether in the digital or analog domain, volume control has an upper limit. In the digital domain, volume is usually represented as a range of values. For example, in SDL, the volume range is from 0 (silent) to SDL_MIX_MAXVOLUME (maximum volume). If you try to set a volume above this range, it will usually be limited to this range.

In the analog domain, the upper limit on volume is usually determined by the physical limitations of the hardware device. For example, the volume of speakers and headphones cannot exceed their maximum output level. If the signal is driven above this level, the device may clip the signal's amplitude, which is heard as distortion.

Also, note that between the digital and analog domains, there is a DAC (Digital-to-Analog Converter). The DAC also has its own dynamic range, and signals outside this range may be limited or distorted.

In general, the volume control has an upper limit, and the part beyond this upper limit may be limited or distorted. This upper limit may be determined by the range of digital signal processing, the dynamic range of the DAC, or the physical limitations of the audio device.
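For the digital case, the upper limit can be estimated before any gain is applied. The sketch below (maxGainWithoutClipping is a hypothetical name) scans 16-bit samples for their peak magnitude and returns the largest gain that keeps every sample within the int16_t range. Any gain above that value would clip at least one sample.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Largest gain that keeps all samples within [-32767, 32767].
// Returns 1.0 for an empty or silent buffer, where any gain is safe.
float maxGainWithoutClipping(const std::vector<int16_t>& samples) {
    int peak = 0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        // Widen to int first so that -32768 has a representable magnitude
        const int mag = std::abs(static_cast<int>(samples[i]));
        if (mag > peak) peak = mag;
    }
    if (peak == 0) return 1.0f;
    return 32767.0f / static_cast<float>(peak);
}
```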

System level volume

The SDL library and Qt library provide audio data processing and playback functions. They do not directly control the audio device volume at the operating system level. These libraries can modify the volume of audio data, but cannot directly change the system volume setting or the volume of the audio device.

If you need to change the system volume, or the volume of a specific audio device, you usually need to use the API provided by the operating system, or use a specialized library. For example, in Windows, you may need to use the Windows Core Audio API; in Linux, you may need to use the ALSA or PulseAudio API; in macOS, you may need to use the Core Audio API.

These APIs usually allow you to directly operate the operating system volume or audio device volume, but please also note that operating these APIs may require special permissions because it involves control of system resources.

System-level volume control can be performed in both the digital and analog domains. This depends on the specific system and hardware.

In the digital domain, system-level volume control can be achieved through the operating system's audio services. For example, in Windows, volume can be controlled through the Windows Audio Session API (WASAPI). In this case, the volume control is actually done in the digital signal processing stage, changing the amplitude of the digital audio data.

This digital audio data is then sent to a DAC (Digital-to-Analog Converter) to be converted into an analog signal. In the analog domain, there are also methods of volume control. For example, many audio devices, such as speakers and headphones, have hardware volume controls. In this case, volume control is achieved by varying the strength of the analog electrical signal.

In general, system-level volume control can occur in both the digital and analog domains, depending on your system and hardware. In the digital domain, it is usually implemented through the operating system's audio service. In the analog domain, it is usually achieved by operating the audio device's hardware volume control.

Adjust system volume in Windows

In Windows, system-level volume can be controlled through the Core Audio APIs. Here is a simple example showing how to adjust the system volume in C++ using the IAudioEndpointVolume interface of the EndpointVolume API:

#include <Windows.h>
#include <mmdeviceapi.h>
#include <endpointvolume.h>

// Function to adjust system volume
void setSystemVolume(int volumePercent) {
    // Ensure volumePercent is between 0 and 100
    if (volumePercent < 0) volumePercent = 0;
    if (volumePercent > 100) volumePercent = 100;

    // Convert volumePercent to float scale (0.0 to 1.0)
    float volume = volumePercent / 100.0f;

    // Initialize COM; give up if it cannot be initialized
    HRESULT hr = CoInitialize(NULL);
    if (FAILED(hr)) return;

    IMMDeviceEnumerator *deviceEnumerator = NULL;
    IMMDevice *defaultDevice = NULL;
    IAudioEndpointVolume *endpointVolume = NULL;

    // Create the device enumerator
    hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_INPROC_SERVER,
                          __uuidof(IMMDeviceEnumerator), (LPVOID *)&deviceEnumerator);

    // Get default audio device
    if (SUCCEEDED(hr)) {
        hr = deviceEnumerator->GetDefaultAudioEndpoint(eRender, eConsole, &defaultDevice);
    }

    // Get volume control
    if (SUCCEEDED(hr)) {
        hr = defaultDevice->Activate(__uuidof(IAudioEndpointVolume), CLSCTX_INPROC_SERVER,
                                     NULL, (LPVOID *)&endpointVolume);
    }

    // Set volume
    if (SUCCEEDED(hr)) {
        hr = endpointVolume->SetMasterVolumeLevelScalar(volume, NULL);
    }

    // Clean up whatever was acquired
    if (endpointVolume) endpointVolume->Release();
    if (defaultDevice) defaultDevice->Release();
    if (deviceEnumerator) deviceEnumerator->Release();
    CoUninitialize();
}

In this function, we first make sure volumePercent is between 0 and 100 and then convert it to a floating-point range of 0.0 to 1.0. Next, we get the default audio device and the volume control for that device. Then we use the IAudioEndpointVolume::SetMasterVolumeLevelScalar function to set the volume. Finally, we release all COM objects and uninitialize COM.

Please note that in order to compile and run this code, you need to include the Windows SDK in your project and link against ole32.lib, which provides the COM functions used here (CoInitialize, CoCreateInstance, CoUninitialize).

In addition, this function will change the volume of the entire system, affecting all audio streams. If you only want to change the volume of a specific audio stream, you need to use a different approach, such as modifying the amplitude of the audio data in the digital domain as I mentioned before.

The volume set this way is the system-level (endpoint) volume of the output device, so it affects every audio stream sent to that device, including all applications and system sounds. The volume shown by the volume icon in the taskbar changes accordingly.

However, please note that if you change the volume of audio data within a specific application (such as through SDL or other audio libraries), this change will usually not be reflected on the taskbar volume icon. This is because this method only changes the specific audio stream and does not affect the system-level volume settings.

Adjust system volume in Linux

On Linux, audio devices are commonly controlled through ALSA or PulseAudio. Below is an example that uses ALSA to adjust the system volume. It requires ALSA's development libraries (on most Linux distributions, the package is named libasound2-dev or alsa-lib-devel).

#include <alsa/asoundlib.h>

// Function to adjust system volume
void setSystemVolume(int volumePercent) {
    // Ensure volumePercent is between 0 and 100
    if (volumePercent < 0) volumePercent = 0;
    if (volumePercent > 100) volumePercent = 100;

    // Open the mixer and attach it to the default sound card
    snd_mixer_t *handle = NULL;
    if (snd_mixer_open(&handle, 0) < 0) return;
    if (snd_mixer_attach(handle, "default") < 0 ||
        snd_mixer_selem_register(handle, NULL, NULL) < 0 ||
        snd_mixer_load(handle) < 0) {
        snd_mixer_close(handle);
        return;
    }

    // Get mixer element
    snd_mixer_selem_id_t *sid;
    snd_mixer_selem_id_alloca(&sid);
    snd_mixer_selem_id_set_index(sid, 0);
    snd_mixer_selem_id_set_name(sid, "Master");
    snd_mixer_elem_t *elem = snd_mixer_find_selem(handle, sid);
    if (elem == NULL) {
        // This card has no element named "Master"
        snd_mixer_close(handle);
        return;
    }

    // Convert volumePercent to ALSA volume range
    long minv = 0, maxv = 0;
    snd_mixer_selem_get_playback_volume_range(elem, &minv, &maxv);
    long volume = (volumePercent * (maxv - minv) / 100) + minv;

    // Set volume
    snd_mixer_selem_set_playback_volume_all(elem, volume);

    // Clean up
    snd_mixer_close(handle);
}

In this function, we first make sure volumePercent is between 0 and 100, then open ALSA's mixer and look up the mixer element named "Master". We then get the volume range of the element and convert volumePercent to a value in this range. Finally, we use the snd_mixer_selem_set_playback_volume_all function to set the volume and then close the mixer.
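The percentage-to-range conversion can be isolated into a small helper, which makes it easy to check without sound hardware. This is a hypothetical sketch (percentToRange is not an ALSA function); it assumes maxv is not smaller than minv and, unlike the truncating expression in the function above, rounds to the nearest raw step.

```cpp
// Map a 0-100 percentage onto the raw mixer range [minv, maxv],
// rounding to the nearest step. Assumes maxv >= minv.
long percentToRange(int percent, long minv, long maxv) {
    if (percent < 0) percent = 0;
    if (percent > 100) percent = 100;
    return minv + (static_cast<long>(percent) * (maxv - minv) + 50) / 100;
}
```

For a mixer element whose range is 0 to 65536, 50 percent maps to 32768; for a range of 0 to 31, 33 percent maps to step 10.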

Please note that this function changes the system-wide volume, affecting all audio streams. If you only want to change the volume of a specific audio stream, you need to use a different method, such as modifying the amplitude of the audio data in the digital domain.

In addition, this function assumes that your system uses ALSA to manage audio devices, and "Master" is the mixing element that controls the master volume. This may vary on different systems or configurations. You may need to modify this function depending on your system configuration.