Introduction to swr_convert audio resampling

When doing audio processing, we sometimes need to adjust the sample rate or sample format of an audio stream. Perhaps the speaker does not support a 48000 Hz sample rate, so it has to be reduced to 44100 Hz. Or, for various business reasons, the sample rate, sample format, or channel layout needs to be adjusted.

FFmpeg provides the swr_convert() function to do exactly this.

Note that adjusting the sample rate does not affect the playing time of the audio stream: a 10-minute audio file still plays for 10 minutes after the sample rate is raised or lowered.

That said, the resampler does support adjusting the playback time, which will be discussed later.


The following demonstrates the usage of swr_convert() through a code example. The code can be downloaded from GitHub, and the build environment is Qt 5.15.2 with MSVC2019_64bit.

This code example mainly reduces the sample rate of the audio stream from 48000 to 44100, converts the sample format from fltp to s64, and keeps the channel layout unchanged.

int tgt_fmt = AV_SAMPLE_FMT_S64;
int tgt_freq = 44100;

The key code is as follows:

The API function call flow of the audio resampling library is as follows:

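As a rough illustration of that flow, here is a minimal sketch of the typical call order, assuming the source/target values used in this article (48000 fltp to 44100 s64, stereo kept); it is only an illustration, not the project's actual code.

/* Minimal sketch of the call order; buffer handling and error checks omitted. */
#include <libswresample/swresample.h>
#include <libavutil/channel_layout.h>
#include <libavutil/samplefmt.h>

static void resample_flow(void)
{
    /* 1. allocate and configure the context */
    struct SwrContext *swr_ctx = swr_alloc_set_opts(NULL,
            AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S64,  44100,   /* target */
            AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_FLTP, 48000,   /* source */
            0, NULL);

    /* 2. initialize it */
    swr_init(swr_ctx);

    /* 3. call swr_convert() once per decoded frame              */
    /* 4. flush what is still buffered inside the context:
     *    swr_convert(swr_ctx, out, out_count, NULL, 0);         */

    /* 5. release the context */
    swr_free(&swr_ctx);
}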

Let's introduce each function.

1. swr_alloc_set_opts(), defined as follows (an example call is sketched after the parameter list):

struct SwrContext *swr_alloc_set_opts(struct SwrContext *s,
                                      int64_t out_ch_layout, enum AVSampleFormat out_sample_fmt, int out_sample_rate,
                                      int64_t  in_ch_layout, enum AVSampleFormat  in_sample_fmt, int  in_sample_rate,
                                      int log_offset, void *log_ctx);

The parameters are explained as follows:

  • struct SwrContext *s, if you pass NULL, memory for a new context is allocated internally; if it is not NULL, the existing context is reused instead of allocating a new one.
  • int64_t out_ch_layout, the target channel layout
  • enum AVSampleFormat out_sample_fmt, the target sampling format
  • int out_sample_rate, the target sampling rate
  • int64_t in_ch_layout, the original channel layout
  • enum AVSampleFormat in_sample_fmt, the original sampling format
  • int in_sample_rate, the original sampling rate
  • int log_offset, I am not sure what this is for; just pass 0.
  • void *log_ctx, I am not sure what this is for; just pass NULL.

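For example, with the tgt_fmt/tgt_freq values set earlier, the call in a player might look roughly like the sketch below, where frame is the decoded source AVFrame; this is an illustration, not the project's exact code.

/* Sketch: configure the resampler from a decoded frame's parameters. */
struct SwrContext *swr_ctx = swr_alloc_set_opts(NULL,
        frame->channel_layout, (enum AVSampleFormat)tgt_fmt, tgt_freq,      /* target */
        frame->channel_layout, (enum AVSampleFormat)frame->format,
        frame->sample_rate,                                                 /* source */
        0, NULL);
if (!swr_ctx || swr_init(swr_ctx) < 0) {
    /* failed to configure the resampler */
}
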
2. swr_init(), initializes the resampling context. If you change any of the context's options, for example the sample rate, you must call swr_init() again for the change to take effect, as sketched below.
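
Since SwrContext is an AVOptions-enabled object, an option can be changed with av_opt_set_int() and then re-applied. The sketch below assumes an already configured swr_ctx and is not taken from the project code:

#include <libavutil/opt.h>

/* Sketch: change the output sample rate of an existing context;
 * the new value only takes effect after swr_init() is called again. */
av_opt_set_int(swr_ctx, "out_sample_rate", 32000, 0);
if (swr_init(swr_ctx) < 0) {
    /* re-initialization failed */
}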

3. swr_convert(), the conversion function, defined as follows:

/** Convert audio.
 *
 * in and in_count can be set to 0 to flush the last few samples out at the
 * end.
 *
 * If more input is provided than output space, then the input will be buffered.
 * You can avoid this buffering by using swr_get_out_samples() to retrieve an
 * upper bound on the required number of output samples for the given number of
 * input samples. Conversion will run directly without copying whenever possible.
 *
 * @param s         allocated Swr context, with parameters set
 * @param out       output buffers, only the first one need be set in case of packed audio
 * @param out_count amount of space available for output in samples per channel
 * @param in        input buffers, only the first one need to be set in case of packed audio
 * @param in_count  number of input samples available in one channel
 *
 * @return number of samples output per channel, negative value on error
 */
int swr_convert(struct SwrContext *s, uint8_t **out, int out_count,
                                const uint8_t **in , int in_count);

The parameters are explained as follows:

  • struct SwrContext *s, resampling context, also called resampling instance.
  • uint8_t **out, the output memory address.
  • int out_count, the amount of output space available, in samples per channel. It is usually recommended to make this a bit larger; if there is not enough space to write into, the leftover output is cached inside the resampling instance and keeps accumulating.
  • const uint8_t **in, the memory address of the input.
  • int in_count, the number of input samples available per channel.

The return value of swr_convert() is the number of samples actually output per channel, or a negative value on error.
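
As the comment above mentions, swr_get_out_samples() can be used to obtain an upper bound on the required output space before converting. A small sketch, where swr_ctx, the decoded frame, and an out buffer of at least max_out samples per channel are assumed to already exist:

/* Sketch: size the output using swr_get_out_samples(), then convert. */
int max_out = swr_get_out_samples(swr_ctx, frame->nb_samples);
int got     = swr_convert(swr_ctx, out, max_out,
                          (const uint8_t **)frame->data, frame->nb_samples);
/* "got" is the number of samples actually written per channel */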


The result of running the project code is as follows:

The focus of this article's project code is the calculation of out_count, as follows:

out_count = (int64_t)frame->nb_samples * tgt_freq / frame->sample_rate + 256;

Since most audio frames of the source file juren-30s.mp4 contain 1024 samples, resampling from 48000 down to 44100 means that 1024 samples become roughly 940 samples (1024 × 44100 / 48000 ≈ 940.8).

But from the running results in the figure above, we can see that sometimes 941 samples are produced, one more than 940. Therefore, 256 is usually added to out_count to enlarge the writing space.

If there is not enough space to write, the leftover output is cached in the resampling instance and keeps accumulating.

Adding 256 is also what the ffplay player does.
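
Putting this together, converting one decoded frame looks roughly like the sketch below. The allocation with av_samples_alloc_array_and_samples() is my own addition to keep the example self-contained; the article's project manages its output memory differently.

#include <libavutil/frame.h>
#include <libavutil/samplefmt.h>
#include <libavutil/mem.h>
#include <libswresample/swresample.h>

/* Sketch of converting one decoded frame; returns samples written per channel. */
static int convert_one_frame(struct SwrContext *swr_ctx, AVFrame *frame,
                             int tgt_freq, enum AVSampleFormat tgt_fmt)
{
    uint8_t **out = NULL;
    int out_linesize;

    /* leave 256 samples of headroom, exactly as discussed above */
    int out_count = (int64_t)frame->nb_samples * tgt_freq / frame->sample_rate + 256;

    av_samples_alloc_array_and_samples(&out, &out_linesize, frame->channels,
                                       out_count, tgt_fmt, 0);

    int got = swr_convert(swr_ctx, out, out_count,
                          (const uint8_t **)frame->data, frame->nb_samples);

    /* ... play or encode out[0..channels-1], "got" samples per channel ... */

    av_freep(&out[0]);
    av_freep(&out);
    return got;
}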


Finally, another important point: there may be residual data left inside the SwrContext. When there is no more input data, swr_convert() needs to be called again to flush out the residual data, as follows:

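A minimal sketch of that flushing call, where out and out_count are the same output buffer and size used during normal conversion:

/* Flush: pass NULL / 0 as input; the return value is the number of
 * residual samples written per channel. */
int flushed = swr_convert(swr_ctx, out, out_count, NULL, 0);
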
It can be seen that 16 residual samples were finally flushed out.


The above is the resampling scenario for a player: after resampling, the out buffer is obtained, and that memory can be handed directly to SDL for playback.

But sometimes we need to encode and save the converted data. In that case, the out memory has to be attached to an AVFrame, roughly as follows:

AVFrame *frame = av_frame_alloc();
frame->extended_data = out;              /* out holds one pointer per channel (planar) */
for (int i = 0; i < nb_channels && i < AV_NUM_DATA_POINTERS; i++)
    frame->data[i] = out[i];             /* data[] is a fixed-size array, copy per channel */
frame->nb_samples = out_nb_samples;

Then set the frame's pts and it should work, but I have not tested it with encoding yet; I will add that later.

Audio memory can also be allocated with av_samples_alloc(), but since this article needs the +256 headroom, that function is not used here.
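
For reference, a typical av_samples_alloc() call looks roughly like the sketch below; it is shown only for illustration and is not part of this article's project:

#include <libavutil/samplefmt.h>
#include <libavutil/mem.h>

/* Sketch: allocate a planar s64 stereo buffer for out_count samples. */
uint8_t *audio_data[8] = { NULL };
int linesize;
int ret = av_samples_alloc(audio_data, &linesize, 2 /* channels */,
                           out_count, AV_SAMPLE_FMT_S64, 0 /* default align */);
if (ret < 0) {
    /* allocation failed */
}
/* ... use audio_data[0] / audio_data[1] ..., then free: */
av_freep(&audio_data[0]);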


Extended knowledge:

The conversion functions of audio and video are named similarly, for example:

av_samples_alloc()
av_image_alloc()
av_samples_fill_arrays()
av_image_fill_arrays()

The function av_frame_get_buffer() can allocate memory for both audio and video, provided that the AVFrame's format, width and height (for video), or channel layout, sample rate and nb_samples (for audio) are set first.
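
A hedged sketch of allocating audio memory this way; out_nb_samples stands for the number of samples to hold, and this is not code from the article's project:

#include <libavutil/frame.h>
#include <libavutil/channel_layout.h>

/* Sketch: let av_frame_get_buffer() allocate the audio buffers. */
AVFrame *af = av_frame_alloc();
af->format         = AV_SAMPLE_FMT_S64;
af->channel_layout = AV_CH_LAYOUT_STEREO;
af->sample_rate    = 44100;
af->nb_samples     = out_nb_samples;
av_frame_get_buffer(af, 0);   /* fills af->data / af->extended_data */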


Personally, I find swr_convert() a bit cumbersome to use. In fact, the audio aformat filter can also adjust the sample rate, sample format, and channel layout.

The filter syntax is fairly uniform, but the aformat filter cannot adjust the playback duration. It is recommended to read "Introduction to FFmpeg's Audio aformat Filter".
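
For reference, the same conversion expressed as an aformat filter description would look roughly like this sketch (option names per the FFmpeg filter documentation; not taken from the article's project):

/* Sketch: aformat filter string equivalent to this article's conversion,
 * usable e.g. with the ffmpeg -af option or avfilter_graph_parse_ptr(). */
const char *aformat_desc =
    "aformat=sample_fmts=s64:sample_rates=44100:channel_layouts=stereo";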

Finally, the resampling instance needs to be released by calling swr_free().


As for the Easter egg left at the beginning: adjusting the playback time is achieved with swr_set_compensation(). It is recommended to read "How to Adjust the Audio Playback Time".
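
For reference, the prototype of that function in libswresample is:

/* Activate resampling compensation ("soft" compensation): sample_delta
 * samples are added or dropped, spread over compensation_distance samples. */
int swr_set_compensation(struct SwrContext *s, int sample_delta,
                         int compensation_distance);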



Origin blog.csdn.net/u012117034/article/details/127537875