Foreword
Recently I have become very interested in processing audio and video with Go, and I have been researching goav, a commonly used Go library for audio and video. I wrote a function that changes the sample rate of a WAV file, and I would like to share it. I ran into quite a few pitfalls along the way, and solving them was interesting.
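Environment setup
The code runs on Ubuntu and depends on goav, a set of Go bindings for the FFmpeg libraries.
goav address: https://github.com/giorgisio/goav (the code below imports the fork github.com/xueqing/goav)
Install goav and its dependencies as follows. The FFMPEG_ROOT exports assume an FFmpeg build under $HOME/ffmpeg; adjust them to your own location.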
sudo apt-get -y install autoconf automake build-essential libass-dev libfreetype6-dev libsdl1.2-dev libtheora-dev libtool libva-dev libvdpau-dev libvorbis-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev pkg-config texi2html zlib1g-dev
sudo apt install -y libavdevice-dev libavfilter-dev libswscale-dev libavcodec-dev libavformat-dev libswresample-dev libavutil-dev
sudo apt-get install yasm
export FFMPEG_ROOT=$HOME/ffmpeg
export CGO_LDFLAGS="-L$FFMPEG_ROOT/lib/ -lavcodec -lavformat -lavutil -lswscale -lswresample -lavdevice -lavfilter"
export CGO_CFLAGS="-I$FFMPEG_ROOT/include"
export LD_LIBRARY_PATH=$HOME/ffmpeg/lib
go get github.com/xueqing/goav
The code
Here is the full code first:
package main

//#include <stdlib.h>
import "C"

import (
	"flag"
	"fmt"
	"io"
	"os"
	"unsafe"

	"github.com/google/logger"
	"github.com/xueqing/ffmpeg-demo/logutil"
	"github.com/xueqing/goav/libswresample"
	"github.com/youpy/go-wav"
)

func main() {
	var (
		inputUrl string = "./data/1.wav"
		// SwrAllocSetOpts takes channel layouts (bitmasks), not channel counts.
		// 1 is AV_CH_FRONT_LEFT, a single-channel layout, so the same value
		// also serves as the channel count for the WAV writer below.
		inNumChannels int64 = 1
		inSampleRate  int   = 16000
		//inBitsPerSample uint16 = 16
		outNumChannels   int64                     = 1
		outSampleRate    int                       = 48000
		outBitsPerSample uint16                    = 16
		swr              *libswresample.SwrContext = libswresample.SwrAlloc()
	)
	flag.Parse()
	logutil.Init(true, false, "resample.log")
	defer logutil.Close()

	// Configure the resampler: s16 in, s16 out (AvSampleFormat(1) = AV_SAMPLE_FMT_S16).
	swr.SwrAllocSetOpts(outNumChannels,
		libswresample.AvSampleFormat(1),
		outSampleRate,
		inNumChannels,
		libswresample.AvSampleFormat(1),
		inSampleRate,
		0,
		0)
	swr.SwrInit()
	defer swr.SwrClose()

	_inputFile, err := os.Open(inputUrl)
	if err != nil {
		logger.Errorf("open input file error(%v)", err)
		return
	}
	defer _inputFile.Close()

	_reader := wav.NewReader(_inputFile)
	format, err := _reader.Format()
	if err != nil {
		logger.Errorf("input file format error(%v)", err)
		return
	}
	fmt.Printf("input file format info -> AudioFormat:%v,NumChannels:%v,SampleRate:%v,ByteRate:%v,BlockAlign:%v,BitsPerSample:%v\n", int(format.AudioFormat), format.NumChannels, format.SampleRate, format.ByteRate, format.BlockAlign, format.BitsPerSample)

	_tempFile, err := os.CreateTemp("", "*.wav")
	if err != nil {
		logger.Errorf("create temp file error(%v)", err)
		return
	}
	logger.Infof("Create tempFile %v", _tempFile.Name())
	defer _tempFile.Close()

	// Read the whole file into memory, 4096 samples at a time.
	_samples := []wav.Sample{}
	n := 4096
	for {
		spls, err := _reader.ReadSamples(uint32(n))
		if err == io.EOF {
			break
		}
		if err != nil {
			// Any other error would otherwise loop forever.
			logger.Errorf("read samples error(%v)", err)
			return
		}
		_samples = append(_samples, spls...)
	}

	_result := ResampleByFFmpegApi2(swr, _samples)
	_writer := wav.NewWriter(_tempFile, uint32(len(_result)), uint16(outNumChannels), uint32(outSampleRate), outBitsPerSample)
	if err := _writer.WriteSamples(_result); err != nil {
		logger.Errorf("write file error(%v)", err)
		return
	}
}

// ResampleByFFmpegApi2 resamples channel 0 of samples with swr and returns
// the converted samples. The output buffer is sized for the 16 kHz -> 48 kHz
// case (three output samples per input sample).
func ResampleByFFmpegApi2(swr *libswresample.SwrContext, samples []wav.Sample) []wav.Sample {
	if len(samples) == 0 {
		return nil
	}
	inN, outN := len(samples), len(samples)*3

	// swr_convert expects uint8_t**: an array holding one buffer pointer per
	// plane. Packed s16 has a single plane, so each array holds one pointer.
	// Allocate room for a pointer (8 bytes on 64-bit), not for a C int.
	ptrSize := C.size_t(unsafe.Sizeof(uintptr(0)))
	_inArr := (**uint8)(C.malloc(ptrSize))
	defer C.free(unsafe.Pointer(_inArr))
	_outArr := (**uint8)(C.malloc(ptrSize))
	defer C.free(unsafe.Pointer(_outArr))

	// Keep the sample buffers in C memory as well: the cgo rules do not allow
	// storing a pointer to Go memory inside C memory. 2 bytes per s16 sample.
	inBuf := C.malloc(C.size_t(inN) * 2)
	defer C.free(inBuf)
	outBuf := C.malloc(C.size_t(outN) * 2)
	defer C.free(outBuf)

	// View the C buffers as Go slices (unsafe.Slice needs Go 1.17 or later).
	_inptr := unsafe.Slice((*uint16)(inBuf), inN)
	_outptr := unsafe.Slice((*uint16)(outBuf), outN)
	for i, v := range samples {
		_inptr[i] = uint16(v.Values[0]) // keeps the 16-bit pattern of the sample
	}
	*_inArr = (*uint8)(inBuf)
	*_outArr = (*uint8)(outBuf)

	// ret is the number of samples written per channel, negative on error.
	ret := swr.SwrConvert(_outArr, outN, _inArr, inN)
	if ret < 0 {
		logger.Errorf("swr convert error(%v)", ret)
		return nil
	}
	if ret > 0 {
		fmt.Println(ret)
	}
	_result := make([]wav.Sample, ret)
	for i := 0; i < ret; i++ {
		_result[i] = wav.Sample{Values: [2]int{int(_outptr[i]), 0}}
	}
	return _result
}
Code description:
1. The code is a demo rather than a reusable utility. Once you understand the logic, you can refactor it into one yourself.
2. It uses FFmpeg's swresample library to resample the audio data.
3. The demo reads the whole file before converting. If you need real-time processing, you can adapt it, for example by converting chunk by chunk.
4. SwrAllocSetOpts is called with libswresample.AvSampleFormat(1). Why 1? The argument selects the sample format from FFmpeg's AVSampleFormat enum, which I post below. AV_SAMPLE_FMT_NONE is -1, so AV_SAMPLE_FMT_U8 is 0 and AV_SAMPLE_FMT_S16 is 1. My audio is s16, so I pass 1.
enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,   ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,  ///< signed 16 bits
    AV_SAMPLE_FMT_S32,  ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,  ///< float
    AV_SAMPLE_FMT_DBL,  ///< double
    AV_SAMPLE_FMT_U8P,  ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP, ///< float, planar
    AV_SAMPLE_FMT_DBLP, ///< double, planar
    AV_SAMPLE_FMT_S64,  ///< signed 64 bits
    AV_SAMPLE_FMT_S64P, ///< signed 64 bits, planar
    AV_SAMPLE_FMT_NB    ///< Number of sample formats. DO NOT USE if linking dynamically
};
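To make the choice of 1 concrete, the enum can be mirrored in Go. This is only a minimal sketch: the names SampleFormat, SampleFmtS16 and BytesPerSample are hypothetical and are not part of goav, which is used above through libswresample.AvSampleFormat.

```go
package main

import "fmt"

// SampleFormat mirrors FFmpeg's AVSampleFormat enum for illustration only.
// All names in this block are hypothetical, not goav identifiers.
type SampleFormat int

const (
	SampleFmtNone SampleFormat = iota - 1 // AV_SAMPLE_FMT_NONE = -1
	SampleFmtU8                           // 0
	SampleFmtS16                          // 1
	SampleFmtS32                          // 2
	SampleFmtFLT                          // 3
	SampleFmtDBL                          // 4
	SampleFmtU8P                          // 5
	SampleFmtS16P                         // 6
	SampleFmtS32P                         // 7
	SampleFmtFLTP                         // 8
	SampleFmtDBLP                         // 9
	SampleFmtS64                          // 10
	SampleFmtS64P                         // 11
)

// BytesPerSample returns the size of one sample of one channel,
// or 0 for SampleFmtNone and unknown values.
func BytesPerSample(f SampleFormat) int {
	switch f {
	case SampleFmtU8, SampleFmtU8P:
		return 1
	case SampleFmtS16, SampleFmtS16P:
		return 2
	case SampleFmtS32, SampleFmtS32P, SampleFmtFLT, SampleFmtFLTP:
		return 4
	case SampleFmtDBL, SampleFmtDBLP, SampleFmtS64, SampleFmtS64P:
		return 8
	}
	return 0
}

func main() {
	// s16 audio: enum value 1, two bytes per sample.
	fmt.Println(int(SampleFmtS16), BytesPerSample(SampleFmtS16)) // prints "1 2"
}
```

The two-byte size is why the resampling function stores its samples in 16-bit integers.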
Audio preparation: the input is a WAV file with a 16 kHz sample rate.
(base) xxx@hu:~/GolandProjects/MediaRelay/data$ ffmpeg -i 1.wav
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '1.wav':
  Metadata:
    date            : 2020-09-28
    encoder         : Lavf58.45.100
  Duration: 00:04:01.75, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Execution
input file format info -> AudioFormat:1,NumChannels:1,SampleRate:16000,ByteRate:32000,BlockAlign:2,BitsPerSample:16
INFO : 2022/12/06 17:14:49.937547 csdn_wav_util.go:62: Create tempFile /tmp/2402235346.wav
11603961
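The number printed above is the return value of SwrConvert, which for swr_convert is the number of samples written per channel. As a rough sanity check, the expected count can be estimated from the two sample rates. The helper name expectedOutSamples is hypothetical, and the estimate ignores samples the resampler may still be holding internally, so the real count can come out slightly lower.

```go
package main

import "fmt"

// expectedOutSamples estimates the per-channel output sample count as
// ceil(inSamples * outRate / inRate). Hypothetical helper for illustration;
// it does not account for resampler delay.
func expectedOutSamples(inSamples, inRate, outRate int64) int64 {
	return (inSamples*outRate + inRate - 1) / inRate
}

func main() {
	// One second of 16 kHz audio becomes one second of 48 kHz audio.
	fmt.Println(expectedOutSamples(16000, 16000, 48000)) // prints 48000
}
```

For 16 kHz to 48 kHz the ratio is exactly 3, which is why the code sizes the output buffer as len(samples)*3.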
Final audio
(base) xxx@hu:/tmp$ ffmpeg -i 2402235346.wav
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '2402235346.wav':
  Duration: 00:04:01.75, bitrate: 768 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s
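The bitrates ffmpeg reports for the two files follow directly from the PCM parameters: sample rate × channels × bits per sample. A small sketch of that arithmetic, with the hypothetical helper name pcmBitrate:

```go
package main

import "fmt"

// pcmBitrate returns the bitrate of uncompressed PCM audio in bits per second.
// Hypothetical helper for illustration.
func pcmBitrate(sampleRate, channels, bitsPerSample int) int {
	return sampleRate * channels * bitsPerSample
}

func main() {
	in := pcmBitrate(16000, 1, 16)  // input: 16 kHz, mono, s16
	out := pcmBitrate(48000, 1, 16) // output: 48 kHz, mono, s16
	fmt.Println(in/1000, out/1000)  // prints "256 768" (kb/s)
}
```

The duration is unchanged at 00:04:01.75, so tripling the sample rate triples the bitrate from 256 kb/s to 768 kb/s.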
Summary
The part of this code that gave me the biggest headache was converting a Go array into the **uint8 that SwrConvert expects. If you are interested, study the conversion logic in the ResampleByFFmpegApi2 method; there is a lot to learn from it.
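The shape of that conversion can be tried out in pure Go without cgo. The sketch below is an illustration only: planePointers and firstSample are hypothetical helpers, and everything stays in Go memory, whereas a real call into FFmpeg passes a pointer array allocated with C.malloc.

```go
package main

import (
	"fmt"
	"unsafe"
)

// planePointers builds the "array of buffer pointers" that a uint8_t**
// parameter describes. Packed s16 audio has one plane, so the array has
// one entry pointing at the first byte of buf. Hypothetical helper.
func planePointers(buf []int16) *[1]*uint8 {
	if len(buf) == 0 {
		return &[1]*uint8{nil}
	}
	return &[1]*uint8{(*uint8)(unsafe.Pointer(&buf[0]))}
}

// firstSample follows the double pointer back to the first s16 sample.
// It assumes the plane pointer is non-nil. Hypothetical helper.
func firstSample(pp **uint8) int16 {
	return *(*int16)(unsafe.Pointer(*pp))
}

func main() {
	samples := []int16{258, -2}
	planes := planePointers(samples)

	// &planes[0] has type **uint8, the type SwrConvert takes.
	var pp **uint8 = &planes[0]
	fmt.Println(firstSample(pp) == samples[0]) // prints true
}
```

The same two levels appear in ResampleByFFmpegApi2: the inner pointer addresses the sample bytes, and the outer pointer addresses the slot that holds it.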
Share:
Our fatigue is often caused not by work, but by worry, frustration and dissatisfaction. (from "The Weakness of Human Nature")