Go study notes: audio resampling with the FFmpeg API

Table of contents

Foreword

Environment deployment

The code

Summary


Foreword

Recently I have become very interested in audio and video processing with Go, and I have been looking into goav, a commonly used Go library for audio and video. I wrote a function that changes the sample rate of a WAV file and would like to share it. I ran into quite a few pitfalls along the way, and working through them was interesting.

Environment deployment

The code runs on Ubuntu and depends on goav, a set of Go bindings (via cgo) for the FFmpeg libraries.

goav address: https://github.com/giorgisio/goav

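Install the build dependencies first, then goav itself. Note that the code in this post imports the fork github.com/xueqing/goav rather than the repository above.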

sudo apt-get -y install autoconf automake build-essential libass-dev libfreetype6-dev libsdl1.2-dev libtheora-dev libtool libva-dev libvdpau-dev libvorbis-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev pkg-config texi2html zlib1g-dev

sudo apt install -y libavdevice-dev libavfilter-dev libswscale-dev libavcodec-dev libavformat-dev libswresample-dev libavutil-dev

sudo apt-get install yasm

```
export FFMPEG_ROOT=$HOME/ffmpeg
export CGO_LDFLAGS="-L$FFMPEG_ROOT/lib/ -lavcodec -lavformat -lavutil -lswscale -lswresample -lavdevice -lavfilter"
export CGO_CFLAGS="-I$FFMPEG_ROOT/include"
export LD_LIBRARY_PATH=$HOME/ffmpeg/lib
```

```
go get github.com/xueqing/goav
```

The code

Here is the complete code:

package main

//#include<stdlib.h>
import "C"
import (
	"flag"
	"fmt"
	"github.com/google/logger"
	"github.com/xueqing/ffmpeg-demo/logutil"
	"github.com/xueqing/goav/libswresample"
	"github.com/youpy/go-wav"
	"io"
	"os"
	"reflect"
	"unsafe"
)

func main() {
	var (
		inputUrl      string = "./data/1.wav"
		inNumChannels int64  = 1
		inSampleRate  int    = 16000
		//inBitsPerSample  uint16                    = 16
		outNumChannels   int64                     = 1
		outSampleRate    int                       = 48000
		outBitsPerSample uint16                    = 16
		swr              *libswresample.SwrContext = libswresample.SwrAlloc()
	)
	flag.Parse()
	logutil.Init(true, false, "resample.log")
	defer logutil.Close()
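	// The first and fourth arguments are channel layouts (bit masks), not channel counts.
	// The value 1 is AV_CH_FRONT_LEFT, a single-channel layout, which is why it works for mono here.
	// AvSampleFormat(1) is AV_SAMPLE_FMT_S16 (see the enum further down).
	swr.SwrAllocSetOpts(outNumChannels,
		libswresample.AvSampleFormat(1),
		outSampleRate,
		inNumChannels,
		libswresample.AvSampleFormat(1),
		inSampleRate,
		0,
		0)
	swr.SwrInit()
	defer swr.SwrClose()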

	_inputFile, err := os.Open(inputUrl)
	if err != nil {
		logger.Errorf("open input file error(%v)", err)
		return
	}
	defer _inputFile.Close()
	_reader := wav.NewReader(_inputFile)
	format, err := _reader.Format()
	if err != nil {
		logger.Errorf("input file format error(%v)", err)
		return
	}
	fmt.Printf("input file format info -> AudioFormat:%v,NumChannels:%v,SampleRate:%v,ByteRate:%v,BlockAlign:%v,BitsPerSample:%v", int(format.AudioFormat), format.NumChannels, format.SampleRate, format.ByteRate, format.BlockAlign, format.BitsPerSample)

	_tempFile, err := os.CreateTemp("", "*.wav")
	if err != nil {
		logger.Errorf("create temp file error(%v)", err)
		return
	}
	logger.Infof("Create tempFile %v", _tempFile.Name())
	defer func() {
		_tempFile.Close()
	}()
	_samples := []wav.Sample{}
	n := 4096
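	for {
		spls, err := _reader.ReadSamples(uint32(n))
		if err == io.EOF {
			break
		}
		// Any other error would otherwise keep the loop spinning forever.
		if err != nil {
			logger.Errorf("read samples error(%v)", err)
			return
		}
		_samples = append(_samples, spls...)
	}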
	_result := ResampleByFFmpegApi2(swr, _samples)
	_writer := wav.NewWriter(_tempFile, uint32(len(_result)), uint16(outNumChannels), uint32(outSampleRate), outBitsPerSample)

	if err := _writer.WriteSamples(_result); err != nil {
		logger.Errorf("write file error(%v)", err)
		return
	}
}

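func ResampleByFFmpegApi2(swr *libswresample.SwrContext, samples []wav.Sample) []wav.Sample {
	var (
		_inArr  **uint8
		_outArr **uint8
		_inptr  []uint16
		_outptr []uint16
	)
	// Each array holds a single plane pointer, so allocate the size of one pointer.
	// C.sizeof_int is usually 4 bytes, which is too small for an 8-byte pointer.
	_inArr = (**uint8)(C.malloc(C.size_t(unsafe.Sizeof(uintptr(0)))))
	defer C.free(unsafe.Pointer(_inArr))
	_inptr = make([]uint16, len(samples))
	_outArr = (**uint8)(C.malloc(C.size_t(unsafe.Sizeof(uintptr(0)))))
	defer C.free(unsafe.Pointer(_outArr))
	// The output buffer is 3x the input because 48000/16000 = 3.
	_outptr = make([]uint16, len(samples)*3)
	// Copy channel 0 (mono input); uint16 keeps the bit pattern of the signed 16-bit value.
	for i, v := range samples {
		_inptr[i] = uint16(v.Values[0])
	}
	// Point the plane arrays at the backing arrays of the Go slices.
	// Caution: this stores Go pointers in C-allocated memory, which the cgo pointer
	// rules do not permit; C-allocated sample buffers would be the safer choice.
	*_inArr = (*uint8)(unsafe.Pointer((*reflect.SliceHeader)(unsafe.Pointer(&_inptr)).Data))
	*_outArr = (*uint8)(unsafe.Pointer((*reflect.SliceHeader)(unsafe.Pointer(&_outptr)).Data))
	ret := swr.SwrConvert(_outArr, len(samples)*3, _inArr, len(samples))
	if ret < 0 {
		// swr_convert returns a negative error code on failure.
		fmt.Println("swr convert error:", ret)
		return nil
	}
	if ret > 0 {
		// Number of output samples per channel.
		fmt.Println(ret)
	}
	_result := make([]wav.Sample, ret)

	for i := 0; i < ret; i++ {
		// Reinterpret the bits as a signed 16-bit sample.
		_result[i] = wav.Sample{Values: [2]int{int(int16(_outptr[i])), 0}}
	}
	return _result
}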
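
The factor of 3 used for the output buffer above is simply 48000/16000. For other rate pairs the size can be computed instead of hard-coded. The following is a minimal sketch in plain Go; the helper name outSamples is hypothetical and not part of goav, and it only rounds the ratio up without accounting for samples the resampler may hold back internally.

```go
package main

import "fmt"

// outSamples is a hypothetical helper (not part of goav). It returns
// ceil(inSamples * outRate / inRate), the number of output samples that
// correspond to inSamples input samples after a sample rate change.
func outSamples(inSamples, inRate, outRate int64) int64 {
	return (inSamples*outRate + inRate - 1) / inRate
}

func main() {
	fmt.Println(outSamples(4096, 16000, 48000)) // 12288, the 3x case used above
	fmt.Println(outSamples(4096, 44100, 48000)) // 4459, rounded up
}
```

A value computed this way could replace both occurrences of len(samples)*3 if the function were made generic over sample rates.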

Code description:

1. The code is a demo rather than a reusable utility. Once you understand the logic, you can turn it into a utility function yourself.

2. The resampling itself is done by FFmpeg's swresample library.

3. The example reads the whole file before converting it. If you need real-time processing, the same logic can be adapted to convert the audio chunk by chunk.

4. SwrAllocSetOpts takes libswresample.AvSampleFormat(1) as the sample format. Why 1? The value selects an entry of the AVSampleFormat enumeration in the underlying FFmpeg source, posted below. AV_SAMPLE_FMT_NONE is -1, so AV_SAMPLE_FMT_U8 is 0 and AV_SAMPLE_FMT_S16 is 1. My audio is s16, so I pass 1.

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double

    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar
    AV_SAMPLE_FMT_S64,         ///< signed 64 bits
    AV_SAMPLE_FMT_S64P,        ///< signed 64 bits, planar

    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};
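
To avoid a bare 1 in the call, the enum order can be mirrored with named Go constants. This is only an illustrative sketch: the type and constant names below are made up for this example and are not goav identifiers, and only the first few enum values are reproduced.

```go
package main

import "fmt"

// SampleFormat mirrors the first values of FFmpeg's AVSampleFormat enum.
// The names are illustrative only; they are not defined by goav.
type SampleFormat int

const (
	SampleFmtNone SampleFormat = iota - 1 // AV_SAMPLE_FMT_NONE = -1
	SampleFmtU8                           // AV_SAMPLE_FMT_U8   = 0
	SampleFmtS16                          // AV_SAMPLE_FMT_S16  = 1
	SampleFmtS32                          // AV_SAMPLE_FMT_S32  = 2
	SampleFmtFlt                          // AV_SAMPLE_FMT_FLT  = 3
	SampleFmtDbl                          // AV_SAMPLE_FMT_DBL  = 4
)

func main() {
	fmt.Println(int(SampleFmtNone), int(SampleFmtS16)) // -1 1
}
```

With such constants, the call site could read libswresample.AvSampleFormat(SampleFmtS16), which makes the choice of s16 visible in the code.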

Audio preparation: the input is a mono WAV file with a 16 kHz sample rate.

(base) xxx@hu:~/GolandProjects/MediaRelay/data$ ffmpeg -i 1.wav 
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '1.wav':
  Metadata:
    date            : 2020-09-28
    encoder         : Lavf58.45.100
  Duration: 00:04:01.75, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s

Running the program

input file format info -> AudioFormat:1,NumChannels:1,SampleRate:16000,ByteRate:32000,BlockAlign:2,BitsPerSample:16INFO : 2022/12/06 17:14:49.937547 csdn_wav_util.go:62: Create tempFile /tmp/2402235346.wav
11603961
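
The BlockAlign and ByteRate values printed above follow from the other header fields: for PCM, BlockAlign is channels times bytes per sample, and ByteRate is the sample rate times BlockAlign. A small sketch of that arithmetic, with helper names chosen for this example only:

```go
package main

import "fmt"

// blockAlign returns the size in bytes of one sample frame (all channels).
func blockAlign(numChannels, bitsPerSample int) int {
	return numChannels * bitsPerSample / 8
}

// byteRate returns the number of PCM bytes per second of audio.
func byteRate(sampleRate, numChannels, bitsPerSample int) int {
	return sampleRate * blockAlign(numChannels, bitsPerSample)
}

func main() {
	fmt.Println(blockAlign(1, 16))      // 2
	fmt.Println(byteRate(16000, 1, 16)) // 32000
	fmt.Println(byteRate(48000, 1, 16)) // 96000
}
```

Multiplying by 8 gives the bitrates that ffmpeg reports: 32000 bytes/s is 256 kb/s for the input, and 96000 bytes/s is 768 kb/s for the 48 kHz output.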
 

The resulting audio

(base) xxx@hu:/tmp$ ffmpeg -i 2402235346.wav 
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '2402235346.wav':
  Duration: 00:04:01.75, bitrate: 768 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s 
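
As a sanity check, the sample count printed by the program (11603961) can be converted back into a duration at 48000 Hz and compared with what ffmpeg reports. A minimal sketch, with a helper name chosen for this example:

```go
package main

import (
	"fmt"
	"time"
)

// pcmDuration converts a per-channel sample count into a playback duration.
func pcmDuration(samples, sampleRate int64) time.Duration {
	return time.Duration(samples) * time.Second / time.Duration(sampleRate)
}

func main() {
	d := pcmDuration(11603961, 48000)
	fmt.Printf("%.2f s\n", d.Seconds()) // 241.75 s, i.e. 00:04:01.75
}
```

The result matches the Duration line above, so the resampled file has the same length as the input.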

Summary

The part of the code that gave me the biggest headache was converting a Go slice into the **uint8 that SwrConvert expects. If you are interested, study the conversion logic in ResampleByFFmpegApi2; there is a lot to learn from it.
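
The slice to double pointer conversion can be tried in isolation, without cgo. The sketch below stays entirely in Go memory and uses hypothetical helper names; handing such pointers to C is still subject to the cgo pointer-passing rules, so treat it as an illustration of the pointer types rather than a drop-in replacement.

```go
package main

import (
	"fmt"
	"unsafe"
)

// firstBytePtr returns a *uint8 to the first byte of the slice's backing
// array, or nil for an empty slice. Hypothetical helper for illustration.
func firstBytePtr(s []int16) *uint8 {
	if len(s) == 0 {
		return nil
	}
	return (*uint8)(unsafe.Pointer(&s[0]))
}

// writeFirst stores v into the first 16-bit sample reachable through pp,
// the way a C function would write through a uint8_t** plane array.
func writeFirst(pp **uint8, v int16) {
	*(*int16)(unsafe.Pointer(*pp)) = v
}

func main() {
	samples := []int16{100, -200, 300}

	// A one-element array of plane pointers; its address has type **uint8.
	planes := [1]*uint8{firstBytePtr(samples)}
	var pp **uint8 = &planes[0]

	// Writing through the double pointer changes the original slice.
	writeFirst(pp, 7)
	fmt.Println(samples[0]) // 7
}
```

Taking the address of the first element is an alternative to going through reflect.SliceHeader, and it makes the empty-slice case explicit.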

share:

        Our fatigue is often not caused by work, but by worry, frustration and dissatisfaction. ——"The Weakness of Human Nature"

Origin blog.csdn.net/zhiweihongyan1/article/details/128205603