Silence detection in videos using ffmpeg

1 Original video information

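Running ffmpeg -i with no output file prints the basic information about the video. The final line, "At least one output file must be specified", is expected because no output was given: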

ffmpeg version 6.1-essentials_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --pkg-config=pkgconf --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-dxva2 --enable-d3d11va --enable-libvpl --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil      58. 29.100 / 58. 29.100
  libavcodec     60. 31.102 / 60. 31.102
  libavformat    60. 16.100 / 60. 16.100
  libavdevice    60.  3.100 / 60.  3.100
  libavfilter     9. 12.100 /  9. 12.100
  libswscale      7.  5.100 /  7.  5.100
  libswresample   4. 12.100 /  4. 12.100
  libpostproc    57.  3.100 / 57.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf60.16.100
  Duration: 00:07:49.52, start: 0.000000, bitrate: 20142 kb/s
  Stream #0:0[0x1](eng): Video: h264 (High 4:2:2) (avc1 / 0x31637661), yuv422p10le(tv, bt709, progressive), 1920x2160, 20007 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc60.31.102 libx264
      timecode        : 00:32:38:24
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
  Stream #0:2[0x3](eng): Data: none (tmcd / 0x64636D74)
    Metadata:
      handler_name    : TimeCodeHandler
      timecode        : 00:32:38:24
At least one output file must be specified
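
The Duration line in this output can be converted to seconds, which makes it easy to compare with the silence timestamps reported later. The following is a minimal Python sketch, not part of the original workflow: the function name parse_duration is hypothetical, and it assumes the text has already been captured (ffmpeg prints this information on stderr).

```python
import re

# Sample line copied from the output above.
BANNER = "  Duration: 00:07:49.52, start: 0.000000, bitrate: 20142 kb/s"


def parse_duration(text):
    """Return the Duration field in seconds, or None if it is absent."""
    match = re.search(r"Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)", text)
    if match is None:
        return None
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


print(f"{parse_duration(BANNER):.2f} seconds")
```

For this file the result is about 469.52 seconds (7 minutes 49.52 seconds).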

 

2 Use ffmpeg for video silence detection

ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=0.5 -vn -sn -dn -f null /dev/null

  • -af: Applies an audio filter chain; here it contains the silencedetect filter. silencedetect finds stretches of audio whose volume stays at or below a noise tolerance for at least a minimum duration, and logs where each stretch starts and ends. (Measuring the maximum volume, mean volume and volume histogram is the job of a different filter, volumedetect.) The filter works on the sample formats AV_SAMPLE_FMT_S16, AV_SAMPLE_FMT_S32, AV_SAMPLE_FMT_FLT and AV_SAMPLE_FMT_DBL; if the input uses another format, FFmpeg converts it automatically.

  • The noise parameter (short form n) sets the volume at or below which audio counts as silence. It can be given in dB (append "dB" to the value) or as an amplitude ratio; the default is -60dB, which equals 0.001. The duration parameter (short form d) sets how long the silence must last before it is reported; the default is 2 seconds. If the mono parameter is non-zero, each channel is checked separately; the default, 0, merges all channels and checks them together.

  • Combined detection: suppose 2 seconds of continuous silence (or low sound) counts as silence. If one channel meets that standard but the other channel does not during the same period, the period is not reported as silence.

  • -vn, -sn and -dn tell FFmpeg to ignore the video, subtitle and data streams. This avoids unnecessary work during analysis and makes it faster.
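
To make the noise threshold and the combined versus per-channel behaviour concrete, here is a small Python sketch. It is a simplified illustration and not FFmpeg's implementation: the function names, the sample data and the sample-count based duration are all assumptions made for the example.

```python
import math


def db_to_amplitude(db):
    """Convert a level in dB to an amplitude ratio (-60 dB -> 0.001)."""
    return 10 ** (db / 20)


def amplitude_to_db(amplitude):
    """Convert an amplitude ratio back to dB."""
    return 20 * math.log10(amplitude)


def silent_runs(channels, threshold, min_samples, mono=False):
    """Find runs of quiet samples as (start, end) index pairs, end exclusive.

    channels: equal-length lists of samples in [-1.0, 1.0], one per channel.
    mono=False: a sample position is quiet only if every channel is quiet.
    mono=True: each channel is scanned on its own; one list is returned per channel.
    """
    def runs(flags):
        found, start = [], None
        for i, quiet in enumerate(flags):
            if quiet and start is None:
                start = i
            elif not quiet and start is not None:
                if i - start >= min_samples:
                    found.append((start, i))
                start = None
        if start is not None and len(flags) - start >= min_samples:
            found.append((start, len(flags)))
        return found

    if mono:
        return [runs([abs(s) <= threshold for s in ch]) for ch in channels]
    merged = [all(abs(s) <= threshold for s in frame) for frame in zip(*channels)]
    return runs(merged)


# Hypothetical two-channel signal: each channel has one loud sample.
LEFT = [0.0, 0.0, 0.0, 0.0, 0.5, 0.0]
RIGHT = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
THRESHOLD = db_to_amplitude(-60)

print(silent_runs([LEFT, RIGHT], THRESHOLD, min_samples=2))
print(silent_runs([LEFT, RIGHT], THRESHOLD, min_samples=2, mono=True))
```

In the merged case a position only counts as quiet when both channels are quiet, so fewer and shorter runs are found than when each channel is scanned separately.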

Note: On Windows, replace /dev/null with NUL.

For multi-channel audio, each channel can be checked separately:

ffmpeg -i input.mp3 -af "silencedetect=mono=1" -vn -sn -dn -f null /dev/null
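
If the command is launched from a script, the platform difference between /dev/null and NUL can be handled automatically. The sketch below only builds the argument list; the function name silencedetect_command and its defaults are assumptions for illustration, and it relies on Python's os.devnull to name the null device for the current platform.

```python
import os


def silencedetect_command(path, noise_db=-30.0, duration=0.5, mono=False):
    """Build an ffmpeg argument list that runs silencedetect on one file."""
    audio_filter = f"silencedetect=noise={noise_db:g}dB:d={duration:g}"
    if mono:
        audio_filter += ":mono=1"  # check each channel separately
    return [
        "ffmpeg", "-hide_banner", "-i", path,
        "-af", audio_filter,
        "-vn", "-sn", "-dn",            # skip video, subtitle and data streams
        "-f", "null", os.devnull,       # discard output ("nul" on Windows)
    ]


print(" ".join(silencedetect_command("input.mp4")))
print(" ".join(silencedetect_command("input.mp3", mono=True)))
```

The list can be passed to subprocess.run with stderr captured, since ffmpeg writes the silencedetect messages to stderr.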

3 Display of test results

ffmpeg version 6.1-essentials_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --pkg-config=pkgconf --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-dxva2 --enable-d3d11va --enable-libvpl --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil      58. 29.100 / 58. 29.100
  libavcodec     60. 31.102 / 60. 31.102
  libavformat    60. 16.100 / 60. 16.100
  libavdevice    60.  3.100 / 60.  3.100
  libavfilter     9. 12.100 /  9. 12.100
  libswscale      7.  5.100 /  7.  5.100
  libswresample   4. 12.100 /  4. 12.100
  libpostproc    57.  3.100 / 57.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf60.16.100
  Duration: 00:07:49.52, start: 0.000000, bitrate: 20142 kb/s
  Stream #0:0[0x1](eng): Video: h264 (High 4:2:2) (avc1 / 0x31637661), yuv422p10le(tv, bt709, progressive), 1920x2160, 20007 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc60.31.102 libx264
      timecode        : 00:32:38:24
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
  Stream #0:2[0x3](eng): Data: none (tmcd / 0x64636D74)
    Metadata:
      handler_name    : TimeCodeHandler
      timecode        : 00:32:38:24
Stream mapping:
  Stream #0:1 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'nul':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf60.16.100
  Stream #0:0(eng): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc60.31.102 pcm_s16le
[silencedetect @ 000001ee5a9cb5c0] silence_start: 0
[silencedetect @ 000001ee5a9cb5c0] silence_end: 13.238 | silence_duration: 13.238
[silencedetect @ 000001ee5a9cb5c0] silence_start: 17.0503
[silencedetect @ 000001ee5a9cb5c0] silence_end: 17.5835 | silence_duration: 0.53325
[silencedetect @ 000001ee5a9cb5c0] silence_start: 30.0168
[silencedetect @ 000001ee5a9cb5c0] silence_end: 30.5313 | silence_duration: 0.514437
[silencedetect @ 000001ee5a9cb5c0] silence_start: 35.2619
[silencedetect @ 000001ee5a9cb5c0] silence_end: 35.9293 | silence_duration: 0.667375
[silencedetect @ 000001ee5a9cb5c0] silence_start: 50.3024
[silencedetect @ 000001ee5a9cb5c0] silence_end: 50.909 | silence_duration: 0.606563
[silencedetect @ 000001ee5a9cb5c0] silence_start: 56.8453
[silencedetect @ 000001ee5a9cb5c0] silence_end: 57.9748 | silence_duration: 1.12958
[silencedetect @ 000001ee5a9cb5c0] silence_start: 76.3573
[silencedetect @ 000001ee5a9cb5c0] silence_end: 76.8851 | silence_duration: 0.527792
[silencedetect @ 000001ee5a9cb5c0] silence_start: 83.8969
[silencedetect @ 000001ee5a9cb5c0] silence_end: 84.6447 | silence_duration: 0.747771
[silencedetect @ 000001ee5a9cb5c0] silence_start: 97.7624
[silencedetect @ 000001ee5a9cb5c0] silence_end: 98.294 | silence_duration: 0.531604
[silencedetect @ 000001ee5a9cb5c0] silence_start: 99.3107
[silencedetect @ 000001ee5a9cb5c0] silence_end: 99.8335 | silence_duration: 0.522792
[silencedetect @ 000001ee5a9cb5c0] silence_start: 108.826
[silencedetect @ 000001ee5a9cb5c0] silence_end: 109.517 | silence_duration: 0.690979
[silencedetect @ 000001ee5a9cb5c0] silence_start: 126.216
[silencedetect @ 000001ee5a9cb5c0] silence_end: 126.837 | silence_duration: 0.621333
[silencedetect @ 000001ee5a9cb5c0] silence_start: 126.837
[silencedetect @ 000001ee5a9cb5c0] silence_end: 127.59 | silence_duration: 0.752958
[silencedetect @ 000001ee5a9cb5c0] silence_start: 172.294
[silencedetect @ 000001ee5a9cb5c0] silence_end: 172.929 | silence_duration: 0.634542
[silencedetect @ 000001ee5a9cb5c0] silence_start: 198.802
[silencedetect @ 000001ee5a9cb5c0] silence_end: 199.375 | silence_duration: 0.572875
[silencedetect @ 000001ee5a9cb5c0] silence_start: 203.289
[silencedetect @ 000001ee5a9cb5c0] silence_end: 203.968 | silence_duration: 0.678875
[silencedetect @ 000001ee5a9cb5c0] silence_start: 229.058
[silencedetect @ 000001ee5a9cb5c0] silence_end: 229.595 | silence_duration: 0.537167
[silencedetect @ 000001ee5a9cb5c0] silence_start: 230.641
[silencedetect @ 000001ee5a9cb5c0] silence_end: 231.178 | silence_duration: 0.536604
[silencedetect @ 000001ee5a9cb5c0] silence_start: 240.758
[silencedetect @ 000001ee5a9cb5c0] silence_end: 241.555 | silence_duration: 0.796854
[silencedetect @ 000001ee5a9cb5c0] silence_start: 314.606
[silencedetect @ 000001ee5a9cb5c0] silence_end: 315.126 | silence_duration: 0.519979
[silencedetect @ 000001ee5a9cb5c0] silence_start: 356.308
[silencedetect @ 000001ee5a9cb5c0] silence_end: 356.832 | silence_duration: 0.524271
[silencedetect @ 000001ee5a9cb5c0] silence_start: 358.494
[silencedetect @ 000001ee5a9cb5c0] silence_end: 359.071 | silence_duration: 0.576917
[silencedetect @ 000001ee5a9cb5c0] silence_start: 374.441
[silencedetect @ 000001ee5a9cb5c0] silence_end: 375.158 | silence_duration: 0.717313
[silencedetect @ 000001ee5a9cb5c0] silence_start: 375.999
[silencedetect @ 000001ee5a9cb5c0] silence_end: 376.772 | silence_duration: 0.772271
[silencedetect @ 000001ee5a9cb5c0] silence_start: 389.417
[silencedetect @ 000001ee5a9cb5c0] silence_end: 389.947 | silence_duration: 0.529458
[silencedetect @ 000001ee5a9cb5c0] silence_start: 419.457
[silencedetect @ 000001ee5a9cb5c0] silence_end: 420.038 | silence_duration: 0.581125
[silencedetect @ 000001ee5a9cb5c0] silence_start: 447.481
[silencedetect @ 000001ee5a9cb5c0] silence_end: 447.991 | silence_duration: 0.510146
[silencedetect @ 000001ee5a9cb5c0] silence_start: 454.29
[out#0/null @ 000001ee590392c0] video:0kB audio:88020kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
size=N/A time=00:07:49.41 bitrate=N/A speed= 957x
[silencedetect @ 000001ee5a9cb5c0] silence_end: 469.44 | silence_duration: 15.1504
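
The silence_start and silence_end lines are easy to turn into a list of intervals. The following Python sketch shows one way to do it; the function name parse_silences is hypothetical, and it assumes merged (mono=0) output like the log above, where every silence_end follows its matching silence_start.

```python
import re

NUMBER = r"(-?\d+(?:\.\d+)?)"
START_RE = re.compile(r"silence_start:\s*" + NUMBER)
END_RE = re.compile(r"silence_end:\s*" + NUMBER + r"\s*\|\s*silence_duration:\s*" + NUMBER)

# Lines copied from the log above, including an unrelated progress line.
SAMPLE_LOG = """\
[silencedetect @ 000001ee5a9cb5c0] silence_start: 17.0503
[silencedetect @ 000001ee5a9cb5c0] silence_end: 17.5835 | silence_duration: 0.53325
size=N/A time=00:07:49.41 bitrate=N/A speed= 957x
[silencedetect @ 000001ee5a9cb5c0] silence_start: 30.0168
[silencedetect @ 000001ee5a9cb5c0] silence_end: 30.5313 | silence_duration: 0.514437
"""


def parse_silences(log_text):
    """Return a list of (start, end, duration) tuples in seconds."""
    intervals = []
    start = None
    for line in log_text.splitlines():
        match = START_RE.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = END_RE.search(line)
        if match and start is not None:
            intervals.append((start, float(match.group(1)), float(match.group(2))))
            start = None
    return intervals


for start, end, duration in parse_silences(SAMPLE_LOG):
    print(f"silence from {start}s to {end}s ({duration}s)")
```

Lines that are not silencedetect messages, such as the progress line, are ignored, so the whole stderr text can be passed in unchanged.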

 

4 Introduction to FFmpeg

The FFmpeg project was founded in 2000 by Fabrice Bellard. Its developers still overlap extensively with those of other open source multimedia projects such as VLC, MPV, dav1d and x264. FFmpeg (the "FF" stands for "Fast Forward") is open source software, licensed under the LGPL or the GPL depending on how it is built, and it performs very well in audio and video processing. It covers encoding, decoding, transcoding, muxing, filtering and playback for almost all existing video and audio formats. It is widely used by companies across many industries, and it is cross-platform, running on Linux, Windows, macOS and other platforms. The project ships three command-line tools, known as the Three Musketeers of audio and video processing:

  • ffmpeg: a command-line tool used for multimedia format conversion
  • ffplay: a multimedia player built on the FFmpeg libraries
  • ffprobe: a multimedia stream analyzer built on the FFmpeg libraries

ffmpeg is the core tool in the FFmpeg tool set, supporting a variety of encoders, decoders, container formats and filters. The basic components of the FFmpeg framework include AVFormat, AVCodec, AVFilter, AVDevice, AVUtil and other module libraries. The main modules are as follows:

  • AVFormat – FFmpeg’s container (encapsulation) module

AVFormat implements most of the media container formats in current use, covering both encapsulation (muxing) and decapsulation (demuxing): file formats such as MP4, FLV, MKV and TS, and network streaming formats and protocols such as RTMP, RTSP, MMS and HLS. Whether FFmpeg supports a given container format depends on whether support for that format was included at compile time. If needed, AVFormat can be extended with a custom container format by adding your own encapsulation module to it.

  • AVCodec – FFmpeg’s codec module

AVCodec implements most of the codec formats commonly used in the multimedia field, supporting both encoding and decoding. In addition to built-in codecs such as MPEG4, AAC and MJPEG, AVCodec also supports third-party codec libraries: H.264 (AVC) encoding uses the x264 encoder (libx264), H.265 (HEVC) encoding uses the x265 encoder (libx265), and MP3 encoding uses the libmp3lame encoder. To add your own codec format or a hardware codec, add the corresponding codec module to AVCodec.

  • AVFilter – filter module for FFmpeg

The AVFilter library provides a general filter processing framework for audio, video, subtitles, etc. In AVFilter, a filter can have multiple inputs and multiple outputs.

  • swresample – FFmpeg’s audio conversion calculation module

The swresample module provides a high-level audio resampling API. For example, it handles sample rate conversion, sample format conversion and channel layout conversion.

  • swscale – FFmpeg’s video image conversion calculation module

The swscale module provides a high-level image conversion API. It handles image scaling and pixel format conversion, for example scaling images from 1080p to 720p or 480p, converting image data from YUV420p to YUYV, or converting between YUV and RGB.

5 FFmpeg common parameters

5.1 List of capability sets

  • -formats: List supported file formats.
  • -codecs: List supported codecs.
  • -decoders: List supported decoders.
  • -encoders: List supported encoders.
  • -protocols: List supported protocols.
  • -bsfs: List supported bitstream filters.
  • -filters: List supported filters.
  • -pix_fmts: List supported pixel formats.
  • -sample_fmts: List supported audio sample formats.
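
These listings are plain text, so a script can check whether a capability such as the silencedetect filter is present in a given build. The sketch below parses text in the layout printed by ffmpeg -filters; the function name available_filters is hypothetical, and SAMPLE is an illustrative excerpt rather than complete output from a real build.

```python
# Illustrative excerpt in the layout of "ffmpeg -hide_banner -filters".
SAMPLE = """\
Filters:
  T.. = Timeline support
  A = Audio input/output
 ... silencedetect     A->A       Detect silence.
 ... volumedetect      A->A       Detect audio volume.
"""


def available_filters(filters_output):
    """Collect filter names from rows shaped like '<flags> <name> <in>-><out> <text>'."""
    names = set()
    for line in filters_output.splitlines():
        parts = line.split()
        # Legend lines have no "->" in the third column, so they are skipped.
        if len(parts) >= 3 and "->" in parts[2]:
            names.add(parts[1])
    return names


print("silencedetect" in available_filters(SAMPLE))
```

In practice the input text would come from running ffmpeg -hide_banner -filters and capturing its standard output.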

5.2 Common input options

  • -i filename: Specify the input file name.
  • -f fmt: Force the input file format, using a name from the -formats list (by default the input format is detected automatically).
  • -ss hh:mm:ss[.xxx]: Set the starting time point of the input file. ffmpeg seeks to this position and starts reading data from there. The position can also be given in seconds.

For input, the following options are usually automatically recognized, but can also be forced.

  • -c codec: Specify the decoder, using a name from the -decoders list.
  • -acodec codec: Specify the audio decoder, using a name from the -decoders list.
  • -vcodec codec: Specify the video decoder, using a name from the -decoders list.
  • -b:v bitrate: Set the bitrate of the video stream, integer, unit bps.
  • -r fps: Set the frame rate of the video stream, integer, unit fps.
  • -s WxH: Set the video screen size. This can also be achieved by mounting a screen zoom filter.
  • -pix_fmt format: Set the image format of the video stream (such as RGB or YUV).
  • -ar sample rate: Set the sampling rate of the audio stream, integer, unit Hz.
  • -ab bitrate: Set the bitrate of the audio stream, integer, unit bps.
  • -ac channels: Set the number of channels of the audio stream.

5.3 Common output options

  • -f fmt: Force the output file format, using a name from the -formats list (by default it is chosen from the output file extension).
  • -c codec: Specify the encoder, using a name from the -encoders list. The special value "copy" copies the stream without decoding or re-encoding it.
  • -acodec codec: Specify the audio encoder, using a name from the -encoders list ("copy" copies the audio stream without decoding or re-encoding).
  • -vcodec codec: Specify the video encoder, using a name from the -encoders list ("copy" copies the video stream without decoding or re-encoding).
  • -r fps: Set the frame rate of the video encoder, integer, unit: fps.
  • -pix_fmt format: Set the image format used by the video encoder (such as RGB or YUV).
  • -ar sample rate: Set the sampling rate of the audio encoder, integer, unit Hz.
  • -b bitrate: Set the bit rate output by the audio and video encoder, integer, unit bps.
  • -ab bitrate: Set the bit rate of the audio encoder output, integer, unit bps.
  • -ac channels: Set the number of channels of the audio encoder.
  • -an: Ignore any audio streams.
  • -vn: Ignore any video streams.
  • -t hh:mm:ss[.xxx]: Set the time length of the output file.
  • -to hh:mm:ss[.xxx]: Set the time point at which the output stops. -t and -to are mutually exclusive; if both are given, -t takes priority.
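
As an illustration of how the input option -ss and the output options -t and -c combine, the sketch below builds a command that copies the segment between two silences without re-encoding. The function name cut_command and the output file name are hypothetical; the timestamps are taken from the log in section 3 (the first silence ends at 13.238 and the next one starts at 17.0503).

```python
def cut_command(src, dst, start, end):
    """Build an ffmpeg argument list that copies [start, end) seconds of src to dst."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",        # input option: seek before reading
        "-i", src,
        "-t", f"{end - start:.3f}",   # output option: length of the output
        "-c", "copy",                 # copy streams without re-encoding
        dst,
    ]


print(" ".join(cut_command("input.mp4", "segment.mp4", 13.238, 17.0503)))
```

Because the streams are copied rather than re-encoded, the cut can only begin at a key frame, so the actual start may differ slightly from the requested one.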

5.4 ffmpeg stream identification

Some ffmpeg options can be applied to a specific media stream. In that case, a stream specifier is appended to the option name. Stream specifiers can take the following forms:

  • Stream index. For example, ":1" matches the second stream, because indexes start at 0.
  • Stream type. For example, ":a" matches audio streams. The type can be combined with an index: ":a:1" matches the second audio stream.
  • Program. "p:program_id" matches the streams of the given program and can be combined with a further stream specifier.
  • Stream ID. "#stream_id" or "i:stream_id" matches a stream by its format-specific identification number.

If you want to set the second audio stream to copy, you need to specify -codec:a:1 copy
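
The way an option name and a stream specifier are joined can be shown with a tiny helper. This is only a sketch: with_specifier is a hypothetical function, and the file names are placeholders.

```python
def with_specifier(option, stream_type=None, index=None):
    """Join an option name with an optional stream type and zero-based index."""
    parts = [option]
    if stream_type is not None:
        parts.append(stream_type)
    if index is not None:
        parts.append(str(index))
    return ":".join(parts)


# Keep every stream of the first input, and copy the second audio stream as-is.
command = [
    "ffmpeg", "-i", "input.mkv",
    "-map", "0",
    with_specifier("-codec", "a", 1), "copy",
    "output.mkv",
]
print(" ".join(command))
```

Streams that are not matched by the specifier keep their default handling, so only the second audio stream is copied unchanged.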

5.5 ffmpeg audio options

  • -aframes: Equivalent to -frames:a; an output option that sets the number of audio frames to output.
  • -aq: Equivalent to -q:a (-qscale:a in older versions); sets the audio quality.
  • -atag: Equivalent to -tag:a; sets the tag of the audio stream.
  • -af: Equivalent to -filter:a; sets the audio filter chain. Its argument is a string describing the chain.

5.6 ffmpeg video options

  • -vframes: Equivalent to -frames:v; an output option that sets the number of video frames to output.
  • -aspect: Set the aspect ratio, such as 4:3, 16:9, 1.3333 or 1.7777.
  • -bits_per_raw_sample: Set the number of bits per pixel.
  • -vstats: Generate video statistics.
  • -vf: Equivalent to -filter:v; sets the video filter chain. Its argument is a string describing the chain.
  • -vtag: Equivalent to -tag:v; sets the tag of the video stream.
  • -force_fps: Force the video frame rate to be set.
  • -force_key_frames: Explicitly control the insertion of key frames. The argument is a string, which can be a timestamp or an expression prefixed by "expr:". For example: "-force_key_frames 0:05:00" or "-force_key_frames expr:gte(t,n_forced*5)".

5.7 ffmpeg filter options

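  • -filter[:stream_specifier] filtergraph: Add a simple filter graph, which has exactly one input and one output of the same type. -vf and -af are the shorthand forms for -filter:v and -filter:a.
  • -filter_complex filtergraph: Add a complex filter graph, which can have multiple inputs and multiple outputs.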

5.8 ffmpeg advanced options

  • -re: Read the input at its native frame rate instead of as fast as possible.
  • -map: Specify which input streams go into the output file. For example, "-map 1:0 -map 1:1" writes the first and second streams of the second input file to the output file. Without any -map option, ffmpeg uses its default stream selection.

5.9 ffprobe parameters

Simply put, ffprobe is a multimedia stream analysis tool. It collects information from multimedia streams and prints it out in human and machine readable form. It can be used to detect the container type of multimedia streams, as well as the format and type of each multimedia stream. It can be used as a standalone application or combined with text filters to perform more complex processing.

  • -f format: Force the input format
  • -sexagesimal: Print time values in the format HOURS:MM:SS.MICROSECONDS
  • -pretty: Prettify the format of the displayed values
  • -print_format format: Set the output format (optional values: default, compact, csv, flat, ini, json, xml)
  • -of format: Alias for -print_format
  • -select_streams stream_specifier: Select only the specified streams
  • -sections: Print the section structure and information
  • -show_data: Show packet data
  • -show_data_hash: Show the hash value of the packet data
  • -show_error: Show errors found while probing the file
  • -show_format: Show format (container) information
  • -show_frames: Show frame information
  • -show_format_entry entry: Show the specified entry from the format (container) information
  • -show_packets: Show packet information
  • -show_programs: Show program information
  • -show_streams: Show stream information
  • -show_chapters: Show chapter information
  • -count_frames: Count the number of frames in each stream
  • -count_packets: Count the number of packets in each stream
  • -show_program_version: Show the ffprobe version
  • -show_library_versions: Show the library versions
  • -show_versions: Show the program and library versions
  • -show_pixel_formats: Show pixel formats
  • -show_private_data: Show private data
  • -private: Same as -show_private_data
  • -bitexact: Force bit-exact output
  • -read_intervals read_intervals: Set the intervals to read
  • -default: Generic catch-all option
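
Combining -select_streams, -show_streams and -print_format json gives output that is easy to process in a script. The sketch below builds such a command and summarizes the JSON; the function names are hypothetical, and SAMPLE_JSON is an illustrative excerpt shaped after the audio stream shown in section 1, not captured output.

```python
import json


def ffprobe_streams_command(path, selector="a"):
    """Build an ffprobe argument list that prints the selected streams as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", selector,
        "-show_streams",
        "-print_format", "json",
        path,
    ]


def summarize_streams(ffprobe_json):
    """Return (index, codec_type, codec_name) for each stream in the JSON text."""
    data = json.loads(ffprobe_json)
    return [
        (s.get("index"), s.get("codec_type"), s.get("codec_name"))
        for s in data.get("streams", [])
    ]


# Illustrative excerpt only; real output contains many more fields.
SAMPLE_JSON = """
{"streams": [{"index": 1, "codec_name": "aac", "codec_type": "audio",
              "sample_rate": "48000", "channels": 2}]}
"""

print(" ".join(ffprobe_streams_command("input.mp4")))
print(summarize_streams(SAMPLE_JSON))
```

Using .get for each field keeps the summary from failing when a stream lacks one of the entries.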

5.10 ffplay parameters

  • -x: Force the width of the video display window
  • -y: Force the height of the video display window
  • -s: Set the width and height of the video display
  • -fs: Start in full screen mode
  • -an: Disable audio
  • -vn: Disable video
  • -sn: Disable subtitles
  • -ss: Seek to the given position before playing
  • -t: Set the length of video/audio to play
  • -bytes: Set whether to seek by bytes: 0 means off, 1 means on, -1 means automatic
  • -nodisp: Disable the graphical display window
  • -f: Force the given format to be used for parsing
  • -window_title: Set the title of the display window
  • -af: Set the audio filter
  • -codec: Force the given codec to be used for decoding
  • -autorotate: Automatically rotate the video
  • -ast: Set the audio stream to be played
  • -vst: Set the video stream to be played
  • -sst: Set the subtitle stream to be played
  • -stats: Print multimedia playback statistics
  • -fast: Enable non-standard-compliant optimizations
  • -sync: Set the master clock used for audio and video synchronization: audio, video, or ext (external clock)
  • -autoexit: Exit ffplay automatically when playback finishes. By default, ffplay does not exit after playback is completed.
  • -exitonkeydown: Exit ffplay when a key is pressed
  • -exitonmousedown: Exit ffplay when a mouse button is pressed
  • -loop: Set the number of times the file is played in a loop
  • -framedrop: Drop video frames when the video is out of sync
  • -infbuf: Do not limit the input buffer size. This option is commonly used in real-time streaming media playback scenarios.
  • -vf: Set the video filter
  • -acodec: Force the given audio decoder to be used
  • -vcodec: Force the given video decoder to be used
  • -scodec: Force the given subtitle decoder to be used
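
Several of these options are handy for listening to a detected silence to confirm that the chosen threshold is sensible. The sketch below builds an audio-only ffplay command for one interval; the function name ffplay_preview_command is hypothetical, and the start and duration are taken from the log in section 3.

```python
def ffplay_preview_command(path, start, duration):
    """Build an ffplay argument list that plays one interval without a window."""
    return [
        "ffplay",
        "-nodisp",            # no graphical display window
        "-autoexit",          # quit when playback finishes
        "-ss", str(start),    # seek to the start of the interval
        "-t", str(duration),  # play only this many seconds
        path,
    ]


print(" ".join(ffplay_preview_command("input.mp4", 17.0503, 0.53325)))
```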

Origin blog.csdn.net/lsb2002/article/details/135485520