[FFmpeg] Quickly get started with FFmpeg in one article


1. Environment setup

git clone https://github.com/tanersener/ffmpeg-video-slideshow-scripts.git

For convenience, the conda environment (py10) is used directly here, and the following command installs ffmpeg:

conda install -c conda-forge ffmpeg

At this point, the environment setup is complete. You can check the ffmpeg version and other build information.
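For example:

ffmpeg -version
# or, if ffmpeg is only available inside the conda environment:
conda run -n py10 ffmpeg -version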

2. Basic script structure

2.1 Hyperparameter settings

WIDTH=576
HEIGHT=1024
FPS=30
TRANSITION_DURATION=1
IMAGE_DURATION=2

2.2 Data loading

# FILES=`find ../media/*.jpg | sort -r`             # USE ALL IMAGES UNDER THE media FOLDER SORTED
# FILES=('../media/1.jpg' '../media/2.jpg')         # USE ONLY THESE IMAGE FILES
FILES=`find ../media/*.jpg`                         # USE ALL IMAGES UNDER THE media FOLDER

2.3 Derived parameters

TRANSITION_FRAME_COUNT=$(( TRANSITION_DURATION*FPS ))   # frames per transition
IMAGE_FRAME_COUNT=$(( IMAGE_DURATION*FPS ))             # frames per still image
# IMAGE_COUNT is the number of input images (counted later from FILES, see section 4)
TOTAL_DURATION=$(( (IMAGE_DURATION+TRANSITION_DURATION)*IMAGE_COUNT - TRANSITION_DURATION ))
TOTAL_FRAME_COUNT=$(( TOTAL_DURATION*FPS ))
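As a worked example, with the settings above and 5 images:

# TRANSITION_FRAME_COUNT = 1*30 = 30 frames
# IMAGE_FRAME_COUNT      = 2*30 = 60 frames
# TOTAL_DURATION         = (2+1)*5 - 1 = 14 s
# TOTAL_FRAME_COUNT      = 14*30 = 420 frames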

2.4 Processing the input streams

for IMAGE in ${FILES[@]}; do
    FULL_SCRIPT+="-loop 1 -i '${IMAGE}' "
done
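With two hypothetical files ../media/1.jpg and ../media/2.jpg, this loop would append:

# -loop 1 -i '../media/1.jpg' -loop 1 -i '../media/2.jpg'
# -loop 1 turns each still image into an endless video stream; -i names the input file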
......
Common operations
[${c}:v]  # operate on the video stream of input ${c}

setpts=PTS-STARTPTS  # reset timestamps so they start from 0

scale=w='if(gte(iw/ih,${WIDTH}/${HEIGHT}),min(iw,${WIDTH}),-1)':h='if(gte(iw/ih,${WIDTH}/${HEIGHT}),-1,min(ih,${HEIGHT}))'  # scale to fit: if the aspect ratio iw/ih is greater than or equal to the target ratio ${WIDTH}/${HEIGHT}, set the width to ${WIDTH} and let the height adapt; otherwise set the height to ${HEIGHT} and let the width adapt

scale=trunc(iw/2)*2:trunc(ih/2)*2  # round the width and height down to even values

pad=width=${WIDTH}:height=${HEIGHT}:x=(${WIDTH}-iw)/2:y=(${HEIGHT}-ih)/2:color=#00000000  # pad to the target size ${WIDTH}x${HEIGHT}, filling with transparent black (#00000000)

setsar=sar=1/1  # set the sample (pixel) aspect ratio to 1:1 so the displayed aspect ratio stays consistent

crop=${WIDTH}:${HEIGHT}  # crop the frame to ${WIDTH}x${HEIGHT}

concat=n=3:v=1:a=0  # concatenate three video segments, video only (no audio)

scale=${WIDTH}*5:-1  # scale the concatenated stream to a width of ${WIDTH}*5; the height adapts proportionally

zoompan=z='min(pzoom+0.001*${ZOOM_SPEED},2)':d=1:${POSITION_FORMULA}:fps=${FPS}:s=${WIDTH}x${HEIGHT}  # min(pzoom+0.001*${ZOOM_SPEED},2) controls the zoom speed; ${ZOOM_SPEED} is a previously defined zoom-speed variable

[stream$((c+1))]  # label for the scaled/padded image stream
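As a minimal sketch of how these filters combine (input.jpg and padded.png are hypothetical paths, not part of the repository), this scales an image to fit 576x1024, pads it to exactly that size, and writes it out:

ffmpeg -y -i input.jpg -filter_complex "[0:v]scale=w='if(gte(iw/ih,576/1024),min(iw,576),-1)':h='if(gte(iw/ih,576/1024),-1,min(ih,1024))',pad=width=576:height=1024:x=(576-iw)/2:y=(1024-ih)/2,setsar=sar=1/1[out]" -map "[out]" padded.png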

2.5 Input stream splicing

for (( c=1; c<${IMAGE_COUNT}; c++ ))
do
    FULL_SCRIPT+="[stream${c}overlaid][stream$((c+1))blended]"
done

3. Image manipulation

Use -map to write image streams directly to jpg/png files.

The template format is as follows:

ffmpeg -i '<path to image>' -filter_complex "<filter operations>" -map "[streamout]" output1.jpg

For example:

ffmpeg -i '../images/aaa.jpg' -filter_complex "scale=w='if(gte(iw/ih,576/1024),-1,576)':h='if(gte(iw/ih,576/1024),1024,-1)',crop=576:1024,setsar=sar=1/1,fps=30,format=rgba,split=2[stream1out1][stream1out2]" -map "[stream1out1]" output1.jpg -map "[stream1out2]" output2.jpg

4. Video operation (using blurred_background.sh as an example)

# Example with 5 images: total duration = 5*2 + (5-1)*1 = 14 s
WIDTH=1280
HEIGHT=720
FPS=30
TRANSITION_DURATION=1
IMAGE_DURATION=2
# FILE OPTIONS
# FILES=`find ../media/*.jpg | sort -r`             # USE ALL IMAGES UNDER THE media FOLDER SORTED
# FILES=('../media/1.jpg' '../media/2.jpg')         # USE ONLY THESE IMAGE FILES
FILES=`find ../media/*.jpg`                         # USE ALL IMAGES UNDER THE media FOLDER

let IMAGE_COUNT=0
for IMAGE in ${FILES[@]}; do (( IMAGE_COUNT+=1 )); done

if [[ ${IMAGE_COUNT} -lt 2 ]]; then
    echo "Error: media folder should contain at least two images"
    exit 1;
fi
# Compute frame counts and print image count / transition duration / total duration
TRANSITION_FRAME_COUNT=$(( TRANSITION_DURATION*FPS ))
IMAGE_FRAME_COUNT=$(( IMAGE_DURATION*FPS ))
TOTAL_DURATION=$(( (IMAGE_DURATION+TRANSITION_DURATION)*IMAGE_COUNT - TRANSITION_DURATION ))
TOTAL_FRAME_COUNT=$(( TOTAL_DURATION*FPS ))

echo -e "\nVideo Slideshow Info\n------------------------\nImage count: ${IMAGE_COUNT}\nDimension: ${WIDTH}x${HEIGHT}\nFPS: ${FPS}\nImage duration: ${IMAGE_DURATION} s\n\
Transition duration: ${TRANSITION_DURATION} s\nTotal duration: ${TOTAL_DURATION} s\n"

# SECONDS is a built-in Bash variable that counts the seconds since the shell started
START_TIME=$SECONDS
# All of the following commands are appended to FULL_SCRIPT.
# 1. START COMMAND
FULL_SCRIPT="conda run -n py10 ffmpeg -y "  # change py10 to your own conda environment
# 2. ADD INPUTS
# -loop 1 loops the image file indefinitely
# -i '${IMAGE}' specifies the input image file path
for IMAGE in ${FILES[@]}; do
    FULL_SCRIPT+="-loop 1 -i '${IMAGE}' "
done

The -filter_complex parameter processes multiple input streams and generates one or more output streams.
In this script, the content after -filter_complex defines a series of filter operations, including image scaling, cropping, setting the frame rate, and so on.
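A minimal standalone illustration (in.jpg and out.jpg are hypothetical paths): one input stream is filtered, labeled [out], and mapped to the output file.

ffmpeg -y -i in.jpg -filter_complex "[0:v]scale=640:-2,setsar=sar=1/1[out]" -map "[out]" out.jpg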

# 3. START FILTER COMPLEX
FULL_SCRIPT+="-filter_complex \""

Inside the loop, [${c}:v] refers to the video stream of the current image (each image is a separate input stream). Each input stream is processed by a chain of filter operations: scaling the image, setting the pixel aspect ratio, converting to RGBA format, applying a blur effect, and setting the frame rate (an expanded example of one fragment follows the code below).
The specific filter operations are as follows:

  • scale=${WIDTH}x${HEIGHT}: Scale the image to the specified width and height.
  • setsar=sar=1/1: Set the pixel aspect ratio to 1:1 to avoid image distortion.
  • format=rgba: Convert the image to RGBA format to support an alpha (transparency) channel.
  • boxblur=100: Apply a blur effect with a blur radius of 100.
  • setsar=sar=1/1: Set the pixel aspect ratio to 1:1 again.
  • fps=${FPS}: Set the frame rate to the specified FPS value. After processing, the stream is labeled [stream$((c+1))blurred], where $((c+1)) is the index of the current image plus 1, so that each label is unique.
# 4. PREPARE BLURRED INPUTS
for (( c=0; c<${IMAGE_COUNT}; c++ ))
do
    FULL_SCRIPT+="[${c}:v]scale=${WIDTH}x${HEIGHT},setsar=sar=1/1,format=rgba,boxblur=100,setsar=sar=1/1,fps=${FPS}[stream$((c+1))blurred];"
done
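For c=0 with the settings above (WIDTH=1280, HEIGHT=720, FPS=30), the appended fragment expands to:

# [0:v]scale=1280x720,setsar=sar=1/1,format=rgba,boxblur=100,setsar=sar=1/1,fps=30[stream1blurred];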

This loop iterates over each image input stream and performs the following operations on each (a worked example follows the code below):

  1. Scale: use the scale filter to fit the image into the target size ${WIDTH}x${HEIGHT}. Conditional expressions choose the scaling direction so that the image fits without distortion: if the image's aspect ratio is greater than or equal to the target ratio ${WIDTH}/${HEIGHT}, the width is set to ${WIDTH} and the height adapts to the aspect ratio; otherwise, the height is set to ${HEIGHT} and the width adapts.
  2. Scale again: to ensure the scaled dimensions are even numbers, use a second scale filter to round the width and height down to the nearest even value.
  3. Set pixel aspect ratio: use the setsar filter to set the pixel aspect ratio to 1:1, avoiding abnormal stretching or squeezing in subsequent steps.
  4. Format conversion: use the format filter to convert the pixel format to RGBA so that transparency is handled correctly in later compositing.
    The converted stream is labeled [stream$((c+1))raw], where $((c+1)) is the loop index c plus 1. These prepared streams are used in subsequent processing steps.
# 5. PREPARE INPUTS
for (( c=0; c<${IMAGE_COUNT}; c++ ))
do
    FULL_SCRIPT+="[${c}:v]scale=w='if(gte(iw/ih,${WIDTH}/${HEIGHT}),min(iw,${WIDTH}),-1)':h='if(gte(iw/ih,${WIDTH}/${HEIGHT}),-1,min(ih,${HEIGHT}))',scale=trunc(iw/2)*2:trunc(ih/2)*2,setsar=sar=1/1,format=rgba[stream$((c+1))raw];"
done
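As a worked example of the conditional scale, take a hypothetical 4000x3000 input: iw/ih ≈ 1.33 is less than 1280/720 ≈ 1.78, so the height becomes min(3000,720)=720 and the width adapts:

# 4000x3000 -> scale h=720, w=-1 (auto) -> 960x720; trunc(iw/2)*2 then keeps both dimensions even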

In this part, the preprocessed blurred image and the scaled image are combined with an overlay operation.
The loop takes each image's blurred and scaled versions and stacks them with the overlay filter (the centering arithmetic is worked out after the code below). Parameters of the overlay operation include:

  • overlay=(main_w-overlay_w)/2:(main_h-overlay_h)/2: Place the scaled image on top of the blurred one, centered in the frame.
  • format=rgb: Set the pixel format of the composited image to RGB.
    The composited result is then duplicated with the split filter into two output streams, [stream${c}out1] and [stream${c}out2], for use in subsequent steps: [stream${c}out1] feeds the padded display segment and [stream${c}out2] feeds the transition frames.
# 6. OVERLAY BLURRED AND SCALED INPUTS
for (( c=1; c<=${IMAGE_COUNT}; c++ ))
do
    FULL_SCRIPT+="[stream${c}blurred][stream${c}raw]overlay=(main_w-overlay_w)/2:(main_h-overlay_h)/2:format=rgb,setpts=PTS-STARTPTS,split=2[stream${c}out1][stream${c}out2];"
done
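Continuing the 4000x3000 example, the overlay centering offsets follow from the two stream sizes:

# main (blurred) = 1280x720, overlay (scaled) = 960x720
# x = (1280-960)/2 = 160, y = (720-720)/2 = 0, so the sharp image sits centered on the blurred background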

In this part, the composited image is padded.
The loop goes through each image's composited output stream and pads it with the pad filter. Parameters of the padding operation include:

  • width=${WIDTH}:height=${HEIGHT}: Set the padded frame to the specified width and height.
  • x=(${WIDTH}-iw)/2:y=(${HEIGHT}-ih)/2: Compute the padding offsets so that the image is centered in the frame.
  • trim=duration=${IMAGE_DURATION}: Limit the segment to IMAGE_DURATION, the length of time the image is displayed.
  • select=lte(n,${IMAGE_FRAME_COUNT}): Keep only frames within the specified frame-number range.
    Depending on the image's position in the sequence, different branches apply (a label summary follows the code below):
  • If it is the first image and there is more than one image, an extra output stream [stream${c}ending] is produced for the outgoing side of its transition.
  • If it is the last image, only the incoming side of its transition is produced, labeled [stream${c}starting].
  • If it is an intermediate image, the transition frames are split into two output streams, [stream${c}starting] and [stream${c}ending].
    Through this loop, each image's composited stream is padded, and the streams representing its display segment and transition frames are produced.
# 7. APPLY PADDING
for (( c=1; c<=${IMAGE_COUNT}; c++ ))
do
    FULL_SCRIPT+="[stream${c}out1]pad=width=${WIDTH}:height=${HEIGHT}:x=(${WIDTH}-iw)/2:y=(${HEIGHT}-ih)/2,trim=duration=${IMAGE_DURATION},select=lte(n\,${IMAGE_FRAME_COUNT})[stream${c}overlaid];"
    if [[ ${c} -eq 1 ]]; then
        if [[ ${IMAGE_COUNT} -gt 1 ]]; then
            FULL_SCRIPT+="[stream${c}out2]pad=width=${WIDTH}:height=${HEIGHT}:x=(${WIDTH}-iw)/2:y=(${HEIGHT}-ih)/2,trim=duration=${TRANSITION_DURATION},select=lte(n\,${TRANSITION_FRAME_COUNT})[stream${c}ending];"
        fi
    elif [[ ${c} -lt ${IMAGE_COUNT} ]]; then
        FULL_SCRIPT+="[stream${c}out2]pad=width=${WIDTH}:height=${HEIGHT}:x=(${WIDTH}-iw)/2:y=(${HEIGHT}-ih)/2,trim=duration=${TRANSITION_DURATION},select=lte(n\,${TRANSITION_FRAME_COUNT}),split=2[stream${c}starting][stream${c}ending];"
    elif [[ ${c} -eq ${IMAGE_COUNT} ]]; then
        FULL_SCRIPT+="[stream${c}out2]pad=width=${WIDTH}:height=${HEIGHT}:x=(${WIDTH}-iw)/2:y=(${HEIGHT}-ih)/2,trim=duration=${TRANSITION_DURATION},select=lte(n\,${TRANSITION_FRAME_COUNT})[stream${c}starting];"
    fi
done
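With a hypothetical IMAGE_COUNT of 3, the branches above produce the following labels:

# c=1 (first):  [stream1overlaid] and [stream1ending]
# c=2 (middle): [stream2overlaid], [stream2starting], [stream2ending]
# c=3 (last):   [stream3overlaid] and [stream3starting]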

In this part, the transition frames are created.
The loop goes through each pair of adjacent images and creates frames with a transition effect between them. For each pair, consisting of the current image and the next image:

  • The blend filter mixes the two inputs to generate the transition frames (a worked example of the weights follows the code below). Parameters of the blend operation include:
    • all_expr='A*(if(gte(T,${TRANSITION_DURATION}),${TRANSITION_DURATION},T/${TRANSITION_DURATION}))+B*(1-(if(gte(T,${TRANSITION_DURATION}),${TRANSITION_DURATION},T/${TRANSITION_DURATION})))': Mix the two inputs with a time-dependent expression, where A is the next image's starting frames and B is the current image's ending frames; as time T advances, the weight shifts from B to A, producing a cross-fade.
    • select=lte(n,${TRANSITION_FRAME_COUNT}): Keep only frames within the transition's frame-number range.
      Through this loop, a transition-frame output stream is created for each pair of adjacent images.
# 8. CREATE TRANSITION FRAMES
for (( c=1; c<${IMAGE_COUNT}; c++ ))
do
    FULL_SCRIPT+="[stream$((c+1))starting][stream${c}ending]blend=all_expr='A*(if(gte(T,${TRANSITION_DURATION}),${TRANSITION_DURATION},T/${TRANSITION_DURATION}))+B*(1-(if(gte(T,${TRANSITION_DURATION}),${TRANSITION_DURATION},T/${TRANSITION_DURATION})))',select=lte(n\,${TRANSITION_FRAME_COUNT})[stream$((c+1))blended];"
done
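With TRANSITION_DURATION=1, the A-weight if(gte(T,1),1,T/1) ramps linearly from 0 to 1, so the next image (A) fades in over the previous one (B):

# T=0.0 -> 0.0*A + 1.0*B  (previous image fully visible)
# T=0.5 -> 0.5*A + 0.5*B
# T=1.0 -> 1.0*A + 0.0*B  (next image fully visible)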

In this part, the splicing of the video segments begins.
The loop goes through each pair of adjacent images and appends the previous image's padded stream ([stream${c}overlaid]) followed by the next image's transition-frame stream ([stream$((c+1))blended]) to the input list of the upcoming concat.
Through this loop, the segment list alternates between display segments and transition segments.

# 9. BEGIN CONCAT
for (( c=1; c<${IMAGE_COUNT}; c++ ))
do
    FULL_SCRIPT+="[stream${c}overlaid][stream$((c+1))blended]"
done

In this part, the splicing of the video segments is completed.
The padded stream of the last image ([stream${IMAGE_COUNT}overlaid]) is appended to the list of previously collected segments, and all of them are concatenated.
This step produces the final video stream [video], which contains the display segments and transitions of all images.

# 10. END CONCAT
FULL_SCRIPT+="[stream${IMAGE_COUNT}overlaid]concat=n=$((2*IMAGE_COUNT-1)):v=1:a=0,format=yuv420p[video]\""
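For the hypothetical 3-image case, the accumulated inputs from step 9 plus this final fragment give n = 2*3-1 = 5 segments:

# [stream1overlaid][stream2blended][stream2overlaid][stream3blended][stream3overlaid]concat=n=5:v=1:a=0,format=yuv420p[video]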

In this last part, the video stream [video] is mapped to the output file, and the format and encoding parameters of the output file are specified.
The output path is set to ../advanced_blurred_background.mp4, and the libx264 encoder is used for video encoding.
The other parameters are as follows:

  • -vsync 2: Use variable-frame-rate (vfr) sync: frames are passed through with their timestamps or dropped to avoid duplicate timestamps.
  • -async 1: Audio sync method that corrects only the start of the audio stream (it has no effect here, since the slideshow has no audio).
  • -rc-lookahead 0: Disable look-ahead in rate control.
  • -g 0: Set the GOP (Group of Pictures) size to 0.
  • -profile:v main -level 42: Set the H.264 profile and level of the encoded video.
  • -c:v libx264: Select libx264 as the video encoder.
  • -r ${FPS}: Set the output frame rate to ${FPS} (optional; it appears only in the commented-out variants below).
    Finally, FULL_SCRIPT contains the complete FFmpeg command used to generate the final video file.
# 11. END
# FULL_SCRIPT+=" -map [video] -vsync 2 -async 1 -rc-lookahead 0 -g 0 -profile:v main -level 42 -c:v libx264 -r ${FPS} ../advanced_blurred_background.mp4"
FULL_SCRIPT+=" -map [video] -vsync 2 -async 1 -rc-lookahead 0 -g 0 -profile:v main -level 42 -c:v libx264 ../advanced_blurred_background.mp4"

#FULL_SCRIPT+=" -map [video] -async 1 -rc-lookahead 0 -g 0 -profile:v main -level 42 -c:v libx264 -r ${FPS} ../advanced_blurred_background.mp4"

eval ${FULL_SCRIPT}

ELAPSED_TIME=$(($SECONDS - $START_TIME))

echo -e '\nSlideshow created in '$ELAPSED_TIME' seconds\n'

unset IFS

Origin: blog.csdn.net/qq_44824148/article/details/128842259