Create a fake webcam for your online meeting using Python

Let's imagine. You're in an online meeting, and for some reason you don't want to turn on your camera. But everyone else has theirs on, so you feel you have to as well: you quickly fix your hair, make sure you look presentable, and reluctantly turn the camera on. We've all been there.

The good news is that, with a little Python, you no longer have to be forced to turn the camera on. I'll show you how to create a fake webcam for your online meetings, like this:

[GIF demos of the fake webcam in action]

Of course, the face doesn't have to be Bill Gates's; it can be yours.

I will first walk through the Python code. At the end of the article, I will explain how to use the fake camera yourself.

Create a simple fake webcam

First, we will import some modules, most notably OpenCV.

import cv2
import numpy as np
import pickle
import pyaudio
import struct
import math
import argparse
import os

Next we will create a function to extract all frames from the video:

def read_frames(file, video_folder):
    # Read every frame of a video located in videos/<video_folder>/<file>
    frames = []
    cap = cv2.VideoCapture(os.path.join('videos', video_folder, file))
    frame_rate = cap.get(cv2.CAP_PROP_FPS)
    if not cap.isOpened():
        print("Error opening video file")
    while cap.isOpened():
        ret, frame = cap.read()
        if ret:
            frames.append(frame)
        else:
            break
    cap.release()
    return frames, frame_rate

Now that we have the frames, we can loop over them and display them one after the other. When we reach the last frame, we play the video backwards, and when we get back to the first frame we play forward again, repeating forever. This way there is no sudden jump from the last frame back to the first. We will also make it possible to press "q" to stop the webcam.

frames, frame_rate = read_frames('normal.mov', 'bill_gates')

def next_frame_index(i, reverse):
    # Bounce back and forth: reverse direction at the last and first frames
    if i == len(frames) - 1:
        reverse = True
    if i == 0:
        reverse = False
    if not reverse:
        i += 1
    else:
        i -= 1
    return i, reverse


rev = False
i = 0
while True:
    frame = frames[i]
    cv2.imshow('Webcam', frame)
    # Show each frame for 1000/frame_rate milliseconds, and stop on "q"
    pressed_key = cv2.waitKey(int(1000/frame_rate)) & 0xFF
    if pressed_key == ord("q"):
        break
    i, rev = next_frame_index(i, rev)

With this, we have a simple webcam that plays seamlessly.


But we don't stop there.

Add different modes

It would be more convincing if our fake webcam avatar could do more than just passively stare. For example, sometimes in a meeting you need to nod your head in agreement, smile, talk, or do something else.

So we want our webcam to have multiple "modes" that we can switch between at any time by pressing a key on the keyboard.

To do this, you'll need to make a short recording for each mode, for example one where you're just smiling. Then we read the frames of each video and store them in a dictionary. When we detect a key press (e.g. "s" to switch to "smile" mode), we change the active mode and start playing the frames of the corresponding video.

folder = 'bill_gates'  # name of the folder inside videos/ containing our recordings

video_files = [file for file in os.listdir(os.path.join('videos', folder))
               if file not in ['transitions_dict.p', '.DS_Store']]
frames, frame_rates = {}, {}

for file in video_files:
    mode_name = file.split('.')[0]
    frames[mode_name], frame_rates[mode_name] = read_frames(file, folder)
modes = list(frames.keys())

# The key command for each mode is the first letter of its name
commands = {mode[0]: mode for mode in modes if mode != 'normal'}

mode = "normal"
frame_rate = frame_rates[mode]
rev = False
i = 0
while True:
    frame = frames[mode][i]
    cv2.imshow('Webcam', frame)
    pressed_key = cv2.waitKey(int(1000/frame_rate)) & 0xFF
    if pressed_key == ord("q"):
        break
    for command, new_mode in commands.items():
        if pressed_key == ord(command):
            i, mode, frame_rate = change_mode(mode, new_mode, i)
    i, rev = next_frame_index(i, mode, rev)

By default, the key command for each mode is the first letter of its name (that's what the commands dictionary above does). For now I'm treating the change_mode function as a black box; I'll explain it later. Note also that next_frame_index now takes the active mode as an argument, since each mode has its own list of frames.
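Since frames is now a dictionary keyed by mode, next_frame_index needs to know which mode is active so it can use the length of the right frame list. A minimal sketch of that updated version (the actual implementation in the repo may differ slightly):

def next_frame_index(i, mode, reverse):
    # Same bouncing logic as before, but on the frame list of the active mode
    if i == len(frames[mode]) - 1:
        reverse = True
    if i == 0:
        reverse = False
    if not reverse:
        i += 1
    else:
        i -= 1
    return i, reverse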


Optimize transitions

So we want to switch from one video to another, say from normal mode to nodding mode. How can we transition from one mode to the other in the best possible way, i.e. so that the transition is as smooth as possible?

When we make the transition, we want to move to the frame of the new mode that most closely resembles the one we're currently in.

To do this, we first need a distance metric between images. Here I use a simple Euclidean distance, which looks at the difference between each pixel of the two images.
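For instance, a minimal sketch of such a pixel-wise Euclidean distance (frame_a and frame_b are placeholder names for two images of the same size):

def frame_distance(frame_a, frame_b):
    # Square root of the sum of squared per-pixel differences
    diff = frame_a.astype(np.float64) - frame_b.astype(np.float64)
    return np.linalg.norm(diff)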

With this distance, we can find the image of the new mode that is closest to our current one and switch to it. For example, if we want to transition from normal to nod mode and we're at frame 132 of the normal video, we'll know that we have to go to frame 86 of the nod video to get the smoothest transition.

We can precompute all these optimal transitions, for every frame and from every mode to every other mode. This way we don't have to recompute them every time we want to switch modes. To keep the computation time short, the images are first compressed. We also store the best distance between images (we'll use it later).

from tqdm import tqdm  # progress bar for the pairwise computations

# video_folder is the name of the folder inside videos/ (e.g. 'bill_gates')
video_files = [file for file in os.listdir(os.path.join('videos', video_folder))
               if file not in ['transitions_dict.p', '.DS_Store']]
# Here we only need the frames, not the frame rates
frames = {}
for file in video_files:
    mode_name = file.split('.')[0]
    frames[mode_name] = read_frames(file, video_folder)[0]
modes = list(frames.keys())

compression_ratio = 10
height, width = frames["normal"][0].shape[:2]
new_height, new_width = height // compression_ratio, width // compression_ratio


def compress_img(img):
    # Convert to grayscale (mean over color channels) and downscale
    return cv2.resize(img.mean(axis=2), (new_width, new_height))


frames_compressed = {mode: np.array([compress_img(img) for img in frames[mode]]) for mode in modes}

transitions_dict = {mode: {} for mode in modes}

for i in range(len(modes)):
    for j in tqdm(range(i+1, len(modes))):
        mode_1, mode_2 = modes[i], modes[j]
        # Pairwise distances between every frame of mode_1 and every frame of mode_2
        diff = np.expand_dims(frames_compressed[mode_1], axis=0) - np.expand_dims(frames_compressed[mode_2], axis=1)
        dists = np.linalg.norm(diff, axis=(2, 3))
        # For each frame of one mode: index of the closest frame in the other mode, and its distance
        transitions_dict[mode_1][mode_2] = (dists.argmin(axis=0), dists.min(axis=0))
        transitions_dict[mode_2][mode_1] = (dists.argmin(axis=1), dists.min(axis=1))

pickle.dump(transitions_dict, open(os.path.join('videos', video_folder, 'transitions_dict.p'), 'wb'))

The "change_mode" function can now be shown, which retrieves the best frame to convert to from a precomputed dictionary. This is done so that if you press e.g. "s" to switch to smile mode, pressing it again will switch back to normal mode.

def change_mode(current_mode, toggled_mode, i):
    # Pressing the same key again toggles back to normal mode
    if current_mode == toggled_mode:
        toggled_mode = 'normal'

    # Best frame of the new mode to jump to, and how far apart the two images are
    # (dist will be used later for the freezes)
    new_i = transitions_dict[current_mode][toggled_mode][0][i]
    dist = transitions_dict[current_mode][toggled_mode][1][i]

    return new_i, toggled_mode, frame_rates[toggled_mode]

There is another improvement we can add to make our transitions more seamless: instead of always switching modes immediately, we can wait a little while for a better transition. For example, if our avatar is nodding, we can wait until the head passes through the middle position before switching to normal mode. To do this, we introduce a time window (here I set it to 0.5 seconds): we wait for the best moment to transition within this window before switching modes.

switch_mode_max_delay_in_s = 0.5


def change_mode(current_mode, toggled_mode, i):
    if current_mode == toggled_mode:
        toggled_mode = 'normal'

    # Wait for the optimal frame to transition to within the acceptable window
    max_frames_delay = int(frame_rate * switch_mode_max_delay_in_s)
    global rev
    if rev:
        frames_to_wait = max_frames_delay-1 - transitions_dict[current_mode][toggled_mode][1][max(0, i+1 - max_frames_delay):i+1].argmin()
    else:
        frames_to_wait = transitions_dict[current_mode][toggled_mode][1][i:i + max_frames_delay].argmin()
    print(f'Wait {frames_to_wait} frames before transitioning')
    # Keep playing the current mode until we reach that frame
    for _ in range(frames_to_wait):
        i, rev = next_frame_index(i, current_mode, rev)
        frame = frames[current_mode][i]
        cv2.imshow('Webcam', frame)
        cv2.waitKey(int(1000 / frame_rate))

    new_i = transitions_dict[current_mode][toggled_mode][0][i]
    dist = transitions_dict[current_mode][toggled_mode][1][i]

    return new_i, toggled_mode, frame_rates[toggled_mode]

Now our transitions are smoother. However, they can sometimes still be noticeable. So another idea is to deliberately add freezes to the video, like the ones that happen on an unstable connection, and use them to mask the transitions (we make the freeze duration proportional to the distance between the two images). We also add random freezes so that no pattern becomes apparent. Here is the new code:

# In the change_mode function:

    dist = transitions_dict[current_mode][toggled_mode][1][i]
    if freezes:
        # Freeze duration (in ms) proportional to how different the two images are
        freeze_duration = int(transition_freeze_duration_constant * dist)
        cv2.waitKey(freeze_duration)


# In the main loop:

    # Random freezes (on average roughly once every 10 seconds of playback)
    if freezes:
        if np.random.randint(frame_rate * 10) == 1:
            nb_frames_freeze = int(np.random.uniform(0.2, 1.5) * frame_rate)
            for _ in range(nb_frames_freeze):
                cv2.waitKey(int(1000 / frame_rate))
                i, rev = next_frame_index(i, mode, rev)

Whether or not to use these freezes is left as an option.
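These options can be exposed as command-line arguments, which is presumably what the argparse import at the top is for. A minimal sketch, where the flag names are my own choice and may not match the actual script:

parser = argparse.ArgumentParser(description='Fake webcam')
parser.add_argument('video_folder', help='name of the folder inside videos/ with your recordings')
parser.add_argument('--freezes', action='store_true', help='add fake connection freezes to mask transitions')
args = parser.parse_args()

video_folder = args.video_folder
freezes = args.freezes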

OK, now we've really covered the basics of these transitions. What else can we add to the webcam?

Voice detection

Another interesting feature is voice detection: when we speak into the microphone, the avatar in the video speaks too.

This is done using pyaudio, with thanks to this Stack Overflow thread: https://stackoverflow.com/questions/4160175/detect-tap-with-pyaudio-from-live-mic

Basically, the idea is to look at the average amplitude of the sound coming from the microphone over a short period of time, and if it's high enough, we assume we've been talking. The original code was meant to detect tapping noises, but it works fine for detecting speech too.

AMPLITUDE_THRESHOLD = 0.010
FORMAT = pyaudio.paInt16
SHORT_NORMALIZE = (1.0/32768.0)
CHANNELS = 1
RATE = 44100
INPUT_BLOCK_TIME = 0.025
INPUT_FRAMES_PER_BLOCK = int(RATE*INPUT_BLOCK_TIME)


def get_rms(block):
    # Root-mean-square amplitude of a block of 16-bit samples
    count = len(block) // 2
    format = "%dh" % count
    shorts = struct.unpack(format, block)

    sum_squares = 0.0
    for sample in shorts:
        n = sample * SHORT_NORMALIZE
        sum_squares += n * n
    return math.sqrt(sum_squares / count)


pa = pyaudio.PyAudio()

stream = pa.open(format=FORMAT,
                 channels=CHANNELS,
                 rate=RATE,
                 input=True,
                 frames_per_buffer=INPUT_FRAMES_PER_BLOCK)


def detect_voice():
    error_count = 0
    voice_detected = False

    try:
        block = stream.read(INPUT_FRAMES_PER_BLOCK, exception_on_overflow=False)
    except IOError as e:
        error_count += 1
        print("(%d) Error recording: %s" % (error_count, e))
        return voice_detected

    amplitude = get_rms(block)
    if amplitude > AMPLITUDE_THRESHOLD:
        voice_detected = True
    return voice_detected

Now we can add it to the main loop. To avoid switching modes too often, we require a certain number of consecutive frames without sound before switching back to normal mode.

# In the main loop (quiet_count starts at 0, stop_talking_threshold is the
# number of consecutive silent frames required before going back to normal):

  if voice_detection:
      if detect_voice():
          quiet_count = 0
          if mode != "talking":
              i, mode, frame_rate = change_mode(mode, "talking", i)
      else:
          if mode == "talking":
              quiet_count += 1
              if quiet_count > stop_talking_threshold:
                  quiet_count = 0
                  i, mode, frame_rate = change_mode(mode, "normal", i)

Now, when we speak into the microphone, our avatar starts and stops talking with us. I also made it so that voice detection can be toggled on or off by pressing "v".
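A minimal sketch of how that toggle could be handled, next to the other key checks (assuming a boolean voice_detection flag):

# In the main loop:

    if pressed_key == ord("v"):
        voice_detection = not voice_detection
        print("Voice detection:", "on" if voice_detection else "off")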


These are all the features implemented so far. Suggestions for further improvements are welcome.

How to use a fake webcam

First, download all the code from here: https://github.com/FrancoisLeRoux1/Fake-webcam

What you need to do is record a few videos of yourself (on my Mac I used the Photo Booth app) and put them in a new folder inside the videos folder. You can create different folders for different setups, for example with different shirts or a different haircut.

These videos can and should be short (around 10 seconds each); longer videos will make computing the optimal transitions take a long time. You need one video called "normal", which will be your default mode.

Then, if you want your avatar to talk, you have to record a video called "talking" where you say random gibberish.

After this, you can record any other modes you want (e.g. "smile", "nodding", "goodbye"...). By default, the command to activate/deactivate each mode is the first letter of its name (for example, for "smile", press "s").
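For example, with a setup folder called bill_gates, the folder layout might look roughly like this (transitions_dict.p is generated later by the compute script):

videos/
    bill_gates/
        normal.mov
        talking.mov
        smile.mov
        nodding.mov
        transitions_dict.p   (generated by compute-transitions.py)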

Then you have to compute the optimal transitions. To do this, just run the compute-transitions.py script.


This should take a few minutes.

Once that's done, you can fire up your fake webcam by running the fake-webcam.py script. You need to specify the folder inside videos/ where your recordings are located, and you can also specify whether to use the freezes.


Your fake camera should now be up and running. Next, you can set it up as a webcam for your online meetings. For this I used OBS: https://obsproject.com/

Select the correct Python window as the source and click Start Virtual Camera.


You should now be able to select this virtual camera as your webcam in your favorite online meeting app!

☆ END ☆
