Python + pygame + OpenCV + GPT: implementing a virtual digital human live broadcast (an interesting exploration)

AI technology is advancing rapidly and continually changing how people work and live. As an emerging format, digital human live broadcasting is likely to become a major trend with broad market prospects, integrating new technologies and cross-industry cooperation to deliver more personalized and diverse interactive experiences.

Preface

Musk has said that artificial intelligence will play a profound role in the future of human evolution and civilization: we will have large numbers of robots, and global production efficiency will rise to incredible levels. Such robots can walk, climb stairs, squat, and pick up items, and can protect themselves and the people around them. In the future they could also cook, mow lawns, help care for the elderly, or work in factories, replacing humans in boring and dangerous tasks.

At the recent Nishan Dialogue on Digital Civilization at the World Internet Conference, Zhang Yong, chairman and CEO of Alibaba Group and of Alibaba Cloud Intelligence Group, likewise said that the development of AI will bring more job opportunities: in the new era of intelligence, every industry is worth rebuilding on top of artificial intelligence technology.

Yet alongside the efficiency gains, there is a tug-of-war between humans and AI. In the AIGC era, people can already use AI products such as ChatGPT, Stable Diffusion, and Midjourney at very low cost, and many jobs face the risk of being replaced: writers, illustrators, advertising designers, and even live-streaming sales hosts.

Some companies already offer virtual digital human technology, but the fees are high. Here we use Python to try building a virtual digital human live broadcast ourselves, as an interesting, low-cost exploration. If it proves feasible, you could create a virtual digital human live broadcast for fun and keep it online on your own Douyin channel 24 hours a day. Wouldn't that be wonderful?

Virtual digital human live broadcast implemented in Python

Beyond virtual digital human live broadcasts powered by AI, new formats such as AR shopping guides, virtual try-on, virtual anchors, and 3D showrooms have accelerated live-broadcast e-commerce across viewing experience, broadcast efficiency, and commercial value. In different scenarios, artificial intelligence and human anchors can complement each other and give consumers a varied viewing experience: AI language processing can understand and respond to users' questions and needs more quickly, while human anchors can connect with users emotionally during the broadcast and close the distance between the live room and its audience.

The advantages of digital human live streaming

With a digital human live broadcast platform, the digital human can stream 24 hours a day, at any time, and sell goods automatically.

Digital human live streaming helps local lifestyle businesses achieve explosive growth.

Digital human live broadcasts faithfully reproduce a real person's image, eliminating the need to appear on camera in person.

With one-click input of live broadcast copy, a digital human can produce short videos in seconds, increasing output exponentially.

Digital human live broadcasts can use GPT for interaction: GPT generates replies automatically during the broadcast, and the responses vary instead of repeating.

Using AI voice-interaction technologies such as Baidu's Wenxin Yiyan (ERNIE Bot), iFlytek Spark, or ChatGPT can make digital human live broadcasts genuinely practical.

Technical solution implemented in Python

Python can be used to implement virtual digital humans with the ability to animate and speak.

A feasible technical solution to explore:

1. Character modeling and animation: using computer graphics techniques, Python libraries such as Pygame, Pyglet, or PyOpenGL can be used to create character models and add animation effects to them. Character models can be created in 3D modeling software such as Blender and controlled from Python scripts.

2. Speech synthesis: use Python libraries such as pyttsx3 or gTTS to convert text into speech. These libraries provide simple APIs that take text input and produce speech output (see the sketch after this list).

3. Dialogue system: with Python's natural language processing (NLP) and machine learning tooling, a dialogue system can be built so the virtual digital human understands and generates natural-language conversation. NLP libraries such as NLTK or spaCy can process the language, and machine learning libraries such as TensorFlow or PyTorch can train dialogue models.

4. User interface: with Python GUI libraries such as Tkinter or PyQt, you can build an interface through which users interact with the virtual digital human: display the character's animation and provide a text box or voice input for conversation.
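For step 2, a minimal sketch using pyttsx3 for offline text-to-speech (the phrase and speaking rate are arbitrary examples; save_to_file support can vary by platform and speech driver):

import pyttsx3

# Initialize the offline TTS engine
engine = pyttsx3.init()
# Slow the speaking rate slightly (the default is around 200 words per minute)
engine.setProperty('rate', 160)
# Queue a phrase and block until playback finishes
engine.say("Welcome to the live broadcast room!")
engine.runAndWait()
# Alternatively, render the speech to an audio file for later playback
engine.save_to_file("Welcome to the live broadcast room!", "welcome.wav")
engine.runAndWait()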

Implementation outline

1. Character modeling and animation: use computer graphics techniques to create 2D or 3D character models, e.g. with modeling software such as Blender. Design the character's facial expressions and movements as separate image frames or animation sequences. You can either record these yourself or assemble the needed character videos and images from material found online.

2. Using the sprite class: create a virtual digital human class that inherits from pygame.sprite.Sprite. Inside it, use pygame.image.load() to load the character's image frames or animation sequences, and draw the current frame to the screen with pygame.Surface.blit().

3. Facial expression and action switching: in the virtual digital human class, define methods that switch the character's expressions and actions, and use pygame.time.set_timer() to create a timer event that triggers those switches periodically.

4. Pronunciation and speech synthesis: use a Python speech synthesis library such as pyttsx3 or gTTS, or the speech APIs of Baidu, iFlytek, etc., to convert text into speech, and define methods that trigger the digital human's speech and play the corresponding audio as needed.

5. User interaction: create a pygame window to display the virtual digital human and interact with the user. Use pygame.event.get() to listen for user events such as keyboard input or mouse clicks, and call the corresponding methods to switch expressions, movements, and speech based on the input. The user's chat text can be sent to GPT, and the generated reply spoken back through text-to-speech while matching actions and expressions (a sketch of this flow follows below).
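A rough sketch of step 5, under loud assumptions: the GPT call uses the openai package's v1 chat client (any compatible service could be swapped in), the API key, model name, and question text are placeholders, and the sprite drawing and animation switching are elided:

import pygame
import pyttsx3
from openai import OpenAI  # assumption: openai package, v1 client interface

client = OpenAI(api_key="YOUR_API_KEY")  # placeholder key
engine = pyttsx3.init()

def ask_gpt(text):
    # Send the viewer's chat text to a chat model and return the reply text
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": text}],
    )
    return resp.choices[0].message.content

pygame.init()
screen = pygame.display.set_mode((500, 900))
clock = pygame.time.Clock()

# A user timer event: trigger an idle blink every 3 seconds
BLINK_EVENT = pygame.USEREVENT + 1
pygame.time.set_timer(BLINK_EVENT, 3000)

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == BLINK_EVENT:
            pass  # the sprite would switch to its "blink" frame here
        elif event.type == pygame.KEYDOWN and event.key == pygame.K_RETURN:
            # In a real broadcast the text would come from the live chat stream;
            # a fixed question stands in for viewer input here.
            reply = ask_gpt("What products are on sale today?")
            engine.say(reply)    # speak the reply; a talking animation would play here
            engine.runAndWait()
    pygame.display.flip()
    clock.tick(30)

pygame.quit()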

Environment dependencies

1. Download and install Python 3 from the official website: https://www.python.org

2. Install the dependent modules: pygame, pygame-pgu, opencv-python, moviepy, rembg

pip install rembg -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install opencv-python
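The remaining modules can be installed the same way (pygame, pygame-pgu, and moviepy — which the code below also imports — are the PyPI package names):

pip install pygame
pip install pygame-pgu
pip install moviepy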

3. Switch pip to a mirror index (otherwise downloading modules will be very slow):

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

4. Pictures and voice materials

5. iFlytek offline speech synthesis tool

Material production

You can use 3D modeling software (such as Blender) to create character models. For simplicity, some materials here were downloaded from online searches and are used for learning and research purposes only. If there is any infringement, please contact me.

Rembg is an image background-removal tool. Its main features:

  • Open source and free

  • Developed in Python

  • The background-removal engine is U²-Net, a deep network architecture for salient object detection

  • Easy to install
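A minimal usage sketch, assuming an input file person.jpg in the working directory (the file names are placeholders):

from rembg import remove
from PIL import Image

# Load a source image, strip its background, and save the result
# as a PNG so the transparent alpha channel is preserved
input_image = Image.open("person.jpg")
output_image = remove(input_image)
output_image.save("person_nobg.png")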

Background picture

People pictures


Code

Loading the background image

# Background sprite implementation
class BackGround(pygame.sprite.Sprite):
    def __init__(self):
        super().__init__()
        self.image = pygame.image.load('./image/background.png').convert()
        self.image = pygame.transform.scale(self.image, (WIDTH, HEIGHT))
        self.rect = self.image.get_rect()
        self.ready_to_move = 0
        self.index = 0

    def update(self, *args):
        pass

Extracting video frames

In pygame, sprites usually animate by cycling through images. pygame itself does not directly support loading and playing mp4 video files, but this can be achieved with additional libraries. A commonly used one is moviepy, a Python library for video editing and processing: it can convert an mp4 file into a series of image frames, which can then be used to build a sprite animation.

Here's an example:

import pygame
from moviepy.editor import VideoFileClip

# Load the mp4 video and extract its image frames
video = VideoFileClip("animation.mp4")
# get_frame() takes a time in seconds, so step through at the clip's frame rate
frames = [pygame.image.fromstring(video.get_frame(i / video.fps).tobytes(), video.size, "RGB")
          for i in range(int(video.duration * video.fps))]

# Initialize pygame
pygame.init()
screen = pygame.display.set_mode(video.size)

# A sprite that cycles through the extracted frames
class AnimatedSprite(pygame.sprite.Sprite):
    def __init__(self, frames):
        super().__init__()
        self.frames = frames
        self.current_frame = 0
        self.image = self.frames[self.current_frame]
        self.rect = self.image.get_rect()

    def update(self):
        self.current_frame = (self.current_frame + 1) % len(self.frames)
        self.image = self.frames[self.current_frame]

# Create the sprite and add it to a sprite group
sprite = AnimatedSprite(frames)
sprite_group = pygame.sprite.Group(sprite)

clock = pygame.time.Clock()

# Main loop
running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    sprite_group.update()

    screen.fill((0, 0, 0))
    sprite_group.draw(screen)
    pygame.display.flip()
    clock.tick(30)

pygame.quit()

Extracting video frames and saving them as PNG images while removing the background:

    # "Cheer on" gesture
    def comeOn(self):
        # Load the mp4 video and extract image frames
        video = VideoFileClip("./mp4/1.mp4")
        # Get the video dimensions
        video_width, video_height = video.size
        # Create a pygame surface
        #surface = pygame.Surface((video_width, video_height))
        #frames = [pygame.image.fromstring(np.array(video.get_frame(t)).tobytes(), video.size, "RGB") for t in range(0, int(video.duration*video.fps))]
        # Create a pygame surface with an alpha channel
        surface = pygame.Surface((video_width, video_height), pygame.SRCALPHA)

        # Convert each frame of the video to an image with transparency
        frames = []
        for t in range(int(video.duration * video.fps)):
            # get_frame() expects a time in seconds, not a frame index
            frame = video.get_frame(t / video.fps)
            pygame.surfarray.blit_array(surface, frame.swapaxes(0, 1))
            # Read the surface back as a numpy array
            image = pygame.surfarray.array3d(surface).swapaxes(0, 1)
            image = np.uint8(image)
            pil_image = Image.fromarray(image)
            # Remove the background with rembg
            image = remove(pil_image)
            # Save the transparent PNG
            image.save(f"framet_{t}.png")
            # Convert PIL Image to pygame surface
            pygame_image = pygame.image.fromstring(image.tobytes(), image.size, image.mode).convert_alpha()
            
            frames.append(pygame_image)
        self.frames = frames
        print("frames count:"+str(len(self.frames)))
        self.current_frame = 0
        self.image = self.frames[self.current_frame]
        self.state = "comeOn"
        self.start = time.time()

You can also use OpenCV to extract image frames from an mp4 video. (In the full program below, because per-frame background removal is slow, the frames are extracted and saved to disk once, then simply loaded at runtime.)

import cv2
import os

# Open the video file
video = cv2.VideoCapture('2.mp4')

# Make sure the output directory exists (cv2.imwrite fails silently otherwise)
os.makedirs('output', exist_ok=True)

# Frame counter
frame_count = 0

while True:
    # Read the video frame by frame
    ret, frame = video.read()

    # If no frame was read, the video has ended
    if not ret:
        break

    # Save the frame as an image
    cv2.imwrite(f'output/frame_{frame_count}.jpg', frame)

    # Increment the frame counter
    frame_count += 1

# Release the video object
video.release()
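As a side note, an OpenCV frame can also be turned into a pygame surface directly, without going through PIL; OpenCV stores pixels as BGR, so a color conversion is needed first (a sketch, with illustrative variable names):

import cv2
import pygame

video = cv2.VideoCapture('2.mp4')
ret, frame = video.read()
if ret:
    # OpenCV frames are BGR numpy arrays; pygame expects RGB bytes
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    height, width = frame_rgb.shape[:2]
    surface = pygame.image.frombuffer(frame_rgb.tobytes(), (width, height), "RGB")
video.release()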

Implementation process

# -*- coding: utf-8 -*-
# @Author : yangyongzhen
# @Email : [email protected]
# @File : mqttclienttool.py
# @Project : study
import pygame
from moviepy.editor import VideoFileClip
import numpy as np
import time
from PIL import Image
from rembg import remove
import cv2

# Constants: screen size
WIDTH, HEIGHT = 500, 900
print(cv2.__version__)
# Initialization
pygame.init()
pygame.mixer.init()
# Create the window
screen = pygame.display.set_mode((WIDTH, HEIGHT))

# Set the window title
pygame.display.set_caption('Virtual Digital Human -- follow the author: blog.csdn.net/qq8864')

# Background music
pygame.mixer.music.load('./sound/bgLoop.wav')
pygame.mixer.music.set_volume(0.5)  # volume
#pygame.mixer.music.play(-1, 0)
# System clock
FPS = 30
clock = pygame.time.Clock()
# Create a user-defined event that fires every 1000 milliseconds
USER_EVENT = pygame.USEREVENT
pygame.time.set_timer(USER_EVENT, 1000)

# Load the font file
font_path = "./font/SIMYOU.ttf"  # replace with your own font file path
font_size = 24
font = pygame.font.Font(font_path, font_size)

# ======== The virtual digital human protagonist ========
# class Hero(pygame.sprite.Sprite)
# class BackGround(pygame.sprite.Sprite)
# Virtual human protagonist (when idle, keeps smiling and blinks every 3 seconds)
class VirtualMan(pygame.sprite.Sprite):
    def __init__(self, speed):
        super().__init__()
        self.image = pygame.image.load('./image/man.png')
        self.image_index = 0
        self.ready_to_change = 0
        self.rect = self.image.get_rect()
        self.rect.width *= 0.5
        self.rect.height *= 0.5
        self.image = pygame.transform.scale(self.image, (self.rect.width, self.rect.height))
        self.rect.x, self.rect.y = 0, 100
        self.speed = speed
        self.frames = None
        self.current_frame = 0
        self.state = "idle"

    def update(self, *args):
        # Use the arrow keys and the space bar to test the digital human's different actions
        keys = pygame.key.get_pressed()
        if keys[pygame.K_UP]:
            # "cheer on" gesture
            self.comeOn()
        if keys[pygame.K_DOWN]:
            # welcome gesture
            self.welcome()
        if keys[pygame.K_LEFT]:
            # talking action and expression
            self.say()
        if keys[pygame.K_RIGHT]:
            # stop
            self.stop()
        if keys[pygame.K_SPACE]:
            # goodbye gesture
            self.goodbye()
            
        if self.state == "comeOn":   
               self.current_frame = (self.current_frame + 1) % len(self.frames)
               self.image = pygame.transform.scale(self.frames[self.current_frame], (self.rect.width, self.rect.height))
               #print("current_frame:"+str(self.current_frame))
               if self.current_frame == 0:
                   self.frames.clear()
                   self.state = "idle"
                   print("idle")
                   self.end = time.time()
                   print("time:"+str(self.end - self.start))
                   img = pygame.image.load('./image/man.png')
                   self.image = pygame.transform.scale(img, (self.rect.width, self.rect.height))
                   pass
            
    # "Cheer on" gesture
    def comeOn(self):
        # Load the mp4 video and extract image frames
        #video = VideoFileClip("./mp4/1.mp4")
        frames = []
        '''
        video = cv2.VideoCapture("./mp4/1.mp4")
        # Frame counter
        frame_count = 0
        while True:
            # Read the video frame by frame
            ret, frame = video.read()
            # If no frame was read, the video has ended
            if not ret:
                break
            # Save the frame image
            #cv2.imwrite(f'output/frame_{frame_count}.png', frame)
            # Convert the OpenCV frame to a PIL.Image
            pil_image = Image.fromarray(cv2.cvtColor(frame,cv2.COLOR_BGR2RGB))
            image = remove(pil_image)
            image.save(f"framet_{frame_count}.png")
            # Convert PIL Image to pygame surface
            pygame_image = pygame.image.fromstring(image.tobytes(), image.size, image.mode).convert_alpha()

            frames.append(pygame_image)
            # Increment the frame counter
            frame_count += 1
        # Release the video object
        video.release()
        '''
        for i in range(0,75):
            img = pygame.image.load(f"./doc/img2/framet_{i}.png")
            frames.append(img)
        self.frames = frames
        print("frames count:"+str(len(self.frames)))
        self.current_frame = 0
        self.image = self.frames[self.current_frame]
        self.state = "comeOn"
        self.start = time.time()
    # Goodbye gesture
    def goodbye(self):
        pass

    # Welcome gesture
    def welcome(self):
        pass
    # Stop all actions
    def stop(self):
        pass
    # Start speaking
    def say(self):
        pass
        #sound = pygame.mixer.Sound('./sound/nihao.wav')
        #sound.play()

# Background
class BackGround(pygame.sprite.Sprite):
    def __init__(self):
        super().__init__()
        self.image = pygame.image.load('./image/background.png').convert()
        self.image = pygame.transform.scale(self.image, (WIDTH, HEIGHT))
        self.rect = self.image.get_rect()
        self.ready_to_move = 0
        self.index = 0

    def update(self, *args):
        pass


# Initialize sprite groups
bg_sprite = pygame.sprite.Group()
man_sprite = pygame.sprite.Group()

# Create the character
man = VirtualMan(4)
man_sprite.add(man)

bg1 = BackGround()
bg_sprite.add(bg1)
# Keep the game running (main loop)
while True:
    # =========== Frame refresh ===========
    clock.tick(FPS)
    #print("Runtime:", pygame.time.get_ticks(), "ms")
    # Handle events
    for event in pygame.event.get():
        # The window close button was clicked
        if event.type == pygame.QUIT:
            # Quit
            pygame.quit()
            exit()
        if event.type == USER_EVENT:
            man.say()
        
    # screen.fill((0,0,0))
    for group in [bg_sprite, man_sprite]:
        group.update()
        group.draw(screen)
    #screen.fill((0,0,0))    # clear the screen
    pygame.display.flip()
    #pygame.display.update()
    #app.paint()             # draw the pgu container's contents



Reprinted from: blog.csdn.net/qq8864/article/details/133383638