An online video subtitle website based on the Whisper model (continuously updated)

1. What is Whisper

Whisper is an automatic speech recognition (ASR) system. OpenAI trained it on 680,000 hours of multilingual (98 languages) and multitask supervised data collected from the Internet. OpenAI reports that such a large and diverse dataset improves robustness to accents, background noise, and technical jargon. Besides transcribing speech in multiple languages, Whisper can also translate speech in those languages into English.
This article uses the model for a speech recognition task: converting the audio in a video into text.
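For readers new to Whisper, here is roughly what the recognition step looks like with the openai-whisper package: `whisper.load_model()` loads a model and `model.transcribe()` returns a dict whose `segments` carry start/end times and text. The SRT helpers and the choice of the `base` model are my own assumptions for illustration, not necessarily how the project's addSubtitles code is written.

```python
import sys


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments (dicts with 'start', 'end', 'text') into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    import whisper  # pip install git+https://github.com/openai/whisper.git

    model = whisper.load_model("base")  # larger models are slower but more accurate
    result = model.transcribe(sys.argv[1])  # path to a video or audio file
    print(segments_to_srt(result["segments"]))
```

Passing `language="zh"` or `language="en"` to `transcribe()` skips automatic language detection when the spoken language is known in advance.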

2. Project introduction

The project implements a Whisper-based video subtitle generator. It uses Flask, a lightweight web framework, to build a web application with a Python back end and an HTML front end. The functionality is simple: it adds subtitles to videos that have none (only Chinese, English, and mixed Chinese-English speech are supported).
The website looks like this:
[Figure: screenshot of the website]
Usage is simple: click the Upload File button to upload a local video file (mp4 or avi), then click the Submit File button, and the back end starts processing. (The front end does not yet display processing progress.)
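As a rough idea of how the upload step can be handled in Flask, here is a minimal sketch. The route `/upload`, the form field `file`, and the `uploads` folder are hypothetical names, not taken from the project. Saving each upload under a random name that keeps its extension avoids two users overwriting each other's files.

```python
import os
import uuid

ALLOWED_EXTENSIONS = {"mp4", "avi"}  # the two formats the site accepts
UPLOAD_FOLDER = "uploads"  # hypothetical folder name


def allowed_file(filename: str) -> bool:
    """Return True if the file name ends in an allowed video extension."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


def stored_name(filename: str) -> str:
    """Give each upload a random name that keeps its extension, so uploads never collide."""
    ext = filename.rsplit(".", 1)[1].lower()
    return f"{uuid.uuid4().hex}.{ext}"


def create_app():
    # Flask is imported inside the factory so the helpers above work without it.
    from flask import Flask, request

    app = Flask(__name__)
    os.makedirs(UPLOAD_FOLDER, exist_ok=True)

    @app.route("/upload", methods=["POST"])  # hypothetical route name
    def upload():
        file = request.files.get("file")  # hypothetical form field name
        if file is None or not allowed_file(file.filename or ""):
            return "Please upload an mp4 or avi file", 400
        path = os.path.join(UPLOAD_FOLDER, stored_name(file.filename))
        file.save(path)
        # The subtitle pipeline (Whisper + moviepy) would start here.
        return f"Saved as {path}", 200

    return app


if __name__ == "__main__":
    create_app().run(debug=True)
```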
Results:
Chinese:
[Figure: subtitles generated for Chinese speech]
English:
[Figure: subtitles generated for English speech]
Mixed Chinese and English:
[Figure: subtitles generated for mixed Chinese-English speech]
The results are quite good.

3. Project installation

Install the Python environment required by the project

Python 3.9 or later is required; with an older version an error is reported. The other Python dependencies are listed in the project's requirements.txt. In a Python 3.9 environment, run:

pip install git+https://github.com/openai/whisper.git 
pip install -r requirements.txt
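A quick sanity check of the Python version and the key packages can save some confusion before going further. This is a minimal sketch; the package list (`whisper`, `moviepy`, `flask`) is my assumption based on what this article uses, not a copy of requirements.txt.

```python
import importlib.util
import sys


def check_environment(packages=("whisper", "moviepy", "flask")):
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append(f"Python 3.9+ required, found {sys.version.split()[0]}")
    for name in packages:
        # find_spec returns None when the top-level package cannot be imported
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not importable: {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "environment looks OK")
```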

Install ImageMagick

  • Windows
    https://www.imagemagick.org/script/download.php#windows
    During installation, select "Install development headers and libraries for C and C++".
    After installation, open config_defaults.py in the moviepy package of your Python virtual environment and set IMAGEMAGICK_BINARY to the path of magick.exe in the ImageMagick installation folder, for example:

IMAGEMAGICK_BINARY = r"D:\python_study_tools\ImageMagick-7.0.9-Q16\magick.exe"

If you forget where things are installed, use a file-search tool such as Everything to find them. Make sure you edit the moviepy that belongs to your virtual environment.

  • Ubuntu
    Install it with the command:
apt-get install imagemagick

If this fails, update the package index first:

apt-get update

Then open ImageMagick's policy file:

vim /etc/ImageMagick-6/policy.xml

Change the line

<policy domain="path" rights="none" pattern="@*" />

to

<!-- <policy domain="path" rights="none" pattern="@*" /> -->

Then save and exit.
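The two ImageMagick chores above (finding the right moviepy config file on Windows, and commenting out the `@*` policy on Ubuntu) can also be scripted. This is a minimal sketch under my own assumptions: it locates `config_defaults.py` next to whatever moviepy the current interpreter imports, and it edits `/etc/ImageMagick-6/policy.xml` (the path used above) after writing a `.bak` backup, which needs root.

```python
import importlib.util
import os
import re
import shutil
import sys

# The policy line that blocks "@file" arguments; moviepy's TextClip passes its text that way.
AT_POLICY = re.compile(
    r'^([ \t]*)(<policy domain="path" rights="none" pattern="@\*" ?/>)[ \t]*$', re.M
)


def moviepy_config_path():
    """config_defaults.py of the moviepy that *this* interpreter imports, or None."""
    spec = importlib.util.find_spec("moviepy")
    if spec is None or spec.origin is None:
        return None
    return os.path.join(os.path.dirname(spec.origin), "config_defaults.py")


def disable_at_policy(xml_text: str) -> str:
    """Comment out the '@*' path policy; already-commented lines are left alone."""
    return AT_POLICY.sub(r"\1<!-- \2 -->", xml_text)


if __name__ == "__main__":
    # Run with the virtual environment's python so the right moviepy is reported.
    print("moviepy config file:", moviepy_config_path())
    print("magick on PATH:", shutil.which("magick"))  # ImageMagick 7 on Windows
    if sys.platform.startswith("linux"):
        policy = "/etc/ImageMagick-6/policy.xml"  # path from the steps above
        with open(policy, encoding="utf-8") as f:
            text = f.read()
        with open(policy + ".bak", "w", encoding="utf-8") as f:
            f.write(text)  # backup before editing
        with open(policy, "w", encoding="utf-8") as f:
            f.write(disable_at_policy(text))
```

In moviepy 1.x, config_defaults.py appears to read IMAGEMAGICK_BINARY from an environment variable of the same name, so setting that variable may work instead of editing the file; check this against your installed version.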

Modify the font in addSubtitles.py

Around line 68:

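from moviepy.editor import TextClip  # moviepy 1.x API

# sentences: subtitle text; w, h: video width and height; start, span: timing in seconds
txt = (TextClip(sentences, fontsize=32,
                font='SimHei', size=(w-20, 40),  # font name, or path to a .ttf file
                align='center', color='white')
       .set_position((10, h - 80))  # 10 px from the left edge, 80 px above the bottom
       .set_duration(span)
       .set_start(start))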

With this font setting, the code runs fine on Windows.
On Ubuntu it reports an error, because Ubuntu lacks Chinese fonts such as SimHei. If this is not changed, the subtitles in the final video come out as garbled characters and question marks. The solution:

apt-get install ttf-mscorefonts-installer
apt-get install fontconfig
cd /usr/share/fonts

Then copy a Chinese font file from your Windows machine (Windows fonts are stored in C:\Windows\Fonts) into this folder and run:

mkfontscale
mkfontdir
fc-cache -fv

Finally, change the font in addSubtitles.py above to the path of that font file.
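To keep one addSubtitles.py working on both systems, the font argument can be chosen at run time. This is a minimal sketch; the Linux path `/usr/share/fonts/simhei.ttf` is a hypothetical file name for whatever font you copied, and after `fc-cache -fv` the command `fc-list :lang=zh` should list it.

```python
import os
import platform

# Hypothetical location: wherever you copied the Windows font on the server.
LINUX_FONT = "/usr/share/fonts/simhei.ttf"


def pick_font(linux_font: str = LINUX_FONT) -> str:
    """Font argument for TextClip: a font name on Windows, a .ttf path elsewhere."""
    if platform.system() == "Windows":
        return "SimHei"  # the name used in the original addSubtitles.py
    if os.path.exists(linux_font):
        return linux_font  # ImageMagick accepts a path to a font file
    raise FileNotFoundError(
        f"Chinese font not found at {linux_font}; copy one from C:\\Windows\\Fonts"
    )
```

In addSubtitles.py the call would then become `TextClip(sentences, fontsize=32, font=pick_font(), ...)`, with the other arguments unchanged.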

4. Run the project

Under Windows or Ubuntu, open the project folder and run app.py. On Windows, just click the link printed in the console.
On a server, set the host and port in app.run() in the main function, with the host set to '0.0.0.0'. To access the site from another machine, enter the server's public IP address in the browser (with any proxy or VPN turned off); the automatically printed link is a private-network address.
If the project runs inside Docker on the server, port mapping must be set when the container is created. On a rented cloud server, also check which TCP ports the server has open; a port chosen at random may still be unreachable.
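On the server, the `app.run()` call described above might look like the following. Reading the host and port from environment variables is an illustrative choice; the names `APP_HOST` and `APP_PORT` and the default port 5000 are hypothetical. Whatever port you use must match the Docker port mapping (for example `docker run -p 5000:5000 ...`) and a TCP port that is open on the cloud server.

```python
import os


def server_address(env=os.environ):
    """Read host and port from environment variables, with server-friendly defaults."""
    host = env.get("APP_HOST", "0.0.0.0")  # listen on all interfaces, not just localhost
    port = int(env.get("APP_PORT", "5000"))  # must match the Docker mapping / open TCP port
    return host, port


if __name__ == "__main__":
    from flask import Flask

    app = Flask(__name__)  # in the real project this is the app defined in app.py
    host, port = server_address()
    app.run(host=host, port=port)
```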

5. Current problems

  • When the project runs on the server, the process is killed automatically after a while, so after opening the site at that IP address and uploading a file, clicking Submit File reports an error
  • The project does not handle concurrent requests, so when several users access it at the same time, the back end cannot determine the correct file name and reports an error
  • The rented HUAWEI CLOUD server is a single-core instance of the smallest size: processing is very slow, it handles load poorly, and it crashes easily
  • The Whisper model has many other capabilities, such as direct speech recognition, transcribing the audio in a video into a text file, and speech translation, so the website's features could be extended
  • The front-end download function is flawed: different files do not get different download links
  • Subtitle rendering and video compositing rely entirely on the CPU, so for long videos processing takes so long that the web page sometimes crashes before it finishes; both the back-end processing and the front-end design still need optimization
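Two of these problems (wrong file names under concurrent use, and pages dying during long CPU-bound processing) are commonly handled by giving each upload its own job ID and doing the work in the background while the page polls for status. This is a minimal sketch of that idea using `concurrent.futures`; it is not part of the project, and `add_subtitles` in the usage note is a hypothetical name for the existing processing function.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobRunner:
    """Run long subtitle jobs in the background and track them by a per-upload ID."""

    def __init__(self, max_workers: int = 1):
        # One worker suits a single-core server; extra jobs wait in the queue.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}
        self._lock = threading.Lock()

    def submit(self, func, *args) -> str:
        """Start func(*args) in the background and return a job ID right away."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._futures[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id: str) -> str:
        """'unknown', 'pending', 'failed' or 'done' for the given job ID."""
        with self._lock:
            future = self._futures.get(job_id)
        if future is None:
            return "unknown"
        if not future.done():
            return "pending"
        return "failed" if future.exception() is not None else "done"

    def result(self, job_id: str, timeout=None):
        """Block until the job finishes and return its result (re-raises on failure)."""
        with self._lock:
            future = self._futures[job_id]
        return future.result(timeout=timeout)
```

In a Flask handler, `job_id = runner.submit(add_subtitles, saved_path)` would return immediately; a status route could report `runner.status(job_id)`, and the download link could include the job ID so each file gets its own URL.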

GitHub project address: https://github.com/jiangduwang/addSubtitles.git
Web page address: http://124.70.200.133/
The site is not guaranteed to be online, and even when it is, submitting a file is very likely to fail.

Why "continuously updated"? I have finished the current course assignment and will keep working on these problems.


Origin blog.csdn.net/qq_44445108/article/details/127948300