Deploy Whisper and WhisperDesktop locally

1. What is Whisper

Whisper is a general-purpose speech recognition model from OpenAI. It is trained on a large, diverse audio dataset and is a multitask model that can perform multilingual speech recognition, speech translation, and language identification.
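Once Whisper is installed (section 5), these tasks are selected through options of the whisper command-line tool: --task translate produces English text, and leaving out --language makes Whisper detect the spoken language itself before transcribing. A minimal sketch, assuming a bash shell; the whisper_task function and the WHISPER_CMD override (handy for a dry run with WHISPER_CMD=echo) are hypothetical, and the medium model and Chinese language simply mirror the test in section 6:

```shell
# Hypothetical helper: run one of Whisper's tasks on an audio file via the CLI.
# WHISPER_CMD is a hypothetical override (e.g. WHISPER_CMD=echo for a dry run).
whisper_task() {
  local task="$1" file="$2"
  local cmd="${WHISPER_CMD:-whisper}"
  case "$task" in
    # Speech recognition with the spoken language given explicitly.
    transcribe) $cmd "$file" --model medium --language Chinese ;;
    # No --language: Whisper identifies the language, then transcribes.
    identify)   $cmd "$file" --model medium ;;
    # Speech translation: the output text is English.
    translate)  $cmd "$file" --model medium --task translate ;;
    *) echo "unsupported task: $task" >&2; return 1 ;;
  esac
}

# Example: whisper_task translate demo.wav
```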

2. GitHub repository

https://github.com/openai/whisper

3. Create a virtual environment

conda create -n whisper python==3.10.6
conda activate whisper 

4. Install ffmpeg

sudo apt update && sudo apt install ffmpeg
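Whisper calls the ffmpeg command-line tool to decode audio, so ffmpeg needs to be on the PATH. A quick preflight check, as a minimal sketch assuming a bash shell; require_cmd is a hypothetical helper, not part of any tool used here:

```shell
# Hypothetical helper: succeed if every named command is on the PATH,
# otherwise report each missing one on stderr and return 1.
require_cmd() {
  local c missing=0
  for c in "$@"; do
    if ! command -v "$c" >/dev/null 2>&1; then
      echo "missing: $c" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the tools used in sections 3 to 5.
# require_cmd conda ffmpeg git pip3 && echo "all tools found"
```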

5. Deploy Whisper

Clone the repository:

git clone https://github.com/openai/whisper.git; cd whisper/

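Install the dependencies. These are the commands I ran. According to the Whisper README, pip3 install -U openai-whisper installs the latest release from PyPI, while the two git+ commands install or force-update to the latest commit on GitHub, so one of these routes is enough; setuptools-rust is only needed if the installation fails with "No module named 'setuptools_rust'".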

pip3 install -r requirements.txt
pip3 install -U openai-whisper
pip3 install git+https://github.com/openai/whisper.git 
pip3 install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
pip3 install setuptools-rust
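To check that the package landed in the active conda environment, one option is to import it and list the model names it can download (whisper.available_models() is part of the openai-whisper Python package). A minimal sketch; check_whisper is a hypothetical helper name:

```shell
# Hypothetical helper: report whether openai-whisper is importable in the
# active environment and, if so, which model names it can download.
check_whisper() {
  python3 - <<'EOF'
try:
    import whisper
except ImportError:
    print("openai-whisper: not installed in this environment")
else:
    print("openai-whisper: models available:", ", ".join(whisper.available_models()))
EOF
}

check_whisper
```

If it reports that the package is not installed, the conda environment from section 3 is probably not active.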

(Optional) I am running Ubuntu 22.04 under WSL with CUDA Toolkit 11.8 installed, so I replaced torch, torchvision, and torchaudio with builds compiled for CUDA 11.8:

pip3 uninstall -y torch torchvision torchaudio && pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
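To confirm that the CUDA 11.8 wheels are the ones in use, PyTorch can report the CUDA version it was built against and whether it sees a GPU. A minimal sketch; check_torch_cuda is a hypothetical helper name, while torch.version.cuda and torch.cuda.is_available() are standard PyTorch attributes:

```shell
# Hypothetical helper: print the installed PyTorch version, the CUDA version it
# was built against, and whether a GPU is visible. Run it inside the conda env.
check_torch_cuda() {
  python3 - <<'EOF'
try:
    import torch
except ImportError:
    print("torch: not installed in this environment")
else:
    print(f"torch {torch.__version__}, built for CUDA {torch.version.cuda}, "
          f"GPU available: {torch.cuda.is_available()}")
EOF
}

check_torch_cuda
```

With the cu118 wheels the CUDA version should read 11.8. If the GPU is reported as unavailable, Whisper will run on the CPU, since the CLI only defaults to CUDA when it is available.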

6. Using Whisper

To test it, I recorded a sentence in Chinese and ran:

whisper demo.wav --model medium --language Chinese

The output is as follows:

[Screenshot: Whisper transcription output for demo.wav]
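The same command scales to a folder of recordings. A minimal sketch, assuming bash; transcribe_dir and the WHISPER_CMD override (useful for a dry run with WHISPER_CMD=echo) are hypothetical, while --model, --language, --output_dir, and --output_format are options of the whisper CLI:

```shell
# Hypothetical helper: transcribe every .wav file in a directory with the
# whisper CLI, writing one .txt transcript per input into the output directory.
transcribe_dir() {
  local in_dir="$1" out_dir="$2"
  local cmd="${WHISPER_CMD:-whisper}"   # set WHISPER_CMD=echo to print commands only
  local f
  mkdir -p "$out_dir"
  for f in "$in_dir"/*.wav; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
    $cmd "$f" --model medium --language Chinese \
      --output_dir "$out_dir" --output_format txt
  done
}

# Example: transcribe_dir ./recordings ./transcripts
```

Loading the model once per file is slower than a single Python process, but it keeps the sketch to the CLI that this guide already uses.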

(Added 2023-05-14) Deploying WhisperDesktop locally on Windows

Download WhisperDesktop

Go to https://github.com/Const-me/Whisper/releases and download the latest WhisperDesktop release.

[Screenshot: WhisperDesktop releases page]
Unzip the downloaded archive into a directory, for example D:\ProgramGreen\WhisperDesktop.

[Screenshot: the unzipped WhisperDesktop directory]

Download the speech model

Go to https://huggingface.co/datasets/ggerganov/whisper.cpp/tree/main and download a speech model (a ggml-*.bin file such as ggml-medium.bin).

[Screenshot: model files on Hugging Face]

Using WhisperDesktop

Double-click WhisperDesktop.exe to launch it, then load the model you just downloaded.

[Screenshot: loading the model in WhisperDesktop]
To test it, pick a video file and transcribe it.
[Screenshot: transcribing a video file in WhisperDesktop]
Part of the generated text is shown below:
[Screenshot: part of the generated transcript]

Done!

Source: blog.csdn.net/engchina/article/details/130556631