Deploy Whisper and WhisperDesktop locally
1. What is Whisper
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is a multi-task model capable of multilingual speech recognition, speech translation, and language identification.
2. GitHub repository
https://github.com/openai/whisper
3. Create a virtual environment
conda create -n whisper python=3.10.6
conda activate whisper
4. Install ffmpeg
sudo apt update && sudo apt install ffmpeg
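Whisper relies on ffmpeg to decode input files and resample them to 16 kHz mono internally. As a sketch of the equivalent standalone conversion, here is the ffmpeg invocation built up in Python; `build_ffmpeg_cmd` is a hypothetical helper for illustration, not part of Whisper:

```python
# Sketch: build the ffmpeg argv that converts any input file to the
# 16 kHz mono WAV format that Whisper works with internally.
# build_ffmpeg_cmd is a hypothetical helper, not part of Whisper.
def build_ffmpeg_cmd(src: str, dst: str) -> list[str]:
    return [
        "ffmpeg",
        "-nostdin",      # never read from stdin
        "-i", src,       # input file (any format ffmpeg can decode)
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # resample to 16 kHz
        "-y", dst,       # overwrite the output WAV if it exists
    ]

cmd = build_ffmpeg_cmd("demo.mp3", "demo.wav")
print(" ".join(cmd))
```

To actually run the conversion, pass the list to `subprocess.run(cmd, check=True)`.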
5. Deploy Whisper
Clone the repository,
git clone https://github.com/openai/whisper.git; cd whisper/
install the dependencies,
pip3 install -r requirements.txt
then install the package itself. Any one of the following is sufficient, so pick whichever fits your situation:
pip3 install -U openai-whisper
(latest release from PyPI), or
pip3 install git+https://github.com/openai/whisper.git
(latest commit from GitHub), or
pip3 install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
(clean reinstall from GitHub without touching existing dependencies).
If the tokenizer fails to build from source, also install setuptools-rust,
pip3 install setuptools-rust
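After installing, a quick way to confirm which version of openai-whisper ended up in the environment. This sketch uses only the standard library, so it runs safely even if the install failed:

```python
# Check whether openai-whisper is importable and, if so, which version.
from importlib.metadata import PackageNotFoundError, version

try:
    whisper_version = version("openai-whisper")
    print(f"openai-whisper {whisper_version} is installed")
except PackageNotFoundError:
    whisper_version = None
    print("openai-whisper is not installed")
```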
(Optional) I am using WSL-Ubuntu 22.04 with CUDA Toolkit 11.8 installed, so I updated torch, torchvision, and torchaudio to builds compiled against CUDA 11.8. Note that the pip package is named torch, not pytorch,
pip3 uninstall -y torch torchvision torchaudio && pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
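Whisper runs on the GPU automatically when a CUDA-enabled build of PyTorch is available. A small sketch to check which device it would use; the try/except lets it fall back gracefully if torch is not installed yet:

```python
# Report the device Whisper would run on. The whisper CLI's --device
# flag accepts the same strings ("cuda" or "cpu").
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:  # torch not installed yet
    device = "cpu"

print(f"Whisper will run on: {device}")
```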
6. Using Whisper
Here I recorded a sentence in Chinese to test it,
whisper demo.wav --model medium --language Chinese
The output is as follows,
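The CLI call above can also be made from Python through Whisper's `load_model`/`transcribe` API. A minimal sketch; the import is deferred into the function so the snippet loads even before whisper is installed, and the model weights are downloaded on first use:

```python
def transcribe_file(path: str, model_name: str = "medium", language: str = "zh") -> str:
    """Mirror the CLI call: whisper demo.wav --model medium --language Chinese
    ("zh" is the ISO code for Chinese)."""
    import whisper  # deferred so this module imports even without whisper installed

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(path, language=language)
    return result["text"]

# Example: print(transcribe_file("demo.wav"))
```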
(Added 2023-05-14) Deploying WhisperDesktop locally on Windows
Download WhisperDesktop
Visit https://github.com/Const-me/Whisper/releases to download the latest version of WhisperDesktop, then unzip the archive into a directory, for example D:\ProgramGreen\WhisperDesktop.
Download the speech model
Visit https://huggingface.co/datasets/ggerganov/whisper.cpp/tree/main and download a speech model,
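WhisperDesktop loads whisper.cpp's GGML model files, which appear in that Hugging Face repo under names like ggml-medium.bin. A small sketch that builds the direct download URL for a given model size; the `ggml-<size>.bin` filename pattern is an assumption based on the repo listing:

```python
# Build the direct-download URL for a whisper.cpp GGML model file.
# The ggml-<size>.bin naming is an assumption based on the repo listing.
GGML_REPO = "https://huggingface.co/datasets/ggerganov/whisper.cpp/resolve/main"

def ggml_model_url(size: str = "medium") -> str:
    return f"{GGML_REPO}/ggml-{size}.bin"

print(ggml_model_url("medium"))
```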
Using WhisperDesktop
Double-click WhisperDesktop.exe and load the model you just downloaded,
then pick a video file to test it; a screenshot of part of the generated text is shown below,
Done!