SeamlessStreaming simultaneous interpretation in VRChat

I have previously posted a few videos that briefly demonstrate the simultaneous interpretation (the graphics card is a Tesla P40; the quality is relatively poor, but it works).

Video: VRChat real-time translation and voice output (bilibili)

This note records the implementation approach and the related configuration (you can swap in other software or websites following the same approach).

The same setup also works with other software, or in VR.

Prerequisite environment

The local graphics card is a Tesla P40.

Ubuntu 22.04.3 LTS

Environments managed with conda

Python 3.9.16

A proxy is configured on the server.

Because seamless streaming communicates over WebSocket (ws), remote access has to go through HTTPS, so you need to generate a self-signed certificate with openssl and then set up a reverse proxy. (Deploying seamless streaming itself is not covered in this note; it mainly describes the approach and methods for practical use.)
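As a rough illustration of the certificate step, the sketch below only assembles a commonly used `openssl req` invocation for a self-signed certificate; it does not run it. The hostname, file names and validity period are placeholders I made up, not values from my setup, and the reverse proxy configuration is left out.

```python
import shlex


def self_signed_cert_command(hostname, key_path="key.pem", cert_path="cert.pem", days=365):
    """Build the argument list for an `openssl req` call that creates a self-signed certificate.

    All default values here are illustrative assumptions.
    """
    return [
        "openssl", "req",
        "-x509",                 # emit a self-signed certificate instead of a signing request
        "-newkey", "rsa:2048",   # generate a new RSA key alongside the certificate
        "-nodes",                # leave the key unencrypted so the proxy can load it unattended
        "-keyout", key_path,
        "-out", cert_path,
        "-days", str(days),
        "-subj", f"/CN={hostname}",
    ]


if __name__ == "__main__":
    # "my-server.local" is a hypothetical hostname; print the command for copy and paste.
    print(shlex.join(self_signed_cert_command("my-server.local")))
```

The resulting key and certificate are what the reverse proxy would be pointed at. Since the certificate is self-signed, the browser will show a warning that has to be accepted once.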

Key software

1. Voicemeeter (routes audio to a virtual microphone)

2. Sogou Pinyin input method, whose voice input (to my surprise) translates foreign languages into Chinese in real time

3. Seamless streaming, which you need to deploy yourself. Alternatively, use other real-time TTS services (whisper desktop + NetEase Monster, or bark, etc., although as far as I can tell these basically have no continuous websocket conversion, so you have to click the microphone yourself each time you speak), or pay for a commercial service such as Microsoft's or iTranslate (I haven't looked into these yet)

4. (Optional) A voice changer. I used voice-changer, which requires a sovits model

If you need a voice changer, search for it on bilibili. There are many tutorials.

Input (foreign language translated into Chinese)

The input conversion process is simply: VRChat -> Voicemeeter -> Sogou Pinyin voice input -> txt file
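If you want to consume that txt file from a script (for example to show the translated text somewhere else), one possible approach is to poll the file and pick up whatever complete lines have been appended since the last read. This is only a sketch under my own assumptions: the file is saved as UTF-8, and the function name and the `transcript.txt` path are made up for illustration.

```python
from pathlib import Path


def read_new_lines(path, offset=0, encoding="utf-8"):
    """Return (complete lines appended since `offset`, new byte offset).

    Assumes a UTF-8 file. Only text up to the last newline is consumed, so a
    half-written line (or half-written multibyte character) waits for the next poll.
    """
    data = Path(path).read_bytes()
    if len(data) < offset:
        # The file was truncated or replaced, so start again from the beginning.
        offset = 0
    chunk = data[offset:]
    end = chunk.rfind(b"\n") + 1  # 0 when no complete line has arrived yet
    lines = chunk[:end].decode(encoding).splitlines()
    return lines, offset + end


if __name__ == "__main__":
    import time

    position = 0
    while True:  # poll once per second; stop with Ctrl+C
        new_lines, position = read_new_lines("transcript.txt", position)  # hypothetical path
        for line in new_lines:
            print(line)
        time.sleep(1)
```

Note that this only works if the editor receiving the Sogou input actually writes to disk, so the file has to be saved (or auto-saved) for new text to show up.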

The key part is the configuration of Voicemeeter and VRChat.

1. VRChat sound configuration
This setting is under System -> Sound -> Volume mixer in the Windows settings.

2. Voicemeeter configuration

If you want to change your voice, use the Potato version (three virtual sound cards); otherwise Banana, with two virtual sound cards, is enough.

For the Voicemeeter configuration, just look at the first one.

Enabling A2 sends the sound to a physical sound card, namely the device assigned to A2 on the right. I chose my headphones here because I want to hear the original sound.

Enabling B1 makes Voicemeeter route the sound to the Voicemeeter VAIO OUTPUT, which is a virtual microphone device. Likewise, B2 corresponds to the virtual microphone AUX OUTPUT, and B3 corresponds to VAIO3 OUTPUT.
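The routing described above can be written down as a small lookup, which helps when you lose track of which button feeds which device. This is just a mental model in code, not something that talks to Voicemeeter; the function name and the "Headphones" device are examples I chose.

```python
# Virtual outputs as described in this note: each B button feeds one virtual microphone.
VIRTUAL_BUSES = {
    "B1": "VAIO OUTPUT",
    "B2": "AUX OUTPUT",
    "B3": "VAIO3 OUTPUT",
}


def describe_routing(enabled_buses, physical_devices):
    """Map each enabled bus button to the device that will receive the sound.

    `physical_devices` maps A buses (e.g. "A2") to whatever hardware you assigned
    to them on the right-hand side of the Voicemeeter window.
    """
    routes = {}
    for bus in enabled_buses:
        if bus in VIRTUAL_BUSES:
            routes[bus] = f"virtual microphone: {VIRTUAL_BUSES[bus]}"
        elif bus in physical_devices:
            routes[bus] = f"physical device: {physical_devices[bus]}"
        else:
            raise ValueError(f"bus {bus} is enabled but has no device assigned")
    return routes


if __name__ == "__main__":
    # The input-side setup from this note: A2 for listening, B1 as the source for Sogou.
    for bus, target in describe_routing(["A2", "B1"], {"A2": "Headphones"}).items():
        print(bus, "->", target)
```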

Configuration debugging reference

Once the configuration is working, the corresponding output shows activity whenever the input volume fluctuates.

3. Configure Sogou’s input

This way, the sound reaches Sogou. (You can try it yourself. I originally recorded a video of this step, but on reflection I decided to write it up instead.)

Output (spoken Chinese to English, based on seamless streaming)

Implementation process

To install seamless streaming, just follow Meta's official README; beyond that, you only need to know a little Python.

The address is https://huggingface.co/spaces/facebook/seamless-streaming/tree/main. The project comes with a README file.

By the way, seamless streaming can also carry emotion over into the translation (but you need to apply for access to that model), and bark is available as an alternative.

Next is the browser's sound configuration (if you are using VR, just change the input device to the Virtual Desktop or Oculus headset).

Here is the Voicemeeter configuration again

The Voicemeeter configuration is basically finished at this point. You can go straight to setting the microphone in VRChat.

Of course, a guy like me adds one more step when using VAIO3: configuring the voice changer.

Configuration of the voice changer client
The microphone selected in VRChat must of course be changed as well.

Use AUX.
Try out the effect for yourself! (The voice does sound a bit silly, though.)