Using the LipSync plugin for Chinese mouth-shape animation

LipSync consists of three main modules:

* The front end analyzes the audio, recognizes the specified syllables, and outputs their position, type, and intensity. The author of LipSync Pro uses the open-source PocketSphinx library directly for this part.

* The middle layer has no hard boundary with the others; it converts the speech-recognition results into the event frames that drive the facial animation. LipSync provides a ClipEditor with a full-featured interface, which makes it easy to add, remove, and edit the frames generated by AutoSync.

* Both the front end and the middle layer run offline in the editor.

* The back end drives the facial animation at runtime from the event frames. LipSync supports BlendShapes, skeletal animation, and 2D frame animation; a sketch of the idea follows this list.
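To make the event-frame idea concrete, here is a minimal sketch in Python (not LipSync's actual C# API; the field names and the triangle-shaped interpolation are my own assumptions). Given a time-sorted list of event frames, it interpolates a weight for each phoneme class at the current playback time, which a back end could then copy into BlendShape weights:

    from dataclasses import dataclass

    @dataclass
    class EventFrame:
        time: float       # position in the clip, in seconds (assumed field)
        phoneme: str      # phoneme class, e.g. "AI", "E", "U" (assumed field)
        intensity: float  # peak strength of the mouth shape, 0..1 (assumed field)

    def blend_weights(frames, t):
        """Interpolate a weight per phoneme class at time t.

        Each event frame is treated as a triangular peak: its weight ramps
        up from the previous frame and back down toward the next one. This
        only illustrates the idea; it is not LipSync's real algorithm.
        """
        weights = {}
        for prev, cur in zip(frames, frames[1:]):
            if prev.time <= t <= cur.time:
                a = (t - prev.time) / (cur.time - prev.time)
                weights[prev.phoneme] = weights.get(prev.phoneme, 0.0) + prev.intensity * (1 - a)
                weights[cur.phoneme] = weights.get(cur.phoneme, 0.0) + cur.intensity * a
        return weights  # e.g. feed each entry into the matching BlendShape

    frames = [EventFrame(0.00, "Rest", 0.0),
              EventFrame(0.12, "AI", 1.0),
              EventFrame(0.25, "U", 0.8)]
    print(blend_weights(frames, 0.18))  # mixes "AI" fading out with "U" fading in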

 

After an initial trial it feels very useful, but there is one key problem: the plugin does not support Chinese! Forcing it to recognize Chinese produces results that are completely unusable. The author is already working on supporting more languages, but since he does not know Chinese, it is not supported yet.

 

Built-in English phoneme classification definitions

 

To sum up, extending LipSync to support Chinese requires solving two problems:

1) Find a Chinese phoneme table.

2) Find an open-source library that can recognize the phonemes in Chinese speech (preferably Sphinx).

For the first problem, an English-language search turns up almost nothing reliable. Searching Baidu turned up several Chinese papers on lip sync [1][2]. Their citation counts are low; perhaps lip sync is of little academic value, but it is still quite meaningful as engineering.

[1] trains its own Chinese speech model with HTK. I have never done speech recognition, so that route is fairly difficult; the realistic option is to look for ready-made Chinese models.

[2] proposes a three-stage scheme: first cut the speech into segments using the zero-crossing rate combined with short-term energy, then recognize each segment as text (the paper uses the Microsoft Speech API), and finally split the recognized words into syllables by dictionary lookup. This scheme is quite feasible and its results should be dependable, but the engineering effort is large. A rough sketch of the first stage appears below.
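The first stage is easy to sketch. Below is a rough Python illustration of per-frame short-term energy and zero-crossing rate; the frame size, hop size, and the thresholding rule are placeholders of mine, not values from the paper:

    import numpy as np

    def frame_features(samples, frame_len=400, hop=160):
        """Short-term energy and zero-crossing rate per frame.

        `samples` is a 1-D NumPy array; 400/160 samples are 25 ms / 10 ms
        at 16 kHz. The parameters used in [2] are not reproduced here.
        """
        energies, zcrs = [], []
        for start in range(0, len(samples) - frame_len, hop):
            frame = samples[start:start + frame_len].astype(np.float64)
            energies.append(float(np.sum(frame ** 2)))
            # ZCR: fraction of adjacent sample pairs whose sign changes
            zcrs.append(float(np.mean(np.abs(np.diff(np.sign(frame))) > 0)))
        return np.array(energies), np.array(zcrs)

    def is_speech(energies, zcrs, e_thresh, z_thresh):
        # A frame counts as speech if it is loud enough, or quiet but
        # "busy" (high ZCR), as unvoiced consonants tend to be. The
        # thresholds would normally be estimated from leading silence.
        return (energies > e_thresh) | ((zcrs > z_thresh) & (energies > 0.1 * e_thresh))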

[3] proposes a classification scheme based on the pronunciation rules of Mandarin initials and finals, as shown in the figure below.

(Figure from [3]: mouth-shape classification of Mandarin initials and finals.)

Both [1] and [2] use this classification model.
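Since the classification table itself is not reproduced above, here is a purely hypothetical illustration of what such a grouping looks like. This toy mapping is based on common pinyin pronunciation rules and is NOT the table from [3]:

    # Hypothetical mouth-shape classes keyed by pinyin initials/finals.
    # Illustrative only; the real classes are defined in the figure from [3].
    VISEME_CLASSES = {
        "A":      ["a", "ia", "ua"],   # wide-open jaw
        "O":      ["o", "uo", "ou"],   # rounded, half-open
        "E":      ["e", "ei", "ie"],   # spread, half-open
        "I":      ["i", "in", "ing"],  # spread, nearly closed
        "U":      ["u", "un"],         # tightly rounded
        "Closed": ["b", "p", "m"],     # bilabial initials, lips shut
        "FV":     ["f"],               # labiodental initial
    }

    def classify(unit):
        """Map a pinyin initial or final to its (hypothetical) viseme class."""
        for viseme, units in VISEME_CLASSES.items():
            if unit in units:
                return viseme
        return "Rest"  # anything unclassified falls back to the neutral shape

    print(classify("ua"))  # -> A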

 

Building a phoneme model for CMUSphinx

Among the speech-recognition toolkits, only CMUSphinx provides phoneme recognition out of the box.

First prepare some words and convert them into phoneme sequences as training data, then use cmuclmtk to build a phoneme language model from them [5].

I did not have suitable data at hand, so I stripped the phoneme sequences out of the pronunciation dictionary zh_broadcastnews_utf8.dic and used them as the training text; a script along the lines below can do it.
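A few lines of Python are enough for the stripping. The sketch below assumes the usual Sphinx dictionary layout (one entry per line, the word first and its phonemes after) and writes both the training text and the vocabulary file that the commands below expect; wrapping each sequence in <s>/</s> sentence markers follows the usual cmuclmtk convention:

    # Strip the phoneme sequences out of zh_broadcastnews_utf8.dic and
    # write the training text plus the vocabulary for text2idngram.
    phonemes = set()
    with open("zh_broadcastnews_utf8.dic", encoding="utf-8") as dic, \
         open("Phoneme.txt", "w", encoding="utf-8") as txt:
        for line in dic:
            parts = line.split()
            if len(parts) < 2:
                continue                 # skip malformed lines
            seq = parts[1:]              # drop the word, keep its phonemes
            txt.write("<s> " + " ".join(seq) + " </s>\n")
            phonemes.update(seq)

    with open("Phoneme.tmp.vocab", "w", encoding="utf-8") as vocab:
        for token in sorted(phonemes | {"<s>", "</s>"}):
            vocab.write(token + "\n")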

Commands (the three steps below turn the training text into an id n-gram file, build an ARPA language model from it, and convert that model into the binary format PocketSphinx loads):

text2idngram.exe -vocab F:\MyProject\Sphinx\Project\Phoneme.tmp.vocab -idngram F:\MyProject\Sphinx\Project\Phoneme.idngram < F:\MyProject\Sphinx\Project\Phoneme.txt

 

idngram2lm -vocab_type 0 -idngram F:\MyProject\Sphinx\Project\Phoneme.idngram -vocab F:\MyProject\Sphinx\Project\Phoneme.tmp.vocab -arpa F:\MyProject\Sphinx\Project\ChinesePhoneme.lm

 

sphinx_lm_convert.exe -i F:\MyProject\Sphinx\Project\ChinesePhoneme.lm -o F:\MyProject\Sphinx\Project\ChinesePhoneme.lm.bin
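Once the binary model exists, phoneme decoding can be tried with PocketSphinx's allphone mode, roughly like this (zh_broadcastnews_ptm256_8000 is the Mandarin acoustic model downloadable from CMUSphinx; exact flags may differ between PocketSphinx versions):

pocketsphinx_continuous -infile test.wav -hmm zh_broadcastnews_ptm256_8000 -allphone ChinesePhoneme.lm.bin -backtrace yes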

 

[1] Implementation of voice-driven lip animation based on HTK

https://wenku.baidu.com/view/e3cf6acdbb4cf7ec4bfed003.html

[2] Speech-driven lip animation method based on SAPI

http://www.ixueshu.com/document/b6cc0c79686c53bb318947a18e7f9386.html

 

 

[3] Classification of sequential mouth shapes in lip reading


 

[4]

https://www.leiphone.com/news/201703/RccQRMCqbgxnFFS3.html

 

[5] CMUSphinx - Phoneme Recognition

https://cmusphinx.github.io/wiki/phonemerecognition/
