[Diao Ye learns programming] Arduino hands-on (129)---TTS text-to-speech synthesis module

The claim of "37 sensors and actuators" circulates widely on the Internet, but there are certainly more than 37 sensor modules compatible with Arduino. Since I have accumulated a number of sensor and actuator modules, and in the belief that real knowledge comes from practice (you have to do it yourself), I plan to work through a series of experiments one by one for learning and exchange. Whether or not an experiment succeeds (the program runs), I will record it, small progress and unsolved problems alike, hoping it will be useful to others.

[Arduino] 168 kinds of sensor module series experiments (data code + simulation programming + graphics programming)
Experiment 129: Chinese TTS text-to-speech synthesis module replaces SYN6288 and XFS5152

Knowledge points: TTS (Text-To-Speech)

ASR (Automatic Speech Recognition), which we are more familiar with, converts sound into text and can be compared to the human ear. TTS does the opposite: it converts text into sound (reading it aloud), which is analogous to the human mouth. It is the part of human-machine dialogue that lets the machine speak.

TTS draws on both linguistics and psychology. With the support of a built-in chip, it uses a neural-network-based design to convert text into a natural-sounding voice stream. Conversion happens in real time and can take as little as a few seconds. An intelligent voice controller smooths the rhythm of the output, so the listener hears natural speech rather than the flat, halting delivery typical of machine voices. TTS speech synthesis is expected to cover the level-1 and level-2 Chinese characters of the national standard, with an English interface, automatic recognition of Chinese and English, and support for mixed Chinese and English reading. All voices use standard Mandarin pronunciation. The quoted figures are 120-150 Chinese characters per minute for synthesis and a reading speed of 3-4 Chinese characters per second, with clear sound quality and coherent, smooth intonation. A small number of MP3 players also have a TTS function.

Speech Synthesis

Speech synthesis converts arbitrary text into standard, fluent speech in real time, which amounts to giving the machine an artificial mouth. It draws on acoustics, linguistics, digital signal processing, and computer science, and is a frontier technology in Chinese information processing. "Letting a machine speak like a human being" is fundamentally different from traditional sound playback equipment. Devices such as tape recorders "make the machine talk" by pre-recording sounds and playing them back, which is severely limited in content, storage, transmission, convenience, and timeliness. With computer speech synthesis, any text can be converted into highly natural speech at any time, so the machine can truly "speak like a human being".

Speech synthesis is the technology of producing artificial speech by mechanical and electronic means. TTS (text-to-speech) is a branch of speech synthesis: it converts text generated by the computer itself or entered from outside into intelligible, fluent spoken Chinese output. TTS can turn files stored on a computer, such as help files or web pages, into natural speech. It helps people with visual impairments read information on a computer and makes text documents more accessible. TTS applications include voice-driven mail and voice response systems, and they are often used together with speech recognition programs. Speech synthesis meets the need to convert text into human-like speech and closes the loop of human-computer interaction. Current offerings provide a choice of timbres and support custom volume and speech rate, making pronunciation more natural, more professional, and better suited to the scenario. Speech synthesis is widely used in voice navigation, audiobooks, robots, voice assistants, and automatic news broadcasts, where it improves the interaction experience and makes voice-based applications faster to build.

TTS is generally divided into two steps

First, text processing. This step converts the text into a phoneme sequence and marks each phoneme with its start and end time, frequency change, and similar information. As a preprocessing step its importance is often overlooked, but it involves many problems worth studying, such as distinguishing words with the same spelling but different pronunciations, handling abbreviations, and deciding where pauses go.

Second, speech synthesis. In the narrow sense, this step means generating speech from the phoneme sequence (with its marked start and end times, frequency changes, and so on); in the broad sense, it can also include the text processing step. There are three main families of methods:

1. Splicing method: select the required basic units from a large body of pre-recorded speech and splice them together. The units can be syllables, phonemes, and so on; to make the synthesized speech more coherent, diphones (from the center of one phoneme to the center of the next) are often used. Spliced speech is of high quality, but a large amount of speech must be recorded to ensure coverage.

2. Parametric method: generate speech parameters (fundamental frequency, formant frequencies, and so on) for every moment from a statistical model, then convert those parameters into a waveform. The parametric method also needs pre-recorded speech for training, but it does not require 100% coverage. Its speech quality is worse than that of the splicing method.

3. Vocal tract simulation method: the parameters used by the parametric method are properties of the speech signal and say nothing about how speech is produced. The vocal tract simulation method instead builds a physical model of the vocal tract and generates the waveform from it. The theory is elegant, but because speech production is so complicated, its practical value is low.
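As a toy illustration of the splicing method described above, the sketch below concatenates pre-recorded units looked up by name. The unit names, the sample values, and the `spliceUnits` helper are all hypothetical; a real system would choose among many candidate recordings and smooth the joins between units.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical unit inventory: each unit name maps to a short run of PCM samples.
using UnitInventory = std::map<std::string, std::vector<int16_t>>;

// Concatenate the recorded samples for each requested unit, in order.
// Throws std::out_of_range if a unit was never recorded, which is the
// coverage problem the splicing method solves by recording a lot of speech.
std::vector<int16_t> spliceUnits(const UnitInventory& inventory,
                                 const std::vector<std::string>& units) {
    std::vector<int16_t> waveform;
    for (const std::string& name : units) {
        const std::vector<int16_t>& samples = inventory.at(name);
        waveform.insert(waveform.end(), samples.begin(), samples.end());
    }
    return waveform;
}
```

Asking for a unit that is missing from the inventory fails immediately, which is why the splicing method needs so much recorded material.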

Chinese TTS text-to-speech synthesis module (current version V5)

There is very little information about it on the Internet. The module carries three main chips. U1 is probably the main chip; unfortunately its markings are covered, so the exact model cannot be identified.

U2 is 8002A

The 8002A is an audio power amplifier IC with a shutdown mode, designed for high-power, high-fidelity applications. With a 5V supply it delivers an average of 3W into a 3Ω load with no more than 10% distortion. For portable devices, applying VDD to the shutdown pin puts the 8002A into shutdown mode, where power consumption is extremely low: the quiescent current (IQ) is only 0.6uA. It needs few external components and works from a supply voltage of 2.0V-5.5V.

Features

1. No output coupling capacitors or external buffer circuits are required.

2. Stable gain output.

3. External gain setting.

4. Package form: SOP8, SOP8-PP, DIP8, MSOP8.

The model of U5 is MX25L6406EMI-12G

Function description: IC FLASH SER 64MBIT 86MHZ 16SOP

RoHS: Yes

Category: Integrated Circuit (IC) >> Memory

Series: MX25xxx05/06

Standard Package: 2,500

Memory type: Flash (serial NOR)

Storage capacity: 64 Mbit (8M x 8)

Speed: 86MHz

Interface: SPI serial

Supply voltage: 2.7V ~ 3.6V

Working temperature: -40°C ~ 85°C

Package/Case: 16-SOP (300 mil)

For the MX25L6406EMI-12G I found English-language documentation, including the functional block diagram and the part-numbering rules.

Arduino experiment open source code

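/*
  [Arduino] 168 kinds of sensor module series experiments (data code + simulation programming + graphics programming)
  Experiment 129: Chinese TTS text-to-speech synthesis module replaces SYN6288 and XFS5152

  Wiring (module -> Arduino):
  TX  -> Arduino 0 (RX)
  RX  -> Arduino 1 (TX)
  5V  -> Arduino 5V
  GND -> Arduino GND
*/

void setup() {
  Serial.begin(9600);  // the module is driven over the hardware serial port
}

void loop() {
  Serial.println("1234567890");   // Arabic numerals
  delay(5000);
  Serial.println("abcdefghijk");  // English letters
  delay(5000);
}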

Arduino experiment scene diagram

Experimental serial port return

The preliminary experiment is complete: driven over the serial port, the module accurately synthesizes the pronunciation of Arabic numerals and English letters and plays it through the speaker.

Video playback https://v.youku.com/v_show/id_XNDUzNDgxMDE2NA==.html

TTS text-to-speech module experiment

Experiment 2: Try Chinese TTS text-to-speech synthesis playback

Arduino experiment open source code

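/*
  [Arduino] 168 kinds of sensor module series experiments (data code + simulation programming + graphics programming)
  Experiment 129: Chinese TTS text-to-speech synthesis module replaces SYN6288 and XFS5152
  Part 2: try Chinese TTS text-to-speech synthesis playback

  Wiring (module -> Arduino):
  TX  -> Arduino 0 (RX)
  RX  -> Arduino 1 (TX)
  5V  -> Arduino 5V
  GND -> Arduino GND
*/

void setup() {
  Serial.begin(9600);
}

void loop() {
  // Chinese text: "Looking forward to a good solution; I have always had
  // questions about encoding conversion". It is sent in whatever encoding
  // this sketch file is saved in.
  Serial.println("期待好的解决方案 对编码转换这块一直有疑问");
  delay(5000);
}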

Experimental serial port return

Because the encoding is different, the playback is garbled. Let's look into encoding conversion...

Play link https://v.youku.com/v_show/id_XNDUzNDkyMjY4OA==.html
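One plausible reading of the garbled playback is that the sketch was saved as UTF-8, which is an assumption here, so the string literal went out as UTF-8 bytes instead of GBK bytes. The sketch below dumps the bytes of "你好" in both encodings to show how different they are; `toHex`, `kHelloUtf8`, and `kHelloGbk` are illustrative names, not part of any library.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Format every byte of `bytes` as two uppercase hex digits separated by spaces.
std::string toHex(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned value = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof(buf), "%02X", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// "你好" written out byte by byte in each encoding, so the result does not
// depend on how this source file itself is saved.
const std::string kHelloUtf8 = "\xE4\xBD\xA0\xE5\xA5\xBD";  // 3 bytes per character
const std::string kHelloGbk  = "\xC4\xE3\xBA\xC3";          // 2 bytes per character
```

`toHex(kHelloUtf8)` gives `E4 BD A0 E5 A5 BD` and `toHex(kHelloGbk)` gives `C4 E3 BA C3`. A module that expects GBK and receives the six UTF-8 bytes will pair them up differently and read out the wrong characters.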

According to the information I found, the Chinese TTS text-to-speech synthesis module can synthesize any Chinese text, English letters, and Arabic numerals, and supports mixed reading of all three. The module expects the Chinese GBK code set and accepts both uppercase and lowercase English letters.

GBK code (i.e. Chinese character national standard extension code)
GBK is an extension of GB2312 and is fully compatible with the GB2312-80 standard. It keeps the double-byte scheme, with an encoding range of 8140-FEFE excluding the xx7F code points, for a total of 23,940 code points. It contains 21,886 Chinese characters and graphic symbols: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols. GBK supports all the CJK Han characters in the international standard ISO/IEC 10646-1 and the national standard GB 13000-1, and includes all the Chinese characters in BIG5. The GBK scheme was officially released on December 15, 1995; that version of the specification is version 1.0. Simplified Chinese Windows 95 uses GBK as its internal code, and because GBK covers all the CJK Han characters that Unicode had at the time, its characters can be mapped one-to-one to Unicode.

Coding range
8140-FEFE (33088-65278 in decimal)
All code points fall into three parts: 1. Chinese character area; 2. Graphic symbol area; 3. User-defined area (see the code allocation item under Features for details)

Scope of application and problems
GBK supports Chinese characters almost completely, but it does not cover the scripts of some other languages (such as Japanese and some other East Asian languages), so conversion from GBK to Unicode is often needed.

Features
1. Character repertoire - The GBK specification includes all the CJK Han characters and symbols in ISO 10646.1, with some supplements. Specifically:
All Chinese characters and non-Chinese characters in GB 2312.
The other CJK Han characters in GB 13000.1. Together with the above, 20,902 Han characters.
52 Chinese characters from the General List of Simplified Characters that are not included in GB 13000.1.
28 radicals and important components from the "Kangxi Dictionary" and "Ci Hai" that are not included in GB 13000.1.
13 Chinese character structure characters.
139 graphic symbols from BIG-5 that are not included in GB 2312 but exist in GB 13000.1.
The 6 pinyin symbols added by GB 12345.
The Chinese character "○".
19 vertical punctuation marks added by GB 12345 (GB 12345 adds 29 vertical punctuation marks relative to GB 2312; 10 of them are not in GB 13000.1, so GBK does not include them either).
21 Chinese characters selected from the CJK compatibility area of GB 13000.1.
31 IBM OS/2 specific symbols included in GB 13000.1.
Some characters not found in the "Xinhua Dictionary", such as the simplified form of "韡".

2. Code point allocation and order
GBK also uses two bytes per character. The overall coding range is 8140-FEFE: the first byte is in 81-FE and the second byte is in 40-FE, with the xx7F column excluded. That gives 23,940 code points in total, of which 21,886 are assigned to Chinese characters and graphic symbols: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols.
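The range rule above is easy to check in code. The following is a minimal sketch; `isGbkCodePoint` and `countGbkCodePoints` are illustrative names, and the check covers only the byte ranges, not whether a character is actually assigned at a given position.

```cpp
#include <cstdint>

// True if (lead, trail) lies in the GBK double-byte range described above:
// lead byte 0x81-0xFE, trail byte 0x40-0xFE, with trail 0x7F excluded.
bool isGbkCodePoint(uint8_t lead, uint8_t trail) {
    return lead >= 0x81 && lead <= 0xFE &&
           trail >= 0x40 && trail <= 0xFE && trail != 0x7F;
}

// Count every byte pair accepted by isGbkCodePoint.
int countGbkCodePoints() {
    int count = 0;
    for (int lead = 0; lead <= 0xFF; ++lead) {
        for (int trail = 0; trail <= 0xFF; ++trail) {
            if (isGbkCodePoint(static_cast<uint8_t>(lead),
                               static_cast<uint8_t>(trail))) {
                ++count;
            }
        }
    }
    return count;
}
```

The count works out to 126 lead bytes times 190 trail bytes, that is 23,940, which matches the total quoted above.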

3. Mishandling
GBK characters whose second byte is in 0x40-0x7E are special, because those byte values overlap the ASCII range, which causes trouble for some systems. Some systems treat characters in 0x40-0x7E (such as "|") as special symbols and, when searching for them, do not check whether the byte is actually the second byte of a GBK character, which leads to wrong matches. The problem does not arise with plain GB2312, whose second byte is always 0xA1 or above. Note that in a GBK environment a byte smaller than 0x80 is not necessarily an ASCII symbol. It is also best to choose ASCII symbols below 0x40 as special markers: they can be located quickly, with no risk that the byte is really the second half of a Chinese character. Big5 has the same problem.
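The false match described above can be reproduced and avoided with a scan that steps over double-byte characters. This is a minimal sketch: `findAsciiInGbk` is a hypothetical helper, and it assumes the input is well-formed GBK in which every byte in 0x81-0xFE starts a two-byte character.

```cpp
#include <cstddef>
#include <string>

// Return the index of the first real ASCII `target` in GBK-encoded `text`,
// or std::string::npos if there is none. Bytes 0x81-0xFE are treated as lead
// bytes and the byte after them is skipped, so a trail byte that happens to
// equal `target` is not reported.
std::size_t findAsciiInGbk(const std::string& text, char target) {
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char byte = static_cast<unsigned char>(text[i]);
        if (byte >= 0x81 && byte <= 0xFE) {
            i += 2;  // skip the whole double-byte character
        } else {
            if (text[i] == target) return i;
            i += 1;
        }
    }
    return std::string::npos;
}
```

For the bytes `81 7C` followed by the ASCII text `a|b`, a plain byte search for `|` (0x7C) stops at index 1, inside the double-byte character, while `findAsciiInGbk` returns index 3, the real separator.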

4. Double-byte encoding
Follow the regulations of GB2312.

I searched online and consulted Mr. Rabbit, who writes the "Arduino Magic Book" column on Zhihu, and only then realized that this TTS module accepts text only as raw GBK code bytes, which are usually written out in hexadecimal.

Searching Baidu for "Chinese characters to GBK" turns up many online tools. Here I use Qianqianxiuzi (https://www.qqxiuzi.cn/bianma/zifuji.php): enter "Thank you, Mr. Rabbit" in Chinese, and the automatic conversion results are shown in the figure below

For example, the GBK encoding of the word "兔子" (rabbit) is "CDC3 D7D3", which becomes "0xCD, 0xC3, 0xD7, 0xD3" when written in C array format.
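The manual step from the tool's output to C array format can be automated. The sketch below is one possible way to do it on a PC; `parseHexBytes` and `toCArrayBody` are hypothetical helper names, not part of the Arduino core or of the online tool.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Parse hex text such as "CDC3 D7D3" into raw bytes, ignoring whitespace.
// Throws std::invalid_argument on a non-hex character or an odd digit count.
std::vector<uint8_t> parseHexBytes(const std::string& hexText) {
    std::vector<uint8_t> bytes;
    int pending = -1;  // high nibble waiting for its low nibble, or -1
    for (char c : hexText) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (std::isspace(uc)) continue;
        if (!std::isxdigit(uc)) throw std::invalid_argument("not a hex digit");
        int value = std::isdigit(uc) ? uc - '0' : std::toupper(uc) - 'A' + 10;
        if (pending < 0) {
            pending = value;
        } else {
            bytes.push_back(static_cast<uint8_t>(pending * 16 + value));
            pending = -1;
        }
    }
    if (pending >= 0) throw std::invalid_argument("odd number of hex digits");
    return bytes;
}

// Render bytes as the body of a C initializer list, e.g. "0xCD, 0xC3".
std::string toCArrayBody(const std::vector<uint8_t>& bytes) {
    static const char kDigits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0) out += ", ";
        out += "0x";
        out += kDigits[bytes[i] >> 4];
        out += kDigits[bytes[i] & 0x0F];
    }
    return out;
}
```

`toCArrayBody(parseHexBytes("CDC3 D7D3"))` produces the same `0xCD, 0xC3, 0xD7, 0xD3` text as the manual conversion.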

The experimental open source code is as follows:

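/*
  [Arduino] 168 kinds of sensor module series experiments (data code + simulation programming + graphics programming)
  Experiment 129: Chinese TTS text-to-speech synthesis module replaces SYN6288 and XFS5152
  Part 3: Chinese TTS text-to-speech synthesis playback: 兔子你好 (Hello, Rabbit)

  Wiring (module -> Arduino):
  TX  -> Arduino 0 (RX)
  RX  -> Arduino 1 (TX)
  5V  -> Arduino 5V
  GND -> Arduino GND
*/

const uint8_t hello[]  = {0xC4, 0xE3, 0xBA, 0xC3};  // "你好" (hello) in GBK
const uint8_t rabbit[] = {0xCD, 0xC3, 0xD7, 0xD3};  // "兔子" (rabbit) in GBK

void setup() {
  Serial.begin(9600);
}

void loop() {
  // Serial.write() sends exactly sizeof(...) raw bytes, so the arrays need
  // no null terminator (printing them as C strings would read past the end).
  Serial.write(hello, sizeof(hello));
  Serial.println();  // end the line, as println() did in the earlier parts
  Serial.write(rabbit, sizeof(rabbit));
  Serial.println();
  delay(1000);
}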

Hexadecimal GBK encoding sent to the serial port
Arduino experiment open source code (Part 4)

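/*
  [Arduino] 168 kinds of sensor module series experiments (data code + simulation programming + graphics programming)
  Experiment 129: Chinese TTS text-to-speech synthesis module replaces SYN6288 and XFS5152
  Part 4: Chinese TTS text-to-speech synthesis playback: 春节快乐 (Happy Spring Festival)
  (GBK codes B4BA, BDDA, BFEC, C0D6)

  Wiring (module -> Arduino):
  TX  -> Arduino 0 (RX)
  RX  -> Arduino 1 (TX)
  5V  -> Arduino 5V
  GND -> Arduino GND
*/

// "春节快乐" in GBK, two bytes per character
const uint8_t greeting[] = {0xB4, 0xBA, 0xBD, 0xDA, 0xBF, 0xEC, 0xC0, 0xD6};

void setup() {
  Serial.begin(9600);
}

void loop() {
  // Send the raw bytes by length; the array has no null terminator,
  // so it must not be printed as a C string.
  Serial.write(greeting, sizeof(greeting));
  Serial.println();  // end the line, as println() did in the earlier parts
  delay(3000);
}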

Experimental serial port return


The Chinese TTS text-to-speech module synthesizes and plays a 10-second video of "Happy Chinese New Year"
https://v.youku.com/v_show/id_XNDUzNTY3MjMyNA==.htm

I would like to thank Mr. Rabbit for his guidance and help over WeChat in completing this experiment. The main learning and reference materials are:

Arduino voice interaction TTS speech synthesis
https://zhuanlan.zhihu.com/p/66314323

DIY reading robot! Don't want to read stories to your kids? Learn about TTS speech synthesis module, B719 module effect test
https://www.bilibili.com/read/cv3630794/

TTS human pronunciation SYN6288 module
https://www.arduino.cn/thread-75043-1-1.html

Arduino speech module - Speech Synthesizer Bee speech synthesis module
https://www.ncnynnl.com/archives/201606/192.html

About the problem of Arduino sending Chinese characters GB2312 to hexadecimal
https://www.cnblogs.com/xiaohe520/articles/7918641.html


Origin blog.csdn.net/weixin_41659040/article/details/131408653