Offline Voice Wake-up with Snowboy Hotwords on a Raspberry Pi: a Voice-Controlled Light Switch

Snowboy Offline Hotword Wake-up

Speech recognition now has a very wide range of application scenarios, such as phone voice assistants and smart speakers (Xiao Ai, DingDong, Tmall Genie ...).
Speech recognition generally consists of three stages: hotword wake-up, voice recording, and the recognition and logic-control phase.

Hotword wake-up means waking the device so that it parses what you say next. The device is continuously capturing the surrounding sound, but it does not react to it. Only after it is woken by a wake word such as "Hi, Siri" does it begin processing the sound that follows. Hotword wake-up is the starting point of speech recognition.

Snowboy is a popular hotword wake-up framework that has been acquired by Baidu. Snowboy's support for Chinese is friendly and its configuration is relatively simple compared to Pocketsphinx, so it is recommended.

Snowboy official documentation [English]: http://docs.kitt.ai/snowboy

Installation

First, obtain the source code and compile it.

Installing dependencies

The Raspberry Pi's onboard audio does not support voice input (it cannot record), so you need to buy a USB sound card; driver-free models sold online can generally be used directly after plugging in.
It is recommended to install the pulseaudio software to reduce the audio configuration steps:
$ sudo apt-get install pulseaudio

Install the sox software to test recording and playback:
$ sudo apt-get install sox

After the installation completes, run the sox -d -d command, speak into the microphone, and confirm that you can hear your own voice.

Install the other software dependencies:

  • Install PyAudio: $ sudo apt-get install python3-pyaudio
  • Install SWIG (> 3.0.10): $ sudo apt-get install swig
  • Install ATLAS: $ sudo apt-get install libatlas-base-dev
Compile the source code

Get the source code: $ git clone https://github.com/Kitt-AI/snowboy.git
Compile the Python3 binding: $ cd snowboy/swig/Python3 && make

Test :

If you are using a Raspberry Pi, you also need to change the sound settings in ~/.asoundrc:

  pcm.!default {
    type asym
    playback.pcm {
      type plug
      slave.pcm "hw:0,0"
    }
    capture.pcm {
      type plug
      slave.pcm "hw:1,0"
    }
  }

Enter the official sample directory snowboy/examples/Python3 and run the following command:
$ python3 demo.py resources/models/snowboy.umdl
(the snowboy.umdl file in the command is the speech recognition model)

Then say "snowboy" clearly into the microphone; if you hear a "ding" sound, the installation and configuration were successful.

PS: The official Python3 sample code has an error when tested; to fix it, modify the snowboydecoder.py file in the snowboy/examples/Python3 directory.
Change line 5 from from . import snowboydetect to import snowboydetect and it will run directly.

Quick Start

There is a more detailed demo on GitHub, and you are strongly encouraged to read it. Create a HotwordDetect class that wraps parameters such as the wake-up model, audio gain, and sensitivity, then initialize the Detector object; the Detector class ships with the downloaded Snowboy source. The trained model can be a single file or a list of models.

import snowboydetect

class HotwordDetect(object):
    def __init__(self, decoder_model,
                 resource,
                 sensitivity=0.38,
                 audio_gain=1):
        """Initialize the underlying SnowboyDetect detector."""
        self.detector = snowboydetect.SnowboyDetect(
            resource_filename=resource.encode(),
            model_str=decoder_model.encode())
        self.detector.SetAudioGain(audio_gain)

After initialization you can create start-up and listening methods. You will generally specify a wake-up callback, which produces the "ding" that may follow "Hi, Siri"; you can also specify a recording callback, i.e. what to do with the audio captured after the device wakes up:

class HotwordDetect(object):
    ...
    def listen(self, detected_callback,
               interrupt_check=lambda: False,
               audio_recorder_callback=None):
        """begin to listen"""
        ...
        state = "PASSIVE"
        while True:
            status = self.detector.RunDetection(data)
            ...
            if state == "PASSIVE":
                detected_callback()
                state = "ACTIVE"
                continue
            elif state == "ACTIVE":
                audio_recorder_callback()
                state = "PASSIVE"
                continue

You can define the logic here yourself; it mainly switches between two states. When the device hears a wake word, status holds the index identifying which wake word fired: for example, if you define two wake words, "Siri" and "Xiaowei", a status of 1 means Siri woke the device and a status of 2 means Xiaowei did. The state then changes to ACTIVE, the audio_recorder_callback method runs, and after it finishes the state switches back to the waiting (PASSIVE) state.
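The status-based dispatch described above can be sketched in plain Python. The return-value convention follows the Snowboy demo code (a positive status is the 1-based index of the hotword that fired); the function and variable names here are illustrative, not part of the library:

```python
# Sketch of dispatching on the status value returned by RunDetection:
# a positive status is the 1-based index of the hotword that fired,
# 0 means sound without a hotword, negative values mean silence/error.
def dispatch(status, hotword_callbacks):
    """Invoke the callback for the hotword that fired, if any.

    hotword_callbacks: list of zero-argument callables, one per model,
    in the same order the models were passed to the detector.
    """
    if status > 0:
        hotword_callbacks[status - 1]()  # e.g. 1 -> "Siri", 2 -> "Xiaowei"
        return "ACTIVE"
    return "PASSIVE"
```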

Online Speech Recognition

When the device has been woken up, you can do whatever you want with the captured audio, including sending it to Baidu or another speech recognition service. All of that logic lives in the audio_recorder_callback callback. Note that Snowboy currently only supports an audio sampling rate of 16000 Hz; recordings at other sampling rates cannot be used directly. You can solve this in two ways:

  • Use a sound card that supports a 16000 Hz sampling rate
  • Convert the sampling rate of the recorded data
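The second option can be illustrated with a naive sketch: a 48 kHz mono 16-bit stream is reduced to 16 kHz by keeping every third sample (48000 / 16000 = 3). A real project should use a proper resampler such as sox or ffmpeg to avoid aliasing; the function name below is illustrative only.

```python
import array

def downsample_48k_to_16k(raw_bytes):
    """Decimate 16-bit mono PCM from 48000 Hz to 16000 Hz by keeping
    every third sample (no anti-aliasing filter is applied)."""
    samples = array.array('h')   # signed 16-bit samples, native byte order
    samples.frombytes(raw_bytes)
    return array.array('h', samples[::3]).tobytes()
```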

Of the two big sound-chip companies, C-Media and Realtek, most products only support 48 kHz; chips that support 16 kHz are generally more expensive, possibly around 60 yuan. UGREEN (绿联) has two products that support it; check the product parameters before buying, and verify against the chip maker's model list that 16 kHz sampling is supported.

Training an acoustic model

The official site offers two ways to create a personalized acoustic model:

  • Website. As long as you have a GitHub, Google, or Facebook account, you can log in, record, and complete the training.
  • API-Train. Pass the parameters specified in the documentation to complete the training; the API returns the trained model data to you.

Both methods yield a private acoustic model in the .pmdl file format. A generalized universal model is not provided for free; you need to contact the team about commercial cooperation. Once you have a model, the more people test it the higher its accuracy becomes, so to improve accuracy you can invite more people to test your model. The type of microphone also affects accuracy: training the model on the device it will run on improves accuracy. Speech recognition is a fairly delicate technology with many details to watch, as ChenGuo said:

Speech Recognition is not that easy.
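As a hedged sketch of the API-Train route: the request described in the official docs POSTs JSON containing base64-encoded recordings of the hotword, and the response body is the trained .pmdl model. The field names below follow http://docs.kitt.ai/snowboy/#api-v1-train as best as can be reconstructed, the token and file paths are placeholders, and the hosted service may no longer be reachable:

```python
import base64
import json
import urllib.request

def build_train_payload(wav_paths, token, hotword_name="my_hotword"):
    """Assemble the JSON body for the API-Train endpoint (field names
    per the official docs; all values here are placeholders)."""
    samples = []
    for path in wav_paths:
        with open(path, "rb") as f:
            samples.append({"wave": base64.b64encode(f.read()).decode()})
    return {
        "name": hotword_name,
        "language": "zh",
        "token": token,               # your kitt.ai API token (placeholder)
        "voice_samples": samples,     # the docs ask for three recordings
    }

def request_model(payload, out_path="model.pmdl"):
    """POST the payload; on success the response body is the .pmdl model."""
    req = urllib.request.Request(
        "https://snowboy.kitt.ai/api/v1/train/",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```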

Use in your own projects

Copy the following files to your project directory:

  • The downloaded model file model.pmdl
  • The compiled _snowboydetect.so library from the snowboy/swig/Python3 directory
  • The demo.py, snowboydecoder.py, and snowboydetect.py files and the resources directory from snowboy/examples/Python3
  • Run $ python3 demo.py model.pmdl in the project directory and test with your own wake word

The example below uses speech recognition on an Orange Pi as a voice-controlled light switch; it requires network access.

gpio.py

#!/usr/bin/env python
# encoding: utf-8
#
# GPIO control for the Orange Pi (orangepi); see earlier posts for details.
#

"""
@version: ??
@author: lvusyy
@license: Apache Licence 
@contact: [email protected]
@site: https://github.com/lvusyy/
@software: PyCharm
@file: gpio.py
@time: 2018/3/13 18:45
"""
import wiringpi as wp


class GPIO():

    def __init__(self):
        self.wp = wp
        wp.wiringPiSetupGpio()
        # wp.pinMode(18, 1)
        # wp.pinMode(23, 0)

    def setPinMode(self, pin, mode):
        self.wp.pinMode(pin, mode)

    def setV(self, pin, v):
        self.wp.digitalWrite(pin, v)

    def getV(self, pin):
        return self.wp.digitalRead(pin)

Next is control.py, shown before modification:

#!/usr/bin/env python
# encoding: utf-8
#
# After hotword wake-up, use the Baidu speech recognition API to recognize voice
# commands, then match them to operations such as turning the light on or off.
### Using multiple Snowboy hotwords for wake-up would work better and needs no network; to be tested when there is time.

"""
@version: ??
@author: lvusyy
@license: Apache Licence 
@contact: [email protected]
@site: https://github.com/lvusyy/
@software: PyCharm
@file: control.py
@time: 2018/3/13 17:30
"""
import os
import sys

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import time
import pyaudio
import wave
import pygame
import snowboydecoder
import signal
from gpio import GPIO
from aip import AipSpeech

APP_ID = '109472xxx'
API_KEY = 'd3zd5wuaMrL21IusNqdQxxxx'
SECRET_KEY = '84e98541331eb1736ad80457b4faxxxx'

APIClient = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

interrupted = False

# Parameters for the captured audio file
CHUNK = 1024
FORMAT = pyaudio.paInt16 # 16-bit sampling
CHANNELS = 1             # mono
RATE = 16000             # sampling rate
RECORD_SECONDS = 5       # recording length: 5 seconds
WAVE_OUTPUT_FILENAME = "./myvoice.pcm"  # path where the captured audio is stored


class Light():

    def __init__(self):
        self.pin = 18
        self.mode = 1  # open is 1, close is 0
        self.mgpio = GPIO()
        self.mgpio.setPinMode(pin=self.pin, mode=1)  # OUTPUT 1, INPUT 0

    def on(self):
        """Drive the pin high to turn the light on."""
        self.mgpio.setV(self.pin, self.mode)

    def off(self):
        """Drive the pin low to turn the light off."""
        self.mgpio.setV(self.pin, 0)

    def status(self):
        # 0 is off, 1 is on
        return self.mgpio.getV(self.pin)



def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()


def word_to_voice(text):
    result = APIClient.synthesis(text, 'zh', 1, {
        'vol': 5, 'spd': 3, 'per': 3})
    if not isinstance(result, dict):  # on error the API returns a dict
        with open('./audio.mp3', 'wb') as f:
            f.write(result)
    time.sleep(.2)
    pygame.mixer.music.load('./audio.mp3')  # the audio file synthesized from the text
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy() == True:
        print('waiting')


def get_mic_voice_file(p):
    word_to_voice('请说开灯或关灯.')  # "Please say 'turn on the light' or 'turn off the light'."
 
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    print("* recording")
 
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("* done recording")
    stream.stop_stream()
    stream.close()
    #p.terminate()  # do not call p.terminate() here, otherwise p = pyaudio.PyAudio() becomes invalid and must be re-initialized
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
    print('recording finished')



def baidu_get_words(client):
    results = client.asr(get_file_content(WAVE_OUTPUT_FILENAME), 'pcm', 16000, { 'dev_pid': 1536, })
    # print(results['result'])
    words=results['result'][0]
    return words
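Note that baidu_get_words indexes results['result'] directly, which raises a KeyError when recognition fails. A defensive variant (a sketch, based on the documented Baidu response shape where err_no is 0 on success and err_msg describes failures) could parse the response like this:

```python
def parse_asr_response(results):
    """Return the top transcript from a Baidu ASR response dict,
    raising a descriptive error when the service reports failure."""
    if results.get('err_no', 0) != 0:
        raise RuntimeError('ASR failed: %s' % results.get('err_msg', 'unknown'))
    return results['result'][0]
```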

def signal_handler(signal, frame):
    global interrupted
    interrupted = True


def interrupt_callback():
    global interrupted
    return interrupted

#  Callback function; speech recognition is implemented here
def callbacks():
    global detector

    # After wake-up, play a "ding" prompt
    # snowboydecoder.play_audio_file()
    pygame.mixer.music.load('./resources/ding.wav')  # prompt sound
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy() == True:
        print('waiting')
    #snowboydecoder.play_audio_file()

    #  Shut down snowboy detection
    detector.terminate()
    #  Start speech recognition
    get_mic_voice_file(p)
    rText=baidu_get_words(client=APIClient)

    if rText.find("开灯")!=-1:
        light.on()
    elif rText.find("关灯")!=-1:
        light.off()

    # Re-enable snowboy detection
    wake_up()    # wake_up -> monitor -> wake_up, called recursively

# Hotword wake-up
def wake_up():

    global detector
    model = './resources/models/snowboy.umdl'  #  the wake word is SnowBoy
    # capture SIGINT signal, e.g., Ctrl+C
    signal.signal(signal.SIGINT, signal_handler)

    # Hotword detector; adjust the sensitivity parameter to tune detection accuracy
    detector = snowboydecoder.HotwordDetector(model, sensitivity=0.5)
    print('Listening... please say wake-up word:SnowBoy')
    # main loop
    # default callback: detected_callback=snowboydecoder.play_audio_file
    # modify the callback to implement the behaviour we want
    detector.start(detected_callback=callbacks,      # custom callback
                   interrupt_check=interrupt_callback,
                   sleep_time=0.03)
    # Release resources
    detector.terminate()

if __name__ == '__main__':
    # Initialize pygame so the synthesized audio files can be played later
    pygame.mixer.init()
    p = pyaudio.PyAudio()
    light=Light()
    wake_up()

Related reference documentation:

http://docs.kitt.ai/snowboy/#api-v1-train

https://github.com/Kitt-AI/snowboy

https://looker53.github.io/2018/03/29/20180329-%E8%AF%AD%E9%9F%B3%E8%AF%86%E5%88%AB%E4%B9%8BSnowboy%E7%83%AD%E8%AF%8D%E5%94%A4%E9%86%92/

Origin www.cnblogs.com/lovesKey/p/11080448.html