Siri voice recognition: the Speech framework

At WWDC 2016, Apple introduced a very useful speech recognition API: the Speech framework. The Speech framework lets you quickly integrate voice input into your app.
WWDC 2016 Speech framework video

The basic integration process is described below.
1 Request app authorization
- microphone usage: NSMicrophoneUsageDescription
- speech recognition: NSSpeechRecognitionUsageDescription
You can set the authorization prompt text for these two keys through the target's Info settings in Xcode, or add them directly to the Info.plist file:

<key>NSMicrophoneUsageDescription</key>
<string>Your microphone will be used to record your speech when you press the &quot;Start Recording&quot; button.</string>

<key>NSSpeechRecognitionUsageDescription</key>
<string>Speech recognition will be used to determine which words you speak into this device&apos;s microphone.</string>

2 Implement speech recognition
First, import the required frameworks in the Swift file:

import Foundation
import UIKit
import Speech
import AudioToolbox
import AVFoundation

Then declare the properties used for speech recognition:

// MARK: Properties

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh_CN"))!

    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?

    private var recognitionTask: SFSpeechRecognitionTask?

    private var audioEngine = AVAudioEngine()

    private var result = ""

    public var delegate: SpeechDelegate?  // matches the SpeechDelegate protocol declared below

    private var timer : Timer?
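
Note that SFSpeechRecognizer(locale:) is a failable initializer: it returns nil when the locale is not supported, so the force unwrap above will crash in that case. A more defensive setup (a sketch, not from the original post) might be:

    // Keep the recognizer optional instead of force-unwrapping,
    // and check availability before starting a task.
    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "zh_CN"))

    private var canRecognize: Bool {
        guard let recognizer = speechRecognizer else { return false } // locale unsupported
        return recognizer.isAvailable // false when offline or throttled
    }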

Start speech recognition:

    public func startRecording() throws {

        self.checkSpeech()

        // Cancel the previous task if it's running.
        if let recognitionTask = recognitionTask {
            recognitionTask.cancel()
            self.recognitionTask = nil
        }

        // Configure the shared audio session for recording.
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(AVAudioSessionCategoryRecord)
        try audioSession.setMode(AVAudioSessionModeMeasurement)
        try audioSession.setActive(true, with: .notifyOthersOnDeactivation)

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

        guard let inputNode = audioEngine.inputNode else {
            print("Audio engine has no input node")
            return
        }

        guard let recognitionRequest = recognitionRequest else {
            print("Unable to created a SFSpeechAudioBufferRecognitionRequest object")
            return
        }

        if  inputNode.numberOfInputs > 0 {

            // Configure request so that results are returned before audio recording is finished
            recognitionRequest.shouldReportPartialResults = true

            // A recognition task represents a speech recognition session.
            // We keep a reference to the task so that it can be cancelled.
            recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
                var isFinal = false

                if let result = result {
                    if !result.isFinal &&  result.bestTranscription.formattedString != "" {
                        // Silence timeout: if no new voice input arrives within
                        // 2 seconds, end the audio session automatically.
                        self.timer?.invalidate()
                        self.timer = Timer.scheduledTimer(withTimeInterval: 2.0, repeats: false, block: { (timer) in
                            self.audioEngine.stop()
                            self.recognitionRequest?.endAudio()
                            self.audioEngine.inputNode?.removeTap(onBus: 0)
                        })
                    } else {
                        self.timer?.invalidate()
                        self.timer = nil
                    }

                    isFinal = result.isFinal

                    self.delegate?.voiceChanged(result: result.bestTranscription.formattedString)
                    self.result = result.bestTranscription.formattedString
                    print("---isFinal", isFinal, result.bestTranscription.formattedString, self.result == result.bestTranscription.formattedString)

                    if isFinal {
                        self.delegate?.didStopRecording(result: result.bestTranscription.formattedString)
                    }
                }

                if error != nil || isFinal {
                    self.audioEngine.stop()
                    inputNode.removeTap(onBus: 0)

                    self.recognitionRequest = nil
                    self.recognitionTask = nil

                    self.timer?.invalidate()
                    self.timer = nil
                    print("---audioEngine stoped", isFinal)
                    self.delegate?.speechTaskError()
                }
            }

            let recordingFormat = inputNode.outputFormat(forBus: 0)

            inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
                self.recognitionRequest?.append(buffer)
            }
            audioEngine.prepare()

            try audioEngine.start()

            self.result = ""
        }
    }
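
startRecording() begins by calling a checkSpeech() helper that the original post does not include. A minimal sketch of such a helper (an assumption, not the author's implementation), using SFSpeechRecognizer.requestAuthorization, could look like this:

    private func checkSpeech() {
        // Prompts the user for speech recognition authorization the first time;
        // afterwards it simply returns the stored status.
        SFSpeechRecognizer.requestAuthorization { authStatus in
            // The callback may arrive on a background queue.
            OperationQueue.main.addOperation {
                switch authStatus {
                case .authorized:
                    break // ready to run a recognition task
                case .denied, .restricted, .notDetermined:
                    self.delegate?.authorizeDenied()
                }
            }
        }
    }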

The methods for starting and stopping recording:

// MARK: Start record

public func record() {
    do {
        try startRecording()
    } catch {
        print("---- Speech failed to start recording:", error)
        delegate?.speechTaskError()
        return
    }
    if audioEngine.isRunning {
        print("---- Speech start recording")
    }
}

// MARK:  Stop record

public func stop() {
    if audioEngine.isRunning {
        audioEngine.stop()
        recognitionRequest?.endAudio()
        audioEngine.inputNode?.removeTap(onBus: 0)

        audioEngine.reset()

        self.timer?.invalidate()
        self.timer = nil
        print("---- Speech end recording")
    }
}

The delegate protocol that reports recognition results and errors:

public protocol SpeechDelegate {
    func didStopRecording(result: String)
    func voiceChanged(result: String)
    func authorizeDenied()
    func speechTaskError()
}
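
For illustration, a hypothetical caller could wire everything up as follows (the wrapper class name SpeechManager and the action names are assumptions, not from the original post):

import UIKit
import Speech

class RecordViewController: UIViewController, SpeechDelegate {

    // Assumed name for the class that contains the recording code above.
    private let speech = SpeechManager()

    override func viewDidLoad() {
        super.viewDidLoad()
        speech.delegate = self
    }

    @IBAction func startTapped(_ sender: UIButton) {
        speech.record()
    }

    @IBAction func stopTapped(_ sender: UIButton) {
        speech.stop()
    }

    // MARK: SpeechDelegate
    func voiceChanged(result: String) {
        print("partial result:", result) // live transcription updates
    }

    func didStopRecording(result: String) {
        print("final result:", result)
    }

    func authorizeDenied() {
        print("speech recognition authorization was denied")
    }

    func speechTaskError() {
        print("the recognition task ended with an error")
    }
}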

Finally, a few limitations are worth noting:
- Apple imposes per-device limits on recognition requests. The details are not published, but you can try contacting Apple for more information.
- Apple also imposes per-app recognition limits.
- If you keep running into these restrictions, contact Apple; they may be able to resolve the issue.
- Speech recognition consumes significant power and network traffic.
- A single recognition session lasts only about one minute.

In practice, the accuracy of Siri Speech for Chinese recognition is quite poor; non-common vocabulary is rarely recognized correctly. Its stability is also low: recognition quality differs across device models, and under identical conditions the same speech is sometimes recognized and sometimes not. Siri Speech also depends heavily on the network, and because of server-side issues the returned results often never arrive.
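
Because availability depends on the network and Apple's servers, it can help to observe it through SFSpeechRecognizerDelegate (a sketch, assuming the wrapper class above inherits from NSObject and sets speechRecognizer.delegate = self):

extension SpeechManager: SFSpeechRecognizerDelegate {
    // Called when server-side recognition becomes available or unavailable.
    public func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                                 availabilityDidChange available: Bool) {
        print("speech recognizer available:", available)
    }
}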

Origin: blog.csdn.net/sinat_15735647/article/details/78227093