How to use Vosk Android

Vosk is an open source speech recognition framework based on Kaldi. It supports multiple programming languages and platforms, is easy to use and integrate, and is a good choice for speech recognition. The steps to use it on Android are as follows:

  1. Download the Vosk source code (the vosk-api repository, https://github.com/alphacep/vosk-api).

  1. Use the source code to build the .so library. If you do not know how to compile it, you can download the prebuilt AAR package from the libvosk.so link instead, change the file extension from .aar to .zip, unzip it, and take the .so libraries from the jni directory.

  1. Delete the existing files in the vosk-api-0.3.45\android\lib\src\main\jniLibs directory of the source code and put the compiled .so libraries there.

  1. Download the official model file vosk-model-small-cn-0.22 (if you can train models with Kaldi, you can also generate your own model files). Then create a new assets directory under vosk-api-0.3.45\android\lib\src\main and put the model folder in it, as sketched below.
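
The folder name under assets must match the name used later in the genUUID task and in StorageService.unpack() (vosk-model-small-cn in this article). Assuming the downloaded model folder is renamed accordingly, the layout looks roughly like this (the files inside the model folder vary by model version):

vosk-api-0.3.45\android\lib\src\main\assets\
    vosk-model-small-cn\
        am\
        conf\
        graph\
        ivector\
        README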

  1. Create a new Android project and use Import Module to bring vosk-api-0.3.45\android\lib into the project.

  1. Modify the module build.gradle file and add the following code:

apply plugin: 'com.android.library'
apply plugin: 'maven-publish'
...
...
dependencies {
    api 'net.java.dev.jna:jna:4.4.0@aar'
}

//preBuild.dependsOn buildVosk

publishing {
    publications {
        aar(MavenPublication) {
            artifactId = archiveName
            artifact("$buildDir/outputs/aar/$archiveName-release.aar")
            pom {
                name = pomName
                description = pomDescription
            }
            //generate pom nodes for dependencies
            pom.withXml {
                def dependenciesNode = asNode().appendNode('dependencies')
                configurations.implementation.allDependencies.each { dependency ->
                    if (dependency.name != 'unspecified') {
                        def dependencyNode = dependenciesNode.appendNode('dependency')
                        dependencyNode.appendNode('groupId', dependency.group)
                        dependencyNode.appendNode('artifactId', dependency.name)
                        dependencyNode.appendNode('version', dependency.version)
                    }
                }
            }
        }
    }
}

tasks.register('genUUID') {
    def uuid = UUID.randomUUID().toString()
    def odir = file("$buildDir/generated/assets/vosk-model-small-cn")
    def ofile = file("$odir/uuid")
    doLast {
        mkdir odir
        ofile.text = uuid
    }
}

preBuild.dependsOn(genUUID)

The directory name in def odir = file("$buildDir/generated/assets/vosk-model-small-cn") must match your own model name. This generated uuid file is what StorageService uses to decide whether the model shipped in assets has changed and needs to be unpacked again.

  1. Create a new VoskClient.java in the module; it provides the external interfaces for model initialization and speech recognition. The model initialization method is as follows:

public void initModel() {
    // Copy the model from assets/vosk-model-small-cn into app storage (if needed)
    // and load it asynchronously.
    StorageService.unpack(context.getApplicationContext(), "vosk-model-small-cn", "model",
            (model) -> {
                this.model = model;
            },
            (exception) -> Log.e("Voice Recognition", "initModel: unpack error", exception));
}
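
The snippets in this step refer to a few fields (context, model, speechService, speechStreamService) that are not declared in them. A minimal sketch of how VoskClient could declare them (the exact class layout is an assumption, not taken from the original article):

import android.content.Context;
import android.util.Log;

import org.vosk.Model;
import org.vosk.android.SpeechService;
import org.vosk.android.SpeechStreamService;

public class VoskClient {
    private final Context context;                    // passed in by the caller
    private Model model;                              // loaded by initModel()
    private SpeechService speechService;              // microphone recognition session
    private SpeechStreamService speechStreamService;  // file recognition session

    public VoskClient(Context context) {
        this.context = context;
    }

    // initModel(), recognizeMicrophone() and recognizeFile() from this article go here.
    // The class should also implement org.vosk.android.RecognitionListener (see below)
    // so that it can be passed to startListening()/start().
}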

The speech recognition methods are as follows:

// Recognize speech from the microphone; the RECORD_AUDIO permission must be requested at runtime
private void recognizeMicrophone() {
        if (speechService != null) {
            speechService.stop();
            speechService = null;
        } else {
            try {
                Recognizer rec = new Recognizer(model, 16000.0f);
                speechService = new SpeechService(rec, 16000.0f);
                speechService.startListening(this);
            } catch (IOException e) {
                Log.e("Voice Recognition", "recognizeMicrophone: failed to start", e);
            }
        }
    }
// Recognize speech from a file; the READ_EXTERNAL_STORAGE permission must be requested at runtime, and the file must be a WAV file with PCM encoding
private void recognizeFile() {
        if (speechStreamService != null) {
            speechStreamService.stop();
            speechStreamService = null;
        } else {
            try {
                // 16000.f is the sample rate; set it to the sample rate of your own file.
                // The optional third argument is a JSON grammar that restricts recognition
                // to the listed phrases; omit it to use the model's full vocabulary.
                Recognizer rec = new Recognizer(model, 16000.f, "[\"one zero zero zero one\", " +
                        "\"oh zero one two three four five six seven eight nine\", \"[unk]\"]");

                InputStream ais = context.getAssets().open("yourfile.wav");
                // Skip the 44-byte WAV header so only raw PCM data is fed to the recognizer.
                if (ais.skip(44) != 44) throw new IOException("File too short");

                speechStreamService = new SpeechStreamService(rec, ais, 16000);
                speechStreamService.start(this);
            } catch (IOException e) {
                Log.e("Voice Recognition", "recognizeFile: failed to start", e);
            }
        }
    }
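
Both recognition methods pass this as the listener, so the enclosing class must implement org.vosk.android.RecognitionListener. For reference, that interface declares the following callbacks (each hypothesis is a JSON string); implement them in VoskClient to receive the results:

public interface RecognitionListener {
    void onPartialResult(String hypothesis); // intermediate hypothesis while the user is speaking
    void onResult(String hypothesis);        // result of a completed utterance
    void onFinalResult(String hypothesis);   // final result when the stream ends or recognition stops
    void onError(Exception exception);       // recognition failed
    void onTimeout();                        // recognizer timed out
}
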
  1. Package the module into an AAR, and you can then use it in any of your own projects.

Note: if you use the packaged AAR, you also need to add the JNA dependency to the app's build.gradle file:

dependencies {
    api 'net.java.dev.jna:jna:4.4.0@aar'
    ......
}
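
For illustration, this is roughly how an app that depends on the AAR could drive the module. VoskClient is the class created above; the activity name, the request code, and exposing recognizeMicrophone() as public are assumptions for this example, not part of the original article:

import android.Manifest;
import android.content.pm.PackageManager;
import android.os.Bundle;

import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

public class MainActivity extends AppCompatActivity {

    private static final int PERMISSION_REQUEST_RECORD_AUDIO = 1;
    private VoskClient voskClient;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        voskClient = new VoskClient(this);
        voskClient.initModel();   // unpack the model from assets and load it

        // RECORD_AUDIO must be granted at runtime before microphone recognition.
        if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
                != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(this,
                    new String[]{Manifest.permission.RECORD_AUDIO},
                    PERMISSION_REQUEST_RECORD_AUDIO);
        } else {
            voskClient.recognizeMicrophone();   // start (or stop) microphone recognition
        }
    }
}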

That is all for this article. The next article analyzes the source code of the Vosk Android Java layer.

Origin blog.csdn.net/laogan6/article/details/129393772