How to use Vosk Android

Vosk is an open source speech recognition framework based on Kaldi. It supports multiple programming languages and platforms, is easy to use and integrate, and is a good choice for speech recognition. The steps to use it on Android are as follows:

  1. Download the Vosk source code (the vosk-api repository, https://github.com/alphacep/vosk-api).

  1. Use the source code to build the .so library. If you do not know how to compile it, you can download the prebuilt AAR package from the libvosk.so link instead, change the file extension from .aar to .zip, unzip it, and take the .so libraries from the jni directory.

  1. Delete the existing files in the vosk-api-0.3.45\android\lib\src\main\jniLibs directory of the source code and put the compiled .so libraries there.

  1. Download the official model file vosk-model-small-cn-0.22 (if you can train models with Kaldi, you can also generate your own model files). Then create a new assets directory under vosk-api-0.3.45\android\lib\src\main and put the model folder in it, as sketched below.
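
The folder name under assets must match the name used later in the genUUID task and in StorageService.unpack() (vosk-model-small-cn in this article). Assuming the downloaded model folder is renamed accordingly, the layout looks roughly like this (the files inside the model folder vary by model version):

vosk-api-0.3.45\android\lib\src\main\assets\
    vosk-model-small-cn\
        am\
        conf\
        graph\
        ivector\
        README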

  1. Create a new Android project and use Import Module to bring vosk-api-0.3.45\android\lib into the project.

  1. Modify the module build.gradle file and add the following code:

apply plugin: 'com.android.library'
apply plugin: 'maven-publish'
...
...
dependencies {
    api 'net.java.dev.jna:jna:4.4.0@aar'
}

//preBuild.dependsOn buildVosk

publishing {
    publications {
        aar(MavenPublication) {
            artifactId = archiveName
            artifact("$buildDir/outputs/aar/$archiveName-release.aar")
            pom {
                name = pomName
                description = pomDescription
            }
            //generate pom nodes for dependencies
            pom.withXml {
                def dependenciesNode = asNode().appendNode('dependencies')
                configurations.implementation.allDependencies.each { dependency ->
                    if (dependency.name != 'unspecified') {
                        def dependencyNode = dependenciesNode.appendNode('dependency')
                        dependencyNode.appendNode('groupId', dependency.group)
                        dependencyNode.appendNode('artifactId', dependency.name)
                        dependencyNode.appendNode('version', dependency.version)
                    }
                }
            }
        }
    }
}

tasks.register('genUUID') {
    def uuid = UUID.randomUUID().toString()
    def odir = file("$buildDir/generated/assets/vosk-model-small-cn")
    def ofile = file("$odir/uuid")
    doLast {
        mkdir odir
        ofile.text = uuid
    }
}

preBuild.dependsOn(genUUID)

The directory name in def odir = file("$buildDir/generated/assets/vosk-model-small-cn") must match your own model name. This generated uuid file is what StorageService uses to decide whether the model shipped in assets has changed and needs to be unpacked again.

  1. Create a new VoskClient.java in the module; it provides the external interfaces for model initialization and speech recognition. The model initialization method is as follows:

public void initModel() {
    // Copy the model from assets/vosk-model-small-cn into app storage (if needed)
    // and load it asynchronously.
    StorageService.unpack(context.getApplicationContext(), "vosk-model-small-cn", "model",
            (model) -> {
                this.model = model;
            },
            (exception) -> Log.e("Voice Recognition", "initModel: unpack error", exception));
}
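
The snippets in this step refer to a few fields (context, model, speechService, speechStreamService) that are not declared in them. A minimal sketch of how VoskClient could declare them (the exact class layout is an assumption, not taken from the original article):

import android.content.Context;
import android.util.Log;

import org.vosk.Model;
import org.vosk.android.SpeechService;
import org.vosk.android.SpeechStreamService;

public class VoskClient {
    private final Context context;                    // passed in by the caller
    private Model model;                              // loaded by initModel()
    private SpeechService speechService;              // microphone recognition session
    private SpeechStreamService speechStreamService;  // file recognition session

    public VoskClient(Context context) {
        this.context = context;
    }

    // initModel(), recognizeMicrophone() and recognizeFile() from this article go here.
    // The class should also implement org.vosk.android.RecognitionListener (see below)
    // so that it can be passed to startListening()/start().
}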

The speech recognition methods are as follows:

// Recognize speech from the microphone; the RECORD_AUDIO permission must be requested at runtime
private void recognizeMicrophone() {
        if (speechService != null) {
            speechService.stop();
            speechService = null;
        } else {
            try {
                Recognizer rec = new Recognizer(model, 16000.0f);
                speechService = new SpeechService(rec, 16000.0f);
                speechService.startListening(this);
            } catch (IOException e) {
                Log.e("Voice Recognition", "recognizeMicrophone: failed to start", e);
            }
        }
    }
// Recognize speech from a file; the READ_EXTERNAL_STORAGE permission must be requested at runtime, and the file must be a WAV file with PCM encoding
private void recognizeFile() {
        if (speechStreamService != null) {
            speechStreamService.stop();
            speechStreamService = null;
        } else {
            try {
                // 16000.f is the sample rate; set it to the sample rate of your own file.
                // The optional third argument is a JSON grammar that restricts recognition
                // to the listed phrases; omit it to use the model's full vocabulary.
                Recognizer rec = new Recognizer(model, 16000.f, "[\"one zero zero zero one\", " +
                        "\"oh zero one two three four five six seven eight nine\", \"[unk]\"]");

                InputStream ais = context.getAssets().open("yourfile.wav");
                // Skip the 44-byte WAV header so only raw PCM data is fed to the recognizer.
                if (ais.skip(44) != 44) throw new IOException("File too short");

                speechStreamService = new SpeechStreamService(rec, ais, 16000);
                speechStreamService.start(this);
            } catch (IOException e) {
                Log.e("Voice Recognition", "recognizeFile: failed to start", e);
            }
        }
    }
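
Both recognition methods pass this as the listener, so the enclosing class must implement org.vosk.android.RecognitionListener. For reference, that interface declares the following callbacks (each hypothesis is a JSON string); implement them in VoskClient to receive the results:

public interface RecognitionListener {
    void onPartialResult(String hypothesis); // intermediate hypothesis while the user is speaking
    void onResult(String hypothesis);        // result of a completed utterance
    void onFinalResult(String hypothesis);   // final result when the stream ends or recognition stops
    void onError(Exception exception);       // recognition failed
    void onTimeout();                        // recognizer timed out
}
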
  1. Package the module into an AAR, and you can then use it in any of your own projects.

Note: if you use the packaged AAR, you also need to add the JNA dependency to the app's build.gradle file:

dependencies {
    api 'net.java.dev.jna:jna:4.4.0@aar'
    ......
}
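
For illustration, this is roughly how an app that depends on the AAR could drive the module. VoskClient is the class created above; the activity name, the request code, and exposing recognizeMicrophone() as public are assumptions for this example, not part of the original article:

import android.Manifest;
import android.content.pm.PackageManager;
import android.os.Bundle;

import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

public class MainActivity extends AppCompatActivity {

    private static final int PERMISSION_REQUEST_RECORD_AUDIO = 1;
    private VoskClient voskClient;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        voskClient = new VoskClient(this);
        voskClient.initModel();   // unpack the model from assets and load it

        // RECORD_AUDIO must be granted at runtime before microphone recognition.
        if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
                != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(this,
                    new String[]{Manifest.permission.RECORD_AUDIO},
                    PERMISSION_REQUEST_RECORD_AUDIO);
        } else {
            voskClient.recognizeMicrophone();   // start (or stop) microphone recognition
        }
    }
}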

That is all for this article. The next article analyzes the source code of the Vosk Android Java layer.

Origin blog.csdn.net/laogan6/article/details/129393772