Android TTS: TextToSpeech source code analysis and building a custom TTS engine

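TextToSpeech is the text-to-speech service interface that Android provides natively. The stock TTS engine app detects the system language and lets the user download the voice data for that language, which is how it gains the ability to speak text in that language. All of this, however, depends on the Google services environment. On Android devices used in mainland China, Google services are disabled, and what is needed most there is Chinese speech output. So how can that be achieved?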

TextToSpeech source code analysis

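For how to browse the system source code, see my earlier article, "How to browse the Android system source code": https://blog.csdn.net/caizehui/article/details/103823057
First, I like to read the class comment. It says that TextToSpeech can synthesize text into speech for immediate playback or to create an audio file, and that it can only be used once initialization has completed. The callback that reports initialization is TextToSpeech.OnInitListener. When you are done with a TextToSpeech instance, remember to call shutdown() to release the native resources used by the engine.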

/**
 *
 * Synthesizes speech from text for immediate playback or to create a sound file.
 * <p>A TextToSpeech instance can only be used to synthesize text once it has completed its
 * initialization. Implement the {@link TextToSpeech.OnInitListener} to be
 * notified of the completion of the initialization.<br>
 * When you are done using the TextToSpeech instance, call the {@link #shutdown()} method
 * to release the native resources used by the TextToSpeech engine.
 */
public class TextToSpeech {

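Next, look at the initialization callback interface. When the status argument of onInit is SUCCESS, initialization succeeded. Everything else, such as setting parameters or calling the playback methods, has to happen after that, otherwise the calls have no effect.
Google's commenting style is worth learning from here: the possible values of the parameter are listed, which makes things very clear.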

 /**
     * Interface definition of a callback to be invoked indicating the completion of the
     * TextToSpeech engine initialization.
     */
    public interface OnInitListener {
        /**
         * Called to signal the completion of the TextToSpeech engine initialization.
         *
         * @param status {@link TextToSpeech#SUCCESS} or {@link TextToSpeech#ERROR}.
         */
        void onInit(int status);
    }
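
Because calls made before onInit reports SUCCESS have no effect, one option is to buffer requests until initialization completes. The following is a minimal plain-Java sketch of that idea, not code from the framework. PendingSpeechQueue and Speaker are hypothetical names introduced for illustration, the local SUCCESS and ERROR constants mirror the values of TextToSpeech.SUCCESS (0) and TextToSpeech.ERROR (-1), and no Android classes are used so that the logic can run on a desktop JVM.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical helper: holds text until the engine reports a successful init. */
final class PendingSpeechQueue {
    /** Mirror TextToSpeech.SUCCESS (0) and TextToSpeech.ERROR (-1). */
    static final int SUCCESS = 0;
    static final int ERROR = -1;

    /** Stand-in for the real TextToSpeech.speak(...) call. */
    interface Speaker {
        void speak(String text);
    }

    private final Speaker speaker;
    private final Queue<String> pending = new ArrayDeque<>();
    private boolean ready = false;

    PendingSpeechQueue(Speaker speaker) {
        this.speaker = speaker;
    }

    /** Speaks immediately once initialized; otherwise buffers the text. */
    synchronized void speak(String text) {
        if (ready) {
            speaker.speak(text);
        } else {
            pending.add(text);
        }
    }

    /** Intended to be called from OnInitListener.onInit(status). */
    synchronized void onInit(int status) {
        if (status != SUCCESS) {
            pending.clear(); // Initialization failed: drop the buffered text.
            return;
        }
        ready = true;
        while (!pending.isEmpty()) {
            speaker.speak(pending.poll());
        }
    }
}
```

In an Activity, the Speaker would wrap the real speak call and onInit would simply forward its status to this helper.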

Before going further, here is a fragment of a demo program that uses TextToSpeech.

TextToSpeech usage example

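    // ... (omitted) ...
        textToSpeech = new TextToSpeech(this, this); // arguments: Context, TextToSpeech.OnInitListener
    }

    /**
     * Called when the TextToSpeech engine has finished initializing.
     * status is either SUCCESS or ERROR.
     * setLanguage selects the language to speak.
     */
    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            int result = textToSpeech.setLanguage(Locale.CHINA);
            if (result == TextToSpeech.LANG_MISSING_DATA
                    || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                Toast.makeText(this, "Language data missing or not supported", Toast.LENGTH_SHORT).show();
            }
        }
    }

    @Override
    public void onClick(View v) {
        if (textToSpeech != null && !textToSpeech.isSpeaking()) {
            // Set the pitch. 1.0 is the normal pitch; a value of 0 is rejected by setPitch.
            textToSpeech.setPitch(1.0f);
            // The text to be spoken. This three-argument overload is deprecated since API 21;
            // newer code can use the overload that also takes a Bundle and an utterance ID.
            textToSpeech.speak("我是要播放的文字",
                    TextToSpeech.QUEUE_FLUSH, null);
        }
    }

    @Override
    protected void onDestroy() {
        // Released in onDestroy rather than onStop, so that the instance is still
        // usable when the Activity is restarted without being recreated.
        if (textToSpeech != null) {
            textToSpeech.stop();     // stop any ongoing speech
            textToSpeech.shutdown(); // shut down and release the engine's resources
        }
        super.onDestroy();
    }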

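With this demo in hand, we have a basic understanding of how TextToSpeech is used. The source analysis below follows the call sequence of the demo.
The first step is creating the TextToSpeech object, so we look at its constructors. There are three, but only the first two are visible to app developers; the last one is used internally by the system. The difference between the first two is that the former uses the system's default TTS engine, while the latter lets you specify a TTS engine by package name through the String engine parameter.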

public TextToSpeech(Context context, OnInitListener listener) {
        this(context, listener, null);
    }
    public TextToSpeech(Context context, OnInitListener listener, String engine) {
        this(context, listener, engine, null, true);
    }
        public TextToSpeech(Context context, OnInitListener listener, String engine,
            String packageName, boolean useFallback) {
        mContext = context;
        mInitListener = listener;
        mRequestedEngine = engine;
        mUseFallback = useFallback;

        mEarcons = new HashMap<String, Uri>();
        mUtterances = new HashMap<CharSequence, Uri>();
        mUtteranceProgressListener = null;

        mEnginesHelper = new TtsEngines(mContext);
        initTts();
    }

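The two public constructors simply delegate; the real work is done by the internal constructor, and the important call in it is initTts().
initTts is a key method in TextToSpeech: it shows how the system selects a TTS engine and connects to it. The code is on the long side, but it has to be listed here.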

private int initTts() {
        // Step 1: Try connecting to the engine that was requested.
        if (mRequestedEngine != null) {
            if (mEnginesHelper.isEngineInstalled(mRequestedEngine)) {
                if (connectToEngine(mRequestedEngine)) {
                    mCurrentEngine = mRequestedEngine;
                    return SUCCESS;
                } else if (!mUseFallback) {
                    mCurrentEngine = null;
                    dispatchOnInit(ERROR);
                    return ERROR;
                }
            } else if (!mUseFallback) {
                Log.i(TAG, "Requested engine not installed: " + mRequestedEngine);
                mCurrentEngine = null;
                dispatchOnInit(ERROR);
                return ERROR;
            }
        }

        // Step 2: Try connecting to the user's default engine.
        final String defaultEngine = getDefaultEngine();
        if (defaultEngine != null && !defaultEngine.equals(mRequestedEngine)) {
            if (connectToEngine(defaultEngine)) {
                mCurrentEngine = defaultEngine;
                return SUCCESS;
            }
        }

        // Step 3: Try connecting to the highest ranked engine in the
        // system.
        final String highestRanked = mEnginesHelper.getHighestRankedEngineName();
        if (highestRanked != null && !highestRanked.equals(mRequestedEngine) &&
                !highestRanked.equals(defaultEngine)) {
            if (connectToEngine(highestRanked)) {
                mCurrentEngine = highestRanked;
                return SUCCESS;
            }
        }

        // NOTE: The API currently does not allow the caller to query whether
        // they are actually connected to any engine. This might fail for various
        // reasons like if the user disables all her TTS engines.

        mCurrentEngine = null;
        dispatchOnInit(ERROR);
        return ERROR;
    }

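Reading this code, the comments split the process into three steps:
Step 1: Try connecting to the engine that was requested.
Step 2: Try connecting to the user's default engine.
Step 3: Try connecting to the highest ranked engine in the system.
In other words: first the engine the caller requested, then the user's default engine, and finally the highest ranked engine.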
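
The three steps can be modeled in plain Java to make the fallback order easier to experiment with. This is only a sketch under stated assumptions, not the framework code: EngineSelector and its select method are hypothetical names, the canConnect predicate stands in for the combination of isEngineInstalled and connectToEngine, and "com.example.tts" in the usage note is a made-up package name.

```java
import java.util.function.Predicate;

/** Plain-Java model of the three-step fallback in initTts(); not framework code. */
final class EngineSelector {
    /**
     * @param requested     engine passed to the constructor, or null
     * @param useFallback   whether to continue when the requested engine fails
     * @param defaultEngine the user's default engine, or null
     * @param highestRanked the highest ranked engine, or null
     * @param canConnect    stand-in for isEngineInstalled plus connectToEngine
     * @return the engine that was connected, or null if every step failed
     */
    static String select(String requested, boolean useFallback, String defaultEngine,
            String highestRanked, Predicate<String> canConnect) {
        // Step 1: the engine the caller asked for.
        if (requested != null) {
            if (canConnect.test(requested)) {
                return requested;
            } else if (!useFallback) {
                return null;
            }
        }
        // Step 2: the user's default engine, unless it was already tried.
        if (defaultEngine != null && !defaultEngine.equals(requested)) {
            if (canConnect.test(defaultEngine)) {
                return defaultEngine;
            }
        }
        // Step 3: the highest ranked engine, unless it was already tried.
        if (highestRanked != null && !highestRanked.equals(requested)
                && !highestRanked.equals(defaultEngine)) {
            if (canConnect.test(highestRanked)) {
                return highestRanked;
            }
        }
        return null;
    }
}
```

For example, if "com.example.tts" is requested but cannot be connected and fallback is enabled, the call returns the default engine; with fallback disabled it returns null, which corresponds to dispatchOnInit(ERROR) in the real method.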
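So which engine is the requested one? Looking back at the second and third constructors of TextToSpeech, a String engine can be passed in. If that argument is not null, the system first tries to connect to that TTS engine.
The default engine is obtained through getDefaultEngine. The implementation listed here is the one in the TtsEngines helper class (mEnginesHelper).
From the comment, "default" here corresponds to the system settings: when several engines are available, the one the user picked is the default engine. On current Chinese phones, for example, there may be the engine that ships with the system, the vendor's own engine (Xiaomi, Huawei, and so on), and engines the user installed, such as the iFlytek voice input method, all of which can speak text, and the user can choose which one is the default. On a stock AOSP build the only engine is "com.svox.pico", so that one is the default.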

   /**
     * @return the default TTS engine. If the user has set a default, and the engine
     *         is available on the device, the default is returned. Otherwise,
     *         the highest ranked engine is returned as per {@link EngineInfoComparator}.
     */
    public String getDefaultEngine() {
        String engine = getString(mContext.getContentResolver(),
                Settings.Secure.TTS_DEFAULT_SYNTH);
        return isEngineInstalled(engine) ? engine : getHighestRankedEngineName();
    }

Finally, step three connects to the highest ranked engine.
getHighestRankedEngineName in turn calls getEngines:

/**
     * Gets a list of all installed TTS engines.
     *
     * @return A list of engine info objects. The list can be empty, but never {@code null}.
     */
    @UnsupportedAppUsage
    public List<EngineInfo> getEngines() {
        PackageManager pm = mContext.getPackageManager();
        Intent intent = new Intent(Engine.INTENT_ACTION_TTS_SERVICE);
        List<ResolveInfo> resolveInfos =
                pm.queryIntentServices(intent, PackageManager.MATCH_DEFAULT_ONLY);
        if (resolveInfos == null) return Collections.emptyList();

        List<EngineInfo> engines = new ArrayList<EngineInfo>(resolveInfos.size());

        for (ResolveInfo resolveInfo : resolveInfos) {
            EngineInfo engine = getEngineInfo(resolveInfo, pm);
            if (engine != null) {
                engines.add(engine);
            }
        }
        Collections.sort(engines, EngineInfoComparator.INSTANCE);

        return engines;
    }

Clearly, the system uses PackageManager to query every service whose intent filter declares the action Engine.INTENT_ACTION_TTS_SERVICE. Only an app that acts as a TTS engine declares this filter.
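
The sort at the end of getEngines decides which engine counts as the highest ranked. The comparator itself is not listed in this article, so the following plain-Java model rests on an assumption that should be checked against your own source tree: engines that are part of the system image sort first, and within each group a higher priority value sorts first. EngineRanking and EngineEntry are hypothetical stand-ins for the framework types, and the model ignores any additional checks the real getHighestRankedEngineName may perform.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of engine ranking; EngineEntry is a stand-in for EngineInfo. */
final class EngineRanking {
    static final class EngineEntry {
        final String name;     // package name of the engine
        final boolean system;  // true if the engine is part of the system image
        final int priority;    // priority value reported for the engine's service

        EngineEntry(String name, boolean system, int priority) {
            this.name = name;
            this.system = system;
            this.priority = priority;
        }
    }

    /** Assumed ordering: system engines first, then higher priority first. */
    static final Comparator<EngineEntry> BY_RANK = (lhs, rhs) -> {
        if (lhs.system != rhs.system) {
            return lhs.system ? -1 : 1;
        }
        return Integer.compare(rhs.priority, lhs.priority);
    };

    /** Returns the name of the first engine after sorting, or null for an empty list. */
    static String highestRanked(List<EngineEntry> engines) {
        List<EngineEntry> sorted = new ArrayList<>(engines);
        sorted.sort(BY_RANK);
        return sorted.isEmpty() ? null : sorted.get(0).name;
    }
}
```

Under this assumed ordering, a user-installed engine never outranks a system engine, however high its priority, which is one reason a custom engine usually has to be built into the system image or selected explicitly.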
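After these three steps, as long as a usable TTS engine exists on the system, one of them should have been found and its package name obtained. TextToSpeech then binds to that service, again using Intent(Engine.INTENT_ACTION_TTS_SERVICE). This is ordinary service binding code.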

private boolean connectToEngine(String engine) {
        Connection connection = new Connection();
        Intent intent = new Intent(Engine.INTENT_ACTION_TTS_SERVICE);
        intent.setPackage(engine);
        boolean bound = mContext.bindService(intent, connection, Context.BIND_AUTO_CREATE);
        if (!bound) {
            Log.e(TAG, "Failed to bind to " + engine);
            return false;
        } else {
            Log.i(TAG, "Sucessfully bound to " + engine);
            mConnectingServiceConnection = connection;
            return true;
        }
    }

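Here, private class Connection implements ServiceConnection: the Connection class implements the standard ServiceConnection interface and also implements some AIDL callback methods. It contains the inner class SetupConnectionAsyncTask, which does a lot of work; this async task is what ends up calling the onInit method from our demo, through dispatchOnInit(result). If the connection is lost, the user is called back with dispatchOnInit(ERROR). If bindService reports a successful connection, onServiceConnected runs mService = ITextToSpeechService.Stub.asInterface(service); this mService is the Binder interface of the TTS engine, and the actual engine methods are invoked through it.
At this point, if the connection succeeded, the methods TextToSpeech offers, such as speak and stop, can be used normally.
After all that, the service being connected to is simply TextToSpeechService. It is also provided by the Android source, and it is the Service that the stock TTS engine extends.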

Analysis of the stock TTS engine

We know that TextToSpeech gets its TTS capability by binding to TextToSpeechService. How, then, is the TTS engine tied to it?
In the system source, /external/svox/pico/compat/src/com/android/tts/compat/CompatTtsService.java shows that this class extends the framework service: public abstract class CompatTtsService extends TextToSpeechService. It implements part of the interface methods itself, and this abstract class is in turn extended by the real engine service.
/external/svox/pico/src/com/svox/pico/PicoService.java
public class PicoService extends CompatTtsService
The actual interface work is done in CompatTtsService.
private SynthProxy mNativeSynth = null; The SynthProxy class implements getLanguage, isLanguageAvailable, setLanguage, speak, stop, shutdown and other methods, so SynthProxy is the next layer of the implementation.

/**
 * The SpeechSynthesis class provides a high-level api to create and play
 * synthesized speech. This class is used internally to talk to a native
 * TTS library that implements the interface defined in
 * frameworks/base/include/tts/TtsEngine.h
 *
 */
public class SynthProxy {

    static {
        System.loadLibrary("ttscompat");
    }

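The comment and the static initializer show that the final implementation is reached through JNI and is provided by the ttscompat native library.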

 public int speak(SynthesisRequest request, SynthesisCallback callback) {
        return native_speak(mJniData, request.getText(), callback);
    }
    public void shutdown() {
        native_shutdown(mJniData);
        mJniData = 0;
    }

At this point, the implementation of TextToSpeech and each of its parts has been covered. So how do we build a custom TTS engine?

Building a custom TTS engine

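The stock TextToSpeech engine does not provide Chinese speech output, and even where it does, it is hard to use under the network conditions in mainland China, so many vendors integrate their own speech engine into the system. How can we build a custom TTS engine of our own?
First, get the SDK of a TTS provider you can use, see which capabilities it offers, and choose an approach accordingly. For example, some SDKs do not expose the synthesized audio data, in which case the first approach below cannot be used. This depends on your situation: there are offline and online engines from iFlytek, Alibaba, Baidu, Tencent, AISpeech, Unisound, and others. See which product is available to you.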

Approach 1: extend the framework's TextToSpeechService class and implement its methods.

The system source provides an example:
/development/samples/TtsEngine/src/com/example/android/ttsengine/RobotSpeakTtsService.java
public class RobotSpeakTtsService extends TextToSpeechService
The abstract methods of TextToSpeechService have to be implemented, including:

protected abstract int onIsLanguageAvailable(String lang, String country, String variant);
protected abstract String[] onGetLanguage();
protected abstract int onLoadLanguage(String lang, String country, String variant);
protected abstract void onStop();
/**
 * Tells the service to synthesize speech from the given text. This method should block until
 * the synthesis is finished. Called on the synthesis thread.
 *
 * @param request The synthesis request.
 * @param callback The callback that the engine must use to make data available for playback or
 *     for writing to a file.
 */
protected abstract void onSynthesizeText(SynthesisRequest request, SynthesisCallback callback);

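The most important one is the synthesis method, shown with its comment: it generates audio from the supplied text and blocks until synthesis has finished. The synthesis parameters are read from the SynthesisRequest argument, and status and audio are reported back to the system through the SynthesisCallback argument.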
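
For a Chinese-only engine, the language methods mostly come down to mapping a requested locale to one of the TextToSpeech.LANG_* status codes. Below is a minimal plain-Java sketch of that mapping for a hypothetical Mandarin-only engine. ChineseLanguageSupport is an illustrative name, the constants mirror the values of the corresponding TextToSpeech constants, and the sketch assumes the framework passes three-letter ISO language and country codes, as the TextToSpeechService documentation describes.

```java
import java.util.Locale;

/** Plain-Java model of a language check for a hypothetical Mandarin-only engine. */
final class ChineseLanguageSupport {
    // Values mirror TextToSpeech.LANG_COUNTRY_AVAILABLE, LANG_AVAILABLE and LANG_NOT_SUPPORTED.
    static final int LANG_COUNTRY_AVAILABLE = 1;
    static final int LANG_AVAILABLE = 0;
    static final int LANG_NOT_SUPPORTED = -2;

    // Three-letter codes for Locale.CHINA (zh_CN).
    private static final String LANG = Locale.CHINA.getISO3Language();
    private static final String COUNTRY = Locale.CHINA.getISO3Country();

    /**
     * Models onIsLanguageAvailable(lang, country, variant).
     * The variant is ignored because this sketch has no variant-specific voices.
     */
    static int isLanguageAvailable(String lang, String country, String variant) {
        if (!LANG.equals(lang)) {
            return LANG_NOT_SUPPORTED;
        }
        if (COUNTRY.equals(country)) {
            return LANG_COUNTRY_AVAILABLE;
        }
        // Language matches, but the country is missing or different.
        return LANG_AVAILABLE;
    }
}
```

A real service would return the same kind of value from onLoadLanguage, and would report the loaded language, country and variant from onGetLanguage.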
Here is the implementation from the sample TTS engine mentioned above. My local source tree does not include this class, so it was taken from the online source.

    @Override
    protected synchronized void onSynthesizeText(SynthesisRequest request,
            SynthesisCallback callback) {
        // Note that we call onLoadLanguage here since there is no guarantee
        // that there would have been a prior call to this function.
        int load = onLoadLanguage(request.getLanguage(), request.getCountry(),
                request.getVariant());

        // We might get requests for a language we don't support - in which case
        // we error out early before wasting too much time.
        if (load == TextToSpeech.LANG_NOT_SUPPORTED) {
            callback.error();
            return;
        }

        // At this point, we have loaded the language we need for synthesis and
        // it is guaranteed that we support it so we proceed with synthesis.

        // We denote that we are ready to start sending audio across to the
        // framework. We use a fixed sampling rate (16khz), and send data across
        // in 16bit PCM mono.
        callback.start(SAMPLING_RATE_HZ,
                AudioFormat.ENCODING_PCM_16BIT, 1 /* Number of channels. */);

        // We then scan through each character of the request string and
        // generate audio for it.
        final String text = request.getText().toLowerCase();
        for (int i = 0; i < text.length(); ++i) {
            char value = normalize(text.charAt(i));
            // It is crucial to call either of callback.error() or callback.done() to ensure
            // that audio / other resources are released as soon as possible.
            if (!generateOneSecondOfAudio(value, callback)) {
                callback.error();
                return;
            }
        }

        // Alright, we're done with our synthesis - yay!
        callback.done();
    }

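As you can see, before the engine starts producing audio it must call callback.start(SAMPLING_RATE_HZ, AudioFormat.ENCODING_PCM_16BIT, 1 /* Number of channels. */); to tell the system the sampling rate of the generated audio, that the format is 16-bit PCM, and that it is mono. After this call the system waits for audio data and starts playing it.
generateOneSecondOfAudio fakes a piece of demo audio to simulate what a real engine would generate. When generation is complete, callback.done is called.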
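
Between start and done, a real engine hands its PCM data to the framework through SynthesisCallback.audioAvailable(buffer, offset, length), in pieces no larger than SynthesisCallback.getMaxBufferSize(). The plain-Java sketch below models that hand-off without Android classes. PcmChunker and AudioSink are hypothetical names, the sine tone is only placeholder audio standing in for a vendor SDK's output, and little-endian byte order is an assumption about the target device.

```java
/** Plain-Java model of producing 16-bit mono PCM and handing it over in chunks. */
final class PcmChunker {
    /** Hypothetical stand-in for SynthesisCallback.audioAvailable(buffer, offset, length). */
    interface AudioSink {
        void audioAvailable(byte[] buffer, int offset, int length);
    }

    /** Generates a sine tone as 16-bit mono PCM (little-endian assumed). */
    static byte[] sineWave(int sampleRateHz, double frequencyHz, double seconds) {
        int samples = (int) Math.round(sampleRateHz * seconds);
        byte[] pcm = new byte[samples * 2]; // two bytes per 16-bit sample
        for (int i = 0; i < samples; i++) {
            double angle = 2.0 * Math.PI * frequencyHz * i / sampleRateHz;
            short value = (short) (Math.sin(angle) * Short.MAX_VALUE * 0.5);
            pcm[2 * i] = (byte) (value & 0xff);            // low byte first
            pcm[2 * i + 1] = (byte) ((value >> 8) & 0xff); // then high byte
        }
        return pcm;
    }

    /** Sends pcm to the sink in pieces of at most maxChunkBytes; returns the chunk count. */
    static int sendInChunks(byte[] pcm, int maxChunkBytes, AudioSink sink) {
        if (maxChunkBytes < 2) {
            throw new IllegalArgumentException("maxChunkBytes must be at least 2");
        }
        // Keep every chunk a whole number of 16-bit samples.
        int step = maxChunkBytes - (maxChunkBytes % 2);
        int chunks = 0;
        for (int offset = 0; offset < pcm.length; offset += step) {
            int length = Math.min(step, pcm.length - offset);
            sink.audioAvailable(pcm, offset, length);
            chunks++;
        }
        return chunks;
    }
}
```

In a real onSynthesizeText, the sink would be the SynthesisCallback itself, the chunk size would come from getMaxBufferSize(), and the loop would be followed by callback.done(), or callback.error() if the engine failed.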
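The advantage of this approach is that little has to be implemented and no per-platform handling for different Android versions is needed; all other interfaces use the framework's stock implementation.
The drawback is that it demands more of the engine, and debugging is troublesome: without the Android source for the target system, problems are hard to debug, because the framework's internal logs are not printed and it is hard to locate where things went wrong.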

Approach 2: take the TextToSpeech AIDL interfaces of the target system and implement them directly.

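From the analysis above we know that TextToSpeech connects to the engine through bindService, and that the Service uses AIDL as its interface. We can take the corresponding AIDL files and implement the server side in a custom engine while the client stays unchanged. The server-side AIDL interface must, of course, stay identical to the system's.
The files to implement are:
/frameworks/base/core/java/android/speech/tts/ITextToSpeechService.aidl
/frameworks/base/core/java/android/speech/tts/ITextToSpeechCallback.aidl
How to implement AIDL is not explained in detail here; readers with some background will already see the idea.
The advantage of this approach is a high degree of customization: every exposed interface method can be implemented to fit the actual situation.
The drawbacks are that more methods have to be implemented, and that the AIDL interface changes between Android versions, so the resulting engine will not be very portable.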

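Of course, all of this assumes that the stock TTS engine has been removed from the system. If you cannot get the system source, the only option is the one mentioned earlier: specify the engine's package name when constructing TextToSpeech.
Most importantly, for the system's TextToSpeech to find the custom engine, the service's intent-filter in AndroidManifest.xml, as described above, is essential. Without it, the system does not treat the app as a TTS engine.
Here is the configuration of the system's PicoService.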

        <service android:name=".PicoService"
                  android:label="@string/app_name">
            <intent-filter>
                <action android:name="android.intent.action.TTS_SERVICE" />
                <category android:name="android.intent.category.DEFAULT" />
            </intent-filter>
            <meta-data android:name="android.speech.tts" android:resource="@xml/tts_engine" />
        </service>

That's it for this article. If it helped you, remember to give it a like. If you have questions, leave a reply to discuss.

Zephyr Cai
