Text-to-speech: Microsoft Speech Synthesis Markup Language (SSML) text structures and events

The Speech service's implementation of SSML is based on the World Wide Web Consortium's (W3C) Speech Synthesis Markup Language Version 1.0. The elements that the Speech service supports can differ from the W3C standard.

Each SSML document is built from SSML elements (or tags). These elements are used to adjust the voice, style, pitch, prosody, volume, and more.

The following is a subset of the basic structure and syntax of an SSML document:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="string">
    <mstts:backgroundaudio src="string" volume="string" fadein="string" fadeout="string"/>
    <voice name="string" effect="string">
        <audio src="string"></audio>
        <bookmark mark="string"/>
        <break strength="string" time="string" />
        <emphasis level="value"></emphasis>
        <lang xml:lang="string"></lang>
        <lexicon uri="string"/>
        <math xmlns="http://www.w3.org/1998/Math/MathML"></math>
        <mstts:audioduration value="string"/>
        <mstts:express-as style="string" styledegree="value" role="string"></mstts:express-as>
        <mstts:silence type="string" value="string"/>
        <mstts:viseme type="string"/>
        <p></p>
        <phoneme alphabet="string" ph="string"></phoneme>
        <prosody pitch="value" contour="value" range="value" rate="value" volume="value"></prosody>
        <s></s>
        <say-as interpret-as="string" format="string" detail="string"></say-as>
        <sub alias="string"></sub>
    </voice>
</speak>
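
As a concrete illustration of the template above, here is a minimal sketch of a complete document. The voice name en-US-JennyNeural and the en-US locale are assumptions chosen for illustration; substitute any voice available to your Speech resource.

```xml
<!-- Minimal sketch: one voice speaking one sentence.
     The voice name and locale are illustrative assumptions. -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
    <voice name="en-US-JennyNeural">
        Welcome to the text-to-speech service.
    </voice>
</speak>
```

Only speak and voice are needed here; the other elements in the template are optional and are added inside voice as required.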

The following list describes some examples of content allowed in each element:

  • audio: The body of the audio element can contain plain text or SSML markup that is spoken if the audio file is unavailable or unplayable. The audio element can also contain text and the following elements: audio, break, p, s, phoneme, prosody, say-as, and sub.
  • bookmark: This element cannot contain text or any other elements.
  • break: This element cannot contain text or any other elements.
  • emphasis: This element can include the following elements: audio, break, emphasis, lang, phoneme, prosody, say-as, and sub.
  • lang: This element can contain all other elements except mstts:backgroundaudio, voice, and speak.
  • lexicon: This element cannot contain text or any other elements.
  • math: This element can only contain text and MathML elements.
  • mstts:audioduration: This element cannot contain text or any other elements.
  • mstts:backgroundaudio: This element cannot contain text or any other elements.
  • mstts:express-as: This element can include the following elements: audio, break, emphasis, lang, phoneme, prosody, say-as, and sub.
  • mstts:silence: This element cannot contain text or any other elements.
  • mstts:viseme: This element cannot contain text or any other elements.
  • p: This element can include the following elements: audio, break, phoneme, prosody, say-as, sub, mstts:express-as, and s.
  • phoneme: This element can only contain text and not any other elements.
  • prosody: This element can include the following elements: audio, break, p, phoneme, prosody, say-as, sub, and s.
  • s: This element can include the following elements: audio, break, phoneme, prosody, say-as, mstts:express-as, and sub.
  • say-as: This element can only contain text and not any other elements.
  • sub: This element can only contain text and not any other elements.
  • speak: The root element of the SSML document. This element can contain the following elements: mstts:backgroundaudio and voice.
  • voice: This element can contain all other elements except mstts:backgroundaudio and speak.
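
The nesting rules in the list above can be sketched with a short document. This is an illustration only: the voice name, the cheerful style, and the sample text are assumptions, and whether a particular voice honors express-as or emphasis depends on the voice.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
    <voice name="en-US-JennyNeural">
        <p>
            <!-- s inside p; say-as inside s, holding text only -->
            <s>Your order arrives on <say-as interpret-as="date" format="md">12/05</say-as>.</s>
            <!-- prosody inside s; sub inside prosody, holding text only -->
            <s>
                <prosody rate="slow">Please keep your <sub alias="identification">ID</sub> ready.</prosody>
            </s>
        </p>
        <!-- mstts:express-as directly inside voice; emphasis inside it -->
        <mstts:express-as style="cheerful">
            Thank you for <emphasis level="strong">choosing</emphasis> us!
        </mstts:express-as>
    </voice>
</speak>
```

By contrast, placing a p or another element inside say-as, sub, or phoneme would violate the list, because those three elements accept text only.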

The Speech service handles punctuation automatically where appropriate, for example by pausing after a period or by using the correct intonation when a sentence ends with a question mark.

Add a pause

Use the break element to override the default behavior of breaks or pauses between words. Without it, the Speech service inserts pauses automatically. The following table describes the attributes of the break element.

 

Attribute | Description | Required or optional
strength | The relative duration of the pause, given as one of the following values: x-weak, weak, medium (default), strong, x-strong. | Optional
time | The absolute duration of the pause, in seconds (for example, 2s) or in milliseconds (for example, 500ms). Valid values range from 0 to 5000 milliseconds. If you set a value greater than the supported maximum, the service uses 5000ms. If the time attribute is set, the strength attribute is ignored. | Optional
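
A minimal sketch of both attributes in use follows. The voice name and the sample sentences are assumptions; the comments restate the behavior described in the table rather than measured results.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-US-JennyNeural">
        <!-- Relative pause chosen by strength -->
        Welcome. <break strength="x-strong" /> Let us get started.
        <!-- Absolute pause of two seconds -->
        Step one. <break time="2s" /> Step two.
        <!-- Both attributes set: time applies and strength is ignored -->
        Almost done. <break strength="weak" time="500ms" /> Finished.
        <!-- Above the maximum: the service is described as using 5000ms -->
        One more thing. <break time="8000ms" /> Goodbye.
    </voice>
</speak>
```

Because break cannot contain text or other elements, it is always written as an empty element.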

Here are more details about the strength attribute.

Strength | Relative duration
x-weak | 250 milliseconds
weak | 500 milliseconds
medium | 750 milliseconds
strong | 1,000 milliseconds
x-strong | 1,250 milliseconds
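
Read against the durations listed above, a strength value and the matching time value should produce a pause of the same length. The sketch below pairs them; the voice name and sample text are assumptions.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-US-JennyNeural">
        <!-- strong is listed as 1,000 milliseconds -->
        One <break strength="strong" /> two.
        <!-- The same pause written as an absolute duration -->
        One <break time="1000ms" /> two.
        <!-- No attributes: the default strength, medium (750 milliseconds), applies -->
        One <break /> two.
    </voice>
</speak>
```

Using strength keeps the markup readable, while time is the better fit when a pause must have an exact length or one that no strength value provides.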
