The Speech service implementation of SSML is based on the World Wide Web Consortium's Speech Synthesis Markup Language Version 1.0. The elements supported by the Speech service can differ from the W3C standard.
Each SSML document is created with SSML elements (or tags). These elements are used to adjust the voice, style, pitch, prosody, volume, and more.
The following is a subset of the basic structure and syntax of an SSML document:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="string">
    <mstts:backgroundaudio src="string" volume="string" fadein="string" fadeout="string"/>
    <voice name="string" effect="string">
        <audio src="string"></audio>
        <bookmark mark="string"/>
        <break strength="string" time="string" />
        <emphasis level="value"></emphasis>
        <lang xml:lang="string"></lang>
        <lexicon uri="string"/>
        <math xmlns="http://www.w3.org/1998/Math/MathML"></math>
        <mstts:audioduration value="string"/>
        <mstts:express-as style="string" styledegree="value" role="string"></mstts:express-as>
        <mstts:silence type="string" value="string"/>
        <mstts:viseme type="string"/>
        <p></p>
        <phoneme alphabet="string" ph="string"></phoneme>
        <prosody pitch="value" contour="value" range="value" rate="value" volume="value"></prosody>
        <s></s>
        <say-as interpret-as="string" format="string" detail="string"></say-as>
        <sub alias="string"></sub>
    </voice>
</speak>
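As a concrete illustration of the skeleton above, here is a minimal sketch of a complete document with placeholder attribute values filled in. The voice name `en-US-JennyNeural` and the language `en-US` are assumed example values and are not taken from this article; substitute a voice that is available to your Speech resource.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
    <!-- Assumed example voice; replace with a voice you have access to. -->
    <voice name="en-US-JennyNeural">
        This text is spoken with the selected voice.
    </voice>
</speak>
```

Only the `speak` root and one `voice` element are used here; every other element in the skeleton is optional and is placed inside `voice`, except `mstts:backgroundaudio`, which sits directly under `speak`.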
The following list describes some examples of content allowed in each element:
audio
: The body of the audio element can contain plain text or SSML markup that is spoken if the audio file is unavailable or unplayable. The audio element can also contain text and the following elements: audio, break, p, s, phoneme, prosody, say-as, and sub.
bookmark
: This element cannot contain text or any other elements.
break
: This element cannot contain text or any other elements.
emphasis
: This element can contain the following elements: audio, break, emphasis, lang, phoneme, prosody, say-as, and sub.
lang
: This element can contain all other elements except mstts:backgroundaudio, voice, and speak.
lexicon
: This element cannot contain text or any other elements.
math
: This element can only contain text and MathML elements.
mstts:audioduration
: This element cannot contain text or any other elements.
mstts:backgroundaudio
: This element cannot contain text or any other elements.
mstts:express-as
: This element can contain the following elements: audio, break, emphasis, lang, phoneme, prosody, say-as, and sub.
mstts:silence
: This element cannot contain text or any other elements.
mstts:viseme
: This element cannot contain text or any other elements.
p
: This element can contain the following elements: audio, break, phoneme, prosody, say-as, sub, mstts:express-as, and s.
phoneme
: This element can only contain text and not any other elements.
prosody
: This element can contain the following elements: audio, break, p, phoneme, prosody, say-as, sub, and s.
s
: This element can contain the following elements: audio, break, phoneme, prosody, say-as, mstts:express-as, and sub.
say-as
: This element can only contain text and not any other elements.
sub
: This element can only contain text and not any other elements.
speak
: The root element of the SSML document. This element can contain the following elements: mstts:backgroundaudio and voice.
voice
: This element can contain all other elements except mstts:backgroundaudio and speak.
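The containment rules above can be illustrated with a short sketch that nests elements only in ways the list allows: `p` holds `s`, `s` holds `prosody` and `say-as`, `prosody` holds `sub` and `break`, and `sub` and `say-as` hold text only. The voice name, the sentences, and the attribute values are assumptions chosen for illustration.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <!-- Assumed example voice; replace with a voice you have access to. -->
    <voice name="en-US-JennyNeural">
        <p>
            <s>
                <!-- prosody may contain sub and break; sub contains text only -->
                <prosody rate="slow">
                    The <sub alias="World Wide Web Consortium">W3C</sub> published the specification.
                    <break/>
                </prosody>
            </s>
            <!-- s may contain say-as; say-as contains text only -->
            <s>The meeting starts on <say-as interpret-as="date" format="mdy">10/19/2025</say-as>.</s>
        </p>
    </voice>
</speak>
```

By contrast, placing `emphasis` directly inside `s`, or any child element inside `break`, would fall outside the combinations listed above.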
The Speech service automatically handles pauses and intonation where appropriate, for example by pausing briefly after a period or by using the correct intonation in a sentence that ends with a question mark.
Add a pause
Use the break element to override the default behavior of breaks or pauses between words. Without it, the Speech service inserts pauses automatically. The following table describes the attributes of the break element.
Attribute | Description | Required or optional |
---|---|---|
strength | The relative duration of the pause, specified as one of the following values: x-weak, weak, medium, strong, x-strong. | Optional |
time | The absolute duration of the pause, in seconds (for example, 2s) or in milliseconds (for example, 500ms). Valid values range from 0 to 5000 milliseconds. If you set a value greater than the supported maximum, the service uses 5000ms. If the time attribute is set, the strength attribute is ignored. | Optional |
Here are more details about the strength attribute.
Strength | Relative duration |
---|---|
x-weak | 250 milliseconds |
weak | 500 milliseconds |
medium | 750 milliseconds |
strong | 1,000 milliseconds |
x-strong | 1,250 milliseconds |
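The following sketch shows the break element with the attribute combinations described in the tables above. The voice name and the spoken text are assumptions chosen for illustration; the durations in the comments come from the two tables.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <!-- Assumed example voice; replace with a voice you have access to. -->
    <voice name="en-US-JennyNeural">
        <!-- Both attributes are optional, so a bare break is allowed. -->
        Welcome <break/> to text to speech.
        <!-- Relative pause: medium corresponds to 750 milliseconds. -->
        Welcome <break strength="medium"/> to text to speech.
        <!-- Absolute pause of the same length, given in milliseconds. -->
        Welcome <break time="750ms"/> to text to speech.
        <!-- time is set, so strength is ignored and the pause lasts 2 seconds. -->
        Welcome <break strength="x-weak" time="2s"/> to text to speech.
        <!-- Above the 5000 ms maximum, so the service uses 5000ms. -->
        Welcome <break time="8s"/> to text to speech.
    </voice>
</speak>
```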