<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52
     from ../festival.texi on 2 August 2001 -->

<TITLE>Festival Speech Synthesis System - 19  Duration</TITLE>
</HEAD>
<BODY bgcolor="#ffffff">
Go to the <A HREF="festival_1.html">first</A>, <A HREF="festival_18.html">previous</A>, <A HREF="festival_20.html">next</A>, <A HREF="festival_35.html">last</A> section, <A HREF="festival_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC71" HREF="festival_toc.html#TOC71">19  Duration</A></H1>

<P>
<A NAME="IDX250"></A>
A number of different duration prediction modules are available with
varying levels of sophistication.

</P>
<P>
Segmental duration prediction is done by the module <CODE>Duration</CODE>,
which calls a specific method chosen by the parameter
<CODE>Duration_Method</CODE>.

</P>
<P>
<A NAME="IDX251"></A>
All of the following duration methods may be further affected by both a
global duration stretch and a per-word stretch.

</P>
<P>
If the parameter <CODE>Duration_Stretch</CODE> is set, all absolute durations
predicted by any of the duration methods described here are multiplied by
the parameter's value.  For example

<PRE>
(Parameter.set 'Duration_Stretch 1.2)
</PRE>

<P>
will slow all speech down, lengthening every predicted duration by 20%.

</P>
<P>
<A NAME="IDX252"></A>
In addition to the global stretch, if the feature
<CODE>dur_stretch</CODE> is set on the related <CODE>Token</CODE>, it is also
used as a multiplicative factor on the duration produced by the selected
method.  That is, the feature <CODE>R:Syllable.parent.parent.R:Token.parent.dur_stretch</CODE>.
The Lisp function <CODE>duration_find_stretch</CODE> returns
the combined global and local duration stretch factor for a given
segment item.

</P>
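<P>
As a rough sketch of the local stretch, the following sets
<CODE>dur_stretch</CODE> on the first token before duration prediction runs
and then queries the combined factor.  It assumes the usual module sequence
for a <CODE>Text</CODE> utterance, that the first segment is an initial
silence (so the second segment belongs to the first word), and that
<CODE>duration_find_stretch</CODE> takes a segment item as described above.
The utterance text and the value 1.5 are purely illustrative.

</P>
<PRE>
(set! utt1 (Utterance Text "hello world"))
;; Run the text modules so that the Token relation exists.
(Initialize utt1)
(Text utt1)
;; Ask for the first token to be spoken one and a half times slower.
(item.set_feat (utt.relation.first utt1 'Token) "dur_stretch" 1.5)
;; Continue through the remaining modules up to duration prediction.
(Token_POS utt1) (Token utt1) (POS utt1) (Phrasify utt1) (Word utt1)
(Pauses utt1) (Intonation utt1) (PostLex utt1) (Duration utt1)
;; Combined global and local factor for the first segment after the
;; initial silence (assumed to belong to the first word).
(duration_find_stretch (item.next (utt.relation.first utt1 'Segment)))
</PRE>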
<P>
Note that these global and local methods of modifying the durations
produced by the models are crude and should be considered hacks: real
speech is not lengthened or shortened uniformly.  These
parameters are typically used when the underlying duration method is
lacking in some way, and in that role they can still be useful.

</P>
<P>
Note that it is quite easy to implement new duration methods directly
in Scheme.

</P>
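<P>
A minimal sketch of such a method follows.  It is not one of the supplied
methods: the function name <CODE>Duration_Fixed80</CODE> and the 80
millisecond value are hypothetical.  It assumes that segment timing is
stored in each segment's <CODE>end</CODE> feature and that
<CODE>Duration_Method</CODE> may be set to a function taking an utterance.

</P>
<PRE>
(define (Duration_Fixed80 utt)
  "(Duration_Fixed80 utt)
Hypothetical method: give every segment 80ms, scaled by the combined
global and local stretch."
  (let ((end 0.0))
    (mapcar
     (lambda (seg)
       ;; Accumulate end times; a segment's duration is the gap
       ;; between its end and the previous segment's end.
       (set! end (+ end (* 0.080 (duration_find_stretch seg))))
       (item.set_feat seg "end" end))
     (utt.relation.items utt 'Segment))
    utt))

;; Assumption: the Duration module will call a function given here.
(Parameter.set 'Duration_Method Duration_Fixed80)
</PRE>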



<H2><A NAME="SEC72" HREF="festival_toc.html#TOC72">19.1  Default durations</A></H2>

<P>
<A NAME="IDX253"></A>
If parameter <CODE>Duration_Method</CODE> is set to <CODE>Default</CODE>, the
simplest duration model is used.  Every segment is assigned a duration of
100 milliseconds (this can be modified by <CODE>Duration_Stretch</CODE> and/or
the local, Token-related <CODE>dur_stretch</CODE> feature).

</P>


<H2><A NAME="SEC73" HREF="festival_toc.html#TOC73">19.2  Average durations</A></H2>

<P>
If parameter <CODE>Duration_Method</CODE> is set to <CODE>Averages</CODE>
then segmental durations are set to their averages.  The variable
<CODE>phoneme_durations</CODE> should be an a-list of phones and averages
in seconds.  The file <TT>`lib/mrpa_durs.scm'</TT> has an example for
the mrpa phoneset.

</P>
<P>
If a segment is found that does not appear in the list, a default
duration of 0.1 seconds is assigned and a warning message is generated.

</P>
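<P>
For illustration, a set-up for this method might look like the following.
The phones and values are placeholders, not taken from
<TT>`lib/mrpa_durs.scm'</TT>; a real list would cover the whole phoneset.

</P>
<PRE>
;; a-list of (phone average-duration-in-seconds); values are illustrative
(set! phoneme_durations
      '((# 0.200)
        (a 0.090)
        (t 0.070)))
(Parameter.set 'Duration_Method 'Averages)
</PRE>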


<H2><A NAME="SEC74" HREF="festival_toc.html#TOC74">19.3  Klatt durations</A></H2>

<P>
<A NAME="IDX254"></A>
If parameter <CODE>Duration_Method</CODE> is set to <CODE>Klatt</CODE>, the duration
rules from the Klatt book (<CITE>allen87</CITE>, chapter 9) are used.  This method
requires minimum and inherent durations for each phoneme in the
phoneset.  This information is held in the variable
<CODE>duration_klatt_params</CODE>.  Each member of this list is a
three-tuple of phone name, inherent duration and minimum duration.  An
example for the mrpa phoneset is in <TT>`lib/klatt_durs.scm'</TT>.

</P>
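<P>
The shape of the parameter list can be sketched as below.  The numbers are
placeholders and are written in milliseconds on the assumption that this
matches the supplied file; check <TT>`lib/klatt_durs.scm'</TT> for the
units and values actually expected.

</P>
<PRE>
;; list of (phone inherent-duration minimum-duration); values illustrative
(set! duration_klatt_params
      '((a 230.0 80.0)
        (t 75.0 50.0)
        (# 100.0 100.0)))
(Parameter.set 'Duration_Method 'Klatt)
</PRE>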


<H2><A NAME="SEC75" HREF="festival_toc.html#TOC75">19.4  CART durations</A></H2>

<P>
Two very similar methods of duration prediction by CART
are supported.  The first, used when parameter <CODE>Duration_Method</CODE>
is <CODE>Tree</CODE>, simply predicts durations directly for each segment.
The tree is set in the variable <CODE>duration_cart_tree</CODE>.

</P>
<P>
The second, which seems to give better results, is used when parameter
<CODE>Duration_Method</CODE> is <CODE>Tree_ZScores</CODE>. In this second model the
tree predicts zscores (number of standard deviations from the mean)
rather than duration directly.  (This follows <CITE>campbell91</CITE>, but we
don't deal in syllable durations here.)  This method requires means and
standard deviations for each phone.  The variable
<CODE>duration_cart_tree</CODE> should contain the zscore prediction tree and
the variable <CODE>duration_ph_info</CODE> should contain a list of phone,
mean duration, and standard deviation for each phone in the phoneset.

</P>
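<P>
A toy configuration for the zscore method might look like this.  The tree
and the statistics are invented for illustration only; the leaf layout
(a standard deviation followed by the predicted value) is assumed to
follow the usual form of Festival's regression trees.

</P>
<PRE>
;; list of (phone mean-duration standard-deviation), in seconds
(set! duration_ph_info
      '((# 0.200 0.100)
        (a 0.090 0.030)
        (t 0.070 0.020)))

;; One question: is the segment in a stressed syllable?
(set! duration_cart_tree
      '((R:SylStructure.parent.stress is 1)
        ((0.0 0.5))      ; yes: zscore of 0.5, longer than the mean
        ((0.0 -0.2))))   ; no: zscore of -0.2, slightly shorter

(Parameter.set 'Duration_Method 'Tree_ZScores)
</PRE>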
<P>
An example tree trained from 460 sentences spoken by Gordon is
in <TT>`lib/gswdurtreeZ'</TT>.  Phone means and standard deviations
are in <TT>`lib/gsw_durs.scm'</TT>.

</P>
<P>
After prediction the segmental duration is calculated by
the simple formula

<PRE>
duration = mean + (zscore * standard deviation)
</PRE>

<P>
This method has also been used for other duration models that scale an
inherent duration by some factor.  If the tree predicts factors
rather than zscores and the <CODE>duration_ph_info</CODE> entries
are phone, 0.0, inherent duration, the above formula gives
<CODE>0.0 + (factor * inherent duration)</CODE>, which is the
desired result.  Klatt and Klatt-like rules can be implemented in
this way without adding a new method.

</P>
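<P>
The arithmetic can be checked with a small helper.  The function
<CODE>zscore_to_duration</CODE> is hypothetical and not part of Festival,
and the numbers are made up: the first call treats its arguments as a mean,
standard deviation and zscore, the second as 0.0, an inherent duration and
a factor.

</P>
<PRE>
(define (zscore_to_duration mean stddev zscore)
  "duration = mean + (zscore * standard deviation)"
  (+ mean (* zscore stddev)))

;; zscore case: 0.090 + (1.5 * 0.030), about 0.135 seconds
(zscore_to_duration 0.090 0.030 1.5)

;; factor case: 0.0 + (1.2 * 0.100), about 0.12 seconds
(zscore_to_duration 0.0 0.100 1.2)
</PRE>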
<P><HR><P>
Go to the <A HREF="festival_1.html">first</A>, <A HREF="festival_18.html">previous</A>, <A HREF="festival_20.html">next</A>, <A HREF="festival_35.html">last</A> section, <A HREF="festival_toc.html">table of contents</A>.
</BODY>
</HTML>