<HTML> <HEAD> <!-- This HTML file has been created by texi2html 1.52 from ../festival.texi on 2 August 2001 --> <TITLE>Festival Speech Synthesis System - 19 Duration</TITLE> </HEAD> <BODY bgcolor="#ffffff"> Go to the <A HREF="festival_1.html">first</A>, <A HREF="festival_18.html">previous</A>, <A HREF="festival_20.html">next</A>, <A HREF="festival_35.html">last</A> section, <A HREF="festival_toc.html">table of contents</A>. <P><HR><P> <H1><A NAME="SEC71" HREF="festival_toc.html#TOC71">19 Duration</A></H1> <P> <A NAME="IDX250"></A> Several duration prediction modules are available, with varying levels of sophistication. </P> <P> Segmental duration prediction is done by the module <CODE>Duration</CODE>, which calls a different method depending on the parameter <CODE>Duration_Method</CODE>. </P> <P> <A NAME="IDX251"></A> All of the following duration methods may be further affected by both a global duration stretch and a per-word one. </P> <P> If the parameter <CODE>Duration_Stretch</CODE> is set, all absolute durations predicted by any of the duration methods described here are multiplied by the parameter's value. For example, <PRE> (Parameter.set 'Duration_Stretch 1.2) </PRE> <P> will make every segment 20% longer, so everything is spoken more slowly. </P> <P> <A NAME="IDX252"></A> In addition to the global stretch, if the feature <CODE>dur_stretch</CODE> on the related <CODE>Token</CODE> is set, it is also used as a multiplicative factor on the duration produced by the selected method. That is, <CODE>R:Syllable.parent.parent.R:Token.parent.dur_stretch</CODE>. There is a Lisp function <CODE>duration_find_stretch</CODE> which returns the combined global and local duration stretch factor for a given segment item. </P> <P> Note that these global and local methods of affecting the duration produced by models are crude and should be considered hacks. Uniform modification of durations is not what happens in real speech. 
These parameters are typically used when the underlying duration method is lacking in some way. Nevertheless, they can be useful. </P> <P> Note that it is quite easy to implement new duration methods directly in Scheme. </P> <H2><A NAME="SEC72" HREF="festival_toc.html#TOC72">19.1 Default durations</A></H2> <P> <A NAME="IDX253"></A> If parameter <CODE>Duration_Method</CODE> is set to <CODE>Default</CODE>, the simplest duration model is used. All segments are 100 milliseconds long (this can be modified by <CODE>Duration_Stretch</CODE> and/or the localised Token-related <CODE>dur_stretch</CODE> feature). </P> <H2><A NAME="SEC73" HREF="festival_toc.html#TOC73">19.2 Average durations</A></H2> <P> If parameter <CODE>Duration_Method</CODE> is set to <CODE>Averages</CODE>, segmental durations are set to their averages. The variable <CODE>phoneme_durations</CODE> should be an a-list of phones and average durations in seconds. The file <TT>`lib/mrpa_durs.scm'</TT> has an example for the mrpa phoneset. </P> <P> If a segment is found that does not appear in the list, a default duration of 0.1 seconds is assigned and a warning message is generated. </P> <H2><A NAME="SEC74" HREF="festival_toc.html#TOC74">19.3 Klatt durations</A></H2> <P> <A NAME="IDX254"></A> If parameter <CODE>Duration_Method</CODE> is set to <CODE>Klatt</CODE>, the duration rules from the Klatt book (<CITE>allen87</CITE>, chapter 9) are used. This method requires minimum and inherent durations for each phoneme in the phoneset. This information is held in the variable <CODE>duration_klatt_params</CODE>. Each member of this list is a three-tuple of phone name, inherent duration and minimum duration. An example for the mrpa phoneset is in <TT>`lib/klatt_durs.scm'</TT>. </P> <H2><A NAME="SEC75" HREF="festival_toc.html#TOC75">19.4 CART durations</A></H2> <P> Two very similar methods of duration prediction by CART tree are supported. 
The first, used when parameter <CODE>Duration_Method</CODE> is <CODE>Tree</CODE>, simply predicts durations directly for each segment. The tree is set in the variable <CODE>duration_cart_tree</CODE>. </P> <P> The second, which seems to give better results, is used when parameter <CODE>Duration_Method</CODE> is <CODE>Tree_ZScores</CODE>. In this second model the tree predicts zscores (number of standard deviations from the mean) rather than durations directly. (This follows <CITE>campbell91</CITE>, but we don't deal in syllable durations here.) This method requires means and standard deviations for each phone. The variable <CODE>duration_cart_tree</CODE> should contain the zscore prediction tree, and the variable <CODE>duration_ph_info</CODE> should contain a list of phone, mean duration, and standard deviation for each phone in the phoneset. </P> <P> An example tree trained from 460 sentences spoken by Gordon is in <TT>`lib/gswdurtreeZ'</TT>. Phone means and standard deviations are in <TT>`lib/gsw_durs.scm'</TT>. </P> <P> After prediction the segmental duration is calculated by the simple formula <PRE> duration = mean + (zscore * standard deviation) </PRE> <P> This method has also been used for other duration models that modify an inherent duration by some factor. If the tree predicts factors rather than zscores, and the <CODE>duration_ph_info</CODE> entries are phone, 0.0, inherent duration, the above formula will generate the desired result: the mean term is zero, so the duration is the predicted factor times the inherent duration. Klatt and Klatt-like rules can be implemented in this way without adding a new method. </P> </BODY> </HTML>