<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52 from ../festival.texi on 2 August 2001 -->
<TITLE>Festival Speech Synthesis System - 18 Intonation</TITLE>
</HEAD>
<BODY bgcolor="#ffffff">
Go to the <A HREF="festival_1.html">first</A>, <A HREF="festival_17.html">previous</A>, <A HREF="festival_19.html">next</A>, <A HREF="festival_35.html">last</A> section, <A HREF="festival_toc.html">table of contents</A>.
<P><HR><P>
<H1><A NAME="SEC64" HREF="festival_toc.html#TOC64">18 Intonation</A></H1>
<P>
<A NAME="IDX245"></A>
A number of different intonation modules are available, offering varying levels of control. In general, intonation is generated in two steps.
<OL>
<LI>Prediction of accents (and/or end tones) on a per-syllable basis.
<LI>Prediction of F0 target values; this must be done after durations are predicted.
</OL>
<P>
Reflecting this split, there are two main intonation modules, which call sub-modules depending on the desired intonation method. The <CODE>Intonation</CODE> and <CODE>Int_Targets</CODE> modules are defined in Lisp (<TT>`lib/intonation.scm'</TT>) and call sub-modules which are (so far) in C++.
</P>
<H2><A NAME="SEC65" HREF="festival_toc.html#TOC65">18.1 Default intonation</A></H2>
<P>
<A NAME="IDX246"></A>
<A NAME="IDX247"></A>
This is the simplest form of intonation and offers the modules <CODE>Intonation_Default</CODE> and <CODE>Int_Targets_Default</CODE>. The first of these does nothing at all. <CODE>Int_Targets_Default</CODE> simply creates one target at the start of the utterance and one at the end. By default, their values are 130 Hz and 110 Hz respectively. These values may be set through the variable <CODE>duffint_params</CODE>; for example, the following will generate a monotone at 150 Hz.
<PRE>
(set! 
duffint_params '((start 150) (end 150)))
(Parameter.set 'Int_Method 'DuffInt)
(Parameter.set 'Int_Target_Method Int_Targets_Default)
</PRE>
<H2><A NAME="SEC66" HREF="festival_toc.html#TOC66">18.2 Simple intonation</A></H2>
<P>
<A NAME="IDX248"></A>
This module uses the CART tree in <CODE>int_accent_cart_tree</CODE> to predict whether each syllable is accented. A predicted value of <CODE>NONE</CODE> means that no accent is generated by the corresponding <CODE>Int_Targets_Simple</CODE> function. Any other predicted value causes a `hat' accent to be put on that syllable.
</P>
<P>
A default <CODE>int_accent_cart_tree</CODE> is available in the variable <CODE>simple_accent_cart_tree</CODE> in <TT>`lib/intonation.scm'</TT>. It simply predicts accents on the stressed syllables of polysyllabic content words, and on the only syllable of single-syllable content words. Its form is
<PRE>
(set! simple_accent_cart_tree
 '
  ((R:SylStructure.parent.gpos is content)
   ((stress is 1)
    ((Accented))
    ((position_type is single)
     ((Accented))
     ((NONE))))
   ((NONE))))
</PRE>
<P>
The function <CODE>Int_Targets_Simple</CODE> uses parameters in the a-list in the variable <CODE>int_simple_params</CODE>. There are two interesting parameters: <CODE>f0_mean</CODE>, which gives the mean F0 for this speaker (default 110 Hz), and <CODE>f0_std</CODE>, which is the standard deviation of F0 for this speaker (default 25 Hz). This second value determines the amount of variation put into the generated targets.
</P>
<P>
<A NAME="IDX249"></A>
For each Phrase in the given utterance an F0 contour is generated. It starts at <CODE>f0_mean+(f0_std*0.6)</CODE> and declines by <CODE>f0_std</CODE> Hz over the length of the phrase, up to the last syllable, whose end is set to <CODE>f0_mean-f0_std</CODE>. An imaginary line called <CODE>baseline</CODE> is drawn from the start to the end (minus the final extra fall). For each syllable that is accented (i.e. has an IntEvent related to it) three targets are added. 
One is placed at the start of the syllable, one in mid vowel, and one at the end. The start and end targets are set to the <CODE>baseline</CODE> value in Hz (as declined for that syllable), and the mid-vowel target is set to <CODE>baseline+f0_std</CODE>.
</P>
<P>
Note that this model is not supposed to be complex or comprehensive, but it offers a very quick and easy way to generate something other than a fixed-line F0. Something similar to this has been used for Spanish and Welsh without (too many) people complaining. However, it is not designed as a serious intonation module.
</P>
<H2><A NAME="SEC67" HREF="festival_toc.html#TOC67">18.3 Tree intonation</A></H2>
<P>
This module is more flexible. Two different CART trees can be used to predict `accents' and `endtones'. Although at present this module is used for an implementation of the ToBI intonation labelling system, it could be used for many different types of intonation system.
</P>
<P>
The target module for this method uses a Linear Regression model to predict start, mid-vowel and end targets for each syllable using arbitrarily specified features. This follows the work described in <CITE>black96</CITE>. The LR models are held in the form described below; see section <A HREF="festival_25.html#SEC115">25.5 Linear regression</A>. Three models are used, held in the variables <CODE>f0_lr_start</CODE>, <CODE>f0_lr_mid</CODE> and <CODE>f0_lr_end</CODE>.
</P>
<H2><A NAME="SEC68" HREF="festival_toc.html#TOC68">18.4 Tilt intonation</A></H2>
<P>
Tilt description to be inserted.
</P>
<H2><A NAME="SEC69" HREF="festival_toc.html#TOC69">18.5 General intonation</A></H2>
<P>
As there seem to be a number of intonation theories that predict F0 contours by rule (possibly using trained parameters), this module aids the external specification of such rules for a wide class of intonation theories (though primarily those that might be referred to as the ToBI group). It is designed to be multi-lingual and to offer a quick way to port often pre-existing rules into Festival without writing new C++ code. 
</P>
<P>
The accent prediction part uses the same mechanism as the Simple intonation method described above: a decision tree. The tree in the variable <CODE>int_accent_cart_tree</CODE> is applied to each syllable to predict an <CODE>IntEvent</CODE>.
</P>
<P>
The target part calls a specified Scheme function which returns a list of target points for a syllable. In this way, arbitrary tests may be done to produce the target points. For example, here is a function which returns three target points for each syllable with an <CODE>IntEvent</CODE> related to it (i.e. accented syllables).
<PRE>
(define (targ_func1 utt syl)
  "(targ_func1 UTT STREAMITEM)
Returns a list of targets for the given syllable."
  (let ((start (item.feat syl 'syllable_start))
        (end (item.feat syl 'syllable_end)))
    ;; Only accented syllables get targets; others return nil.
    (if (equal? (item.feat syl "R:Intonation.daughter1.name") "Accented")
        (list
         (list start 110)
         (list (/ (+ start end) 2.0) 140)
         (list end 100)))))
</PRE>
<P>
The following settings identify this function as the one to call.
<PRE>
(Parameter.set 'Int_Method 'General)
(Parameter.set 'Int_Target_Method Int_Targets_General)
(set! int_general_params
      (list
       (list 'targ_func targ_func1)))
</PRE>
<H2><A NAME="SEC70" HREF="festival_toc.html#TOC70">18.6 Using ToBI</A></H2>
<P>
An example implementation of a ToBI to F0 target module is included in <TT>`lib/tobi_rules.scm'</TT>, based on the rules described in <CITE>jilka96</CITE>. This uses the general intonation method discussed in the previous section. It is designed to be useful to people who are experimenting with ToBI (<CITE>silverman92</CITE>), rather than for general text to speech.
</P>
<P>
To use this method you need to load <TT>`lib/tobi_rules.scm'</TT> and call <CODE>setup_tobi_f0_method</CODE>. The default is in a male's pitch range, i.e. for <CODE>voice_rab_diphone</CODE>. You can change it for other pitch ranges by changing the following variables. 
<PRE>
(Parameter.set 'Default_Topline 110)
(Parameter.set 'Default_Start_Baseline 87)
(Parameter.set 'Default_End_Baseline 83)
(Parameter.set 'Current_Topline (Parameter.get 'Default_Topline))
(Parameter.set 'Valley_Dip 75)
</PRE>
<P>
An example using this from STML is given in <TT>`examples/tobi.stml'</TT>, but it can also be used from Scheme. For example, before defining an utterance you should execute the following, either from the command line or in some setup file.
<PRE>
(voice_rab_diphone)
(require 'tobi_rules)
(setup_tobi_f0_method)
</PRE>
<P>
In order to allow specification of accents, tones, and break levels you must use an utterance type that allows such specification. For example
<PRE>
(Utterance
 Words
 (boy
  (saw ((accent H*)))
  the
  (girl ((accent H*)))
  in the
  (park ((accent H*) (tone H-)))
  with the
  (telescope ((accent H*) (tone H-H%)))))

(Utterance
 Words
 (The
  (boy ((accent L*)))
  saw the
  (girl ((accent H*) (tone L-)))
  with the
  (telescope ((accent H*) (tone H-H%)))))
</PRE>
<P>
You can display the synthesized form of these utterances in Xwaves. Start Xwaves and Xlabeller and call the function <CODE>display</CODE> on the synthesized utterance.
</P>
<P><HR><P>
Go to the <A HREF="festival_1.html">first</A>, <A HREF="festival_17.html">previous</A>, <A HREF="festival_19.html">next</A>, <A HREF="festival_35.html">last</A> section, <A HREF="festival_toc.html">table of contents</A>.
</BODY>
</HTML>