<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52
     from ../festival.texi on 2 August 2001 -->

<TITLE>Festival Speech Synthesis System - 18  Intonation</TITLE>
</HEAD>
<BODY bgcolor="#ffffff">
Go to the <A HREF="festival_1.html">first</A>, <A HREF="festival_17.html">previous</A>, <A HREF="festival_19.html">next</A>, <A HREF="festival_35.html">last</A> section, <A HREF="festival_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC64" HREF="festival_toc.html#TOC64">18  Intonation</A></H1>

<P>
<A NAME="IDX245"></A>
Several intonation modules are available, offering
varying levels of control.  In general, intonation is generated
in two steps.

<OL>
<LI>Prediction of accents (and/or end tones) on a per-syllable basis.
<LI>Prediction of F0 target values.  This must be done after
durations are predicted.
</OL>

<P>
Reflecting this split, there are two main intonation modules that call
sub-modules depending on the desired intonation method.  The
<CODE>Intonation</CODE> and <CODE>Int_Targets</CODE> modules are defined in Lisp
(<TT>`lib/intonation.scm'</TT>) and call sub-modules which are (so far) in
C++.

</P>



<H2><A NAME="SEC65" HREF="festival_toc.html#TOC65">18.1  Default intonation</A></H2>

<P>
<A NAME="IDX246"></A>
<A NAME="IDX247"></A>
This is the simplest form of intonation and offers the modules
<CODE>Intonation_Default</CODE> and <CODE>Int_Targets_Default</CODE>.  The
first of these does nothing at all.
<CODE>Int_Targets_Default</CODE> simply creates one target at the start
of the utterance and one at the end.  By default these values
are 130 Hz and 110 Hz.  They may be set through the
parameter <CODE>duffint_params</CODE>; for example, the following will
generate a monotone at 150 Hz.

<PRE>
(set! duffint_params '((start 150) (end 150)))
(Parameter.set 'Int_Method 'DuffInt)
(Parameter.set 'Int_Target_Method Int_Targets_Default)
</PRE>



<H2><A NAME="SEC66" HREF="festival_toc.html#TOC66">18.2  Simple intonation</A></H2>

<P>
<A NAME="IDX248"></A>
This module uses the CART tree in <CODE>int_accent_cart_tree</CODE> to predict
whether each syllable is accented.  A predicted value of <CODE>NONE</CODE>
means no accent is generated by the corresponding <CODE>Int_Targets_Simple</CODE>
function.  Any other predicted value will cause a `hat' accent to be
put on that syllable.

</P>
<P>
A default <CODE>int_accent_cart_tree</CODE> is available as the value of
<CODE>simple_accent_cart_tree</CODE> in <TT>`lib/intonation.scm'</TT>.  It simply
predicts accents on the stressed syllables of polysyllabic content
words, and on the only syllable of single-syllable content
words.  Its form is

<PRE>
(set! simple_accent_cart_tree
 '
  ((R:SylStructure.parent.gpos is content)
   ((stress is 1)
    ((Accented))
    ((position_type is single)
     ((Accented))
     ((NONE))))
   ((NONE))))
</PRE>

<P>
The function <CODE>Int_Targets_Simple</CODE> uses parameters in the a-list
in the variable <CODE>int_simple_params</CODE>.  There are two interesting
parameters: <CODE>f0_mean</CODE>, which gives the mean F0 for this speaker
(default 110 Hz), and <CODE>f0_std</CODE>, which gives the standard deviation of
F0 for this speaker (default 25 Hz).  This second value is used
to determine the amount of variation to be put in the generated
targets.

</P>
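<P>
As a minimal sketch, the Simple method might be selected as follows.  The
a-list form of <CODE>int_simple_params</CODE> mirrors that of
<CODE>duffint_params</CODE> above.  The symbol values given for
<CODE>Int_Method</CODE> and <CODE>Int_Target_Method</CODE> are assumptions
based on how voices are commonly set up, so check them against
<TT>`lib/intonation.scm'</TT> in your installation.  The F0 values are
illustrative only.

</P>
<PRE>
;; Use the default accent tree shown above (hats on content words)
(set! int_accent_cart_tree simple_accent_cart_tree)
;; Illustrative speaker values: mean F0 120 Hz, standard deviation 30 Hz
(set! int_simple_params '((f0_mean 120) (f0_std 30)))
;; Assumed method names; see lib/intonation.scm
(Parameter.set 'Int_Method 'Simple)
(Parameter.set 'Int_Target_Method 'Simple)
</PRE>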
<P>
<A NAME="IDX249"></A>
For each Phrase in the given utterance an F0 contour is generated starting at
<CODE>f0_mean+(f0_std*0.6)</CODE> and declining <CODE>f0_std</CODE> Hz over the
length of the phrase until the last syllable, whose end is set to
<CODE>f0_mean-f0_std</CODE>.  An imaginary line called <CODE>baseline</CODE> is
drawn from the start to the end (minus the final extra fall).  For each
syllable that is accented (i.e. has an IntEvent related to it) three
targets are added: one at the start, one in mid vowel, and one at the
end.  The start and end are at position <CODE>baseline</CODE> Hz (as declined
for that syllable) and the mid vowel is set to <CODE>baseline+f0_std</CODE>.

</P>
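<P>
The arithmetic of the preceding paragraph can be sketched in plain Scheme,
reading the base value as the speaker's <CODE>f0_mean</CODE>.  The function
names here are hypothetical and are not part of Festival.  Measuring position
as a fraction of the phrase length, and treating a syllable as a single point
on the baseline, are simplifying assumptions; the C++ module may differ in
detail.

</P>
<PRE>
(define (simple_phrase_line f0_mean f0_std)
  "(simple_phrase_line F0_MEAN F0_STD)
Returns (START BASELINE_END FINAL_END) in Hz for one phrase."
  (let ((start (+ f0_mean (* f0_std 0.6))))
    (list start                  ; start of the phrase
          (- start f0_std)       ; baseline after declining f0_std Hz
          (- f0_mean f0_std))))  ; end of last syllable (extra final fall)

(define (simple_baseline_at f0_mean f0_std pos)
  "(simple_baseline_at F0_MEAN F0_STD POS)
Baseline in Hz at POS, a fraction (0.0 to 1.0) of the phrase length."
  (- (+ f0_mean (* f0_std 0.6)) (* f0_std pos)))

(define (simple_accent_targets f0_mean f0_std pos)
  "(simple_accent_targets F0_MEAN F0_STD POS)
Returns (START MID_VOWEL END) F0 values for an accented syllable at POS."
  (let ((base (simple_baseline_at f0_mean f0_std pos)))
    (list base (+ base f0_std) base)))
</PRE>

<P>
With the defaults (110 Hz and 25 Hz) the phrase starts at 125 Hz, the
baseline reaches 100 Hz, and the final syllable ends at 85 Hz.  An accent
halfway through the phrase sits on a baseline of 112.5 Hz and peaks at
137.5 Hz.

</P>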
<P>
Note this model is not supposed to be complex or comprehensive, but it
offers a very quick and easy way to generate something other than a
fixed line F0.  Something similar to this has been used for Spanish and Welsh
without (too many) people complaining.  However, it is not designed as a
serious intonation module.

</P>


<H2><A NAME="SEC67" HREF="festival_toc.html#TOC67">18.3  Tree intonation</A></H2>

<P>
This module is more flexible.  Two different CART trees can be used to
predict `accents' and `endtones'.  Although at present this module is
used for an implementation of the ToBI intonation labelling system, it
could be used for many different types of intonation system.

</P>
<P>
The target module for this method uses a Linear Regression model to
predict start, mid-vowel and end targets for each syllable using
arbitrarily specified features.  This follows the work described in
<CITE>black96</CITE>.  The LR models are held as described in
section <A HREF="festival_25.html#SEC115">25.5  Linear regression</A>.  The three models are held in the variables
<CODE>f0_lr_start</CODE>, <CODE>f0_lr_mid</CODE> and <CODE>f0_lr_end</CODE>.

</P>
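<P>
A voice might select this method along the following lines.  This is only a
sketch: apart from <CODE>int_accent_cart_tree</CODE> and the three
<CODE>f0_lr_*</CODE> variables, every name here (the <CODE>f2b_*</CODE> trees
and models, <CODE>int_tone_cart_tree</CODE>, <CODE>int_lr_params</CODE>,
<CODE>Intonation_Tree</CODE> and <CODE>Int_Targets_LR</CODE>) is an assumption
based on how the standard English voices are commonly configured, and the F0
figures are illustrative.  Check them against the voice definitions in your
installation.

</P>
<PRE>
;; Assumed: accent and end tone trees distributed with Festival
(set! int_accent_cart_tree f2b_int_accent_cart_tree)
(set! int_tone_cart_tree f2b_int_tone_cart_tree)
;; The three LR models named in this section (assumed f2b_* values)
(set! f0_lr_start f2b_f0_lr_start)
(set! f0_lr_mid f2b_f0_lr_mid)
(set! f0_lr_end f2b_f0_lr_end)
;; Assumed: map the model's F0 range onto the target speaker's range
(set! int_lr_params
      '((target_f0_mean 105) (target_f0_std 14)
        (model_f0_mean 170) (model_f0_std 34)))
;; Assumed module names for tree prediction and LR targets
(Parameter.set 'Int_Method Intonation_Tree)
(Parameter.set 'Int_Target_Method Int_Targets_LR)
</PRE>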


<H2><A NAME="SEC68" HREF="festival_toc.html#TOC68">18.4  Tilt intonation</A></H2>

<P>
Tilt description to be inserted.

</P>


<H2><A NAME="SEC69" HREF="festival_toc.html#TOC69">18.5  General intonation</A></H2>

<P>
As there seem to be a number of intonation theories that predict
F0 contours by rule (possibly using trained parameters), this
module aids the external specification of such rules for a wide
class of intonation theories (though primarily those that might
be referred to as the ToBI group).  It is designed to be multi-lingual
and to offer a quick way to port often pre-existing rules into Festival
without writing new C++ code.

</P>
<P>
The accent prediction part uses the same mechanism as the Simple
intonation method described above: a decision tree.  The tree in
the variable <CODE>int_accent_cart_tree</CODE> is applied to each
syllable to predict an <CODE>IntEvent</CODE>.

</P>
<P>
The target part calls a specified Scheme function which returns
a list of target points for a syllable.  In this way any arbitrary
tests may be done to produce the target points.  For example
here is a function which returns three target points
for each syllable with an <CODE>IntEvent</CODE> related to it (i.e. 
accented syllables).

<PRE>
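(define (targ_func1 utt syl)
  "(targ_func1 UTT STREAMITEM)
Returns a list of targets for the given syllable."
  (let ((start (item.feat syl 'syllable_start))   ; start time of syllable
        (end (item.feat syl 'syllable_end)))      ; end time of syllable
    ;; Only syllables whose IntEvent is named Accented get targets
    (if (equal? (item.feat syl "R:Intonation.daughter1.name") "Accented")
        (list
         (list start 110)                  ; 110 Hz at the start
         (list (/ (+ start end) 2.0) 140)  ; 140 Hz at the midpoint
         (list end 100)))))                ; 100 Hz at the end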
</PRE>

<P>
The following setup parameters identify this function as the
one to call.

<PRE>
(Parameter.set 'Int_Method 'General)
(Parameter.set 'Int_Target_Method Int_Targets_General)

(set! int_general_params
      (list 
       (list 'targ_func targ_func1)))
</PRE>



<H2><A NAME="SEC70" HREF="festival_toc.html#TOC70">18.6  Using ToBI</A></H2>

<P>
An example implementation of a ToBI to F0 target module is included in
<TT>`lib/tobi_rules.scm'</TT> based on the rules described in <CITE>jilka96</CITE>.
This uses the general intonation method discussed in the previous
section.  This is designed to be useful to people who are experimenting
with ToBI (<CITE>silverman92</CITE>), rather than general text to speech.

</P>
<P>
To use this method you need to load <TT>`lib/tobi_rules.scm'</TT> and
call <CODE>setup_tobi_f0_method</CODE>.  The default is in a male
pitch range, i.e. for <CODE>voice_rab_diphone</CODE>.  You can change
it for other pitch ranges by changing the following variables.

<PRE>
(Parameter.set 'Default_Topline 110)
(Parameter.set 'Default_Start_Baseline 87)
(Parameter.set 'Default_End_Baseline 83)
(Parameter.set 'Current_Topline (Parameter.get 'Default_Topline))
(Parameter.set 'Valley_Dip 75)
</PRE>
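<P>
One way to move to another pitch range is to scale all of these values
together.  The helper below is hypothetical and not part of Festival, and
uniform scaling is only an assumption about what makes a reasonable starting
point; individual values may still need tuning by ear.  It is probably safest
to call it after <TT>`lib/tobi_rules.scm'</TT> has been loaded, since loading
that file may set these parameters to their defaults.

</P>
<PRE>
(define (tobi_scale_pitch_range factor)
  "(tobi_scale_pitch_range FACTOR)
Hypothetical helper: multiply the default ToBI pitch values by FACTOR."
  (Parameter.set 'Default_Topline (* factor 110))
  (Parameter.set 'Default_Start_Baseline (* factor 87))
  (Parameter.set 'Default_End_Baseline (* factor 83))
  (Parameter.set 'Current_Topline (Parameter.get 'Default_Topline))
  (Parameter.set 'Valley_Dip (* factor 75)))

;; For example, raise every value by half for a higher-pitched voice
(tobi_scale_pitch_range 1.5)
</PRE>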

<P>
An example using this from STML is given in <TT>`examples/tobi.stml'</TT>.
It can also be used from Scheme.  For example, before
defining an utterance you should execute the following, either
from the command line or in some setup file.

<PRE>
(voice_rab_diphone)
(require 'tobi_rules)
(setup_tobi_f0_method)
</PRE>

<P>
In order to allow specification of accents, tones, and break levels
you must use an utterance type that allows such specification.  For
example

<PRE>
(Utterance 
 Words
 (boy
  (saw ((accent H*)))
  the
  (girl ((accent H*)))
  in the 
  (park ((accent H*) (tone H-)))
  with the 
  (telescope ((accent H*) (tone H-H%)))))

(Utterance Words 
 (The
  (boy ((accent L*)))
  saw
  the
  (girl ((accent H*) (tone L-)))
  with 
  the
  (telescope ((accent H*) (tone H-H%)))))
</PRE>

<P>
You can display the synthesized form of these utterances in
Xwaves.  Start an Xwaves and an Xlabeller and call the function
<CODE>display</CODE> on the synthesized utterance.

</P>
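<P>
Putting the pieces together, a session might look like the following sketch.
It assumes the ToBI setup shown earlier has already been executed, and the
variable name <CODE>utt1</CODE> is arbitrary.

</P>
<PRE>
(set! utt1
      (Utterance Words
       (The
        (boy ((accent L*)))
        saw
        the
        (girl ((accent H*) (tone L-)))
        with
        the
        (telescope ((accent H*) (tone H-H%))))))
(utt.synth utt1)   ; run the synthesis modules, including intonation
(utt.play utt1)    ; listen to the result
(display utt1)     ; view it in Xwaves, as described above
</PRE>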
<P><HR><P>
</BODY>
</HTML>