<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52 from ../festival.texi on 2 August 2001 -->
<TITLE>Festival Speech Synthesis System - 14 Utterances</TITLE>
</HEAD>
<BODY bgcolor="#ffffff">
Go to the <A HREF="festival_1.html">first</A>, <A HREF="festival_13.html">previous</A>, <A HREF="festival_15.html">next</A>, <A HREF="festival_35.html">last</A> section, <A HREF="festival_toc.html">table of contents</A>.
<P><HR><P>
<H1><A NAME="SEC48" HREF="festival_toc.html#TOC48">14 Utterances</A></H1>
<P>
<A NAME="IDX187"></A>
The utterance structure lies at the heart of Festival. This chapter describes its basic form and the functions available to manipulate it.
</P>
<H2><A NAME="SEC49" HREF="festival_toc.html#TOC49">14.1 Utterance structure</A></H2>
<P>
<A NAME="IDX188"></A> <A NAME="IDX189"></A>
Festival's basic object for synthesis is the <EM>utterance</EM>. An utterance represents some chunk of text that is to be rendered as speech. You may usually think of it as a sentence, but in many cases it won't actually conform to the standard linguistic syntactic form of a sentence. In general the process of text to speech is to take an utterance which contains a simple string of characters and convert it step by step, filling out the utterance structure with more information until a waveform is built that says what the text contains.
</P>
<P>
The processes involved in conversion are, in general, as follows:
<DL COMPACT>
<DT><EM>Tokenization</EM>
<DD>
Converting the string of characters into a list of tokens. Typically this means whitespace-separated tokens of the original text string.
<DT><EM>Token identification</EM>
<DD>
Identification of general types for the tokens. Usually this is trivial, but it requires some work to identify tokens of digits as years, dates, numbers etc.
<DT><EM>Token to word</EM>
<DD>
Convert each token to zero or more words, expanding numbers, abbreviations etc.
<DT><EM>Part of speech</EM>
<DD>
Identify the syntactic part of speech for the words.
<DT><EM>Prosodic phrasing</EM>
<DD>
Chunk utterance into prosodic phrases.
<DT><EM>Lexical lookup</EM>
<DD>
Find the pronunciation of each word from a lexicon/letter to sound rule system including phonetic and syllable structure.
<DT><EM>Intonational accents</EM>
<DD>
Assign intonation accents to appropriate syllables.
<DT><EM>Assign duration</EM>
<DD>
Assign duration to each phone in the utterance.
<DT><EM>Generate F0 contour (tune)</EM>
<DD>
Generate tune based on accents etc.
<DT><EM>Render waveform</EM>
<DD>
Render waveform from phones, duration and F0 target values. This itself may take several steps including unit selection (be they diphones or other sized units), imposition of desired prosody (duration and F0) and waveform reconstruction.
</DL>
<P>
The number of steps and what actually happens may vary and is dependent on the particular voice selected and the utterance's <EM>type</EM>, see below.
</P>
<P>
Each of these steps in Festival is achieved by a <EM>module</EM> which will typically add new information to the utterance structure.
</P>
<P>
<A NAME="IDX190"></A> <A NAME="IDX191"></A> <A NAME="IDX192"></A>
An utterance structure consists of a set of <EM>items</EM> which may be part of one or more <EM>relations</EM>. Items represent things like words and phones, though they may also be used to represent less concrete objects like noun phrases and nodes in metrical trees. An item contains a set of features (name and value pairs). Relations are typically simple lists of items or trees of items. For example the <CODE>Word</CODE> relation is a simple list of items each of which represents a word in the utterance. Those words will also be in other relations, such as the <EM>SylStructure</EM> relation where the word will be the top of a tree structure containing its syllables and segments.
</P>
<P>
Unlike in previous versions of the system, items (then called stream items) are not in any particular relation (or stream).
They are merely part of the relations they are within. Importantly this allows much more general relations to be made over items than was allowed in the previous system. This new architecture is the continuation of our goal of providing a general efficient structure for representing complex interrelated utterance objects.
</P>
<P>
<A NAME="IDX193"></A>
The architecture is fully general and new items and relations may be defined at run time, such that new modules may use any relations they wish. However within our standard English (and other) voices we have used a specific set of relations as follows.
<DL COMPACT>
<DT><EM>Token</EM>
<DD>
a list of trees. This is first formed as a list of tokens found in a character text string. Each root's daughters are the <EM>Word</EM>s that the token is related to.
<DT><EM>Word</EM>
<DD>
a list of words. These items will also appear as daughters (leaf nodes) of the <CODE>Token</CODE> relation. They may also appear in the <CODE>Syntax</CODE> relation (as leafs) if the parser is used. They will also be leafs of the <CODE>Phrase</CODE> relation.
<DT><EM>Phrase</EM>
<DD>
a list of trees. This is a list of phrase roots whose daughters are the <CODE>Word</CODE>s within those phrases.
<DT><EM>Syntax</EM>
<DD>
a single tree. This, if the probabilistic parser is called, is a syntactic binary branching tree over the members of the <CODE>Word</CODE> relation.
<DT><EM>SylStructure</EM>
<DD>
a list of trees. This links the <CODE>Word</CODE>, <CODE>Syllable</CODE> and <CODE>Segment</CODE> relations. Each <CODE>Word</CODE> is the root of a tree whose immediate daughters are its syllables, whose daughters in turn are their segments.
<DT><EM>Syllable</EM>
<DD>
a list of syllables. Each member will also be in the <CODE>SylStructure</CODE> relation. In that relation its parent will be the word it is in and its daughters will be the segments that are in it.
Syllables are also in the <CODE>Intonation</CODE> relation giving links to their related intonation events.
<DT><EM>Segment</EM>
<DD>
a list of segments (phones). Each member (except silences) will be a leaf node in the <CODE>SylStructure</CODE> relation. These may also be in the <CODE>Target</CODE> relation linking them to F0 target points.
<DT><EM>IntEvent</EM>
<DD>
a list of intonation events (accents and boundaries). These are related to syllables through the <CODE>Intonation</CODE> relation as leafs on that relation. Thus their parent in the <CODE>Intonation</CODE> relation is the syllable these events are attached to.
<DT><EM>Intonation</EM>
<DD>
a list of trees relating syllables to intonation events. Roots of the trees in <CODE>Intonation</CODE> are <CODE>Syllables</CODE> and their daughters are <CODE>IntEvents</CODE>.
<DT><EM>Wave</EM>
<DD>
a single item with a feature called <CODE>wave</CODE> whose value is the generated waveform.
</DL>
<P>
This is a non-exhaustive list: some modules may add other relations and not all utterances may have all these relations, but the above is the general case.
</P>
<H2><A NAME="SEC50" HREF="festival_toc.html#TOC50">14.2 Utterance types</A></H2>
<P>
<A NAME="IDX194"></A> <A NAME="IDX195"></A> <A NAME="IDX196"></A>
The primary purpose of types is to define which modules are to be applied to an utterance. <CODE>UttTypes</CODE> are defined in <TT>`lib/synthesis.scm'</TT>. The function <CODE>defUttType</CODE> defines which modules are to be applied to an utterance of that type. The function <CODE>utt.synth</CODE> applies this list of modules to an utterance before waveform synthesis is called.
</P>
<P>
For example when a <CODE>Segments</CODE> type utterance is synthesized it needs only have its values loaded into a <CODE>Segment</CODE> relation and a <CODE>Target</CODE> relation, then the low level waveform synthesis module <CODE>Wave_Synth</CODE> is called.
This is defined as follows:
<PRE>
(defUttType Segments
  (Initialize utt)
  (Wave_Synth utt))
</PRE>
<P>
A more complex type is the <CODE>Text</CODE> type utterance, which requires many more modules to be called before a waveform can be synthesized:
<PRE>
(defUttType Text
  (Initialize utt)
  (Text utt)
  (Token utt)
  (POS utt)
  (Phrasify utt)
  (Word utt)
  (Intonation utt)
  (Duration utt)
  (Int_Targets utt)
  (Wave_Synth utt)
)
</PRE>
<P>
<A NAME="IDX197"></A>
The <CODE>Initialize</CODE> module should normally be called for all types. It loads the necessary relations from the input form and deletes all other relations (if any exist) ready for synthesis.
</P>
<P>
Modules may be directly defined as C/C++ functions and declared with a Lisp name, or as simple functions in Lisp that check some global parameter before calling a specific module (e.g. choosing between different intonation modules).
</P>
<P>
These types are used when calling the function <CODE>utt.synth</CODE>, and individual modules may be called explicitly by hand if required.
</P>
<P>
<A NAME="IDX198"></A> <A NAME="IDX199"></A>
Because we expect waveform synthesis methods to themselves become complex, with a defined set of functions to select, join, and modify units, we now support an additional notion of <CODE>SynthTypes</CODE>. Like <CODE>UttTypes</CODE>, these define a set of functions to apply to an utterance. These may be defined using the <CODE>defSynthType</CODE> function. For example
<PRE>
(defSynthType Festival
  (print "synth method Festival")

  (print "select")
  (simple_diphone_select utt)

  (print "join")
  (cut_unit_join utt)

  (print "impose")
  (simple_impose utt)
  (simple_power utt)

  (print "synthesis")
  (frames_lpc_synthesis utt)
)
</PRE>
<P>
A <CODE>SynthType</CODE> is selected by naming it as the value of the parameter <CODE>Synth_Method</CODE>.
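</P>
<P>
As a minimal sketch of how these pieces fit together, the following selects the <CODE>Festival</CODE> <CODE>SynthType</CODE> defined above and then synthesizes a <CODE>Text</CODE> utterance. The variable name <CODE>utt1</CODE> is only illustrative, and the sketch assumes that the current voice provides the functions named in that <CODE>SynthType</CODE>.
<PRE>
;; Select the SynthType by naming it as the value of Synth_Method
;; (assumes the Festival SynthType shown above has been defined)
(Parameter.set 'Synth_Method 'Festival)

;; Build a Text utterance and apply the modules listed for its UttType
(set! utt1 (Utterance Text "This is an example"))
(utt.synth utt1)
</PRE>
<P>
Which modules actually run depends on the utterance type and on the voice that is selected.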
</P>
<P>
<A NAME="IDX200"></A> <A NAME="IDX201"></A> <A NAME="IDX202"></A> <A NAME="IDX203"></A> <A NAME="IDX204"></A>
During the application of the function <CODE>utt.synth</CODE> there are three hooks applied. This allows additional control of the synthesis process. <CODE>before_synth_hooks</CODE> is applied before any modules are applied. <CODE>after_analysis_hooks</CODE> is applied at the start of <CODE>Wave_Synth</CODE> when all text, linguistic and prosodic processing have been done. <CODE>after_synth_hooks</CODE> is applied after all modules have been applied. These are useful for things such as altering the volume of a voice that happens to be quieter than others, or outputting information for a talking head before waveform synthesis occurs so that preparation of the facial frames and synthesis of the waveform may be done in parallel. (See <TT>`festival/examples/th-mode.scm'</TT> for an example use of these hooks for a talking head text mode.)
</P>
<H2><A NAME="SEC51" HREF="festival_toc.html#TOC51">14.3 Example utterance types</A></H2>
<P>
<A NAME="IDX205"></A>
A number of utterance types are currently supported. It is easy to add new ones but the standard distribution includes the following.
</P>
<DL COMPACT>
<DT><CODE>Text</CODE>
<DD>
<A NAME="IDX206"></A>
Raw text as a string.
<PRE>
(Utterance Text "This is an example")
</PRE>
<DT><CODE>Words</CODE>
<DD>
<A NAME="IDX207"></A>
A list of words.
<PRE>
(Utterance Words (this is an example))
</PRE>
Words may be atomic or lists if further features need to be specified. For example to specify a word and its part of speech you can use
<PRE>
(Utterance Words (I (live (pos v)) in (Reading (pos n) (tone H-H%))))
</PRE>
Note: the use of the tone feature requires an intonation mode that supports it. Any feature and value named in the input will be added to the Word item.
<DT><CODE>Phrase</CODE>
<DD>
This allows explicit phrasing and features on Tokens to be specified.
The input consists of a list of phrases, each of which contains a list of tokens.
<PRE>
(Utterance
  Phrase
  ((Phrase ((name B))
     I saw the man
     (in ((EMPH 1)))
     the park)
   (Phrase ((name BB))
     with the telescope)))
</PRE>
ToBI tones and accents may also be specified on Tokens but these will only take effect if the selected intonation method uses them.
<DT><CODE>Segments</CODE>
<DD>
<A NAME="IDX208"></A>
This allows specification of segments, durations and F0 target values.
<PRE>
(Utterance
  Segments
  ((# 0.19 )
   (h 0.055 (0 115))
   (@ 0.037 (0.018 136))
   (l 0.064 )
   (ou 0.208 (0.0 134) (0.100 135) (0.208 123))
   (# 0.19)))
</PRE>
Note the times are in <EM>seconds</EM> NOT milliseconds. The format of each segment entry is segment name, duration in seconds, and list of target values. Each target value consists of a pair: a point within the segment (in seconds) and an F0 value in Hz.
<DT><CODE>Phones</CODE>
<DD>
<A NAME="IDX209"></A>
This allows a simple specification of a list of phones. Synthesis uses fixed durations (specified in <CODE>FP_duration</CODE>, default 100 ms) and monotone intonation (specified in <CODE>FP_F0</CODE>, default 120 Hz). This may be used for simple checks for waveform synthesizers etc.
<PRE>
(Utterance Phones (# h @ l ou #))
</PRE>
<A NAME="IDX210"></A>
Note the function <CODE>SayPhones</CODE> allows synthesis and playing of lists of phones through this utterance type.
<DT><CODE>Wave</CODE>
<DD>
<A NAME="IDX211"></A>
A waveform file. Synthesis here simply involves loading the file.
<PRE>
(Utterance Wave fred.wav)
</PRE>
</DL>
<P>
<A NAME="IDX212"></A> <A NAME="IDX213"></A>
Others are supported, as defined in <TT>`lib/synthesis.scm'</TT>, but are used internally by various parts of the system. These include <CODE>Tokens</CODE> used in TTS and <CODE>SegF0</CODE> used by <CODE>utt.resynth</CODE>.
</P>
<H2><A NAME="SEC52" HREF="festival_toc.html#TOC52">14.4 Utterance modules</A></H2>
<P>
<A NAME="IDX214"></A>
The module is the basic unit that does the work of synthesis.
Within Festival there are duration modules, intonation modules, wave synthesis modules etc. As stated above the utterance type defines the set of modules which are to be applied to the utterance. These modules in turn will create relations and items so that ultimately a waveform is generated, if required.
</P>
<P>
<A NAME="IDX215"></A>
Many of the chapters in this manual are solely concerned with particular modules in the system. Note that many modules have internal choices, such as which duration method to use or which intonation method to use. Such general choices are often made through the <CODE>Parameter</CODE> system. Parameters may be set for different features like <CODE>Duration_Method</CODE>, <CODE>Synth_Method</CODE> etc. Formerly the values for these parameters were atomic values but now they may be the functions themselves. For example, to select the Klatt duration rules
<PRE>
(Parameter.set 'Duration_Method Duration_Klatt)
</PRE>
<P>
This allows new modules to be added without requiring changes to the central Lisp functions such as <CODE>Duration</CODE>, <CODE>Intonation</CODE>, and <CODE>Wave_Synth</CODE>.
</P>
<H2><A NAME="SEC53" HREF="festival_toc.html#TOC53">14.5 Accessing an utterance</A></H2>
<P>
There are a number of standard functions that allow one to access parts of an utterance and traverse through it.
</P>
<P>
<A NAME="IDX216"></A> <A NAME="IDX217"></A>
Functions exist in Lisp (and of course C++) for accessing an utterance. The Lisp access functions are
<DL COMPACT>
<DT><SAMP>`(utt.relationnames UTT)'</SAMP>
<DD>
returns a list of the names of the relations currently created in <CODE>UTT</CODE>.
<DT><SAMP>`(utt.relation.items UTT RELATIONNAME)'</SAMP>
<DD>
returns a list of all items in <CODE>RELATIONNAME</CODE> in <CODE>UTT</CODE>. This is nil if no relation of that name exists. Note that for a tree relation this will give the items in pre-order.
<DT><SAMP>`(utt.relation_tree UTT RELATIONNAME)'</SAMP>
<DD>
A Lisp tree representation of the items in <CODE>RELATIONNAME</CODE> in <CODE>UTT</CODE>. The Lisp bracketing reflects the tree structure in the relation.
<DT><SAMP>`(utt.relation.leafs UTT RELATIONNAME)'</SAMP>
<DD>
A list of all the leafs of the items in <CODE>RELATIONNAME</CODE> in <CODE>UTT</CODE>. Leafs are defined as those items with no daughters within that relation. For simple list relations <CODE>utt.relation.leafs</CODE> and <CODE>utt.relation.items</CODE> will return the same thing.
<DT><SAMP>`(utt.relation.first UTT RELATIONNAME)'</SAMP>
<DD>
returns the first item in <CODE>RELATIONNAME</CODE>. Returns <CODE>nil</CODE> if this relation contains no items.
<DT><SAMP>`(utt.relation.last UTT RELATIONNAME)'</SAMP>
<DD>
returns the last (the most next) item in <CODE>RELATIONNAME</CODE>. Returns <CODE>nil</CODE> if this relation contains no items.
<DT><SAMP>`(item.feat ITEM FEATNAME)'</SAMP>
<DD>
returns the value of feature <CODE>FEATNAME</CODE> in <CODE>ITEM</CODE>. <CODE>FEATNAME</CODE> may be a feature name, feature function name, or pathname (see below), allowing reference to other parts of the utterance this item is in.
<DT><SAMP>`(item.features ITEM)'</SAMP>
<DD>
Returns an assoc list of feature-value pairs of all local features on this item.
<DT><SAMP>`(item.name ITEM)'</SAMP>
<DD>
Returns the name of this <CODE>ITEM</CODE>. This could also be accessed as <CODE>(item.feat ITEM 'name)</CODE>.
<DT><SAMP>`(item.set_name ITEM NEWNAME)'</SAMP>
<DD>
Sets name on <CODE>ITEM</CODE> to be <CODE>NEWNAME</CODE>. This is equivalent to <CODE>(item.set_feat ITEM 'name NEWNAME)</CODE>.
<DT><SAMP>`(item.set_feat ITEM FEATNAME FEATVALUE)'</SAMP>
<DD>
set the value of <CODE>FEATNAME</CODE> to <CODE>FEATVALUE</CODE> in <CODE>ITEM</CODE>. <CODE>FEATNAME</CODE> should be a simple name and not refer to next, previous or other relations via links.
<DT><SAMP>`(item.relation ITEM RELATIONNAME)'</SAMP>
<DD>
Return the item as viewed from <CODE>RELATIONNAME</CODE>, or <CODE>nil</CODE> if <CODE>ITEM</CODE> is not in that relation.
<DT><SAMP>`(item.relationnames ITEM)'</SAMP>
<DD>
Return a list of relation names that this item is in.
<DT><SAMP>`(item.relationname ITEM)'</SAMP>
<DD>
Return the relation name that this item is currently being viewed as.
<DT><SAMP>`(item.next ITEM)'</SAMP>
<DD>
Return the next item in <CODE>ITEM</CODE>'s current relation, or <CODE>nil</CODE> if there is no next.
<DT><SAMP>`(item.prev ITEM)'</SAMP>
<DD>
Return the previous item in <CODE>ITEM</CODE>'s current relation, or <CODE>nil</CODE> if there is no previous.
<DT><SAMP>`(item.parent ITEM)'</SAMP>
<DD>
Return the parent of <CODE>ITEM</CODE> in <CODE>ITEM</CODE>'s current relation, or <CODE>nil</CODE> if there is no parent.
<DT><SAMP>`(item.daughter1 ITEM)'</SAMP>
<DD>
Return the first daughter of <CODE>ITEM</CODE> in <CODE>ITEM</CODE>'s current relation, or <CODE>nil</CODE> if there are no daughters.
<DT><SAMP>`(item.daughter2 ITEM)'</SAMP>
<DD>
Return the second daughter of <CODE>ITEM</CODE> in <CODE>ITEM</CODE>'s current relation, or <CODE>nil</CODE> if there is no second daughter.
<DT><SAMP>`(item.daughtern ITEM)'</SAMP>
<DD>
Return the last daughter of <CODE>ITEM</CODE> in <CODE>ITEM</CODE>'s current relation, or <CODE>nil</CODE> if there are no daughters.
<DT><SAMP>`(item.leafs ITEM)'</SAMP>
<DD>
Return a list of all leaf items (those with no daughters) dominated by this item.
<DT><SAMP>`(item.next_leaf ITEM)'</SAMP>
<DD>
Find the next item in this relation that has no daughters. Note this may traverse up the tree from this point to search for such an item.
</DL>
<P>
As of version 1.2 the utterance structure may be fully manipulated from Scheme.
Relations and items may be created and deleted as easily as they can in C++:
<DL COMPACT>
<DT><SAMP>`(utt.relation.present UTT RELATIONNAME)'</SAMP>
<DD>
returns <CODE>t</CODE> if relation named <CODE>RELATIONNAME</CODE> is present, <CODE>nil</CODE> otherwise.
<DT><SAMP>`(utt.relation.create UTT RELATIONNAME)'</SAMP>
<DD>
Creates a new relation called <CODE>RELATIONNAME</CODE>. If this relation already exists it is deleted first and items in the relation are dereferenced from it (deleting the items if they are no longer referenced by any relation). Thus creating a relation guarantees an empty relation.
<DT><SAMP>`(utt.relation.delete UTT RELATIONNAME)'</SAMP>
<DD>
Deletes the relation called <CODE>RELATIONNAME</CODE> in <CODE>UTT</CODE>. All items in that relation are dereferenced from the relation and if they are no longer in any relation the items themselves are deleted.
<DT><SAMP>`(utt.relation.append UTT RELATIONNAME ITEM)'</SAMP>
<DD>
Append <CODE>ITEM</CODE> to the end of the relation named <CODE>RELATIONNAME</CODE> in <CODE>UTT</CODE>. Returns <CODE>nil</CODE> if there is no relation named <CODE>RELATIONNAME</CODE> in <CODE>UTT</CODE>, otherwise returns the item appended. This new item becomes the last in the top list. <CODE>ITEM</CODE> may be an item itself (in this or another relation) or a Lisp description of an item, which consists of a list containing a name and a set of feature value pairs. If <CODE>ITEM</CODE> is <CODE>nil</CODE> or unspecified a new empty item is added. If <CODE>ITEM</CODE> is already in this relation it is dereferenced from its current position (and an empty item re-inserted).
<DT><SAMP>`(item.insert ITEM1 ITEM2 DIRECTION)'</SAMP>
<DD>
Insert <CODE>ITEM2</CODE> into <CODE>ITEM1</CODE>'s relation in the direction specified by <CODE>DIRECTION</CODE>. <CODE>DIRECTION</CODE> may take the values <CODE>before</CODE>, <CODE>after</CODE>, <CODE>above</CODE> and <CODE>below</CODE>. If unspecified, <CODE>after</CODE> is assumed.
Note it is not recommended to insert above and below; the functions <CODE>item.insert_parent</CODE> and <CODE>item.append_daughter</CODE> should normally be used for tree building. Inserting using <CODE>before</CODE> and <CODE>after</CODE> within daughters is perfectly safe.
<DT><SAMP>`(item.append_daughter PARENT DAUGHTER)'</SAMP>
<DD>
Append <CODE>DAUGHTER</CODE>, an item or a description of an item, to the item <CODE>PARENT</CODE> in the <CODE>PARENT</CODE>'s relation.
<DT><SAMP>`(item.insert_parent DAUGHTER NEWPARENT)'</SAMP>
<DD>
Insert a new parent above <CODE>DAUGHTER</CODE>. <CODE>NEWPARENT</CODE> may be an item or the description of an item.
<DT><SAMP>`(item.delete ITEM)'</SAMP>
<DD>
Delete this item from all relations it is in. All daughters of this item in each relation are also removed from the relation (which may in turn cause them to be deleted if they cease to be referenced by any other relation).
<DT><SAMP>`(item.relation.remove ITEM)'</SAMP>
<DD>
Remove this item, and any of its daughters, from this relation. Other relations this item is in remain untouched.
<DT><SAMP>`(item.move_tree FROM TO)'</SAMP>
<DD>
Move the item <CODE>FROM</CODE> to the position of <CODE>TO</CODE> in <CODE>TO</CODE>'s relation. <CODE>FROM</CODE> will often be in the same relation as <CODE>TO</CODE> but that isn't necessary. The contents of <CODE>TO</CODE> are dereferenced, its daughters are saved, then the descendants of <CODE>FROM</CODE> are recreated under the new <CODE>TO</CODE>, then <CODE>TO</CODE>'s previous daughters are dereferenced. The order of this is important as <CODE>FROM</CODE> may be part of <CODE>TO</CODE>'s descendants. Note that if <CODE>TO</CODE> is part of <CODE>FROM</CODE>'s descendants no moving occurs and <CODE>nil</CODE> is returned.
For example to remove all punctuation terminal nodes in the Syntax relation the call would be something like
<PRE>
(define (syntax_remove_punc p)
  ;; If the second daughter is punctuation, move the first daughter
  ;; up into p's position; otherwise recurse into p's daughters.
  (if (string-equal "punc" (item.feat (item.daughter2 p) "pos"))
      (item.move_tree (item.daughter1 p) p)
      (mapcar syntax_remove_punc (item.daughters p))))
</PRE>
<DT><SAMP>`(item.exchange_trees ITEM1 ITEM2)'</SAMP>
<DD>
Exchange <CODE>ITEM1</CODE> and <CODE>ITEM2</CODE> and their descendants in <CODE>ITEM2</CODE>'s relation. If <CODE>ITEM1</CODE> is within <CODE>ITEM2</CODE>'s descendants or vice versa <CODE>nil</CODE> is returned and no exchange takes place. If <CODE>ITEM1</CODE> is not in <CODE>ITEM2</CODE>'s relation, no exchange takes place.
</DL>
<P>
Daughters of a node are actually represented as a list whose first daughter is double linked to the parent. Although being aware of this structure may be useful, it is recommended that all access go through the tree specific functions <CODE>*.parent</CODE> and <CODE>*.daughter*</CODE> which properly deal with the structure; thus if the internal structure ever changes in the future only these tree access functions need be updated.
</P>
<P>
With the above functions quite elaborate utterance manipulations can be performed, for example in post-lexical rules, where modifications to the segments are required based on the words and their context. See section <A HREF="festival_13.html#SEC47">13.8 Post-lexical rules</A> for an example of using various utterance access functions.
</P>
<H2><A NAME="SEC54" HREF="festival_toc.html#TOC54">14.6 Features</A></H2>
<P>
<A NAME="IDX218"></A>
In previous versions items had a number of predefined features. This is no longer the case and all features are optional. In particular the <CODE>start</CODE> and <CODE>end</CODE> features are no longer fixed, though those names are still used in the relations where they are appropriate. Specific functions are provided for the <CODE>name</CODE> feature but they are just shorthand for normal feature access.
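</P>
<P>
For instance, the following sketch reads and sets the <CODE>name</CODE> feature both ways. It assumes that <CODE>utt1</CODE> (an illustrative variable name) holds an utterance that has already been synthesized, so that its <CODE>Word</CODE> relation is not empty.
<PRE>
;; First item in the Word relation (utt1 and w are illustrative names)
(set! w (utt.relation.first utt1 'Word))

;; These two calls return the same value
(item.name w)
(item.feat w 'name)

;; These two calls have the same effect
(item.set_name w "example")
(item.set_feat w 'name "example")
</PRE>
<P>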
Simple features directly access the features in the underlying <CODE>EST_Feature</CODE> class in an item.
</P>
<P>
In addition to simple features there is a mechanism for relating functions to names, so accessing a feature may actually call a function. For example the feature <CODE>num_syls</CODE> is defined as a feature function which will count the number of syllables in the given word, rather than simply accessing a pre-existing feature. Feature functions are usually dependent on the particular relation the item is in, e.g. some feature functions are only appropriate for items in the <CODE>Word</CODE> relation, or only appropriate for those in the <CODE>IntEvent</CODE> relation.
</P>
<P>
The third aspect of feature names is a path component. These are parts of the name (each ending in <CODE>.</CODE>) that indicate some traversal of the utterance structure. For example the feature <CODE>name</CODE> will access the name feature on the given item. The feature <CODE>n.name</CODE> will return the name feature on the next item (in that item's relation). A number of basic direction operators are defined.
<DL COMPACT>
<DT><CODE>n.</CODE>
<DD>
next
<DT><CODE>p.</CODE>
<DD>
previous
<DT><CODE>nn.</CODE>
<DD>
next next
<DT><CODE>pp.</CODE>
<DD>
previous previous
<DT><CODE>parent.</CODE>
<DD>
parent
<DT><CODE>daughter1.</CODE>
<DD>
first daughter
<DT><CODE>daughter2.</CODE>
<DD>
second daughter
<DT><CODE>daughtern.</CODE>
<DD>
last daughter
<DT><CODE>first.</CODE>
<DD>
most previous item
<DT><CODE>last.</CODE>
<DD>
most next item
</DL>
<P>
You may also specify traversal to another relation, through the <CODE>R:&lt;relationname&gt;.</CODE> operator. For example, given an item in the <CODE>Syllable</CODE> relation, <CODE>R:SylStructure.parent.name</CODE> would give the name of the word the syllable is in.
</P>
<P>
Some more complex examples are as follows, assuming we are starting from an item in the <CODE>Syllable</CODE> relation.
<DL COMPACT>
<DT><SAMP>`stress'</SAMP>
<DD>
This item's lexical stress
<DT><SAMP>`n.stress'</SAMP>
<DD>
The next syllable's lexical stress
<DT><SAMP>`p.stress'</SAMP>
<DD>
The previous syllable's lexical stress
<DT><SAMP>`R:SylStructure.parent.name'</SAMP>
<DD>
The word this syllable is in
<DT><SAMP>`R:SylStructure.parent.R:Word.n.name'</SAMP>
<DD>
The word next to the word this syllable is in
<DT><SAMP>`n.R:SylStructure.parent.name'</SAMP>
<DD>
The word the next syllable is in
<DT><SAMP>`R:SylStructure.daughtern.ph_vc'</SAMP>
<DD>
The phonetic feature <CODE>vc</CODE> of the final segment in this syllable.
</DL>
<P>
A list of all feature functions is given in an appendix of this document. See section <A HREF="festival_32.html#SEC141">32 Feature functions</A>. New functions may also be added in Lisp.
</P>
<P>
In C++ feature values are of class <EM>EST_Val</EM> which may be a string, int, or a float (or any arbitrary object). In Scheme this distinction cannot always be made and sometimes when you expect an int you actually get a string. Care should be taken to ensure the right matching functions are used in Scheme. It is recommended you use <CODE>string-append</CODE> or <CODE>string-match</CODE> as they will always work.
</P>
<P>
If a pathname does not identify a valid path for the particular item (e.g. there is no next) <CODE>"0"</CODE> is returned.
</P>
<P>
<A NAME="IDX219"></A> <A NAME="IDX220"></A>
When collecting data from speech databases it is often useful to collect a whole set of features from all utterances in a database. These features can then be used for building various models (both CART tree models and linear regression models use these feature names).
</P>
<P>
A number of functions exist to help in this task. For example
<PRE>
(utt.features utt1 'Word '(name pos p.pos n.pos))
</PRE>
<P>
will return, for each word in the utterance, a list of the word and its part of speech context.
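</P>
<P>
The path operators described earlier in this section can also be used directly with <CODE>item.feat</CODE>. The sketch below, which assumes that <CODE>utt1</CODE> is a <CODE>Text</CODE> utterance that has already been synthesized, pairs the name of the word containing each syllable with that syllable's lexical stress.
<PRE>
;; For every syllable, collect (word-name stress);
;; utt1 is an illustrative variable name
(mapcar
 (lambda (syl)
   (list (item.feat syl "R:SylStructure.parent.name")
         (item.feat syl "stress")))
 (utt.relation.items utt1 'Syllable))
</PRE>
<P>
Since feature values may come back as strings, the string functions are the safer choice when comparing the results.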
</P>
<P>
See section <A HREF="festival_26.html#SEC118">26.2 Extracting features</A> for an example of extracting sets of features from a database for use in building stochastic models.
</P>
<H2><A NAME="SEC55" HREF="festival_toc.html#TOC55">14.7 Utterance I/O</A></H2>
<P>
A number of functions are available to allow an utterance's structure to be made available for other programs.
</P>
<P>
<A NAME="IDX221"></A> <A NAME="IDX222"></A>
The whole structure (all relations, items and features) may be saved in an ASCII format using the function <CODE>utt.save</CODE>. This file may be reloaded using the <CODE>utt.load</CODE> function. Note that the waveform is not saved in this format.
</P>
<P>
<A NAME="IDX223"></A> <A NAME="IDX224"></A> <A NAME="IDX225"></A> <A NAME="IDX226"></A> <A NAME="IDX227"></A> <A NAME="IDX228"></A>
Individual aspects of an utterance may be selectively saved. The waveform itself may be saved using the function <CODE>utt.save.wave</CODE>. This will save the waveform in the named file in the format specified in the <CODE>Parameter</CODE> <CODE>Wavefiletype</CODE>. All formats supported by the Edinburgh Speech Tools are valid including <CODE>nist</CODE>, <CODE>esps</CODE>, <CODE>sun</CODE>, <CODE>riff</CODE>, <CODE>aiff</CODE>, <CODE>raw</CODE> and <CODE>ulaw</CODE>. Note the functions <CODE>utt.wave.rescale</CODE> and <CODE>utt.wave.resample</CODE> may be used to change the gain and sample frequency of the waveform before saving it. A waveform may be imported into an existing utterance with the function <CODE>utt.import.wave</CODE>. This is specifically designed to allow external methods of waveform synthesis. However if you just wish to play an external wave or make it into an utterance you should consider the utterance <CODE>Wave</CODE> type.
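</P>
<P>
A short sketch of these saving functions follows. The file names and the variable <CODE>utt1</CODE> are illustrative, and the utterance is assumed to have been synthesized already so that it has a waveform.
<PRE>
;; Save all relations, items and features (but not the waveform)
(utt.save utt1 "utt1.utt")

;; Choose the waveform file format, then save the waveform itself
(Parameter.set 'Wavefiletype 'riff)
(utt.save.wave utt1 "utt1.wav")
</PRE>
<P>
The saved structure can later be reloaded with <CODE>utt.load</CODE>.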
</P>
<P>
<A NAME="IDX229"></A> <A NAME="IDX230"></A> <A NAME="IDX231"></A>
The segments of an utterance may be saved in a file using the function <CODE>utt.save.segs</CODE>, which saves the segments of the named utterance in xlabel format. Any other relation may also be saved using the more general <CODE>utt.save.relation</CODE>, which takes the additional argument of a relation name. The names of each item and the end feature of each item are saved in the named file, again in xlabel format; other features are saved in extra fields. For more elaborate saving methods you can easily write a Scheme function to save data in an utterance in whatever format is required. See the file <TT>`lib/mbrola.scm'</TT> for an example.
</P>
<P>
<A NAME="IDX232"></A> <A NAME="IDX233"></A>
A simple way to display an utterance in Entropic's Xwaves tool is provided by the function <CODE>display</CODE>. It simply saves the waveform and the segments and sends appropriate commands to (the already running) Xwaves and xlabel programs.
</P>
<P>
<A NAME="IDX234"></A> <A NAME="IDX235"></A>
A function to synthesize an externally specified utterance is provided by <CODE>utt.resynth</CODE>, which takes two filename arguments, an xlabel segment file and an F0 file. This function loads, synthesizes and plays an utterance synthesized from these files. The loading is provided by the underlying function <CODE>utt.load.segf0</CODE>.
</P>
</BODY>
</HTML>