Sophie: festival-speechtools-devel-1.2.96-18.fc14 i686

  <sect1>
	<title>EST_Track class example code</title>

    <toc depth='1'></toc>
    <para>
 Some examples of track manipulations.
    </para>
    <sect2>
      <title>Initialising and Resizing a Track</title>
      <para>
The constructor functions can be used to create a track with
zero frames and channels or a track with a specified number of
frames and channels:
      </para>
      <programlisting arch='c'>    EST_Track tr;           // <lineannotation>default track declaration</lineannotation>
    EST_Track tra(500, 10); // <lineannotation>allocate track with 500 frames and 10 channels</lineannotation>      </programlisting>
      <para>
Tracks can be resized at any time:
      </para>
      <programlisting arch='c'>    tr.resize(10, 500); // <lineannotation>resize track to have 10 frames and 500 channels</lineannotation>
    tr.resize(500, 10); // <lineannotation>resize track to have 500 frames and 10 channels</lineannotation>      </programlisting>
      <para>
By default, resizing preserves the values in the track. This
may involve copying some information, so if the existing values
are not needed, a flag can be set, which usually results in
quicker resizing:
      </para>
      <programlisting arch='c'>    tr.resize(250, 5, 0);  // <lineannotation>throw away any existing values</lineannotation>      </programlisting>
      <para>
If only the number of channels or the number of frames needs
to be changed, this can be done with the following functions:
      </para>
      <programlisting arch='c'>    tr.set_num_channels(10);   // <lineannotation>makes 10 channels, keeps same no of frames</lineannotation>

    tr.set_num_frames(400);    // <lineannotation>makes 400 frames, keeps same no of channels</lineannotation>      </programlisting>
      <para>
The preserve flag works in the same way with these functions.
      </para>
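      <para>
The cost of preserving values can be pictured with a small sketch in
plain C++. It is not the library's implementation:
<function>resize_grid</function> and the Grid structure are hypothetical
names, and the row-major layout is an assumption made for illustration.
Preserving means copying the overlapping region into the new storage,
which is the work that is skipped when the old values are not needed.
      </para>
      <programlisting arch='c'>#include &lt;algorithm>
#include &lt;cstddef>
#include &lt;vector>

// Hypothetical stand-in for a track's amplitude storage, laid out
// row-major: one row per frame, one column per channel.
struct Grid {
    int frames = 0;
    int channels = 0;
    std::vector&lt;float> data;
};

// Resize g to new_frames x new_channels. With preserve set, values in
// the overlapping region are copied across; all other cells start at 0.
void resize_grid(Grid &amp;g, int new_frames, int new_channels, bool preserve = true)
{
    std::vector&lt;float> fresh(static_cast&lt;std::size_t>(new_frames) * new_channels, 0.0f);
    if (preserve) {
        const int keep_f = std::min(g.frames, new_frames);
        const int keep_c = std::min(g.channels, new_channels);
        for (int i = 0; i &lt; keep_f; ++i)
            for (int j = 0; j &lt; keep_c; ++j)
                fresh[i * new_channels + j] = g.data[i * g.channels + j];
    }
    g.frames = new_frames;
    g.channels = new_channels;
    g.data.swap(fresh);
}</programlisting>
      <para>
In this sketch, calling <function>resize_grid</function> with the
preserve argument set to false plays the role of the third argument
to <method>resize</method> shown earlier.
      </para>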
    </sect2>
    <sect2>
      <title>Simple Access</title>
      <para>
Values in the track can be accessed and set by frame
number and channel number.
The following resizes a track to have 500 frames and 10 channels
and fills every position with -5.
      </para>
      <programlisting arch='c'>    tr.resize(500, 10); 

    for (i = 0; i &lt; tr.num_frames(); ++i)
	for (j = 0; j &lt; tr.num_channels(); ++j)
	    tr.a(i, j) = -5.0;      </programlisting>
      <para>
A well-formed track will have a time value, specified in seconds,
for every frame. The time array can be filled directly:
      </para>
      <programlisting arch='c'>    for (i = 0; i &lt; tr.num_frames(); ++i)
	tr.t(i) = (float) i * 0.01;      </programlisting>
      <para>
which fills the time array with the values 0.00, 0.01,
0.02 ... 4.99. However, a shortcut function is provided for fixed
frame spacing:
      </para>
      <programlisting arch='c'>    tr.fill_time(0.01);      </programlisting>
      <para>
which assigns evenly spaced times using a frame shift of 0.01
seconds. Frames do not have to be evenly spaced: in pitch
synchronous processing the time array holds the time position of
each pitch period. In such cases each position in the time array
must be set individually.</para><para>
Some representations have undefined values during certain
sections of the track, for example the F0 value during
unvoiced speech.</para><para>
The break/value array can be used to specify whether a frame has an
undefined value. If a frame's entry in this array is 1,
the amplitude is defined at that point. If it is 0, the
amplitude is undefined. By default, every frame has a value.
</para><para>
Breaks (undefined values) can be set with
<method>set_break()</method>. The following marks every frame
from 50 to 99 as a break:
      </para>
      <programlisting arch='c'>    for (i = 50; i &lt; 100; ++i)
	tr.set_break(i);      </programlisting>
      <para>
Frames can be turned back into values as follows:
      </para>
      <programlisting arch='c'>    for (i = 50; i &lt; 100; ++i)
	tr.set_value(i);      </programlisting>
      <para>
It is up to individual functions to decide how to interpret breaks.
</para><para>
A frame's status can be checked as follows:
      </para>
      <programlisting arch='c'>    if (tr.val(60))
	cout &lt;&lt; "Frame 60 is not a break\n";

    if (tr.track_break(60))
	cout &lt;&lt; "Frame 60 is a break\n";      </programlisting>
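      <para>
As one illustration of how a function might interpret breaks, the
following plain C++ sketch averages a channel over its defined frames
only. It does not use the track class: the two vectors are hypothetical
stand-ins for one channel and for the break/value array, and
<function>mean_of_defined</function> is an illustrative name, not a
library function.
      </para>
      <programlisting arch='c'>#include &lt;cstddef>
#include &lt;vector>

// Mean of the frames whose flag is non-zero (defined). Frames flagged 0
// (breaks) are skipped. Returns fallback when no frame is defined.
float mean_of_defined(const std::vector&lt;float> &amp;values,
                      const std::vector&lt;int> &amp;is_value,
                      float fallback = 0.0f)
{
    double sum = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i &lt; values.size() &amp;&amp; i &lt; is_value.size(); ++i) {
        if (is_value[i] != 0) {
            sum += values[i];
            ++count;
        }
    }
    return count == 0 ? fallback : static_cast&lt;float>(sum / count);
}</programlisting>
      <para>
With a real track, the same loop could presumably be driven by
<method>val()</method> and <method>a()</method> as shown above.
      </para>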
    </sect2>
    <sect2 id='tr-example-naming-channels'>
      <title>Naming Channels</title>
      <para>
While channels can be accessed by their index, it is often useful
to give them names and refer to them by those names.
The set_channel_name() function sets the name of a single channel:
      </para>
      <programlisting arch='c'>    tr.set_channel_name("F0", 0); 
    tr.set_channel_name("energy", 1);       </programlisting>
      <para>
An alternative is to use a predefined set of channel names
stored in a <emphasis>map</emphasis>. A track map
is simply a list of strings (an EST_StrList) that describes a
channel name configuration. The <method>resize</method> function
can take a map, resize the track to the number of channels
listed in it, and give each channel its name from the
map. For example:
      </para>
      <programlisting arch='c'>    EST_StrList map;
    map.append("F0");
    map.append("energy");

    tr.resize(500, map); // <lineannotation>this makes a 2 channel track and sets the names to F0 and energy</lineannotation>      </programlisting>
      <para>
A convention is used for channels which comprise
components of a multi-dimensional analysis such as
cepstra. In such cases the channels are named
<replaceable>TYPE_I</replaceable>.  The last coefficient is
always named <replaceable>TYPE_N</replaceable> regardless of
the number of coefficients. This is very useful for extracting
a set of related channels without needing to know the order
of the analysis.
For example, a track map might look like:
      </para>
      <programlisting arch='c'>    map.clear();
    map.append("F0");
    map.append("energy");

    map.append("cep_0");
    map.append("cep_1");
    map.append("cep_2");
    map.append("cep_3");
    map.append("cep_4");
    map.append("cep_5");
    map.append("cep_6");
    map.append("cep_7");
    map.append("cep_N");

    tr.resize(500, map); // <lineannotation>makes an 11-channel track and sets the names</lineannotation>      </programlisting>
      <para>
This obviously gets unwieldy quite quickly, so the mapping
mechanism provides a short hand for multi-dimensional data.
      </para>
      <programlisting arch='c'>    map.clear();
    map.append("F0");
    map.append("energy");

    map.append("$cep-0+8");

    tr.resize(500, map); // <lineannotation>does exactly as above</lineannotation>      </programlisting>
      <para>
Here $ marks the entry as shorthand, "cep" is the name of the
coefficients, "-0" says that the first is number 0 and "+8" that
there are 8 more to follow.
      </para>
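      <para>
The expansion can be sketched in plain C++ as follows. This is only an
illustration of the naming rule, written under the assumption that the
shorthand always has the form "$NAME-FIRST+MORE" as in the example
above. <function>expand_entry</function> is a hypothetical helper, not
part of the library, and the real parser may accept other forms.
      </para>
      <programlisting arch='c'>#include &lt;cstddef>
#include &lt;string>
#include &lt;vector>

// Expand one map entry. Assumed shorthand form: "$NAME-FIRST+MORE".
// Entries that do not match the form are returned unchanged.
// std::stoi throws if FIRST or MORE is not a number.
std::vector&lt;std::string> expand_entry(const std::string &amp;entry)
{
    std::vector&lt;std::string> names;
    const std::size_t npos = std::string::npos;
    const std::size_t plus = entry.rfind('+');
    const std::size_t dash = (plus == npos) ? npos : entry.rfind('-', plus);
    if (entry.empty() || entry[0] != '$' || dash == npos) {
        names.push_back(entry);
        return names;
    }
    const std::string base = entry.substr(1, dash - 1);
    const int first = std::stoi(entry.substr(dash + 1, plus - dash - 1));
    const int more = std::stoi(entry.substr(plus + 1));
    // Numbered channels first; the final channel is always NAME_N.
    for (int k = 0; k &lt; more; ++k)
        names.push_back(base + "_" + std::to_string(first + k));
    names.push_back(base + "_N");
    return names;
}</programlisting>
      <para>
Under these assumptions, "$cep-0+8" expands to cep_0 through cep_7
followed by cep_N, the same nine names as the explicit list above.
      </para>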
    </sect2>
    <sect2 id='tr-example-frames-and-channels'>
      <title>Access single frames or single channels.</title>
      <para>
Often functions perform their operations on only a single
frame or channel, and the track class provides a general
mechanism for doing this.
Single frames or channels can be accessed as EST_FVectors.
Given a track with 500 frames and 10 channels, frame 50
can be accessed as:
      </para>
      <programlisting arch='c'>    EST_FVector tmp_frame;

    tr.frame(tmp_frame, 50);      </programlisting>
      <para>
Now tmp_frame is a 10-element vector which is
a window into tr: any changes to the contents of tmp_frame will
change tr. tmp_frame cannot be resized. (In standard C terms,
this operation can be thought of as making tmp_frame a pointer
to frame 50 of tr.)
</para>	<para>
Likewise with channels:
      </para>
      <programlisting arch='c'>    EST_FVector tmp_channel;

    tr.channel(tmp_channel, 5);      </programlisting>
      <para>
Again, tmp_channel is a 500-element vector which is
a window into tr: any changes to the contents of tmp_channel will
change tr. tmp_channel cannot be resized.
</para><para>
Channels can also be extracted by name:
      </para>
      <programlisting arch='c'>    tr.channel(tmp_channel, "energy");      </programlisting>
      <para>
Not all the channels need be put into the temporary frame.
Imagine we have a track with an F0 channel, an energy channel and
9 cepstrum channels. The following makes a frame from
frame 50 which only includes the cepstral information in
channels 2 through 10:
      </para>
      <programlisting arch='c'>    tr.frame(tmp_frame, 50, 2, 9);      </programlisting>
      <para>
Likewise, a window onto only the last 100 frames of channel 5 can
be set up as:
      </para>
      <programlisting arch='c'>    tr.channel(tmp_channel, 5, 400, 100);      </programlisting>
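      <para>
The window behaviour described in this section can be pictured with a
plain C++ sketch. It is an illustration only and makes no claim about
how EST_FVector is implemented: FloatWindow,
<function>frame_window</function> and
<function>channel_window</function> are hypothetical names, and the
row-major storage is an assumption. A window holds no data of its own,
so writing through it changes the parent buffer.
      </para>
      <programlisting arch='c'>#include &lt;cstddef>
#include &lt;vector>

// A non-owning window onto floats stored elsewhere. The stride lets it
// describe a frame (stride 1) or a channel (stride = channels per
// frame) of a row-major buffer.
struct FloatWindow {
    float *first;
    int length;
    int stride;
    float &amp;operator[](int i) const
    {
        return first[static_cast&lt;std::ptrdiff_t>(i) * stride];
    }
};

// Window onto n_channels values of one frame, from start_channel on.
FloatWindow frame_window(std::vector&lt;float> &amp;data, int num_channels,
                         int frame, int start_channel, int n_channels)
{
    const std::size_t at = static_cast&lt;std::size_t>(frame) * num_channels + start_channel;
    return FloatWindow{&amp;data[at], n_channels, 1};
}

// Window onto n_frames values of one channel, from start_frame on.
FloatWindow channel_window(std::vector&lt;float> &amp;data, int num_channels,
                           int channel, int start_frame, int n_frames)
{
    const std::size_t at = static_cast&lt;std::size_t>(start_frame) * num_channels + channel;
    return FloatWindow{&amp;data[at], n_frames, num_channels};
}</programlisting>
      <para>
The argument order mirrors the calls above: a start index followed by a
count. The windows stay valid only while the parent buffer is neither
resized nor destroyed, which is one plausible reason a window cannot
itself be resized.
      </para>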
    </sect2>
    <sect2 id='tr-example-sub-tracks'>
      <title>Access multiple frames or channels.</title>
      <para>
In addition to extracting single frames and channels, multiple
frame and channel portions can be extracted in a similar
way. In the following example, we make a sub-track sub, which
points to the entire cepstrum portion of a track (channels 2
through 10):
      </para>
      <programlisting arch='c'>    EST_Track sub;

    tr.sub_track(sub, 0, EST_ALL, 2, 9);      </programlisting>
      <para>
<parameter>sub</parameter> behaves exactly like a normal
track in every way, except that it cannot be resized. Its
contents behave like a pointer into the designated portion of
<parameter>tr</parameter>, so changing
<parameter>sub</parameter> will change
<parameter>tr</parameter>.
</para><para> The first argument is the
<parameter>sub</parameter> track. The second and third give the
start frame and the total number of frames required. EST_ALL is a
special constant that specifies that all the frames are
required here. The fourth argument is the start channel number
(remember channels are numbered from 0), and the last argument
is the total number of channels required.  </para><para>
This facility is particularly useful for using standard
signal processing functions efficiently. For example,
the <function>melcep</function> function in the signal processing
library takes a waveform and produces a mel-scale cepstrum. It
determines the order of the cepstral analysis from the number of
channels in the track it is given, which must already have been
allocated with the correct number of frames and channels.
</para><para> The following will process the waveform
<parameter>sig</parameter>, produce a mel cepstrum whose order is
set by the number of channels in <parameter>sub</parameter>,
and place the output in <parameter>sub</parameter>. (For an
explanation of the other options see
<function>melcep</function>.)
      </para>
      <programlisting arch='c'>    EST_Wave sig;

    melcep(sig, sub, 1.0, 20, 22);       </programlisting>
      <para>
Because we have made <parameter>sub</parameter> a window
into <parameter>tr</parameter>, the melcep function writes its
output into the correct location, i.e. channels 2-10 of tr. If
it were not for the sub_track facility, either a separate track
of the right size would have to be passed into melcep and then
copied into tr (wasteful), or else tr would be passed
in and other arguments would have to specify which channels
should be written to (messy).  </para><para>
Sub-tracks can also be set using channel names. The
following example does exactly as above, but is referenced by
the name of the first channel required and the number of
channels to follow:
      </para>
      <programlisting arch='c'>    tr.sub_track(sub, 0, EST_ALL, "cep_0", 9);      </programlisting>
      <para>
and this specifies the end by a string as well:
      </para>
      <programlisting arch='c'>    tr.sub_track(sub, 0, EST_ALL, "cep_0", "cep_N");      </programlisting>
      <para>
Sub-tracks can be any set of contiguous frames and
channels. For example, if a word started at frame 47 and ended
at frame 85, the following would set a sub-track to that
portion:
      </para>
      <programlisting arch='c'>    tr.sub_track(sub, 47, 39, "cep_0", "cep_N");      </programlisting>
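      <para>
Because the frame arguments are a start frame and a count rather than a
start and an end, an off-by-one slip is easy to make. The following
small helper is a sketch of the conversion.
<function>frame_span</function> is a hypothetical name, not a library
function, and it assumes the end frame is meant to be included.
      </para>
      <programlisting arch='c'>#include &lt;cassert>
#include &lt;utility>

// Convert an inclusive frame range [first, last] into the
// (start frame, number of frames) pair used by the calls above.
std::pair&lt;int, int> frame_span(int first, int last)
{
    assert(first &lt;= last);
    return std::make_pair(first, last - first + 1);
}</programlisting>
      <para>
For frames 47 to 85 inclusive this gives a start of 47 and a count of
39, the two values passed to <function>sub_track</function> above.
      </para>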
      <para>
We can step through the frames of a track using a standard
iterator. The frames are returned as one-frame sub-tracks.
      </para>
      <programlisting arch='c'>    EST_Track::Entries frames;

    // <lineannotation>print out the time of every 50th frame</lineannotation>
    cout &lt;&lt; "Times:";

    for (frames.begin(tr); frames; ++frames)
    {
	const EST_Track &amp;frame = *frames;
	if (frames.n() % 50 == 0)
	    cout &lt;&lt; " " &lt;&lt; frames.n() &lt;&lt; "[" &lt;&lt; frame.t() &lt;&lt; "]";
    }
    cout &lt;&lt; "\n";
	         </programlisting>
      <para>
The <function>channel</function>, <function>frame</function>
and <function>sub_track</function> functions are most commonly
used to write into a track using a convenient
sub-portion. Sometimes, however, a simple copy is required
whose contents can be changed without affecting the original.
The <member>copy_sub_track</member> function does this:
      </para>
      <programlisting arch='c'>    EST_Track tr_copy;
    
// <lineannotation>tr.copy_sub_track(tr_copy, 47, 39, "cep_0", "cep_N");</lineannotation>      </programlisting>
      <para>
Individual frames and channels can be copied out into
pre-allocated float * arrays as follows:
      </para>
      <programlisting arch='c'>    float *channel_buf, *frame_buf;
    channel_buf = new float[tr.num_frames()];
    frame_buf = new float[tr.num_channels()];

    tr.copy_channel_out(5, channel_buf);   // <lineannotation>copy channel 5 into channel_buf</lineannotation>
    tr.copy_frame_out(43, frame_buf);      // <lineannotation>copy frame 43 into frame_buf</lineannotation>      </programlisting>
      <para>
Individual frames and channels can be copied into the track
from float * arrays as follows:
      </para>
      <programlisting arch='c'>    tr.copy_channel_in(5, channel_buf);    // <lineannotation>copy channel_buf into channel 5 </lineannotation>
    tr.copy_frame_in(43, frame_buf);       // <lineannotation>copy frame_buf into frame 43</lineannotation>      </programlisting>
    </sect2>
    <sect2>
      <title>Auxiliary Channels</title>
      <para>
Auxiliary channels are used for storing frame information other than
amplitude coefficients, for example voicing decisions and points of
interest in the track.
Auxiliary channels always have the same number of frames as the
amplitude channels. They are resized by assigning names to the
channels that need to be created:
      </para>
      <programlisting arch='c'>    EST_StrList aux_names;

    aux_names.append("voicing");
    aux_names.append("join_points");
    aux_names.append("cost");

    tr.resize_aux(aux_names);      </programlisting>
      <para>
The following fills in these three channels with some values:
      </para>
      <programlisting arch='c'>    for (i = 0; i &lt; 500; ++i)
    {
	tr.aux(i, "voicing") = i;
	tr.aux(i, "join_points") = EST_String("stuff");
	tr.aux(i, "cost") =  0.111;
    }      </programlisting>
    </sect2>
    <sect2>
      <title>File I/O </title>
      <para>
Tracks can be saved and loaded in various formats.
Save as an HTK file:
      </para>
      <programlisting arch='c'>    if (tr.save("tmp/track.htk", "htk") != write_ok)
	EST_error("can't save htk file\n");      </programlisting>
      <para>
Save as an EST file:
      </para>
      <programlisting arch='c'>    if (tr.save("tmp/track.est", "est") != write_ok)
	EST_error("can't save est file\n");      </programlisting>
      <para>
Save as an ASCII file:
      </para>
      <programlisting arch='c'>    if (tr.save("tmp/track.ascii", "ascii") != write_ok)
	EST_error("can't save ascii file\n");      </programlisting>
      <para>
The file type is automatically determined from the file's
header during loading:
      </para>
      <programlisting arch='c'>    EST_Track tr2;
    if (tr2.load("tmp/track.htk") != read_ok)
	EST_error("can't reload htk\n");      </programlisting>
      <para>
If no header is found, the function assumes the
file is ASCII data with a fixed frame shift, arranged with rows
representing frames and columns representing channels. In this
case, the frame shift must be specified as an argument to the
load function:
      </para>
      <programlisting arch='c'>    if (tr.load("tmp/track.ascii", 0.01) != read_ok)
	EST_error("can't reload ascii file\n");      </programlisting>
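      <para>
The headerless layout described above (one row per frame, one column
per channel) can be sketched in plain C++ as follows. This is not the
library's loader: AsciiTrack and
<function>parse_ascii_track</function> are hypothetical names, and
stamping frame i with i times the frame shift is an assumption, since a
real loader may start from a different offset.
      </para>
      <programlisting arch='c'>#include &lt;sstream>
#include &lt;string>
#include &lt;vector>

// Hypothetical container: one row per frame, one column per channel.
struct AsciiTrack {
    std::vector&lt;float> times;
    std::vector&lt;std::vector&lt;float> > frames;
};

// Parse headerless text. Each non-empty line is a frame and each
// whitespace-separated number on it is a channel value.
// Assumption: frame i is stamped i * frame_shift seconds.
AsciiTrack parse_ascii_track(const std::string &amp;text, float frame_shift)
{
    AsciiTrack out;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::vector&lt;float> row;
        float value;
        while (fields >> value)
            row.push_back(value);
        if (row.empty())
            continue;   // skip blank lines
        out.times.push_back(static_cast&lt;float>(out.frames.size()) * frame_shift);
        out.frames.push_back(row);
    }
    return out;
}</programlisting>
      <para>
The sketch shows why the frame shift has to be supplied by the caller:
nothing in such a file records the time of each row.
      </para>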
    </sect2>
  </sect1>