  <sect1 id='pitchmark-manual'>
	<title><command>pitchmark</command> <emphasis>Find instants of glottal closure in a laryngograph file</emphasis></title>

    <sect2>
      <title>Synopsis</title>
      <para>
      </para>
        <!-- /amd/projects/festival/versions/v_mpiro/speech_tools_linux/bin/pitchmark -sgml_synopsis -->
        <para>
<cmdsynopsis><command>pitchmark</command> [input file] -o [output file] [options] Summary: pitchmark laryngograph (lx) files <arg>-h </arg>
<arg>-itype <replaceable>string</replaceable></arg>
<arg>-n <replaceable>int</replaceable></arg>
<arg>-f <replaceable>int</replaceable></arg>
<arg>-ibo <replaceable>string</replaceable></arg>
<arg>-iswap </arg>
<arg>-istype <replaceable>string</replaceable></arg>
<arg>-c <replaceable>string</replaceable></arg>
<arg>-start <replaceable>float</replaceable></arg>
<arg>-end <replaceable>float</replaceable></arg>
<arg>-from <replaceable>int</replaceable></arg>
<arg>-to <replaceable>int</replaceable></arg>
<arg>-otype <replaceable>string</replaceable> " {ascii}"</arg>
<arg>-S <replaceable>float</replaceable></arg>
<arg>-o <replaceable>ofile</replaceable></arg>
<arg>-lx_lf <replaceable>int</replaceable></arg>
<arg>-lx_lo <replaceable>int</replaceable></arg>
<arg>-lx_hf <replaceable>int</replaceable></arg>
<arg>-lx_ho <replaceable>int</replaceable></arg>
<arg>-df_lf <replaceable>int</replaceable></arg>
<arg>-df_lo <replaceable>int</replaceable></arg>
<arg>-med_o <replaceable>int</replaceable></arg>
<arg>-mean_o <replaceable>int</replaceable></arg>
<arg>-inv </arg>
<arg>-fill </arg>
<arg>-min <replaceable>float</replaceable></arg>
<arg>-max <replaceable>float</replaceable></arg>
<arg>-def <replaceable>float</replaceable></arg>
<arg>-pm <replaceable>ifile</replaceable></arg>
<arg>-f0 <replaceable>ofile</replaceable></arg>
<arg>-end <replaceable>float</replaceable></arg>
<arg>-wave_end </arg>
<arg>-inter </arg>
<arg>-style <replaceable>string</replaceable></arg>
</cmdsynopsis>
        </para>
        <!-- DONE /amd/projects/festival/versions/v_mpiro/speech_tools_linux/bin/pitchmark -sgml_synopsis -->
      <para>

<command>pitchmark</command> locates instants of glottal closure in a
laryngograph waveform, and performs post-processing to produce even
pitchmarks. EST does not currently provide any means of pitchmarking a
speech waveform.
Pitchmarking is performed by calling the
<function>pitchmark()</function> function, which carries out the
following operations: 
<orderedlist> <listitem><para>Double low pass filter the signal. This
removes noise in the signal. The parameter
<parameter>lx_lf</parameter> specifies the low pass cutoff frequency,
and <parameter>lx_lo</parameter> specifies the order. Double filtering
(feeding the waveform through the filter, then reversing the waveform
and feeding it through again) is performed to reduce any phase shift
between the input and output of the filtering operation.
</para></listitem>
<listitem><para>Double high pass filter the signal. This removes the
very low frequency swell that is often observed in laryngograph
waveforms.  The parameter <parameter>lx_hf</parameter> specifies the high pass cutoff frequency,
and <parameter>lx_ho</parameter> specifies the order.
Double filtering is performed to reduce any phase shift
between the input and output of the filtering operation.
</para></listitem>
<listitem><para>Calculate the delta signal. The filtered waveform is
differentiated using the <function>delta()</function>
function.</para></listitem>
<listitem><para>Low pass filter the delta signal. Some noise may still
be present in the signal, and this is removed by further low pass
filtering. Experimentation has shown that simple mean smoothing is
often more effective than FIR smoothing at this point.  The parameter
<parameter>mo</parameter> (set on the command line with -mean_o) specifies the size of the mean
smoothing window.  If FIR smoothing is chosen, the parameter
<parameter>df_lf</parameter> specifies the low pass cutoff frequency,
and <parameter>df_lo</parameter> specifies the order. Double filtering
is again used to avoid phase distortion.
</para></listitem>
<listitem><para>Pick zero crossings. Simple zero-crossing detection on
the smoothed delta signal then gives the pitchmarks themselves.  </para></listitem>
</orderedlist>
<command>pitchmark</command> also performs post-processing on the pitchmarks. 
This can be used to eliminate pitchmarks which occur too closely together, 
or to provide estimated evenly spaced pitchmarks during unvoiced regions.
The -fill option switches <action>this facility on</action>, 
and -min, -max, -def, 
-end and -wave_end control its operation.
      </para>
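      <para>
The five steps listed above can be sketched in a few lines of Python.
This is an illustrative sketch only, not the EST implementation: the
windowed-sinc FIR design, the first difference standing in for
<function>delta()</function>, the choice of positive-to-negative zero
crossings, and all helper names are assumptions made for the example.
Only the parameter names (<parameter>lx_lf</parameter>,
<parameter>lx_lo</parameter>, <parameter>lx_hf</parameter>,
<parameter>lx_ho</parameter>, <parameter>mo</parameter>) are taken from
the description.
      </para>
<programlisting>
```python
import math


def fir_lowpass(cutoff_hz, sample_rate, order):
    """Windowed-sinc low pass taps (Hamming window), unity gain at DC."""
    fc = cutoff_hz / sample_rate
    taps = []
    for n in range(order + 1):
        x = n - order / 2.0
        ideal = 2.0 * fc if x == 0 else math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        taps.append(ideal * (0.54 - 0.46 * math.cos(2.0 * math.pi * n / order)))
    total = sum(taps)
    return [t / total for t in taps]


def fir_highpass(cutoff_hz, sample_rate, order):
    """High pass by spectral inversion of the low pass (needs an even order)."""
    if order % 2:
        raise ValueError("order must be even for spectral inversion")
    taps = [-t for t in fir_lowpass(cutoff_hz, sample_rate, order)]
    taps[order // 2] += 1.0
    return taps


def fir_filter(signal, taps):
    """Causal FIR filter; samples before the start are treated as zero."""
    out = []
    for i in range(len(signal)):
        reach = min(i + 1, len(taps))
        out.append(sum(taps[k] * signal[i - k] for k in range(reach)))
    return out


def double_filter(signal, taps):
    """Filter forwards, then backwards, so that the two delays cancel."""
    forward = fir_filter(signal, taps)
    return fir_filter(forward[::-1], taps)[::-1]


def delta(signal):
    """First difference, standing in for the EST delta() function."""
    return [0.0] + [signal[i] - signal[i - 1] for i in range(1, len(signal))]


def mean_smooth(signal, window):
    """Replace each sample by the mean of a centred window."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def pick_zero_crossings(signal, sample_rate):
    """Times in seconds where the signal goes from positive to non-positive."""
    return [i / sample_rate for i in range(1, len(signal))
            if signal[i - 1] > 0.0 >= signal[i]]


def pitchmark_sketch(lx, sample_rate, lx_lf, lx_lo, lx_hf, lx_ho, mo):
    x = double_filter(lx, fir_lowpass(lx_lf, sample_rate, lx_lo))   # step 1
    x = double_filter(x, fir_highpass(lx_hf, sample_rate, lx_ho))   # step 2
    d = delta(x)                                                    # step 3
    if mo > 1:
        d = mean_smooth(d, mo)                                      # step 4
    return pick_zero_crossings(d, sample_rate)                      # step 5
```
</programlisting>
      <para>
Given a list of samples <literal>lx</literal> and its sample rate,
<literal>pitchmark_sketch(lx, 8000, lx_lf=400, lx_lo=20, lx_hf=40, lx_ho=20, mo=5)</literal>
returns pitchmark times in seconds. The values shown are placeholders
chosen for illustration, not recommended or default settings. Because
the filters assume silence outside the waveform, marks within a few
filter lengths of either end are unreliable in this sketch.
      </para>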
    </sect2>
    <sect2>
      <title>Options</title>
      <para>
      </para>
        <!-- /amd/projects/festival/versions/v_mpiro/speech_tools_linux/bin/pitchmark -sgml_options -->
        <para>
<variablelist>
<varlistentry><term>-h</term>
<LISTITEM><PARA>

Options help 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-itype</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Input file type (optional). If set to raw, this 
indicates that the input file does not have a header. Although 
types other than raw can be specified, this is rarely 
necessary, because the type of every supported 
headered format is determined automatically from the 
file's header. Samples in an unheadered input file 
are assumed to be shorts (16 bit). 
Supported types are 
nist, est, esps, snd, riff, aiff, audlab, raw, ascii 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-n</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Number of channels in an unheadered input file 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-f</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Sample rate in Hertz for an unheadered input file 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-ibo</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Input byte order in an unheadered input file. 
Possibilities are: MSB, LSB, native or nonnative. 
Suns, HP, SGI Mips, M68000 are MSB (big endian). 
Intel, Alpha, DEC Mips, Vax are LSB (little 
endian). 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-iswap</term>
<LISTITEM><PARA>

Swap bytes. (For use on an unheadered input file) 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-istype</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Sample type in an unheadered input file: 
short, mulaw, byte, ascii 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-c</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Select a single channel (starts from 0). 
Waveforms can have multiple channels. This option 
extracts a single channel for processing and 
discards the rest. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-start</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Extract sub-wave starting at this time, specified in 
seconds 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-end</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Extract sub-wave ending at this time, specified in 
seconds 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-from</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Extract sub-wave starting at this sample point 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-to</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Extract sub-wave ending at this sample point 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-otype</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>
 " {ascii}"
Output file type, if unspecified ascii is 
assumed, types are: none, esps, est, est_binary, htk, htk_fbank, htk_mfcc, htk_user, htk_discrete, ssff, xmg, xgraph, ema, ema_swapped, ascii, label 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-S</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Frame spacing of output in seconds. If this is 
different from the internal spacing, the contour is 
resampled at this spacing 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-o</term>
<LISTITEM><PARA>
<replaceable>ofile</replaceable>

Output filename, defaults to stdout 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-lx_lf</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Low pass cutoff frequency for the lx signal 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-lx_lo</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Order of the lx low pass filter 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-lx_hf</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

High pass cutoff frequency for the lx signal 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-lx_ho</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Order of the lx high pass filter 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-df_lf</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Low pass cutoff frequency for the delta (df) signal 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-df_lo</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Order of the delta (df) low pass filter 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-med_o</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

median smoothing order 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-mean_o</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

mean smoothing order 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-inv</term>
<LISTITEM><PARA>

Invert polarity of lx signal. Often the lx signal 
is upside down. This option inverts the signal prior to 
processing. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-fill</term>
<LISTITEM><PARA>

Insert and remove pitchmarks according to min, max 
and def period values. It is often desirable to place limits 
on the spacing of the pitchmarks. This option enforces a 
minimum and maximum pitch period (specified by -min and -max). 
If the maximum pitch period is set low enough, this will 
ensure that unvoiced regions have evenly spaced pitchmarks 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-min</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Minimum allowed pitch period, in seconds 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-max</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Maximum allowed pitch period, in seconds 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-def</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Default pitch period in seconds, used as a guide 
to the length of pitch periods in unvoiced 
sections 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-pm</term>
<LISTITEM><PARA>
<replaceable>ifile</replaceable>

Input is raw pitchmark file. This option is 
used to perform filling operations on an already existing 
set of pitchmarks 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-f0</term>
<LISTITEM><PARA>
<replaceable>ofile</replaceable>

Calculate F0 from pitchmarks and save to file 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-end</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Specify the end time of the last pitchmark, for use 
with the -fill option 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-wave_end</term>
<LISTITEM><PARA>

Use the end of a waveform to specify when the 
last pitchmark position should be. The waveform file is only 
read to determine its end, no processing is performed 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-inter</term>
<LISTITEM><PARA>

Output intermediate waveforms. This will output the 
signal at various stages of processing. Examination of these 
waveforms is extremely useful in setting the parameters for 
similar waveforms 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-style</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

"track" or "lab" </PARA></LISTITEM>
</varlistentry>
</variablelist>
        </para>
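        <para>
The options for unheadered input (-itype raw, -n, -ibo and -c) describe
how a stream of 16 bit shorts is interpreted. A minimal Python sketch of
that interpretation follows. It assumes that multi-channel samples are
interleaved frame by frame, which the option descriptions do not state,
and the function names are hypothetical; it is not the EST reader.
        </para>
<programlisting>
```python
import sys


def resolve_byte_order(ibo):
    """Map an -ibo style value onto "MSB" or "LSB"."""
    native = "MSB" if sys.byteorder == "big" else "LSB"
    nonnative = "LSB" if native == "MSB" else "MSB"
    table = {"MSB": "MSB", "LSB": "LSB", "native": native, "nonnative": nonnative}
    return table[ibo]


def decode_raw_shorts(data, ibo="native", num_channels=1, channel=0):
    """Decode headerless 16 bit samples and keep a single channel.

    Assumption: channels are interleaved (ch0, ch1, ..., ch0, ch1, ...).
    """
    order = "big" if resolve_byte_order(ibo) == "MSB" else "little"
    count = len(data) // 2  # a trailing odd byte is ignored
    samples = [int.from_bytes(data[2 * i:2 * i + 2], order, signed=True)
               for i in range(count)]
    # Channel numbering starts from 0, as with the -c option.
    return samples[channel::num_channels]
```
</programlisting>
        <para>
Reading the same bytes with the wrong byte order gives large, noisy
sample values, which is usually the first sign that -ibo or -iswap is
needed for a raw file.
        </para>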
        <!-- DONE /amd/projects/festival/versions/v_mpiro/speech_tools_linux/bin/pitchmark -sgml_options -->
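        <para>
The effect of -fill with -min, -max, -def and -end can be pictured with
the following Python sketch. It is one plausible reading of the option
descriptions, not the algorithm used by EST: marks that follow the
previous mark by less than the minimum period are dropped, and gaps
longer than the maximum period are divided into pieces close to the
default period. Starting the fill at time zero is an assumption, and
both function names are hypothetical. The second function shows the
usual relation behind -f0, with F0 taken as the reciprocal of each
pitch period.
        </para>
<programlisting>
```python
def fill_pitchmarks(marks, min_period, max_period, def_period, end_time=None):
    """Enforce period limits on a list of pitchmark times (seconds)."""
    # Remove: drop any mark closer than min_period to the last mark kept.
    kept = []
    for t in sorted(marks):
        if not kept or t - kept[-1] >= min_period:
            kept.append(t)
    # Optionally make end_time the final pitchmark (compare -end, -wave_end).
    if end_time is not None and (not kept or end_time > kept[-1]):
        kept.append(end_time)
    # Insert: split any gap longer than max_period into pieces near def_period.
    filled = []
    previous = 0.0  # assumption: filling starts at time zero
    for t in kept:
        gap = t - previous
        if gap > max_period:
            pieces = max(2, round(gap / def_period))
            filled.extend(previous + gap * k / pieces for k in range(1, pieces))
        filled.append(t)
        previous = t
    return filled


def f0_from_pitchmarks(marks):
    """F0 in Hz for each pitch period, taken as 1 / period."""
    return [1.0 / (b - a) for a, b in zip(marks, marks[1:]) if b > a]
```
</programlisting>
        <para>
With a maximum period of 0.02 seconds and a default of 0.01 seconds, a
long unvoiced stretch is filled with marks roughly 0.01 seconds apart,
which corresponds to an F0 of about 100 Hz in those regions.
        </para>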
    </sect2>
    <sect2>
      <title>Examples</title>
      <para>
</para>
<formalpara><title>Basic Pitchmarking</title>
<para>
<screen>
$ pitchmark kdt_010.lar -o kdt_010.pm -otype est
</screen>
</para> 
</formalpara>
<formalpara><title>Pitchmarking with unvoiced regions
filled</title> <para> The following fills unvoiced regions with pitch
periods that are about 0.01 seconds long. It also post-processes the
set of pitchmarks and ensures that no pitch period is longer than 0.02 seconds or
shorter than 0.003 seconds. A final unvoiced region extending to the end of the
wave is specified by using the -wave_end option.
</para> </formalpara><para>
<screen>
$ pitchmark kdt_010.lar -o kdt_010.pm -otype est -fill -min 0.003  \
-max 0.02 -def 0.01 -wave_end
</screen>
      </para>
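      <para>
When many files need the same treatment, the command lines in this
section can be assembled from a script. The Python sketch below builds
the argument list for either example using only options documented
here. The helper names are hypothetical, and it assumes that
<command>pitchmark</command> is on the search path.
      </para>
<programlisting>
```python
import subprocess


def pitchmark_command(lx_file, pm_file, otype="est", fill=None, wave_end=False):
    """Build a pitchmark argument list; fill is (min, max, def) in seconds."""
    cmd = ["pitchmark", lx_file, "-o", pm_file, "-otype", otype]
    if fill is not None:
        min_period, max_period, def_period = fill
        cmd += ["-fill", "-min", str(min_period),
                "-max", str(max_period), "-def", str(def_period)]
    if wave_end:
        cmd.append("-wave_end")
    return cmd


def run_pitchmark(lx_file, pm_file, **options):
    """Run pitchmark and raise CalledProcessError if it exits with an error."""
    subprocess.run(pitchmark_command(lx_file, pm_file, **options), check=True)
```
</programlisting>
      <para>
For instance,
<literal>pitchmark_command("kdt_010.lar", "kdt_010.pm", fill=(0.003, 0.02, 0.01), wave_end=True)</literal>
produces the same arguments as the second example above.
      </para>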
    </sect2>
  </sect1>