  <sect1 id='pda-manual'>
	<title><command>pda</command> <emphasis>Pitch Detection Algorithm</emphasis></title>

    <toc depth='1'></toc>
    <para>
    </para>
    <sect2>
      <title>Synopsis</title>
      <para>
      </para>
        <!-- /amd/projects/festival/versions/v_mpiro/speech_tools_linux/bin/pda -sgml_synopsis -->
        <para>
<cmdsynopsis><command>pda</command>[input file] -o [output file] [options]<arg>-h </arg>
<arg>-itype <replaceable>string</replaceable></arg>
<arg>-n <replaceable>int</replaceable></arg>
<arg>-f <replaceable>int</replaceable></arg>
<arg>-ibo <replaceable>string</replaceable></arg>
<arg>-iswap </arg>
<arg>-istype <replaceable>string</replaceable></arg>
<arg>-c <replaceable>string</replaceable></arg>
<arg>-start <replaceable>float</replaceable></arg>
<arg>-end <replaceable>float</replaceable></arg>
<arg>-from <replaceable>int</replaceable></arg>
<arg>-to <replaceable>int</replaceable></arg>
<arg>-L </arg>
<arg>-P </arg>
<arg>-fmin <replaceable>float</replaceable></arg>
<arg>-fmax <replaceable>float</replaceable></arg>
<arg>-shift <replaceable>float</replaceable></arg>
<arg>-length <replaceable>float</replaceable></arg>
<arg>-lpfilter <replaceable>int</replaceable></arg>
<arg>-forder <replaceable>int</replaceable></arg>
<arg>-d <replaceable>float</replaceable></arg>
<arg>-n <replaceable>float</replaceable></arg>
<arg>-h <replaceable>float</replaceable></arg>
<arg>-m <replaceable>float</replaceable></arg>
<arg>-r <replaceable>float</replaceable></arg>
<arg>-t <replaceable>float</replaceable></arg>
<arg>-otype <replaceable>string</replaceable> " {ascii}"</arg>
<arg>-S <replaceable>float</replaceable></arg>
<arg>-o <replaceable>ofile</replaceable></arg>
</cmdsynopsis>
        </para>
        <!-- DONE /amd/projects/festival/versions/v_mpiro/speech_tools_linux/bin/pda -sgml_synopsis -->
      <para>

pda runs a pitch detection algorithm that produces a fundamental frequency
(F0) contour from a speech waveform file. At present only the
super resolution pitch determination algorithm is implemented.
See (Medan, Yair, and Chazan, 1991) and (Bagshaw et al., 1993) for a detailed
description of the algorithm.
</para><para>
The default values given below were found to optimise the performance
of the pitch determination algorithm for speech data sampled at 20kHz
using a 16-bit waveform and a low pass filter with a 600Hz cut-off
frequency and more than -85dB rejection above 700Hz. The best
performance occurs if the [-p] flag is passed.  </para><para>
      </para>
    </sect2>
    <sect2>
      <title>Options</title>
      <para>
      </para>
        <!-- /amd/projects/festival/versions/v_mpiro/speech_tools_linux/bin/pda -sgml_options -->
        <para>
<variablelist>
<varlistentry><term>-h</term>
<LISTITEM><PARA>

Options help 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-itype</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Input file type (optional). If set to raw, this 
indicates that the input file does not have a header. Other 
file types can also be specified, but this is rarely necessary 
because the type of every supported format 
is determined automatically from the 
file's header. Unheadered input files 
are assumed to contain shorts (16-bit samples). 
Supported types are: 
nist, est, esps, snd, riff, aiff, audlab, raw, ascii. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-n</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Number of channels in an unheadered input file 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-f</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Sample rate in Hertz for an unheadered input file 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-ibo</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Input byte order in an unheadered input file. 
Possibilities are: MSB, LSB, native or nonnative. 
Suns, HP, SGI Mips and M68000 are MSB (big endian); 
Intel, Alpha, DEC Mips and Vax are LSB (little 
endian). 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-iswap</term>
<LISTITEM><PARA>

Swap bytes. (For use on an unheadered input file) 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-istype</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Sample type in an unheadered input file: 
short, mulaw, byte, ascii 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-c</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Select a single channel (starts from 0). 
Waveforms can have multiple channels. This option 
extracts a single channel for processing and 
discards the rest. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-start</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Extract sub-wave starting at this time, specified in 
seconds 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-end</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Extract sub-wave ending at this time, specified in 
seconds 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-from</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Extract sub-wave starting at this sample point 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-to</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Extract sub-wave ending at this sample point 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-L</term>
<LISTITEM><PARA>

Perform low pass filtering on input. This option should always 
be used in normal processing, as it usually increases 
performance considerably. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-P</term>
<LISTITEM><PARA>

Perform peak tracking. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-fmin</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Minimum F0 value. Sets the minimum allowed F0 in the 
output track. Default is 40.000. 
Changing this to suit the speaker usually increases 
performance. Typical recommended values are 60-90Hz for 
males and 120-150Hz for females. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-fmax</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Maximum F0 value. Sets the maximum allowed F0 in the 
output track. Default is 400.000. 
Changing this to suit the speaker usually increases 
performance. Typical recommended values are 200Hz for 
males and 300-400Hz for females. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-shift</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Frame spacing in seconds for fixed frame analysis. 
This doesn't have to be the same as the output file spacing: 
the -S option can be used to resample the track before saving. 
Default: 0.005. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-length</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Analysis frame length in seconds. 
Default: 0.010. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-lpfilter</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Low pass filter, with cutoff frequency in Hz. 
Filtering is performed by a FIR filter which is built at run 
time. The order of the filter can be given by -forder. The 
default order is 199. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-forder</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Order of FIR filter used for lpfilter and 
hpfilter. This must be ODD. Sensible values range 
from 19 (quick but with a shallow rolloff) to 199 
(slow but with a steep rolloff). The default is 199. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-d</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Decimation factor. 
Sets down-sampling for quicker computation so that only one in 
<parameter>decimation factor</parameter> samples is used in the first instance. 
Must be in the range one to ten inclusive. Default is four. 
For data sampled at 10kHz, a decimation 
factor of two is advised. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-n</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Noise floor. 
Sets the maximum absolute signal amplitude that represents 
silence to <parameter>noise floor</parameter>. If the absolute amplitude of 
the first segment in a given frame is below this level at all 
times, then the frame is classified as representing silence. 
Must be a positive number. Default is 120 ADC units. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-h</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Unvoiced to voiced coeff threshold. 
Sets the correlation coefficient threshold which must be 
exceeded in a transition from an unvoiced classified frame 
of speech to a voiced frame, as 
<parameter>unvoiced to voiced coeff threshold</parameter>. Must be in the range zero to one inclusive. 
Default is 0.88. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-m</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Min voiced to unvoiced coeff threshold. 
Sets the minimum allowed correlation coefficient threshold 
which must not be exceeded in a transition from a voiced 
classified frame of speech to an unvoiced frame, as 
<parameter>min voiced to unvoiced coeff threshold</parameter>. Must be in the 
range zero to <parameter>unvoiced to voiced coeff threshold</parameter> 
inclusive. Default is 0.75. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-r</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Voiced to unvoiced coeff threshold-ratio. 
Sets the scaling factor used in determining the correlation 
coefficient threshold which must not be exceeded in a voiced 
frame to unvoiced frame transition, as 
<parameter>voiced to unvoiced coeff threshold-ratio</parameter>. The voiced to unvoiced coefficient 
threshold is determined by multiplying this scaling factor 
with the maximum cross-correlation coefficient of the 
previously voiced frame. If this product is less than 
<parameter>min voiced to unvoiced coeff threshold</parameter> then the latter is used 
instead. Must be in the range zero to one inclusive. 
Default is 0.85. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-t</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Anti pitch doubling/halving threshold. 
Sets the threshold used in eliminating (as far as possible) 
pitch doubling and pitch halving errors, as <parameter>anti pitch 
doubling/halving threshold</parameter>. Must be in the range zero to 
one inclusive. Default is 0.77. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-otype</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>
 " {ascii}"
Output file type. If unspecified, ascii is 
assumed. Supported types are: none, esps, est, est_binary, htk, htk_fbank, htk_mfcc, htk_user, htk_discrete, ssff, xmg, xgraph, ema, ema_swapped, ascii, label. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-S</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Frame spacing of output in seconds. If this is 
different from the internal spacing, the contour is 
resampled at this spacing. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-o</term>
<LISTITEM><PARA>
<replaceable>ofile</replaceable>

Output filename. Defaults to stdout. </PARA></LISTITEM>
</varlistentry>
</variablelist>
        </para>
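        <para>
The silence test described for the noise floor option can be sketched as
follows. This is only an illustration of the rule as worded above, not the
code used by pda: the function name <function>is_silent</function> is
hypothetical, and the caller is assumed to pass the samples of the first
segment of the frame as plain numbers in ADC units.
        </para>
<programlisting>
```python
def is_silent(segment, noise_floor=120.0):
    """Return True if every sample in the segment stays below the noise floor.

    Hypothetical helper: `segment` is assumed to hold the samples of the
    first segment of a frame, in ADC units.
    """
    # The option text requires the noise floor to be a positive number.
    if not noise_floor > 0:
        raise ValueError("noise floor must be a positive number")
    # "Below this level at all times": no sample may reach the noise floor.
    return all(noise_floor > abs(sample) for sample in segment)
```
</programlisting>
        <para>
With the default of 120 ADC units, a segment whose largest absolute
amplitude is 119 counts as silence, while a single sample of -120 does not.
        </para>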
        <!-- DONE /amd/projects/festival/versions/v_mpiro/speech_tools_linux/bin/pda -sgml_options -->
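        <para>
The way the three voicing thresholds described above interact can be
sketched as follows. This is a minimal illustration of the rules as worded
in the option descriptions, not the implementation used by pda: the function
and argument names are hypothetical, and treating a coefficient exactly
equal to a threshold as "not exceeding" it is an assumption.
        </para>
<programlisting>
```python
def voiced_to_unvoiced_threshold(prev_peak, ratio=0.85, min_threshold=0.75):
    """Threshold for leaving the voiced state (ratio and minimum options)."""
    # Scale the previous voiced frame's maximum cross-correlation coefficient,
    # then fall back to the minimum threshold if the product is smaller.
    return max(ratio * prev_peak, min_threshold)


def is_voiced(was_voiced, coeff, prev_peak=None, unvoiced_to_voiced=0.88,
              ratio=0.85, min_threshold=0.75):
    """Classify the current frame from its correlation coefficient.

    `prev_peak` (the previous voiced frame's maximum coefficient) is
    required whenever `was_voiced` is True.
    """
    if not was_voiced:
        # Unvoiced to voiced: the coefficient must exceed this threshold.
        return coeff > unvoiced_to_voiced
    # Voiced to unvoiced: the frame stays voiced only while the coefficient
    # exceeds the threshold derived from the previous voiced frame.
    return coeff > voiced_to_unvoiced_threshold(prev_peak, ratio, min_threshold)
```
</programlisting>
        <para>
With the defaults, a previous peak of 0.95 gives a threshold of about 0.8075
(0.85 times 0.95), whereas a previous peak of 0.80 gives a product of 0.68,
so the minimum of 0.75 is used instead.
        </para>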
    </sect2>
    <sect2>
      <title>Examples</title>
      <para>
Pitch detection on a typical male voice, using low pass filtering:
<screen>
$ pda kdt_010.wav -o kdt_010.f0 -fmin 80 -fmax 200 -L
</screen>
      </para>
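      <para>
When pda is driven from a script, the same command line can be assembled
programmatically. The sketch below only builds the argument list; the helper
name <function>build_pda_command</function> is hypothetical, and it uses
only options documented on this page.
      </para>
<programlisting>
```python
def build_pda_command(wav_path, f0_path, fmin, fmax, low_pass=True):
    """Assemble a pda argument list (hypothetical helper, documented flags only)."""
    # The F0 search range must be non-empty.
    if fmin >= fmax:
        raise ValueError("fmin must be smaller than fmax")
    args = ["pda", wav_path, "-o", f0_path,
            "-fmin", str(fmin), "-fmax", str(fmax)]
    if low_pass:
        # -L enables low pass filtering of the input.
        args.append("-L")
    return args


# Same settings as the male-voice example above.
command = build_pda_command("kdt_010.wav", "kdt_010.f0", 80, 200)
print(" ".join(command))
```
</programlisting>
      <para>
The resulting list can be passed to Python's
<function>subprocess.run</function>, provided pda is installed and on the
search path.
      </para>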
    </sect2>
  </sect1>