festival-speechtools-devel-1.2.96-16.fc13.i686.rpm

  <sect1 id='sigfv-manual'>
	<title><command>sig2fv</command> <emphasis>Generate signal processing coefficients from waveforms</emphasis></title>

    <toc depth='1'></toc>
    <para>
    </para>
    <sect2>
      <title>Synopsis</title>
      <para>
      </para>
        <!-- /amd/projects/festival/versions/v_mpiro/speech_tools_linux/bin/sig2fv -sgml_synopsis -->
        <para>
<cmdsynopsis><command>sig2fv</command>[input file] -o [output file]<arg>-h </arg>
<arg>-itype <replaceable>string</replaceable></arg>
<arg>-n <replaceable>int</replaceable></arg>
<arg>-f <replaceable>int</replaceable></arg>
<arg>-ibo <replaceable>string</replaceable></arg>
<arg>-iswap </arg>
<arg>-istype <replaceable>string</replaceable></arg>
<arg>-c <replaceable>string</replaceable></arg>
<arg>-start <replaceable>float</replaceable></arg>
<arg>-end <replaceable>float</replaceable></arg>
<arg>-from <replaceable>int</replaceable></arg>
<arg>-to <replaceable>int</replaceable></arg>
<arg>-otype <replaceable>string</replaceable></arg>
<arg>-S <replaceable>float</replaceable></arg>
<arg>-o <replaceable>ofile</replaceable></arg>
<arg>-shift <replaceable>float</replaceable></arg>
<arg>-factor <replaceable>float</replaceable></arg>
<arg>-pm <replaceable>ifile</replaceable></arg>
<arg>-coefs <replaceable>string</replaceable></arg>
<arg>-delta <replaceable>string</replaceable></arg>
<arg>-acc <replaceable>string</replaceable></arg>
<arg>-window_type <replaceable>string</replaceable></arg>
<arg>-lpc_order <replaceable>int</replaceable></arg>
<arg>-ref_order <replaceable>int</replaceable></arg>
<arg>-cep_order <replaceable>int</replaceable></arg>
<arg>-melcep_order <replaceable>int</replaceable></arg>
<arg>-fbank_order <replaceable>int</replaceable></arg>
<arg>-preemph <replaceable>float</replaceable></arg>
<arg>-lifter <replaceable>float</replaceable></arg>
<arg>-usepower </arg>
<arg>-include_c0 </arg>
<arg>-order <replaceable>string</replaceable></arg>
</cmdsynopsis>
        </para>
        <!-- DONE /amd/projects/festival/versions/v_mpiro/speech_tools_linux/bin/sig2fv -sgml_synopsis -->
      <para>

sig2fv computes signal processing feature vectors from speech
waveforms.
The following types of analysis are provided:
<itemizedlist>
<listitem><para>Linear prediction (LPC)</para></listitem>
<listitem><para>Cepstrum coding from lpc coefficients</para></listitem>
<listitem><para>Mel scale cepstrum coding via fbank</para></listitem>
<listitem><para>Mel scale log filterbank analysis</para></listitem>
<listitem><para>Line spectral frequencies</para></listitem>
<listitem><para>Linear prediction reflection coefficients</para></listitem>
<listitem><para>Root mean square energy</para></listitem>
<listitem><para>Power</para></listitem>
<listitem><para>Fundamental frequency (pitch)</para></listitem>
<listitem><para>Delta and acceleration coefficients of all of the
above</para></listitem>
</itemizedlist>
The -coefs option specifies the list of basic analysis types
required, and the -delta and -acc options specify the types for which
delta and acceleration coefficients are required.
      </para>
    </sect2>
    <sect2>
      <title>Options</title>
      <para>
      </para>
        <!-- /amd/projects/festival/versions/v_mpiro/speech_tools_linux/bin/sig2fv -sgml_options -->
        <para>
<variablelist>
<varlistentry><term>-h</term>
<LISTITEM><PARA>

Print help on the available options.
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-itype</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Input file type (optional). If set to raw, this
indicates that the input file does not have a header. Other
file types can also be named, but this is rarely necessary,
because the type of every supported headered format is
determined automatically from the file's header. Samples in
unheadered input files are assumed to be 16-bit shorts by default.
Supported types are:
nist, est, esps, snd, riff, aiff, audlab, raw, ascii
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-n</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Number of channels in an unheadered input file 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-f</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Sample rate in Hertz for an unheadered input file 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-ibo</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Input byte order in an unheadered input file:
possibilities are MSB, LSB, native or nonnative.
Sun, HP, SGI MIPS and M68000 machines are MSB (big endian);
Intel, Alpha, DEC MIPS and VAX machines are LSB (little
endian).
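For illustration only, here is a minimal Python sketch (not sig2fv code) of why
the byte order matters: the same bytes decode to different 16-bit samples depending
on whether they are read MSB first ("big") or LSB first ("little"). The helper
name decode_shorts is hypothetical.
<programlisting>
# Illustrative sketch, not part of sig2fv: decode raw 16-bit signed samples.
# "big" corresponds to MSB and "little" to LSB in -ibo terms.


def decode_shorts(data, byteorder):
    """Return the 16-bit signed samples stored in data (a bytes object)."""
    return [
        int.from_bytes(data[i:i + 2], byteorder=byteorder, signed=True)
        for i in range(0, len(data) - 1, 2)
    ]


raw = bytes([0x01, 0x02, 0xFF, 0xFE])
print(decode_shorts(raw, "big"))     # [258, -2]
print(decode_shorts(raw, "little"))  # [513, -257]
</programlisting>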
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-iswap</term>
<LISTITEM><PARA>

Swap bytes. (For use on an unheadered input file) 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-istype</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Sample type in an unheadered input file: 
short, mulaw, byte, ascii 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-c</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Select a single channel (numbered from 0).
Waveforms can have multiple channels; this option
extracts a single channel for processing and
discards the rest.
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-start</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Extract sub-wave starting at this time, specified in 
seconds 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-end</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Extract sub-wave ending at this time, specified in 
seconds 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-from</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Extract sub-wave starting at this sample point 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-to</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Extract sub-wave ending at this sample point 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-otype</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Output file type; if unspecified, ascii is assumed.
Supported types are: none, esps, est, est_binary, htk, htk_fbank, htk_mfcc, htk_user, htk_discrete, ssff, xmg, xgraph, ema, ema_swapped, ascii, label
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-S</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Frame spacing of output in seconds. If this is 
different from the internal spacing, the contour is 
resampled at this spacing 
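As a sketch of what resampling a track can involve, the Python below linearly
interpolates one channel from the analysis spacing to a new spacing. Linear
interpolation and the function name resample are assumptions for illustration;
this page does not say which method sig2fv uses.
<programlisting>
# Illustrative sketch: resample one track channel by linear interpolation.
# ASSUMPTION: sig2fv's actual resampling method is not documented here.


def resample(values, shift, new_shift):
    """values[i] is the channel value at time i * shift seconds (needs 2+ values)."""
    duration = (len(values) - 1) * shift
    n_out = int(round(duration / new_shift)) + 1
    out = []
    for j in range(n_out):
        pos = j * new_shift / shift            # position in input frames
        i = min(int(pos), len(values) - 2)     # index of the left neighbour
        frac = pos - i
        out.append((1 - frac) * values[i] + frac * values[i + 1])
    return out


# A 10 ms track resampled at 5 ms (as with -shift 0.01 -S 0.005)
print(resample([0.0, 1.0, 4.0], 0.010, 0.005))
</programlisting>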
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-o</term>
<LISTITEM><PARA>
<replaceable>ofile</replaceable>

Output filename, defaults to stdout 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-shift</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Frame spacing in seconds for fixed-frame analysis. This
need not be the same as the output file spacing: the
-S option can be used to resample the track before saving.
default: 0.010
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-factor</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Frame lengths will be FACTOR times the
local pitch period.
default: 2.000 
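A minimal Python sketch of how FACTOR could turn pitchmarks into frames follows.
It assumes the local pitch period is the interval to the previous pitchmark (the
next one for the first mark) and that each frame is centred on its mark; sig2fv
may define the local period differently, and pitch_sync_frames is a hypothetical
name.
<programlisting>
# Illustrative sketch: pitch-synchronous frames from pitchmark times.
# ASSUMPTION: local period = interval to the previous mark (next mark for
# the first one); frames are centred on the marks. Needs 2+ pitchmarks.


def pitch_sync_frames(pm_times, factor=2.0):
    """Return (start, end) times in seconds of the frame around each mark."""
    frames = []
    for i, t in enumerate(pm_times):
        if i == 0:
            period = pm_times[1] - pm_times[0]
        else:
            period = t - pm_times[i - 1]
        half = factor * period / 2.0
        frames.append((t - half, t + half))
    return frames


print(pitch_sync_frames([0.100, 0.108, 0.117], factor=2.0))
</programlisting>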
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-pm</term>
<LISTITEM><PARA>
<replaceable>ifile</replaceable>

Pitch mark file name. This is used to 
specify the positions of the analysis frames for pitch 
synchronous analysis. Pitchmark files are just standard 
track files, but the channel information is ignored and 
only the time positions are used 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-coefs</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

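List of basic types of processing required.
Permissible types are:
<itemizedlist>
<listitem><para>lpc: linear predictive coding</para></listitem>
<listitem><para>cep: cepstrum coding from lpc coefficients</para></listitem>
<listitem><para>melcep: Mel scale cepstrum coding via fbank</para></listitem>
<listitem><para>fbank: Mel scale log filterbank analysis</para></listitem>
<listitem><para>lsf: line spectral frequencies</para></listitem>
<listitem><para>ref: linear prediction reflection coefficients</para></listitem>
<listitem><para>power</para></listitem>
<listitem><para>f0: fundamental frequency (pitch)</para></listitem>
<listitem><para>energy: root mean square energy</para></listitem>
</itemizedlist>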
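A minimal Python sketch of two of these types, computed over fixed frames, is
shown here. It assumes frames start every shift seconds with a fixed length, that
power is the mean squared sample value and that energy is its square root (the
RMS value); the function names and the 25ms frame length are illustrative, not
sig2fv's.
<programlisting>
import math

# Illustrative sketch of the "power" and "energy" types over fixed frames.
# ASSUMPTIONS: frame placement, frame length and normalisation are chosen
# for illustration; sig2fv's exact definitions may differ.


def split_frames(samples, sample_rate, shift=0.010, length=0.025):
    """Frames of `length` seconds starting every `shift` seconds."""
    step = int(round(shift * sample_rate))
    size = int(round(length * sample_rate))
    return [samples[s:s + size]
            for s in range(0, len(samples) - size + 1, step)]


def power(frame):
    """Mean squared sample value of one frame."""
    return sum(x * x for x in frame) / len(frame)


def rms_energy(frame):
    """Root mean square value of one frame."""
    return math.sqrt(power(frame))


signal = [0.5, -0.5] * 400           # 800 samples: 50 ms at 16 kHz
for frame in split_frames(signal, 16000):
    print(power(frame), rms_energy(frame))   # 0.25 0.5 for each of 3 frames
</programlisting>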
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-delta</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

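List of types for which delta coefficients are required.
The corresponding basic processing does not need to be specified
with -coefs for this option to work.
Permissible types are:
<itemizedlist>
<listitem><para>lpc: linear predictive coding</para></listitem>
<listitem><para>cep: cepstrum coding from lpc coefficients</para></listitem>
<listitem><para>melcep: Mel scale cepstrum coding via fbank</para></listitem>
<listitem><para>fbank: Mel scale log filterbank analysis</para></listitem>
<listitem><para>lsf: line spectral frequencies</para></listitem>
<listitem><para>ref: linear prediction reflection coefficients</para></listitem>
<listitem><para>power</para></listitem>
<listitem><para>f0: fundamental frequency (pitch)</para></listitem>
<listitem><para>energy: root mean square energy</para></listitem>
</itemizedlist>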
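The exact delta computation is not specified on this page. As a sketch, the
common linear-regression formula is shown in Python, with acceleration (delta
delta) coefficients obtained by applying the same formula to the deltas. The
half-window K=2 and the edge handling (repeating the first and last frames) are
assumptions.
<programlisting>
# Illustrative sketch of regression-based delta and acceleration coefficients
# for one channel:
#   d[t] = sum_{k=1..K} k * (c[t+k] - c[t-k]) / (2 * sum_{k=1..K} k^2)
# ASSUMPTIONS: K = 2 and edge frames repeated; sig2fv may differ.


def delta(track, K=2):
    n = len(track)
    denom = 2 * sum(k * k for k in range(1, K + 1))
    out = []
    for t in range(n):
        num = 0.0
        for k in range(1, K + 1):
            ahead = track[min(t + k, n - 1)]    # clamp at the last frame
            behind = track[max(t - k, 0)]       # clamp at the first frame
            num += k * (ahead - behind)
        out.append(num / denom)
    return out


def acc(track, K=2):
    """Acceleration coefficients: the delta of the delta."""
    return delta(delta(track, K), K)


ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(delta(ramp))   # interior values are 1.0, the slope of the ramp
print(acc(ramp))
</programlisting>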
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-acc</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

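List of types for which acceleration (delta delta) coefficients
are required. The corresponding basic processing does not need to be
specified with -coefs for this option to work.
Permissible types are:
<itemizedlist>
<listitem><para>lpc: linear predictive coding</para></listitem>
<listitem><para>cep: cepstrum coding from lpc coefficients</para></listitem>
<listitem><para>melcep: Mel scale cepstrum coding via fbank</para></listitem>
<listitem><para>fbank: Mel scale log filterbank analysis</para></listitem>
<listitem><para>lsf: line spectral frequencies</para></listitem>
<listitem><para>ref: linear prediction reflection coefficients</para></listitem>
<listitem><para>power</para></listitem>
<listitem><para>f0: fundamental frequency (pitch)</para></listitem>
<listitem><para>energy: root mean square energy</para></listitem>
</itemizedlist>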
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-window_type</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

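Type of window applied to the waveform.
Permissible types are:
<itemizedlist>
<listitem><para>none: unknown window type</para></listitem>
<listitem><para>rectangle: rectangular window</para></listitem>
<listitem><para>triangle: triangular window</para></listitem>
<listitem><para>hanning: Hanning window</para></listitem>
<listitem><para>hamming: Hamming window</para></listitem>
</itemizedlist>
default: hamming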
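For reference, a Python sketch of the textbook forms of these windows follows.
The formulas are the standard ones (with N - 1 in the denominator); whether
sig2fv uses exactly these variants is an assumption, and the function name
window is illustrative.
<programlisting>
import math

# Illustrative sketch: standard window shapes for a frame of N samples
# (N must be at least 2). ASSUMPTION: sig2fv's exact variants may differ.


def window(kind, N):
    if kind == "rectangle":
        return [1.0] * N
    if kind == "triangle":
        return [1.0 - abs(2.0 * n / (N - 1) - 1.0) for n in range(N)]
    if kind == "hanning":
        return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    if kind == "hamming":
        return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    raise ValueError("unsupported window type: " + kind)


frame = [1.0] * 5
print([x * w for x, w in zip(frame, window("hamming", len(frame)))])
</programlisting>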
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-lpc_order</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Order of lpc analysis. 
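The LPC method itself is not described on this page. A common choice is the
autocorrelation method with the Levinson-Durbin recursion, sketched here in
Python; as a by-product it yields reflection coefficients, the quantity the ref
type and -ref_order refer to. The sign convention (prediction
x[n] ~ sum of a[k] * x[n-k]) and the function names are assumptions.
<programlisting>
# Illustrative sketch: autocorrelation-method LPC via Levinson-Durbin.
# ASSUMPTIONS: predictor convention x[n] ~ sum_k a[k] * x[n-k]; the frame is
# not silent (r[0] non-zero). sig2fv's implementation may differ.


def autocorrelation(frame, order):
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson(r, order):
    """Return (a, k, err): predictor coefs a[1..p], reflection coefs, residual energy."""
    a = [0.0] * (order + 1)
    k = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        num = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = num / err
        new_a = a[:]
        new_a[i] = k[i]
        for j in range(1, i):
            new_a[j] = a[j] - k[i] * a[i - j]
        a = new_a
        err *= 1.0 - k[i] * k[i]
    return a[1:], k[1:], err


r = autocorrelation([1.0, 0.9, 0.81, 0.729, 0.6561], 2)
print(levinson(r, 2))
</programlisting>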
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-ref_order</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Order of lpc reflection coefficient analysis. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-cep_order</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Order of lpc cepstral analysis. 
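Assuming the lpc coefficients follow the prediction convention
x[n] ~ sum of a[k] * x[n-k], the standard recursion from lpc to cepstral
coefficients is sketched here in Python. The gain term c0 is left out, and
sig2fv's own scaling and c0 handling (see -include_c0) may differ; lpc_to_cep is
a hypothetical name.
<programlisting>
# Illustrative sketch: cepstrum of the all-pole model 1/A(z),
# A(z) = 1 - sum_k a[k] z^-k, via the standard recursion
#   c[n] = a[n] + sum_{k=1..n-1} (k/n) * c[k] * a[n-k]   (a[m] = 0 beyond p)
# ASSUMPTION: gain term c[0] omitted; sig2fv may scale differently.


def lpc_to_cep(a, n_cep):
    """a holds a[1..p] as a[0..p-1]; returns c[1..n_cep]."""
    a_ext = list(a) + [0.0] * n_cep     # a[m] is zero beyond the model order
    c = [0.0] * (n_cep + 1)             # c[0] stays zero here
    for n in range(1, n_cep + 1):
        total = a_ext[n - 1]
        for k in range(1, n):
            total += (k / n) * c[k] * a_ext[n - k - 1]
        c[n] = total
    return c[1:]


# For a single pole a = 0.5 the exact answer is c[n] = 0.5**n / n.
print(lpc_to_cep([0.5], 4))
</programlisting>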
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-melcep_order</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Order of Mel cepstral analysis. 
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-fbank_order</term>
<LISTITEM><PARA>
<replaceable>int</replaceable>

Order of filter bank analysis. 
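The mel mapping used by sig2fv is not given here. As a sketch, the Python below
spreads the centre frequencies of a bank of "order" triangular filters evenly on
the mel scale between 0Hz and the Nyquist frequency, using the widely used
formula mel(f) = 1127 ln(1 + f/700). Both the formula and the filter placement
are assumptions, and the function names are illustrative.
<programlisting>
import math

# Illustrative sketch: centre frequencies of a mel-spaced filterbank.
# ASSUMPTIONS: mel(f) = 1127 * ln(1 + f / 700) and filters spread evenly in
# mel between 0 Hz and the Nyquist frequency; sig2fv's details may differ.


def hz_to_mel(f):
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (math.exp(m / 1127.0) - 1.0)


def filter_centres(order, sample_rate):
    """Centres of `order` filters; the two outer edge points are 0 Hz and Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    step = top / (order + 1)
    return [mel_to_hz(step * i) for i in range(1, order + 1)]


print([round(f) for f in filter_centres(4, 16000)])
</programlisting>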
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-preemph</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Perform pre-emphasis with this factor. 
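A pre-emphasis factor alpha usually denotes the first-order filter
y[n] = x[n] - alpha x[n-1]; a minimal Python sketch follows. Passing the first
sample through unchanged is an assumption, and preemphasis is an illustrative
name.
<programlisting>
# Illustrative sketch: first-order pre-emphasis y[n] = x[n] - alpha * x[n-1].
# ASSUMPTION: the first sample is passed through unchanged.


def preemphasis(samples, alpha):
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]


print(preemphasis([1.0, 1.0, 1.0, 0.0], 0.5))   # [1.0, 0.5, 0.5, -0.5]
</programlisting>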
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-lifter</term>
<LISTITEM><PARA>
<replaceable>float</replaceable>

Lifter coefficient.
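This page does not define the lifter. One common definition, used by HTK, is
sinusoidal liftering with coefficient L, sketched here in Python; whether sig2fv
applies exactly this form is an assumption, and lifter is an illustrative name.
<programlisting>
import math

# Illustrative sketch: HTK-style sinusoidal liftering of cepstral
# coefficients c[1], c[2], ... (c[0] excluded):
#   c'[n] = (1 + (L / 2) * sin(pi * n / L)) * c[n]
# ASSUMPTION: sig2fv may use a different lifter.


def lifter(cep, L):
    return [(1.0 + (L / 2.0) * math.sin(math.pi * n / L)) * c
            for n, c in enumerate(cep, start=1)]


print(lifter([1.0, 1.0, 1.0], 22.0))
</programlisting>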
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-usepower</term>
<LISTITEM><PARA>

Use power rather than energy in filter bank
analysis.
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-include_c0</term>
<LISTITEM><PARA>

Include cepstral coefficient 0.
</PARA></LISTITEM>
</varlistentry>

<varlistentry><term>-order</term>
<LISTITEM><PARA>
<replaceable>string</replaceable>

Order of analyses. </PARA></LISTITEM>
</varlistentry>
</variablelist>
        </para>
        <!-- DONE /amd/projects/festival/versions/v_mpiro/speech_tools_linux/bin/sig2fv -sgml_options -->
    </sect2>
    <sect2>
      <title>Examples</title>
      <para>
Fixed-frame basic linear prediction:
to produce a set of linear prediction coefficients every 10ms, using
pre-emphasis and saving in EST format:
<screen>
$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "lpc" -otype est -shift 0.01 -preemph 0.5
</screen>
</para>
<formalpara><title>Pitch-synchronous linear prediction</title><para>The following uses
the pitchmarks in kdt_010.pm as the centres of the analysis windows:
<screen>
$ sig2fv kdt_010.wav -pm kdt_010.pm -o kdt_010.lpc -coefs "lpc" -otype est -shift 0.01 -preemph 0.5
</screen>
</para>
</formalpara>
<para>
F0, linear prediction and cepstral coefficients:
<screen>
$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "f0 lpc cep" -otype est -shift 0.01
</screen>
Note that pitch tracking can also be done with the
<command>pda</command> program. Both use the same underlying
technique, but the pda program offers much finer control over the
pitch-tracking parameters.
</para>
<para>Energy, linear prediction and cepstral coefficients, with a 10ms frame shift
during analysis but a 5ms frame shift in the output file:
<screen>
$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "energy lpc cep" -otype est -S 0.005 -shift 0.01
</screen>
</para>
<para>Delta and acceleration coefficients can be calculated even if their base form
is not required. This produces normal energy coefficients and cepstral delta
coefficients:
<screen>
$ sig2fv ../kdt_010.wav -o kdt_010.lpc -coefs "energy" -delta "cep" -otype est
</screen>
</para>
<para>Mel-scale cepstral coefficients with their delta and acceleration coefficients,
as is common in speech recognition:
<screen>
$ sig2fv ../kdt_010.wav -o kdt_010.lpc -coefs "melcep" -delta "melcep" -acc "melcep" -otype est -preemph 0.96
</screen>
      </para>
    </sect2>
  </sect1>