<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML>
<HEAD>
   <TITLE>void srpd</TITLE>
   <META NAME="GENERATOR" CONTENT="DOC++ 3.4.6">
</HEAD>
 <body bgcolor="#ffffff" link="#0000ff" 
	vlink="#dd0000" text="#000088" alink="#9000ff">

<A HREF = "http://www.cstr.ed.ac.uk/">
   <IMG align=left BORDER=0 SRC = "cstr.gif"></A> 
<A HREF="http://www.cstr.ed.ac.uk/projects/speech_tools.html">
	<IMG BORDER=0 ALIGN=right SRC="est.jpg" width=150 height=93></A>
<br>

<br clear=left>
<p align=right>

In file ../include/sigpr/EST_sigpr_utt.h:<TABLE BORDER=0><TR>
<TD VALIGN=TOP><H2>void <A HREF="#DOC.DOCU">srpd</A></H2></TD><TD><H2>(<!1><A HREF="EST_Wave.html">EST_Wave</A> &amp;sig,  <!1><A HREF="EST_Track.html">EST_Track</A> &amp;fz,<BR>&nbsp; <!1><A HREF="EST_Features.html">EST_Features</A> &amp;options)</H2></TD></TR></TABLE>
<BLOCKQUOTE>Super resolution pitch tracker.</BLOCKQUOTE>

<A NAME="DOC.DOCU"></A>
<HR>
<H2>Documentation</H2>
<BLOCKQUOTE>Super resolution pitch tracker.

<P>srpd is a pitch detection algorithm that produces a fundamental
frequency contour from a speech waveform. At present only the super
resolution pitch determination algorithm is implemented.  See (Medan,
Yair, and Chazan, 1991) and (Bagshaw et al., 1993) for a detailed
description of the algorithm.

<P>Frames of data are read in from <B>sig</B> in
chronological order such that each frame is shifted in time from its
predecessor by <B>pda_frame_shift</B>. Each frame is
analysed in turn.
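<P>The framing step can be sketched as follows. This is a minimal
illustration and not the library's own code: the helper name
frame_starts is hypothetical, and the frame length and shift are assumed
to be already converted to a whole number of samples.

<PRE>
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Hypothetical helper: start index of every complete frame in a signal
// of n_samples samples, with consecutive frames frame_shift samples apart.
std::vector&lt;std::size_t&gt; frame_starts(std::size_t n_samples,
                                      std::size_t frame_length,
                                      std::size_t frame_shift)
{
    std::vector&lt;std::size_t&gt; starts;
    if (frame_length == 0 || frame_shift == 0) return starts;  // no valid framing
    for (std::size_t s = 0; s + frame_length &lt;= n_samples; s += frame_shift)
        starts.push_back(s);  // frames are produced in chronological order
    return starts;
}
</PRE>

<P>Only frames that fit entirely inside the signal are listed; how the
final partial frame is treated is a choice made for this sketch.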


<P>The maximum and minimum signal amplitudes are initially found over the
duration of two segments, each of length N_min samples. If the sum of
their absolute values is below two times <B>noise_floor</B>, the frame
is classified as representing silence and no coefficients are
calculated. Otherwise, a cross-correlation coefficient is calculated
for all n from a period in samples corresponding to <B>min_pitch</B>
to a period in samples corresponding to <B>max_pitch</B>, in steps of
<B>decimation_factor</B>. In calculating the coefficient, only one in
every <B>decimation_factor</B> samples of the two segments is used.
Such down-sampling permits rapid estimates of the coefficients to be
calculated over the range N_min &lt;= n &lt;= N_max. This results in a
cross-correlation track for the frame being analysed.
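<P>The silence test and the decimated coefficient track described above
can be sketched as follows. The function names are hypothetical, and the
normalised form of the cross-correlation coefficient is an assumption of
this sketch, since the text does not give the formula; see the cited
papers for the exact definition.

<PRE>
#include &lt;algorithm&gt;
#include &lt;cmath&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Silence test: take the maximum and minimum amplitude over the first
// 2 * n_min samples and compare the sum of their absolute values with
// twice the noise floor.
bool is_silent(const std::vector&lt;double&gt;&amp; x, std::size_t n_min,
               double noise_floor)
{
    std::size_t len = std::min(2 * n_min, x.size());
    if (len == 0) return true;
    double hi = x[0], lo = x[0];
    for (std::size_t i = 1; i &lt; len; ++i) {
        hi = std::max(hi, x[i]);
        lo = std::min(lo, x[i]);
    }
    return std::fabs(hi) + std::fabs(lo) &lt; 2.0 * noise_floor;
}

// Assumed normalised cross-correlation between the adjacent segments
// x[0..n) and x[n..2n), using only every step-th sample of each.
double cross_corr(const std::vector&lt;double&gt;&amp; x, std::size_t n,
                  std::size_t step)
{
    if (step == 0 || n == 0 || 2 * n &gt; x.size()) return 0.0;
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t i = 0; i &lt; n; i += step) {
        xy += x[i] * x[i + n];
        xx += x[i] * x[i];
        yy += x[i + n] * x[i + n];
    }
    double denom = std::sqrt(xx * yy);
    return denom &gt; 0.0 ? xy / denom : 0.0;
}

// Coefficient track for n = n_min, n_min + step, ... up to n_max.
std::vector&lt;double&gt; cc_track(const std::vector&lt;double&gt;&amp; x,
                             std::size_t n_min, std::size_t n_max,
                             std::size_t step)
{
    std::vector&lt;double&gt; track;
    if (step == 0) return track;
    for (std::size_t n = n_min; n &lt;= n_max; n += step)
        track.push_back(cross_corr(x, n, step));
    return track;
}
</PRE>

<P>For a signal that repeats exactly every 4 samples, the coefficient at
n = 4 is 1 whether every sample or only every second sample is used,
which is why decimation gives a cheap first estimate of the track.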

<P>Local maxima of the track with a coefficient value above a specified
threshold form candidates for the fundamental period. The threshold is
adaptive and dependent upon the values <B>v2uv_coef_thresh</B>,
<B>min_v2uv_coef_thresh</B>, and <B>v2uv_coef_thresh_ratio</B>. If the
previously analysed frame was classified as unvoiced or silent (which is
the initial state) then the threshold is set to
<B>v2uv_coef_thresh</B>. Otherwise, the previous frame was classified as
being voiced, and the threshold is set equal to
<B>v2uv_coef_thresh_ratio</B> times the cross-correlation coefficient
value at the point of the previous fundamental period in the former
coefficients track. This product is not permitted to drop below
<B>min_v2uv_coef_thresh</B>.
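<P>The threshold update can be sketched as a single function. The
function name adaptive_threshold is hypothetical, and the lower bound is
taken here to be min_v2uv_coef_thresh, which is an assumption of this
sketch and not a statement about the library's code.

<PRE>
#include &lt;algorithm&gt;

// prev_voiced: whether the previously analysed frame was voiced
//              (false for unvoiced or silent, which is the initial state).
// prev_coeff:  coefficient value at the previous fundamental period in
//              the former coefficients track (ignored if !prev_voiced).
double adaptive_threshold(bool prev_voiced, double prev_coeff,
                          double v2uv_coef_thresh,
                          double min_v2uv_coef_thresh,
                          double v2uv_coef_thresh_ratio)
{
    if (!prev_voiced)
        return v2uv_coef_thresh;
    // Scale the previous coefficient, but never go below the floor
    // (assumed to be min_v2uv_coef_thresh).
    return std::max(v2uv_coef_thresh_ratio * prev_coeff,
                    min_v2uv_coef_thresh);
}
</PRE>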


<P>If no candidates for the fundamental period are found, the frame is classified
as being unvoiced. Otherwise, the candidates are further processed to identify
the most likely true pitch period. During this additional processing, a
threshold given by <B>anti_doubling_thresh</B> is used.
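<P>Candidate selection can be sketched as a scan for local maxima above
the current threshold. The name find_candidates is hypothetical, and
skipping the two end points of the track is an assumption of this
sketch. The anti-doubling stage is not shown, since the text does not
describe it in detail.

<PRE>
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Indices of interior local maxima of the coefficient track whose value
// exceeds the threshold. An empty result means the frame would be
// classified as unvoiced.
std::vector&lt;std::size_t&gt; find_candidates(const std::vector&lt;double&gt;&amp; track,
                                         double threshold)
{
    std::vector&lt;std::size_t&gt; candidates;
    for (std::size_t i = 1; i + 1 &lt; track.size(); ++i) {
        bool is_peak = track[i] &gt; track[i - 1] &amp;&amp; track[i] &gt;= track[i + 1];
        if (is_peak &amp;&amp; track[i] &gt; threshold)
            candidates.push_back(i);
    }
    return candidates;
}
</PRE>

<P>A flat-topped peak is reported once, at its first sample, because the
comparison is strict on the left and non-strict on the right.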


<P>If the <B>peak_tracking</B> flag is set to true,
biasing is applied to the cross-correlation track as described in
(Bagshaw et al., 1993).

<P>
</BLOCKQUOTE>
<DL><DT><DT><B>Parameters:</B><DD><B>sig</B> - : input waveform
<BR><B>op</B> - :  options regarding pitch tracking parameters
<BR><B>op.min_pitch</B> - : minimum permitted F0 value
<BR><B>op.max_pitch</B> - : maximum permitted F0 value
<BR><B>op.pda_frame_shift</B> - : analysis <!1><A HREF="EST_Track.html#DOC.71.4.1">frame</A> <!1><A HREF="EST_Track.html#DOC.71.7.8">shift</A>
<BR><B>op.pda_frame_length</B> - : analysis <!1><A HREF="EST_Track.html#DOC.71.4.1">frame</A> <!1><A HREF="EST_TVector.html#DOC.15.1.20.2">length</A>
<BR><B>op.lpf_cutoff</B> - : cut off <!1><A HREF="EST_DiscreteProbDistribution.html#DOC.85.18">frequency</A> for low pass filtering
<BR><B>op.lpf_order</B> - : order of low pass filtering (must be odd)
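<BR><B>op.decimation</B> - : decimation factor used when estimating the cross-correlation coefficients
<BR><B>op.noise_floor</B> - : noise floor level used to classify a frame as silence
<BR><B>op.min_v2uv_coef_thresh</B> - : minimum value of the adaptive coefficient threshold
<BR><B>op.v2uv_coef_thresh_ratio</B> - : ratio applied to the previous frame's coefficient value when setting the threshold after a voiced frame
<BR><B>op.v2uv_coef_thresh</B> - : threshold used when the previous frame was unvoiced or silent
<BR><B>op.anti_doubling_thresh</B> - : threshold used when selecting among candidates for the pitch period
<BR><B>op.peak_tracking</B> - : if true, biasing is applied to the cross-correlation track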
<BR><DD></DL><P><P><I><A HREF="index.html">Alphabetic index</A></I> <I><A HREF="HIER.html">HTML hierarchy of classes</A> or <A HREF="HIERjava.html">Java</A></I></P><HR>
<A HREF = "http://www.ed.ac.uk/">
   <IMG align=right BORDER=0 SRC = "edcrest.gif"></A>

<P Align=left><I>This page is part of the 
<A HREF="http://www.cstr.ed.ac.uk/projects/speech_tools.html">
Edinburgh Speech Tools Library</A> documentation
<br>
Copyright <A HREF="http://www.ed.ac.uk"> University of Edinburgh</A> 1997
<br>
Contact: <A HREF="mailto:speech_tools@cstr.ed.ac.uk"> 
         speech_tools@cstr.ed.ac.uk </a>
</P>
<br clear=right>