<!-- 95% W3C COMPLIANT, 95% CSS FREE, RAW HTML -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
<title>Bigloo: A ``practical Scheme compiler'' -- User manual for version 3.2b, June 2009</title>
 <style type="text/css">
  <!--
  pre { font-family: monospace }
  tt { font-family: monospace }
  code { font-family: monospace }
  p.flushright { text-align: right }
  p.flushleft { text-align: left }
  span.sc { font-variant: small-caps }
  span.sf { font-family: sans-serif }
  span.skribetitle { font-family: sans-serif; font-weight: bolder; font-size: x-large; }
  span.refscreen { }
  span.refprint { display: none; }
  -->
 </style>
</head>

<body class="chapter" bgcolor="#ffffff">
<table width="100%" class="skribetitle" cellspacing="0" cellpadding="0"><tbody>
<tr><td align="center" bgcolor="#8381de"><div class="skribetitle"><strong><big><big><big>13. Bigloo<br/>A ``practical Scheme compiler''<br/>User manual for version 3.2b<br/>June 2009 -- Posix Regular Expressions</big></big></big></strong></div><center>
</center>
</td></tr></tbody></table>
<table cellpadding="3" cellspacing="0" width="100%" class="skribe-margins"><tr>
<td align="left" valign="top" class="skribe-left-margin" width="20%" bgcolor="#dedeff"><div class="skribe-left-margin">
<br/><center id='center28159'
><table width="97%" border="1" cellpadding="0" cellspacing="0" style="border-collapse: collapse;" frame="box" rules="none"><tbody>
<tr bgcolor="#8381de"><th id="tc28149" align="center" colspan="1"><font color="#ffffff"><strong id='bold28147'
>main page</strong></font></th></tr>
<tr bgcolor="#ffffff"><td id="tc28156" align="center" colspan="1"><table width="100%" border="0" style="border-collapse: collapse;" frame="void" rules="none"><tbody>
<tr><td id="tc28152" align="left" valign="top" colspan="1"><strong id='bold28151'
>top:</strong></td><td id="tc28153" align="right" valign="top" colspan="1"><a href="bigloo.html#Bigloo-A-``practical-Scheme-compiler''-User-manual-for-version-3.2b-June-2009" class="inbound">Bigloo<br/>A ``practical Scheme compiler''<br/>User manual for version 3.2b<br/>June 2009</a></td></tr>
</tbody></table>
</td></tr>
</tbody></table>
</center>
<br/><br/><center id='center28169'
><table width="97%" border="1" cellpadding="0" cellspacing="0" style="border-collapse: collapse;" frame="box" rules="none"><tbody>
<tr bgcolor="#8381de"><th id="tc28163" align="center" colspan="1"><font color="#ffffff"><strong id='bold28161'
>Posix Regular Expressions</strong></font></th></tr>
<tr bgcolor="#ffffff"><td id="tc28166" align="center" colspan="1"><table cellspacing="1" cellpadding="1" width="100%" class="toc">
<tbody>
 <tr><td valign="top" align="left">13.1</td><td colspan="4" width="100%"><a href="bigloo-14.html#Regular-Expressions-Procedures">Regular Expressions Procedures</a></td></tr>
 <tr><td valign="top" align="left">13.2</td><td colspan="4" width="100%"><a href="bigloo-14.html#Regular-Expressions-Pattern-Language">Regular Expressions Pattern Language</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.1</td><td colspan="3" width="100%"><a href="bigloo-14.html#Basic-assertions">Basic assertions</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.2</td><td colspan="3" width="100%"><a href="bigloo-14.html#Characters-and-character-classes">Characters and character classes</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.3</td><td colspan="3" width="100%"><a href="bigloo-14.html#Some-frequently-used-character-classes">Some frequently used character classes</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.4</td><td colspan="3" width="100%"><a href="bigloo-14.html#POSIX-character-classes">POSIX character classes</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.5</td><td colspan="3" width="100%"><a href="bigloo-14.html#Quantifiers">Quantifiers</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.6</td><td colspan="3" width="100%"><a href="bigloo-14.html#Numeric-quantifiers">Numeric quantifiers</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.7</td><td colspan="3" width="100%"><a href="bigloo-14.html#Non-greedy-quantifiers">Non-greedy quantifiers</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.8</td><td colspan="3" width="100%"><a href="bigloo-14.html#Clusters">Clusters</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.9</td><td colspan="3" width="100%"><a href="bigloo-14.html#Backreferences">Backreferences</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.10</td><td colspan="3" width="100%"><a href="bigloo-14.html#Non-capturing-clusters">Non-capturing clusters</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.11</td><td colspan="3" width="100%"><a href="bigloo-14.html#Cloisters">Cloisters</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.12</td><td colspan="3" width="100%"><a href="bigloo-14.html#Alternation">Alternation</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.13</td><td colspan="3" width="100%"><a href="bigloo-14.html#Backtracking">Backtracking</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.14</td><td colspan="3" width="100%"><a href="bigloo-14.html#Disabling-backtracking">Disabling backtracking</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.15</td><td colspan="3" width="100%"><a href="bigloo-14.html#Looking-ahead-and-behind">Looking ahead and behind</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.16</td><td colspan="3" width="100%"><a href="bigloo-14.html#Lookahead">Lookahead</a></td></tr>
 <tr><td></td><td valign="top" align="left">13.2.17</td><td colspan="3" width="100%"><a href="bigloo-14.html#Lookbehind">Lookbehind</a></td></tr>
 <tr><td valign="top" align="left">13.3</td><td colspan="4" width="100%"><a href="bigloo-14.html#An-Extended-Example">An Extended Example</a></td></tr>
</tbody>
</table>
</td></tr>
</tbody></table>
</center>
<br/><br/><center id='center28179'
><table width="97%" border="1" cellpadding="0" cellspacing="0" style="border-collapse: collapse;" frame="box" rules="none"><tbody>
<tr bgcolor="#8381de"><th id="tc28173" align="center" colspan="1"><font color="#ffffff"><strong id='bold28171'
>Chapters</strong></font></th></tr>
<tr bgcolor="#ffffff"><td id="tc28176" align="center" colspan="1"><table cellspacing="1" cellpadding="1" width="100%" class="toc">
<tbody>
 <tr><td valign="top" align="left"></td><td colspan="4" width="100%"><a href="bigloo-1.html#Acknowledgements">Acknowledgements</a></td></tr>
 <tr><td valign="top" align="left">1</td><td colspan="4" width="100%"><a href="bigloo-2.html#Table-of-contents">Table of contents</a></td></tr>
 <tr><td valign="top" align="left">2</td><td colspan="4" width="100%"><a href="bigloo-3.html#Overview-of-Bigloo">Overview of Bigloo</a></td></tr>
 <tr><td valign="top" align="left">3</td><td colspan="4" width="100%"><a href="bigloo-4.html#Modules">Modules</a></td></tr>
 <tr><td valign="top" align="left">4</td><td colspan="4" width="100%"><a href="bigloo-5.html#Core-Language">Core Language</a></td></tr>
 <tr><td valign="top" align="left">5</td><td colspan="4" width="100%"><a href="bigloo-6.html#DSSSL-support">DSSSL support</a></td></tr>
 <tr><td valign="top" align="left">6</td><td colspan="4" width="100%"><a href="bigloo-7.html#Standard-Library">Standard Library</a></td></tr>
 <tr><td valign="top" align="left">7</td><td colspan="4" width="100%"><a href="bigloo-8.html#Pattern-Matching">Pattern Matching</a></td></tr>
 <tr><td valign="top" align="left">8</td><td colspan="4" width="100%"><a href="bigloo-9.html#Fast-search">Fast search</a></td></tr>
 <tr><td valign="top" align="left">9</td><td colspan="4" width="100%"><a href="bigloo-10.html#Structures-and-Records">Structures and Records</a></td></tr>
 <tr><td valign="top" align="left">10</td><td colspan="4" width="100%"><a href="bigloo-11.html#Object-System">Object System</a></td></tr>
 <tr><td valign="top" align="left">11</td><td colspan="4" width="100%"><a href="bigloo-12.html#Regular-parsing">Regular parsing</a></td></tr>
 <tr><td valign="top" align="left">12</td><td colspan="4" width="100%"><a href="bigloo-13.html#Lalr(1)-parsing">Lalr(1) parsing</a></td></tr>
 <tr><td valign="top" align="left">13</td><td colspan="4" width="100%"><a href="bigloo-14.html#Posix-Regular-Expressions">Posix Regular Expressions</a></td></tr>
 <tr><td valign="top" align="left">14</td><td colspan="4" width="100%"><a href="bigloo-15.html#Command-Line-Parsing">Command Line Parsing</a></td></tr>
 <tr><td valign="top" align="left">15</td><td colspan="4" width="100%"><a href="bigloo-16.html#Cryptography">Cryptography</a></td></tr>
 <tr><td valign="top" align="left">16</td><td colspan="4" width="100%"><a href="bigloo-17.html#Errors-Assertions-and-Traces">Errors, Assertions, and Traces</a></td></tr>
 <tr><td valign="top" align="left">17</td><td colspan="4" width="100%"><a href="bigloo-18.html#Threads">Threads</a></td></tr>
 <tr><td valign="top" align="left">18</td><td colspan="4" width="100%"><a href="bigloo-19.html#Database-library">Database library</a></td></tr>
 <tr><td valign="top" align="left">19</td><td colspan="4" width="100%"><a href="bigloo-20.html#Multimedia-library">Multimedia library</a></td></tr>
 <tr><td valign="top" align="left">20</td><td colspan="4" width="100%"><a href="bigloo-21.html#Mail-library">Mail library</a></td></tr>
 <tr><td valign="top" align="left">21</td><td colspan="4" width="100%"><a href="bigloo-22.html#Eval-and-code-interpretation">Eval and code interpretation</a></td></tr>
 <tr><td valign="top" align="left">22</td><td colspan="4" width="100%"><a href="bigloo-23.html#Macro-expansion">Macro expansion</a></td></tr>
 <tr><td valign="top" align="left">23</td><td colspan="4" width="100%"><a href="bigloo-24.html#Parameters">Parameters</a></td></tr>
 <tr><td valign="top" align="left">24</td><td colspan="4" width="100%"><a href="bigloo-25.html#Explicit-typing">Explicit typing</a></td></tr>
 <tr><td valign="top" align="left">25</td><td colspan="4" width="100%"><a href="bigloo-26.html#The-C-interface">The C interface</a></td></tr>
 <tr><td valign="top" align="left">26</td><td colspan="4" width="100%"><a href="bigloo-27.html#The-Java-interface">The Java interface</a></td></tr>
 <tr><td valign="top" align="left">27</td><td colspan="4" width="100%"><a href="bigloo-28.html#Bigloo-Libraries">Bigloo Libraries</a></td></tr>
 <tr><td valign="top" align="left">28</td><td colspan="4" width="100%"><a href="bigloo-29.html#Extending-the-Runtime-System">Extending the Runtime System</a></td></tr>
 <tr><td valign="top" align="left">29</td><td colspan="4" width="100%"><a href="bigloo-30.html#SRFIs">SRFIs</a></td></tr>
 <tr><td valign="top" align="left">30</td><td colspan="4" width="100%"><a href="bigloo-31.html#Compiler-description">Compiler description</a></td></tr>
 <tr><td valign="top" align="left">31</td><td colspan="4" width="100%"><a href="bigloo-32.html#User-Extensions">User Extensions</a></td></tr>
 <tr><td valign="top" align="left">32</td><td colspan="4" width="100%"><a href="bigloo-33.html#Bigloo-Development-Environment">Bigloo Development Environment</a></td></tr>
 <tr><td valign="top" align="left">33</td><td colspan="4" width="100%"><a href="bigloo-34.html#Global-Index">Global Index</a></td></tr>
 <tr><td valign="top" align="left">34</td><td colspan="4" width="100%"><a href="bigloo-35.html#Library-Index">Library Index</a></td></tr>
 <tr><td valign="top" align="left"></td><td colspan="4" width="100%"><a href="bigloo-36.html#Bibliography">Bibliography</a></td></tr>
</tbody>
</table>
</td></tr>
</tbody></table>
</center>
</div></td>
<td align="left" valign="top" class="skribe-body"><div class="skribe-body">
<a name="Posix-Regular-Expressions" class="mark"></a><a name="g16497" class="mark"></a>
This whole section was written by <strong id='bold16499'
>Dorai Sitaram</strong>. 
It consists of the documentation of the <code id='code16500'
>pregexp</code> package, which may be 
found at <a href="http://www.ccs.neu.edu/~dorai/pregexp/pregexp.html">http://www.ccs.neu.edu/~dorai/pregexp/pregexp.html</a>.<br/><br/><br/>
The regexp notation supported is modeled on Perl's, and includes such
powerful directives as numeric and nongreedy quantifiers, capturing and
non-capturing clustering, POSIX character classes, selective case- and
space-insensitivity, backreferences, alternation, backtrack pruning,
positive and negative lookahead and lookbehind, in addition to the more
basic directives familiar to all regexp users.  A <em id='emph16503'
>regexp</em> is a
string that describes a pattern.  A regexp matcher tries to <em id='emph16504'
>match</em>
this pattern against (a portion of) another string, which we will call
the <em id='emph16505'
>text string</em>.  The text string is treated as raw text and not
as a pattern.<br/><br/>Most of the characters in a regexp pattern are meant to match
occurrences of themselves in the text string.  Thus, the pattern
<code id='code16507'
>&quot;abc&quot;</code> matches a string that contains the characters <code id='code16508'
>a</code>, <code id='code16509'
>b</code>,
<code id='code16510'
>c</code> in succession.<br/><br/>In the regexp pattern, some characters  act as 
<em id='emph16512'
>metacharacters</em>, and some character sequences act as
<em id='emph16513'
>metasequences</em>.  That is, they specify something
other than their literal selves.  For example, in the
pattern <code id='code16514'
>&quot;a.c&quot;</code>, the characters <code id='code16515'
>a</code> and <code id='code16516'
>c</code> do
stand for themselves but the <em id='emph16517'
>metacharacter</em> <code id='code16518'
>.</code>
can match <em id='emph16519'
>any</em> character (other than
newline).  Therefore, the pattern <code id='code16520'
>&quot;a.c&quot;</code>
matches an <code id='code16521'
>a</code>, followed by <em id='emph16522'
>any</em> character,
followed by a <code id='code16523'
>c</code>. <br/><br/>If we need to match the character <code id='code16525'
>.</code> itself,
we <em id='emph16526'
>escape</em> it, i.e., precede it with a backslash
(<code id='code16527'
>\</code>).  The character sequence <code id='code16528'
>\.</code> is thus a 
<em id='emph16529'
>metasequence</em>, since it doesn't match itself but rather
just <code id='code16530'
>.</code>.  So, to match <code id='code16531'
>a</code> followed by a literal
<code id='code16532'
>.</code> followed by <code id='code16533'
>c</code>, we use the regexp pattern
<code id='code16534'
>&quot;a\\.c&quot;</code>.<a href="#footnote-footnote16536"><sup><small>1</small></sup></a>
Another example of a metasequence is <code id='code16537'
>\t</code>, which is a
readable way to represent the tab character.<br/><br/>We will call the string representation of a regexp the
<em id='emph16539'
>U-regexp</em>, where <em id='emph16540'
>U</em> can be taken to mean <em id='emph16541'
>Unix-style</em> or 
<em id='emph16542'
>universal</em>, because this
notation for regexps is universally familiar.  Our
implementation uses an intermediate tree-like
representation called the <em id='emph16543'
>S-regexp</em>, where <em id='emph16544'
>S</em>
can stand for <em id='emph16545'
>Scheme</em>, <em id='emph16546'
>symbolic</em>, or 
<em id='emph16547'
>s-expression</em>.  S-regexps are more verbose
and less readable than U-regexps, but they are much
easier for Scheme's recursive procedures to navigate. <br/><br/>
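As a quick check of the escaping rule described above, the following sketch
contrasts the metacharacter <code>.</code> with the metasequence <code>\.</code>.
It uses <code>pregexp-match</code>, which is documented in the next section; the
values shown in the comments are what the matching rules above lead us to expect.<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">;; The metacharacter . matches any character other than newline.
(pregexp-match <font color="red">&quot;a.c&quot;</font> <font color="red">&quot;abc&quot;</font>)    ; =&gt; (&quot;abc&quot;)
(pregexp-match <font color="red">&quot;a.c&quot;</font> <font color="red">&quot;a.c&quot;</font>)    ; =&gt; (&quot;a.c&quot;)
;; The Scheme string &quot;a\\.c&quot; holds the regexp a\.c, which accepts
;; only a literal dot between a and c.
(pregexp-match <font color="red">&quot;a\\.c&quot;</font> <font color="red">&quot;abc&quot;</font>)  ; =&gt; #f
(pregexp-match <font color="red">&quot;a\\.c&quot;</font> <font color="red">&quot;a.c&quot;</font>)  ; =&gt; (&quot;a.c&quot;)
</pre>
</td></tr>
</tbody></table></center>
<br/>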
<!-- Regular Expressions Procedures -->
<a name="Regular-Expressions-Procedures"></a>
<div class="section-atitle"><table width="100%"><tr><td bgcolor="#dedeff"><h3><font color="black">13.1 Regular Expressions Procedures</font>
</h3></td></tr></table>
</div><div class="section">
<a name="Regular-Expressions-Procedures" class="mark"></a>
Five procedures, <code id='code16549'
>pregexp</code>, <code id='code16550'
>pregexp-match-positions</code>,
<code id='code16551'
>pregexp-match</code>, <code id='code16552'
>pregexp-replace</code>, and
<code id='code16553'
>pregexp-replace*</code>, enable compilation and matching of regular
expressions.<br/><br/><table cellspacing="0" class="frame" cellpadding="10" border="1" width="100%"><tbody>
<tr><td><a name="g16556" class="mark"></a><a name="pregexp" class="mark"></a><table width="100%" style="border-collapse: collapse;" frame="void" rules="none"><tbody>
<tr><td id="tc16560" align="left" colspan="1"><strong id='bold16558'
>pregexp</strong><em id='it16559'
> U-regexp</em></td><td id="tc16561" align="right" colspan="1">bigloo procedure</td></tr>
</tbody></table>
The procedure <code id='code16564'
>pregexp</code> takes a U-regexp, which is a
string, and returns an S-regexp, which is a tree.  <br/><br/><center id='center16573'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16571'
>(pregexp <font color="red">&quot;c.r&quot;</font>) =&gt; (<strong id='bold28181'
>:sub</strong> (<strong id='bold28183'
>:or</strong> (<strong id='bold28185'
>:seq</strong> #\c <strong id='bold28187'
>:any</strong> #\r)))
</pre>
</td></tr>
</tbody></table></center>

There is rarely any need to look at the S-regexps returned by <code id='code16574'
>pregexp</code>.
</td></tr>
</tbody></table><br/><br/><br/><table cellspacing="0" class="frame" cellpadding="10" border="1" width="100%"><tbody>
<tr><td><a name="g16579" class="mark"></a><a name="pregexp-match-positions" class="mark"></a><table width="100%" style="border-collapse: collapse;" frame="void" rules="none"><tbody>
<tr><td id="tc16583" align="left" colspan="1"><strong id='bold16581'
>pregexp-match-positions</strong><em id='it16582'
> regexp string</em></td><td id="tc16584" align="right" colspan="1">bigloo procedure</td></tr>
</tbody></table>

The procedure <code id='code16587'
>pregexp-match-positions</code> takes a
regexp pattern and a text string, and returns a <em id='emph16588'
>match</em> 
if the pattern <em id='emph16589'
>matches</em> the text string.
The pattern may be either a U- or an S-regexp.
(<code id='code16590'
>pregexp-match-positions</code> will internally compile a
U-regexp to an S-regexp before proceeding with the
matching.  If you find yourself calling
<code id='code16591'
>pregexp-match-positions</code> repeatedly with the same
U-regexp, it may be advisable to explicitly convert the
latter into an S-regexp once beforehand, using
<code id='code16592'
>pregexp</code>, to save needless recompilation.)<br/><br/><code id='code16594'
>pregexp-match-positions</code> returns <code id='code16595'
>#f</code> if the pattern did not
match the string; and a list of <em id='emph16596'
>index pairs</em> if it
did match.  For example:<br/><br/><center id='center16604'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16602'
>(pregexp-match-positions <font color="red">&quot;brain&quot;</font> <font color="red">&quot;bird&quot;</font>)
 =&gt; #f
(pregexp-match-positions <font color="red">&quot;needle&quot;</font> <font color="red">&quot;hay needle stack&quot;</font>)
 =&gt; ((4 . 10))
</pre>
</td></tr>
</tbody></table></center>

In the second example, the integers 4 and 10 identify the substring that was matched. The 4 is the starting
(inclusive) index and the 10 the ending (exclusive) index of
the matching substring.<br/><br/><center id='center16610'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16608'
>(substring <font color="red">&quot;hay needle stack&quot;</font> 4 10)
 =&gt; <font color="red">&quot;needle&quot;</font>
</pre>
</td></tr>
</tbody></table></center>

Here, <code id='code16611'
>pregexp-match-positions</code>'s return list contains only 
one index pair, and that pair represents the entire
substring matched by the regexp.  When we discuss
<em id='emph16612'
>subpatterns</em> later, we will see how a single match
operation can yield a list of <em id='emph16613'
>submatches</em>.<br/><br/><code id='code16615'
>pregexp-match-positions</code> takes optional third
and fourth arguments that specify the indices of
the text string within which the matching should
take place.   <br/><br/><center id='center16621'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16619'
>(pregexp-match-positions <font color="red">&quot;needle&quot;</font> 
  <font color="red">&quot;his hay needle stack -- my hay needle stack -- her hay needle stack&quot;</font>
  24 43)
 =&gt; ((31 . 37))
</pre>
</td></tr>
</tbody></table></center>

Note that the returned indices are still reckoned
relative to the full text string.  
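<br/><br/>The advice above about precompiling, together with the use of an index pair,
can be sketched as follows. The names <code>needle-rx</code> and
<code>matched-substring</code> are hypothetical and introduced only for this
illustration; they are not part of the <code>pregexp</code> package.<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">;; Compile the U-regexp once; the resulting S-regexp can be reused
;; across calls without recompilation.
(define needle-rx (pregexp <font color="red">&quot;needle&quot;</font>))

;; Hypothetical helper: return the part of TEXT matched by RX, or #f
;; when there is no match.  The first index pair covers the whole match.
(define (matched-substring rx text)
  (let ((m (pregexp-match-positions rx text)))
    (and m (substring text (car (car m)) (cdr (car m))))))

(matched-substring needle-rx <font color="red">&quot;hay needle stack&quot;</font>)  ; =&gt; &quot;needle&quot;
(matched-substring needle-rx <font color="red">&quot;bird&quot;</font>)              ; =&gt; #f
</pre>
</td></tr>
</tbody></table></center>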
</td></tr>
</tbody></table><br/>
<table cellspacing="0" class="frame" cellpadding="10" border="1" width="100%"><tbody>
<tr><td><a name="g16625" class="mark"></a><a name="pregexp-match" class="mark"></a><table width="100%" style="border-collapse: collapse;" frame="void" rules="none"><tbody>
<tr><td id="tc16629" align="left" colspan="1"><strong id='bold16627'
>pregexp-match</strong><em id='it16628'
> regexp string</em></td><td id="tc16630" align="right" colspan="1">bigloo procedure</td></tr>
</tbody></table>
The procedure <code id='code16633'
>pregexp-match</code> is called like 
<code id='code16634'
>pregexp-match-positions</code>
but instead of returning index pairs it returns the
matching substrings:<br/><br/><center id='center16643'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16641'
>(pregexp-match <font color="red">&quot;brain&quot;</font> <font color="red">&quot;bird&quot;</font>)
 =&gt; #f
(pregexp-match <font color="red">&quot;needle&quot;</font> <font color="red">&quot;hay needle stack&quot;</font>)
 =&gt; (<font color="red">&quot;needle&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

<code id='code16644'
>pregexp-match</code> also takes optional third and
fourth arguments, with the same meaning as does
<code id='code16645'
>pregexp-match-positions</code>.
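<br/><br/>A minimal sketch of the optional arguments, reusing the text string from the
<code>pregexp-match-positions</code> example; the variable name <code>text</code>
is chosen here for illustration only.<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(define text
  <font color="red">&quot;his hay needle stack -- my hay needle stack -- her hay needle stack&quot;</font>)

;; Match only between index 24 (inclusive) and index 43 (exclusive).
(pregexp-match <font color="red">&quot;needle&quot;</font> text 24 43)  ; =&gt; (&quot;needle&quot;)
;; The first needle ends at index 14, so it does not fit before index 10.
(pregexp-match <font color="red">&quot;needle&quot;</font> text 0 10)   ; =&gt; #f
</pre>
</td></tr>
</tbody></table></center>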
</td></tr>
</tbody></table><br/>
<table cellspacing="0" class="frame" cellpadding="10" border="1" width="100%"><tbody>
<tr><td><a name="g16649" class="mark"></a><a name="pregexp-replace" class="mark"></a><table width="100%" style="border-collapse: collapse;" frame="void" rules="none"><tbody>
<tr><td id="tc16653" align="left" colspan="1"><strong id='bold16651'
>pregexp-replace</strong><em id='it16652'
> regexp string1 string2</em></td><td id="tc16654" align="right" colspan="1">bigloo procedure</td></tr>
</tbody></table>
The procedure <code id='code16657'
>pregexp-replace</code> replaces the
matched portion of the text string by another
string.  The first argument is the regexp,
the second the text string, and the third
is the <em id='emph16658'
>insert string</em> (string to be inserted).<br/><br/><center id='center16666'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16664'
>(pregexp-replace <font color="red">&quot;te&quot;</font> <font color="red">&quot;liberte&quot;</font> <font color="red">&quot;ty&quot;</font>) 
 =&gt; <font color="red">&quot;liberty&quot;</font>
</pre>
</td></tr>
</tbody></table></center>

If the pattern doesn't occur in the text string, the returned string is 
identical (<code id='code16667'
>eq?</code>) to the text string.
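<br/><br/>When the pattern occurs more than once, we would expect
<code>pregexp-replace</code> to rewrite only the first occurrence, since
replacing every occurrence is the job of <code>pregexp-replace*</code>. A short
sketch of both the multiple-match and the no-match cases:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">;; Only the first match is replaced.
(pregexp-replace <font color="red">&quot;te&quot;</font> <font color="red">&quot;liberte egalite fraternite&quot;</font> <font color="red">&quot;ty&quot;</font>)
 ; =&gt; &quot;liberty egalite fraternite&quot;
;; No match: the text string comes back unchanged.
(pregexp-replace <font color="red">&quot;xyz&quot;</font> <font color="red">&quot;liberte&quot;</font> <font color="red">&quot;ty&quot;</font>)
 ; =&gt; &quot;liberte&quot;
</pre>
</td></tr>
</tbody></table></center>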
</td></tr>
</tbody></table><br/>
<table cellspacing="0" class="frame" cellpadding="10" border="1" width="100%"><tbody>
<tr><td><a name="g16671" class="mark"></a><a name="pregexp-replace*" class="mark"></a><table width="100%" style="border-collapse: collapse;" frame="void" rules="none"><tbody>
<tr><td id="tc16675" align="left" colspan="1"><strong id='bold16673'
>pregexp-replace*</strong><em id='it16674'
> regexp string1 string2</em></td><td id="tc16676" align="right" colspan="1">bigloo procedure</td></tr>
</tbody></table>
The procedure <code id='code16679'
>pregexp-replace*</code> replaces <em id='emph16680'
>all</em> matches in the
text <code id='code16682'
><em id='it16681'
>string1</em></code> by the insert <code id='code16684'
><em id='it16683'
>string2</em></code>:<br/><br/><center id='center16692'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16690'
>(pregexp-replace* <font color="red">&quot;te&quot;</font> <font color="red">&quot;liberte egalite fraternite&quot;</font> <font color="red">&quot;ty&quot;</font>)
 =&gt; <font color="red">&quot;liberty egality fratyrnity&quot;</font>
</pre>
</td></tr>
</tbody></table></center>

As with <code id='code16693'
>pregexp-replace</code>, if the pattern doesn't occur in the text
string, the returned string is identical (<code id='code16694'
>eq?</code>) to the text string.
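<br/><br/>A common use of <code>pregexp-replace*</code> is normalizing a delimiter
throughout a string. The following sketch uses insert strings that contain no
backslash or <code>&amp;</code>, so they are inserted literally.<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">;; Collapse every run of spaces into a single space.
(pregexp-replace* <font color="red">&quot; +&quot;</font> <font color="red">&quot;split pea     soup&quot;</font> <font color="red">&quot; &quot;</font>)
 ; =&gt; &quot;split pea soup&quot;
;; Replace every hyphen by a slash.
(pregexp-replace* <font color="red">&quot;-&quot;</font> <font color="red">&quot;2009-06-15&quot;</font> <font color="red">&quot;/&quot;</font>)
 ; =&gt; &quot;2009/06/15&quot;
</pre>
</td></tr>
</tbody></table></center>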
</td></tr>
</tbody></table><br/>
<table cellspacing="0" class="frame" cellpadding="10" border="1" width="100%"><tbody>
<tr><td><a name="g16698" class="mark"></a><a name="pregexp-split" class="mark"></a><table width="100%" style="border-collapse: collapse;" frame="void" rules="none"><tbody>
<tr><td id="tc16702" align="left" colspan="1"><strong id='bold16700'
>pregexp-split</strong><em id='it16701'
> regexp string</em></td><td id="tc16703" align="right" colspan="1">bigloo procedure</td></tr>
</tbody></table>
The procedure <code id='code16706'
>pregexp-split</code> takes two arguments, a
regexp pattern and a text string, and returns a list of
substrings of the text string, where the pattern identifies the 
delimiter separating the substrings.<br/><br/><center id='center16721'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16719'
>(pregexp-split <font color="red">&quot;:&quot;</font> <font color="red">&quot;/bin:/usr/bin:/usr/bin/X11:/usr/local/bin&quot;</font>)
 =&gt; (<font color="red">&quot;/bin&quot;</font> <font color="red">&quot;/usr/bin&quot;</font> <font color="red">&quot;/usr/bin/X11&quot;</font> <font color="red">&quot;/usr/local/bin&quot;</font>)<br/><br/>(pregexp-split <font color="red">&quot; &quot;</font> <font color="red">&quot;pea soup&quot;</font>)
 =&gt; (<font color="red">&quot;pea&quot;</font> <font color="red">&quot;soup&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

If the first argument can match an empty string, then
the list of all the single-character substrings is returned.<br/><br/><center id='center16738'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16736'
>(pregexp-split <font color="red">&quot;&quot;</font> <font color="red">&quot;smithereens&quot;</font>)
 =&gt; (<font color="red">&quot;s&quot;</font> <font color="red">&quot;m&quot;</font> <font color="red">&quot;i&quot;</font> <font color="red">&quot;t&quot;</font> <font color="red">&quot;h&quot;</font> <font color="red">&quot;e&quot;</font> <font color="red">&quot;r&quot;</font> <font color="red">&quot;e&quot;</font> <font color="red">&quot;e&quot;</font> <font color="red">&quot;n&quot;</font> <font color="red">&quot;s&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

To identify one-or-more spaces as the delimiter,
take care to use the regexp <code id='code16739'
>&quot; +&quot;</code>, not <code id='code16740'
>&quot; *&quot;</code>.<br/><br/><center id='center16764'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16762'
>(pregexp-split <font color="red">&quot; +&quot;</font> <font color="red">&quot;split pea     soup&quot;</font>)
 =&gt; (<font color="red">&quot;split&quot;</font> <font color="red">&quot;pea&quot;</font> <font color="red">&quot;soup&quot;</font>)<br/><br/>(pregexp-split <font color="red">&quot; *&quot;</font> <font color="red">&quot;split pea     soup&quot;</font>)
 =&gt; (<font color="red">&quot;s&quot;</font> <font color="red">&quot;p&quot;</font> <font color="red">&quot;l&quot;</font> <font color="red">&quot;i&quot;</font> <font color="red">&quot;t&quot;</font> <font color="red">&quot;p&quot;</font> <font color="red">&quot;e&quot;</font> <font color="red">&quot;a&quot;</font> <font color="red">&quot;s&quot;</font> <font color="red">&quot;o&quot;</font> <font color="red">&quot;u&quot;</font> <font color="red">&quot;p&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
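
The same caution applies to composite delimiters. In the sketch below the
comma is mandatory, so the delimiter can never match the empty string even
though the spaces around it are optional. The sample text is made up for
illustration.<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">;; Split on a comma together with any spaces on either side of it.
(pregexp-split <font color="red">&quot; *, *&quot;</font> <font color="red">&quot;apples, pears ,plums&quot;</font>)
 ; =&gt; (&quot;apples&quot; &quot;pears&quot; &quot;plums&quot;)
</pre>
</td></tr>
</tbody></table></center>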
</td></tr>
</tbody></table><br/>
<table cellspacing="0" class="frame" cellpadding="10" border="1" width="100%"><tbody>
<tr><td><a name="g16768" class="mark"></a><a name="pregexp-quote" class="mark"></a><table width="100%" style="border-collapse: collapse;" frame="void" rules="none"><tbody>
<tr><td id="tc16772" align="left" colspan="1"><strong id='bold16770'
>pregexp-quote</strong><em id='it16771'
> string</em></td><td id="tc16773" align="right" colspan="1">bigloo procedure</td></tr>
</tbody></table>

The procedure <code id='code16776'
>pregexp-quote</code> takes an arbitrary <code id='code16778'
><em id='it16777'
>string</em></code> and 
returns a U-regexp (string) that precisely represents it. In particular, 
characters in the input string that could serve as regexp metacharacters are 
escaped with a backslash, so that they safely match only themselves.<br/><br/><center id='center16787'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16785'
>(pregexp-quote <font color="red">&quot;cons&quot;</font>)
 =&gt; <font color="red">&quot;cons&quot;</font><br/><br/>(pregexp-quote <font color="red">&quot;list?&quot;</font>)
 =&gt; <font color="red">&quot;list\\?&quot;</font>
</pre>
</td></tr>
</tbody></table></center>

<code id='code16788'
>pregexp-quote</code> is useful when building a composite regexp 
from a mix of regexp strings and verbatim strings. 
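<br/><br/>As a sketch of such a composite regexp, the hypothetical helper
<code>exact-rx</code> below (not part of the package) wraps a quoted verbatim
string between the assertions <code>^</code> and <code>$</code>, which are
described in the next section.<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">;; Hypothetical helper: build a U-regexp that matches exactly the string S.
(define (exact-rx s)
  (string-append <font color="red">&quot;^&quot;</font> (pregexp-quote s) <font color="red">&quot;$&quot;</font>))

(exact-rx <font color="red">&quot;list?&quot;</font>)                          ; =&gt; &quot;^list\\?$&quot;
(pregexp-match (exact-rx <font color="red">&quot;list?&quot;</font>) <font color="red">&quot;list?&quot;</font>)  ; =&gt; (&quot;list?&quot;)
;; Without quoting, ? makes the preceding t optional, so the literal
;; question mark in the text string is left unmatched before $.
(pregexp-match <font color="red">&quot;^list?$&quot;</font> <font color="red">&quot;list?&quot;</font>)           ; =&gt; #f
</pre>
</td></tr>
</tbody></table></center>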
</td></tr>
</tbody></table><br/>
</div><br>
<!-- Regular Expressions Pattern Language -->
<a name="Regular-Expressions-Pattern-Language"></a>
<div class="section-atitle"><table width="100%"><tr><td bgcolor="#dedeff"><h3><font color="black">13.2 Regular Expressions Pattern Language</font>
</h3></td></tr></table>
</div><div class="section">
<a name="The-Regular-Expressions-Pattern-Language" class="mark"></a>

Here is a complete description of the regexp pattern
language recognized by the <code id='code16791'
>pregexp</code> procedures.<br/><br/><!-- Basic assertions -->
<a name="Basic-assertions"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.1 Basic assertions</font>
</h3></td></tr></table>
</div><div class="subsection">
<a name="Basic-assertions" class="mark"></a>
The <em id='emph16793'
>assertions</em> <code id='code16794'
>^</code> and <code id='code16795'
>$</code> identify the beginning and
the end of the text string respectively.  They ensure that their
adjoining regexps match at one or other end of the text string.
Examples:<br/><br/><center id='center16801'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16799'
>(pregexp-match-positions <font color="red">&quot;^contact&quot;</font> <font color="red">&quot;first contact&quot;</font>) =&gt; #f 
</pre>
</td></tr>
</tbody></table></center>

The regexp fails to match because <code id='code16802'
>contact</code> does not occur at the beginning of the text string.<br/><br/><center id='center16808'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16806'
>(pregexp-match-positions <font color="red">&quot;laugh$&quot;</font> <font color="red">&quot;laugh laugh laugh laugh&quot;</font>) =&gt; ((18 . 23))
</pre>
</td></tr>
</tbody></table></center>

The regexp matches the <em id='emph16809'
>last</em> <code id='code16810'
>laugh</code>.
The metasequence <code id='code16811'
>\b</code> asserts that
a <em id='emph16812'
>word boundary</em> exists. <br/><br/><center id='center16818'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16816'
>(pregexp-match-positions <font color="red">&quot;yack\\b&quot;</font> <font color="red">&quot;yackety yack&quot;</font>) =&gt; ((8 . 12))
</pre>
</td></tr>
</tbody></table></center>

The <code id='code16819'
>yack</code> in <code id='code16820'
>yackety</code> doesn't end at a word boundary, so it isn't matched.  The second <code id='code16821'
>yack</code> does and is.<br/><br/>The metasequence <code id='code16823'
>\B</code> has the opposite effect to <code id='code16824'
>\b</code>.  It
asserts that a word boundary does not exist.<br/><br/><center id='center16830'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16828'
>(pregexp-match-positions <font color="red">&quot;an\\B&quot;</font> <font color="red">&quot;an analysis&quot;</font>) =&gt; ((3 . 5))
</pre>
</td></tr>
</tbody></table></center>

The <code id='code16831'
>an</code> that doesn't end in a word boundary is matched.<br/><br/></div>
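<div class="subsection">
The assertions can be combined.  As a small sketch (the text strings are invented for illustration, and the results are those the rules above predict), anchoring a pattern at both ends forces it to account for the whole text string, and a <code>\b</code> on each side picks out a whole word:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-match-positions <font color="red">&quot;^laugh$&quot;</font> <font color="red">&quot;laugh laugh&quot;</font>)      =&gt; #f
(pregexp-match-positions <font color="red">&quot;^laugh$&quot;</font> <font color="red">&quot;laugh&quot;</font>)            =&gt; ((0 . 5))
(pregexp-match-positions <font color="red">&quot;\\byack\\b&quot;</font> <font color="red">&quot;yackety yack&quot;</font>) =&gt; ((8 . 12))
</pre>
</td></tr>
</tbody></table></center>
</div>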
<!-- Characters and character classes -->
<a name="Characters-and-character-classes"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.2 Characters and character classes</font>
</h3></td></tr></table>
</div><div class="subsection">
<a name="Characters-and-character-classes" class="mark"></a>
Typically a character in the regexp matches the same character in the
text string.  Sometimes it is necessary or convenient to use a regexp
metasequence to refer to a single character.  Thus, metasequences
<code id='code16833'
>\n</code>, <code id='code16834'
>\r</code>, <code id='code16835'
>\t</code>, and <code id='code16836'
>\.</code>  match the newline,
return, tab and period characters respectively.<br/><br/>The <em id='emph16838'
>metacharacter</em> period (<code id='code16839'
>.</code>) matches
<em id='emph16840'
>any</em> character other than newline.<br/><br/><center id='center16847'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16845'
>(pregexp-match <font color="red">&quot;p.t&quot;</font> <font color="red">&quot;pet&quot;</font>) =&gt; (<font color="red">&quot;pet&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

It also matches <code id='code16848'
>pat</code>, <code id='code16849'
>pit</code>, <code id='code16850'
>pot</code>, <code id='code16851'
>put</code>, and <code id='code16852'
>p8t</code> but not <code id='code16853'
>peat</code> or <code id='code16854'
>pfffft</code>.<br/><br/>A <em id='emph16856'
>character class</em> matches any one character from a set of
characters.  A typical format for this is the <em id='emph16857'
>bracketed character
class</em> <code id='code16858'
>[</code>...<code id='code16859'
>]</code>, which matches any one character from the
non-empty sequence of characters enclosed within the
brackets.<a href="#footnote-footnote16860"><sup><small>2</small></sup></a>  Thus <code id='code16861'
>&quot;p[aeiou]t&quot;</code> matches
<code id='code16862'
>pat</code>, <code id='code16863'
>pet</code>, <code id='code16864'
>pit</code>, <code id='code16865'
>pot</code>, <code id='code16866'
>put</code> and nothing
else.<br/><br/>Inside the brackets, a hyphen (<code id='code16868'
>-</code>) between two characters
specifies the ASCII range between the characters.  Eg,
<code id='code16869'
>&quot;ta[b-dgn-p]&quot;</code> matches <code id='code16870'
>tab</code>, <code id='code16871'
>tac</code>, <code id='code16872'
>tad</code>,
<em id='emph16873'
>and</em> <code id='code16874'
>tag</code>, <em id='emph16875'
>and</em> <code id='code16876'
>tan</code>, <code id='code16877'
>tao</code>, <code id='code16878'
>tap</code>.<br/><br/>An initial caret (<code id='code16880'
>^</code>) after the left bracket inverts the set
specified by the rest of the contents, ie, it specifies the set of
characters <em id='emph16881'
>other than</em> those identified in the brackets.  Eg,
<code id='code16882'
>&quot;do[^g]&quot;</code> matches all three-character sequences starting with
<code id='code16883'
>do</code> except <code id='code16884'
>dog</code>.<br/><br/>Note that the metacharacter <code id='code16886'
>^</code> inside brackets means something
quite different from what it means outside.  Most other metacharacters
(<code id='code16887'
>.</code>, <code id='code16888'
>*</code>, <code id='code16889'
>+</code>, <code id='code16890'
>?</code>, etc) cease to be metacharacters
when inside brackets, although you may still escape them for peace of
mind.  <code id='code16891'
>-</code> is a metacharacter only when it's inside brackets, and
neither the first nor the last character.<br/><br/>Bracketed character classes cannot contain other bracketed character
classes (although they can contain certain other types of character classes
--- see below).  Thus a left bracket (<code id='code16893'
>[</code>) inside a bracketed
character class doesn't have to be a metacharacter; it can stand for
itself.  Eg, <code id='code16894'
>&quot;[a[b]&quot;</code> matches <code id='code16895'
>a</code>, <code id='code16896'
>[</code>, and <code id='code16897'
>b</code>.<br/><br/>Furthermore, since empty bracketed character classes are disallowed, a
right bracket (<code id='code16899'
>]</code>) immediately occurring after the opening left
bracket also doesn't need to be a metacharacter.  Eg, <code id='code16900'
>&quot;[]ab]&quot;</code>
matches <code id='code16901'
>]</code>, <code id='code16902'
>a</code>, and <code id='code16903'
>b</code>.<br/><br/></div>
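<div class="subsection">
The bracket rules above can be tried out one at a time.  The following is only a sketch: the text strings are invented, and each result is what the corresponding rule predicts (a range, an inverted set, a literal left bracket, and a literal right bracket in first position):<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-match <font color="red">&quot;ta[b-dgn-p]&quot;</font> <font color="red">&quot;tax tag&quot;</font>) =&gt; (<font color="red">&quot;tag&quot;</font>)
(pregexp-match <font color="red">&quot;do[^g]&quot;</font> <font color="red">&quot;dog dot&quot;</font>)      =&gt; (<font color="red">&quot;dot&quot;</font>)
(pregexp-match <font color="red">&quot;[a[b]&quot;</font> <font color="red">&quot;=[=&quot;</font>)           =&gt; (<font color="red">&quot;[&quot;</font>)
(pregexp-match <font color="red">&quot;[]ab]&quot;</font> <font color="red">&quot;x]y&quot;</font>)           =&gt; (<font color="red">&quot;]&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
</div>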
<!-- Some frequently used character classes -->
<a name="Some-frequently-used-character-classes"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.3 Some frequently used character classes</font>
</h3></td></tr></table>
</div><div class="subsection">

Some standard character classes can be conveniently represented as
metasequences instead of as explicit bracketed expressions.  <code id='code16905'
>\d</code>
matches a digit (<code id='code16906'
>[0-9]</code>); <code id='code16907'
>\s</code> matches a whitespace
character; and <code id='code16908'
>\w</code> matches a character that could be part of a
``word''.<a href="#footnote-footnote16910"><sup><small>3</small></sup></a><br/><br/>The upper-case versions of these metasequences stand for the inversions
of the corresponding character classes.  Thus <code id='code16912'
>\D</code> matches a
non-digit, <code id='code16913'
>\S</code> a non-whitespace character, and <code id='code16914'
>\W</code> a
non-``word'' character.<br/><br/>Remember to include a double backslash when putting these metasequences
in a Scheme string:<br/><br/><center id='center16922'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16920'
>(pregexp-match <font color="red">&quot;\\d\\d&quot;</font> <font color="red">&quot;0 dear, 1 have 2 read catch 22 before 9&quot;</font>) =&gt; (<font color="red">&quot;22&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
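A few more cases, given as a sketch with invented text strings; the results are those predicted by the definitions above, with <code>\w</code> taken to include the underscore:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-match <font color="red">&quot;\\w+&quot;</font> <font color="red">&quot;hello_world 42&quot;</font>) =&gt; (<font color="red">&quot;hello_world&quot;</font>)
(pregexp-match <font color="red">&quot;\\s\\d&quot;</font> <font color="red">&quot;room 7&quot;</font>)       =&gt; (<font color="red">&quot; 7&quot;</font>)
(pregexp-match <font color="red">&quot;\\D+&quot;</font> <font color="red">&quot;2009-06&quot;</font>)        =&gt; (<font color="red">&quot;-&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>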

These character classes can be used inside 
a bracketed expression.  Eg,
<code id='code16923'
>&quot;[a-z\\d]&quot;</code> matches a lower-case letter
or a digit.<br/><br/></div>
<!-- POSIX character classes -->
<a name="POSIX-character-classes"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.4 POSIX character classes</font>
</h3></td></tr></table>
</div><div class="subsection">

A <em id='emph16925'
>POSIX character class</em> is a special metasequence
of the form <code id='code16926'
>[:</code>...<code id='code16927'
>:]</code> that can be used only
inside a bracketed expression.  The POSIX classes
supported are  <br/><br/><center id='center16953'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ccccff"><pre class="prog" id='prog16951'
><code id='code16929'
>[:alnum:]</code>  letters and digits 
<code id='code16930'
>[:alpha:]</code>  letters  
<code id='code16931'
>[:algor:]</code>  the letters <code id='code16932'
>c</code>, <code id='code16933'
>h</code>, <code id='code16934'
>a</code> and <code id='code16935'
>d</code> 
<code id='code16936'
>[:ascii:]</code>  7-bit ASCII characters 
<code id='code16937'
>[:blank:]</code>  widthful whitespace, ie, space and tab 
<code id='code16938'
>[:cntrl:]</code>  ``control'' characters, viz, those with code <code id='code16939'
>&lt;</code> 32 
<code id='code16940'
>[:digit:]</code>  digits, same as <code id='code16941'
>\d</code> 
<code id='code16942'
>[:graph:]</code>  characters that use ink 
<code id='code16943'
>[:lower:]</code>  lower-case letters 
<code id='code16944'
>[:print:]</code>  ink-users plus widthful whitespace  
<code id='code16945'
>[:space:]</code>  whitespace, same as <code id='code16946'
>\s</code> 
<code id='code16947'
>[:upper:]</code>  upper-case letters 
<code id='code16948'
>[:word:]</code>   letters, digits, and underscore, same as <code id='code16949'
>\w</code> 
<code id='code16950'
>[:xdigit:]</code> hex digits 
</pre>
</td></tr>
</tbody></table></center>

For example, the regexp  <code id='code16954'
>&quot;[[:alpha:]_]&quot;</code> matches a letter or underscore.  <br/><br/><center id='center16966'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16964'
>(pregexp-match <font color="red">&quot;[[:alpha:]_]&quot;</font> <font color="red">&quot;--x--&quot;</font>) =&gt; (<font color="red">&quot;x&quot;</font>)
(pregexp-match <font color="red">&quot;[[:alpha:]_]&quot;</font> <font color="red">&quot;--_--&quot;</font>) =&gt; (<font color="red">&quot;_&quot;</font>)
(pregexp-match <font color="red">&quot;[[:alpha:]_]&quot;</font> <font color="red">&quot;--:--&quot;</font>) =&gt; #f
</pre>
</td></tr>
</tbody></table></center>
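POSIX classes combine with quantifiers and with each other like any other bracketed expression.  A brief sketch, using invented text strings and the class meanings listed in the table above:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-match <font color="red">&quot;[[:digit:]]+&quot;</font> <font color="red">&quot;june 2009&quot;</font>) =&gt; (<font color="red">&quot;2009&quot;</font>)
(pregexp-match <font color="red">&quot;[[:upper:]][[:lower:]]+&quot;</font> <font color="red">&quot;the Bigloo compiler&quot;</font>) =&gt; (<font color="red">&quot;Bigloo&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>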

The POSIX class notation is valid <em id='emph16967'
>only</em> inside a
bracketed expression.  For instance, <code id='code16968'
>[:alpha:]</code>,
when not inside a bracketed expression, will <em id='emph16969'
>not</em>
be read as the letter class.
Rather it is (from previous principles) the character
class containing the characters <code id='code16970'
>:</code>, <code id='code16971'
>a</code>, <code id='code16972'
>l</code>,
<code id='code16973'
>p</code>, <code id='code16974'
>h</code>.<br/><br/><center id='center16983'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog16981'
>(pregexp-match <font color="red">&quot;[[:alpha:]]&quot;</font> <font color="red">&quot;--a--&quot;</font>) =&gt; (<font color="red">&quot;a&quot;</font>)
(pregexp-match <font color="red">&quot;[[:alpha:]]&quot;</font> <font color="red">&quot;--_--&quot;</font>) =&gt; #f
</pre>
</td></tr>
</tbody></table></center>
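For contrast, here is a sketch of the unbracketed form.  Going by the rule just stated, <code>&quot;[:alpha:]&quot;</code> on its own should behave as an ordinary bracketed class over the characters <code>:</code>, <code>a</code>, <code>l</code>, <code>p</code> and <code>h</code>, so it accepts a colon but rejects a letter outside that set (the text strings are invented for illustration):<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-match <font color="red">&quot;[:alpha:]&quot;</font> <font color="red">&quot;--:--&quot;</font>) =&gt; (<font color="red">&quot;:&quot;</font>)
(pregexp-match <font color="red">&quot;[:alpha:]&quot;</font> <font color="red">&quot;--x--&quot;</font>) =&gt; #f
</pre>
</td></tr>
</tbody></table></center>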

By placing a caret (<code id='code16984'
>^</code>) immediately after
<code id='code16985'
>[:</code>, you get the inversion of that POSIX
character class.  Thus, <code id='code16986'
>[:^alpha:]</code> 
is the class containing all characters 
except the letters.<br/><br/></div>
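<div class="subsection">
Like the other POSIX classes, an inverted class is written inside a bracketed expression.  A sketch with invented text strings, showing the results the inversion rule predicts:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-match <font color="red">&quot;[[:^alpha:]]&quot;</font> <font color="red">&quot;ab1cd&quot;</font>)         =&gt; (<font color="red">&quot;1&quot;</font>)
(pregexp-match <font color="red">&quot;[[:^digit:]]+&quot;</font> <font color="red">&quot;2009-june-15&quot;</font>) =&gt; (<font color="red">&quot;-june-&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
</div>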
<!-- Quantifiers -->
<a name="Quantifiers"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.5 Quantifiers</font>
</h3></td></tr></table>
</div><div class="subsection">
<a name="Quantifiers" class="mark"></a>
The <em id='emph16988'
>quantifiers</em> <code id='code16989'
>*</code>, <code id='code16990'
>+</code>, and <code id='code16991'
>?</code> match
respectively: zero or more, one or more, and zero or one instances of
the preceding subpattern.<br/><br/><center id='center17011'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17009'
>(pregexp-match-positions <font color="red">&quot;c[ad]*r&quot;</font> <font color="red">&quot;cadaddadddr&quot;</font>) =&gt; ((0 . 11))
(pregexp-match-positions <font color="red">&quot;c[ad]*r&quot;</font> <font color="red">&quot;cr&quot;</font>)          =&gt; ((0 . 2))<br/><br/>(pregexp-match-positions <font color="red">&quot;c[ad]+r&quot;</font> <font color="red">&quot;cadaddadddr&quot;</font>) =&gt; ((0 . 11))
(pregexp-match-positions <font color="red">&quot;c[ad]+r&quot;</font> <font color="red">&quot;cr&quot;</font>)          =&gt; #f<br/><br/>(pregexp-match-positions <font color="red">&quot;c[ad]?r&quot;</font> <font color="red">&quot;cadaddadddr&quot;</font>) =&gt; #f
(pregexp-match-positions <font color="red">&quot;c[ad]?r&quot;</font> <font color="red">&quot;cr&quot;</font>)          =&gt; ((0 . 2))
(pregexp-match-positions <font color="red">&quot;c[ad]?r&quot;</font> <font color="red">&quot;car&quot;</font>)         =&gt; ((0 . 3))
</pre>
</td></tr>
</tbody></table></center>
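Note that a quantifier applies only to the subpattern immediately before it, which in the following sketch is a single character (the text strings are invented for illustration):<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-match <font color="red">&quot;colou?r&quot;</font> <font color="red">&quot;color&quot;</font>)  =&gt; (<font color="red">&quot;color&quot;</font>)
(pregexp-match <font color="red">&quot;colou?r&quot;</font> <font color="red">&quot;colour&quot;</font>) =&gt; (<font color="red">&quot;colour&quot;</font>)
(pregexp-match <font color="red">&quot;lo+l&quot;</font> <font color="red">&quot;ll&quot;</font>)        =&gt; #f
(pregexp-match <font color="red">&quot;lo*l&quot;</font> <font color="red">&quot;ll&quot;</font>)        =&gt; (<font color="red">&quot;ll&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>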

</div>
<!-- Numeric quantifiers -->
<a name="Numeric-quantifiers"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.6 Numeric quantifiers</font>
</h3></td></tr></table>
</div><div class="subsection">

You can use braces to specify much finer-tuned quantification than is
possible with <code id='code17012'
>*</code>, <code id='code17013'
>+</code>, <code id='code17014'
>?</code>.<br/><br/>The quantifier <code id='code17016'
>{m}</code> matches <em id='emph17017'
>exactly</em> <code id='code17018'
>m</code>
instances of the preceding <em id='emph17019'
>subpattern</em>.  <code id='code17020'
>m</code>
must be a nonnegative integer.<br/><br/>The quantifier <code id='code17022'
>{m,n}</code> matches at least <code id='code17023'
>m</code> and at most
<code id='code17024'
>n</code> instances.  <code id='code17025'
>m</code> and <code id='code17026'
>n</code> are nonnegative integers with
<code id='code17027'
>m &lt;= n</code>.  You may omit either or both numbers, in which case
<code id='code17028'
>m</code> defaults to 0 and <code id='code17029'
>n</code> to infinity.<br/><br/>It is evident that <code id='code17031'
>+</code> and <code id='code17032'
>?</code> are abbreviations for
<code id='code17033'
>{1,}</code> and <code id='code17034'
>{0,1}</code> respectively.  <code id='code17035'
>*</code> abbreviates
<code id='code17036'
>{,}</code>, which is the same as <code id='code17037'
>{0,}</code>.<br/><br/><center id='center17047'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17045'
>(pregexp-match <font color="red">&quot;[aeiou]{3}&quot;</font> <font color="red">&quot;vacuous&quot;</font>)  =&gt; (<font color="red">&quot;uou&quot;</font>)
(pregexp-match <font color="red">&quot;[aeiou]{3}&quot;</font> <font color="red">&quot;evolve&quot;</font>)   =&gt; #f
(pregexp-match <font color="red">&quot;[aeiou]{2,3}&quot;</font> <font color="red">&quot;evolve&quot;</font>) =&gt; #f
(pregexp-match <font color="red">&quot;[aeiou]{2,3}&quot;</font> <font color="red">&quot;zeugma&quot;</font>) =&gt; (<font color="red">&quot;eu&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
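The exact and open-ended forms can be sketched the same way.  The text strings below are invented, and the results follow from the definitions of <code>{m}</code>, <code>{m,}</code> and <code>{m,n}</code> given above:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-match <font color="red">&quot;[0-9]{4}&quot;</font> <font color="red">&quot;june 2009&quot;</font>)  =&gt; (<font color="red">&quot;2009&quot;</font>)
(pregexp-match <font color="red">&quot;[0-9]{2,}&quot;</font> <font color="red">&quot;7 42 2009&quot;</font>) =&gt; (<font color="red">&quot;42&quot;</font>)
(pregexp-match <font color="red">&quot;^a{1,2}$&quot;</font> <font color="red">&quot;aa&quot;</font>)         =&gt; (<font color="red">&quot;aa&quot;</font>)
(pregexp-match <font color="red">&quot;^a{1,2}$&quot;</font> <font color="red">&quot;aaa&quot;</font>)        =&gt; #f
</pre>
</td></tr>
</tbody></table></center>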

</div>
<!-- Non-greedy quantifiers -->
<a name="Non-greedy-quantifiers"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.7 Non-greedy quantifiers</font>
</h3></td></tr></table>
</div><div class="subsection">

The quantifiers described above are <em id='emph17048'
>greedy</em>, ie, they match the
maximal number of instances that would still lead to an overall match
for the full pattern.<br/><br/><center id='center17055'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17053'
>(pregexp-match <font color="red">&quot;&lt;.*&gt;&quot;</font> <font color="red">&quot;&lt;tag1&gt; &lt;tag2&gt; &lt;tag3&gt;&quot;</font>)
 =&gt; (<font color="red">&quot;&lt;tag1&gt; &lt;tag2&gt; &lt;tag3&gt;&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

To make these quantifiers <em id='emph17056'
>non-greedy</em>, append a <code id='code17057'
>?</code> to them.
Non-greedy quantifiers match the minimal number of instances needed to
ensure an overall match.<br/><br/><center id='center17064'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17062'
>(pregexp-match <font color="red">&quot;&lt;.*?&gt;&quot;</font> <font color="red">&quot;&lt;tag1&gt; &lt;tag2&gt; &lt;tag3&gt;&quot;</font>) =&gt; (<font color="red">&quot;&lt;tag1&gt;&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
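The difference is easiest to see when nothing follows the quantified subpattern.  In this sketch (invented text strings), the greedy form takes everything it can, while the non-greedy forms take the fewest instances their quantifier allows:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-match <font color="red">&quot;a+&quot;</font> <font color="red">&quot;aaa&quot;</font>)  =&gt; (<font color="red">&quot;aaa&quot;</font>)
(pregexp-match <font color="red">&quot;a+?&quot;</font> <font color="red">&quot;aaa&quot;</font>) =&gt; (<font color="red">&quot;a&quot;</font>)
(pregexp-match <font color="red">&quot;a*?&quot;</font> <font color="red">&quot;aaa&quot;</font>) =&gt; (<font color="red">&quot;&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>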

The non-greedy quantifiers are respectively:
<code id='code17065'
>*?</code>, <code id='code17066'
>+?</code>, <code id='code17067'
>??</code>, <code id='code17068'
>{m}?</code>, <code id='code17069'
>{m,n}?</code>.
Note the two uses of the metacharacter <code id='code17070'
>?</code>.<br/><br/></div>
<!-- Clusters -->
<a name="Clusters"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.8 Clusters</font>
</h3></td></tr></table>
</div><div class="subsection">
<a name="Clusters" class="mark"></a>
<em id='emph17072'
>Clustering</em>, ie, enclosure within parens <code id='code17073'
>(</code>...<code id='code17074'
>)</code>,
identifies the enclosed <em id='emph17075'
>subpattern</em> as a single entity.  It causes
the matcher to <em id='emph17076'
>capture</em> the <em id='emph17077'
>submatch</em>, or the portion of the
string matching the subpattern, in addition to the overall match.<br/><br/><center id='center17087'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17085'
>(pregexp-match <font color="red">&quot;([a-z]+) ([0-9]+), ([0-9]+)&quot;</font> <font color="red">&quot;jan 1, 1970&quot;</font>)
 =&gt; (<font color="red">&quot;jan 1, 1970&quot;</font> <font color="red">&quot;jan&quot;</font> <font color="red">&quot;1&quot;</font> <font color="red">&quot;1970&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
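The same holds for <code>pregexp-match-positions</code>.  As a sketch of what to expect for the example above, it should return one index pair for the overall match followed by one pair per cluster:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-match-positions <font color="red">&quot;([a-z]+) ([0-9]+), ([0-9]+)&quot;</font> <font color="red">&quot;jan 1, 1970&quot;</font>)
 =&gt; ((0 . 11) (0 . 3) (4 . 5) (7 . 11))
</pre>
</td></tr>
</tbody></table></center>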

Clustering also causes a following quantifier to treat
the entire enclosed subpattern as an entity.<br/><br/><center id='center17095'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17093'
>(pregexp-match <font color="red">&quot;(poo )*&quot;</font> <font color="red">&quot;poo poo platter&quot;</font>) =&gt; (<font color="red">&quot;poo poo &quot;</font> <font color="red">&quot;poo &quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

The number of submatches returned is always equal to the number of
subpatterns specified in the regexp, even if a particular subpattern
happens to match more than one substring or no substring at all.<br/><br/><center id='center17103'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17101'
>(pregexp-match <font color="red">&quot;([a-z ]+;)*&quot;</font> <font color="red">&quot;lather; rinse; repeat;&quot;</font>)
 =&gt; (<font color="red">&quot;lather; rinse; repeat;&quot;</font> <font color="red">&quot; repeat;&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

Here the <code id='code17104'
>*</code>-quantified subpattern matches three times, but it is the last submatch that is returned.<br/><br/>It is also possible for a quantified subpattern to
fail to match, even if the overall pattern matches. 
In such cases, the failing submatch is represented
by <code id='code17106'
>#f</code>.<br/><br/><center id='center17123'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17121'
>(<font color="#6959cf"><strong id='bold28323'
>define</strong></font> <font color="#6959cf"><strong id='bold28325'
>date-re</strong></font>
  ;match `month year' or `month day, year'.
  ;subpattern matches day, if present 
  (pregexp <font color="red">&quot;([a-z]+) +([0-9]+,)? *([0-9]+)&quot;</font>))<br/><br/>(pregexp-match date-re <font color="red">&quot;jan 1, 1970&quot;</font>)
 =&gt; (<font color="red">&quot;jan 1, 1970&quot;</font> <font color="red">&quot;jan&quot;</font> <font color="red">&quot;1,&quot;</font> <font color="red">&quot;1970&quot;</font>)<br/><br/>(pregexp-match date-re <font color="red">&quot;jan 1970&quot;</font>)
 =&gt; (<font color="red">&quot;jan 1970&quot;</font> <font color="red">&quot;jan&quot;</font> #f <font color="red">&quot;1970&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
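Code that consumes such a result has to check for <code>#f</code> in two places: the match as a whole, and each optional submatch.  A minimal sketch, in which <code>date-day</code> is a hypothetical helper (not part of the library) that uses the same pattern as <code>date-re</code> and strips the trailing comma from the day:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(<font color="#6959cf"><strong>define</strong></font> (date-day str)
  ;hypothetical helper: returns the day without its comma, or #f
  (let ((m (pregexp-match <font color="red">&quot;([a-z]+) +([0-9]+,)? *([0-9]+)&quot;</font> str)))
    (and m                         ;#f if nothing matched at all
         (let ((day (caddr m)))    ;second submatch, possibly #f
           (and day
                (substring day 0 (- (string-length day) 1)))))))<br/><br/>(date-day <font color="red">&quot;jan 1, 1970&quot;</font>) =&gt; <font color="red">&quot;1&quot;</font>
(date-day <font color="red">&quot;jan 1970&quot;</font>)    =&gt; #f
(date-day <font color="red">&quot;1970&quot;</font>)        =&gt; #f
</pre>
</td></tr>
</tbody></table></center>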

</div>
<!-- Backreferences -->
<a name="Backreferences"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.9 Backreferences</font>
</h3></td></tr></table>
</div><div class="subsection">

Submatches can be used in the insert string argument of the procedures
<code id='code17124'
>pregexp-replace</code> and <code id='code17125'
>pregexp-replace*</code>.  The insert string
can use <code id='code17126'
>\n</code> as a <em id='emph17127'
>backreference</em> to refer back to the
<em id='emph17128'
>n</em>th submatch, ie, the substring that matched the <em id='emph17129'
>n</em>th
subpattern.  <code id='code17130'
>\0</code> refers to the entire match, and it can also be
specified as <code id='code17131'
>\&amp;</code>.<br/><br/><center id='center17150'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17148'
>(pregexp-replace <font color="red">&quot;_(.+?)_&quot;</font> 
  <font color="red">&quot;the _nina_, the _pinta_, and the _santa maria_&quot;</font>
  <font color="red">&quot;*\\1*&quot;</font>)
 =&gt; <font color="red">&quot;the *nina*, the _pinta_, and the _santa maria_&quot;</font><br/><br/>(pregexp-replace* <font color="red">&quot;_(.+?)_&quot;</font> 
  <font color="red">&quot;the _nina_, the _pinta_, and the _santa maria_&quot;</font>
  <font color="red">&quot;*\\1*&quot;</font>)
 =&gt; <font color="red">&quot;the *nina*, the *pinta*, and the *santa maria*&quot;</font><br/><br/>;recall: \S stands for non-whitespace character<br/><br/>(pregexp-replace <font color="red">&quot;(\\S+) (\\S+) (\\S+)&quot;</font>
  <font color="red">&quot;eat to live&quot;</font>
  <font color="red">&quot;\\3 \\2 \\1&quot;</font>)
 =&gt; <font color="red">&quot;live to eat&quot;</font>
</pre>
</td></tr>
</tbody></table></center>
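The whole-match forms <code>\0</code> and <code>\&amp;</code> mentioned above work in the same way.  A sketch with an invented text string, wrapping every run of digits in square brackets:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-replace* <font color="red">&quot;[0-9]+&quot;</font> <font color="red">&quot;room 101, floor 3&quot;</font> <font color="red">&quot;[\\0]&quot;</font>)
 =&gt; <font color="red">&quot;room [101], floor [3]&quot;</font><br/><br/>(pregexp-replace* <font color="red">&quot;[0-9]+&quot;</font> <font color="red">&quot;room 101, floor 3&quot;</font> <font color="red">&quot;[\\&amp;]&quot;</font>)
 =&gt; <font color="red">&quot;room [101], floor [3]&quot;</font>
</pre>
</td></tr>
</tbody></table></center>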

Use <code id='code17151'
>\\</code> in the insert string to specify a literal
backslash.  Also, <code id='code17152'
>\$</code> stands for an empty string,
and is useful for separating a backreference <code id='code17153'
>\n</code>
from an immediately following number.<br/><br/>Backreferences can also be used within the regexp
pattern to refer back to an already matched subpattern
in the pattern.  <code id='code17155'
>\n</code> stands for an exact repeat
of the <em id='emph17156'
>n</em>th submatch.<a href="#footnote-footnote17158"><sup><small>4</small></sup></a> <br/><br/><center id='center17166'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17164'
>(pregexp-match <font color="red">&quot;([a-z]+) and \\1&quot;</font>
  <font color="red">&quot;billions and billions&quot;</font>)
 =&gt; (<font color="red">&quot;billions and billions&quot;</font> <font color="red">&quot;billions&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

Note that the backreference is not simply a repeat of the previous subpattern.  Rather it is a repeat of
<em id='emph17167'
>the particular  substring already matched by the
subpattern</em>. <br/><br/>In the above example, the backreference can only match
<code id='code17169'
>billions</code>.  It will not match <code id='code17170'
>millions</code>, even
though the subpattern it harks back to --- <code id='code17171'
>([a-z]+)</code>
---  would have had no problem doing so: <br/><br/><center id='center17177'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17175'
>(pregexp-match <font color="red">&quot;([a-z]+) and \\1&quot;</font>
  <font color="red">&quot;billions and millions&quot;</font>)
 =&gt; #f 
</pre>
</td></tr>
</tbody></table></center>
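The same idea works at the level of single characters.  In this sketch (the text string is invented), <code>(.)\\1</code> finds a character immediately followed by itself, and replacing each such pair by its submatch collapses the doubled letters:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-match <font color="red">&quot;(.)\\1&quot;</font> <font color="red">&quot;bookkeeper&quot;</font>)
 =&gt; (<font color="red">&quot;oo&quot;</font> <font color="red">&quot;o&quot;</font>)<br/><br/>(pregexp-replace* <font color="red">&quot;(.)\\1&quot;</font> <font color="red">&quot;bookkeeper&quot;</font> <font color="red">&quot;\\1&quot;</font>)
 =&gt; <font color="red">&quot;bokeper&quot;</font>
</pre>
</td></tr>
</tbody></table></center>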

The following corrects doubled words:<br/><br/><center id='center17185'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17183'
>(pregexp-replace* <font color="red">&quot;(\\S+) \\1&quot;</font>
  <font color="red">&quot;now is the the time for all good men to to come to the aid of of the party&quot;</font>
  <font color="red">&quot;\\1&quot;</font>)
 =&gt; <font color="red">&quot;now is the time for all good men to come to the aid of the party&quot;</font>
</pre>
</td></tr>
</tbody></table></center>

The following marks all immediately repeating patterns
in a number string:<br/><br/><center id='center17191'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17189'
>(pregexp-replace* <font color="red">&quot;(\\d+)\\1&quot;</font>
  <font color="red">&quot;123340983242432420980980234&quot;</font>
  <font color="red">&quot;{\\1,\\1}&quot;</font>)
 =&gt; <font color="red">&quot;12{3,3}40983{24,24}3242{098,098}0234&quot;</font>
</pre>
</td></tr>
</tbody></table></center>
<br/><br/></div>
<!-- Non-capturing clusters -->
<a name="Non-capturing-clusters"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.10 Non-capturing clusters</font>
</h3></td></tr></table>
</div><div class="subsection">

It is often necessary to specify a cluster
(typically for quantification) without triggering
the capture of submatch information.  Such
clusters are called <em id='emph17193'
>non-capturing</em>.  To get one,
use <code id='code17194'
>(?:</code> instead of <code id='code17195'
>(</code> as the cluster opener.  In
the following example, the non-capturing cluster
skips over the ``directory'' portion of a given
pathname, and the capturing cluster identifies the
basename.<br/><br/><center id='center17203'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17201'
>(pregexp-match <font color="red">&quot;^(?:[a-z]*/)*([a-z]+)$&quot;</font> 
  <font color="red">&quot;/usr/local/bin/mzscheme&quot;</font>)
 =&gt; (<font color="red">&quot;/usr/local/bin/mzscheme&quot;</font> <font color="red">&quot;mzscheme&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
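A side-by-side sketch (invented text string) shows the effect on the result list: both clusters below group <code>ab</code> for the <code>+</code> quantifier, but only the capturing one contributes a submatch:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-match <font color="red">&quot;(ab)+(c)&quot;</font> <font color="red">&quot;ababc&quot;</font>)   =&gt; (<font color="red">&quot;ababc&quot;</font> <font color="red">&quot;ab&quot;</font> <font color="red">&quot;c&quot;</font>)
(pregexp-match <font color="red">&quot;(?:ab)+(c)&quot;</font> <font color="red">&quot;ababc&quot;</font>) =&gt; (<font color="red">&quot;ababc&quot;</font> <font color="red">&quot;c&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>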

</div>
<!-- Cloisters -->
<a name="Cloisters"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.11 Cloisters</font>
</h3></td></tr></table>
</div><div class="subsection">

The location between the <code id='code17204'
>?</code> and the <code id='code17205'
>:</code> of a non-capturing
cluster is called a <em id='emph17206'
>cloister</em>.<a href="#footnote-footnote17207"><sup><small>5</small></sup></a>  You can put <em id='emph17208'
>modifiers</em> there
that will cause the enclustered subpattern to be treated specially.  The
modifier <code id='code17209'
>i</code> causes the subpattern to match
<em id='emph17210'
>case-insensitively</em>:<br/><br/><center id='center17217'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17215'
>(pregexp-match <font color="red">&quot;(?i:hearth)&quot;</font> <font color="red">&quot;HeartH&quot;</font>) =&gt; (<font color="red">&quot;HeartH&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
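The modifier affects only the enclustered subpattern, not the rest of the regexp.  A sketch with invented text strings, where the part of the pattern outside the cloistered cluster stays case-sensitive:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(pregexp-match <font color="red">&quot;(?i:hearth)stone&quot;</font> <font color="red">&quot;HEARTHstone&quot;</font>) =&gt; (<font color="red">&quot;HEARTHstone&quot;</font>)
(pregexp-match <font color="red">&quot;(?i:hearth)stone&quot;</font> <font color="red">&quot;hearthSTONE&quot;</font>) =&gt; #f
</pre>
</td></tr>
</tbody></table></center>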

The modifier <code id='code17218'
>x</code> causes the subpattern to match
<em id='emph17219'
>space-insensitively</em>, ie, spaces and
comments within the
subpattern are ignored.  Comments are introduced
as usual with a semicolon (<code id='code17220'
>;</code>) and extend till
the end of the line.  If you need
to include a literal space or semicolon in
a space-insensitized subpattern, escape it
with a backslash.<br/><br/><center id='center17234'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17232'
>(pregexp-match <font color="red">&quot;(?x: a   lot)&quot;</font> <font color="red">&quot;alot&quot;</font>)
 =&gt; (<font color="red">&quot;alot&quot;</font>)<br/><br/>(pregexp-match <font color="red">&quot;(?x: a  \\  lot)&quot;</font> <font color="red">&quot;a lot&quot;</font>)
 =&gt; (<font color="red">&quot;a lot&quot;</font>)<br/><br/>(pregexp-match &quot;(?x:
   a \\ man  \\; \\   ; ignore
   a \\ plan \\; \\   ; me
   a \\ canal         ; completely
   )&quot; 
 <font color="red">&quot;a man; a plan; a canal&quot;</font>)
 =&gt; (<font color="red">&quot;a man; a plan; a canal&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

The global variable <code id='code17235'
>*pregexp-comment-char*</code> contains the comment character (<code id='code17236'
>#\;</code>).  
For Perl-like comments,  <br/><br/><center id='center17241'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17239'
>(<strong id='bold28376'
>set!</strong> *pregexp-comment-char* #\#)
</pre>
</td></tr>
</tbody></table></center>

You can put more than one modifier in the cloister.<br/><br/><center id='center17247'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17245'
>(pregexp-match &quot;(?ix:
   a \\ man  \\; \\   ; ignore
   a \\ plan \\; \\   ; me
   a \\ canal         ; completely
   )&quot; 
 <font color="red">&quot;A Man; a Plan; a Canal&quot;</font>)
 =&gt; (<font color="red">&quot;A Man; a Plan; a Canal&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

A minus sign before a modifier inverts its meaning.
Thus, you can use <code id='code17248'
>-i</code> and <code id='code17249'
>-x</code> in a 
<em id='emph17250'
>subcluster</em> to overturn the insensitivities caused by an
enclosing cluster.<br/><br/><center id='center17257'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17255'
>(pregexp-match <font color="red">&quot;(?i:the (?-i:TeX)book)&quot;</font>
  <font color="red">&quot;The TeXbook&quot;</font>)
 =&gt; (<font color="red">&quot;The TeXbook&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

This regexp will allow any casing for <code id='code17258'
>the</code> and <code id='code17259'
>book</code> but insists that <code id='code17260'
>TeX</code> not be 
differently cased.<br/><br/></div>
<!-- Alternation -->
<a name="Alternation"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.12 Alternation</font>
</h3></td></tr></table>
</div><div class="subsection">
<a name="Alternation" class="mark"></a>
You can specify a list of <em id='emph17262'
>alternate</em>
subpatterns by separating them by <code id='code17263'
>|</code>.   The <code id='code17264'
>|</code>
separates subpatterns in the nearest enclosing cluster 
(or in the entire pattern string if there are no
enclosing parens).  <br/><br/><center id='center17275'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17273'
>(pregexp-match <font color="red">&quot;f(ee|i|o|um)&quot;</font> <font color="red">&quot;a small, final fee&quot;</font>)
 =&gt; (<font color="red">&quot;fi&quot;</font> <font color="red">&quot;i&quot;</font>)<br/><br/>(pregexp-replace* <font color="red">&quot;([yi])s(e[sdr]?|ing|ation)&quot;</font>
   &quot;it is energising to analyse an organisation 
   pulsing with noisy organisms&quot;
   <font color="red">&quot;\\1z\\2&quot;</font>)
 =&gt; &quot;it is energizing to analyze an organization 
   pulsing with noisy organisms&quot;
</pre>
</td></tr>
</tbody></table></center>
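As a minimal sketch of how the nearest enclosing cluster limits the reach of <code>|</code> (these patterns are illustrative additions, not part of the original examples), compare a bar inside parens with the same bar at the top level of the pattern:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">;inside the cluster, the alternates are a and e
(pregexp-match <font color="red">&quot;gr(a|e)y&quot;</font> <font color="red">&quot;grey&quot;</font>)
 =&gt; (<font color="red">&quot;grey&quot;</font> <font color="red">&quot;e&quot;</font>)

;with no enclosing parens, the alternates are gra and ey
(pregexp-match <font color="red">&quot;gra|ey&quot;</font> <font color="red">&quot;grey&quot;</font>)
 =&gt; (<font color="red">&quot;ey&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>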
 
Note again that if you wish
to use clustering merely to specify a list of alternate
subpatterns but do not want the submatch, use <code id='code17276'
>(?:</code>
instead of <code id='code17277'
>(</code>. <br/><br/><center id='center17284'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17282'
>(pregexp-match <font color="red">&quot;f(?:ee|i|o|um)&quot;</font> <font color="red">&quot;fun for all&quot;</font>)
 =&gt; (<font color="red">&quot;fo&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

An important thing to note about alternation is that
the leftmost matching alternate is picked regardless of
its length.  Thus, if one of the alternates is a prefix
of a later alternate, the latter may not have 
a chance to match.<br/><br/><center id='center17291'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17289'
>(pregexp-match <font color="red">&quot;call|call-with-current-continuation&quot;</font> 
  <font color="red">&quot;call-with-current-continuation&quot;</font>)
 =&gt; (<font color="red">&quot;call&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

To allow the longer alternate to have a shot at 
matching, place it before the shorter one:<br/><br/><center id='center17298'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17296'
>(pregexp-match <font color="red">&quot;call-with-current-continuation|call&quot;</font>
  <font color="red">&quot;call-with-current-continuation&quot;</font>)
 =&gt; (<font color="red">&quot;call-with-current-continuation&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

In any case, an overall match for the entire regexp is
always preferred to an overall nonmatch.  In the
following, the longer alternate still wins, because its
preferred shorter prefix fails to yield an overall
match.<br/><br/><center id='center17305'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17303'
>(pregexp-match <font color="red">&quot;(?:call|call-with-current-continuation) constrained&quot;</font>
  <font color="red">&quot;call-with-current-continuation constrained&quot;</font>)
 =&gt; (<font color="red">&quot;call-with-current-continuation constrained&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

</div>
<!-- Backtracking -->
<a name="Backtracking"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.13 Backtracking</font>
</h3></td></tr></table>
</div><div class="subsection">
<a name="Backtracking" class="mark"></a>
We've already seen that greedy quantifiers match
the maximal number of times, but the overriding priority
is that the overall match succeed.  Consider<br/><br/><center id='center17311'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17309'
>(pregexp-match <font color="red">&quot;a*a&quot;</font> <font color="red">&quot;aaaa&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

The regexp consists of two subregexps, <code id='code17312'
>a*</code> followed by <code id='code17313'
>a</code>.
The subregexp <code id='code17314'
>a*</code> cannot be allowed to match
all four <code id='code17315'
>a</code>'s in the text string <code id='code17316'
>&quot;aaaa&quot;</code>, even though
<code id='code17317'
>*</code> is a greedy quantifier.  It may match only the first
three, leaving the last one for the second subregexp.
This ensures that the full regexp matches successfully.<br/><br/>The regexp matcher accomplishes this via a process
called <em id='emph17319'
>backtracking</em>.  The matcher
tentatively allows the greedy quantifier 
to match all four <code id='code17320'
>a</code>'s, but then when it becomes
clear that the overall match is in jeopardy, it 
<em id='emph17321'
>backtracks</em> to a less greedy match of 
<em id='emph17322'
>three</em> <code id='code17323'
>a</code>'s.  If even this fails, as in the
call<br/><br/><center id='center17329'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17327'
>(pregexp-match <font color="red">&quot;a*aa&quot;</font> <font color="red">&quot;aaaa&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

the matcher backtracks even further.  Overall failure is conceded only when all possible backtracking
has been tried with no success. <br/><br/>Backtracking is not restricted to greedy quantifiers.
Nongreedy quantifiers match as few instances as
possible, and progressively backtrack to more and more
instances in order to attain an overall match.  There
is backtracking in alternation too, as the more
rightward alternates are tried when locally successful
leftward ones fail to yield an overall match.<br/><br/></div>
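<div class="subsection">
The division of labor between the subregexps can be made visible by wrapping each one in a capturing cluster.  The calls below are a sketch added for illustration (the clusters and the text string <code>&quot;aaab&quot;</code> are not part of the original examples); the results are what the backtracking rules described above predict.<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">;greedy a* gives back one a
(pregexp-match <font color="red">&quot;(a*)(a)&quot;</font> <font color="red">&quot;aaaa&quot;</font>)
 =&gt; (<font color="red">&quot;aaaa&quot;</font> <font color="red">&quot;aaa&quot;</font> <font color="red">&quot;a&quot;</font>)

;greedy a* gives back two a's
(pregexp-match <font color="red">&quot;(a*)(aa)&quot;</font> <font color="red">&quot;aaaa&quot;</font>)
 =&gt; (<font color="red">&quot;aaaa&quot;</font> <font color="red">&quot;aa&quot;</font> <font color="red">&quot;aa&quot;</font>)

;nongreedy a*? starts with zero a's and takes more
;until the b can match
(pregexp-match <font color="red">&quot;(a*?)b&quot;</font> <font color="red">&quot;aaab&quot;</font>)
 =&gt; (<font color="red">&quot;aaab&quot;</font> <font color="red">&quot;aaa&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
</div>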
<!-- Disabling backtracking -->
<a name="Disabling-backtracking"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.14 Disabling backtracking</font>
</h3></td></tr></table>
</div><div class="subsection">

Sometimes it is more efficient to disable backtracking.  For
example, we may wish to <em id='emph17332'
>commit</em> to a choice, or
we may know that trying alternatives is fruitless.  A
nonbacktracking regexp is enclosed in <code id='code17333'
>(?&gt;</code>...<code id='code17334'
>)</code>.<br/><br/><center id='center17340'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17338'
>(pregexp-match <font color="red">&quot;(?&gt;a+).&quot;</font> <font color="red">&quot;aaaa&quot;</font>)
 =&gt; #f
</pre>
</td></tr>
</tbody></table></center>

In this call, the subregexp <code id='code17341'
>?&gt;a+</code> greedily matches
all four <code id='code17342'
>a</code>'s, and is denied the opportunity to
backpedal.  So the overall match is denied.  The effect
of the regexp is therefore to match one or more <code id='code17343'
>a</code>'s
followed by something that is definitely non-<code id='code17344'
>a</code>.<br/><br/></div>
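<div class="subsection">
For contrast, here is a short sketch (the second text string is an illustrative addition) showing the same pattern with backtracking left enabled, and the nonbacktracking pattern applied to a string where a non-<code>a</code> does follow:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">;with backtracking, a+ gives back one a so that . can match
(pregexp-match <font color="red">&quot;a+.&quot;</font> <font color="red">&quot;aaaa&quot;</font>)
 =&gt; (<font color="red">&quot;aaaa&quot;</font>)

;without backtracking, . must find a character after all the a's
(pregexp-match <font color="red">&quot;(?&gt;a+).&quot;</font> <font color="red">&quot;aaaab&quot;</font>)
 =&gt; (<font color="red">&quot;aaaab&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
</div>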
<!-- Looking ahead and behind -->
<a name="Looking-ahead-and-behind"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.15 Looking ahead and behind</font>
</h3></td></tr></table>
</div><div class="subsection">
<a name="Looking-ahead-and-behind" class="mark"></a>
You can have assertions in your pattern that look 
<em id='emph17346'
>ahead</em> or <em id='emph17347'
>behind</em> to ensure that a subpattern does
or does not occur.   These ``look around'' assertions are
specified by putting the subpattern checked for in a
cluster whose leading characters are: <code id='code17348'
>?=</code> (for positive
lookahead), <code id='code17349'
>?!</code> (negative lookahead), <code id='code17350'
>?&lt;=</code>
(positive lookbehind), <code id='code17351'
>?&lt;!</code> (negative lookbehind).
Note that the subpattern in the assertion  does not
generate a match in the final result.  It merely allows
or disallows the rest of the match.<br/><br/></div>
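<div class="subsection">
A brief sketch of this point, using illustrative calls that are not part of the original text: the assertion cluster neither contributes text to the overall match nor adds a submatch, whereas an ordinary cluster does both.<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">;lookahead: hound is checked for but not matched
(pregexp-match <font color="red">&quot;grey(?=hound)&quot;</font> <font color="red">&quot;greyhound&quot;</font>)
 =&gt; (<font color="red">&quot;grey&quot;</font>)

;ordinary cluster: hound is matched and reported as a submatch
(pregexp-match <font color="red">&quot;grey(hound)&quot;</font> <font color="red">&quot;greyhound&quot;</font>)
 =&gt; (<font color="red">&quot;greyhound&quot;</font> <font color="red">&quot;hound&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
</div>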
<!-- Lookahead -->
<a name="Lookahead"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.16 Lookahead</font>
</h3></td></tr></table>
</div><div class="subsection">

Positive lookahead (<code id='code17353'
>?=</code>) peeks ahead to ensure that
its subpattern <em id='emph17354'
>could</em> match.  <br/><br/><center id='center17360'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17358'
>(pregexp-match-positions <font color="red">&quot;grey(?=hound)&quot;</font> 
  <font color="red">&quot;i left my grey socks at the greyhound&quot;</font>) 
 =&gt; ((28 . 32))
</pre>
</td></tr>
</tbody></table></center>

The regexp <code id='code17361'
>&quot;grey(?=hound)&quot;</code> matches <code id='code17362'
>grey</code>, but <em id='emph17363'
>only</em> if it is followed by <code id='code17364'
>hound</code>.  Thus, the first
<code id='code17365'
>grey</code> in the text string is not matched. <br/><br/>Negative lookahead (<code id='code17367'
>?!</code>) peeks ahead
to ensure that its subpattern could not possibly match.  <br/><br/><center id='center17373'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17371'
>(pregexp-match-positions <font color="red">&quot;grey(?!hound)&quot;</font>
  <font color="red">&quot;the gray greyhound ate the grey socks&quot;</font>) 
 =&gt; ((27 . 31))
</pre>
</td></tr>
</tbody></table></center>

The regexp <code id='code17374'
>&quot;grey(?!hound)&quot;</code> matches <code id='code17375'
>grey</code>, but only if it is <em id='emph17376'
>not</em> followed by <code id='code17377'
>hound</code>.  Thus 
the <code id='code17378'
>grey</code> just before <code id='code17379'
>socks</code> is matched.<br/><br/></div>
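<div class="subsection">
Because the lookahead text is not part of the match, it is left untouched by a replacement.  The following sketch, with a text string chosen purely for illustration, uses <code>pregexp-replace*</code> to respell only those occurrences of <code>grey</code> that the assertion admits:<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">;only the grey that is followed by hound is replaced
(pregexp-replace* <font color="red">&quot;grey(?=hound)&quot;</font>
  <font color="red">&quot;grey socks, greyhound&quot;</font>
  <font color="red">&quot;gray&quot;</font>)
 =&gt; <font color="red">&quot;grey socks, grayhound&quot;</font>

;only the grey that is not followed by hound is replaced
(pregexp-replace* <font color="red">&quot;grey(?!hound)&quot;</font>
  <font color="red">&quot;grey socks, greyhound&quot;</font>
  <font color="red">&quot;gray&quot;</font>)
 =&gt; <font color="red">&quot;gray socks, greyhound&quot;</font>
</pre>
</td></tr>
</tbody></table></center>
</div>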
<!-- Lookbehind -->
<a name="Lookbehind"></a>
<div class="subsection-atitle"><table width="100%"><tr><td bgcolor="#ffffff"><h3><font color="#8381de">13.2.17 Lookbehind</font>
</h3></td></tr></table>
</div><div class="subsection">

Positive lookbehind (<code id='code17381'
>?&lt;=</code>) checks that its subpattern <em id='emph17382'
>could</em> match
immediately to the left of the current position in
the text string.  <br/><br/><center id='center17388'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17386'
>(pregexp-match-positions <font color="red">&quot;(?&lt;=grey)hound&quot;</font>
  <font color="red">&quot;the hound in the picture is not a greyhound&quot;</font>) 
 =&gt; ((38 . 43))
</pre>
</td></tr>
</tbody></table></center>

The regexp <code id='code17389'
>(?&lt;=grey)hound</code> matches <code id='code17390'
>hound</code>, but only if it is preceded by <code id='code17391'
>grey</code>.  <br/><br/>Negative lookbehind
(<code id='code17393'
>?&lt;!</code>) checks that its subpattern
could not possibly match immediately to the left.  <br/><br/><center id='center17399'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17397'
>(pregexp-match-positions <font color="red">&quot;(?&lt;!grey)hound&quot;</font>
  <font color="red">&quot;the greyhound in the picture is not a hound&quot;</font>)
 =&gt; ((38 . 43))
</pre>
</td></tr>
</tbody></table></center>

The regexp <code id='code17400'
>(?&lt;!grey)hound</code> matches <code id='code17401'
>hound</code>, but only if
it is <em id='emph17402'
>not</em> preceded by <code id='code17403'
>grey</code>.<br/><br/>Lookaheads and lookbehinds can be convenient when they
are not confusing.  <br/><br/></div>
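<div class="subsection">
One case where they are convenient is extracting text between delimiters without including the delimiters themselves.  The pattern below is a sketch added for illustration, not part of the original examples: a lookbehind requires a <code>&lt;</code> immediately to the left, and a lookahead requires a <code>&gt;</code> immediately to the right.<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">;match the text between angle brackets, brackets excluded
(pregexp-match <font color="red">&quot;(?&lt;=&lt;)[^&lt;&gt;]+(?=&gt;)&quot;</font>
  <font color="red">&quot;see &lt;tag&gt; here&quot;</font>)
 =&gt; (<font color="red">&quot;tag&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>
</div>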
</div><br>
<!-- An Extended Example -->
<a name="An-Extended-Example"></a>
<div class="section-atitle"><table width="100%"><tr><td bgcolor="#dedeff"><h3><font color="black">13.3 An Extended Example</font>
</h3></td></tr></table>
</div><div class="section">
<a name="An-Extended-Example" class="mark"></a>
Here's an extended example from Friedl that covers many of the features
described above.  The problem is to fashion a regexp that will match all
and only IP addresses or <em id='emph17406'
>dotted quads</em>, i.e., four numbers separated
by three dots, with each number between 0 and 255.  We will use the
commenting mechanism to build the final regexp with clarity.  First, a
subregexp <code id='code17407'
>n0-255</code> that matches 0 through 255.<br/><br/><center id='center17412'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17410'
>(<font color="#6959cf"><strong id='bold28414'
>define</strong></font> <font color="#6959cf"><strong id='bold28416'
>n0-255</strong></font>
  &quot;(?x:
  \\d          ;  0 through   9
  | \\d\\d     ; 00 through  99
  | [01]\\d\\d ;000 through 199
  | 2[0-4]\\d  ;200 through 249
  | 25[0-5]    ;250 through 255
  )&quot;)
</pre>
</td></tr>
</tbody></table></center>

The first two alternates simply get all single- and
double-digit numbers.  Since 0-padding is allowed, we
need to match both 1 and 01.  We need to be careful
when getting 3-digit numbers, since numbers above 255
must be excluded.  So we fashion alternates to get 000
through 199, then 200 through 249, and finally 250
through 255.<a href="#footnote-footnote17414"><sup><small>6</small></sup></a><br/><br/>An IP-address is a string that consists of
four <code id='code17416'
>n0-255</code>s with three dots separating
them.<br/><br/><center id='center17426'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17424'
>(<font color="#6959cf"><strong id='bold28418'
>define</strong></font> <font color="#6959cf"><strong id='bold28420'
>ip-re1</strong></font>
  (string-append
    <font color="red">&quot;^&quot;</font>        ;nothing before
    n0-255     ;the first n0-255,
    <font color="red">&quot;(?x:&quot;</font>     ;then the subpattern of
    <font color="red">&quot;\\.&quot;</font>      ;a dot followed by
    n0-255     ;an n0-255,
    <font color="red">&quot;)&quot;</font>        ;which is
    &quot;{3}&quot;      ;repeated exactly 3 times
    <font color="red">&quot;$&quot;</font>        ;with nothing following
    ))
</pre>
</td></tr>
</tbody></table></center>
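As an aside, the effect of anchoring on the order of the alternates in <code>n0-255</code> can be checked in isolation.  The following is only a sketch: <code>n0-255</code> is repeated here in compact form (same alternates, comments dropped) so that the block stands alone, and the name <code>n0-255-only</code> is hypothetical.<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(define n0-255
  <font color="red">&quot;(?x: \\d | \\d\\d | [01]\\d\\d | 2[0-4]\\d | 25[0-5] )&quot;</font>)

;unanchored, the leftmost alternate that matches is picked
(pregexp-match n0-255 <font color="red">&quot;255&quot;</font>)
 =&gt; (<font color="red">&quot;2&quot;</font>)

;anchored at both ends, the matcher backtracks through the
;alternates until one of them spans the whole string
(define n0-255-only   ;hypothetical name
  (string-append <font color="red">&quot;^&quot;</font> n0-255 <font color="red">&quot;$&quot;</font>))

(pregexp-match n0-255-only <font color="red">&quot;255&quot;</font>) =&gt; (<font color="red">&quot;255&quot;</font>)
(pregexp-match n0-255-only <font color="red">&quot;07&quot;</font>)  =&gt; (<font color="red">&quot;07&quot;</font>)
(pregexp-match n0-255-only <font color="red">&quot;256&quot;</font>) =&gt; #f
</pre>
</td></tr>
</tbody></table></center>

With <code>n0-255</code> behaving as intended, we return to <code>ip-re1</code>.<br/><br/>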

Let's try it out.<br/><br/><center id='center17433'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17431'
>(pregexp-match ip-re1 <font color="red">&quot;1.2.3.4&quot;</font>)        =&gt; (<font color="red">&quot;1.2.3.4&quot;</font>)
(pregexp-match ip-re1 <font color="red">&quot;55.155.255.265&quot;</font>) =&gt; #f
</pre>
</td></tr>
</tbody></table></center>

which is fine, except that we also have<br/><br/><center id='center17439'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17437'
>(pregexp-match ip-re1 <font color="red">&quot;0.00.000.00&quot;</font>) =&gt; (<font color="red">&quot;0.00.000.00&quot;</font>)
</pre>
</td></tr>
</tbody></table></center>

All-zero sequences are not valid IP addresses!  Lookahead to the rescue.
Before starting to match <code id='code17440'
>ip-re1</code>, we look ahead to ensure we don't
have all zeros.  We could use positive lookahead to ensure there
<em id='emph17441'
>is</em> a digit other than zero.<br/><br/><center id='center17447'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17445'
>(<font color="#6959cf"><strong id='bold28432'
>define</strong></font> <font color="#6959cf"><strong id='bold28434'
>ip-re</strong></font>
  (string-append
    <font color="red">&quot;(?=.*[1-9])&quot;</font> ;ensure there's a non-0 digit
    ip-re1))
</pre>
</td></tr>
</tbody></table></center>

Or we could use negative lookahead to ensure that what's ahead isn't
composed of <em id='emph17448'
>only</em> zeros and dots.<br/><br/><center id='center17454'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17452'
>(<font color="#6959cf"><strong id='bold28437'
>define</strong></font> <font color="#6959cf"><strong id='bold28439'
>ip-re</strong></font>
  (string-append
    <font color="red">&quot;(?![0.]*$)&quot;</font> ;not just zeros and dots
                 ;(note: dot is not metachar inside [])
    ip-re1))
</pre>
</td></tr>
</tbody></table></center>

The regexp <code id='code17455'
>ip-re</code> will match all and only valid IP addresses.<br/><br/><center id='center17462'
><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog" id='prog17460'
>(pregexp-match ip-re <font color="red">&quot;1.2.3.4&quot;</font>) =&gt; (<font color="red">&quot;1.2.3.4&quot;</font>)
(pregexp-match ip-re <font color="red">&quot;0.0.0.0&quot;</font>) =&gt; #f
</pre>
</td></tr>
</tbody></table></center>
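For convenience, the final regexp can be wrapped in a predicate.  The following is a sketch rather than part of the original example: the name <code>ip-address?</code> is hypothetical, and the definitions of <code>n0-255</code>, <code>ip-re1</code> and <code>ip-re</code> (negative-lookahead variant) are repeated in compact form so that the block stands alone.<br/><br/><center><table cellspacing="0" class="color" cellpadding="0" width="95%"><tbody>
<tr><td bgcolor="#ffffcc"><pre class="prog">(define n0-255
  <font color="red">&quot;(?x: \\d | \\d\\d | [01]\\d\\d | 2[0-4]\\d | 25[0-5] )&quot;</font>)

(define ip-re1
  (string-append <font color="red">&quot;^&quot;</font> n0-255 <font color="red">&quot;(?x:\\.&quot;</font> n0-255 <font color="red">&quot;){3}$&quot;</font>))

(define ip-re
  (string-append <font color="red">&quot;(?![0.]*$)&quot;</font> ip-re1))

;hypothetical helper: #t if s is a valid dotted quad, else #f
(define (ip-address? s)
  (if (pregexp-match ip-re s) #t #f))

(ip-address? <font color="red">&quot;1.2.3.4&quot;</font>)   =&gt; #t
(ip-address? <font color="red">&quot;0.0.0.0&quot;</font>)   =&gt; #f  ;all zeros
(ip-address? <font color="red">&quot;1.2.3&quot;</font>)     =&gt; #f  ;only three numbers
(ip-address? <font color="red">&quot;256.1.1.1&quot;</font>) =&gt; #f  ;256 is out of range
</pre>
</td></tr>
</tbody></table></center>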
<br/><br/><br/><br/> 

</div><br>
<div class="footnote"><br><br>
<hr width='20%' size='2' align='left'>
<a name="footnote-footnote16536"><sup><small>1</small></sup></a>: The double backslash is an artifact of
Scheme strings, not the regexp pattern itself.  When we
want a literal backslash inside a Scheme string, we
must escape it so that it shows up in the string at
all. Scheme strings use backslash as the escape
character, so we end up with two backslashes --- one
Scheme-string backslash to escape the regexp backslash,
which then escapes the dot.  Another character that
would need escaping inside a Scheme string is <code id='code16535'
>&quot;</code>.
<br>
<a name="footnote-footnote16860"><sup><small>2</small></sup></a>: Requiring a bracketed character class to be non-empty is not
a limitation, since an empty character class can be more easily
represented by an empty string.
<br>
<a name="footnote-footnote16910"><sup><small>3</small></sup></a>: Following regexp custom, we identify ``word'' characters as
<code id='code16909'
>[A-Za-z0-9_]</code>, although these are too restrictive for what a
Schemer might consider a ``word''.
<br>
<a name="footnote-footnote17158"><sup><small>4</small></sup></a>: <code id='code17157'
>\0</code>, which is useful in
an insert string, makes no sense within the regexp
pattern, because the entire regexp has not yet matched,
so there is nothing to refer back to.
<br>
<a name="footnote-footnote17207"><sup><small>5</small></sup></a>: A useful, if terminally cute,
coinage from the abbots of Perl.
<br>
<a name="footnote-footnote17414"><sup><small>6</small></sup></a>: Note that <code id='code17413'
>n0-255</code> lists prefixes as
preferred alternates, something we cautioned against in
section <a href="bigloo-14.html#Alternation" class="inbound">Alternation</a>. However, since we intend
to anchor this subregexp explicitly to force an overall
match, the order of the alternates does not matter.
<br>
<div></div></td>
</tr></table><div class="skribe-ending">
<hr> 
<p class="ending" id='paragraph28450'
><font size="-1">
This <span class="sc">Html</span> page has been produced by 
<a href="http://www.inria.fr/mimosa/fp/Skribe" class="http">Skribe</a>.
<br/>
Last update <em id='it28448'
>Tue Jun  2 11:43:27 2009</em>.</font></p></div>
</body>
</html>