Sophie

Sophie

distrib > Fedora > 15 > i386 > by-pkgid > d07d7ab417d79053e7e0155c99e1a1c8 > files > 2690

mlton-20100608-3.fc15.i686.rpm

<!-- regexp-intro.mldoc -->

<!DOCTYPE ML-DOC SYSTEM>

<AUTHOR
  NAME="Riccardo Pucella"
  EMAIL="riccardo@research.bell-labs.com"
  YEAR=1998>
<COPYRIGHT OWNER="Bell Labs, Lucent Technologies" YEAR=1998>
<VERSION VERID="1.0" YEAR=1998 MONTH=6 DAY=1>

<TITLE>Introduction</TITLE>

<SECTION>
<HEAD>Introduction</HEAD>
<PP>
This is a regular expressions library. It is based on a decoupling
of the surface syntax used to specify regular expressions (the
frontend) and the engine that implements the matcher (the matcher). An
abstract syntax is used to communicate between the front end and the
back end of the system.
<PP>
Given a structure <CD/S1/ describing a surface syntax and a structure
<CD/S2/ describing a matching engine, a regular expression package can
be defined by applying the functor <FCTREF/RegExpFn/ like so:
<CODE>
RegExpFn (structure P=S1  structure E=S2)
</CODE>
<PP>
To match a regular expression, one first needs to compile a
representation in the surface syntax. The type of a compiled regular
expression is given in the <SIGREF/REGEXP/ signature as: 
<CODE>
type regexp
</CODE>
<PP>
Once a regular expression has been compiled, three functions are
provided to perform the matching, <CD/find/, <CD/prefix/ and
<CD/match/. These functions operate on readers as defined in the
<CD/StringCvt/ structure of the Basis Library. A reader
of type <CD/('a,'b) reader/ is a function <CD/'b -> ('a,'b) option/
taking a stream of type <CD/'b/ and returning an element of type
<CD/'a/ and the remainder of the stream, or <CD/NONE/ if the end of
the stream is reached.
<PP> 
The function <CD/find/ returns a reader that searches a stream and
attempts to match the given regular expression. The function
<CD/prefix/ returns a reader that attempts to match the regular
expression at the current position in the stream. The function
<CD/match/ takes a list of regular expressions and functions and
returns a reader that attempts to match one of the regular expressions
at the current position in the stream. The function corresponding to
the matched regular expression is invoked on the matching
information. 
<PP>
Once a match is found, it is returned as a <CD/match_tree/ datatype
This is a hierarchical structure describing the matches of the various
subexpressions appearing in the matched regular expression. A match
for an expression is a record containing the position of the match and
its length. The root of the structure always describes the outermost
match (the whole string matched by the regular expression). 
<PP>

</SECTION>