<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
 <TITLE> A Guide to the S-Lang Language: Regular Expressions</TITLE>
 <LINK HREF="slang-19.html" REL=next>
 <LINK HREF="slang-17.html" REL=previous>
 <LINK HREF="slang.html#toc18" REL=contents>
</HEAD>
<BODY>
<A HREF="slang-19.html">Next</A>
<A HREF="slang-17.html">Previous</A>
<A HREF="slang.html#toc18">Contents</A>
<HR>
<H2><A NAME="s18">18. Regular Expressions</A></H2>

<P>
<P>The S-Lang library includes a regular expression (RE) package that may be used by an application embedding the library. The RE syntax should be familiar to anyone acquainted with regular expressions. This section describes the syntax of <B>S-Lang</B> regular expressions.
<P>
<H2><A NAME="ss18.1">18.1 <B>S-Lang</B> RE Syntax</A>
</H2>

<P>
<P>A regular expression specifies a pattern to be matched against a string, and has the property that the concatenation of two REs is also a RE.
<P>The <B>S-Lang</B> library supports the following standard regular expressions:
<BLOCKQUOTE><CODE>
<PRE>
 .                 matches any character except newline
 *                 matches zero or more occurrences of previous RE
 +                 matches one or more occurrences of previous RE
 ?                 matches zero or one occurrence of previous RE
 ^                 matches beginning of a line
 $                 matches end of line
 [ ... ]           matches any single character between brackets.
                   For example, [-02468] matches `-' or any even digit,
                   and [-0-9a-z] matches `-' and any digit between 0 and 9
                   as well as letters a through z.
 \&lt;                matches the beginning of a word
 \&gt;                matches the end of a word
 \( ... \)
 \1, \2, ..., \9   matches the text matched by the nth \( ... \)
                   expression
</PRE>
</CODE></BLOCKQUOTE>
In addition, the following extensions are supported:
<BLOCKQUOTE><CODE>
<PRE>
 \c                turn on case-sensitivity (default)
 \C                turn off case-sensitivity
 \d                matches any digit
 \e                matches the ESC character
</PRE>
</CODE></BLOCKQUOTE>
Here are some simple examples:
<P><CODE>"^int "</CODE> matches the <CODE>"int "</CODE> at the beginning of a line.
<P><CODE>"\&lt;money\&gt;"</CODE> matches <CODE>"money"</CODE> but only if it appears as a separate word.
<P><CODE>"^$"</CODE> matches an empty line.
<P>A more complex pattern is
<BLOCKQUOTE><CODE>
<PRE>
 "\(\&lt;[a-zA-Z]+\&gt;\)[ ]+\1\&gt;"
</PRE>
</CODE></BLOCKQUOTE>
which matches any word repeated consecutively. Note how the grouping operators <CODE>\(</CODE> and <CODE>\)</CODE> are used to define the text matched by the enclosed regular expression, which is then referred to by <CODE>\1</CODE>.
<P>Finally, remember that when a regular expression is written as a string literal in either the <B>S-Lang</B> language or the C language, each <CODE>'\'</CODE> character must be doubled, since both languages treat it as an escape character. For example, the pattern <CODE>\&lt;money\&gt;</CODE> is written as the literal <CODE>"\\&lt;money\\&gt;"</CODE>.
<P>
<H2><A NAME="ss18.2">18.2 Differences between <B>S-Lang</B> and egrep REs</A>
</H2>

<P>
<P>There are several differences between <B>S-Lang</B> regular expressions and, e.g., <B>egrep</B> regular expressions.
<P>The most notable difference is that <B>S-Lang</B> regular expressions do not support the <B>OR</B> operator <CODE>|</CODE>. This means that neither <CODE>"a|b"</CODE> nor <CODE>"a\|b"</CODE> has the meaning that it has in regular expression packages that support egrep-style expressions.
<P>The other main difference is that while <B>S-Lang</B> regular expressions support the grouping operators <CODE>\(</CODE> and <CODE>\)</CODE>, they are only used as a means of specifying the text that is matched.
That is, the expression
<BLOCKQUOTE><CODE>
<PRE>
 "@\([a-z]*\)@.*@\1@"
</PRE>
</CODE></BLOCKQUOTE>
matches <CODE>"xxx@abc@silly@abc@yyy"</CODE>, where the pattern <CODE>\1</CODE> matches the text enclosed by the <CODE>\(</CODE> and <CODE>\)</CODE> expressions. However, in the current implementation, the grouping operators are not used to group regular expressions to form a single regular expression. Thus an expression such as <CODE>"\(hello\)*"</CODE> is <EM>not</EM> a pattern to match zero or more occurrences of <CODE>"hello"</CODE>, as it is in, e.g., <B>egrep</B>.
<P>One question that comes up from time to time is why <B>S-Lang</B> does not simply employ some POSIX-compatible regular expression library. The simple answer is that, at the time of this writing, none exists that is available across all the platforms that the <B>S-Lang</B> library supports (Unix, VMS, OS/2, win32, win16, BEOS, MSDOS, and QNX) and can be distributed under both the GNU and Artistic licenses. It is particularly important that the library and the interpreter support a common set of regular expressions in a platform-independent manner.
<P>
<P>
<HR>
<A HREF="slang-19.html">Next</A>
<A HREF="slang-17.html">Previous</A>
<A HREF="slang.html#toc18">Contents</A>
</BODY>
</HTML>