<html> <head> <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII"> <title>User's Guide</title> <link rel="stylesheet" href="../../../doc/src/boostbook.css" type="text/css"> <meta name="generator" content="DocBook XSL Stylesheets V1.75.2"> <link rel="home" href="../index.html" title="The Boost C++ Libraries BoostBook Documentation Subset"> <link rel="up" href="../xpressive.html" title="Chapter 29. Boost.Xpressive"> <link rel="prev" href="../xpressive.html" title="Chapter 29. Boost.Xpressive"> <link rel="next" href="reference.html" title="Reference"> </head> <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> <div class="section"> <div class="titlepage"><div><div><h2 class="title" style="clear: both"> <a name="xpressive.user_s_guide"></a><a class="link" href="user_s_guide.html" title="User's Guide">User's Guide</a> </h2></div></div></div> <div class="toc"><dl> <dt><span class="section"><a 
href="user_s_guide.html#boost_xpressive.user_s_guide.introduction">Introduction</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.installing_xpressive">Installing xpressive</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.quick_start">Quick Start</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#xpressive.user_s_guide.creating_a_regex_object">Creating a Regex Object</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.matching_and_searching">Matching and Searching</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.accessing_results">Accessing Results</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions">String Substitutions</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization">String Splitting and Tokenization</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.named_captures">Named Captures</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches">Grammars and Nested Matches</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions">Semantic Actions and User-Defined Assertions</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.symbol_tables_and_attributes">Symbol Tables and Attributes</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits">Localization and Regex Traits</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks"> Tips 'N 
Tricks</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.concepts">Concepts</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.examples">Examples</a></span></dt> </dl></div> <p> This section describes how to use xpressive to accomplish text manipulation and parsing tasks. If you are looking for detailed information regarding specific components in xpressive, check the <a class="link" href="reference.html" title="Reference">Reference</a> section. </p> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.introduction"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.introduction" title="Introduction">Introduction</a> </h3></div></div></div> <a name="boost_xpressive.user_s_guide.introduction.what_is_xpressive_"></a><h3> <a name="id3097485"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.introduction.what_is_xpressive_">What is xpressive?</a> </h3> <p> xpressive is a regular expression template library. Regular expressions (regexes) can be written as strings that are parsed dynamically at runtime (dynamic regexes), or as <span class="emphasis"><em>expression templates</em></span><sup>[<a name="id3097507" href="#ftn.id3097507" class="footnote">4</a>]</sup> that are parsed at compile-time (static regexes). Dynamic regexes have the advantage that they can be accepted from the user as input at runtime or read from an initialization file. Static regexes have several advantages. Since they are C++ expressions instead of strings, they can be syntax-checked at compile-time. Also, they can naturally refer to code and data elsewhere in your program, giving you the ability to call back into your code from within a regex match. Finally, since they are statically bound, the compiler can generate faster code for static regexes. 
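</p> <p> The practical difference shows up when a pattern is malformed. A dynamic regex is parsed only when <code class="computeroutput">sregex::compile()</code> runs, so a bad pattern is reported at runtime by throwing <code class="computeroutput">regex_error</code>, while the equivalent static regex is checked by the C++ compiler. The following is a minimal sketch that assumes the pattern arrives as a runtime string, standing in for user input or an initialization file. The helper <code class="computeroutput">matches_runtime_pattern()</code> is hypothetical and not part of xpressive. </p> <pre class="programlisting">#include <iostream>
#include <string>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

// Hypothetical helper: compiles a pattern supplied at runtime and reports
// whether it matches the whole of `text`. A syntax error in a dynamic
// regex is only detected here, when compile() throws regex_error.
bool matches_runtime_pattern(std::string const &pattern, std::string const &text)
{
    try
    {
        sregex rex = sregex::compile(pattern);
        return regex_match(text, rex);
    }
    catch(regex_error const &e)
    {
        std::cerr << "invalid pattern \"" << pattern << "\": " << e.what() << '\n';
        return false;
    }
}

int main()
{
    std::string input("123-456");

    // Static form of "\\d+-\\d+" (the syntax is explained later in this
    // guide). A typo in this expression is a compile-time error.
    sregex static_rex = +_d >> '-' >> +_d;

    std::cout << std::boolalpha
              << matches_runtime_pattern("\\d+-\\d+", input) << '\n'   // true
              << regex_match(input, static_rex) << '\n'                // true
              << matches_runtime_pattern("(\\d+-\\d+", input) << '\n'; // false: unbalanced '('
    return 0;
}
</pre> <p> Both regexes describe the same pattern. Only the dynamic one can be chosen while the program runs, and only the static one is syntax-checked before the program runs. 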
</p> <p> xpressive's dual nature is unique and powerful. Static xpressive is a bit like the <a href="http://spirit.sourceforge.net" target="_top">Spirit Parser Framework</a>. Like <a href="http://spirit.sourceforge.net" target="_top">Spirit</a>, you can build grammars with static regexes using expression templates. (Unlike <a href="http://spirit.sourceforge.net" target="_top">Spirit</a>, xpressive does exhaustive backtracking, trying every possibility to find a match for your pattern.) Dynamic xpressive is a bit like <a href="../../../libs/regex" target="_top">Boost.Regex</a>. In fact, xpressive's interface should be familiar to anyone who has used <a href="../../../libs/regex" target="_top">Boost.Regex</a>. xpressive's innovation comes from allowing you to mix and match static and dynamic regexes in the same program, and even in the same expression! You can embed a dynamic regex in a static regex, or <span class="emphasis"><em>vice versa</em></span>, and the embedded regex will participate fully in the search, backtracking as needed to make the match succeed. </p> <a name="boost_xpressive.user_s_guide.introduction.hello__world_"></a><h3> <a name="id3097580"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.introduction.hello__world_">Hello, world!</a> </h3> <p> Enough theory. 
Let's have a look at <span class="emphasis"><em>Hello World</em></span>, xpressive style: </p> <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span>
<span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">string</span><span class="special">></span>
<span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span>

<span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span>

<span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span>
<span class="special">{</span>
    <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">hello</span><span class="special">(</span> <span class="string">"hello world!"</span> <span class="special">);</span>

    <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"(\\w+) (\\w+)!"</span> <span class="special">);</span>
    <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span>

    <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="identifier">hello</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">rex</span> <span class="special">)</span> <span class="special">)</span>
    <span class="special">{</span>
        <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// whole match</span>
        <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">1</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// first capture</span>
        <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">2</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// second capture</span>
    <span class="special">}</span>

    <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span>
<span class="special">}</span>
</pre> <p> This program outputs the following: </p> <pre class="programlisting">hello world!
hello
world
</pre> <p> The first thing you'll notice about the code is that all the types in xpressive live in the <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span></code> namespace. 
</p> <div class="note"><table border="0" summary="Note"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> <th align="left">Note</th> </tr> <tr><td align="left" valign="top"><p> Most of the rest of the examples in this document will leave off the <code class="computeroutput"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span></code> directive. Just pretend it's there. </p></td></tr> </table></div> <p> Next, you'll notice the type of the regular expression object is <code class="computeroutput"><span class="identifier">sregex</span></code>. If you are familiar with <a href="../../../libs/regex" target="_top">Boost.Regex</a>, this is different than what you are used to. The "<code class="computeroutput"><span class="identifier">s</span></code>" in "<code class="computeroutput"><span class="identifier">sregex</span></code>" stands for "<code class="computeroutput"><span class="identifier">string</span></code>", indicating that this regex can be used to find patterns in <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code> objects. I'll discuss this difference and its implications in detail later. 
</p> <p> Notice how the regex object is initialized: </p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"(\\w+) (\\w+)!"</span> <span class="special">);</span> </pre> <p> To create a regular expression object from a string, you must call a factory method such as <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html#id1526048-bb">basic_regex<>::compile()</a></code></code>. This is another area in which xpressive differs from other object-oriented regular expression libraries. Other libraries encourage you to think of a regular expression as a kind of string on steroids. In xpressive, regular expressions are not strings; they are little programs in a domain-specific language. Strings are only one <span class="emphasis"><em>representation</em></span> of that language. Another representation is an expression template. 
For example, the above line of code is equivalent to the following: </p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">s1</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="char">' '</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s2</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="char">'!'</span><span class="special">;</span> </pre> <p> This describes the same regular expression, except it uses the domain-specific embedded language defined by static xpressive. </p> <p> As you can see, static regexes have a syntax that is noticeably different from standard Perl syntax. That is because a static regex must be a valid C++ expression. The biggest difference is the use of <code class="computeroutput"><span class="special">>></span></code> to mean "followed by". For instance, in Perl you can just put sub-expressions next to each other: </p> <pre class="programlisting"><span class="identifier">abc</span> </pre> <p> But in C++, there must be an operator separating sub-expressions: </p> <pre class="programlisting"><span class="identifier">a</span> <span class="special">>></span> <span class="identifier">b</span> <span class="special">>></span> <span class="identifier">c</span> </pre> <p> In Perl, parentheses <code class="computeroutput"><span class="special">()</span></code> have special meaning. They group, but as a side effect they also create back-references like <code class="literal">$1</code> and <code class="literal">$2</code>. In C++, there is no way to overload parentheses to give them side effects. 
To get the same effect, we use the special <code class="computeroutput"><span class="identifier">s1</span></code>, <code class="computeroutput"><span class="identifier">s2</span></code>, etc. tokens. Assign to one to create a back-reference (known as a sub-match in xpressive). </p> <p> You'll also notice that the one-or-more repetition operator <code class="computeroutput"><span class="special">+</span></code> has moved from postfix to prefix position. That's because C++ doesn't have a postfix <code class="computeroutput"><span class="special">+</span></code> operator. So: </p> <pre class="programlisting"><span class="string">"\\w+"</span> </pre> <p> is the same as: </p> <pre class="programlisting"><span class="special">+</span><span class="identifier">_w</span> </pre> <p> We'll cover all the other differences <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes" title="Static Regexes">later</a>. </p> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.installing_xpressive"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.installing_xpressive" title="Installing xpressive">Installing xpressive</a> </h3></div></div></div> <a name="boost_xpressive.user_s_guide.installing_xpressive.getting_xpressive"></a><h3> <a name="id3098637"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.installing_xpressive.getting_xpressive">Getting xpressive</a> </h3> <p> There are two ways to get xpressive. The first and simplest is to download the latest version of Boost. Just go to <a href="http://sf.net/projects/boost" target="_top">http://sf.net/projects/boost</a> and follow the <span class="quote">“<span class="quote">Download</span>”</span> link. </p> <p> The second way is by directly accessing the Boost Subversion repository. 
Just go to <a href="http://svn.boost.org/trac/boost/" target="_top">http://svn.boost.org/trac/boost/</a> and follow the instructions there for anonymous Subversion access. The version in Boost Subversion is unstable. </p> <a name="boost_xpressive.user_s_guide.installing_xpressive.building_with_xpressive"></a><h3> <a name="id3098690"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.installing_xpressive.building_with_xpressive">Building with xpressive</a> </h3> <p> Xpressive is a header-only template library, which means you don't need to alter your build scripts or link to any separate lib file to use it. All you need to do is <code class="computeroutput"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span></code>. If you are only using static regexes, you can improve compile times by only including <code class="computeroutput"><span class="identifier">xpressive_static</span><span class="special">.</span><span class="identifier">hpp</span></code>. Likewise, you can include <code class="computeroutput"><span class="identifier">xpressive_dynamic</span><span class="special">.</span><span class="identifier">hpp</span></code> if you only plan on using dynamic regexes. </p> <p> If you would also like to use semantic actions or custom assertions with your static regexes, you will need to additionally include <code class="computeroutput"><span class="identifier">regex_actions</span><span class="special">.</span><span class="identifier">hpp</span></code>. 
</p> <a name="boost_xpressive.user_s_guide.installing_xpressive.requirements"></a><h3> <a name="id3098697"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.installing_xpressive.requirements">Requirements</a> </h3> <p> Xpressive requires Boost version 1.34.1 or higher. </p> <a name="boost_xpressive.user_s_guide.installing_xpressive.supported_compilers"></a><h3> <a name="id3098857"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.installing_xpressive.supported_compilers">Supported Compilers</a> </h3> <p> Currently, Boost.Xpressive is known to work on the following compilers: </p> <div class="itemizedlist"><ul class="itemizedlist" type="disc"> <li class="listitem"> Visual C++ 7.1 and higher </li> <li class="listitem"> GNU C++ 3.4 and higher </li> <li class="listitem"> Intel for Linux 8.1 and higher </li> <li class="listitem"> Intel for Windows 10 and higher </li> <li class="listitem"> tru64cxx 71 and higher </li> <li class="listitem"> MinGW 3.4 and higher </li> <li class="listitem"> HP C/aC++ A.06.14 </li> </ul></div> <p> Check the latest tests results at Boost's <a href="http://beta.boost.org/development/tests/trunk/developer/xpressive.html" target="_top">Regression Results Page</a>. </p> <div class="note"><table border="0" summary="Note"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> <th align="left">Note</th> </tr> <tr><td align="left" valign="top"><p> Please send any questions, comments and bug reports to eric <at> boost-consulting <dot> com. </p></td></tr> </table></div> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.quick_start"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.quick_start" title="Quick Start">Quick Start</a> </h3></div></div></div> <p> You don't need to know much to start being productive with xpressive. 
Let's begin with the nickel tour of the types and algorithms xpressive provides. </p> <div class="table"> <a name="id3098975"></a><p class="title"><b>Table 29.1. xpressive's Tool-Box</b></p> <div class="table-contents"><table class="table" summary="xpressive's Tool-Box"> <colgroup> <col> <col> </colgroup> <thead><tr> <th> <p> Tool </p> </th> <th> <p> Description </p> </th> </tr></thead> <tbody> <tr> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> </p> </td> <td> <p> Contains a compiled regular expression. <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> is the most important type in xpressive. Everything you do with xpressive will begin with creating an object of type <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code>. 
</p> </td> </tr> <tr> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code>, <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> </p> </td> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> contains the results of a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> or <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> operation. It acts like a vector of <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> objects. A <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> object contains a marked sub-expression (also known as a back-reference in Perl). It is basically just a pair of iterators representing the begin and end of the marked sub-expression. </p> </td> </tr> <tr> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> </p> </td> <td> <p> Checks to see if a string matches a regex. 
For <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> to succeed, the <span class="emphasis"><em>whole string</em></span> must match the regex, from beginning to end. If you give <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code>, it will write into it any marked sub-expressions it finds. </p> </td> </tr> <tr> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> </p> </td> <td> <p> Searches a string to find a sub-string that matches the regex. <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> will try to find a match at every position in the string, starting at the beginning, and stopping when it finds a match or when the string is exhausted. As with <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code>, if you give <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code>, it will write into it any marked sub-expressions it finds. 
</p> </td> </tr> <tr> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> </p> </td> <td> <p> Given an input string, a regex, and a substitution string, <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> builds a new string by replacing those parts of the input string that match the regex with the substitution string. The substitution string can contain references to marked sub-expressions. </p> </td> </tr> <tr> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" title="Struct template regex_iterator">regex_iterator<></a></code></code> </p> </td> <td> <p> An STL-compatible iterator that makes it easy to find all the places in a string that match a regex. Dereferencing a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" title="Struct template regex_iterator">regex_iterator<></a></code></code> returns a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code>. Incrementing a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" title="Struct template regex_iterator">regex_iterator<></a></code></code> finds the next match. 
</p> </td> </tr> <tr> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> </p> </td> <td> <p> Like <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" title="Struct template regex_iterator">regex_iterator<></a></code></code>, except dereferencing a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> returns a string. By default, it will return the whole sub-string that the regex matched, but it can be configured to return any or all of the marked sub-expressions one at a time, or even the parts of the string that <span class="emphasis"><em>didn't</em></span> match the regex. </p> </td> </tr> <tr> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> </p> </td> <td> <p> A factory for <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> objects. It "compiles" a string into a regular expression. 
You will not usually have to deal directly with <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> because the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> class has a factory method that uses <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> internally. But if you need to do anything fancy like create a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> object with a different <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code>, you will need to use a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> explicitly. </p> </td> </tr> </tbody> </table></div> </div> <br class="table-break"><p> Now that you know a bit about the tools xpressive provides, you can pick the right tool for you by answering the following two questions: </p> <div class="orderedlist"><ol class="orderedlist" type="1"> <li class="listitem"> What <span class="emphasis"><em>iterator</em></span> type will you use to traverse your data? </li> <li class="listitem"> What do you want to <span class="emphasis"><em>do</em></span> to your data? 
</li> </ol></div> <a name="boost_xpressive.user_s_guide.quick_start.know_your_iterator_type"></a><h3> <a name="id3099760"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.quick_start.know_your_iterator_type">Know Your Iterator Type</a> </h3> <p> Most of the classes in xpressive are templates that are parameterized on the iterator type. xpressive defines some common typedefs to make the job of choosing the right types easier. You can use the table below to find the right types based on the type of your iterator. </p> <div class="table"> <a name="id3099783"></a><p class="title"><b>Table 29.2. xpressive Typedefs vs. Iterator Types</b></p> <div class="table-contents"><table class="table" summary="xpressive Typedefs vs. Iterator Types"> <colgroup> <col> <col> <col> <col> <col> </colgroup> <thead><tr> <th> <p> </p> </th> <th> <p> std::string::const_iterator </p> </th> <th> <p> char const * </p> </th> <th> <p> std::wstring::const_iterator </p> </th> <th> <p> wchar_t const * </p> </th> </tr></thead> <tbody> <tr> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">sregex</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">cregex</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">wsregex</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">wcregex</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">smatch</span></code> </p> </td> <td> <p> <code class="computeroutput"><span 
class="identifier">cmatch</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">wsmatch</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">wcmatch</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">sregex_compiler</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">cregex_compiler</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">wsregex_compiler</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">wcregex_compiler</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" title="Struct template regex_iterator">regex_iterator<></a></code></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">sregex_iterator</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">cregex_iterator</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">wsregex_iterator</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">wcregex_iterator</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">sregex_token_iterator</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">cregex_token_iterator</span></code> </p> </td> <td> <p> <code 
class="computeroutput"><span class="identifier">wsregex_token_iterator</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">wcregex_token_iterator</span></code> </p> </td> </tr> </tbody> </table></div> </div> <br class="table-break"><p> You should notice the systematic naming convention. Many of these types are used together, so the naming convention helps you to use them consistently. For instance, if you have a <code class="computeroutput"><span class="identifier">sregex</span></code>, you should also be using a <code class="computeroutput"><span class="identifier">smatch</span></code>. </p> <p> If you are not using one of those four iterator types, then you can use the templates directly and specify your iterator type. </p> <a name="boost_xpressive.user_s_guide.quick_start.know_your_task"></a><h3> <a name="id3100305"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.quick_start.know_your_task">Know Your Task</a> </h3> <p> Do you want to find a pattern once? Many times? Search and replace? xpressive has tools for all that and more. Below is a quick reference: </p> <div class="table"> <a name="id3100325"></a><p class="title"><b>Table 29.3. Tasks and Tools</b></p> <div class="table-contents"><table class="table" summary="Tasks and Tools"> <colgroup> <col> <col> </colgroup> <thead><tr> <th> <p> To do this ... </p> </th> <th> <p> Use this ... 
</p> </th> </tr></thead> <tbody> <tr> <td> <p> <span class="inlinemediaobject"><img src="../images/tip.png" alt="tip"></span> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.see_if_a_whole_string_matches_a_regex">See if a whole string matches a regex</a> </p> </td> <td> <p> The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> algorithm </p> </td> </tr> <tr> <td> <p> <span class="inlinemediaobject"><img src="../images/tip.png" alt="tip"></span> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.see_if_a_string_contains_a_sub_string_that_matches_a_regex">See if a string contains a sub-string that matches a regex</a> </p> </td> <td> <p> The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> algorithm </p> </td> </tr> <tr> <td> <p> <span class="inlinemediaobject"><img src="../images/tip.png" alt="tip"></span> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex">Replace all sub-strings that match a regex</a> </p> </td> <td> <p> The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> algorithm </p> </td> </tr> <tr> <td> <p> <span class="inlinemediaobject"><img src="../images/tip.png" alt="tip"></span> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.find_all_the_sub_strings_that_match_a_regex_and_step_through_them_one_at_a_time">Find all the sub-strings that match a regex and step through them one at a time</a> </p> </td> <td> <p> The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" 
title="Struct template regex_iterator">regex_iterator<></a></code></code> class </p> </td> </tr> <tr> <td> <p> <span class="inlinemediaobject"><img src="../images/tip.png" alt="tip"></span> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.split_a_string_into_tokens_that_each_match_a_regex">Split a string into tokens that each match a regex</a> </p> </td> <td> <p> The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> class </p> </td> </tr> <tr> <td> <p> <span class="inlinemediaobject"><img src="../images/tip.png" alt="tip"></span> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.split_a_string_using_a_regex_as_a_delimiter">Split a string using a regex as a delimiter</a> </p> </td> <td> <p> The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> class </p> </td> </tr> </tbody> </table></div> </div> <br class="table-break"><p> These algorithms and classes are described in excruciating detail in the Reference section. </p> <div class="tip"><table border="0" summary="Tip"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../doc/src/images/tip.png"></td> <th align="left">Tip</th> </tr> <tr><td align="left" valign="top"><p> Try clicking on a task in the table above to see a complete example program that uses xpressive to solve that particular task. 
</p></td></tr> </table></div> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="xpressive.user_s_guide.creating_a_regex_object"></a><a class="link" href="user_s_guide.html#xpressive.user_s_guide.creating_a_regex_object" title="Creating a Regex Object">Creating a Regex Object</a> </h3></div></div></div> <div class="toc"><dl> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes">Static Regexes</a></span></dt> <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes">Dynamic Regexes</a></span></dt> </dl></div> <p> When using xpressive, the first thing you'll do is create a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> object. This section goes over the nuts and bolts of building a regular expression in the two dialects xpressive supports: static and dynamic. </p> <div class="section"> <div class="titlepage"><div><div><h4 class="title"> <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes" title="Static Regexes">Static Regexes</a> </h4></div></div></div> <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.overview"></a><h3> <a name="id3100786"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.overview">Overview</a> </h3> <p> The feature that really sets xpressive apart from other C/C++ regular expression libraries is the ability to author a regular expression using C++ expressions. 
xpressive achieves this through operator overloading, using a technique called <span class="emphasis"><em>expression templates</em></span> to embed a mini-language dedicated to pattern matching within C++. These "static regexes" have many advantages over their string-based brethren. In particular, static regexes: </p> <div class="itemizedlist"><ul class="itemizedlist" type="disc"> <li class="listitem"> are syntax-checked at compile-time; they will never fail at run-time due to a syntax error. </li> <li class="listitem"> can naturally refer to other C++ data and code, including other regexes, making it simple to build grammars out of regular expressions and bind user-defined actions that execute when parts of your regex match. </li> <li class="listitem"> are statically bound for better inlining and optimization. Static regexes require no state tables, virtual functions, byte-code or calls through function pointers that cannot be resolved at compile time. </li> <li class="listitem"> are not limited to searching for patterns in strings. You can declare a static regex that finds patterns in an array of integers, for instance. </li> </ul></div> <p> Since we compose static regexes using C++ expressions, we are constrained by the rules for legal C++ expressions. Unfortunately, that means that "classic" regular expression syntax cannot always be mapped cleanly into C++. Rather, we map the regex <span class="emphasis"><em>constructs</em></span>, picking new syntax that is legal C++. 
</p> <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.construction_and_assignment"></a><h3> <a name="id3100882"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.construction_and_assignment">Construction and Assignment</a> </h3> <p> You create a static regex by assigning one to an object of type <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code>. For instance, the following defines a regex that can be used to find patterns in objects of type <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code>: </p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="char">'$'</span> <span class="special">>></span> <span class="special">+</span><span class="identifier">_d</span> <span class="special">>></span> <span class="char">'.'</span> <span class="special">>></span> <span class="identifier">_d</span> <span class="special">>></span> <span class="identifier">_d</span><span class="special">;</span> </pre> <p> Assignment works the same way: an expression like the one above can also be assigned to an existing <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> object with <code class="computeroutput"><span class="special">=</span></code>. </p> <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.character_and_string_literals"></a><h3> <a name="id3101029"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.character_and_string_literals">Character and String Literals</a> </h3> <p> In static regexes, character and string literals match themselves. 
For instance, in the regex above, <code class="computeroutput"><span class="char">'$'</span></code> and <code class="computeroutput"><span class="char">'.'</span></code> match the characters <code class="computeroutput"><span class="char">'$'</span></code> and <code class="computeroutput"><span class="char">'.'</span></code> respectively. Don't be confused by the fact that <code class="literal">$</code> and <code class="literal">.</code> are meta-characters in Perl. In xpressive, literals always represent themselves. </p> <p> When using literals in static regexes, you must take care that at least one operand is not a literal. For instance, the following are <span class="emphasis"><em>not</em></span> valid regexes: </p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">re1</span> <span class="special">=</span> <span class="char">'a'</span> <span class="special">>></span> <span class="char">'b'</span><span class="special">;</span> <span class="comment">// ERROR! </span><span class="identifier">sregex</span> <span class="identifier">re2</span> <span class="special">=</span> <span class="special">+</span><span class="char">'a'</span><span class="special">;</span> <span class="comment">// ERROR! </span></pre> <p> The two operands to the binary <code class="computeroutput"><span class="special">>></span></code> operator are both literals, and the operand of the unary <code class="computeroutput"><span class="special">+</span></code> operator is also a literal, so these statements will call the native C++ binary right-shift and unary plus operators, respectively. That's not what we want. To get operator overloading to kick in, at least one operand must be a user-defined type. We can use xpressive's <code class="computeroutput"><span class="identifier">as_xpr</span><span class="special">()</span></code> helper function to "taint" an expression with regex-ness, forcing operator overloading to find the correct operators. 
The two regexes above should be written as: </p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">re1</span> <span class="special">=</span> <span class="identifier">as_xpr</span><span class="special">(</span><span class="char">'a'</span><span class="special">)</span> <span class="special">>></span> <span class="char">'b'</span><span class="special">;</span> <span class="comment">// OK </span><span class="identifier">sregex</span> <span class="identifier">re2</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">as_xpr</span><span class="special">(</span><span class="char">'a'</span><span class="special">);</span> <span class="comment">// OK </span></pre> <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.sequencing_and_alternation"></a><h3> <a name="id3101346"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.sequencing_and_alternation">Sequencing and Alternation</a> </h3> <p> As you've probably already noticed, sub-expressions in static regexes must be separated by the sequencing operator, <code class="computeroutput"><span class="special">>></span></code>. You can read this operator as "followed by". </p> <pre class="programlisting"><span class="comment">// Match an 'a' followed by a digit </span><span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="char">'a'</span> <span class="special">>></span> <span class="identifier">_d</span><span class="special">;</span> </pre> <p> Alternation works just as it does in Perl with the <code class="computeroutput"><span class="special">|</span></code> operator. You can read this operator as "or". 
For example: </p> <pre class="programlisting"><span class="comment">// match a digit character or a word character one or more times </span><span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">|</span> <span class="identifier">_w</span> <span class="special">);</span> </pre> <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.grouping_and_captures"></a><h3> <a name="id3101500"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.grouping_and_captures">Grouping and Captures</a> </h3> <p> In Perl, parentheses <code class="computeroutput"><span class="special">()</span></code> have special meaning. They group, but as a side-effect they also create back-references like <code class="literal">$1</code> and <code class="literal">$2</code>. In C++, parentheses only group -- there is no way to give them side-effects. To get the same effect, we use the special <code class="computeroutput"><span class="identifier">s1</span></code>, <code class="computeroutput"><span class="identifier">s2</span></code>, etc. tokens. Assigning to one creates a back-reference. You can then use the back-reference later in your expression, like using <code class="literal">\1</code> and <code class="literal">\2</code> in Perl. 
For example, consider the following regex, which finds matching HTML tags: </p> <pre class="programlisting"><span class="string">"<(\\w+)>.*?</\\1>"</span> </pre> <p> In static xpressive, this would be: </p> <pre class="programlisting"><span class="char">'<'</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s1</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="char">'>'</span> <span class="special">>></span> <span class="special">-*</span><span class="identifier">_</span> <span class="special">>></span> <span class="string">"</"</span> <span class="special">>></span> <span class="identifier">s1</span> <span class="special">>></span> <span class="char">'>'</span> </pre> <p> Notice how you capture a back-reference by assigning to <code class="computeroutput"><span class="identifier">s1</span></code>, and then you use <code class="computeroutput"><span class="identifier">s1</span></code> later in the pattern to find the matching end tag. </p> <div class="tip"><table border="0" summary="Tip"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../doc/src/images/tip.png"></td> <th align="left">Tip</th> </tr> <tr><td align="left" valign="top"><p> <span class="bold"><strong>Grouping without capturing a back-reference</strong></span> <br> <br> In xpressive, if you just want grouping without capturing a back-reference, you can just use <code class="computeroutput"><span class="special">()</span></code> without <code class="computeroutput"><span class="identifier">s1</span></code>. That is the equivalent of Perl's <code class="literal">(?:)</code> non-capturing grouping construct. 
</p></td></tr> </table></div> <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.case_insensitivity_and_internationalization"></a><h3> <a name="id3101772"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.case_insensitivity_and_internationalization">Case-Insensitivity and Internationalization</a> </h3> <p> Perl lets you make part of your regular expression case-insensitive by using the <code class="literal">(?i:)</code> pattern modifier. xpressive also has a case-insensitivity pattern modifier, called <code class="computeroutput"><span class="identifier">icase</span></code>. You can use it as follows: </p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="string">"this"</span> <span class="special">>></span> <span class="identifier">icase</span><span class="special">(</span> <span class="string">"that"</span> <span class="special">);</span> </pre> <p> In this regular expression, <code class="computeroutput"><span class="string">"this"</span></code> will be matched exactly, but <code class="computeroutput"><span class="string">"that"</span></code> will be matched irrespective of case. </p> <p> Case-insensitive regular expressions raise the issue of internationalization: how should case-insensitive character comparisons be evaluated? Also, many character classes are locale-specific. Which characters are matched by <code class="computeroutput"><span class="identifier">digit</span></code> and which are matched by <code class="computeroutput"><span class="identifier">alpha</span></code>? The answer depends on the <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code> object the regular expression object is using. By default, all regular expression objects use the global locale. 
You can override the default by using the <code class="computeroutput"><span class="identifier">imbue</span><span class="special">()</span></code> pattern modifier, as follows: </p> <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span> <span class="identifier">my_locale</span> <span class="special">=</span> <span class="comment">/* initialize a std::locale object */</span><span class="special">;</span> <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">imbue</span><span class="special">(</span> <span class="identifier">my_locale</span> <span class="special">)(</span> <span class="special">+</span><span class="identifier">alpha</span> <span class="special">>></span> <span class="special">+</span><span class="identifier">digit</span> <span class="special">);</span> </pre> <p> This regular expression will evaluate <code class="computeroutput"><span class="identifier">alpha</span></code> and <code class="computeroutput"><span class="identifier">digit</span></code> according to <code class="computeroutput"><span class="identifier">my_locale</span></code>. See the section on <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits" title="Localization and Regex Traits">Localization and Regex Traits</a> for more information about how to customize the behavior of your regexes. </p> <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.static_xpressive_syntax_cheat_sheet"></a><h3> <a name="id3102109"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.static_xpressive_syntax_cheat_sheet">Static xpressive Syntax Cheat Sheet</a> </h3> <p> The table below lists the familiar regex constructs and their equivalents in static xpressive. 
</p> <div class="table"> <a name="id3102131"></a><p class="title"><b>Table 29.4. Perl syntax vs. Static xpressive syntax</b></p> <div class="table-contents"><table class="table" summary="Perl syntax vs. Static xpressive syntax"> <colgroup> <col> <col> <col> </colgroup> <thead><tr> <th> <p> Perl </p> </th> <th> <p> Static xpressive </p> </th> <th> <p> Meaning </p> </th> </tr></thead> <tbody> <tr> <td> <p> <code class="literal">.</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/_.html" title="Global _">_</a></code> </p> </td> <td> <p> any character (assuming Perl's /s modifier). </p> </td> </tr> <tr> <td> <p> <code class="literal">ab</code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">a</span> <span class="special">>></span> <span class="identifier">b</span></code> </p> </td> <td> <p> sequencing of <code class="literal">a</code> and <code class="literal">b</code> sub-expressions. </p> </td> </tr> <tr> <td> <p> <code class="literal">a|b</code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">a</span> <span class="special">|</span> <span class="identifier">b</span></code> </p> </td> <td> <p> alternation of <code class="literal">a</code> and <code class="literal">b</code> sub-expressions. </p> </td> </tr> <tr> <td> <p> <code class="literal">(a)</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">(</span><a class="link" href="../boost/xpressive/s1.html" title="Global s1">s1</a><span class="special">=</span> <span class="identifier">a</span><span class="special">)</span></code> </p> </td> <td> <p> group and capture a back-reference. </p> </td> </tr> <tr> <td> <p> <code class="literal">(?:a)</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">(</span><span class="identifier">a</span><span class="special">)</span></code> </p> </td> <td> <p> group and do not capture a back-reference. 
</p> </td> </tr> <tr> <td> <p> <code class="literal">\1</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/s1.html" title="Global s1">s1</a></code> </p> </td> <td> <p> a previously captured back-reference. </p> </td> </tr> <tr> <td> <p> <code class="literal">a*</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">*</span><span class="identifier">a</span></code> </p> </td> <td> <p> zero or more times, greedy. </p> </td> </tr> <tr> <td> <p> <code class="literal">a+</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">+</span><span class="identifier">a</span></code> </p> </td> <td> <p> one or more times, greedy. </p> </td> </tr> <tr> <td> <p> <code class="literal">a?</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">!</span><span class="identifier">a</span></code> </p> </td> <td> <p> zero or one time, greedy. </p> </td> </tr> <tr> <td> <p> <code class="literal">a{n,m}</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/repeat.html" title="Function repeat">repeat</a><span class="special"><</span><span class="identifier">n</span><span class="special">,</span><span class="identifier">m</span><span class="special">>(</span><span class="identifier">a</span><span class="special">)</span></code> </p> </td> <td> <p> between <code class="literal">n</code> and <code class="literal">m</code> times, greedy. </p> </td> </tr> <tr> <td> <p> <code class="literal">a*?</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">-*</span><span class="identifier">a</span></code> </p> </td> <td> <p> zero or more times, non-greedy. </p> </td> </tr> <tr> <td> <p> <code class="literal">a+?</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">-+</span><span class="identifier">a</span></code> </p> </td> <td> <p> one or more times, non-greedy. 
</p> </td> </tr> <tr> <td> <p> <code class="literal">a??</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">-!</span><span class="identifier">a</span></code> </p> </td> <td> <p> zero or one time, non-greedy. </p> </td> </tr> <tr> <td> <p> <code class="literal">a{n,m}?</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">-</span><a class="link" href="../boost/xpressive/repeat.html" title="Function repeat">repeat</a><span class="special"><</span><span class="identifier">n</span><span class="special">,</span><span class="identifier">m</span><span class="special">>(</span><span class="identifier">a</span><span class="special">)</span></code> </p> </td> <td> <p> between <code class="literal">n</code> and <code class="literal">m</code> times, non-greedy. </p> </td> </tr> <tr> <td> <p> <code class="literal">^</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/bos.html" title="Global bos">bos</a></code> </p> </td> <td> <p> beginning of sequence assertion. </p> </td> </tr> <tr> <td> <p> <code class="literal">$</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/eos.html" title="Global eos">eos</a></code> </p> </td> <td> <p> end of sequence assertion. </p> </td> </tr> <tr> <td> <p> <code class="literal">\b</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/_b.html" title="Global _b">_b</a></code> </p> </td> <td> <p> word boundary assertion. </p> </td> </tr> <tr> <td> <p> <code class="literal">\B</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/_b.html" title="Global _b">_b</a></code> </p> </td> <td> <p> not word boundary assertion. 
</p> </td> </tr> <tr> <td> <p> <code class="literal">\n</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/_n.html" title="Global _n">_n</a></code> </p> </td> <td> <p> literal newline. </p> </td> </tr> <tr> <td> <p> <code class="literal">.</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/_n.html" title="Global _n">_n</a></code> </p> </td> <td> <p> any character except a literal newline (without Perl's /s modifier). </p> </td> </tr> <tr> <td> <p> <code class="literal">\r?\n|\r</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/_ln.html" title="Global _ln">_ln</a></code> </p> </td> <td> <p> logical newline. </p> </td> </tr> <tr> <td> <p> <code class="literal">[^\r\n]</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/_ln.html" title="Global _ln">_ln</a></code> </p> </td> <td> <p> any single character not a logical newline. </p> </td> </tr> <tr> <td> <p> <code class="literal">\w</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/_w.html" title="Global _w">_w</a></code> </p> </td> <td> <p> a word character, equivalent to set[alnum | '_']. </p> </td> </tr> <tr> <td> <p> <code class="literal">\W</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/_w.html" title="Global _w">_w</a></code> </p> </td> <td> <p> not a word character, equivalent to ~set[alnum | '_']. </p> </td> </tr> <tr> <td> <p> <code class="literal">\d</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/_d.html" title="Global _d">_d</a></code> </p> </td> <td> <p> a digit character. 
</p> </td> </tr> <tr> <td> <p> <code class="literal">\D</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/_d.html" title="Global _d">_d</a></code> </p> </td> <td> <p> not a digit character. </p> </td> </tr> <tr> <td> <p> <code class="literal">\s</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/_s.html" title="Global _s">_s</a></code> </p> </td> <td> <p> a space character. </p> </td> </tr> <tr> <td> <p> <code class="literal">\S</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/_s.html" title="Global _s">_s</a></code> </p> </td> <td> <p> not a space character. </p> </td> </tr> <tr> <td> <p> <code class="literal">[:alnum:]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/alnum.html" title="Global alnum">alnum</a></code> </p> </td> <td> <p> an alpha-numeric character. </p> </td> </tr> <tr> <td> <p> <code class="literal">[:alpha:]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/alpha.html" title="Global alpha">alpha</a></code> </p> </td> <td> <p> an alphabetic character. </p> </td> </tr> <tr> <td> <p> <code class="literal">[:blank:]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/blank.html" title="Global blank">blank</a></code> </p> </td> <td> <p> a horizontal white-space character. </p> </td> </tr> <tr> <td> <p> <code class="literal">[:cntrl:]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/cntrl.html" title="Global cntrl">cntrl</a></code> </p> </td> <td> <p> a control character. 
</p> </td> </tr> <tr> <td> <p> <code class="literal">[:digit:]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/digit.html" title="Global digit">digit</a></code> </p> </td> <td> <p> a digit character. </p> </td> </tr> <tr> <td> <p> <code class="literal">[:graph:]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/graph.html" title="Global graph">graph</a></code> </p> </td> <td> <p> a graphical character (any printing character except space). </p> </td> </tr> <tr> <td> <p> <code class="literal">[:lower:]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/lower.html" title="Global lower">lower</a></code> </p> </td> <td> <p> a lower-case character. </p> </td> </tr> <tr> <td> <p> <code class="literal">[:print:]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/print.html" title="Global print">print</a></code> </p> </td> <td> <p> a printing character. </p> </td> </tr> <tr> <td> <p> <code class="literal">[:punct:]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/punct.html" title="Global punct">punct</a></code> </p> </td> <td> <p> a punctuation character. </p> </td> </tr> <tr> <td> <p> <code class="literal">[:space:]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/space.html" title="Global space">space</a></code> </p> </td> <td> <p> a white-space character. </p> </td> </tr> <tr> <td> <p> <code class="literal">[:upper:]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/upper.html" title="Global upper">upper</a></code> </p> </td> <td> <p> an upper-case character. 
</p> </td> </tr> <tr> <td> <p> <code class="literal">[:xdigit:]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/xdigit.html" title="Global xdigit">xdigit</a></code> </p> </td> <td> <p> a hexadecimal digit character. </p> </td> </tr> <tr> <td> <p> <code class="literal">[0-9]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/range.html" title="Function template range">range</a><span class="special">(</span><span class="char">'0'</span><span class="special">,</span><span class="char">'9'</span><span class="special">)</span></code> </p> </td> <td> <p> characters in range <code class="computeroutput"><span class="char">'0'</span></code> through <code class="computeroutput"><span class="char">'9'</span></code>. </p> </td> </tr> <tr> <td> <p> <code class="literal">[abc]</code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">as_xpr</span><span class="special">(</span><span class="char">'a'</span><span class="special">)</span> <span class="special">|</span> <span class="char">'b'</span> <span class="special">|</span><span class="char">'c'</span></code> </p> </td> <td> <p> characters <code class="computeroutput"><span class="char">'a'</span></code>, <code class="computeroutput"><span class="char">'b'</span></code>, or <code class="computeroutput"><span class="char">'c'</span></code>. 
</p> </td> </tr> <tr> <td> <p> <code class="literal">[abc]</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">(</span><a class="link" href="../boost/xpressive/set.html" title="Global set">set</a><span class="special">=</span> <span class="char">'a'</span><span class="special">,</span><span class="char">'b'</span><span class="special">,</span><span class="char">'c'</span><span class="special">)</span></code> </p> </td> <td> <p> <span class="emphasis"><em>same as above</em></span> </p> </td> </tr> <tr> <td> <p> <code class="literal">[0-9abc]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/set.html" title="Global set">set</a><span class="special">[</span> <a class="link" href="../boost/xpressive/range.html" title="Function template range">range</a><span class="special">(</span><span class="char">'0'</span><span class="special">,</span><span class="char">'9'</span><span class="special">)</span> <span class="special">|</span> <span class="char">'a'</span> <span class="special">|</span> <span class="char">'b'</span> <span class="special">|</span> <span class="char">'c'</span> <span class="special">]</span></code> </p> </td> <td> <p> characters <code class="computeroutput"><span class="char">'a'</span></code>, <code class="computeroutput"><span class="char">'b'</span></code>, <code class="computeroutput"><span class="char">'c'</span></code> or in range <code class="computeroutput"><span class="char">'0'</span></code> through <code class="computeroutput"><span class="char">'9'</span></code>. 
</p> </td> </tr> <tr> <td> <p> <code class="literal">[0-9abc]</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/set.html" title="Global set">set</a><span class="special">[</span> <a class="link" href="../boost/xpressive/range.html" title="Function template range">range</a><span class="special">(</span><span class="char">'0'</span><span class="special">,</span><span class="char">'9'</span><span class="special">)</span> <span class="special">|</span> <span class="special">(</span><a class="link" href="../boost/xpressive/set.html" title="Global set">set</a><span class="special">=</span> <span class="char">'a'</span><span class="special">,</span><span class="char">'b'</span><span class="special">,</span><span class="char">'c'</span><span class="special">)</span> <span class="special">]</span></code> </p> </td> <td> <p> <span class="emphasis"><em>same as above</em></span> </p> </td> </tr> <tr> <td> <p> <code class="literal">[^abc]</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">~(</span><a class="link" href="../boost/xpressive/set.html" title="Global set">set</a><span class="special">=</span> <span class="char">'a'</span><span class="special">,</span><span class="char">'b'</span><span class="special">,</span><span class="char">'c'</span><span class="special">)</span></code> </p> </td> <td> <p> not characters <code class="computeroutput"><span class="char">'a'</span></code>, <code class="computeroutput"><span class="char">'b'</span></code>, or <code class="computeroutput"><span class="char">'c'</span></code>. 
</p> </td> </tr> <tr> <td> <p> <code class="literal">(?i:<span class="emphasis"><em>stuff</em></span>)</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/icase.html" title="Function template icase">icase</a><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> </p> </td> <td> <p> match <span class="emphasis"><em>stuff</em></span> disregarding case. </p> </td> </tr> <tr> <td> <p> <code class="literal">(?><span class="emphasis"><em>stuff</em></span>)</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/keep.html" title="Function template keep">keep</a><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> </p> </td> <td> <p> independent sub-expression, match <span class="emphasis"><em>stuff</em></span> and turn off backtracking. </p> </td> </tr> <tr> <td> <p> <code class="literal">(?=<span class="emphasis"><em>stuff</em></span>)</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/before.html" title="Function template before">before</a><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> </p> </td> <td> <p> positive look-ahead assertion, match if before <span class="emphasis"><em>stuff</em></span> but don't include <span class="emphasis"><em>stuff</em></span> in the match. 
</p> </td> </tr> <tr> <td> <p> <code class="literal">(?!<span class="emphasis"><em>stuff</em></span>)</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/before.html" title="Function template before">before</a><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> </p> </td> <td> <p> negative look-ahead assertion, match if not before <span class="emphasis"><em>stuff</em></span>. </p> </td> </tr> <tr> <td> <p> <code class="literal">(?&lt;=<span class="emphasis"><em>stuff</em></span>)</code> </p> </td> <td> <p> <code class="computeroutput"><a class="link" href="../boost/xpressive/after.html" title="Function template after">after</a><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> </p> </td> <td> <p> positive look-behind assertion, match if after <span class="emphasis"><em>stuff</em></span> but don't include <span class="emphasis"><em>stuff</em></span> in the match. (<span class="emphasis"><em>stuff</em></span> must be constant-width.) </p> </td> </tr> <tr> <td> <p> <code class="literal">(?&lt;!<span class="emphasis"><em>stuff</em></span>)</code> </p> </td> <td> <p> <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/after.html" title="Function template after">after</a><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> </p> </td> <td> <p> negative look-behind assertion, match if not after <span class="emphasis"><em>stuff</em></span>. (<span class="emphasis"><em>stuff</em></span> must be constant-width.) 
</p> </td> </tr> <tr> <td> <p> <code class="literal">(?P<<span class="emphasis"><em>name</em></span>><span class="emphasis"><em>stuff</em></span>)</code> </p> </td> <td> <p> <code class="computeroutput"><code class="literal"><a class="link" href="../boost/xpressive/mark_tag.html" title="Struct mark_tag">mark_tag</a></code> </code><code class="literal"><span class="emphasis"><em>name</em></span></code><code class="computeroutput"><span class="special">(</span></code><span class="emphasis"><em>n</em></span><code class="computeroutput"><span class="special">);</span></code><br> ...<br> <code class="computeroutput"><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>name</em></span></code><code class="computeroutput"><span class="special">=</span> </code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> </p> </td> <td> <p> Create a named capture. </p> </td> </tr> <tr> <td> <p> <code class="literal">(?P=<span class="emphasis"><em>name</em></span>)</code> </p> </td> <td> <p> <code class="computeroutput"><code class="literal"><a class="link" href="../boost/xpressive/mark_tag.html" title="Struct mark_tag">mark_tag</a></code> </code><code class="literal"><span class="emphasis"><em>name</em></span></code><code class="computeroutput"><span class="special">(</span></code><span class="emphasis"><em>n</em></span><code class="computeroutput"><span class="special">);</span></code><br> ...<br> <code class="literal"><span class="emphasis"><em>name</em></span></code> </p> </td> <td> <p> Refer back to a previously created named capture. 
</p> </td> </tr> </tbody> </table></div> </div> <br class="table-break"><p> <br> </p> </div> <div class="section"> <div class="titlepage"><div><div><h4 class="title"> <a name="boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes" title="Dynamic Regexes">Dynamic Regexes</a> </h4></div></div></div> <a name="boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.overview"></a><h3> <a name="id3105275"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.overview">Overview</a> </h3> <p> Static regexes are dandy, but sometimes you need something a bit more ... dynamic. Imagine you are developing a text editor with a regex search/replace feature. You need to accept a regular expression from the end user as input at run-time. There should be a way to parse a string into a regular expression. That's what xpressive's dynamic regexes are for. They are built from the same core components as their static counterparts, but they are late-bound so you can specify them at run-time. </p> <a name="boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.construction_and_assignment"></a><h3> <a name="id3105303"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.construction_and_assignment">Construction and Assignment</a> </h3> <p> There are two ways to create a dynamic regex: with the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html#id1526048-bb">basic_regex<>::compile()</a></code></code> function or with the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> class template. 
Use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html#id1526048-bb">basic_regex<>::compile()</a></code></code> if you want the default locale. Use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> if you need to specify a different locale. In the section on <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches" title="Grammars and Nested Matches">regex grammars</a>, we'll see another use for <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code>. </p> <p> Here is an example of using <code class="computeroutput"><span class="identifier">basic_regex</span><span class="special"><>::</span><span class="identifier">compile</span><span class="special">()</span></code>: </p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"this|that"</span><span class="special">,</span> <span class="identifier">regex_constants</span><span class="special">::</span><span class="identifier">icase</span> <span class="special">);</span> </pre> <p> Here is the same example using <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code>: </p> <pre class="programlisting"><span class="identifier">sregex_compiler</span> <span class="identifier">compiler</span><span class="special">;</span> <span class="identifier">sregex</span> <span class="identifier">re</span> <span 
class="special">=</span> <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"this|that"</span><span class="special">,</span> <span class="identifier">regex_constants</span><span class="special">::</span><span class="identifier">icase</span> <span class="special">);</span> </pre> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html#id1526048-bb">basic_regex<>::compile()</a></code></code> is implemented in terms of <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code>. </p> <a name="boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.dynamic_xpressive_syntax"></a><h3> <a name="id3105643"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.dynamic_xpressive_syntax">Dynamic xpressive Syntax</a> </h3> <p> Since the dynamic syntax is not constrained by the rules for valid C++ expressions, we are free to use familiar syntax for dynamic regexes. For this reason, the syntax used by xpressive for dynamic regexes follows the lead set by John Maddock's <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1429.htm" target="_top">proposal</a> to add regular expressions to the Standard Library. It is essentially the syntax standardized by <a href="http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf" target="_top">ECMAScript</a>, with minor changes in support of internationalization. </p> <p> Since the syntax is documented exhaustively elsewhere, I will simply refer you to the existing standards, rather than duplicate the specification here. 
</p> <a name="boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.internationalization"></a><h3> <a name="id3105699"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.internationalization">Internationalization</a> </h3> <p> As with static regexes, dynamic regexes support internationalization by allowing you to specify a different <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code>. To do this, you must use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code>. The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> class has an <code class="computeroutput"><span class="identifier">imbue</span><span class="special">()</span></code> function. After you have imbued a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> object with a custom <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code>, all regex objects compiled by that <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> will use that locale. 
For example: </p> <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span> <span class="identifier">my_locale</span> <span class="special">=</span> <span class="comment">/* initialize your locale object here */</span><span class="special">;</span> <span class="identifier">sregex_compiler</span> <span class="identifier">compiler</span><span class="special">;</span> <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">imbue</span><span class="special">(</span> <span class="identifier">my_locale</span> <span class="special">);</span> <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"\\w+|\\d+"</span> <span class="special">);</span> </pre> <p> This regex will use <code class="computeroutput"><span class="identifier">my_locale</span></code> when evaluating the intrinsic character sets <code class="computeroutput"><span class="string">"\\w"</span></code> and <code class="computeroutput"><span class="string">"\\d"</span></code>. 
</p> </div> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.matching_and_searching"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.matching_and_searching" title="Matching and Searching">Matching and Searching</a> </h3></div></div></div> <a name="boost_xpressive.user_s_guide.matching_and_searching.overview"></a><h3> <a name="id3106017"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.matching_and_searching.overview">Overview</a> </h3> <p> Once you have created a regex object, you can use the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> and <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> algorithms to find patterns in strings. This section covers the basics of regex matching and searching. If you are already familiar with the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> and <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> algorithms in the <a href="../../../libs/regex" target="_top">Boost.Regex</a> library, xpressive's versions work the same way. 
</p> <a name="boost_xpressive.user_s_guide.matching_and_searching.seeing_if_a_string_matches_a_regex"></a><h3> <a name="id3106111"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.matching_and_searching.seeing_if_a_string_matches_a_regex">Seeing if a String Matches a Regex</a> </h3> <p> The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> algorithm checks to see if a regex matches a given input. </p> <div class="warning"><table border="0" summary="Warning"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="../../../doc/src/images/warning.png"></td> <th align="left">Warning</th> </tr> <tr><td align="left" valign="top"><p> The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> algorithm will only report success if the regex matches the <span class="emphasis"><em>whole input</em></span>, from beginning to end. If the regex matches only a part of the input, <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> will return false. If you want to search through the string looking for sub-strings that the regex matches, use the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> algorithm. </p></td></tr> </table></div> <p> The input can be a bidirectional range such as <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code>, a C-style null-terminated string or a pair of iterators. 
In all cases, the type of the iterator used to traverse the input sequence must match the iterator type used to declare the regex object. (You can use the table in the <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.quick_start.know_your_iterator_type">Quick Start</a> to find the correct regex type for your iterator.) </p> <pre class="programlisting"><span class="identifier">cregex</span> <span class="identifier">cre</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">;</span> <span class="comment">// this regex can match C-style strings </span><span class="identifier">sregex</span> <span class="identifier">sre</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">;</span> <span class="comment">// this regex can match std::strings </span> <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="string">"hello"</span><span class="special">,</span> <span class="identifier">cre</span> <span class="special">)</span> <span class="special">)</span> <span class="comment">// OK </span> <span class="special">{</span> <span class="comment">/*...*/</span> <span class="special">}</span> <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">(</span><span class="string">"hello"</span><span class="special">),</span> <span class="identifier">sre</span> <span class="special">)</span> <span class="special">)</span> <span class="comment">// OK </span> <span class="special">{</span> <span class="comment">/*...*/</span> <span class="special">}</span> <span class="keyword">if</span><span class="special">(</span> <span 
class="identifier">regex_match</span><span class="special">(</span> <span class="string">"hello"</span><span class="special">,</span> <span class="identifier">sre</span> <span class="special">)</span> <span class="special">)</span> <span class="comment">// ERROR! iterator mis-match! </span> <span class="special">{</span> <span class="comment">/*...*/</span> <span class="special">}</span> </pre> <p> The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> algorithm optionally accepts a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> struct as an out parameter. If given, the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> algorithm fills in the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> struct with information about which parts of the regex matched which parts of the input. 
</p> <pre class="programlisting"><span class="identifier">cmatch</span> <span class="identifier">what</span><span class="special">;</span> <span class="identifier">cregex</span> <span class="identifier">cre</span> <span class="special">=</span> <span class="special">+(</span><span class="identifier">s1</span><span class="special">=</span> <span class="identifier">_w</span><span class="special">);</span> <span class="comment">// store the results of the regex_match in "what" </span><span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="string">"hello"</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">cre</span> <span class="special">)</span> <span class="special">)</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">1</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// prints "o" </span><span class="special">}</span> </pre> <p> The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> algorithm also optionally accepts a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_constants/match_flag_type.html" title="Type match_flag_type">match_flag_type</a></code></code> bitmask. With <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_constants/match_flag_type.html" title="Type match_flag_type">match_flag_type</a></code></code>, you can control certain aspects of how the match is evaluated. 
See the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_constants/match_flag_type.html" title="Type match_flag_type">match_flag_type</a></code></code> reference for a complete list of the flags and their meanings. </p> <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"hello"</span><span class="special">);</span> <span class="identifier">sregex</span> <span class="identifier">sre</span> <span class="special">=</span> <span class="identifier">bol</span> <span class="special">>></span> <span class="special">+</span><span class="identifier">_w</span><span class="special">;</span> <span class="comment">// match_not_bol means that "bol" should not match at [begin,begin) </span><span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">sre</span><span class="special">,</span> <span class="identifier">regex_constants</span><span class="special">::</span><span class="identifier">match_not_bol</span> <span class="special">)</span> <span class="special">)</span> <span class="special">{</span> <span class="comment">// should never get here!!! 
</span><span class="special">}</span> </pre> <p> Click <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.see_if_a_whole_string_matches_a_regex">here</a> to see a complete example program that shows how to use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code>. And check the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> reference to see a complete list of the available overloads. </p> <a name="boost_xpressive.user_s_guide.matching_and_searching.searching_for_matching_sub_strings"></a><h3> <a name="id3107093"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.matching_and_searching.searching_for_matching_sub_strings">Searching for Matching Sub-Strings</a> </h3> <p> Use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> when you want to know if an input sequence contains a sub-sequence that a regex matches. <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> will try to match the regex at the beginning of the input sequence and scan forward in the sequence until it either finds a match or exhausts the sequence. 
</p> <p> In all other regards, <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> behaves like <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> <span class="emphasis"><em>(see above)</em></span>. In particular, it can operate on a bidirectional range such as <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code>, C-style null-terminated strings or iterator ranges. The same care must be taken to ensure that the iterator type of your regex matches the iterator type of your input sequence. As with <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code>, you can optionally provide a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> struct to receive the results of the search, and a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_constants/match_flag_type.html" title="Type match_flag_type">match_flag_type</a></code></code> bitmask to control how the match is evaluated. </p> <p> Click <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.see_if_a_string_contains_a_sub_string_that_matches_a_regex">here</a> to see a complete example program that shows how to use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code>. 
And check the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> reference to see a complete list of the available overloads. </p> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.accessing_results"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.accessing_results" title="Accessing Results">Accessing Results</a> </h3></div></div></div> <a name="boost_xpressive.user_s_guide.accessing_results.overview"></a><h3> <a name="id3107314"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.accessing_results.overview">Overview</a> </h3> <p> Sometimes, it is not enough to know simply whether a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> or <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> was successful or not. 
If you pass an object of type <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> to <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> or <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code>, then after the algorithm has completed successfully the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> will contain extra information about which parts of the regex matched which parts of the sequence. In Perl, these sub-sequences are called <span class="emphasis"><em>back-references</em></span>, and they are stored in the variables <code class="literal">$1</code>, <code class="literal">$2</code>, etc. In xpressive, they are objects of type <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code>, and they are stored in the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> structure, which acts as a vector of <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> objects. 
</p> <a name="boost_xpressive.user_s_guide.accessing_results.match_results"></a><h3> <a name="id3107488"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.accessing_results.match_results">match_results</a> </h3> <p> So, you've passed a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object to a regex algorithm, and the algorithm has succeeded. Now you want to examine the results. Most of what you'll be doing with the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object is indexing into it to access its internally stored <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> objects, but there are a few other things you can do with a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object besides. </p> <p> The table below shows how to access the information stored in a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object named <code class="computeroutput"><span class="identifier">what</span></code>. </p> <div class="table"> <a name="id3107596"></a><p class="title"><b>Table 29.5. 
match_results<> Accessors</b></p> <div class="table-contents"><table class="table" summary="match_results<> Accessors"> <colgroup> <col> <col> </colgroup> <thead><tr> <th> <p> Accessor </p> </th> <th> <p> Effects </p> </th> </tr></thead> <tbody> <tr> <td> <p> <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">size</span><span class="special">()</span></code> </p> </td> <td> <p> Returns the number of sub-matches, which is always greater than zero after a successful match because the full match is stored in the zero-th sub-match. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">what</span><span class="special">[</span><span class="identifier">n</span><span class="special">]</span></code> </p> </td> <td> <p> Returns the <span class="emphasis"><em>n</em></span>-th sub-match. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">length</span><span class="special">(</span><span class="identifier">n</span><span class="special">)</span></code> </p> </td> <td> <p> Returns the length of the <span class="emphasis"><em>n</em></span>-th sub-match. Same as <code class="computeroutput"><span class="identifier">what</span><span class="special">[</span><span class="identifier">n</span><span class="special">].</span><span class="identifier">length</span><span class="special">()</span></code>. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">position</span><span class="special">(</span><span class="identifier">n</span><span class="special">)</span></code> </p> </td> <td> <p> Returns the offset into the input sequence at which the <span class="emphasis"><em>n</em></span>-th sub-match begins. 
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">str</span><span class="special">(</span><span class="identifier">n</span><span class="special">)</span></code> </p> </td> <td> <p> Returns a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><></span></code> constructed from the <span class="emphasis"><em>n</em></span>-th sub-match. Same as <code class="computeroutput"><span class="identifier">what</span><span class="special">[</span><span class="identifier">n</span><span class="special">].</span><span class="identifier">str</span><span class="special">()</span></code>. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">prefix</span><span class="special">()</span></code> </p> </td> <td> <p> Returns a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> object which represents the sub-sequence from the beginning of the input sequence to the start of the full match. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">suffix</span><span class="special">()</span></code> </p> </td> <td> <p> Returns a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> object which represents the sub-sequence from the end of the full match to the end of the input sequence. 
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">regex_id</span><span class="special">()</span></code> </p> </td> <td> <p> Returns the <code class="computeroutput"><span class="identifier">regex_id</span></code> of the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> object that was last used with this <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object. </p> </td> </tr> </tbody> </table></div> </div> <br class="table-break"><p> There is more you can do with the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object, but that will be covered when we talk about <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches" title="Grammars and Nested Matches">Grammars and Nested Matches</a>. </p> <a name="boost_xpressive.user_s_guide.accessing_results.sub_match"></a><h3> <a name="id3108179"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.accessing_results.sub_match">sub_match</a> </h3> <p> When you index into a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object, you get back a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> object. 
A <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> is basically a pair of iterators. It is defined like this: </p> <pre class="programlisting"><span class="keyword">template</span><span class="special"><</span> <span class="keyword">class</span> <span class="identifier">BidirectionalIterator</span> <span class="special">></span>
<span class="keyword">struct</span> <span class="identifier">sub_match</span>
    <span class="special">:</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">pair</span><span class="special"><</span> <span class="identifier">BidirectionalIterator</span><span class="special">,</span> <span class="identifier">BidirectionalIterator</span> <span class="special">></span>
<span class="special">{</span>
    <span class="keyword">bool</span> <span class="identifier">matched</span><span class="special">;</span>
    <span class="comment">// ...
</span><span class="special">};</span>
</pre> <p> Since it inherits publicly from <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">pair</span><span class="special"><></span></code>, <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> has <code class="computeroutput"><span class="identifier">first</span></code> and <code class="computeroutput"><span class="identifier">second</span></code> data members of type <code class="computeroutput"><span class="identifier">BidirectionalIterator</span></code>. These are the beginning and end of the sub-sequence this <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> represents.
<code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> also has a Boolean <code class="computeroutput"><span class="identifier">matched</span></code> data member, which is true if this <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> participated in the full match. </p> <p> The following table shows how you might access the information stored in a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> object called <code class="computeroutput"><span class="identifier">sub</span></code>. </p> <div class="table"> <a name="id3108510"></a><p class="title"><b>Table 29.6. sub_match<> Accessors</b></p> <div class="table-contents"><table class="table" summary="sub_match<> Accessors"> <colgroup> <col> <col> </colgroup> <thead><tr> <th> <p> Accessor </p> </th> <th> <p> Effects </p> </th> </tr></thead> <tbody> <tr> <td> <p> <code class="computeroutput"><span class="identifier">sub</span><span class="special">.</span><span class="identifier">length</span><span class="special">()</span></code> </p> </td> <td> <p> Returns the length of the sub-match. Same as <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">distance</span><span class="special">(</span><span class="identifier">sub</span><span class="special">.</span><span class="identifier">first</span><span class="special">,</span><span class="identifier">sub</span><span class="special">.</span><span class="identifier">second</span><span class="special">)</span></code>. 
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">sub</span><span class="special">.</span><span class="identifier">str</span><span class="special">()</span></code> </p> </td> <td> <p> Returns a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><></span></code> constructed from the sub-match. Same as <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><</span><span class="identifier">char_type</span><span class="special">>(</span><span class="identifier">sub</span><span class="special">.</span><span class="identifier">first</span><span class="special">,</span><span class="identifier">sub</span><span class="special">.</span><span class="identifier">second</span><span class="special">)</span></code>. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">sub</span><span class="special">.</span><span class="identifier">compare</span><span class="special">(</span><span class="identifier">str</span><span class="special">)</span></code> </p> </td> <td> <p> Performs a string comparison between the sub-match and <code class="computeroutput"><span class="identifier">str</span></code>, where <code class="computeroutput"><span class="identifier">str</span></code> can be a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><></span></code>, C-style null-terminated string, or another sub-match. 
Same as <code class="computeroutput"><span class="identifier">sub</span><span class="special">.</span><span class="identifier">str</span><span class="special">().</span><span class="identifier">compare</span><span class="special">(</span><span class="identifier">str</span><span class="special">)</span></code>. </p> </td> </tr> </tbody> </table></div> </div> <br class="table-break"><a name="boost_xpressive.user_s_guide.accessing_results._inlinemediaobject__imageobject__imagedata_fileref__images_caution_png____imagedata___imageobject__textobject__phrase_caution__phrase___textobject___inlinemediaobject__results_invalidation__inlinemediaobject__imageobject__imagedata_fileref__images_caution_png____imagedata___imageobject__textobject__phrase_caution__phrase___textobject___inlinemediaobject_"></a><h3> <a name="id3108909"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.accessing_results._inlinemediaobject__imageobject__imagedata_fileref__images_caution_png____imagedata___imageobject__textobject__phrase_caution__phrase___textobject___inlinemediaobject__results_invalidation__inlinemediaobject__imageobject__imagedata_fileref__images_caution_png____imagedata___imageobject__textobject__phrase_caution__phrase___textobject___inlinemediaobject_"><span class="inlinemediaobject"><img src="../images/caution.png" alt="caution"></span> Results Invalidation <span class="inlinemediaobject"><img src="../images/caution.png" alt="caution"></span></a> </h3> <p> Results are stored as iterators into the input sequence. Anything which invalidates the input sequence will invalidate the match results. 
For instance, if you match a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code> object, the results are only valid until your next call to a non-const member function of that <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code> object. After that, the results held by the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object are invalid. Don't use them! </p> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.string_substitutions"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions" title="String Substitutions">String Substitutions</a> </h3></div></div></div> <p> Regular expressions are not only good for searching text; they're good at <span class="emphasis"><em>manipulating</em></span> it. And one of the most common text manipulation tasks is search-and-replace. xpressive provides the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> algorithm for searching and replacing. </p> <a name="boost_xpressive.user_s_guide.string_substitutions.regex_replace__"></a><h3> <a name="id3109082"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.regex_replace__">regex_replace()</a> </h3> <p> Performing search-and-replace using <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> is simple. 
All you need is an input sequence, a regex object, and a format string or a formatter object. There are several versions of the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> algorithm. Some accept the input sequence as a bidirectional container such as <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code> and return the result in a new container of the same type. Others accept the input as a null-terminated string and return a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code>. Still others accept the input sequence as a pair of iterators and write the result into an output iterator. The substitution may be specified as a string with format sequences or as a formatter object. Below are some simple examples of using string-based substitutions. </p> <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"This is his face"</span><span class="special">);</span>
<span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">as_xpr</span><span class="special">(</span><span class="string">"his"</span><span class="special">);</span> <span class="comment">// find all occurrences of "his" ...
</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">format</span><span class="special">(</span><span class="string">"her"</span><span class="special">);</span> <span class="comment">// ... and replace them with "her"
</span>
<span class="comment">// use the version of regex_replace() that operates on strings
</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">output</span> <span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span> <span class="identifier">input</span><span class="special">,</span> <span class="identifier">re</span><span class="special">,</span> <span class="identifier">format</span> <span class="special">);</span>
<span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">output</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span>

<span class="comment">// use the version of regex_replace() that operates on iterators
</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">ostream_iterator</span><span class="special"><</span> <span class="keyword">char</span> <span class="special">></span> <span class="identifier">out_iter</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">);</span>
<span class="identifier">regex_replace</span><span class="special">(</span> <span class="identifier">out_iter</span><span class="special">,</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">re</span><span class="special">,</span> <span class="identifier">format</span> <span class="special">);</span>
</pre> <p> The above program prints out the following: </p> <pre
class="programlisting">Ther is her face
Ther is her face
</pre> <p> Notice that <span class="emphasis"><em>all</em></span> the occurrences of <code class="computeroutput"><span class="string">"his"</span></code> have been replaced with <code class="computeroutput"><span class="string">"her"</span></code>. </p> <p> Click <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex">here</a> to see a complete example program that shows how to use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code>. And check the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> reference to see a complete list of the available overloads. </p> <a name="boost_xpressive.user_s_guide.string_substitutions.replace_options"></a><h3> <a name="id3109633"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.replace_options">Replace Options</a> </h3> <p> The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> algorithm takes an optional bitmask parameter to control the formatting. The possible values of the bitmask are: </p> <div class="table"> <a name="id3109667"></a><p class="title"><b>Table 29.7. Format Flags</b></p> <div class="table-contents"><table class="table" summary="Format Flags"> <colgroup> <col> <col> </colgroup> <thead><tr> <th> <p> Flag </p> </th> <th> <p> Meaning </p> </th> </tr></thead> <tbody> <tr> <td> <p> <code class="computeroutput"><span class="identifier">format_default</span></code> </p> </td> <td> <p> Recognize the ECMA-262 format sequences (see below).
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">format_first_only</span></code> </p> </td> <td> <p> Only replace the first match, not all of them. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">format_no_copy</span></code> </p> </td> <td> <p> Don't copy the parts of the input sequence that didn't match the regex to the output sequence. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">format_literal</span></code> </p> </td> <td> <p> Treat the format string as a literal; that is, don't recognize any escape sequences. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">format_perl</span></code> </p> </td> <td> <p> Recognize the Perl format sequences (see below). </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">format_sed</span></code> </p> </td> <td> <p> Recognize the sed format sequences (see below). </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">format_all</span></code> </p> </td> <td> <p> In addition to the Perl format sequences, recognize some Boost-specific format sequences. </p> </td> </tr> </tbody> </table></div> </div> <br class="table-break"><p> These flags live in the <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">regex_constants</span></code> namespace. If the substitution parameter is a function object instead of a string, the flags <code class="computeroutput"><span class="identifier">format_literal</span></code>, <code class="computeroutput"><span class="identifier">format_perl</span></code>, <code class="computeroutput"><span class="identifier">format_sed</span></code>, and <code class="computeroutput"><span class="identifier">format_all</span></code> are ignored. 
</p> <a name="boost_xpressive.user_s_guide.string_substitutions.the_ecma_262_format_sequences"></a><h3> <a name="id3109961"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.the_ecma_262_format_sequences">The ECMA-262 Format Sequences</a> </h3> <p> When you haven't specified a substitution string dialect with one of the format flags above, you get the dialect defined by ECMA-262, the standard for ECMAScript. The table below shows the escape sequences recognized in ECMA-262 mode. </p> <div class="table"> <a name="id3109984"></a><p class="title"><b>Table 29.8. Format Escape Sequences</b></p> <div class="table-contents"><table class="table" summary="Format Escape Sequences"> <colgroup> <col> <col> </colgroup> <thead><tr> <th> <p> Escape Sequence </p> </th> <th> <p> Meaning </p> </th> </tr></thead> <tbody> <tr> <td> <p> <code class="literal">$1</code>, <code class="literal">$2</code>, etc. </p> </td> <td> <p> the corresponding sub-match </p> </td> </tr> <tr> <td> <p> <code class="literal">$&</code> </p> </td> <td> <p> the full match </p> </td> </tr> <tr> <td> <p> <code class="literal">$`</code> </p> </td> <td> <p> the match prefix </p> </td> </tr> <tr> <td> <p> <code class="literal">$'</code> </p> </td> <td> <p> the match suffix </p> </td> </tr> <tr> <td> <p> <code class="literal">$$</code> </p> </td> <td> <p> a literal <code class="computeroutput"><span class="char">'$'</span></code> character </p> </td> </tr> </tbody> </table></div> </div> <br class="table-break"><p> Any other sequence beginning with <code class="computeroutput"><span class="char">'$'</span></code> simply represents itself. For example, if the format string were <code class="computeroutput"><span class="string">"$a"</span></code> then <code class="computeroutput"><span class="string">"$a"</span></code> would be inserted into the output sequence. 
</p> <a name="boost_xpressive.user_s_guide.string_substitutions.the_sed_format_sequences"></a><h3> <a name="id3110191"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.the_sed_format_sequences">The Sed Format Sequences</a> </h3> <p> When specifying the <code class="computeroutput"><span class="identifier">format_sed</span></code> flag to <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code>, the following escape sequences are recognized: </p> <div class="table"> <a name="id3110234"></a><p class="title"><b>Table 29.9. Sed Format Escape Sequences</b></p> <div class="table-contents"><table class="table" summary="Sed Format Escape Sequences"> <colgroup> <col> <col> </colgroup> <thead><tr> <th> <p> Escape Sequence </p> </th> <th> <p> Meaning </p> </th> </tr></thead> <tbody> <tr> <td> <p> <code class="literal">\1</code>, <code class="literal">\2</code>, etc. 
</p> </td> <td> <p> The corresponding sub-match </p> </td> </tr> <tr> <td> <p> <code class="literal">&</code> </p> </td> <td> <p> the full match </p> </td> </tr> <tr> <td> <p> <code class="literal">\a</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="char">'\a'</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\e</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="identifier">char_type</span><span class="special">(</span><span class="number">27</span><span class="special">)</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\f</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="char">'\f'</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\n</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="char">'\n'</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\r</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="char">'\r'</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\t</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="char">'\t'</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\v</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="char">'\v'</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\xFF</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="identifier">char_type</span><span class="special">(</span><span class="number">0xFF</span><span class="special">)</span></code>, where <code class="literal"><span class="emphasis"><em>F</em></span></code> is any hex digit </p> </td> </tr> <tr> <td> <p> <code class="literal">\x{FFFF}</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="identifier">char_type</span><span class="special">(</span><span class="number">0xFFFF</span><span class="special">)</span></code>, 
where <code class="literal"><span class="emphasis"><em>F</em></span></code> is any hex digit </p> </td> </tr> <tr> <td> <p> <code class="literal">\cX</code> </p> </td> <td> <p> The control character <code class="literal"><span class="emphasis"><em>X</em></span></code> </p> </td> </tr> </tbody> </table></div> </div> <br class="table-break"><a name="boost_xpressive.user_s_guide.string_substitutions.the_perl_format_sequences"></a><h3> <a name="id3110966"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.the_perl_format_sequences">The Perl Format Sequences</a> </h3> <p> When specifying the <code class="computeroutput"><span class="identifier">format_perl</span></code> flag to <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code>, the following escape sequences are recognized: </p> <div class="table"> <a name="id3111008"></a><p class="title"><b>Table 29.10. Perl Format Escape Sequences</b></p> <div class="table-contents"><table class="table" summary="Perl Format Escape Sequences"> <colgroup> <col> <col> </colgroup> <thead><tr> <th> <p> Escape Sequence </p> </th> <th> <p> Meaning </p> </th> </tr></thead> <tbody> <tr> <td> <p> <code class="literal">$1</code>, <code class="literal">$2</code>, etc. 
</p> </td> <td> <p> The corresponding sub-match </p> </td> </tr> <tr> <td> <p> <code class="literal">$&</code> </p> </td> <td> <p> The full match </p> </td> </tr> <tr> <td> <p> <code class="literal">$`</code> </p> </td> <td> <p> The match prefix </p> </td> </tr> <tr> <td> <p> <code class="literal">$'</code> </p> </td> <td> <p> The match suffix </p> </td> </tr> <tr> <td> <p> <code class="literal">$$</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="char">'$'</span></code> character </p> </td> </tr> <tr> <td> <p> <code class="literal">\a</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="char">'\a'</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\e</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="identifier">char_type</span><span class="special">(</span><span class="number">27</span><span class="special">)</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\f</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="char">'\f'</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\n</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="char">'\n'</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\r</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="char">'\r'</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\t</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="char">'\t'</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\v</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="char">'\v'</span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\xFF</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="identifier">char_type</span><span class="special">(</span><span class="number">0xFF</span><span
class="special">)</span></code>, where <code class="literal"><span class="emphasis"><em>F</em></span></code> is any hex digit </p> </td> </tr> <tr> <td> <p> <code class="literal">\x{FFFF}</code> </p> </td> <td> <p> A literal <code class="computeroutput"><span class="identifier">char_type</span><span class="special">(</span><span class="number">0xFFFF</span><span class="special">)</span></code>, where <code class="literal"><span class="emphasis"><em>F</em></span></code> is any hex digit </p> </td> </tr> <tr> <td> <p> <code class="literal">\cX</code> </p> </td> <td> <p> The control character <code class="literal"><span class="emphasis"><em>X</em></span></code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\l</code> </p> </td> <td> <p> Make the next character lowercase </p> </td> </tr> <tr> <td> <p> <code class="literal">\L</code> </p> </td> <td> <p> Make the rest of the substitution lowercase until the next <code class="literal">\E</code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\u</code> </p> </td> <td> <p> Make the next character uppercase </p> </td> </tr> <tr> <td> <p> <code class="literal">\U</code> </p> </td> <td> <p> Make the rest of the substitution uppercase until the next <code class="literal">\E</code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\E</code> </p> </td> <td> <p> Terminate <code class="literal">\L</code> or <code class="literal">\U</code> </p> </td> </tr> <tr> <td> <p> <code class="literal">\1</code>, <code class="literal">\2</code>, etc. 
</p> </td> <td> <p> The corresponding sub-match </p> </td> </tr> <tr> <td> <p> <code class="literal">\g&lt;name&gt;</code> </p> </td> <td> <p> The named backref <span class="emphasis"><em>name</em></span> </p> </td> </tr> </tbody> </table></div> </div> <br class="table-break"><a name="boost_xpressive.user_s_guide.string_substitutions.the_boost_specific_format_sequences"></a><h3> <a name="id3111733"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.the_boost_specific_format_sequences">The Boost-Specific Format Sequences</a> </h3> <p> When specifying the <code class="computeroutput"><span class="identifier">format_all</span></code> flag to <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code>, the escape sequences recognized are the same as those above for <code class="computeroutput"><span class="identifier">format_perl</span></code>. In addition, conditional expressions of the following form are recognized: </p> <pre class="programlisting">?Ntrue-expression:false-expression </pre> <p> where <span class="emphasis"><em>N</em></span> is a decimal digit representing a sub-match. If the corresponding sub-match participated in the full match, then the substitution is <span class="emphasis"><em>true-expression</em></span>. Otherwise, it is <span class="emphasis"><em>false-expression</em></span>. In this mode, you can use parens <code class="literal">()</code> for grouping. If you want a literal paren, you must escape it as <code class="literal">\(</code>. </p> <a name="boost_xpressive.user_s_guide.string_substitutions.formatter_objects"></a><h3> <a name="id3111832"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.formatter_objects">Formatter Objects</a> </h3> <p> Format strings are not always expressive enough for all your text substitution needs.
Consider the simple case of mapping input strings to output strings, as you might when expanding environment variables. For this, rather than a format <span class="emphasis"><em>string</em></span>, you would use a formatter <span class="emphasis"><em>object</em></span>. The following code finds embedded environment variables of the form <code class="computeroutput"><span class="string">"$(XYZ)"</span></code> and computes each substitution string by looking up the variable's name in a map. </p> <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">map</span><span class="special">></span>
<span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">string</span><span class="special">></span>
<span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span>
<span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span>
<span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">;</span>
<span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">xpressive</span><span class="special">;</span>

<span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">></span> <span class="identifier">env</span><span class="special">;</span>

<span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">format_fun</span><span class="special">(</span><span class="identifier">smatch</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">what</span><span class="special">)</span>
<span class="special">{</span>
    <span class="keyword">return</span> <span class="identifier">env</span><span class="special">[</span><span class="identifier">what</span><span class="special">[</span><span class="number">1</span><span class="special">].</span><span class="identifier">str</span><span class="special">()];</span>
<span class="special">}</span>

<span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span>
<span class="special">{</span>
    <span class="identifier">env</span><span class="special">[</span><span class="string">"X"</span><span class="special">]</span> <span class="special">=</span> <span class="string">"this"</span><span class="special">;</span>
    <span class="identifier">env</span><span class="special">[</span><span class="string">"Y"</span><span class="special">]</span> <span class="special">=</span> <span class="string">"that"</span><span class="special">;</span>

    <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"\"$(X)\" has the value \"$(Y)\""</span><span class="special">);</span>

    <span class="comment">// replace strings like "$(XYZ)" with the result of env["XYZ"]
</span>    <span class="identifier">sregex</span> <span class="identifier">envar</span> <span class="special">=</span> <span class="string">"$("</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s1</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="char">')'</span><span class="special">;</span>
    <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">output</span> <span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span><span class="identifier">input</span><span class="special">,</span> <span class="identifier">envar</span><span class="special">,</span> <span class="identifier">format_fun</span><span class="special">);</span>
    <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">output</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span>
<span class="special">}</span>
</pre> <p> In this case, we use a function, <code class="computeroutput"><span class="identifier">format_fun</span><span class="special">()</span></code>, to compute the substitution string on the fly. It accepts a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object that contains the results of the current match, and it uses the first submatch as a key into the global <code class="computeroutput"><span class="identifier">env</span></code> map. The above code displays: </p> <pre class="programlisting">"this" has the value "that"
</pre> <p> The formatter need not be an ordinary function. It may be an object of class type.
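</p> <p> As a sketch of the one-argument signature, <code class="computeroutput"><span class="identifier">fmt</span><span class="special">(</span><span class="identifier">what</span><span class="special">)</span></code>, listed in Table 29.11, a function object can simply return the substitution as a string. This example is illustrative rather than taken from the library's own examples: the struct name <code class="computeroutput"><span class="identifier">env_lookup</span></code> is hypothetical, and, unlike <code class="computeroutput"><span class="identifier">format_fun</span><span class="special">()</span></code>, it returns an empty string for unknown variables instead of inserting them into the map. </p> <pre class="programlisting">#include &lt;iostream&gt;
#include &lt;map&gt;
#include &lt;string&gt;
#include &lt;boost/xpressive/xpressive.hpp&gt;

using namespace boost::xpressive;

// Hypothetical function object using the fmt(what) signature:
// it returns the replacement text for each match as a std::string.
struct env_lookup
{
    std::map&lt;std::string, std::string&gt; env;

    // declared const, like the formatter struct in this section
    std::string operator()(smatch const &amp;what) const
    {
        // what[1] holds the variable name captured by s1
        std::map&lt;std::string, std::string&gt;::const_iterator where = env.find(what[1].str());
        // unknown variables are replaced by an empty string
        return where != env.end() ? where-&gt;second : std::string();
    }
};

int main()
{
    env_lookup fmt;
    fmt.env["X"] = "this";
    fmt.env["Y"] = "that";

    std::string input("\"$(X)\" has the value \"$(Y)\"");
    sregex envar = "$(" &gt;&gt; (s1 = +_w) &gt;&gt; ')';

    std::string output = regex_replace(input, envar, fmt);
    std::cout &lt;&lt; output &lt;&lt; std::endl;   // prints: "this" has the value "that"
}
</pre> <p> Because it performs the same lookup for known variables, this sketch prints the same line as the <code class="computeroutput"><span class="identifier">format_fun</span><span class="special">()</span></code> version. </p> <p>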
Also, rather than returning a string, it may accept an output iterator into which it writes the substitution. Consider the following, which is functionally equivalent to the above. </p> <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">map</span><span class="special">></span>
<span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">string</span><span class="special">></span>
<span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span>
<span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">algorithm</span><span class="special">></span> <span class="comment">// for std::copy
</span><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span>
<span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">;</span>
<span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">xpressive</span><span class="special">;</span>

<span class="keyword">struct</span> <span class="identifier">formatter</span>
<span class="special">{</span>
    <span class="keyword">typedef</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">></span> <span class="identifier">env_map</span><span class="special">;</span>
    <span class="identifier">env_map</span> <span class="identifier">env</span><span class="special">;</span>

    <span class="keyword">template</span><span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Out</span><span class="special">></span>
    <span class="identifier">Out</span> <span class="keyword">operator</span><span class="special">()(</span><span class="identifier">smatch</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">what</span><span class="special">,</span> <span class="identifier">Out</span> <span class="identifier">out</span><span class="special">)</span> <span class="keyword">const</span>
    <span class="special">{</span>
        <span class="identifier">env_map</span><span class="special">::</span><span class="identifier">const_iterator</span> <span class="identifier">where</span> <span class="special">=</span> <span class="identifier">env</span><span class="special">.</span><span class="identifier">find</span><span class="special">(</span><span class="identifier">what</span><span class="special">[</span><span class="number">1</span><span class="special">]);</span>
        <span class="keyword">if</span><span class="special">(</span><span class="identifier">where</span> <span class="special">!=</span> <span class="identifier">env</span><span class="special">.</span><span class="identifier">end</span><span class="special">())</span>
        <span class="special">{</span>
            <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">sub</span> <span class="special">=</span> <span class="identifier">where</span><span class="special">-></span><span class="identifier">second</span><span class="special">;</span>
            <span class="identifier">out</span> <span class="special">=</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">copy</span><span class="special">(</span><span class="identifier">sub</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">sub</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">out</span><span class="special">);</span>
        <span class="special">}</span>
        <span class="keyword">return</span> <span class="identifier">out</span><span class="special">;</span>
    <span class="special">}</span>
<span class="special">};</span>

<span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span>
<span class="special">{</span>
    <span class="identifier">formatter</span> <span class="identifier">fmt</span><span class="special">;</span>
    <span class="identifier">fmt</span><span class="special">.</span><span class="identifier">env</span><span class="special">[</span><span class="string">"X"</span><span class="special">]</span> <span class="special">=</span> <span class="string">"this"</span><span class="special">;</span>
    <span class="identifier">fmt</span><span class="special">.</span><span class="identifier">env</span><span class="special">[</span><span class="string">"Y"</span><span class="special">]</span> <span class="special">=</span> <span class="string">"that"</span><span class="special">;</span>

    <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"\"$(X)\" has the value \"$(Y)\""</span><span class="special">);</span>

    <span class="identifier">sregex</span> <span class="identifier">envar</span> <span class="special">=</span> <span class="string">"$("</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s1</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="char">')'</span><span class="special">;</span>
    <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">output</span> <span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span><span class="identifier">input</span><span class="special">,</span> <span class="identifier">envar</span><span class="special">,</span> <span class="identifier">fmt</span><span class="special">);</span>
    <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">output</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span>
<span class="special">}</span>
</pre> <p> The formatter must be a callable object (a function or a function object) with one of the three signatures listed in the table below. In the table, <code class="computeroutput"><span class="identifier">fmt</span></code> is a function pointer or function object, <code class="computeroutput"><span class="identifier">what</span></code> is a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object, <code class="computeroutput"><span class="identifier">out</span></code> is an OutputIterator, and <code class="computeroutput"><span class="identifier">flags</span></code> is a value of <code class="computeroutput"><span class="identifier">regex_constants</span><span class="special">::</span><span class="identifier">match_flag_type</span></code>: </p> <div class="table"> <a name="id3113618"></a><p class="title"><b>Table 29.11.
Formatter Signatures</b></p> <div class="table-contents"><table class="table" summary="Formatter Signatures"> <colgroup> <col> <col> <col> </colgroup> <thead><tr> <th> <p> Formatter Invocation </p> </th> <th> <p> Return Type </p> </th> <th> <p> Semantics </p> </th> </tr></thead> <tbody> <tr> <td> <p> <code class="computeroutput"><span class="identifier">fmt</span><span class="special">(</span><span class="identifier">what</span><span class="special">)</span></code> </p> </td> <td> <p> Range of characters (e.g. <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code>) or null-terminated string </p> </td> <td> <p> The string matched by the regex is replaced with the string returned by the formatter. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">fmt</span><span class="special">(</span><span class="identifier">what</span><span class="special">,</span> <span class="identifier">out</span><span class="special">)</span></code> </p> </td> <td> <p> OutputIterator </p> </td> <td> <p> The formatter writes the replacement string into <code class="computeroutput"><span class="identifier">out</span></code> and returns <code class="computeroutput"><span class="identifier">out</span></code>. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">fmt</span><span class="special">(</span><span class="identifier">what</span><span class="special">,</span> <span class="identifier">out</span><span class="special">,</span> <span class="identifier">flags</span><span class="special">)</span></code> </p> </td> <td> <p> OutputIterator </p> </td> <td> <p> The formatter writes the replacement string into <code class="computeroutput"><span class="identifier">out</span></code> and returns <code class="computeroutput"><span class="identifier">out</span></code>. 
The <code class="computeroutput"><span class="identifier">flags</span></code> parameter is the value of the match flags passed to the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> algorithm. </p> </td> </tr> </tbody> </table></div> </div> <br class="table-break"><a name="boost_xpressive.user_s_guide.string_substitutions.formatter_expressions"></a><h3> <a name="id3113921"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.formatter_expressions">Formatter Expressions</a> </h3> <p> In addition to format <span class="emphasis"><em>strings</em></span> and formatter <span class="emphasis"><em>objects</em></span>, <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> also accepts formatter <span class="emphasis"><em>expressions</em></span>. A formatter expression is a lambda expression that generates a string. It uses the same syntax as that for <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions" title="Semantic Actions and User-Defined Assertions">Semantic Actions</a>, which are covered later. The above example, which uses <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> to substitute strings for environment variables, is repeated here using a formatter expression. 
</p> <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">map</span><span class="special">></span>
<span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">string</span><span class="special">></span>
<span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span>
<span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span>
<span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">regex_actions</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span>
<span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span>

<span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span>
<span class="special">{</span>
    <span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">></span> <span class="identifier">env</span><span class="special">;</span>
    <span class="identifier">env</span><span class="special">[</span><span class="string">"X"</span><span class="special">]</span> <span class="special">=</span> <span class="string">"this"</span><span class="special">;</span>
    <span class="identifier">env</span><span class="special">[</span><span class="string">"Y"</span><span class="special">]</span> <span class="special">=</span> <span class="string">"that"</span><span class="special">;</span>

    <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"\"$(X)\" has the value \"$(Y)\""</span><span class="special">);</span>

    <span class="identifier">sregex</span> <span class="identifier">envar</span> <span class="special">=</span> <span class="string">"$("</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s1</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="char">')'</span><span class="special">;</span>
    <span class="comment">// ref() is qualified: in C++11 and later, an unqualified ref(env) may be ambiguous with std::ref
</span>    <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">output</span> <span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span><span class="identifier">input</span><span class="special">,</span> <span class="identifier">envar</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">env</span><span class="special">)[</span><span class="identifier">s1</span><span class="special">]);</span>
    <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">output</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span>
<span class="special">}</span>
</pre> <p> In the above, the formatter expression is <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">env</span><span class="special">)[</span><span class="identifier">s1</span><span class="special">]</span></code>. It uses the value of the first submatch, <code class="computeroutput"><span class="identifier">s1</span></code>, as a key into the <code class="computeroutput"><span class="identifier">env</span></code> map. Here, <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">ref</span><span class="special">()</span></code> makes the reference to the local variable <code class="computeroutput"><span class="identifier">env</span></code> <span class="emphasis"><em>lazy</em></span>, so that the indexing operation is deferred until a match has been found and the value of <code class="computeroutput"><span class="identifier">s1</span></code> is known. The call is fully qualified because, in C++11 and later, argument-dependent lookup on <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span></code> can also find <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">ref</span></code>, which may make an unqualified call ambiguous. </p> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.string_splitting_and_tokenization"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization" title="String Splitting and Tokenization">String Splitting and Tokenization</a> </h3></div></div></div> <p> <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> is the Ginsu knife of the text manipulation world. It slices! It dices!
This section describes how to use the highly configurable <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> to chop up input sequences. </p> <a name="boost_xpressive.user_s_guide.string_splitting_and_tokenization.overview"></a><h3> <a name="id3114735"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization.overview">Overview</a> </h3> <p> You initialize a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> with an input sequence, a regex, and some optional configuration parameters. The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> uses <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> to find the first place in the sequence that the regex matches. When dereferenced, the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> returns a <span class="emphasis"><em>token</em></span> in the form of a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><></span></code>. Which string it returns depends on the configuration parameters.
By default it returns a string corresponding to the full match, but it could also return a string corresponding to a particular marked sub-expression, or even the part of the sequence that <span class="emphasis"><em>didn't</em></span> match. When you increment the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code>, it will move to the next token. Which token is next depends on the configuration parameters. It could simply be a different marked sub-expression in the current match, or it could be part or all of the next match. Or it could be the part that <span class="emphasis"><em>didn't</em></span> match. </p> <p> As you can see, <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> can do a lot. That makes it hard to describe, but some examples should make it clear. </p> <a name="boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_1__simple_tokenization"></a><h3> <a name="id3114903"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_1__simple_tokenization">Example 1: Simple Tokenization</a> </h3> <p> This example uses <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> to chop a sequence into a series of tokens consisting of words. 
</p> <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"This is his face"</span><span class="special">);</span>
<span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">;</span> <span class="comment">// find a word
</span>
<span class="comment">// iterate over all the words in the input
</span><span class="identifier">sregex_token_iterator</span> <span class="identifier">begin</span><span class="special">(</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">re</span> <span class="special">),</span> <span class="identifier">end</span><span class="special">;</span>

<span class="comment">// write all the words to std::cout
</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">ostream_iterator</span><span class="special"><</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="special">></span> <span class="identifier">out_iter</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span><span class="special">,</span> <span class="string">"\n"</span> <span class="special">);</span>
<span class="identifier">std</span><span class="special">::</span><span class="identifier">copy</span><span class="special">(</span> <span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span><span class="special">,</span> <span class="identifier">out_iter</span> <span class="special">);</span>
</pre> <p> This program displays the following: </p> <pre class="programlisting">This
is
his
face
</pre> <a name="boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_2__simple_tokenization__reloaded"></a><h3> <a name="id3115241"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_2__simple_tokenization__reloaded">Example 2: Simple Tokenization, Reloaded</a> </h3> <p> This example also uses <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> to chop a sequence into a series of tokens consisting of words, but it uses the regex as a delimiter. When we pass a <code class="computeroutput"><span class="special">-</span><span class="number">1</span></code> as the last parameter to the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> constructor, it instructs the token iterator to consider as tokens those parts of the input that <span class="emphasis"><em>didn't</em></span> match the regex. </p> <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"This is his face"</span><span class="special">);</span>
<span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_s</span><span class="special">;</span> <span class="comment">// find white space
</span>
<span class="comment">// iterate over all non-white space in the input. Note the -1 below:
</span><span class="identifier">sregex_token_iterator</span> <span class="identifier">begin</span><span class="special">(</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">re</span><span class="special">,</span> <span class="special">-</span><span class="number">1</span> <span class="special">),</span> <span class="identifier">end</span><span class="special">;</span>

<span class="comment">// write all the words to std::cout
</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">ostream_iterator</span><span class="special"><</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="special">></span> <span class="identifier">out_iter</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span><span class="special">,</span> <span class="string">"\n"</span> <span class="special">);</span>
<span class="identifier">std</span><span class="special">::</span><span class="identifier">copy</span><span class="special">(</span> <span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span><span class="special">,</span> <span class="identifier">out_iter</span> <span class="special">);</span>
</pre> <p> This program displays the following: </p> <pre class="programlisting">This
is
his
face
</pre> <a name="boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_3__simple_tokenization__revolutions"></a><h3> <a name="id3115630"></a> <a class="link"
href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_3__simple_tokenization__revolutions">Example 3: Simple Tokenization, Revolutions</a> </h3> <p> This example also uses <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> to chop a sequence containing several dates into a series of tokens consisting of just the years. When we pass a positive integer <code class="literal"><span class="emphasis"><em>N</em></span></code> as the last parameter to the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> constructor, it instructs the token iterator to consider as tokens only the <code class="literal"><span class="emphasis"><em>N</em></span></code>-th marked sub-expression of each match. </p> <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"01/02/2003 blahblah 04/23/1999 blahblah 11/13/1981"</span><span class="special">);</span>
<span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(\\d{2})/(\\d{2})/(\\d{4})"</span><span class="special">);</span> <span class="comment">// find a date
</span>
<span class="comment">// iterate over all the years in the input. Note the 3 below, corresponding to the 3rd sub-expression:
</span><span class="identifier">sregex_token_iterator</span> <span class="identifier">begin</span><span class="special">(</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">re</span><span class="special">,</span> <span class="number">3</span> <span class="special">),</span> <span class="identifier">end</span><span class="special">;</span>

<span class="comment">// write all the years to std::cout
</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">ostream_iterator</span><span class="special"><</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="special">></span> <span class="identifier">out_iter</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span><span class="special">,</span> <span class="string">"\n"</span> <span class="special">);</span>
<span class="identifier">std</span><span class="special">::</span><span class="identifier">copy</span><span class="special">(</span> <span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span><span class="special">,</span> <span class="identifier">out_iter</span> <span class="special">);</span>
</pre> <p> This program displays the following: </p> <pre class="programlisting">2003
1999
1981
</pre> <a name="boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_4__not_so_simple_tokenization"></a><h3> <a name="id3116026"></a> <a class="link"
href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_4__not_so_simple_tokenization">Example 4: Not-So-Simple Tokenization</a> </h3> <p> This example is like the previous one, except that instead of tokenizing just the years, this program turns the days, months and years into tokens. When we pass an array of integers <code class="literal"><span class="emphasis"><em>{I,J,...}</em></span></code> as the last parameter to the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> constructor, it instructs the token iterator to consider as tokens the <code class="literal"><span class="emphasis"><em>I</em></span></code>-th, <code class="literal"><span class="emphasis"><em>J</em></span></code>-th, etc. marked sub-expressions of each match. </p> <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"01/02/2003 blahblah 04/23/1999 blahblah 11/13/1981"</span><span class="special">);</span>
<span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(\\d{2})/(\\d{2})/(\\d{4})"</span><span class="special">);</span> <span class="comment">// find a date
</span>
<span class="comment">// iterate over the days, months and years in the input
</span><span class="keyword">int</span> <span class="keyword">const</span> <span class="identifier">sub_matches</span><span class="special">[]</span> <span class="special">=</span> <span class="special">{</span> <span class="number">2</span><span class="special">,</span> <span class="number">1</span><span class="special">,</span> <span class="number">3</span> <span class="special">};</span> <span class="comment">// day, month, year
</span><span class="identifier">sregex_token_iterator</span> <span class="identifier">begin</span><span class="special">(</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">re</span><span class="special">,</span> <span class="identifier">sub_matches</span> <span class="special">),</span> <span class="identifier">end</span><span class="special">;</span>

<span class="comment">// write all the tokens to std::cout
</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">ostream_iterator</span><span class="special"><</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="special">></span> <span class="identifier">out_iter</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span><span class="special">,</span> <span class="string">"\n"</span> <span class="special">);</span>
<span class="identifier">std</span><span class="special">::</span><span class="identifier">copy</span><span class="special">(</span> <span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span><span class="special">,</span> <span class="identifier">out_iter</span> <span class="special">);</span>
</pre> <p> This program displays the following: </p> <pre class="programlisting">02
01
2003
23
04
1999
13
11
1981
</pre> <p> The <code class="computeroutput"><span class="identifier">sub_matches</span></code> array instructs the <code class="literal"><code
class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> to take the value of the 2nd sub-match first, then the 1st sub-match, and finally the 3rd. Incrementing the iterator once more causes it to call <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> again to find the next match. At that point the process repeats: the token iterator takes the value of the 2nd sub-match, then the 1st, and so on. </p> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.named_captures"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.named_captures" title="Named Captures">Named Captures</a> </h3></div></div></div> <a name="boost_xpressive.user_s_guide.named_captures.overview"></a><h3> <a name="id3116543"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.named_captures.overview">Overview</a> </h3> <p> For complicated regular expressions, dealing with numbered captures can be a pain. Counting left parentheses to figure out which capture to reference is no fun. Even less fun is the fact that merely editing a regular expression could cause a capture to be assigned a new number, invalidating code that refers back to it by the old number. </p> <p> Other regular expression engines solve this problem with a feature called <span class="emphasis"><em>named captures</em></span>. This feature allows you to assign a name to a capture, and to refer back to the capture by name rather than by number. Xpressive also supports named captures, both in dynamic and in static regexes.
</p> <a name="boost_xpressive.user_s_guide.named_captures.dynamic_named_captures"></a><h3> <a name="id3116588"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.named_captures.dynamic_named_captures">Dynamic Named Captures</a> </h3> <p> For dynamic regular expressions, xpressive follows the lead of other popular regex engines in its syntax for named captures. You can create a named capture with <code class="computeroutput"><span class="string">"(?P<xxx>...)"</span></code> and refer back to that capture with <code class="computeroutput"><span class="string">"(?P=xxx)"</span></code>. Here, for instance, is a regular expression that creates a named capture and refers back to it: </p> <pre class="programlisting"><span class="comment">// Create a named capture called "char" that matches a single
</span><span class="comment">// character and refer back to that capture by name.
</span><span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(?P<char>.)(?P=char)"</span><span class="special">);</span>
</pre> <p> When used in a search, the above regular expression finds the first doubled character. </p> <p> Once you have executed a match or search operation using a regex with named captures, you can access the named capture through the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object using the capture's name.
</p> <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"tweet"</span><span class="special">);</span> <span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(?P<char>.)(?P=char)"</span><span class="special">);</span> <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> <span class="keyword">if</span><span class="special">(</span><span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">))</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="string">"char = "</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="string">"char"</span><span class="special">]</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> <span class="special">}</span> </pre> <p> The above code displays: </p> <pre class="programlisting">char = e </pre> <p> You can also refer back to a named capture from within a substitution string. The syntax for that is <code class="computeroutput"><span class="string">"\\g<xxx>"</span></code>. Below is some code that demonstrates how to use named captures when doing string substitution. 
</p> <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"tweet"</span><span class="special">);</span> <span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(?P<char>.)(?P=char)"</span><span class="special">);</span> <span class="identifier">str</span> <span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">,</span> <span class="string">"**\\g<char>**"</span><span class="special">,</span> <span class="identifier">regex_constants</span><span class="special">::</span><span class="identifier">format_perl</span><span class="special">);</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">str</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> </pre> <p> Notice that you have to specify <code class="computeroutput"><span class="identifier">format_perl</span></code> when using named captures. Only the perl syntax recognizes the <code class="computeroutput"><span class="string">"\\g<xxx>"</span></code> syntax. 
The above code displays: </p> <pre class="programlisting">tw**e**t
</pre> <a name="boost_xpressive.user_s_guide.named_captures.static_named_captures"></a><h3> <a name="id3117203"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.named_captures.static_named_captures">Static Named Captures</a> </h3> <p> If you're using static regular expressions, creating and using named captures is even easier. You can use the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/mark_tag.html" title="Struct mark_tag">mark_tag</a></code></code> type to create a variable that you can use like <code class="computeroutput"><a class="link" href="../boost/xpressive/s1.html" title="Global s1">s1</a></code>, <code class="computeroutput"><a class="link" href="../boost/xpressive/s1.html" title="Global s1">s2</a></code> and friends, but with a name that is more meaningful. Below is how the above example would look using static regexes: </p> <pre class="programlisting"><span class="identifier">mark_tag</span> <span class="identifier">char_</span><span class="special">(</span><span class="number">1</span><span class="special">);</span> <span class="comment">// char_ is now a synonym for s1
</span><span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">char_</span><span class="special">=</span> <span class="identifier">_</span><span class="special">)</span> <span class="special">>></span> <span class="identifier">char_</span><span class="special">;</span>
</pre> <p> After a match operation, you can use the <code class="computeroutput"><span class="identifier">mark_tag</span></code> to index into the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> to access the named capture: </p> <pre
class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"tweet"</span><span class="special">);</span>
<span class="identifier">mark_tag</span> <span class="identifier">char_</span><span class="special">(</span><span class="number">1</span><span class="special">);</span>
<span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">char_</span><span class="special">=</span> <span class="identifier">_</span><span class="special">)</span> <span class="special">>></span> <span class="identifier">char_</span><span class="special">;</span>
<span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span>
<span class="keyword">if</span><span class="special">(</span><span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">))</span>
<span class="special">{</span>
    <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="string">"char = "</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="identifier">char_</span><span class="special">]</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span>
<span class="special">}</span>
</pre> <p> The above code displays: </p> <pre class="programlisting">char = e
</pre> <p> When doing string substitutions with <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function
regex_replace">regex_replace()</a></code></code>, you can use named captures to create <span class="emphasis"><em>format expressions</em></span> as below: </p> <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"tweet"</span><span class="special">);</span> <span class="identifier">mark_tag</span> <span class="identifier">char_</span><span class="special">(</span><span class="number">1</span><span class="special">);</span> <span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">char_</span><span class="special">=</span> <span class="identifier">_</span><span class="special">)</span> <span class="special">>></span> <span class="identifier">char_</span><span class="special">;</span> <span class="identifier">str</span> <span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">,</span> <span class="string">"**"</span> <span class="special">+</span> <span class="identifier">char_</span> <span class="special">+</span> <span class="string">"**"</span><span class="special">);</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">str</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> </pre> <p> The above code displays: </p> <pre class="programlisting">tw**e**t </pre> <div class="note"><table border="0" summary="Note"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" 
src="../../../doc/src/images/note.png"></td> <th align="left">Note</th> </tr> <tr><td align="left" valign="top"><p> You need to include <code class="literal"><boost/xpressive/regex_actions.hpp></code> to use format expressions. </p></td></tr> </table></div> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches" title="Grammars and Nested Matches">Grammars and Nested Matches</a> </h3></div></div></div> <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.overview"></a><h3> <a name="id3117948"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.overview">Overview</a> </h3> <p> One of the key benefits of representing regexes as C++ expressions is the ability to easily refer to other C++ code and data from within the regex. This enables programming idioms that are not possible with other regular expression libraries. Of particular note is the ability for one regex to refer to another regex, allowing you to build grammars out of regular expressions. This section describes how to embed one regex in another by value and by reference, how regex objects behave when they refer to other regexes, and how to access the tree of results after a successful parse. </p> <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_value"></a><h3> <a name="id3117975"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_value">Embedding a Regex by Value</a> </h3> <p> The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> object has value semantics. 
When a regex object appears on the right-hand side in the definition of another regex, it is as if the regex were embedded by value; that is, a copy of the nested regex is stored by the enclosing regex. The inner regex is invoked by the outer regex during pattern matching. The inner regex participates fully in the match, back-tracking as needed to make the match succeed. </p> <p> Consider a text editor that has a regex-find feature with a whole-word option. You can implement this with xpressive as follows: </p> <pre class="programlisting"><span class="identifier">find_dialog</span> <span class="identifier">dlg</span><span class="special">;</span>
<span class="keyword">if</span><span class="special">(</span> <span class="identifier">dialog_ok</span> <span class="special">==</span> <span class="identifier">dlg</span><span class="special">.</span><span class="identifier">do_modal</span><span class="special">()</span> <span class="special">)</span>
<span class="special">{</span>
    <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">pattern</span> <span class="special">=</span> <span class="identifier">dlg</span><span class="special">.</span><span class="identifier">get_text</span><span class="special">();</span> <span class="comment">// the pattern the user entered
</span>    <span class="keyword">bool</span> <span class="identifier">whole_word</span> <span class="special">=</span> <span class="identifier">dlg</span><span class="special">.</span><span class="identifier">whole_word</span><span class="special">.</span><span class="identifier">is_checked</span><span class="special">();</span> <span class="comment">// did the user select the whole-word option?
</span>    <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="identifier">pattern</span> <span class="special">);</span> <span class="comment">// try to compile the pattern
</span>
    <span class="keyword">if</span><span class="special">(</span> <span class="identifier">whole_word</span> <span class="special">)</span>
    <span class="special">{</span>
        <span class="comment">// wrap the regex in begin-word / end-word assertions
</span>        <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">bow</span> <span class="special">>></span> <span class="identifier">re</span> <span class="special">>></span> <span class="identifier">eow</span><span class="special">;</span>
    <span class="special">}</span>

    <span class="comment">// ... use re ...
</span><span class="special">}</span>
</pre> <p> Look closely at this line: </p> <pre class="programlisting"><span class="comment">// wrap the regex in begin-word / end-word assertions
</span><span class="identifier">re</span> <span class="special">=</span> <span class="identifier">bow</span> <span class="special">>></span> <span class="identifier">re</span> <span class="special">>></span> <span class="identifier">eow</span><span class="special">;</span>
</pre> <p> This line creates a new regex that embeds the old regex by value. Then, the new regex is assigned back to the original regex. Since a copy of the old regex was made on the right-hand side, this works as you might expect: the new regex has the behavior of the old regex wrapped in begin- and end-word assertions.
</p> <div class="note"><table border="0" summary="Note"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> <th align="left">Note</th> </tr> <tr><td align="left" valign="top"><p> Note that <code class="computeroutput"><span class="identifier">re</span> <span class="special">=</span> <span class="identifier">bow</span> <span class="special">>></span> <span class="identifier">re</span> <span class="special">>></span> <span class="identifier">eow</span></code> does <span class="emphasis"><em>not</em></span> define a recursive regular expression, since regex objects embed by value by default. The next section shows how to define a recursive regular expression by embedding a regex by reference. </p></td></tr> </table></div> <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_reference"></a><h3> <a name="id3118463"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_reference">Embedding a Regex by Reference</a> </h3> <p> If you want to be able to build recursive regular expressions and context-free grammars, embedding a regex by value is not enough. You need to be able to make your regular expressions self-referential. Most regular expression engines don't give you that power, but xpressive does. </p> <div class="tip"><table border="0" summary="Tip"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../doc/src/images/tip.png"></td> <th align="left">Tip</th> </tr> <tr><td align="left" valign="top"><p> The theoretical computer scientists out there will correctly point out that a self-referential regular expression is not "regular", so in the strict sense, xpressive isn't really a <span class="emphasis"><em>regular</em></span> expression engine at all. 
But as Larry Wall once said, "the term [regular expression] has grown with the capabilities of our pattern matching engines, so I'm not going to try to fight linguistic necessity here." </p></td></tr> </table></div> <p> Consider the following code, which uses the <code class="computeroutput"><span class="identifier">by_ref</span><span class="special">()</span></code> helper to define a recursive regular expression that matches balanced, nested parentheses: </p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">parentheses</span><span class="special">;</span> <span class="identifier">parentheses</span> <span class="comment">// A balanced set of parentheses ... </span> <span class="special">=</span> <span class="char">'('</span> <span class="comment">// is an opening parenthesis ... </span> <span class="special">>></span> <span class="comment">// followed by ... </span> <span class="special">*(</span> <span class="comment">// zero or more ... </span> <span class="identifier">keep</span><span class="special">(</span> <span class="special">+~(</span><span class="identifier">set</span><span class="special">=</span><span class="char">'('</span><span class="special">,</span><span class="char">')'</span><span class="special">)</span> <span class="special">)</span> <span class="comment">// of a bunch of things that are not parentheses ... </span> <span class="special">|</span> <span class="comment">// or ... </span> <span class="identifier">by_ref</span><span class="special">(</span><span class="identifier">parentheses</span><span class="special">)</span> <span class="comment">// a balanced set of parentheses </span> <span class="special">)</span> <span class="comment">// (ooh, recursion!) ... </span> <span class="special">>></span> <span class="comment">// followed by ... 
</span> <span class="char">')'</span> <span class="comment">// a closing parenthesis </span> <span class="special">;</span> </pre> <p> Matching balanced, nested tags is an important text-processing task, and it is one that "classic" regular expressions cannot do. The <code class="computeroutput"><span class="identifier">by_ref</span><span class="special">()</span></code> helper makes it possible. It allows one regex object to be embedded in another <span class="emphasis"><em>by reference</em></span>. Since the right-hand side holds <code class="computeroutput"><span class="identifier">parentheses</span></code> by reference, assigning the right-hand side back to <code class="computeroutput"><span class="identifier">parentheses</span></code> creates a cycle, which is executed recursively during matching. </p> <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.building_a_grammar"></a><h3> <a name="id3118777"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.building_a_grammar">Building a Grammar</a> </h3> <p> Once we allow self-reference in our regular expressions, the genie is out of the bottle and all manner of fun things are possible. In particular, we can now build grammars out of regular expressions. Let's have a look at the textbook grammar example: the humble calculator.
</p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">group</span><span class="special">,</span> <span class="identifier">factor</span><span class="special">,</span> <span class="identifier">term</span><span class="special">,</span> <span class="identifier">expression</span><span class="special">;</span> <span class="identifier">group</span> <span class="special">=</span> <span class="char">'('</span> <span class="special">>></span> <span class="identifier">by_ref</span><span class="special">(</span><span class="identifier">expression</span><span class="special">)</span> <span class="special">>></span> <span class="char">')'</span><span class="special">;</span> <span class="identifier">factor</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_d</span> <span class="special">|</span> <span class="identifier">group</span><span class="special">;</span> <span class="identifier">term</span> <span class="special">=</span> <span class="identifier">factor</span> <span class="special">>></span> <span class="special">*((</span><span class="char">'*'</span> <span class="special">>></span> <span class="identifier">factor</span><span class="special">)</span> <span class="special">|</span> <span class="special">(</span><span class="char">'/'</span> <span class="special">>></span> <span class="identifier">factor</span><span class="special">));</span> <span class="identifier">expression</span> <span class="special">=</span> <span class="identifier">term</span> <span class="special">>></span> <span class="special">*((</span><span class="char">'+'</span> <span class="special">>></span> <span class="identifier">term</span><span class="special">)</span> <span class="special">|</span> <span class="special">(</span><span class="char">'-'</span> <span class="special">>></span> <span class="identifier">term</span><span class="special">));</span> </pre> <p> The regex <code class="computeroutput"><span 
class="identifier">expression</span></code> defined above does something rather remarkable for a regular expression: it matches mathematical expressions. For example, if the input string were <code class="computeroutput"><span class="string">"foo 9*(10+3) bar"</span></code>, this pattern would match <code class="computeroutput"><span class="string">"9*(10+3)"</span></code>. It only matches well-formed mathematical expressions, where the parentheses are balanced and the infix operators have two arguments each. Don't try this with just any regular expression engine! </p> <p> Let's take a closer look at this regular expression grammar. Notice that it is cyclic: <code class="computeroutput"><span class="identifier">expression</span></code> is implemented in terms of <code class="computeroutput"><span class="identifier">term</span></code>, which is implemented in terms of <code class="computeroutput"><span class="identifier">factor</span></code>, which is implemented in terms of <code class="computeroutput"><span class="identifier">group</span></code>, which is implemented in terms of <code class="computeroutput"><span class="identifier">expression</span></code>, closing the loop. In general, the way to define a cyclic grammar is to forward-declare the regex objects and embed by reference those regular expressions that have not yet been initialized. In the above grammar, there is only one place where we need to reference a regex object that has not yet been initialized: the definition of <code class="computeroutput"><span class="identifier">group</span></code>. In that place, we use <code class="computeroutput"><span class="identifier">by_ref</span><span class="special">()</span></code> to embed <code class="computeroutput"><span class="identifier">expression</span></code> by reference. In all other places, it is sufficient to embed the other regex objects by value, since they have already been initialized and their values will not change. 
</p> <div class="tip"><table border="0" summary="Tip"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../doc/src/images/tip.png"></td> <th align="left">Tip</th> </tr> <tr><td align="left" valign="top"><p> <span class="bold"><strong>Embed by value if possible</strong></span> <br> <br> In general, prefer embedding regular expressions by value rather than by reference. It involves one less indirection, making your patterns match a little faster. Besides, value semantics are simpler and will make your grammars easier to reason about. Don't worry about the expense of "copying" a regex. Each regex object shares its implementation with all of its copies. </p></td></tr> </table></div> <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.dynamic_regex_grammars"></a><h3> <a name="id3119263"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.dynamic_regex_grammars">Dynamic Regex Grammars</a> </h3> <p> Using <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code>, you can also build grammars out of dynamic regular expressions. You do that by creating named regexes, and referring to other regexes by name. Each <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> instance keeps a mapping from names to regexes that have been created with it. </p> <p> You can create a named dynamic regex by prefacing your regex with <code class="computeroutput"><span class="string">"(?$name=)"</span></code>, where <span class="emphasis"><em>name</em></span> is the name of the regex. You can refer to a named regex from another regex with <code class="computeroutput"><span class="string">"(?$name)"</span></code>. 
The named regex does not need to exist yet at the time it is referenced in another regex, but it must exist by the time you use the regex. </p> <p> Below is a code fragment that uses dynamic regex grammars to implement the calculator example from above. </p> <pre class="programlisting"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">regex_constants</span><span class="special">;</span> <span class="identifier">sregex</span> <span class="identifier">expr</span><span class="special">;</span> <span class="special">{</span> <span class="identifier">sregex_compiler</span> <span class="identifier">compiler</span><span class="special">;</span> <span class="identifier">syntax_option_type</span> <span class="identifier">x</span> <span class="special">=</span> <span class="identifier">ignore_white_space</span><span class="special">;</span> <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(? $group = ) \\( (? $expr ) \\) "</span><span class="special">,</span> <span class="identifier">x</span><span class="special">);</span> <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(? $factor = ) \\d+ | (? $group ) "</span><span class="special">,</span> <span class="identifier">x</span><span class="special">);</span> <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(? $term = ) (? $factor )"</span> <span class="string">" ( \\* (? $factor ) | / (? 
$factor ) )* "</span><span class="special">,</span> <span class="identifier">x</span><span class="special">);</span> <span class="identifier">expr</span> <span class="special">=</span> <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(? $expr = ) (? $term )"</span> <span class="string">" ( \\+ (? $term ) | - (? $term ) )* "</span><span class="special">,</span> <span class="identifier">x</span><span class="special">);</span> <span class="special">}</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"foo 9*(10+3) bar"</span><span class="special">);</span> <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> <span class="keyword">if</span><span class="special">(</span><span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">expr</span><span class="special">))</span> <span class="special">{</span> <span class="comment">// This prints "9*(10+3)": </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> <span class="special">}</span> </pre> <p> As with static regex grammars, nested regex invocations create nested match results (see <span class="emphasis"><em>Nested Results</em></span> below). 
The result is a complete parse tree for the string that matched. Unlike static regexes, dynamic regexes are always embedded by reference, not by value. </p> <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.cyclic_patterns__copying_and_memory_management__oh_my_"></a><h3> <a name="id3119842"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.cyclic_patterns__copying_and_memory_management__oh_my_">Cyclic Patterns, Copying and Memory Management, Oh My!</a> </h3> <p> The calculator examples above raise a number of very complicated memory-management issues. The four regex objects refer to one another, some directly and some indirectly, some by value and some by reference. What if we were to return one of them from a function and let the others go out of scope? What becomes of the references? The answer is that the regex objects are internally reference counted, such that they keep their referenced regex objects alive as long as they need them. So passing a regex object by value is never a problem, even if it refers to other regex objects that have gone out of scope. </p> <p> Those of you who have dealt with reference counting are probably familiar with its Achilles' heel: cyclic references. If regex objects are reference counted, what happens to cycles like the one created in the calculator examples? Are they leaked? The answer is no, they are not leaked. The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> object has some tricky reference-tracking code that ensures that even cyclic regex grammars are cleaned up when the last external reference goes away. So don't worry about it. Create cyclic grammars, pass your regex objects around and copy them all you want. It is fast and efficient and guaranteed not to leak or result in dangling references.
</p> <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.nested_regexes_and_sub_match_scoping"></a><h3> <a name="id3119900"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.nested_regexes_and_sub_match_scoping">Nested Regexes and Sub-Match Scoping</a> </h3> <p> Nested regular expressions raise the issue of sub-match scoping. If both the inner and outer regex write to and read from the same sub-match vector, chaos would ensue. The inner regex would stomp on the sub-matches written by the outer regex. For example, what does this do? </p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">inner</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"(.)\\1"</span> <span class="special">);</span> <span class="identifier">sregex</span> <span class="identifier">outer</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">s1</span><span class="special">=</span> <span class="identifier">_</span><span class="special">)</span> <span class="special">>></span> <span class="identifier">inner</span> <span class="special">>></span> <span class="identifier">s1</span><span class="special">;</span> </pre> <p> The author probably didn't intend for the inner regex to overwrite the sub-match written by the outer regex. The problem is particularly acute when the inner regex is accepted from the user as input. The author has no way of knowing whether the inner regex will stomp the sub-match vector or not. This is clearly not acceptable. </p> <p> Instead, what actually happens is that each invocation of a nested regex gets its own scope. Sub-matches belong to that scope. 
That is, each nested regex invocation gets its own copy of the sub-match vector to play with, so there is no way for an inner regex to stomp on the sub-matches of an outer regex. So, for example, the regex <code class="computeroutput"><span class="identifier">outer</span></code> defined above would match <code class="computeroutput"><span class="string">"ABBA"</span></code>, as it should. </p> <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.nested_results"></a><h3> <a name="id3120087"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.nested_results">Nested Results</a> </h3> <p> If nested regexes have their own sub-matches, there should be a way to access them after a successful match. In fact, there is. After a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> or <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code>, the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> struct behaves like the head of a tree of nested results. 
The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> class provides a <code class="computeroutput"><span class="identifier">nested_results</span><span class="special">()</span></code> member function that returns an ordered sequence of <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> structures, representing the results of the nested regexes. The order of the nested results is the same as the order in which the nested regex objects matched. </p> <p> Take as an example the regex for balanced, nested parentheses we saw earlier: </p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">parentheses</span><span class="special">;</span> <span class="identifier">parentheses</span> <span class="special">=</span> <span class="char">'('</span> <span class="special">>></span> <span class="special">*(</span> <span class="identifier">keep</span><span class="special">(</span> <span class="special">+~(</span><span class="identifier">set</span><span class="special">=</span><span class="char">'('</span><span class="special">,</span><span class="char">')'</span><span class="special">)</span> <span class="special">)</span> <span class="special">|</span> <span class="identifier">by_ref</span><span class="special">(</span><span class="identifier">parentheses</span><span class="special">)</span> <span class="special">)</span> <span class="special">>></span> <span class="char">')'</span><span class="special">;</span> <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span 
class="special">(</span> <span class="string">"blah blah( a(b)c (c(e)f (g)h )i (j)6 )blah"</span> <span class="special">);</span> <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_search</span><span class="special">(</span> <span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">parentheses</span> <span class="special">)</span> <span class="special">)</span> <span class="special">{</span> <span class="comment">// display the whole match </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// display the nested results </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">for_each</span><span class="special">(</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">nested_results</span><span class="special">().</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">nested_results</span><span class="special">().</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">output_nested_results</span><span class="special">()</span> <span class="special">);</span> <span class="special">}</span> </pre> <p> This program displays the following: </p> <pre class="programlisting">( a(b)c (c(e)f (g)h )i (j)6 ) (b) (c(e)f (g)h ) (e) (g) (j) </pre> <p> Here you can see how the results are nested and that they are stored in the order in which they are found. 
</p> <div class="tip"><table border="0" summary="Tip"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../doc/src/images/tip.png"></td> <th align="left">Tip</th> </tr> <tr><td align="left" valign="top"><p> See the definition of <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.display_a_tree_of_nested_results">output_nested_results</a> in the <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">Examples</a> section. </p></td></tr> </table></div> <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.filtering_nested_results"></a><h3> <a name="id3120663"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.filtering_nested_results">Filtering Nested Results</a> </h3> <p> Sometimes a regex will have several nested regex objects, and you want to know which result corresponds to which regex object. That's where <code class="computeroutput"><span class="identifier">basic_regex</span><span class="special"><>::</span><span class="identifier">regex_id</span><span class="special">()</span></code> and <code class="computeroutput"><span class="identifier">match_results</span><span class="special"><>::</span><span class="identifier">regex_id</span><span class="special">()</span></code> come in handy. When iterating over the nested results, you can compare the regex id from the results to the id of the regex object you're interested in. </p> <p> To make this easier, xpressive provides a predicate for iterating over just the results that correspond to a certain nested regex. It is called <code class="computeroutput"><span class="identifier">regex_id_filter_predicate</span></code>, and it is intended to be used with <a href="../../../libs/iterator/doc/index.html" target="_top">Boost.Iterator</a>.
You can use it as follows: </p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">name</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">alpha</span><span class="special">;</span> <span class="identifier">sregex</span> <span class="identifier">integer</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_d</span><span class="special">;</span> <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="special">*(</span> <span class="special">*</span><span class="identifier">_s</span> <span class="special">>></span> <span class="special">(</span> <span class="identifier">name</span> <span class="special">|</span> <span class="identifier">integer</span> <span class="special">)</span> <span class="special">);</span> <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span> <span class="string">"marsha 123 jan 456 cindy 789"</span> <span class="special">);</span> <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">re</span> <span class="special">)</span> <span class="special">)</span> <span class="special">{</span> <span class="identifier">smatch</span><span class="special">::</span><span class="identifier">nested_results_type</span><span class="special">::</span><span class="identifier">const_iterator</span> <span class="identifier">begin</span> <span class="special">=</span> <span class="identifier">what</span><span 
class="special">.</span><span class="identifier">nested_results</span><span class="special">().</span><span class="identifier">begin</span><span class="special">();</span> <span class="identifier">smatch</span><span class="special">::</span><span class="identifier">nested_results_type</span><span class="special">::</span><span class="identifier">const_iterator</span> <span class="identifier">end</span> <span class="special">=</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">nested_results</span><span class="special">().</span><span class="identifier">end</span><span class="special">();</span> <span class="comment">// declare filter predicates to select just the names or the integers </span> <span class="identifier">sregex_id_filter_predicate</span> <span class="identifier">name_id</span><span class="special">(</span> <span class="identifier">name</span><span class="special">.</span><span class="identifier">regex_id</span><span class="special">()</span> <span class="special">);</span> <span class="identifier">sregex_id_filter_predicate</span> <span class="identifier">integer_id</span><span class="special">(</span> <span class="identifier">integer</span><span class="special">.</span><span class="identifier">regex_id</span><span class="special">()</span> <span class="special">);</span> <span class="comment">// iterate over only the results from the name regex </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">for_each</span><span class="special">(</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">make_filter_iterator</span><span class="special">(</span> <span class="identifier">name_id</span><span class="special">,</span> <span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span> <span class="special">),</span> <span class="identifier">boost</span><span 
class="special">::</span><span class="identifier">make_filter_iterator</span><span class="special">(</span> <span class="identifier">name_id</span><span class="special">,</span> <span class="identifier">end</span><span class="special">,</span> <span class="identifier">end</span> <span class="special">),</span> <span class="identifier">output_result</span> <span class="special">);</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// iterate over only the results from the integer regex </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">for_each</span><span class="special">(</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">make_filter_iterator</span><span class="special">(</span> <span class="identifier">integer_id</span><span class="special">,</span> <span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span> <span class="special">),</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">make_filter_iterator</span><span class="special">(</span> <span class="identifier">integer_id</span><span class="special">,</span> <span class="identifier">end</span><span class="special">,</span> <span class="identifier">end</span> <span class="special">),</span> <span class="identifier">output_result</span> <span class="special">);</span> <span class="special">}</span> </pre> <p> where <code class="computeroutput"><span class="identifier">output_result</span></code> is a simple function that takes an <code class="computeroutput"><span class="identifier">smatch</span></code> and displays the full match.
Notice how we use the <code class="computeroutput"><span class="identifier">regex_id_filter_predicate</span></code> together with <code class="computeroutput"><span class="identifier">basic_regex</span><span class="special"><>::</span><span class="identifier">regex_id</span><span class="special">()</span></code> and <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">make_filter_iterator</span><span class="special">()</span></code> from the <a href="../../../libs/iterator/doc/index.html" target="_top">Boost.Iterator</a> library to select only those results corresponding to a particular nested regex. This program displays the following: </p> <pre class="programlisting">marsha jan cindy 123 456 789 </pre> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions" title="Semantic Actions and User-Defined Assertions">Semantic Actions and User-Defined Assertions</a> </h3></div></div></div> <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.overview"></a><h3> <a name="id3121634"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.overview">Overview</a> </h3> <p> Imagine you want to parse an input string and build a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><></span></code> from it. For something like that, matching a regular expression isn't enough. You want to <span class="emphasis"><em>do something</em></span> when parts of your regular expression match. Xpressive lets you attach semantic actions to parts of your static regular expressions. This section shows you how.
</p> <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.semantic_actions"></a><h3> <a name="id3121693"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.semantic_actions">Semantic Actions</a> </h3> <p> Consider the following code, which uses xpressive's semantic actions to parse a string of word/integer pairs and stuff them into a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><></span></code>. It is described below. </p> <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">map</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">string</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">regex_actions</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span 
class="special">;</span> <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> <span class="special">{</span> <span 
class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="identifier">result</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"aaa=>1 bbb=>23 ccc=>456"</span><span class="special">);</span> <span class="comment">// Match a word and an integer, separated by =>, </span> <span class="comment">// and then stuff the result into a std::map<> </span> <span class="identifier">sregex</span> <span class="identifier">pair</span> <span class="special">=</span> <span class="special">(</span> <span class="special">(</span><span class="identifier">s1</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="string">"=>"</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s2</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_d</span><span class="special">)</span> <span class="special">)</span> <span class="special">[</span> <span class="identifier">ref</span><span class="special">(</span><span class="identifier">result</span><span class="special">)[</span><span class="identifier">s1</span><span class="special">]</span> <span class="special">=</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">s2</span><span class="special">)</span> <span class="special">];</span> <span class="comment">// Match one or more word/integer pairs, 
separated </span> <span class="comment">// by whitespace. </span> <span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="identifier">pair</span> <span class="special">>></span> <span class="special">*(+</span><span class="identifier">_s</span> <span class="special">>></span> <span class="identifier">pair</span><span class="special">);</span> <span class="keyword">if</span><span class="special">(</span><span class="identifier">regex_match</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">))</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"aaa"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"bbb"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"ccc"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="special">}</span> <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> <span class="special">}</span> </pre> <p> This program prints the following: </p> <pre 
class="programlisting">1 23 456 </pre> <p> The regular expression <code class="computeroutput"><span class="identifier">pair</span></code> has two parts: the pattern and the action. The pattern says to match a word, capturing it in sub-match 1, and an integer, capturing it in sub-match 2, separated by <code class="computeroutput"><span class="string">"=>"</span></code>. The action is the part in square brackets: <code class="computeroutput"><span class="special">[</span> <span class="identifier">ref</span><span class="special">(</span><span class="identifier">result</span><span class="special">)[</span><span class="identifier">s1</span><span class="special">]</span> <span class="special">=</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">s2</span><span class="special">)</span> <span class="special">]</span></code>. It says to take sub-match 1, use it to index into the <code class="computeroutput"><span class="identifier">result</span></code> map, and assign to that entry the result of converting sub-match 2 to an integer. </p> <div class="note"><table border="0" summary="Note"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> <th align="left">Note</th> </tr> <tr><td align="left" valign="top"><p> To use semantic actions with your static regexes, you must <code class="computeroutput"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">regex_actions</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span></code>. </p></td></tr> </table></div> <p> How does this work? 
Like the rest of the static regular expression, the part between brackets is an expression template: it encodes the action so that it can be executed later. The expression <code class="computeroutput"><span class="identifier">ref</span><span class="special">(</span><span class="identifier">result</span><span class="special">)</span></code> creates a lazy reference to the <code class="computeroutput"><span class="identifier">result</span></code> object. The larger expression <code class="computeroutput"><span class="identifier">ref</span><span class="special">(</span><span class="identifier">result</span><span class="special">)[</span><span class="identifier">s1</span><span class="special">]</span></code> is a lazy map index operation. Later, when this action is executed, <code class="computeroutput"><span class="identifier">s1</span></code> is replaced with the first <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code>. Likewise, when <code class="computeroutput"><span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">s2</span><span class="special">)</span></code> is executed, <code class="computeroutput"><span class="identifier">s2</span></code> is replaced with the second <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code>. The <code class="computeroutput"><span class="identifier">as</span><span class="special"><></span></code> action converts its argument to the requested type using Boost.Lexical_cast. The effect of the whole action is to insert a new word/integer pair into the map. 
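</p> <p> For comparison, here is a sketch of the same map being built without a semantic action: every match is visited with <code class="computeroutput"><span class="identifier">sregex_iterator</span></code>, and the index-and-convert step is written out by hand and performed eagerly instead of being queued. The function name <code class="computeroutput"><span class="identifier">parse_pairs</span></code> is hypothetical. </p> <pre class="programlisting">#include <map>
#include <string>
#include <boost/lexical_cast.hpp>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

// Hypothetical helper: builds the word => integer map by hand.
std::map<std::string, int> parse_pairs(std::string const &str)
{
    std::map<std::string, int> result;
    sregex pair_rx = (s1= +_w) >> "=>" >> (s2= +_d);

    // Visit each non-overlapping match of pair_rx in str.
    sregex_iterator cur(str.begin(), str.end(), pair_rx), end;
    for(; cur != end; ++cur)
    {
        smatch const &what = *cur;
        // The eager equivalent of: ref(result)[s1] = as<int>(s2)
        result[what[1].str()] = boost::lexical_cast<int>(what[2].str());
    }
    return result;
}
</pre> <p> One difference to keep in mind: this version searches for pairs anywhere in the input, so unlike the <code class="computeroutput"><span class="identifier">regex_match</span><span class="special">()</span></code> version it does not reject input that has other text between the pairs.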
</p> <div class="note"><table border="0" summary="Note"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> <th align="left">Note</th> </tr> <tr><td align="left" valign="top"><p> There is an important difference between the function <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">ref</span><span class="special">()</span></code> in <code class="computeroutput"><span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">ref</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span></code> and <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">ref</span><span class="special">()</span></code> in <code class="computeroutput"><span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">regex_actions</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span></code>. The first returns a plain <code class="computeroutput"><span class="identifier">reference_wrapper</span><span class="special"><></span></code> which behaves in many respects like an ordinary reference. By contrast, <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">ref</span><span class="special">()</span></code> returns a <span class="emphasis"><em>lazy</em></span> reference that you can use in expressions that are executed lazily. 
That is why we can say <code class="computeroutput"><span class="identifier">ref</span><span class="special">(</span><span class="identifier">result</span><span class="special">)[</span><span class="identifier">s1</span><span class="special">]</span></code>, even though <code class="computeroutput"><span class="identifier">result</span></code> doesn't have an <code class="computeroutput"><span class="keyword">operator</span><span class="special">[]</span></code> that would accept <code class="computeroutput"><span class="identifier">s1</span></code>. </p></td></tr> </table></div> <p> In addition to the sub-match placeholders <code class="computeroutput"><span class="identifier">s1</span></code>, <code class="computeroutput"><span class="identifier">s2</span></code>, etc., you can also use the placeholder <code class="computeroutput"><span class="identifier">_</span></code> within an action to refer back to the string matched by the sub-expression to which the action is attached. For instance, you can use the following regex to match a bunch of digits, interpret them as an integer and assign the result to a local variable: </p> <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="comment">// Here, _ refers back to all the </span><span class="comment">// characters matched by (+_d) </span><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">(+</span><span class="identifier">_d</span><span class="special">)[</span> <span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span> <span class="special">=</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span 
class="special">)</span> <span class="special">];</span> </pre> <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.lazy_action_execution"></a><h4> <a name="id3123260"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.lazy_action_execution">Lazy Action Execution</a> </h4> <p> What does it mean, exactly, to attach an action to part of a regular expression and perform a match? When does the action execute? If the action is part of a repeated sub-expression, does the action execute once or many times? And if the sub-expression initially matches, but ultimately fails because the rest of the regular expression fails to match, is the action executed at all? </p> <p> The answer is that by default, actions are executed <span class="emphasis"><em>lazily</em></span>. When a sub-expression matches a string, its action is placed on a queue, along with the current values of any sub-matches to which the action refers. If the match algorithm must backtrack, actions are popped off the queue as necessary. Only after the entire regex has matched successfully are the actions actually executed. They are executed all at once, in the order in which they were added to the queue, as the last step before <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> (or <code class="computeroutput"><span class="identifier">regex_search</span><span class="special">()</span></code>) returns. </p> <p> For example, consider the following regex that increments a counter whenever it finds a digit. 
</p> <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"1!2!3?"</span><span class="special">);</span> <span class="comment">// count the exciting digits, but not the </span><span class="comment">// questionable ones. </span><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span> <span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">rex</span><span class="special">);</span> <span class="identifier">assert</span><span class="special">(</span> <span class="identifier">i</span> <span class="special">==</span> <span class="number">2</span> <span class="special">);</span> </pre> <p> The action <code class="computeroutput"><span class="special">++</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span></code> is queued three times: once for each found digit. But it is only <span class="emphasis"><em>executed</em></span> twice: once for each digit that precedes a <code class="computeroutput"><span class="char">'!'</span></code> character. 
When the <code class="computeroutput"><span class="char">'?'</span></code> character is encountered, the match algorithm backtracks, removing the final action from the queue. </p> <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.immediate_action_execution"></a><h4> <a name="id3123594"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.immediate_action_execution">Immediate Action Execution</a> </h4> <p> When you want semantic actions to execute immediately, you can wrap the sub-expression containing the action in a call to <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/keep.html" title="Function template keep">keep()</a></code></code>. <code class="computeroutput"><span class="identifier">keep</span><span class="special">()</span></code> turns off backtracking for its sub-expression, but it also causes any actions queued by the sub-expression to execute at the end of the <code class="computeroutput"><span class="identifier">keep</span><span class="special">()</span></code>. It is as if the sub-expression in the <code class="computeroutput"><span class="identifier">keep</span><span class="special">()</span></code> were compiled into an independent regex object, and matching the <code class="computeroutput"><span class="identifier">keep</span><span class="special">()</span></code> is like a separate invocation of <code class="computeroutput"><span class="identifier">regex_search</span><span class="special">()</span></code>. It matches characters and executes actions but never backtracks or unwinds. 
For example, imagine the above example had been written as follows: </p> <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"1!2!3?"</span><span class="special">);</span> <span class="comment">// count all the digits. </span><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">keep</span><span class="special">(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span> <span class="special">]</span> <span class="special">)</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span> <span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">rex</span><span class="special">);</span> <span class="identifier">assert</span><span class="special">(</span> <span class="identifier">i</span> <span class="special">==</span> <span class="number">3</span> <span class="special">);</span> </pre> <p> We have wrapped the sub-expression <code class="computeroutput"><span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span> <span class="special">]</span></code> in <code class="computeroutput"><span class="identifier">keep</span><span class="special">()</span></code>. 
Now, whenever this regex matches a digit, the action will be queued and then immediately executed before we try to match a <code class="computeroutput"><span class="char">'!'</span></code> character. In this case, the action executes three times. </p> <div class="note"><table border="0" summary="Note"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> <th align="left">Note</th> </tr> <tr><td align="left" valign="top"><p> Like <code class="computeroutput"><span class="identifier">keep</span><span class="special">()</span></code>, actions within <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/before.html" title="Function template before">before()</a></code></code> and <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/after.html" title="Function template after">after()</a></code></code> are also executed early when their sub-expressions have matched. </p></td></tr> </table></div> <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.lazy_functions"></a><h4> <a name="id3124052"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.lazy_functions">Lazy Functions</a> </h4> <p> So far, we've seen how to write semantic actions consisting of variables and operators. But what if you want to be able to call a function from a semantic action? Xpressive provides a mechanism to do this. </p> <p> The first step is to define a function object type. 
Here, for instance, is a function object type that calls <code class="computeroutput"><span class="identifier">push</span><span class="special">()</span></code> on its argument: </p> <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">push_impl</span> <span class="special">{</span> <span class="comment">// Result type, needed for tr1::result_of </span> <span class="keyword">typedef</span> <span class="keyword">void</span> <span class="identifier">result_type</span><span class="special">;</span> <span class="keyword">template</span><span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Sequence</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">Value</span><span class="special">></span> <span class="keyword">void</span> <span class="keyword">operator</span><span class="special">()(</span><span class="identifier">Sequence</span> <span class="special">&</span><span class="identifier">seq</span><span class="special">,</span> <span class="identifier">Value</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">val</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span> <span class="identifier">seq</span><span class="special">.</span><span class="identifier">push</span><span class="special">(</span><span class="identifier">val</span><span class="special">);</span> <span class="special">}</span> <span class="special">};</span> </pre> <p> The next step is to use xpressive's <code class="computeroutput"><span class="identifier">function</span><span class="special"><></span></code> template to define a function object named <code class="computeroutput"><span class="identifier">push</span></code>: </p> <pre class="programlisting"><span class="comment">// Global "push" function object. 
</span><span class="identifier">function</span><span class="special"><</span><span class="identifier">push_impl</span><span class="special">>::</span><span class="identifier">type</span> <span class="keyword">const</span> <span class="identifier">push</span> <span class="special">=</span> <span class="special">{{}};</span> </pre> <p> The initialization looks a bit odd, but this is because <code class="computeroutput"><span class="identifier">push</span></code> is being statically initialized. That means it doesn't need to be constructed at runtime. We can use <code class="computeroutput"><span class="identifier">push</span></code> in semantic actions as follows: </p> <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">stack</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">ints</span><span class="special">;</span> <span class="comment">// Match digits, cast them to an int </span><span class="comment">// and push it on the stack. </span><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">(+</span><span class="identifier">_d</span><span class="special">)[</span><span class="identifier">push</span><span class="special">(</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">ints</span><span class="special">),</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">))];</span> </pre> <p> You'll notice that doing it this way causes member function invocations to look like ordinary function invocations. 
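</p> <p> Put together, the pieces above can be compiled and exercised as a unit. The following sketch assumes whitespace-separated integers as input, and the helper name <code class="computeroutput"><span class="identifier">collect_ints</span></code> is hypothetical. It spells out xpressive names through a namespace alias rather than a using-directive, so that the global <code class="computeroutput"><span class="identifier">push</span></code> cannot be ambiguous with any same-named declaration in <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span></code>. </p> <pre class="programlisting">#include <stack>
#include <string>
#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>

namespace xp = boost::xpressive;

// The push_impl function object from the text above.
struct push_impl
{
    // Result type, needed for result_of
    typedef void result_type;

    template<typename Sequence, typename Value>
    void operator()(Sequence &seq, Value const &val) const
    {
        seq.push(val);
    }
};

// Global, statically initialized "push" function object.
xp::function<push_impl>::type const push = {{}};

// Hypothetical helper: matches whitespace-separated integers and,
// through the semantic action, pushes each one onto a stack.
std::stack<int> collect_ints(std::string const &str)
{
    std::stack<int> ints;
    xp::sregex num = (+xp::_d)[push(xp::ref(ints), xp::as<int>(xp::_))];
    xp::sregex rex = num >> *(+xp::_s >> num);
    // The queued actions run only if the whole string matches.
    xp::regex_match(str, rex);
    return ints;
}
</pre> <p> Because the actions are lazy, nothing is pushed unless the whole match succeeds. On success, the integers are pushed in the order in which they appear, so the last one ends up on top of the stack. </p> <p>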
You can choose to write your semantic action in a different way that makes it look a bit more like a member function call: </p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">(+</span><span class="identifier">_d</span><span class="special">)[</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">ints</span><span class="special">)->*</span><span class="identifier">push</span><span class="special">(</span><span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">))];</span> </pre> <p> Xpressive recognizes the <code class="computeroutput"><span class="special">->*</span></code> operator and treats this expression exactly the same as the one above. </p> <p> When your function object must return a type that depends on its arguments, you can use a <code class="computeroutput"><span class="identifier">result</span><span class="special"><></span></code> member template instead of the <code class="computeroutput"><span class="identifier">result_type</span></code> typedef. Here, for example, is a <code class="computeroutput"><span class="identifier">first</span></code> function object that returns the <code class="computeroutput"><span class="identifier">first</span></code> member of a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">pair</span><span class="special"><></span></code> or <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code>: </p> <pre class="programlisting"><span class="comment">// Function object that returns the </span><span class="comment">// first element of a pair. 
</span><span class="keyword">struct</span> <span class="identifier">first_impl</span> <span class="special">{</span> <span class="keyword">template</span><span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Sig</span><span class="special">></span> <span class="keyword">struct</span> <span class="identifier">result</span> <span class="special">{};</span> <span class="keyword">template</span><span class="special"><</span><span class="keyword">typename</span> <span class="identifier">This</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">Pair</span><span class="special">></span> <span class="keyword">struct</span> <span class="identifier">result</span><span class="special"><</span><span class="identifier">This</span><span class="special">(</span><span class="identifier">Pair</span><span class="special">)></span> <span class="special">{</span> <span class="keyword">typedef</span> <span class="keyword">typename</span> <span class="identifier">remove_reference</span><span class="special"><</span><span class="identifier">Pair</span><span class="special">></span> <span class="special">::</span><span class="identifier">type</span><span class="special">::</span><span class="identifier">first_type</span> <span class="identifier">type</span><span class="special">;</span> <span class="special">};</span> <span class="keyword">template</span><span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Pair</span><span class="special">></span> <span class="keyword">typename</span> <span class="identifier">Pair</span><span class="special">::</span><span class="identifier">first_type</span> <span class="keyword">operator</span><span class="special">()(</span><span class="identifier">Pair</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">p</span><span class="special">)</span> <span class="keyword">const</span> <span 
class="special">{</span> <span class="keyword">return</span> <span class="identifier">p</span><span class="special">.</span><span class="identifier">first</span><span class="special">;</span> <span class="special">}</span> <span class="special">};</span> <span class="comment">// OK, use as first(s1) to get the begin iterator </span><span class="comment">// of the sub-match referred to by s1. </span><span class="identifier">function</span><span class="special"><</span><span class="identifier">first_impl</span><span class="special">>::</span><span class="identifier">type</span> <span class="keyword">const</span> <span class="identifier">first</span> <span class="special">=</span> <span class="special">{{}};</span> </pre> <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.referring_to_local_variables"></a><h4> <a name="id3125132"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.referring_to_local_variables">Referring to Local Variables</a> </h4> <p> As we've seen in the examples above, we can refer to local variables within actions using <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">ref</span><span class="special">()</span></code>. Any such variables are held by reference by the regular expression, and care should be taken to avoid letting those references dangle. 
For instance, in the following code, the reference to <code class="computeroutput"><span class="identifier">i</span></code> is left to dangle when <code class="computeroutput"><span class="identifier">bad_voodoo</span><span class="special">()</span></code> returns: </p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">bad_voodoo</span><span class="special">()</span> <span class="special">{</span> <span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span> <span class="comment">// ERROR! rex refers by reference to a local </span> <span class="comment">// variable, which will dangle after bad_voodoo() </span> <span class="comment">// returns. </span> <span class="keyword">return</span> <span class="identifier">rex</span><span class="special">;</span> <span class="special">}</span> </pre> <p> When writing semantic actions, it is your responsibility to make sure that none of the references dangle. One way to do that would be to make the variables shared pointers that are held by the regex by value. 
</p> <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">good_voodoo</span><span class="special">(</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">shared_ptr</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">pi</span><span class="special">)</span> <span class="special">{</span> <span class="comment">// Use val() to hold the shared_ptr by value: </span> <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++*</span><span class="identifier">val</span><span class="special">(</span><span class="identifier">pi</span><span class="special">)</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span> <span class="comment">// OK, rex holds a reference count to the integer. </span> <span class="keyword">return</span> <span class="identifier">rex</span><span class="special">;</span> <span class="special">}</span> </pre> <p> In the above code, we use <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">val</span><span class="special">()</span></code> to hold the shared pointer by value. That's not normally necessary because local variables appearing in actions are held by value by default, but in this case, it is necessary. Had we written the action as <code class="computeroutput"><span class="special">++*</span><span class="identifier">pi</span></code>, it would have executed immediately. 
That's because <code class="computeroutput"><span class="special">++*</span><span class="identifier">pi</span></code> is not an expression template, but <code class="computeroutput"><span class="special">++*</span><span class="identifier">val</span><span class="special">(</span><span class="identifier">pi</span><span class="special">)</span></code> is. </p> <p> It can be tedious to wrap all your variables in <code class="computeroutput"><span class="identifier">ref</span><span class="special">()</span></code> and <code class="computeroutput"><span class="identifier">val</span><span class="special">()</span></code> in your semantic actions. Xpressive provides the <code class="computeroutput"><span class="identifier">reference</span><span class="special"><></span></code> and <code class="computeroutput"><span class="identifier">value</span><span class="special"><></span></code> templates to make things easier. The following table shows the equivalencies: </p> <div class="table"> <a name="id3125687"></a><p class="title"><b>Table 29.12. reference<> and value<></b></p> <div class="table-contents"><table class="table" summary="reference<> and value<>"> <colgroup> <col> <col> </colgroup> <thead><tr> <th> <p> This ... </p> </th> <th> <p> ... is equivalent to this ... 
</p> </th> </tr></thead> <tbody> <tr> <td> <p> </p> <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span></pre> <p> </p> </td> <td> <p> </p> <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">reference</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">ri</span><span class="special">(</span><span class="identifier">i</span><span class="special">);</span> <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">ri</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span></pre> <p> </p> </td> </tr> <tr> <td> <p> </p> <pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">shared_ptr</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">pi</span><span class="special">(</span><span class="keyword">new</span> <span 
class="keyword">int</span><span class="special">(</span><span class="number">0</span><span class="special">));</span> <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++*</span><span class="identifier">val</span><span class="special">(</span><span class="identifier">pi</span><span class="special">)</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span></pre> <p> </p> </td> <td> <p> </p> <pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">shared_ptr</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">pi</span><span class="special">(</span><span class="keyword">new</span> <span class="keyword">int</span><span class="special">(</span><span class="number">0</span><span class="special">));</span> <span class="identifier">value</span><span class="special"><</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">shared_ptr</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="special">></span> <span class="identifier">vpi</span><span class="special">(</span><span class="identifier">pi</span><span class="special">);</span> <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++*</span><span class="identifier">vpi</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span></pre> <p> </p> </td> </tr> </tbody> </table></div> </div> <br 
class="table-break"><p> As you can see, when using <code class="computeroutput"><span class="identifier">reference</span><span class="special"><></span></code>, you need to first declare a local variable and then declare a <code class="computeroutput"><span class="identifier">reference</span><span class="special"><></span></code> to it. These two steps can be combined into one using <code class="computeroutput"><span class="identifier">local</span><span class="special"><></span></code>. </p> <div class="table"> <a name="id3126365"></a><p class="title"><b>Table 29.13. local<> vs. reference<></b></p> <div class="table-contents"><table class="table" summary="local<> vs. reference<>"> <colgroup> <col> <col> </colgroup> <thead><tr> <th> <p> This ... </p> </th> <th> <p> ... is equivalent to this ... </p> </th> </tr></thead> <tbody><tr> <td> <p> </p> <pre class="programlisting"><span class="identifier">local</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">i</span><span class="special">(</span><span class="number">0</span><span class="special">);</span> <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">i</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span></pre> <p> </p> </td> <td> <p> </p> <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">reference</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">ri</span><span class="special">(</span><span class="identifier">i</span><span 
class="special">);</span> <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">ri</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span></pre> <p> </p> </td> </tr></tbody> </table></div> </div> <br class="table-break"><p> We can use <code class="computeroutput"><span class="identifier">local</span><span class="special"><></span></code> to rewrite the above example as follows: </p> <pre class="programlisting"><span class="identifier">local</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">i</span><span class="special">(</span><span class="number">0</span><span class="special">);</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"1!2!3?"</span><span class="special">);</span> <span class="comment">// count the exciting digits, but not the </span><span class="comment">// questionable ones. 
</span><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">i</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span> <span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">rex</span><span class="special">);</span> <span class="identifier">assert</span><span class="special">(</span> <span class="identifier">i</span><span class="special">.</span><span class="identifier">get</span><span class="special">()</span> <span class="special">==</span> <span class="number">2</span> <span class="special">);</span> </pre> <p> Notice that we use <code class="computeroutput"><span class="identifier">local</span><span class="special"><>::</span><span class="identifier">get</span><span class="special">()</span></code> to access the value of the local variable. Also, beware that <code class="computeroutput"><span class="identifier">local</span><span class="special"><></span></code> can be used to create a dangling reference, just as <code class="computeroutput"><span class="identifier">reference</span><span class="special"><></span></code> can. 
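</p> <p> To see why the assertion above expects 2, it may help to spell out by hand what <code class="computeroutput">+( _d >> '!' )</code> consumes in <code class="computeroutput">"1!2!3?"</code>. The following is a plain C++ sketch with a hypothetical helper, <code class="computeroutput">count_exciting_digits</code>; it assumes the leftmost-match behavior of <code class="computeroutput">regex_search()</code> and treats <code class="computeroutput">_d</code> as an ASCII digit. It only mirrors the result and is not how xpressive is implemented. </p>

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of xpressive): counts the digit/'!'
// pairs that +( _d >> '!' ) would consume in the first match found
// by regex_search(), i.e. how many times the ++i action would run.
int count_exciting_digits(std::string const& str)
{
    // True if a digit immediately followed by '!' starts at position p.
    auto is_pair_at = [&str](std::string::size_type p) {
        return p + 1 < str.size()
            && std::isdigit(static_cast<unsigned char>(str[p])) != 0
            && str[p + 1] == '!';
    };

    // regex_search() reports the leftmost match, so skip ahead to the
    // first position where a digit/'!' pair begins.
    std::string::size_type pos = 0;
    while (pos < str.size() && !is_pair_at(pos))
        ++pos;

    // The greedy + then takes as many consecutive pairs as it can.
    // A digit not followed by '!' (such as the 3 in "1!2!3?") ends the
    // repetition and is not counted.
    int count = 0;
    while (is_pair_at(pos)) {
        ++count;   // corresponds to one execution of ++i
        pos += 2;  // consume the digit and the '!'
    }
    return count;
}
```

<p> For <code class="computeroutput">"1!2!3?"</code> the match is <code class="computeroutput">"1!2!"</code>, so the action fires twice. Because actions run only after the overall match succeeds, the digit 3, which <code class="computeroutput">_d</code> briefly consumes before <code class="computeroutput">'!'</code> fails against <code class="computeroutput">'?'</code>, does not increment the counter.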
</p> <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.referring_to_non_local_variables"></a><h4> <a name="id3126957"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.referring_to_non_local_variables">Referring to Non-Local Variables</a> </h4> <p> At the beginning of this section, we used a regex with a semantic action to parse a string of word/integer pairs and stuff them into a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><></span></code>. That required that the map and the regex be defined together and used before either could go out of scope. What if we wanted to define the regex once and use it to fill lots of different maps? We would prefer to pass the map into the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> algorithm rather than embed a reference to it directly in the regex object. To do that, we can define a placeholder and use it in the semantic action in place of the map itself. Later, when we call one of the regex algorithms, we bind the placeholder to an actual map object. The following code shows how. 
</p> <pre class="programlisting"><span class="comment">// Define a placeholder for a map object: </span><span class="identifier">placeholder</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="special">></span> <span class="identifier">_map</span><span class="special">;</span> <span class="comment">// Match a word and an integer, separated by =>, </span><span class="comment">// and then stuff the result into a std::map<> </span><span class="identifier">sregex</span> <span class="identifier">pair</span> <span class="special">=</span> <span class="special">(</span> <span class="special">(</span><span class="identifier">s1</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="string">"=>"</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s2</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_d</span><span class="special">)</span> <span class="special">)</span> <span class="special">[</span> <span class="identifier">_map</span><span class="special">[</span><span class="identifier">s1</span><span class="special">]</span> <span class="special">=</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">s2</span><span class="special">)</span> <span class="special">];</span> <span class="comment">// Match one or more word/integer pairs, separated </span><span class="comment">// by whitespace. 
</span><span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="identifier">pair</span> <span class="special">>></span> <span class="special">*(+</span><span class="identifier">_s</span> <span class="special">>></span> <span class="identifier">pair</span><span class="special">);</span> <span class="comment">// The string to parse </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"aaa=>1 bbb=>23 ccc=>456"</span><span class="special">);</span> <span class="comment">// Here is the actual map to fill in: </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="identifier">result</span><span class="special">;</span> <span class="comment">// Bind the _map placeholder to the actual map </span><span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">let</span><span class="special">(</span> <span class="identifier">_map</span> <span class="special">=</span> <span class="identifier">result</span> <span class="special">);</span> <span class="comment">// Execute the match and fill in result map </span><span class="keyword">if</span><span class="special">(</span><span class="identifier">regex_match</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">))</span> <span 
class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"aaa"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"bbb"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"ccc"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="special">}</span> </pre> <p> This program displays: </p> <pre class="programlisting">1 23 456 </pre> <p> We use <code class="computeroutput"><span class="identifier">placeholder</span><span class="special"><></span></code> here to define <code class="computeroutput"><span class="identifier">_map</span></code>, which stands in for a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><></span></code> variable. We can use the placeholder in the semantic action as if it were a map. 
Then, we define a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object and bind an actual map to the placeholder with "<code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">let</span><span class="special">(</span> <span class="identifier">_map</span> <span class="special">=</span> <span class="identifier">result</span> <span class="special">);</span></code>". The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> call behaves as if the placeholder in the semantic action had been replaced with a reference to <code class="computeroutput"><span class="identifier">result</span></code>. </p> <div class="note"><table border="0" summary="Note"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> <th align="left">Note</th> </tr> <tr><td align="left" valign="top"><p> Placeholders in semantic actions are not <span class="emphasis"><em>actually</em></span> replaced at runtime with references to variables. The regex object is never mutated in any way during any of the regex algorithms, so regex objects are safe to use from multiple threads. </p></td></tr> </table></div> <p> The syntax for late-bound action arguments is a little different if you are using <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" title="Struct template regex_iterator">regex_iterator<></a></code></code> or <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code>. 
The regex iterators accept an extra constructor parameter for specifying the argument bindings. There is a <code class="computeroutput"><span class="identifier">let</span><span class="special">()</span></code> function that you can use to bind variables to their placeholders. The following code demonstrates how. </p> <pre class="programlisting"><span class="comment">// Define a placeholder for a map object: </span><span class="identifier">placeholder</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="special">></span> <span class="identifier">_map</span><span class="special">;</span> <span class="comment">// Match a word and an integer, separated by =>, </span><span class="comment">// and then stuff the result into a std::map<> </span><span class="identifier">sregex</span> <span class="identifier">pair</span> <span class="special">=</span> <span class="special">(</span> <span class="special">(</span><span class="identifier">s1</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="string">"=>"</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s2</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_d</span><span class="special">)</span> <span class="special">)</span> <span class="special">[</span> <span class="identifier">_map</span><span class="special">[</span><span class="identifier">s1</span><span class="special">]</span> <span class="special">=</span> <span class="identifier">as</span><span class="special"><</span><span 
class="keyword">int</span><span class="special">>(</span><span class="identifier">s2</span><span class="special">)</span> <span class="special">];</span> <span class="comment">// The string to parse </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"aaa=>1 bbb=>23 ccc=>456"</span><span class="special">);</span> <span class="comment">// Here is the actual map to fill in: </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="identifier">result</span><span class="special">;</span> <span class="comment">// Create a regex_iterator to find all the matches </span><span class="identifier">sregex_iterator</span> <span class="identifier">it</span><span class="special">(</span><span class="identifier">str</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">pair</span><span class="special">,</span> <span class="identifier">let</span><span class="special">(</span><span class="identifier">_map</span><span class="special">=</span><span class="identifier">result</span><span class="special">));</span> <span class="identifier">sregex_iterator</span> <span class="identifier">end</span><span class="special">;</span> <span class="comment">// step through all the matches, and fill in </span><span class="comment">// the result map </span><span class="keyword">while</span><span class="special">(</span><span class="identifier">it</span> <span 
class="special">!=</span> <span class="identifier">end</span><span class="special">)</span> <span class="special">++</span><span class="identifier">it</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"aaa"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"bbb"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"ccc"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> </pre> <p> This program displays: </p> <pre class="programlisting">1 23 456 </pre> <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.user_defined_assertions"></a><h3> <a name="id3128897"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.user_defined_assertions">User-Defined Assertions</a> </h3> <p> You are probably already familiar with regular expression <span class="emphasis"><em>assertions</em></span>. In Perl, some examples are the <code class="literal">^</code> and <code class="literal">$</code> assertions, which you can use to match the beginning and end of a string, respectively. 
Xpressive lets you define your own assertions. A custom assertion is a condition that must be true at a point in the match for the match to succeed. You can check a custom assertion with xpressive's <code class="literal"><code class="computeroutput">check()</code></code> function. </p> <p> There are a couple of ways to define a custom assertion. The simplest is to use a function object. Let's say that you want to ensure that a sub-expression matches a sub-string that is either 3 or 6 characters long. The following struct defines such a predicate: </p> <pre class="programlisting"><span class="comment">// A predicate that is true IFF a sub-match is </span><span class="comment">// either 3 or 6 characters long. </span><span class="keyword">struct</span> <span class="identifier">three_or_six</span> <span class="special">{</span> <span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">()(</span><span class="identifier">ssub_match</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">sub</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span> <span class="keyword">return</span> <span class="identifier">sub</span><span class="special">.</span><span class="identifier">length</span><span class="special">()</span> <span class="special">==</span> <span class="number">3</span> <span class="special">||</span> <span class="identifier">sub</span><span class="special">.</span><span class="identifier">length</span><span class="special">()</span> <span class="special">==</span> <span class="number">6</span><span class="special">;</span> <span class="special">}</span> <span class="special">};</span> </pre> <p> You can use this predicate within a regular expression as follows: </p> <pre class="programlisting"><span class="comment">// match words of 3 characters or 6 characters. 
</span><span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">bow</span> <span class="special">>></span> <span class="special">+</span><span class="identifier">_w</span> <span class="special">>></span> <span class="identifier">eow</span><span class="special">)[</span> <span class="identifier">check</span><span class="special">(</span><span class="identifier">three_or_six</span><span class="special">())</span> <span class="special">]</span> <span class="special">;</span> </pre> <p> The above regular expression will find whole words that are either 3 or 6 characters long. The <code class="computeroutput"><span class="identifier">three_or_six</span></code> predicate accepts a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> that refers back to the part of the string matched by the sub-expression to which the custom assertion is attached. </p> <div class="note"><table border="0" summary="Note"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> <th align="left">Note</th> </tr> <tr><td align="left" valign="top"><p> The custom assertion participates in determining whether the match succeeds or fails. Unlike actions, which execute lazily, custom assertions execute immediately while the regex engine is searching for a match. </p></td></tr> </table></div> <p> Custom assertions can also be defined inline using the same syntax as for semantic actions. Below is the same custom assertion written inline: </p> <pre class="programlisting"><span class="comment">// match words of 3 characters or 6 characters. 
</span><span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">bow</span> <span class="special">>></span> <span class="special">+</span><span class="identifier">_w</span> <span class="special">>></span> <span class="identifier">eow</span><span class="special">)[</span> <span class="identifier">check</span><span class="special">(</span><span class="identifier">length</span><span class="special">(</span><span class="identifier">_</span><span class="special">)==</span><span class="number">3</span> <span class="special">||</span> <span class="identifier">length</span><span class="special">(</span><span class="identifier">_</span><span class="special">)==</span><span class="number">6</span><span class="special">)</span> <span class="special">]</span> <span class="special">;</span> </pre> <p> In the above, <code class="computeroutput"><span class="identifier">length</span><span class="special">()</span></code> is a lazy function that calls the <code class="computeroutput"><span class="identifier">length</span><span class="special">()</span></code> member function of its argument, and <code class="computeroutput"><span class="identifier">_</span></code> is a placeholder that receives the <code class="computeroutput"><span class="identifier">sub_match</span></code>. </p> <p> Once you get the hang of writing custom assertions inline, they can be very powerful. For example, you can write a regular expression that only matches valid dates (for some suitably liberal definition of the term <span class="quote">“<span class="quote">valid</span>”</span>). 
</p> <pre class="programlisting"><span class="keyword">int</span> <span class="keyword">const</span> <span class="identifier">days_per_month</span><span class="special">[]</span> <span class="special">=</span> <span class="special">{</span><span class="number">31</span><span class="special">,</span> <span class="number">29</span><span class="special">,</span> <span class="number">31</span><span class="special">,</span> <span class="number">30</span><span class="special">,</span> <span class="number">31</span><span class="special">,</span> <span class="number">30</span><span class="special">,</span> <span class="number">31</span><span class="special">,</span> <span class="number">31</span><span class="special">,</span> <span class="number">30</span><span class="special">,</span> <span class="number">31</span><span class="special">,</span> <span class="number">30</span><span class="special">,</span> <span class="number">31</span><span class="special">};</span> <span class="identifier">mark_tag</span> <span class="identifier">month</span><span class="special">(</span><span class="number">1</span><span class="special">),</span> <span class="identifier">day</span><span class="special">(</span><span class="number">2</span><span class="special">);</span> <span class="comment">// find a valid date of the form month/day/year. 
</span><span class="identifier">sregex</span> <span class="identifier">date</span> <span class="special">=</span> <span class="special">(</span> <span class="comment">// Month must be between 1 and 12 inclusive </span> <span class="special">(</span><span class="identifier">month</span><span class="special">=</span> <span class="identifier">_d</span> <span class="special">>></span> <span class="special">!</span><span class="identifier">_d</span><span class="special">)</span> <span class="special">[</span> <span class="identifier">check</span><span class="special">(</span><span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">)</span> <span class="special">>=</span> <span class="number">1</span> <span class="special">&&</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">)</span> <span class="special"><=</span> <span class="number">12</span><span class="special">)</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'/'</span> <span class="comment">// Day must be between 1 and 31 inclusive </span> <span class="special">>></span> <span class="special">(</span><span class="identifier">day</span><span class="special">=</span> <span class="identifier">_d</span> <span class="special">>></span> <span class="special">!</span><span class="identifier">_d</span><span class="special">)</span> <span class="special">[</span> <span class="identifier">check</span><span class="special">(</span><span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">)</span> <span class="special">>=</span> <span class="number">1</span> <span class="special">&&</span> <span 
class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">)</span> <span class="special"><=</span> <span class="number">31</span><span class="special">)</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'/'</span> <span class="comment">// Only consider years between 1970 and 2038 </span> <span class="special">>></span> <span class="special">(</span><span class="identifier">_d</span> <span class="special">>></span> <span class="identifier">_d</span> <span class="special">>></span> <span class="identifier">_d</span> <span class="special">>></span> <span class="identifier">_d</span><span class="special">)</span> <span class="special">[</span> <span class="identifier">check</span><span class="special">(</span><span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">)</span> <span class="special">>=</span> <span class="number">1970</span> <span class="special">&&</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">)</span> <span class="special"><=</span> <span class="number">2038</span><span class="special">)</span> <span class="special">]</span> <span class="special">)</span> <span class="comment">// Ensure the month actually has that many days! 
</span> <span class="special">[</span> <span class="identifier">check</span><span class="special">(</span> <span class="identifier">ref</span><span class="special">(</span><span class="identifier">days_per_month</span><span class="special">)[</span><span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">month</span><span class="special">)-</span><span class="number">1</span><span class="special">]</span> <span class="special">>=</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">day</span><span class="special">)</span> <span class="special">)</span> <span class="special">]</span> <span class="special">;</span> <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"99/99/9999 2/30/2006 2/28/2006"</span><span class="special">);</span> <span class="keyword">if</span><span class="special">(</span><span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">date</span><span class="special">))</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> <span 
class="special">}</span> </pre> <p> The above program prints out the following: </p> <pre class="programlisting">2/28/2006 </pre> <p> Notice how the inline custom assertions are used to range-check the values for the month, day and year. The regular expression doesn't match <code class="computeroutput"><span class="string">"99/99/9999"</span></code> or <code class="computeroutput"><span class="string">"2/30/2006"</span></code> because they are not valid dates. (There is no 99th month, and February doesn't have 30 days.) </p> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.symbol_tables_and_attributes"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.symbol_tables_and_attributes" title="Symbol Tables and Attributes">Symbol Tables and Attributes</a> </h3></div></div></div> <a name="boost_xpressive.user_s_guide.symbol_tables_and_attributes.overview"></a><h3> <a name="id3130543"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.symbol_tables_and_attributes.overview">Overview</a> </h3> <p> Symbol tables can be built into xpressive regular expressions with just a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><></span></code>. The map keys are the strings to be matched and the map values are the data to be returned to your semantic action. Xpressive attributes, named <code class="computeroutput"><span class="identifier">a1</span></code> through <code class="computeroutput"><span class="identifier">a9</span></code>, hold the value corresponding to a matching key so that it can be used in a semantic action. An attribute can also be given a default value to use when no symbol is found. 
</p> <a name="boost_xpressive.user_s_guide.symbol_tables_and_attributes.symbol_tables"></a><h3> <a name="id3130625"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.symbol_tables_and_attributes.symbol_tables">Symbol Tables</a> </h3> <p> An xpressive symbol table is just a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><></span></code>, where the key is a string type and the value can be anything. For example, the following regular expression matches a key from map1 and assigns the corresponding value to the attribute <code class="computeroutput"><span class="identifier">a1</span></code>. Then, in the semantic action, it assigns the value stored in attribute <code class="computeroutput"><span class="identifier">a1</span></code> to an integer result. </p> <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">result</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="identifier">map1</span><span class="special">;</span> <span class="comment">// ... 
(fill the map) </span><span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">(</span> <span class="identifier">a1</span> <span class="special">=</span> <span class="identifier">map1</span> <span class="special">)</span> <span class="special">[</span> <span class="identifier">ref</span><span class="special">(</span><span class="identifier">result</span><span class="special">)</span> <span class="special">=</span> <span class="identifier">a1</span> <span class="special">];</span> </pre> <p> Consider the following example code, which translates number names into integers. It is described below. </p> <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">string</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">map</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">regex_actions</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> <span class="keyword">int</span> <span class="identifier">main</span><span 
class="special">()</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="identifier">number_map</span><span class="special">;</span> <span class="identifier">number_map</span><span class="special">[</span><span class="string">"one"</span><span class="special">]</span> <span class="special">=</span> <span class="number">1</span><span class="special">;</span> <span class="identifier">number_map</span><span class="special">[</span><span class="string">"two"</span><span class="special">]</span> <span class="special">=</span> <span class="number">2</span><span class="special">;</span> <span class="identifier">number_map</span><span class="special">[</span><span class="string">"three"</span><span class="special">]</span> <span class="special">=</span> <span class="number">3</span><span class="special">;</span> <span class="comment">// Match a string from number_map </span> <span class="comment">// and store the integer value in 'result' </span> <span class="comment">// if not found, store -1 in 'result' </span> <span class="keyword">int</span> <span class="identifier">result</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">cregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">((</span><span class="identifier">a1</span> <span class="special">=</span> <span class="identifier">number_map</span> <span class="special">)</span> <span class="special">|</span> <span class="special">*</span><span class="identifier">_</span><span class="special">)</span> <span class="special">[</span> <span class="identifier">ref</span><span 
class="special">(</span><span class="identifier">result</span><span class="special">)</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">a1</span> <span class="special">|</span> <span class="special">-</span><span class="number">1</span><span class="special">)];</span> <span class="identifier">regex_match</span><span class="special">(</span><span class="string">"three"</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">);</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="identifier">regex_match</span><span class="special">(</span><span class="string">"two"</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">);</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="identifier">regex_match</span><span class="special">(</span><span class="string">"stuff"</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">);</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> <span class="special">}</span> </pre> <p> This program prints the following: </p> <pre class="programlisting">3 2 -1 </pre> <p> First the program builds a number map, with 
number names as string keys and the corresponding integers as values. Then it constructs a static regular expression using an attribute <code class="computeroutput"><span class="identifier">a1</span></code> to represent the result of the symbol table lookup. In the semantic action, the attribute is assigned to an integer variable <code class="computeroutput"><span class="identifier">result</span></code>. If the symbol was not found, a default value of <code class="computeroutput"><span class="special">-</span><span class="number">1</span></code> is assigned to <code class="computeroutput"><span class="identifier">result</span></code>. A wildcard, <code class="computeroutput"><span class="special">*</span><span class="identifier">_</span></code>, makes sure the regex matches even if the symbol is not found. </p> <p> A more complete version of this example can be found in <code class="literal">libs/xpressive/example/numbers.cpp</code><sup>[<a name="id3131674" href="#ftn.id3131674" class="footnote">5</a>]</sup>. It translates number names up to "nine hundred ninety nine million nine hundred ninety nine thousand nine hundred ninety nine" along with some special number names like "dozen". </p> <p> Symbol table matches are case sensitive by default, but they can be made case-insensitive by enclosing the expression in <code class="computeroutput"><span class="identifier">icase</span><span class="special">()</span></code>. </p> <a name="boost_xpressive.user_s_guide.symbol_tables_and_attributes.attributes"></a><h3> <a name="id3131715"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.symbol_tables_and_attributes.attributes">Attributes</a> </h3> <p> Up to nine attributes can be used in a regular expression. 
They are named <code class="computeroutput"><span class="identifier">a1</span></code>, <code class="computeroutput"><span class="identifier">a2</span></code>, ..., <code class="computeroutput"><span class="identifier">a9</span></code> in the <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span></code> namespace. The type of an attribute is the mapped type (the second template argument) of the map assigned to it; for example, an attribute assigned a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span></code> has type <code class="computeroutput"><span class="keyword">int</span></code>. A default value for an attribute can be specified in a semantic action with the syntax <code class="computeroutput"><span class="special">(</span><span class="identifier">a1</span> <span class="special">|</span> <em class="replaceable"><code>default-value</code></em><span class="special">)</span></code>. </p> <p> Attributes are properly scoped, so you can do crazy things like: <code class="computeroutput"><span class="special">(</span> <span class="special">(</span><span class="identifier">a1</span><span class="special">=</span><span class="identifier">sym1</span><span class="special">)</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">a1</span><span class="special">=</span><span class="identifier">sym2</span><span class="special">)[</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">x</span><span class="special">)=</span><span class="identifier">a1</span><span class="special">]</span> <span class="special">)[</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">y</span><span class="special">)=</span><span class="identifier">a1</span><span class="special">]</span></code>. The inner semantic action sees the inner <code class="computeroutput"><span class="identifier">a1</span></code>, and the outer semantic action sees the outer one. They can even have different types. 
</p> <div class="note"><table border="0" summary="Note"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> <th align="left">Note</th> </tr> <tr><td align="left" valign="top"><p> Xpressive builds a hidden ternary search trie from the map so it can search quickly. If BOOST_DISABLE_THREADS is defined, the hidden ternary search trie "self adjusts", so after each search it restructures itself to improve the efficiency of future searches based on the frequency of previous searches. </p></td></tr> </table></div> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.localization_and_regex_traits"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits" title="Localization and Regex Traits">Localization and Regex Traits</a> </h3></div></div></div> <a name="boost_xpressive.user_s_guide.localization_and_regex_traits.overview"></a><h3> <a name="id3131979"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits.overview">Overview</a> </h3> <p> Matching a regular expression against a string often requires locale-dependent information. For example, how are case-insensitive comparisons performed? The locale-sensitive behavior is captured in a traits class. xpressive provides three traits class templates: <code class="computeroutput"><span class="identifier">cpp_regex_traits</span><span class="special"><></span></code>, <code class="computeroutput"><span class="identifier">c_regex_traits</span><span class="special"><></span></code> and <code class="computeroutput"><span class="identifier">null_regex_traits</span><span class="special"><></span></code>. 
The first wraps a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code>, the second wraps the global C locale, and the third is a stub traits type for use when searching non-character data. All traits templates conform to the <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts.traits_requirements">Regex Traits Concept</a>. </p> <a name="boost_xpressive.user_s_guide.localization_and_regex_traits.setting_the_default_regex_trait"></a><h3> <a name="id3132082"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits.setting_the_default_regex_trait">Setting the Default Regex Trait</a> </h3> <p> By default, xpressive uses <code class="computeroutput"><span class="identifier">cpp_regex_traits</span><span class="special"><></span></code> for all patterns. This causes all regex objects to use the global <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code>. If you compile with <code class="computeroutput"><span class="identifier">BOOST_XPRESSIVE_USE_C_TRAITS</span></code> defined, then xpressive will use <code class="computeroutput"><span class="identifier">c_regex_traits</span><span class="special"><></span></code> by default. </p> <a name="boost_xpressive.user_s_guide.localization_and_regex_traits.using_custom_traits_with_dynamic_regexes"></a><h3> <a name="id3132169"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits.using_custom_traits_with_dynamic_regexes">Using Custom Traits with Dynamic Regexes</a> </h3> <p> To create a dynamic regex that uses a custom traits object, you must use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code>. 
The basic steps are shown in the following example: </p> <pre class="programlisting"><span class="comment">// Declare a regex_compiler that uses the global C locale </span><span class="identifier">regex_compiler</span><span class="special"><</span><span class="keyword">char</span> <span class="keyword">const</span> <span class="special">*,</span> <span class="identifier">c_regex_traits</span><span class="special"><</span><span class="keyword">char</span><span class="special">></span> <span class="special">></span> <span class="identifier">crxcomp</span><span class="special">;</span> <span class="identifier">cregex</span> <span class="identifier">crx</span> <span class="special">=</span> <span class="identifier">crxcomp</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"\\w+"</span> <span class="special">);</span> <span class="comment">// Declare a regex_compiler that uses a custom std::locale </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span> <span class="identifier">loc</span> <span class="special">=</span> <span class="comment">/* ... create a locale here ... 
*/</span><span class="special">;</span> <span class="identifier">regex_compiler</span><span class="special"><</span><span class="keyword">char</span> <span class="keyword">const</span> <span class="special">*,</span> <span class="identifier">cpp_regex_traits</span><span class="special"><</span><span class="keyword">char</span><span class="special">></span> <span class="special">></span> <span class="identifier">cpprxcomp</span><span class="special">(</span><span class="identifier">loc</span><span class="special">);</span> <span class="identifier">cregex</span> <span class="identifier">cpprx</span> <span class="special">=</span> <span class="identifier">cpprxcomp</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"\\w+"</span> <span class="special">);</span> </pre> <p> The <code class="computeroutput"><span class="identifier">regex_compiler</span></code> objects act as regex factories. Once they have been imbued with a locale, every regex object they create will use that locale. </p> <a name="boost_xpressive.user_s_guide.localization_and_regex_traits.using_custom_traits_with_static_regexes"></a><h3> <a name="id3132502"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits.using_custom_traits_with_static_regexes">Using Custom Traits with Static Regexes</a> </h3> <p> If you want a particular static regex to use a different set of traits, you can use the special <code class="computeroutput"><span class="identifier">imbue</span><span class="special">()</span></code> pattern modifier. 
For instance: </p> <pre class="programlisting"><span class="comment">// Define a regex that uses the global C locale </span><span class="identifier">c_regex_traits</span><span class="special"><</span><span class="keyword">char</span><span class="special">></span> <span class="identifier">ctraits</span><span class="special">;</span> <span class="identifier">sregex</span> <span class="identifier">crx</span> <span class="special">=</span> <span class="identifier">imbue</span><span class="special">(</span><span class="identifier">ctraits</span><span class="special">)(</span> <span class="special">+</span><span class="identifier">_w</span> <span class="special">);</span> <span class="comment">// Define a regex that uses a customized std::locale </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span> <span class="identifier">loc</span> <span class="special">=</span> <span class="comment">/* ... create a locale here ... */</span><span class="special">;</span> <span class="identifier">cpp_regex_traits</span><span class="special"><</span><span class="keyword">char</span><span class="special">></span> <span class="identifier">cpptraits</span><span class="special">(</span><span class="identifier">loc</span><span class="special">);</span> <span class="identifier">sregex</span> <span class="identifier">cpprx1</span> <span class="special">=</span> <span class="identifier">imbue</span><span class="special">(</span><span class="identifier">cpptraits</span><span class="special">)(</span> <span class="special">+</span><span class="identifier">_w</span> <span class="special">);</span> <span class="comment">// A shorthand for above </span><span class="identifier">sregex</span> <span class="identifier">cpprx2</span> <span class="special">=</span> <span class="identifier">imbue</span><span class="special">(</span><span class="identifier">loc</span><span class="special">)(</span> <span class="special">+</span><span 
class="identifier">_w</span> <span class="special">);</span> </pre> <p> The <code class="computeroutput"><span class="identifier">imbue</span><span class="special">()</span></code> pattern modifier must wrap the entire pattern. It is an error to <code class="computeroutput"><span class="identifier">imbue</span></code> only part of a static regex. For example: </p> <pre class="programlisting"><span class="comment">// ERROR! Cannot imbue() only part of a regex </span><span class="identifier">sregex</span> <span class="identifier">error</span> <span class="special">=</span> <span class="identifier">_w</span> <span class="special">>></span> <span class="identifier">imbue</span><span class="special">(</span><span class="identifier">loc</span><span class="special">)(</span> <span class="identifier">_w</span> <span class="special">);</span> </pre> <a name="boost_xpressive.user_s_guide.localization_and_regex_traits.searching_non_character_data_with__literal_null_regex_traits__literal_"></a><h3> <a name="id3132915"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits.searching_non_character_data_with__literal_null_regex_traits__literal_">Searching Non-Character Data With <code class="literal">null_regex_traits</code></a> </h3> <p> With xpressive static regexes, you are not limited to searching for patterns in character sequences. You can search for patterns in raw bytes, integers, or anything that conforms to the <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts.chart_requirements">Char Concept</a>. The <code class="computeroutput"><span class="identifier">null_regex_traits</span><span class="special"><></span></code> template makes this simple. It is a stub implementation of the <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts.traits_requirements">Regex Traits Concept</a>. It recognizes no character classes and performs no case mapping. 
</p> <p> For example, with <code class="computeroutput"><span class="identifier">null_regex_traits</span><span class="special"><></span></code>, you can write a static regex to find a pattern in a sequence of integers as follows: </p> <pre class="programlisting"><span class="comment">// some integral data to search </span><span class="keyword">int</span> <span class="keyword">const</span> <span class="identifier">data</span><span class="special">[]</span> <span class="special">=</span> <span class="special">{</span><span class="number">0</span><span class="special">,</span> <span class="number">1</span><span class="special">,</span> <span class="number">2</span><span class="special">,</span> <span class="number">3</span><span class="special">,</span> <span class="number">4</span><span class="special">,</span> <span class="number">5</span><span class="special">,</span> <span class="number">6</span><span class="special">};</span> <span class="comment">// create a null_regex_traits<> object for searching integers ... </span><span class="identifier">null_regex_traits</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">nul</span><span class="special">;</span> <span class="comment">// imbue a regex object with the null_regex_traits ... 
</span><span class="identifier">basic_regex</span><span class="special"><</span><span class="keyword">int</span> <span class="keyword">const</span> <span class="special">*></span> <span class="identifier">rex</span> <span class="special">=</span> <span class="identifier">imbue</span><span class="special">(</span><span class="identifier">nul</span><span class="special">)(</span><span class="number">1</span> <span class="special">>></span> <span class="special">+((</span><span class="identifier">set</span><span class="special">=</span> <span class="number">2</span><span class="special">,</span><span class="number">3</span><span class="special">)</span> <span class="special">|</span> <span class="number">4</span><span class="special">)</span> <span class="special">>></span> <span class="number">5</span><span class="special">);</span> <span class="identifier">match_results</span><span class="special"><</span><span class="keyword">int</span> <span class="keyword">const</span> <span class="special">*></span> <span class="identifier">what</span><span class="special">;</span> <span class="comment">// search for the pattern in the array of integers ... 
</span><span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">data</span><span class="special">,</span> <span class="identifier">data</span> <span class="special">+</span> <span class="number">7</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">rex</span><span class="special">);</span> <span class="identifier">assert</span><span class="special">(</span><span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">].</span><span class="identifier">matched</span><span class="special">);</span> <span class="identifier">assert</span><span class="special">(*</span><span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">].</span><span class="identifier">first</span> <span class="special">==</span> <span class="number">1</span><span class="special">);</span> <span class="identifier">assert</span><span class="special">(*</span><span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">].</span><span class="identifier">second</span> <span class="special">==</span> <span class="number">6</span><span class="special">);</span> </pre> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.tips_n_tricks"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks" title="Tips 'N Tricks"> Tips 'N Tricks</a> </h3></div></div></div> <p> Squeeze the most performance out of xpressive with these tips and tricks. 
</p> <a name="boost_xpressive.user_s_guide.tips_n_tricks.compile_patterns_once_and_reuse_them"></a><h3> <a name="id3133533"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.compile_patterns_once_and_reuse_them">Compile Patterns Once And Reuse Them</a> </h3> <p> Compiling a regex (dynamic or static) is <span class="emphasis"><em>far</em></span> more expensive than executing a match or search. If you have the option, prefer to compile a pattern into a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> object once and reuse it rather than recreating it over and over. </p> <p> Since <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> objects are not mutated by any of the regex algorithms, they are completely thread-safe once their initialization (and that of any grammars of which they are members) completes. The easiest way to reuse your patterns is to simply make your <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> objects "static const". 
</p> <a name="boost_xpressive.user_s_guide.tips_n_tricks.reuse__literal__classname_alt__boost__xpressive__match_results__match_results_lt__gt___classname___literal__objects"></a><h3> <a name="id3133623"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.reuse__literal__classname_alt__boost__xpressive__match_results__match_results_lt__gt___classname___literal__objects">Reuse <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> Objects</a> </h3> <p> The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object caches dynamically allocated memory. For this reason, it is far better to reuse the same <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object if you have to do many regex searches. </p> <p> Caveat: <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> objects are not thread-safe, so don't go wild reusing them across threads. 
</p> <a name="boost_xpressive.user_s_guide.tips_n_tricks.prefer_algorithms_that_take_a__literal__classname_alt__boost__xpressive__match_results__match_results_lt__gt___classname___literal__object"></a><h3> <a name="id3133721"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.prefer_algorithms_that_take_a__literal__classname_alt__boost__xpressive__match_results__match_results_lt__gt___classname___literal__object">Prefer Algorithms That Take A <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> Object</a> </h3> <p> This is a corollary to the previous tip. If you are doing multiple searches, you should prefer the regex algorithms that accept a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object over the ones that don't, and you should reuse the same <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object each time. If you don't provide a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> object, a temporary one will be created for you and discarded when the algorithm returns. Any memory cached in the object will be deallocated and will have to be reallocated the next time. 
</p> <a name="boost_xpressive.user_s_guide.tips_n_tricks.prefer_algorithms_that_accept_iterator_ranges_over_null_terminated_strings"></a><h3> <a name="id3133817"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.prefer_algorithms_that_accept_iterator_ranges_over_null_terminated_strings">Prefer Algorithms That Accept Iterator Ranges Over Null-Terminated Strings</a> </h3> <p> xpressive provides overloads of the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> and <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> algorithms that operate on C-style null-terminated strings. You should prefer the overloads that take iterator ranges. When you pass a null-terminated string to a regex algorithm, the end iterator is calculated immediately by calling <code class="computeroutput"><span class="identifier">strlen</span></code>. If you already know the length of the string, you can avoid this overhead by calling the regex algorithms with a <code class="computeroutput"><span class="special">[</span><span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span><span class="special">)</span></code> pair. </p> <a name="boost_xpressive.user_s_guide.tips_n_tricks.use_static_regexes"></a><h3> <a name="id3133915"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.use_static_regexes">Use Static Regexes</a> </h3> <p> On average, static regexes execute about 10 to 15% faster than their dynamic counterparts. It's worth familiarizing yourself with the static regex dialect. 
</p> <a name="boost_xpressive.user_s_guide.tips_n_tricks.understand__literal_syntax_option_type__optimize__literal_"></a><h3> <a name="id3133947"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.understand__literal_syntax_option_type__optimize__literal_">Understand <code class="literal">syntax_option_type::optimize</code></a> </h3> <p> The <code class="computeroutput"><span class="identifier">optimize</span></code> flag tells the regex compiler to spend some extra time analyzing the pattern. It can cause some patterns to execute faster, but it increases the time to compile the pattern, and often increases the amount of memory consumed by the pattern. If you plan to reuse your pattern, <code class="computeroutput"><span class="identifier">optimize</span></code> is usually a win. If you will only use the pattern once, don't use <code class="computeroutput"><span class="identifier">optimize</span></code>. </p> <a name="boost_xpressive.user_s_guide.tips_n_tricks.common_pitfalls"></a><h2> <a name="id3133954"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.common_pitfalls">Common Pitfalls</a> </h2> <p> Keep the following tips in mind to avoid stepping in potholes with xpressive. </p> <a name="boost_xpressive.user_s_guide.tips_n_tricks.create_grammars_on_a_single_thread"></a><h3> <a name="id3134038"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.create_grammars_on_a_single_thread">Create Grammars On A Single Thread</a> </h3> <p> With static regexes, you can create grammars by nesting regexes inside one another. When compiling the outer regex, both the outer and inner regex objects, and all the regex objects to which they refer either directly or indirectly, are modified. For this reason, it's dangerous for global regex objects to participate in grammars. It's best to build regex grammars from a single thread. 
Once built, the resulting regex grammar can be executed from multiple threads without problems. </p> <a name="boost_xpressive.user_s_guide.tips_n_tricks.beware_nested_quantifiers"></a><h3> <a name="id3134065"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.beware_nested_quantifiers">Beware Nested Quantifiers</a> </h3> <p> This is a pitfall common to many regular expression engines. Some patterns can cause exponentially bad performance. Often these patterns involve one quantified term nested within another quantifier, such as <code class="computeroutput"><span class="string">"(a*)*"</span></code>, although in many cases the problem is harder to spot. Beware of patterns that have nested quantifiers. </p> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.concepts"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts" title="Concepts">Concepts</a> </h3></div></div></div> <a name="boost_xpressive.user_s_guide.concepts.chart_requirements"></a><h3> <a name="id3134122"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts.chart_requirements">CharT requirements</a> </h3> <p> If type <code class="computeroutput"><span class="identifier">BidiIterT</span></code> is used as a template argument to <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code>, then <code class="computeroutput"><span class="identifier">CharT</span></code> is <code class="computeroutput"><span class="identifier">iterator_traits</span><span class="special"><</span><span class="identifier">BidiIterT</span><span class="special">>::</span><span class="identifier">value_type</span></code>.
Type <code class="computeroutput"><span class="identifier">CharT</span></code> must have a trivial default constructor, copy constructor, assignment operator, and destructor. In addition, the following requirements must be met for an object <code class="computeroutput"><span class="identifier">c</span></code> of type <code class="computeroutput"><span class="identifier">CharT</span></code>, objects <code class="computeroutput"><span class="identifier">c1</span></code> and <code class="computeroutput"><span class="identifier">c2</span></code> of type <code class="computeroutput"><span class="identifier">CharT</span> <span class="keyword">const</span></code>, and <code class="computeroutput"><span class="identifier">i</span></code> of type <code class="computeroutput"><span class="keyword">int</span></code>: </p> <div class="table"> <a name="id3134280"></a><p class="title"><b>Table 29.14. CharT Requirements</b></p> <div class="table-contents"><table class="table" summary="CharT Requirements"> <colgroup> <col> <col> <col> </colgroup> <thead><tr> <th> <p> <span class="bold"><strong>Expression</strong></span> </p> </th> <th> <p> <span class="bold"><strong>Return type</strong></span> </p> </th> <th> <p> <span class="bold"><strong>Assertion / Note / Pre- / Post-condition</strong></span> </p> </th> </tr></thead> <tbody> <tr> <td> <p> <code class="computeroutput"><span class="identifier">CharT</span> <span class="identifier">c</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">CharT</span></code> </p> </td> <td> <p> Default constructor (must be trivial). </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">CharT</span> <span class="identifier">c</span><span class="special">(</span><span class="identifier">c1</span><span class="special">)</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">CharT</span></code> </p> </td> <td> <p> Copy constructor (must be trivial).
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">c1</span> <span class="special">=</span> <span class="identifier">c2</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">CharT</span></code> </p> </td> <td> <p> Assignment operator (must be trivial). </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">c1</span> <span class="special">==</span> <span class="identifier">c2</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">bool</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">true</span></code> if <code class="computeroutput"><span class="identifier">c1</span></code> has the same value as <code class="computeroutput"><span class="identifier">c2</span></code>. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">c1</span> <span class="special">!=</span> <span class="identifier">c2</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">bool</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">true</span></code> if <code class="computeroutput"><span class="identifier">c1</span></code> and <code class="computeroutput"><span class="identifier">c2</span></code> are not equal. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">c1</span> <span class="special"><</span> <span class="identifier">c2</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">bool</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">true</span></code> if the value of <code class="computeroutput"><span class="identifier">c1</span></code> is less than <code class="computeroutput"><span class="identifier">c2</span></code>. 
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">c1</span> <span class="special">></span> <span class="identifier">c2</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">bool</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">true</span></code> if the value of <code class="computeroutput"><span class="identifier">c1</span></code> is greater than <code class="computeroutput"><span class="identifier">c2</span></code>. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">c1</span> <span class="special"><=</span> <span class="identifier">c2</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">bool</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">true</span></code> if <code class="computeroutput"><span class="identifier">c1</span></code> is less than or equal to <code class="computeroutput"><span class="identifier">c2</span></code>. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">c1</span> <span class="special">>=</span> <span class="identifier">c2</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">bool</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">true</span></code> if <code class="computeroutput"><span class="identifier">c1</span></code> is greater than or equal to <code class="computeroutput"><span class="identifier">c2</span></code>. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">intmax_t</span> <span class="identifier">i</span> <span class="special">=</span> <span class="identifier">c1</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">int</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">CharT</span></code> must be convertible to an integral type. 
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">CharT</span> <span class="identifier">c</span><span class="special">(</span><span class="identifier">i</span><span class="special">);</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">CharT</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">CharT</span></code> must be constructible from an integral type. </p> </td> </tr> </tbody> </table></div> </div> <br class="table-break"><a name="boost_xpressive.user_s_guide.concepts.traits_requirements"></a><h3> <a name="id3135108"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts.traits_requirements">Traits Requirements</a> </h3> <p> In the following table, <code class="computeroutput"><span class="identifier">X</span></code> denotes a traits class defining types and functions for the character container type <code class="computeroutput"><span class="identifier">CharT</span></code>; <code class="computeroutput"><span class="identifier">u</span></code> is an object of type <code class="computeroutput"><span class="identifier">X</span></code>; <code class="computeroutput"><span class="identifier">v</span></code> is an object of type <code class="computeroutput"><span class="keyword">const</span> <span class="identifier">X</span></code>; <code class="computeroutput"><span class="identifier">p</span></code> is a value of type <code class="computeroutput"><span class="keyword">const</span> <span class="identifier">CharT</span><span class="special">*</span></code>; <code class="computeroutput"><span class="identifier">I1</span></code> and <code class="computeroutput"><span class="identifier">I2</span></code> are <code class="computeroutput"><span class="identifier">Input</span> <span class="identifier">Iterators</span></code>; <code class="computeroutput"><span class="identifier">c</span></code> is a value of type <code
class="computeroutput"><span class="keyword">const</span> <span class="identifier">CharT</span></code>; <code class="computeroutput"><span class="identifier">s</span></code> is an object of type <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">string_type</span></code>; <code class="computeroutput"><span class="identifier">cs</span></code> is an object of type <code class="computeroutput"><span class="keyword">const</span> <span class="identifier">X</span><span class="special">::</span><span class="identifier">string_type</span></code>; <code class="computeroutput"><span class="identifier">b</span></code> is a value of type <code class="computeroutput"><span class="keyword">bool</span></code>; <code class="computeroutput"><span class="identifier">i</span></code> is a value of type <code class="computeroutput"><span class="keyword">int</span></code>; <code class="computeroutput"><span class="identifier">F1</span></code> and <code class="computeroutput"><span class="identifier">F2</span></code> are values of type <code class="computeroutput"><span class="keyword">const</span> <span class="identifier">CharT</span><span class="special">*</span></code>; <code class="computeroutput"><span class="identifier">loc</span></code> is an object of type <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">locale_type</span></code>; and <code class="computeroutput"><span class="identifier">ch</span></code> is an object of type <code class="computeroutput"><span class="keyword">const</span> <span class="keyword">char</span></code>. </p> <div class="table"> <a name="id3135453"></a><p class="title"><b>Table 29.15.
Traits Requirements</b></p> <div class="table-contents"><table class="table" summary="Traits Requirements"> <colgroup> <col> <col> <col> </colgroup> <thead><tr> <th> <p> <span class="bold"><strong>Expression</strong></span> </p> </th> <th> <p> <span class="bold"><strong>Return type</strong></span> </p> </th> <th> <p> <span class="bold"><strong>Assertion / Note<br> Pre / Post condition</strong></span> </p> </th> </tr></thead> <tbody> <tr> <td> <p> <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">char_type</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">CharT</span></code> </p> </td> <td> <p> The character container type used in the implementation of class template <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code>. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">string_type</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><</span><span class="identifier">CharT</span><span class="special">></span></code> or <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">vector</span><span class="special"><</span><span class="identifier">CharT</span><span class="special">></span></code> </p> </td> <td> <p> </p> <p> </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">locale_type</span></code> </p> </td> <td> <p> <span class="emphasis"><em>Implementation defined</em></span> </p> </td> <td> <p> A copy constructible type that represents the locale 
used by the traits class. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">char_class_type</span></code> </p> </td> <td> <p> <span class="emphasis"><em>Implementation defined</em></span> </p> </td> <td> <p> A bitmask type representing a particular character classification. Multiple values of this type can be bitwise-or'ed together to obtain a new valid value. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">hash</span><span class="special">(</span><span class="identifier">c</span><span class="special">)</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">unsigned</span> <span class="keyword">char</span></code> </p> </td> <td> <p> Yields a value between <code class="computeroutput"><span class="number">0</span></code> and <code class="computeroutput"><span class="identifier">UCHAR_MAX</span></code> inclusive. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">widen</span><span class="special">(</span><span class="identifier">ch</span><span class="special">)</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">CharT</span></code> </p> </td> <td> <p> Widens the specified <code class="computeroutput"><span class="keyword">char</span></code> and returns the resulting <code class="computeroutput"><span class="identifier">CharT</span></code>. 
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">in_range</span><span class="special">(</span><span class="identifier">r1</span><span class="special">,</span> <span class="identifier">r2</span><span class="special">,</span> <span class="identifier">c</span><span class="special">)</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">bool</span></code> </p> </td> <td> <p> For any characters <code class="computeroutput"><span class="identifier">r1</span></code> and <code class="computeroutput"><span class="identifier">r2</span></code>, returns <code class="computeroutput"><span class="keyword">true</span></code> if <code class="computeroutput"><span class="identifier">r1</span> <span class="special"><=</span> <span class="identifier">c</span> <span class="special">&&</span> <span class="identifier">c</span> <span class="special"><=</span> <span class="identifier">r2</span></code>. Requires that <code class="computeroutput"><span class="identifier">r1</span> <span class="special"><=</span> <span class="identifier">r2</span></code>. 
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">in_range_nocase</span><span class="special">(</span><span class="identifier">r1</span><span class="special">,</span> <span class="identifier">r2</span><span class="special">,</span> <span class="identifier">c</span><span class="special">)</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">bool</span></code> </p> </td> <td> <p> For characters <code class="computeroutput"><span class="identifier">r1</span></code> and <code class="computeroutput"><span class="identifier">r2</span></code>, returns <code class="computeroutput"><span class="keyword">true</span></code> if there is some character <code class="computeroutput"><span class="identifier">d</span></code> for which <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">translate_nocase</span><span class="special">(</span><span class="identifier">d</span><span class="special">)</span> <span class="special">==</span> <span class="identifier">v</span><span class="special">.</span><span class="identifier">translate_nocase</span><span class="special">(</span><span class="identifier">c</span><span class="special">)</span></code> and <code class="computeroutput"><span class="identifier">r1</span> <span class="special"><=</span> <span class="identifier">d</span> <span class="special">&&</span> <span class="identifier">d</span> <span class="special"><=</span> <span class="identifier">r2</span></code>. Requires that <code class="computeroutput"><span class="identifier">r1</span> <span class="special"><=</span> <span class="identifier">r2</span></code>. 
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">translate</span><span class="special">(</span><span class="identifier">c</span><span class="special">)</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">char_type</span></code> </p> </td> <td> <p> Returns a character such that, for any character <code class="computeroutput"><span class="identifier">d</span></code> that is to be considered equivalent to <code class="computeroutput"><span class="identifier">c</span></code>, <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">translate</span><span class="special">(</span><span class="identifier">c</span><span class="special">)</span> <span class="special">==</span> <span class="identifier">v</span><span class="special">.</span><span class="identifier">translate</span><span class="special">(</span><span class="identifier">d</span><span class="special">)</span></code>.
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">translate_nocase</span><span class="special">(</span><span class="identifier">c</span><span class="special">)</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">char_type</span></code> </p> </td> <td> <p> Returns a character such that, for all characters <code class="computeroutput"><span class="identifier">C</span></code> that are to be considered equivalent to <code class="computeroutput"><span class="identifier">c</span></code> when comparisons are performed without regard to case, <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">translate_nocase</span><span class="special">(</span><span class="identifier">c</span><span class="special">)</span> <span class="special">==</span> <span class="identifier">v</span><span class="special">.</span><span class="identifier">translate_nocase</span><span class="special">(</span><span class="identifier">C</span><span class="special">)</span></code>.
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">transform</span><span class="special">(</span><span class="identifier">F1</span><span class="special">,</span> <span class="identifier">F2</span><span class="special">)</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">string_type</span></code> </p> </td> <td> <p> Returns a sort key for the character sequence designated by the iterator range <code class="computeroutput"><span class="special">[</span><span class="identifier">F1</span><span class="special">,</span> <span class="identifier">F2</span><span class="special">)</span></code> such that if the character sequence <code class="computeroutput"><span class="special">[</span><span class="identifier">G1</span><span class="special">,</span> <span class="identifier">G2</span><span class="special">)</span></code> sorts before the character sequence <code class="computeroutput"><span class="special">[</span><span class="identifier">H1</span><span class="special">,</span> <span class="identifier">H2</span><span class="special">)</span></code> then <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">transform</span><span class="special">(</span><span class="identifier">G1</span><span class="special">,</span> <span class="identifier">G2</span><span class="special">)</span> <span class="special"><</span> <span class="identifier">v</span><span class="special">.</span><span class="identifier">transform</span><span class="special">(</span><span class="identifier">H1</span><span class="special">,</span> <span class="identifier">H2</span><span class="special">)</span></code>. 
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">transform_primary</span><span class="special">(</span><span class="identifier">F1</span><span class="special">,</span> <span class="identifier">F2</span><span class="special">)</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">string_type</span></code> </p> </td> <td> <p> Returns a sort key for the character sequence designated by the iterator range <code class="computeroutput"><span class="special">[</span><span class="identifier">F1</span><span class="special">,</span> <span class="identifier">F2</span><span class="special">)</span></code> such that if the character sequence <code class="computeroutput"><span class="special">[</span><span class="identifier">G1</span><span class="special">,</span> <span class="identifier">G2</span><span class="special">)</span></code> sorts before the character sequence <code class="computeroutput"><span class="special">[</span><span class="identifier">H1</span><span class="special">,</span> <span class="identifier">H2</span><span class="special">)</span></code> when character case is not considered then <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">transform_primary</span><span class="special">(</span><span class="identifier">G1</span><span class="special">,</span> <span class="identifier">G2</span><span class="special">)</span> <span class="special"><</span> <span class="identifier">v</span><span class="special">.</span><span class="identifier">transform_primary</span><span class="special">(</span><span class="identifier">H1</span><span class="special">,</span> <span class="identifier">H2</span><span class="special">)</span></code>. 
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">lookup_classname</span><span class="special">(</span><span class="identifier">F1</span><span class="special">,</span> <span class="identifier">F2</span><span class="special">)</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">char_class_type</span></code> </p> </td> <td> <p> Converts the character sequence designated by the iterator range <code class="computeroutput"><span class="special">[</span><span class="identifier">F1</span><span class="special">,</span><span class="identifier">F2</span><span class="special">)</span></code> into a bitmask type that can subsequently be passed to <code class="computeroutput"><span class="identifier">isctype</span></code>. Values returned from <code class="computeroutput"><span class="identifier">lookup_classname</span></code> can be safely bitwise or'ed together. Returns <code class="computeroutput"><span class="number">0</span></code> if the character sequence is not the name of a character class recognized by <code class="computeroutput"><span class="identifier">X</span></code>. The value returned shall be independent of the case of the characters in the sequence. 
</p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">lookup_collatename</span><span class="special">(</span><span class="identifier">F1</span><span class="special">,</span> <span class="identifier">F2</span><span class="special">)</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">string_type</span></code> </p> </td> <td> <p> Returns a sequence of characters that represents the collating element consisting of the character sequence designated by the iterator range <code class="computeroutput"><span class="special">[</span><span class="identifier">F1</span><span class="special">,</span> <span class="identifier">F2</span><span class="special">)</span></code>. Returns an empty string if the character sequence is not a valid collating element. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">isctype</span><span class="special">(</span><span class="identifier">c</span><span class="special">,</span> <span class="identifier">v</span><span class="special">.</span><span class="identifier">lookup_classname</span><span class="special">(</span><span class="identifier">F1</span><span class="special">,</span> <span class="identifier">F2</span><span class="special">))</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">bool</span></code> </p> </td> <td> <p> Returns <code class="computeroutput"><span class="keyword">true</span></code> if character <code class="computeroutput"><span class="identifier">c</span></code> is a member of the character class designated by the iterator range <code class="computeroutput"><span class="special">[</span><span class="identifier">F1</span><span class="special">,</span> <span class="identifier">F2</span><span 
class="special">)</span></code>, <code class="computeroutput"><span class="keyword">false</span></code> otherwise. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">value</span><span class="special">(</span><span class="identifier">c</span><span class="special">,</span> <span class="identifier">i</span><span class="special">)</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="keyword">int</span></code> </p> </td> <td> <p> Returns the value represented by the digit <code class="computeroutput"><span class="identifier">c</span></code> in base <code class="computeroutput"><span class="identifier">i</span></code> if the character <code class="computeroutput"><span class="identifier">c</span></code> is a valid digit in base <code class="computeroutput"><span class="identifier">i</span></code>; otherwise returns <code class="computeroutput"><span class="special">-</span><span class="number">1</span></code>.<br> [Note: the value of <code class="computeroutput"><span class="identifier">i</span></code> will only be <code class="computeroutput"><span class="number">8</span></code>, <code class="computeroutput"><span class="number">10</span></code>, or <code class="computeroutput"><span class="number">16</span></code>. 
-end note] </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">u</span><span class="special">.</span><span class="identifier">imbue</span><span class="special">(</span><span class="identifier">loc</span><span class="special">)</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">locale_type</span></code> </p> </td> <td> <p> Imbues <code class="computeroutput"><span class="identifier">u</span></code> with the locale <code class="computeroutput"><span class="identifier">loc</span></code> and returns the previous locale used by <code class="computeroutput"><span class="identifier">u</span></code>. </p> </td> </tr> <tr> <td> <p> <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">getloc</span><span class="special">()</span></code> </p> </td> <td> <p> <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">locale_type</span></code> </p> </td> <td> <p> Returns the current locale used by <code class="computeroutput"><span class="identifier">v</span></code>. </p> </td> </tr> </tbody> </table></div> </div> <br class="table-break"><a name="boost_xpressive.user_s_guide.concepts.acknowledgements"></a><h3> <a name="id3137917"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts.acknowledgements">Acknowledgements</a> </h3> <p> This section is adapted from the equivalent page in the <a href="../../../libs/regex" target="_top">Boost.Regex</a> documentation and from the <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1429.htm" target="_top">proposal</a> to add regular expressions to the Standard Library. 
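</p> <p> Several expressions from the traits requirements table can be tried out against a concrete traits class. The following is only an illustrative sketch under stated assumptions: it requires a C++11 compiler and uses <code class="computeroutput">std::regex_traits&lt;char&gt;</code> from the standard <code class="computeroutput">&lt;regex&gt;</code> header as a stand-in, since that class grew out of the same standardization proposal and provides members with the same names. It is not one of xpressive's own traits classes, and its <code class="computeroutput">lookup_classname</code> additionally accepts an optional <code class="computeroutput">bool</code> case-insensitivity argument that the table does not list. The helper <code class="computeroutput">in_class</code> is a hypothetical name introduced only for this example. </p> <pre class="programlisting">#include &lt;cassert&gt;
#include &lt;iostream&gt;
#include &lt;locale&gt;
#include &lt;regex&gt;
#include &lt;string&gt;

// Assumption: std::regex_traits&lt;char&gt; (C++11 &lt;regex&gt;) is used as a stand-in
// traits class; it is not one of xpressive's own traits types.
using traits_t = std::regex_traits&lt;char&gt;;

// Hypothetical helper: look up a character class by name, then test c against it.
bool in_class(traits_t const &amp;tr, std::string const &amp;name, char c)
{
    traits_t::char_class_type mask = tr.lookup_classname(name.begin(), name.end());
    return tr.isctype(c, mask);
}

int main()
{
    traits_t tr;

    // imbue() returns the previous locale; the classic "C" locale keeps the
    // results below independent of the user's environment.
    tr.imbue(std::locale::classic());
    bool classic = tr.getloc() == std::locale::classic();

    // translate_nocase(): case-equivalent characters translate to the same value.
    bool nocase = tr.translate_nocase('A') == tr.translate_nocase('a');

    // value(): the digit's value in base 8, 10 or 16, or -1 if it is not a valid digit.
    int oct7 = tr.value('7', 8);
    int hexf = tr.value('f', 16);
    int bad  = tr.value('9', 8);

    // transform(): sort keys compare in the same order as the sequences collate.
    std::string a = "apple", b = "banana";
    bool ordered = tr.transform(a.begin(), a.end()) &lt; tr.transform(b.begin(), b.end());

    // Prints: true true 7 15 -1 true false true true
    std::cout &lt;&lt; std::boolalpha
              &lt;&lt; classic &lt;&lt; ' ' &lt;&lt; nocase &lt;&lt; ' '
              &lt;&lt; oct7 &lt;&lt; ' ' &lt;&lt; hexf &lt;&lt; ' ' &lt;&lt; bad &lt;&lt; ' '
              &lt;&lt; in_class(tr, "digit", '5') &lt;&lt; ' '
              &lt;&lt; in_class(tr, "digit", 'x') &lt;&lt; ' '
              &lt;&lt; in_class(tr, "DIGIT", '5') &lt;&lt; ' '  // class names are case-independent
              &lt;&lt; ordered &lt;&lt; '\n';
    return 0;
}
</pre> <p> Imbuing the classic "C" locale is a deliberate choice here: under a different locale, <code class="computeroutput">transform</code> in particular may produce sort keys that order strings differently.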
</p> </div> <div class="section"> <div class="titlepage"><div><div><h3 class="title"> <a name="boost_xpressive.user_s_guide.examples"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">Examples</a> </h3></div></div></div> <p> Below you can find six complete sample programs. <br> </p> <p></p> <a name="boost_xpressive.user_s_guide.examples.see_if_a_whole_string_matches_a_regex"></a><h5> <a name="id3137985"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.see_if_a_whole_string_matches_a_regex">See if a whole string matches a regex</a> </h5> <p> This is the example from the Introduction. It is reproduced here for your convenience. </p> <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">hello</span><span class="special">(</span> <span class="string">"hello world!"</span> <span class="special">);</span> <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="identifier">sregex</span><span 
class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"(\\w+) (\\w+)!"</span> <span class="special">);</span> <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="identifier">hello</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">rex</span> <span class="special">)</span> <span class="special">)</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// whole match </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">1</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// first capture </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">2</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// second capture </span> <span class="special">}</span> <span class="keyword">return</span> <span class="number">0</span><span 
class="special">;</span> <span class="special">}</span> </pre> <p> This program outputs the following: </p> <pre class="programlisting">hello world!
hello
world
</pre> <p> <br> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> </p> <p></p> <a name="boost_xpressive.user_s_guide.examples.see_if_a_string_contains_a_sub_string_that_matches_a_regex"></a><h5> <a name="id3138521"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.see_if_a_string_contains_a_sub_string_that_matches_a_regex">See if a string contains a sub-string that matches a regex</a> </h5> <p> Notice in this example how we use custom <code class="computeroutput"><span class="identifier">mark_tag</span></code>s to make the pattern more readable. We can use the <code class="computeroutput"><span class="identifier">mark_tag</span></code>s later to index into the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code>. 
</p> <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> <span class="special">{</span> <span class="keyword">char</span> <span class="keyword">const</span> <span class="special">*</span><span class="identifier">str</span> <span class="special">=</span> <span class="string">"I was born on 5/30/1973 at 7am."</span><span class="special">;</span> <span class="comment">// define some custom mark_tags with names more meaningful than s1, s2, etc. 
</span> <span class="identifier">mark_tag</span> <span class="identifier">day</span><span class="special">(</span><span class="number">1</span><span class="special">),</span> <span class="identifier">month</span><span class="special">(</span><span class="number">2</span><span class="special">),</span> <span class="identifier">year</span><span class="special">(</span><span class="number">3</span><span class="special">),</span> <span class="identifier">delim</span><span class="special">(</span><span class="number">4</span><span class="special">);</span> <span class="comment">// this regex finds a date </span> <span class="identifier">cregex</span> <span class="identifier">date</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">month</span><span class="special">=</span> <span class="identifier">repeat</span><span class="special"><</span><span class="number">1</span><span class="special">,</span><span class="number">2</span><span class="special">>(</span><span class="identifier">_d</span><span class="special">))</span> <span class="comment">// find the month ... </span> <span class="special">>></span> <span class="special">(</span><span class="identifier">delim</span><span class="special">=</span> <span class="special">(</span><span class="identifier">set</span><span class="special">=</span> <span class="char">'/'</span><span class="special">,</span><span class="char">'-'</span><span class="special">))</span> <span class="comment">// followed by a delimiter ... 
</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">day</span><span class="special">=</span> <span class="identifier">repeat</span><span class="special"><</span><span class="number">1</span><span class="special">,</span><span class="number">2</span><span class="special">>(</span><span class="identifier">_d</span><span class="special">))</span> <span class="special">>></span> <span class="identifier">delim</span> <span class="comment">// and a day followed by the same delimiter ... </span> <span class="special">>></span> <span class="special">(</span><span class="identifier">year</span><span class="special">=</span> <span class="identifier">repeat</span><span class="special"><</span><span class="number">1</span><span class="special">,</span><span class="number">2</span><span class="special">>(</span><span class="identifier">_d</span> <span class="special">>></span> <span class="identifier">_d</span><span class="special">));</span> <span class="comment">// and the year. 
</span> <span class="identifier">cmatch</span> <span class="identifier">what</span><span class="special">;</span> <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_search</span><span class="special">(</span> <span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">date</span> <span class="special">)</span> <span class="special">)</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// whole match </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="identifier">day</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// the day </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="identifier">month</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// the month </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="identifier">year</span><span class="special">]</span> <span 
class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// the year </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="identifier">delim</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// the delimiter </span> <span class="special">}</span> <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> <span class="special">}</span> </pre> <p> This program outputs the following: </p> <pre class="programlisting">5/30/1973
30
5
1973
/
</pre> <p> <br> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> </p> <p></p> <a name="boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex"></a><h5> <a name="id3139532"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex">Replace all sub-strings that match a regex</a> </h5> <p> The following program finds dates in a string and marks them up with pseudo-HTML. 
</p> <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span> <span class="string">"I was born on 5/30/1973 at 7am."</span> <span class="special">);</span> <span class="comment">// essentially the same regex as in the previous example, but using a dynamic regex </span> <span class="identifier">sregex</span> <span class="identifier">date</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"(\\d{1,2})([/-])(\\d{1,2})\\2((?:\\d{2}){1,2})"</span> <span class="special">);</span> <span class="comment">// As in Perl, $& is a reference to the sub-string that matched the regex </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">format</span><span class="special">(</span> <span class="string">"&lt;date&gt;$&amp;&lt;/date&gt;"</span> <span class="special">);</span> <span class="identifier">str</span> 
<span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span> <span class="identifier">str</span><span class="special">,</span> <span class="identifier">date</span><span class="special">,</span> <span class="identifier">format</span> <span class="special">);</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">str</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> <span class="special">}</span> </pre> <p> This program outputs the following: </p> <pre class="programlisting">I was born on &lt;date&gt;5/30/1973&lt;/date&gt; at 7am.
</pre> <p> <br> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> </p> <p></p> <a name="boost_xpressive.user_s_guide.examples.find_all_the_sub_strings_that_match_a_regex_and_step_through_them_one_at_a_time"></a><h5> <a name="id3139954"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.find_all_the_sub_strings_that_match_a_regex_and_step_through_them_one_at_a_time">Find all the sub-strings that match a regex and step through them one at a time</a> </h5> <p> The following program finds the words in a wide-character string. It uses <code class="computeroutput"><span class="identifier">wsregex_iterator</span></code>. Notice that dereferencing a <code class="computeroutput"><span class="identifier">wsregex_iterator</span></code> yields a <code class="computeroutput"><span class="identifier">wsmatch</span></code> object. 
</p> <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">wstring</span> <span class="identifier">str</span><span class="special">(</span> <span class="identifier">L</span><span class="string">"This is his face."</span> <span class="special">);</span> <span class="comment">// find a whole word </span> <span class="identifier">wsregex</span> <span class="identifier">token</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">alnum</span><span class="special">;</span> <span class="identifier">wsregex_iterator</span> <span class="identifier">cur</span><span class="special">(</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">token</span> <span class="special">);</span> <span class="identifier">wsregex_iterator</span> <span class="identifier">end</span><span class="special">;</span> <span 
class="keyword">for</span><span class="special">(</span> <span class="special">;</span> <span class="identifier">cur</span> <span class="special">!=</span> <span class="identifier">end</span><span class="special">;</span> <span class="special">++</span><span class="identifier">cur</span> <span class="special">)</span> <span class="special">{</span> <span class="identifier">wsmatch</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">what</span> <span class="special">=</span> <span class="special">*</span><span class="identifier">cur</span><span class="special">;</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">wcout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="identifier">L</span><span class="char">'\n'</span><span class="special">;</span> <span class="special">}</span> <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> <span class="special">}</span> </pre> <p> This program outputs the following: </p> <pre class="programlisting">This
is
his
face
</pre> <p> <br> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> </p> <p></p> <a name="boost_xpressive.user_s_guide.examples.split_a_string_into_tokens_that_each_match_a_regex"></a><h5> <a name="id3140493"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.split_a_string_into_tokens_that_each_match_a_regex">Split a string into tokens that each match a regex</a> </h5> <p> The following program finds race times in a string and displays first the minutes and then the seconds. 
It uses <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code>. </p> <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span> <span class="string">"Eric: 4:40, Karl: 3:35, Francesca: 2:32"</span> <span class="special">);</span> <span class="comment">// find a race time </span> <span class="identifier">sregex</span> <span class="identifier">time</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"(\\d):(\\d\\d)"</span> <span class="special">);</span> <span class="comment">// for each match, the token iterator should first take the value of </span> <span class="comment">// the first marked sub-expression followed by the value of the second </span> <span class="comment">// marked sub-expression </span> <span 
class="keyword">int</span> <span class="keyword">const</span> <span class="identifier">subs</span><span class="special">[]</span> <span class="special">=</span> <span class="special">{</span> <span class="number">1</span><span class="special">,</span> <span class="number">2</span> <span class="special">};</span> <span class="identifier">sregex_token_iterator</span> <span class="identifier">cur</span><span class="special">(</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">time</span><span class="special">,</span> <span class="identifier">subs</span> <span class="special">);</span> <span class="identifier">sregex_token_iterator</span> <span class="identifier">end</span><span class="special">;</span> <span class="keyword">for</span><span class="special">(</span> <span class="special">;</span> <span class="identifier">cur</span> <span class="special">!=</span> <span class="identifier">end</span><span class="special">;</span> <span class="special">++</span><span class="identifier">cur</span> <span class="special">)</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="special">*</span><span class="identifier">cur</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="special">}</span> <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> <span class="special">}</span> </pre> <p> This program outputs the following: </p> <pre class="programlisting">4
40
3
35
2
32
</pre> <p> <br> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> </p> <p></p> <a
name="boost_xpressive.user_s_guide.examples.split_a_string_using_a_regex_as_a_delimiter"></a><h5> <a name="id3141056"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.split_a_string_using_a_regex_as_a_delimiter">Split a string using a regex as a delimiter</a> </h5> <p> The following program takes some text that has been marked up with HTML and strips out the markup. It uses a regex that matches an HTML tag and a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> that returns the parts of the string that do <span class="emphasis"><em>not</em></span> match the regex. </p> <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span> <span class="string">"Now <bold>is the time <i>for all good men</i> to come to the aid of their</bold> country."</span> <span class="special">);</span> <span class="comment">// find an HTML tag
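// That is: an opening angle bracket, an optional '/', one or more word
// characters (_w), then a closing angle bracket. This simple pattern does
// not match tags that carry attributes, such as an anchor tag with an href.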
</span> <span class="identifier">sregex</span> <span class="identifier">html</span> <span class="special">=</span> <span class="char">'<'</span> <span class="special">>></span> <span class="identifier">optional</span><span class="special">(</span><span class="char">'/'</span><span class="special">)</span> <span class="special">>></span> <span class="special">+</span><span class="identifier">_w</span> <span class="special">>></span> <span class="char">'>'</span><span class="special">;</span> <span class="comment">// the -1 below directs the token iterator to display the parts of </span> <span class="comment">// the string that did NOT match the regular expression. </span> <span class="identifier">sregex_token_iterator</span> <span class="identifier">cur</span><span class="special">(</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">html</span><span class="special">,</span> <span class="special">-</span><span class="number">1</span> <span class="special">);</span> <span class="identifier">sregex_token_iterator</span> <span class="identifier">end</span><span class="special">;</span> <span class="keyword">for</span><span class="special">(</span> <span class="special">;</span> <span class="identifier">cur</span> <span class="special">!=</span> <span class="identifier">end</span><span class="special">;</span> <span class="special">++</span><span class="identifier">cur</span> <span class="special">)</span> <span class="special">{</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="char">'{'</span> <span class="special"><<</span> <span class="special">*</span><span class="identifier">cur</span> <span class="special"><<</span> 
<span class="char">'}'</span><span class="special">;</span> <span class="special">}</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> <span class="special">}</span> </pre> <p> This program outputs the following: </p> <pre class="programlisting">{Now }{is the time }{for all good men}{ to come to the aid of their}{ country.} </pre> <p> <br> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> </p> <p></p> <a name="boost_xpressive.user_s_guide.examples.display_a_tree_of_nested_results"></a><h5> <a name="id3141644"></a> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.display_a_tree_of_nested_results">Display a tree of nested results</a> </h5> <p> Here is a helper class to demonstrate how you might display a tree of nested results: </p> <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">algorithm</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iterator</span><span class="special">></span> <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> <span class="comment">// Displays nested results to std::cout with indenting </span><span class="keyword">struct</span> <span class="identifier">output_nested_results</span> <span class="special">{</span> <span class="keyword">int</span> <span class="identifier">tabs_</span><span class="special">;</span> <span class="identifier">output_nested_results</span><span class="special">(</span> <span class="keyword">int</span> <span class="identifier">tabs</span> <span class="special">=</span> <span class="number">0</span> <span class="special">)</span> <span class="special">:</span> <span class="identifier">tabs_</span><span class="special">(</span> <span class="identifier">tabs</span> <span class="special">)</span> <span class="special">{</span> <span class="special">}</span> <span class="keyword">template</span><span class="special"><</span> <span class="keyword">typename</span> <span 
class="identifier">BidiIterT</span> <span class="special">></span> <span class="keyword">void</span> <span class="keyword">operator</span> <span class="special">()(</span> <span class="identifier">match_results</span><span class="special"><</span> <span class="identifier">BidiIterT</span> <span class="special">></span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">what</span> <span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span> <span class="comment">// first, do some indenting </span> <span class="keyword">typedef</span> <span class="keyword">typename</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">iterator_traits</span><span class="special"><</span> <span class="identifier">BidiIterT</span> <span class="special">>::</span><span class="identifier">value_type</span> <span class="identifier">char_type</span><span class="special">;</span> <span class="identifier">char_type</span> <span class="identifier">space_ch</span> <span class="special">=</span> <span class="identifier">char_type</span><span class="special">(</span><span class="char">' '</span><span class="special">);</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">fill_n</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">ostream_iterator</span><span class="special"><</span><span class="identifier">char_type</span><span class="special">>(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">),</span> <span class="identifier">tabs_</span> <span class="special">*</span> <span class="number">4</span><span class="special">,</span> <span class="identifier">space_ch</span> <span class="special">);</span> <span class="comment">// output the match </span> <span 
class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// output any nested matches </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">for_each</span><span class="special">(</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">nested_results</span><span class="special">().</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">nested_results</span><span class="special">().</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">output_nested_results</span><span class="special">(</span> <span class="identifier">tabs_</span> <span class="special">+</span> <span class="number">1</span> <span class="special">)</span> <span class="special">);</span> <span class="special">}</span> <span class="special">};</span> </pre> <p> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> </p> </div> <div class="footnotes"> <br><hr width="100" align="left"> <div class="footnote"><p><sup>[<a name="ftn.id3097507" href="#id3097507" class="para">4</a>] </sup> See <a href="http://www.osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html" target="_top">Expression Templates</a> </p></div> <div class="footnote"><p><sup>[<a name="ftn.id3131674" href="#id3131674" class="para">5</a>] </sup> Many thanks to David Jenkins, who contributed this example. 
</p></div> </div> </div> <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr> <td align="left"></td> <td align="right"><div class="copyright-footer">Copyright © 2007 Eric Niebler<p> Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>) </p> </div></td> </tr></table> </body> </html>