Sophie

Sophie

distrib > Mandriva > 9.1 > ppc > by-pkgid > b9ba69a436161613d8fb030c8c726a8e > files > 445

spirit-1.5.1-2mdk.noarch.rpm

<html>
<head>
<title>Subrules</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="theme/style.css" type="text/css">
</head>

<body>
<table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
  <tr>
    <td width="10">
    </td>
    <td width="85%">
      <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Subrules</b></font>
    </td>
    <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td>
  </tr>
</table>
<br>
<table border="0">
  <tr>
    <td width="10"></td>
    <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
    <td width="30"><a href="grammar.html"><img src="theme/l_arr.gif" border="0"></a></td>
    <td width="20"><a href="semantic_actions.html"><img src="theme/r_arr.gif" border="0"></a></td>
   </tr>
</table>
<p>Spirit is implemented using expression templates. This is a very powerful technique.
  Along with its power comes some complications. We almost take for granted that
  when we write <tt>i | j &gt;&gt; k</tt> where <tt>i</tt>, <tt>j</tt> and <tt>k</tt>
  are all integers the result is still an integer. Yet, with expression templates,
  the same expression <tt>i | j &gt;&gt; k</tt> where <tt>i</tt>, <tt>j</tt> and
  <tt>k</tt> are of type <tt>T</tt>, the result is a complex composite type [see
  <a href="basic_concepts.html">Basic Concepts</a>]. Spirit expressions, which
  are combinations of primitives and composites yield an infinite set of new types.
  One problem is that C++ offers no easy facility to deduce the type of an arbitrarily
  complex expression that yields a complex type. Thus, while it is easy to write:</p>
<pre><code><font color="#000000"><span class=identifier>    </span><span class=keyword>int </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>i </span><span class=special>| </span><span class=identifier>j </span><span class=special>&gt;&gt; </span><span class=identifier>k</span><span class=special>; </span><span class=comment>// where i, j, and k are ints</span></font></code></pre>
<p>Forget rules for a second. Expression templates yield an endless supply of 
  types. There is no easy way to do this in C++ if <tt>i</tt>, <tt>j</tt> and 
  <tt>k</tt> are Spirit parsers:</p>
<pre><code><font color="#000000"><span class=comment>    </span><span class=special>&lt;</span><span class=identifier>what_type</span><span class=special>&gt; </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>i </span><span class=special>| </span><span class=identifier>j </span><span class=special>&gt;&gt; </span><span class=identifier>k</span><span class=special>; </span><span class=comment>// where i, j, and k are Spirit parsers</span></font></code></pre>
<p>If <tt>i</tt>, <tt>j</tt> and <tt>k</tt> are all <tt>chlit&lt;&gt;</tt> objects,
  what we want is:</p>
<pre><code><font color="#000000"><span class=comment>    </span><span class=keyword>typedef
        </span><span class=identifier>alternative
        </span><span class=special>&lt;
            </span><span class=identifier>chlit</span><span class=special>&lt;&gt;,        </span><span class=comment>//  i
            </span><span class=identifier>sequence
            </span><span class=special>&lt;
                </span><span class=identifier>chlit</span><span class=special>&lt;&gt;,    </span><span class=comment>//  j
                </span><span class=identifier>chlit</span><span class=special>&lt;&gt;     </span><span class=comment>//  k
            </span><span class=special>&gt;
        </span><span class=special>&gt;
    </span><span class=identifier>rule_t</span><span class=special>;

    </span><span class=identifier>rule_t </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>i </span><span class=special>| </span><span class=identifier>j </span><span class=special>&gt;&gt; </span><span class=identifier>k</span><span class=special>;     </span><span class=comment>// where i, j, and k are chlit&lt;&gt; objects</span></font></code></pre>
<p>We deliberately formatted the type declaration nicely to make it understandable.
  Try that with a more complex expression. While it can be done, explicitly spelling
  out the type of a Spirit expression template is tedious and error prone. The
  right hand side (rhs) has to mirror the type of the left hand side (lhs).</p>
<table width="80%" border="0" align="center">
  <tr>
    <td class="note_box"><img src="theme/lens.gif" width="15" height="16"><b>
      typeof and auto</b> <br>
      <br>
      Some compilers already support the <tt>typeof</tt> keyword. This can be
      used to free us from having to explicitly type the type (pun intentional).
      Using the <tt>typeof</tt>, we can rewrite the Spirit expression above as:<br>
      <br>
      <span class=identifier><code>typeof</code></span><code><span class=special>(</span><span class=identifier>i 
      </span><span class=special>| </span><span class=identifier>j </span><span class=special>&gt;&gt; 
      </span><span class=identifier>k</span><span class=special>) </span><span class=identifier>r 
      </span><span class=special>= </span><span class=identifier>i </span><span class=special>| 
      </span><span class=identifier>j </span><span class=special>&gt;&gt; </span><span class=identifier>k</span><span class=special>;</span></code><br>
      <br>
      While this is better than having to explicitly declare a complex type, it
      is redundant, error prone and still an eye sore. The expression is typed
      twice. The only way to simplify this is to introduce a macro<tt>:<font color="#FF0000"><br>
      <br>
      <code>#define</code></font><code> RULE(NAME, DEFINITION) \<br>
      typeof((DEFINITION)) NAME = </code></tt><code><tt>(DEFINITION</tt>)</code><br>
      <br>
      <a href="http://www.boost-consulting.com">David Abrahams</a> proposed in
      comp.std.c++ to reuse the <tt>auto</tt> keyword to introduce named temporaries.
      This has been extensibly discussed in <a href="http://www.boost.org">boost.org</a>.
      Example:<br>
      <br>
      <span class=keyword><code>auto </code></span><code><span class=identifier>r 
      </span><span class=special>= </span><span class=identifier>i </span><span class=special>| 
      </span><span class=identifier>j </span><span class=special>&gt;&gt; </span><span class=identifier>k</span><span class=special>;</span></code><br>
      <br>
      Once such a C++ extension is accepted into the standard, this would be a
      neat solution and a nice fit for our purpose. It's not a complete solution
      though since there are still situations where we do not know the rhs beforehand;
      for instance when pre-declaring cyclic dependent rules.</td>
  </tr>
</table>
<p>Fortunately, rules come to the rescue. Rules can capture the type of the expression
  assigned to it. Thus:</p>
<pre><code><font color="#000000">    <span class=identifier>rule</span><span class=special>&lt;&gt; </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>i </span><span class=special>| </span><span class=identifier>j </span><span class=special>&gt;&gt; </span><span class=identifier>k</span><span class=special>;  </span><span class=comment>// where i, j, and k are chlit&lt;&gt; objects</span></font></code></pre>
<p>It might not be apparent but behind the scenes, plain rules are actually implemented
  using a pointer to a runtime polymorphic abstract class that holds the dynamic
  type of the parser assigned to it. When a Spirit expression is assigned to a
  rule, its type is encapsulated in a concrete subclass of the abstract class.
  A virtual parse function delegates the parsing to the encapsulated object.</p>
<p>Rules have drawbacks though:</p>
<p><img src="theme/bullet.gif" width="12" height="12"> It is coupled to a specific
  scanner type. The rule is tied to a specific scanner [see <a href="scanner.html#business">The
  Scanner Business</a>]<br>
  <img src="theme/bullet.gif" width="12" height="12"> The rule's parse member
  function has a virtual function call overhead that cannot be inlined<br>
</p>
<h2>Static rules: subrules </h2>
<p>The subrule is a fully static version of the rule. The subrule does not have
  the drawbacks listed above. </p>
<p><img src="theme/bullet.gif" width="12" height="12"> The subrule is not tied
  to a specific scanner so just about any scanner type may be used<br>
  <img src="theme/bullet.gif" width="12" height="12"> The subrule also allows
  aggressive inlining since there are no virtual function calls</p>
<p>Apart from a few minor differences, the subrule follows the usage and syntax
  of the rule closely. Here's the calculator grammar using subrules:</p>
<pre><code><font color="#000000"><span class=comment>    </span><span class=keyword>struct </span><span class=identifier>calculator </span><span class=special>: </span><span class=keyword>public </span><span class=identifier>grammar</span><span class=special>&lt;</span><span class=identifier>calculator</span><span class=special>&gt;
    </span><span class=special>{
        </span><span class=keyword>template </span><span class=special>&lt;</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>&gt;
        </span><span class=keyword>struct </span><span class=identifier>definition
        </span><span class=special>{
            </span><span class=identifier>definition</span><span class=special>(</span><span class=identifier>calculator </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>self</span><span class=special>)
            </span><span class=special>{
                </span><span class=identifier>first </span><span class=special>=
                </span><span class=special>(
                    </span><span class=identifier>expression  </span><span class=special>= </span><span class=identifier>term </span><span class=special>&gt;&gt; </span><span class=special>*((</span><span class=literal>'+' </span><span class=special>&gt;&gt; </span><span class=identifier>term</span><span class=special>) </span><span class=special>| </span><span class=special>(</span><span class=literal>'-' </span><span class=special>&gt;&gt; </span><span class=identifier>term</span><span class=special>)),
                    </span><span class=identifier>term        </span><span class=special>= </span><span class=identifier>factor </span><span class=special>&gt;&gt; </span><span class=special>*((</span><span class=literal>'*' </span><span class=special>&gt;&gt; </span><span class=identifier>factor</span><span class=special>) </span><span class=special>| </span><span class=special>(</span><span class=literal>'/' </span><span class=special>&gt;&gt; </span><span class=identifier>factor</span><span class=special>)),
                    </span><span class=identifier>factor      </span><span class=special>= </span><span class=identifier>integer </span><span class=special>| </span><span class=identifier>group</span><span class=special>,
                    </span><span class=identifier>group       </span><span class=special>= </span><span class=literal>'(' </span><span class=special>&gt;&gt; </span><span class=identifier>expression </span><span class=special>&gt;&gt; </span><span class=literal>')'
                </span><span class=special>);
            </span><span class=special>}

            </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>0</span><span class=special>&gt;  </span><span class=identifier>expression</span><span class=special>;
            </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>1</span><span class=special>&gt;  </span><span class=identifier>term</span><span class=special>;
            </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>2</span><span class=special>&gt;  </span><span class=identifier>factor</span><span class=special>;
            </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>3</span><span class=special>&gt;  </span><span class=identifier>group</span><span class=special>;

            </span><span class=identifier>rule</span><span class=special>&lt;</span><span class=identifier>ScannerT</span><span class=special>&gt; </span><span class=identifier>first</span><span class=special>;
            </span><span class=identifier>rule</span><span class=special>&lt;</span><span class=identifier>ScannerT</span><span class=special>&gt; </span><span class=keyword>const</span><span class=special>&
            </span><span class=identifier>start</span><span class=special>() </span><span class=keyword>const </span><span class=special>{ </span><span class=keyword>return </span><span class=identifier>first</span><span class=special>; </span><span class=special>}
        </span><span class=special>};
    </span><span class=special>};</span></font></code></pre>
<p><img src="theme/lens.gif" width="15" height="16"> A fullly working example
  with <a href="semantic_actions.html">semantic actions</a> can be <a href="subrule_calc.cpp.html">viewed
  here</a>. This is part of the Spirit distribution. <br>
  [ See libs/spirit/example/fundamental/calc/subrule_calc.cpp ] </p>
<table border="0" align="left">
  <tr> 
    <td width="199"><img src="theme/subrule1.png" width="234" height="224"></td>
    <td width="2"></td>
  </tr>
</table>
<p>The subrule as an efficient version of the rule. Compiler optimizations such
  as aggressive inlining help reduce the code size and increase performance significantly.
</p>
<p>The subrule is not a panacea however. Subrules push the C++ compiler hard to
  its knees. For example, current compilers have a limit on recursion depth that
  may not be exceeded. Don't even think about writing a full pascal grammar using
  subrules alone. A grammar using subrules is a single C++ expression. Current
  C++ compilers cannot handle very complex expressions very well. Finally, a plain
  rule is still needed to act as place holder for subrules.</p>
<p>The code above is a good example of the recommended way to use subrules. Notice
  the hierarchy. We have a grammar that encapsulates the whole calculator. The
  start rule is a plain rule that holds the set of subrules. The subrules in turn
  defines the actual details of the grammar.</p>
<p>This setup gives a good balance. The subrules do all the work. Each grammar
  will have only one rule: <tt>first</tt>. The rule is used just to hold the subrules
  and make them visible to the grammar. </p>
<h3>The grammar definition</h3>
<p>Like the rule, the expression after assignment operator <tt>=</tt> defines
  the subrule:</p>
<pre>    <span class=identifier>identifier </span><span class=special>= </span><span class=identifier>expression</span></pre>
<p>Unlike rules, subrules may be defined only once. Redefining a subrule is illegal
  and will result to a compile time assertion.<br>
</p>
<h3>Separators [ , ]</h3>
<p>While rules are terminated by the semicollon <tt>';'</tt>. Subrules are not
  terminated but are rather separated by the comma: <tt>','</tt>. Like Pascal
  statements, the last subrule in a group may not have a trailing comma.</p>
<pre><span class=identifier>    </span><span class=identifier>a </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'a'</span><span class=special>),
    </span><span class=identifier>b </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'b'</span><span class=special>),
    </span><span class=identifier>c </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'c'</span><span class=special>), </span><span class=comment>// BAD, trailing comma</span><code><font color="#000000"><font color="#800000"><i></i></font></font></code><code><font color="#000000"><font color="#800000"><i></i></font></font><i></i></code></pre>
<p>
<pre><code><span class=comment>    </span><span class=identifier>a </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'a'</span><span class=special>),
    </span><span class=identifier>b </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'b'</span><span class=special>),
    </span><span class=identifier>c </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'c'</span><span class=special>)  </span><span class=comment>// OK</span></code></pre>
<h3> The start subrule</h3>
<p>Unlike rules, parsing proceeds from the start subrule. The first (topmost)
  subrule in a group of subrules is called the <b>start subrule</b>. In our example
  above, <tt>expression</tt> is the start subrule. When a group of subrules is
  called forth, the start subrule <tt>expression</tt> is called first. </p>
<h3></h3>
<h3>IDs</h3>
<p>Each subrule has a corresponding ID. The ID is any integral constant that uniquely
  specifies the subrule. Our example above has four subrules. They are declared
  as: </p>
<pre><code><span class=comment>    </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>0</span><span class=special>&gt;  </span><span class=identifier>expression</span><span class=special>;
    </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>1</span><span class=special>&gt;  </span><span class=identifier>term</span><span class=special>;
    </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>2</span><span class=special>&gt;  </span><span class=identifier>factor</span><span class=special>;
    </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>3</span><span class=special>&gt;  </span><span class=identifier>group</span><span class=special>;</span></code></pre>
<h3> Aliases</h3>
<p>It is possible to have subrules with similar IDs. A subrule with a similar
  ID to will be an alias of the other. Both subrules may be used interchangeably.</p>
<pre><code><span class=special>    </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>0</span><span class=special>&gt;  </span><span class=identifier>a</span><span class=special>;
    </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>0</span><span class=special>&gt;  </span><span class=identifier>alias</span><span class=special>;  </span><span class=comment>// alias of a</span></code></pre>
<h3>Groups: scope and nesting</h3>
<p>The scope of a subrule and its definition is the enclosing group, typically 
  (and by convention) enclosed inside the parentheses. IDs outside a scope are 
  not directly visible. Inner subrule groups can be nested by enclosing each sub-group 
  inside another set of parentheses. Each group is unique and acts independently. 
  Consequently, while it may not be advisable to do so, a subrule in a group may 
  share the same ID as a subrule in another group since both groups are independent 
  of each other.<br>
</p>
<pre><code><span class=comment>    </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>0</span><span class=special>&gt; </span><span class=identifier>a</span><span class=special>;
    </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>1</span><span class=special>&gt; </span><span class=identifier>b</span><span class=special>;
    </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>0</span><span class=special>&gt; </span><span class=identifier>c</span><span class=special>;
    </span><span class=identifier>subrule</span><span class=special>&lt;</span><span class=number>1</span><span class=special>&gt; </span><span class=identifier>d</span><span class=special>;

    </span><span class=special>(                       </span><span class=comment>// outer subrule group, scope of a and b
        </span><span class=identifier>a </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'a'</span><span class=special>),
        </span><span class=identifier>b </span><span class=special>=
        </span><span class=special>(                   </span><span class=comment>// inner subrule group, scope of b and c
            </span><span class=identifier>c </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'c'</span><span class=special>),
            </span><span class=identifier>d </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'d'</span><span class=special>)
        </span><span class=special>)
    </span><span class=special>)</span></code></pre>
<p>Subrule IDs need to be unique only within a group. A grammar is an implicit 
  group. Furthermore, even subrules in a grammar may have the same IDs without 
  clashing if they are inside a group. Subrules may be explicitly grouped using 
  the parentheses. Parenthesized groups have unique scopes. In the code above, 
  the outer subrule group defines the subrules <tt>a</tt> and <tt>b</tt> while 
  the inner subrule group defines the subrules <tt>c</tt> and <tt>d</tt>. Notice 
  that the definition of <tt>b</tt> is the inner subrule.</p>
<table width="80%" border="0" align="center">
  <tr>
    <td class="note_box"><img src="theme/lens.gif" width="15" height="16"><b>
      Why are subrules important?</b><br>
      <br>
      Subrules open up the oportunity to do aggressive meta programming as well
      because they do not rely on virtual functions. The virtual function is the
      meta-programmer's hell. Not only does it slow down the program due to the
      virtual function indirect call, it is also an opaque wall where no metaprogram
      can get past. It kills all meta-information beyond the virtual function
      call. Worse, the virtual function cannot be templated. Which means that
      its arguments have to be tied to a actual types. Many problems stem from
      this limitation. <br>
      <br>
      While Spirit is a currently classified as a non-deterministic recursive-descent
      parser, Doug Gregor first noted that other parsing techniques apart from
      top-down recursive descent may be applied. For instance, apart from non-deterministic
      recursive descent, deterministic LL(1) and LR(1) can theoretically be implemented
      using the same expression template front end. Spirit rules use virtual functions
      to encode the RHS parser expression in an opaque abstract parser type. While
      it serves its purpose well, the rule's virtual functions are the stumbling
      blocks to more advanced metaprogramming. Subrules are free from virtual
      functions.</td>
  </tr>
</table>
<br>
<table border="0">
  <tr>
    <td width="10"></td>
    <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
    <td width="30"><a href="grammar.html"><img src="theme/l_arr.gif" border="0"></a></td>
    <td width="20"><a href="semantic_actions.html"><img src="theme/r_arr.gif" border="0"></a></td>
  </tr>
</table>
<br>
<hr size="1">
<p class="copyright">Copyright &copy; 1998-2002 Joel de Guzman<br>
  <br>
  <font size="2">Permission to copy, use, modify, sell and distribute this document 
  is granted provided this copyright notice appears in all copies. This document 
  is provided &quot;as is&quot; without express or implied warranty, and with 
  no claim as to its suitability for any purpose.</font></p>
<p>&nbsp;</p>
<p><code><font color="#000000"><font color="#0000ff"></font></font></code></p>
</body>
</html>