Sophie

Sophie

distrib > Mandriva > 9.1 > ppc > by-pkgid > b9ba69a436161613d8fb030c8c726a8e > files > 410

spirit-1.5.1-2mdk.noarch.rpm

<html>
<head>
<title>Basic Concepts</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="theme/style.css" type="text/css">
</head>

<body>
<table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
  <tr> 
    <td width="10"> 
    </td>
    <td width="85%"> 
      <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Basic Concepts</b></font>
    </td>
    <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td>
  </tr>
</table>
<br>
<table border="0">
  <tr>
    <td width="10"></td>
    <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
    <td width="30"><a href="quick_start.html"><img src="theme/l_arr.gif" border="0"></a></td>
    <td width="20"><a href="primitives.html"><img src="theme/r_arr.gif" border="0"></a></td>
   </tr>
</table>
<p>There are a few fundamental concepts that need to be understood well: 1) <b>The 
  Parser</b>, 2) <b>Match</b>, 3) <b>The Scanner</b>, and 4) <b>Semantic Actions</b>. 
  These basic concepts interact with each other, and the functionalities of each 
  interweave throughout the entire framework to make it one coherent whole.</p>
<table width="48%" border="0" align="center">
  <tr>
    <td height="211"><img src="theme/intro1.png"></td>
  </tr>
</table>
<h2>The Parser</h2>
<p>Central to the framework is the parser. The parser does the actual work of 
  recognizing a linear input stream of data read sequentially from start to 
  end by the scanner. The parser attempts to match the input following a well-defined 
  set of specifications in the form of grammar rules. The parser reports 
  the success or failure to its client through a match object. When successful, 
  the parser calls a client-supplied semantic action. Finally, the strategically-placed 
  semantic action extracts structural information depending on the data 
  passed to it by the parser and the heirarchical context of the parser it is 
  attached to.</p>
<p> Parsers come in different flavors. The Spirit framework comes bundled with 
  an extensive set of pre-defined parsers that perform various parsing tasks from 
  the trivial to the complex. The parser, as a concept, has a public conceptual 
  interface contract. Following the contract, anyone can write a conforming parser 
  that will play along well with the framework's predefined components. We shall 
  provide a blueprint detailing the conceptual interface of the parser later.</p>
<p>Clients of the framework generally do not need to write their own hand-coded 
  parsers at all. Spirit has an immense repertoire of pre-defined parsers covering 
  all aspects of syntax and semantic analysis. We shall examine this repertoire 
  of parsers in the following sections. In the rare case where a specific functionality 
  is not available, it is extremely easy to write a user-defined parser. The ease 
  in writing a parser entity is the main reason for Spirit's extensibility.</p>
<h2>Primitives and Composites</h2>
<p>Spirit parsers fall into two categories: <b>primitives</b> and <b>composites</b>. 
  These two categories are more or less synonymous to terminals and non-terminals 
  in parsing lingo. Primitives are non-decomposable atomic units. Composites on 
  the other hand are parsers that are composed of other parsers which can in turn 
  be a primitive or another composite. To illustrate, consider the Spirit expression:</p>
<pre><code><span class=identifier>    real_p </span><span class=special>&gt;&gt; </span><span class=special>*(</span><span class=literal>',' </span><span class=special>&gt;&gt; </span><span class=identifier>real_p</span><span class=special>)</span></code></pre>
<p><tt><tt>real_p</tt></tt> is a primitive parser that can parse real numbers. 
  The quoted comma <tt class="quotes">','</tt> in the expression is a shortcut 
  and is equivalent to <tt>ch_p<span class="operators">(</span><span class="quotes">','</span><span class="operators">)</span></tt>, 
  which is another primitive parser that recognizes single characters. </p>
<p>The expression above corresponds to the following parse tree:</p>
<table width="29%" border="0" align="center">
  <tr>
    <td><img src="theme/intro7.png"></td>
  </tr>
</table>
<p>The expression:</p>
<pre><span class=special>    </span><span class=literal>',' </span><span class=special>&gt;&gt; </span><span class=identifier>real_p</span></pre>
<p>composes a <b>sequence</b> parser. The <tt>sequence</tt> parser is a composite 
  parser comprising two parsers: the one on its left hand side (lhs), <tt>ch_p<span class="operators">(</span><span class="quotes">','</span><span class="operators">)</span></tt> ;
  and the other on its right hand side (rhs), <tt>real_p</tt>. This composite 
  parser, when called, calls its lhs and rhs in sequence and reports a successful 
  match only if both are successful. </p>
<table width="14%" border="0" align="center">
  <tr>
    <td><img src="theme/intro2.png"></td>
  </tr>
</table>
<p> The <tt>sequence</tt> parser is a binary composite. It is composed of two 
  parsers. There are unary composites as well. Unary composites hold only a single 
  subject. Like the binary composite, the unary composite may change the behavior of its 
  embedded subject. One particular example is the <b>Kleene star</b>. The Kleene 
  star, when called, calls its sole subject zero or more times. &quot;Zero or 
  more&quot; implies that the Kleene star always returns a successful match, possibly 
  matching the null string: &quot;&quot;.</p>
<p>The expression:</p>
<pre><code><span class=identifier>    </span><span class=special>*(</span><span class=literal>',' </span><span class=special>&gt;&gt; </span><span class=identifier>real_p</span><span class=special>)</span></code></pre>
<p>wraps the whole sequence composite above inside a <tt>kleene_star</tt>. </p>
<table width="17%" border="0" align="center">
  <tr>
    <td><img src="theme/intro3.png"></td>
  </tr>
</table>
<p>Finally, the full expression composes a <tt>real_p</tt> primitive parser and 
  the <tt>kleene_star</tt> we have above into another higher level <tt>sequence</tt> 
  parser composite.<br>
</p>
<table width="34%" border="0" align="center">
  <tr>
    <td><img src="theme/intro4.png"></td>
  </tr>
</table>
<p>A few simple classes, when composed and structured in a hierarchy, form a very 
  powerful object-oriented recursive-descent parsing engine. These classes provide 
  the infrastructure needed for the construction of more-complex parsers. The 
  final parser composite is a non-deterministic recursive-descent parser with 
  infinite look-ahead. </p>
<p>Top-down descent traverses the hierarchy. The outer <tt>sequence</tt> calls 
  the leftmost <tt>real_p</tt> parser. If successful, the <tt>kleene_star</tt> 
  is called next. The <tt>kleene_star</tt> calls the inner <tt>sequence</tt> repeatedly 
  in a loop until it fails to match, or the input is exhausted. Inside, <tt>ch_p('x')</tt> 
  and then <tt>real_p</tt> are called in sequence. The following diagram illustrates 
  what's happening, somewhat reminiscent of Pascal syntax diagrams.</p>
<p></p>
<table width="37%" border="0" align="center">
  <tr>
    <td><img src="theme/intro5.png"></td>
  </tr>
</table>
<p>The flexibility of object embedding and composition combined with recursion 
  opens up a unique approach to parsing. Subclasses are free to form aggregates 
  and algorithms of arbitrary complexity. Complex parsers can be created with 
  the composition of only a few primitive classes.</p>
<p>The framework is designed to be fully open-ended and extensible. New primitives 
  or composites, from the trivial to the complex, may be added any time. Composition 
  happens (statically) at compile time. This is possible through the expressive 
  flexibility of C++ expression templates and template meta-programming.</p>
<p>The final result is a composite composed of primitives and smaller composites. 
  This embedding strategy gives us the ability to build hierarchical structures 
  that fully model EBNF expressions of arbitrary complexity. Later on, we shall 
  see more primitive and composite building blocks.</p>
<h2>The Scanner</h2>
<p>Like the parser, the scanner is also an abstract concept. The task of the scanner 
  is to feed the sequential input data stream to the parser. The scanner is composed 
  of two STL conforming forward iterators, first and last, where first is held 
  by reference and last, by value. The first iterator is held by reference to 
  allow re-positioning by the parser. A set of policies govern how the scanner 
  behaves. Parsers extract data from the scanner and position the iterator appropriately 
  through its member functions.</p>
<p>Knowledge of the intricacies of these policies is not required at all in most 
  cases. However, knowledge of the scanner's basic API is required to write 
  fully-conforming Spirit parsers. The scanner's API will be outlined in a separate 
  section. In addition, for the power users and the adventurous among us, a full 
  section will be devoted to covering the scanner policies. The scanner policies 
  make Spirit very flexible and extensible. For instance, some of the policies 
  may be modified to filter data. A practical example is a scanner policy that 
  does not distinguish upper and lower case whereby making it useful for parsing 
  case insensitive input. Another example is a scanner policy that strips white 
  spaces from the input.</p>
<h2>The Match</h2>
<p>The parser has a conceptual parse member function taking in a scanner and returning 
  a match object. The primary function of the match object is to report parsing 
  success (or failure) back to the parser's caller; i.e. it evaluates to true if 
  the parse function is successful, false otherwise. If the parse is successful, 
  the match object may also be queried to report the number of characters matched 
  (using <tt>match.length()</tt>). The length is non-negative if the match is successful, 
  and the typical length of a parse failure is -1. A zero length is perfectly 
  valid and still represents a successful match.</p>
<p>Parsers may have attribute data associated with it. For example, the real_p 
  parser has a numeric datum associated with it. This attribute is the parsed number. 
  This attribute is passed on to the returned match object. The match object may 
  be queried to get this attribute. This data is valid only when the match is 
  successful.</p>
<h2>Semantic Actions</h2>
<p>A composite parser forms a hierarchy. Parsing proceeds from the topmost parent 
  parser which delegates and apportions the parsing task to its children recursively 
  to its childeren's children and so on until a primitive is reached. By attaching 
  semantic actions to various points in this hierachy, in effect we can transform 
  the flat linear input stream into a structured representation. This is essentially 
  what parsers do.</p>
<p>Recall our example above:<br>
</p>
<pre><code><span class=special>    </span><span class=identifier>real_p </span><span class=special>&gt;&gt; </span><span class=special>*(</span><span class=literal>',' </span><span class=special>&gt;&gt; </span><span class=identifier>real_p</span><span class=special>)</span></code></pre>
<p>By hooking a function (or functor) into the real_p parsers, we can extract 
  the numbers from the input:<br>
</p>
<pre><span class=special>    </span><span class=identifier>real_p</span><span class=special>[&</span><span class=identifier>f</span><span class=special>] </span><span class=special>&gt;&gt; </span><span class=special>*(</span><span class=literal>',' </span><span class=special>&gt;&gt; </span><span class=identifier>real_p</span><span class=special>[&</span><span class=identifier>f</span><span class=special>])</span></pre>
<table width="41%" border="0" align="center">
  <tr> 
    <td><img src="theme/intro6.png"></td>
  </tr>
</table>
<p>where <tt>f</tt> is a function that takes in a single argument. The <tt><span class="operators">[&amp;</span>f<span class="operators">]</span></tt> 
  hooks the parser with the function such that when <tt>real_p</tt> recognizes 
  a valid number, the function <tt>f</tt> is called. It is up to the function 
  then to do what's appropriate. For example, it can stuff the numbers in a vector. 
  Or perhaps, if the grammar is changed slightly by replacing <tt class="quotes">','</tt> 
  with <tt class="quotes">'+'</tt>, then we have a primitive calculator that computes 
  sums. The function <tt>f</tt> then can then be made to add all incoming numbers.</p>
<br>
<table border="0">
  <tr> 
    <td width="10"></td>
    <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
    <td width="30"><a href="quick_start.html"><img src="theme/l_arr.gif" border="0"></a></td>
    <td width="20"><a href="primitives.html"><img src="theme/r_arr.gif" border="0"></a></td>
  </tr>
</table>
<br>
<hr size="1">
<p class="copyright">Copyright &copy; 1998-2002 Joel de Guzman<br>
  <br>
  <font size="2">Permission to copy, use, modify, sell and distribute this document 
  is granted provided this copyright notice appears in all copies. This document 
  is provided &quot;as is&quot; without express or implied warranty, and with 
  no claim as to its suitability for any purpose. </font> </p>
<p>&nbsp;</p>
</body>
</html>