Sophie

Sophie

distrib > Mandriva > 2009.0 > i586 > by-pkgid > 8de1f55ea6a1a64d0f3f3ea116288458 > files > 99

happy-1.17-3mdv2009.0.i586.rpm

<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>3.3. Including semantic results</title><link rel="stylesheet" href="fptools.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"><link rel="start" href="index.html" title="Happy User Guide"><link rel="up" href="sec-glr.html" title="Chapter 3. Generalized LR Parsing"><link rel="prev" href="sec-glr-using.html" title="3.2. Basic use of a Happy-generated GLR parser"><link rel="next" href="sec-glr-misc.html" title="3.4. Further information"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">3.3. Including semantic results</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="sec-glr-using.html">Prev</a> </td><th width="60%" align="center">Chapter 3. Generalized LR Parsing</th><td width="20%" align="right"> <a accesskey="n" href="sec-glr-misc.html">Next</a></td></tr></table><hr></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sec-glr-semantics"></a>3.3. Including semantic results</h2></div></div></div><p>
      This section discusses the options for including semantic information
      in grammars. 
      </p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sec-glr-semantics-intro"></a>3.3.1. Forms of semantics</h3></div></div></div><p>
	Semantic information may be attached to productions in the 
	conventional way, but when more than one analysis is possible, 
	the use of the semantic information must change. 
	Two schemes have been implemented, which we call 
	<span class="emphasis"><em>tree decoding</em></span>
	and <span class="emphasis"><em>label decoding</em></span>. 
	The former is for simple applications, where there is not much 
	ambiguity and hence where the effective unpacking of the parse
	forest isn't a factor. This mode is quite similar to the 
	standard mode in <span class="application">Happy</span>.
	The latter is for serious applications, where sharing is important
	and where processing of the forest (eg filtering) is needed. 
	Here, the emphasis is about providing rich labels in nodes of the
	the parse forest, to support such processing. 
        </p><p>
	The default mode is labelling. If you want the tree decode mode,
	use the <code class="option">--decode</code> flag.
	</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sec-glr-semantics-tree"></a>3.3.2. Tree decoding</h3></div></div></div><p>
	Tree decoding corresponds to unpacking the parse forest to individual
	trees and collecting the list of semantic results computed from 
	each of these. It is a mode intended for simple applications, 
	where there is limited ambiguity.
	You may access semantic results from components of a reduction 
	using the dollar variables. 
	As a working example, the following is taken from the 
	<code class="literal">expr-tree</code> grammar in the examples.
	Note that the type signature is required, else the types in use
	can't be determined by the parser generator.
        </p><pre class="programlisting">
	E :: {Int}		-- type signature needed
	  : E '+' E  { $1 + $3 }
	  | E '*' E  { $1 * $3 }
	  | i        { $1 }
        </pre><p>
	This mode works by converting each of the semantic rules into
	functions (abstracted over the dollar variables mentioned),
	and labelling each <code class="literal">Branch</code> created from a
	reduction of that rule with the function value.
	This amounts to <span class="emphasis"><em>delaying</em></span> the action of the 
	rule, since we must wait until we know the results of all of 
	the sub-analyses before computing any of the results. (Certain
	cases of packing can add new analyses at a later stage.)
	</p><p>
	At the end of parsing, the functions are applied across relevant 
	sub-analyses via a recursive descent. The main interface to this
	is via the class and entry function below. Typically, 
	<code class="literal">decode</code> should be called on the root of the 
	forest, also supplying a function which maps node names to their
	list of analyses (typically a partial application of lookup in
	the forest value).
	The result is a list of semantic values.
	Note that the context of the call to <code class="literal">decode</code>
	should (eventually) supply a concrete type to allow selection 
	of appropriate instance. Ie, you have to indicate in some way
	what type the semantic result should have.
	<code class="literal">Decode_Result a</code> is a synonym generated by
	<span class="application">Happy</span>: for non-monadic semantics,
	it is equivalent to <code class="literal">a</code>; when monads are 
	in use, it becomes the declared monad type.
	See the full <code class="literal">expr-eval</code> example for more 
	information.
	</p><pre class="programlisting">
	class TreeDecode a where
	       decode_b :: (ForestId -&gt; [Branch]) -&gt; Branch -&gt; [Decode_Result a]
	decode :: TreeDecode a =&gt; (ForestId -&gt; [Branch]) -&gt; ForestId -&gt; [Decode_Result a]
        </pre><p>
	The GLR parser generator identifies the types involved in each
	semantic rule, hence the types of the functions, then creates 
	a union containing distinct types. Values of this union are 
	stored in the branches. (The union is actually a bit more complex:
	it must also distinguish patterns of dollar-variable usage, eg
	a function <code class="literal">\x y -&gt; x + y </code> could be applied to 
	the first and second constituents, or to the first and third.)
	The parser generator also creates instances of the 
	<code class="literal">TreeDecode</code> class, which unpacks the semantic
	function and applies it across the decodings of the possible
	combinations of children. Effectively, it does a cartesian product
	operation across the lists of semantic results from each of the 
	children. Eg <code class="literal">[1,2] "+" [3,4]</code> produces
	<code class="literal">[4,5,5,6]</code>.
	Information is extracted from token values using the patterns 
	supplied by the user when declaring tokens and their Haskell
	representation, so the dollar-dollar convention works also.
	</p><p>
	The decoding process could be made more efficient by using
	memoisation techniques, but this hasn't been implemented since
	we believe the other (label) decoding mode is more useful. (If someone
	sends in a patch, we may include it in a future release -- but this
	might be tricky, eg require higher-order polymorphism?
	Plus, are there other ways of using this form of semantic function?)
	</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sec-glr-semantics-label"></a>3.3.3. Label decoding</h3></div></div></div><p>
	The labelling mode aims to label branches in the forest with 
	information that supports subsequent processing, for example
	the filtering and prioritisation of analyses prior to extraction
	of favoured solutions. As above, code fragments are given in
	braces and can contain dollar-variables. But these variables 
	are expanded to node names in the graph, with the intention of
	easing navigation.
	The following grammar is from the <code class="literal">expr-tree</code>
	example. 
        </p><pre class="programlisting">
	E :: {Tree ForestId Int}
	  : E '+' E      { Plus  $1 $3 }
	  | E '*' E      { Times $1 $3 }
	  | i            { Const $1 }
        </pre><p>
	Here, the semantic values provide more meaningful labels than 
	the plain structural information. In particular, only the 
	interesting parts of the branch are represented, and the 
	programmer can clearly select or label the useful constituents 
	if required. There is no need to remember that it is the first 
	and third child in the branch which we need to extract, because
	the label only contains those values (the `noise' has been dropped).
	Consider also the difference between concrete and abstract syntax.
	The labels are oriented towards abstract syntax.
	Tokens are handled slightly differently here: when they appear
	as children in a reduction, their informational content can 
	be extracted directly, hence the <code class="literal">Const</code> value
	above will be built with the <code class="literal">Int</code> value from 
	the token, not some <code class="literal">ForestId</code>.
        </p><p>
	Note the useful technique of making the label types polymorphic 
	in the position used for forest indices. This allows replacement
	at a later stage with more appropriate values, eg. inserting
	lists of actual subtrees from the final decoding. 
        </p><p>
	Use of these labels is supported by a type class
	<code class="literal">LabelDecode</code>, which unpacks values of the 
	automatically-generated union type <code class="literal">GSem</code>
	to the original type(s). The parser generator will create
	appropriate instances of this class, based on the type information
	in the grammar file. (Note that omitting type information leads
	to a default of <code class="literal">()</code>.)
	Observe that use of the labels is often like traversing an abstract
	syntax, and the structure of the abstract syntax type usually 
	constrains the types of constituents; so once the overall type
	is fixed (eg. with a type cast or signature) then there are no
	problems with resolution of class instances.
	</p><pre class="programlisting">
	class LabelDecode a where
	       unpack :: GSem -&gt; a
        </pre><p>
	Internally, the semantic values are packed in a union type as
	before, but there is no direct abstraction step. Instead, the 
	<code class="literal">ForestId</code> values (from the dollar-variables)
	are bound when the corresponding branch is created from the 
	list of constituent nodes. At this stage, token information 
	is also extracted, using the patterns supplied by the user
	when declaring the tokens.
        </p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="sec-glr-semantics-tree-monad"></a>3.3.4. Monadic tree decoding</h3></div></div></div><p>
	You can use the <code class="literal">%monad</code> directive in the 
	tree-decode mode. 
	Essentially, the decoding process now creates a list of monadic 
	values, using the monad type declared in the directive. 
	The default handling of the semantic functions is to apply the
	relevant <code class="literal">return</code> function to the value being
	returned. You can over-ride this using the <code class="literal">{% ... }</code>
	convention. The declared <code class="literal">(&gt;&gt;=)</code> function is
	used to assemble the computations.
	</p><p>
	Note that no attempt is made to share the results of monadic 
	computations from sub-trees. (You could possibly do this by 
	supplying a memoising lookup function for the decoding process.)
	Hence, the usual behaviour is that decoding produces whole 
	monadic computations, each part of which is computed afresh
	(in depth-first order) when the whole is computed.
	Hence you should take care to initialise any relevant state 
	before computing the results from multiple solutions. 
	</p><p>
	This facility is experimental, and we welcome comments or 
	observations on the approach taken!
	An example is provided (<code class="literal">examples/glr/expr-monad</code>).
	It is the standard example of arithmetic expressions, except that
	the <code class="literal">IO</code> monad is used, and a user exception is
	thrown when the second argument to addition is an odd number.
	Running this example will show a zero (from the exception handler)
	instead of the expected number amongst the results from the other
	parses.
	</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="sec-glr-using.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="sec-glr.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="sec-glr-misc.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">3.2. Basic use of a Happy-generated GLR parser </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> 3.4. Further information</td></tr></table></div></body></html>