Sophie

Sophie

distrib > Fedora > 15 > x86_64 > by-pkgid > 1e007a96761035f261351a68e7601417 > files > 618

parrot-docs-3.6.0-2.fc15.noarch.rpm

# Copyright (C) 2008, Parrot Foundation.

=head1 Episode 2: Poking in Compiler Guts

=head2 Introduction

In the first episode, we introduced the Parrot Compiler Tools, and generated a
very simple language using a shell script provided with the Parrot distribution.
We also announced Squaak, a simple programming language developed specially for
this tutorial. Squaak will be the case study to show how PCT can be used as a
very effective set of tools to implement a language for Parrot. A list of
features of Squaak was specified. If you felt lucky, you might even have tried
to do the exercise at the end of the previous episode.

In this episode, we will take a closer look at the generated compiler. We shall
check out the different stages of the compilation process, and show what's going
on in PCT-based compilers.

=head2 Under the Hood

Remember how we invoked our compiler in the previous episode? We can pass a
file, or invoke the compiler without a command line argument, in which case our
compiler enters the interactive mode. Consider the first case, passing the file
test.sq, just as we did before:

 $ ./installable_squaak test.sq

When invoking our compiler like this, the file test.sq is compiled and the
generated code (bytecode) is executed immediately by Parrot. How does this work,
you might wonder. The interpretation of a script is done through a series of
transformations, starting at the script source and ending in a format that can
be executed by Parrot. Compilers built with the PCT (based on the HLLCompiler
class) can take a target option, to show one of the intermediate
representations. This option can have the following values, corresponding to the
four default compilation phases of an HLLCompiler object:

=over 4

=item * --target=parse

=item * --target=past

=item * --target=post

=item * --target=pir

=back

This is an example of using the target option set to "parse", which will print
the parse tree of the input to stdout:

 $ ./installable_squaak --target=parse test.sq

In interactive mode, giving this input:

 say 42;

will print this parse tree (without the line numbers):

  1 "parse" => PMC 'Regex;Match' => "say 42;\n" @ 0 {
  2    <statementlist> => PMC 'Regex;Match' => "say 42;\n" @ 0 {
  3         <statement> => ResizablePMCArray (size:1) [
  4             PMC 'Regex;Match' => "say 42" @ 0 {
  5                 <statement_control> => PMC 'Regex;Match' => "say 42" @ 0 {
  6                     <sym> => PMC 'Regex;Match' => "say" @ 0
  7                     <EXPR> => ResizablePMCArray (size:1) [
  8                         PMC 'Regex;Match' => "42" @ 4 {
  9                             <integer> => PMC 'Regex;Match' => "42" @ 4 {
 10                                 <VALUE> => PMC 'Regex;Match' => "42" @ 4
 11                                 <decint> => \parse[0][0]
 12                             }
 13                         }
 14                     ]
 15                 }
 16             }
 17         ]
 18     }
 19 }

When changing the value of the target option, the output changes into a
different representation of the input. Why don't you try that right now?

So, a HLLCompiler object has four compilation phases: parsing, construction of a
Parrot Abstract Syntax Tree (PAST), construction of a Parrot Opcode Syntax Tree
(POST) and generation of Parrot Intermediate Representation (PIR). After
compilation, the generated PIR is executed immediately.

If your compiler needs additional stages, you can add them to your HLLCompiler
object. For Squaak, we will not need this, but for details, check out
F<compilers/pct/src/pct/HLLCompiler.pir>.

We shall now discuss each compilation phase in more detail. The first two
phases, parsing the input and construction of the PAST are executed
simultaneously. Therefore, these are discussed together.

Parse phase: match objects and PAST construction
During the parsing phase, the input is analyzed using Perl 6's extended regular
expressions, known as Rules (see Synopsis 5 for details). When a rule matches
some input string, a Match object is created. A Match object is a
combined array and hashtable and can be indexed by integers as well as
strings. As rules typically consist of other (sub) rules, it is easy to retrieve
a certain part of the match. For instance, this rule:

 rule if_statement {
     'if' <expression> 'then' <statement> 'end'
 }

has two other subrules: expression and statement. The match object for the rule
C<if_statement> represents the whole string from if to end.  You can retrieve a
the Match for a subrule by indexing into the Match object using the name of
that subrule.  For instance, to get the match for C<< <expression> >>, you
would use C<< $/<expression> >>.  (In nqp, C<< $foo<bar> >> indexes into
C<$foo> using the constant string C<bar> as a hash key.)

During the parse phase, the PAST is constructed. There is a small set of PAST
node types. For instance, C<PAST::Var> to represent variables (identifiers, such
as C<print>) and C<PAST::Val> to represent literal values (for instance, C<"hello">
and C<42>). Later we'll go through the various PAST nodes in more detail.

Now, you might wonder, at which point exactly is this PAST construction
happening? At the end of a successfully matching rule, the rule's parse action
is performed. Such a parse action is just a method that has the same name as
the rule which triggers it (in this case: C<if_statement>).  So, during the
parsing phase, several parse actions are executed, each of which builds a piece
of the total PAST representing the input string.

A Parrot Abstract Syntax Tree is just a compiler-friendly tree-based
representation of your program. It is convenient both for analysis and
optimization, and for further transformation into a lower-level representation
such as POST.

=head2 PAST to POST

After the PAST is constructed, the HLLCompiler transforms this PAST into a
Parrot Opcode Syntax Tree (POST).  The POST representation is also a tree
structure, but these nodes are on a lower abstraction level and correspond very
closely to PIR ops.  For instance, the PAST node type which represents a while
statement (constructed as C<PAST::Op.new( :pasttype('while') )> ) decomposes
into several POST nodes.

The template for a C<while> statement typically consists of a number of labels and
jump instructions. On the POST level, the same while statement is represented by
a set of nodes, each representing a one instruction or a label. This makes it
much easier to transforn POST into executable code.

Usually, as a user of the PCT, you don't need to know details of POST nodes,
which is why this will not be discussed in further detail. Use C<--target=post>
to see what a POST looks like.

=head2 POST to PIR

In the fourth (and final) stage, the POST is transformed into Parrot
Intermediate Representation (PIR). As mentioned, transforming a POST into
something executable is rather straightforward, as POST nodes already represent
individual instructions and labels. Again, normal usage of the PCT does not
require you to know any details about this transformation.

=head2 And now for the good news...

We established the general data flow of PCT-based compilers, which consists of
four stages:

=over 4

=item 1. source to parse tree

=item 2. parse tree to PAST

=item 3. PAST to POST

=item 4. POST to PIR

=back

The first two transformations happen during the parse stage. Now, as you're
reading this tutorial, you're probably interested in using the PCT to implement
Your Favorite Language on top of Parrot. We already saw that a language grammar
is expressed in Perl 6 Rules. What about the other transformations?  Well,
earlier in this episode we mentioned parse actions and that these actions
create PAST nodes. After you have written a parse action for each grammar rule,
you're done!

Say what?

That's right. Once you have correctly constructed a PAST, your compiler can
generate executable PIR, which means you just implemented your first language
on top of Parrot. Of course, you'll still need to implement any language specific
libraries, but that's beside the point.

PCT-based compilers already know how to transform PAST into POST and how to
transform POST into PIR. These transformation stages are already provided by
the PCT.

=head2 What's next?

In this episode we took a closer look at the internals of a PCT-based compiler.
We discussed the four compilation stages which transform an input string (a
program or script, depending on your definition) into PAST, POST and finally
executable PIR.

The next episodes is where the Fun Stuff is: we will be implementing Squaak for
Parrot. Piece by piece, we will implement the parser and the parse actions.
Finally, we'll demonstrate John Conway's "Game of Life" running on Parrot,
implemented in Squaak.

=head2 Exercises

Last episode's exercise was to add a command line banner and prompt for the
interactive mode of our compiler. Given the hints that were provided, it was
probably not too hard to find the solution, which is shown below. This
INIT block can be found in the file src/Squaak/Compiler.pm. The relevant lines are
marked with a comment

  INIT {
      Squaak::Compiler.language('Squaak');
      Squaak::Compiler.parsegrammar(Squaak::Grammar);
      Squaak::Compiler.parseactions(Squaak::Actions);

      Squaak::Compiler.commandline_banner("Squaak for Parrot VM.\n"); # set banner
      Squaak::Compiler.commandline_prompt('> '); # set prompt
  }

Starting in the next episode, the exercises will be more interesting. For now,
it would be useful to browse around through the source files of the compiler,
and see if you understand the relation between the grammar rules in src/Squaak/Grammar.pm
and the methods in src/Squaak/Actions.pm.
It's also useful to experiment with the --target option described in this
episode. If you don't know PIR, now is the time to do some preparation for that.
There's sufficient information to be found on PIR, see the References section
for details. In the mean time, if you have any suggestions, questions and
whatnot, don't hesitate to leave a comment.

=head2 References

=over 4

=item 1. PIR language specification: docs/pdds/pdd19_pir.pod

=back

=cut