Sophie

Sophie

distrib > Fedora > 15 > i386 > by-pkgid > 1e007a96761035f261351a68e7601417 > files > 426

parrot-docs-3.6.0-2.fc15.noarch.rpm

# Copyright (C) 2007-2010, Parrot Foundation

=head1 PDD 26: Compiler Tools - Abstract Syntax Tree

=head2 Abstract

This PDD describes the node types and semantics of the Parrot
Abstract Syntax Tree representation.





=head2 Description

The Parrot Abstract Syntax Tree (PAST) is a set of Parrot
classes useful for generating an abstract semantic representation
of a program written in a high level language such as Perl or
Python.  Once a program has been translated into its equivalent
PAST representation, the Parrot Compiler Toolkit can be used to
generate executable PIR or bytecode from the abstract syntax
tree representation.

PAST is designed to provide the common operations and semantics
needed by many of the high level languages targeted by Parrot,
while also being extensible to support the unique needs of
specific languages.

=head2 Implementation

=head3 PAST::Node

C<PAST::Node> is the base class for all nodes in an abstract
syntax tree.  Each C<PAST::Node> element has an array of
children nodes (which may be empty), as well as a number of
attributes and accessor methods for the node.

Every PAST node has C<name>, C<source>, and C<pos> attributes.
The C<name> attribute is the node's name, if any, while
C<source> and C<pos> are used to identify the location of the
node in the original source code.

=over 4

=item init([child1, child2, ..., ] [attr1=>val1, attr2=>val2, ...])

Initialize a PAST node with the given children and attributes.
Adds each child to the node (using the C<push> method, below)
and calls the appropriate accessor method for each attribute.

=item new([child1, child2, ..., ] [attr1=>val1, attr2=>val2, ...])

Create a new PAST node with the same type as the invocant
and initialized with the given children and attributes.
Returns the newly created node.

=item push(child)

Add C<child> to the end of the node's array of children.

=item pop()

Remove the last child from the node's array of children.
Returns the child.

=item unshift(child)

Add C<child> to the beginning of the node's array of children.

=item shift()

Remove the first child from the node's array of children.
Returns the child.

=item iterator( )

Return a newly initialized C<Iterator> for the node's list
of children.

=item node(val)

Set the invocant's C<source> and C<pos> attributes to be the
same as C<val>.  If C<val> is another PAST node, then C<source>
and C<pos> are simply copied from that node.  Otherwise, C<val>
is assumed to be a C<Match> object (e.g., from PGE) and the
source and position information are obtained from that.

=item name([value])

Accessor method -- sets/returns the C<name> attribute of the node.

=item named([value])

Accessor method -- for nodes that are being used as named arguments,
sets/returns the name to be associated with the argument.

=item flat([value])

Accessor method -- sets/returns the "flatten" flag on nodes being
used as arguments.

=item attr(STR attrname, PMC value, INT has_value)

Helper method for writing accessors -- if C<has_value> is true
then set the node's value of C<attrname> to C<value>.  Returns
the (resulting) value of C<attrname> for the invocant.

=back

=head3 PAST::Stmts

C<PAST::Stmts> is simply a C<PAST::Node> used to contain a
sequence of other PAST nodes to be evaluated.  It has no
specific methods or attributes.

=head3 PAST::Val

C<PAST::Val> nodes represent constant values in the abstract
syntax tree.  The C<value> attribute represents the value of
the node itself.

=over 4

=item value([value])

Accessor method to get/set the constant value for this node.

=item returns([typename])

Get/set the type of value to be generated by this node.
If not specified, the type is taken from the type of
the value attribute given above.

=back

=head3 PAST::Var

C<PAST::Var> nodes represent variable-like items within the
abstract syntax tree.  The C<name> attribute (inherited from
C<PAST::Node>) identifies the name of the variable as given
by the high level language program.  The other attributes for
C<PAST::Var> nodes provide the details of how the variable
is defined and accessed.

=over 4

=item scope([value])

A variable node's "scope" indicates how the variable is to be
defined or accessed within the program.  The available scopes
include:

=over 4

=item "lexical"

Lexical variables are scoped to the C<PAST::Block> in which
they are declared, including any nested blocks.  If the
node's C<isdecl> attribute is true, then this node defines
a new lexical variable within the current block.  Otherwise,
the node refers to a lexical variable already declared in the current
or outer block.

=item "package"

Package variables represent global or namespace-managed
variables in the program.  The node's C<namespace> attribute
may be used to explicitly identify the namespace of the variable,
otherwise it is assumed to be within the namespace of the
C<PAST::Block> containing the node.

=item "parameter"

Parameter variables are the formal arguments to subroutine and
methods, typically represented as C<PAST::Block> nodes.  The
parameter's lexical name is given by the node's C<name> attribute.
Positional parameters are defined by the order in which they
appear in the C<PAST::Block> node, named parameters have values
for the C<named> attribute.  Optional parameters are identified
via the C<viviself> attribute (see below) indicating how the parameter
is to be initialized if not supplied by the caller.  Slurpy parameters
are indicated via the node's C<slurpy> attribute.

=item "keyed"

Keyed variables represent the elements of aggregates such as
arrays and hashes.  Nodes representing keyed elements have two
children; the first child is the PAST representation of the
aggregate, and the second child is the PAST representation of
the key or index.  The C<vivibase> attribute below may be used
to specify how to generate the aggregate if it doesn't already
exist.

=item "attribute"

Attribute variables represent object attributes (in some languages
they're called "member variables"). The attribute's name is given
by the node's C<name> attribute. Nodes representing attribute
variables have an optional child, representing the object to which
the attribute belongs. If this child is not present, the attribute
is assumed to belong to the current invocant, indicated by the
special variable C<self> (which is implicitly passed to all subs
that are flagged as a C<:method> or C<:vtable>).

=item "register"

Register variables are limited in scope to the C<PAST::Block> node
in which they are declared. This is different from the C<lexical>
scope, which I<includes> any nested C<PAST::Block> nodes. If the
node's C<isdecl> attribute is true, then this node defines a new
register variable within the current block. Register variables
are mapped to Parrot registers, and are useful for handling the
PIR pseudo-variable C<self> and storing intermediate results.
Names given to the C<name> attribute must conform to rules for
PIR identifiers. If no C<name> attribute is set, Parrot registers
are used. In this case, setting the C<isdecl> does not have any
effect.


=back

If C<scope> is not explicitly provided in the node, then PAST will
look at the local symbol tables of any outer C<PAST::Block> nodes
to try to determine the scope of the named variable.  If this still
does not result in a scope, then 'lexical' is assumed.

=item viviself([value])

Accessor method for the C<viviself> attribute, which specifies
how uninitialized variables are to be initialized when first
used.  The C<viviself> attribute may be either a string or array
representing a type (e.g., C<Integer>), or it may be a PAST
representation for generating the variable's initial value.

=item vivibase([value])

Accessor method for the C<vivibase> attribute, which specifies
how the base portion of aggregate variables (see "keyed scope" above)
is to be created if it doesn't already exist.  C<Vivibase>
may either be a string or array representing the type of the
newly created aggregate, or it may be a PAST representation for
generating the aggregate.

=item isdecl([flag])

Accessor method for the C<isdecl> attribute.  A true value
of C<isdecl> indicates that the variable given by this node
is being created at this point within the current lexical
scope.  Otherwise, the node refers to a pre-existing variable
(possibly from an outer scope).

=item lvalue([flag])

Accessor method for the C<lvalue> attribute, which indicates
whether this variable is being used in an lvalue context.

=item slurpy([flag])

Accessor method for the C<slurpy> attribute of parameter variables.
A true value of C<slurpy> indicates that the parameter variable
given by this node is to be created as a slurpy parameter (consuming
all remaining arguments passed in).  Named slurpy parameters are
indicated by having a non-empty C<named> attribute and a true value of
C<slurpy>.

=back

=head3 PAST::Op

C<PAST::Op> nodes represent the operations in an abstract
syntax tree.  The primary function of the node is given by
its C<pasttype> attribute, secondary functions may be indicated
by the node's C<name>, C<pirop>, or other attributes as given
below.

=over 4

=item pasttype([value])

Accessor method for the node's C<pasttype> attribute.  The
C<pasttype> is the primary indicator of the type of operation
to be performed, the operands are typically given as the
children of the node.  Defined values of C<pasttype> are:

=over 4

=item chain

A short-circuiting chain of operations.  In a sequence of nodes
with pasttype 'chain', the right operand of a node serves as
the left operand of its parent.  Each node is evaluated only
once, and the first false result short-circuits the chain.
In other words,  C<<  $x < $y < $z >>  is true only if
$x < $y and $y < $z, but $y only gets evaluated once.

=item copy

Copy the value of the node's second child into the variable
expression given by its first child.

=item bind

Bind the variable given by the node's first child to the
value given by its second child.

=item if

The first, second, and third children represent the
"condition", "then", and "else" parts of a conditional
expression.  The first child is evaluated; if the result
is true then the second child is evaluated and returned
as the result of the C<PAST::Op> node, otherwise the
third child is evaluated and returned as the result.
This implements the standard if-then-else logic needed
by most higher level languages, and can also be used
for implementing the ternary operator.

If the node is missing its second ("then") or third ("else")
child, then the result of the condition is used as the
corresponding result of the operation.  This makes it easy
to implement the "short-circuit and" operator commonly used in many
high level languages.  For example, the standard C<&&> operator
may be implemented using an "if" node, where the left operand is
the first (condition) child, the right operand is the
second (then) child, and the third child is left as null
or uninitialized.

It's also possible to build a "short-circuit or" (C<||>)
operator using this pasttype, by setting the left operand to
C<||> as the first child and the right operand as the I<third>
child (leaving the second child as null).  However, it's probably
simpler to use the "unless" type as described below.

=item unless

Same as C<if> above, except that the second child is evaluated
if the first child evaluates to false and the third child is
evaluated if the first child evaluates to true.

The C<unless> type can be used to implement "short-circuit or"
semantics; simply set the first child to the left operand and
the second child to the right operand, leaving the third
child empty or uninitialized.  If the first child evaluates to
true it is returned as the result of the operation, otherwise the
second child is evaluated and returned as the result.

=item while

Evaluate the first child (condition), if the result is true
then evaluate the second child (body) and repeat.

=item until

Evaluate the first child (condition), if the result is false then evaluate
the second child (body) and repeat.

=item repeat_while, repeat_until

Same as C<while> and C<until> above, except the second child is evaluated
before the conditional first child is evaluated for continuation of
the loop.

=item for

Iterate over the first child in groups of elements given by
C<arity> (default 1).  For each iteration, invoke the second
child, passing the elements as parameters.

=item call

Call the subroutine given by the C<name> attribute, passing
the results of any child nodes as arguments.  If the node
has no C<name> attribute, then the first child is assumed
to evaluate to a callable subroutine, and any remaining
children are used as arguments.

=item callmethod

Invoke the method given by C<name> on the first child,
passing the results of any child nodes as arguments.  If the
node has no C<name> attribute, then the first child is
evaluated as a method to be called, the second child is
the invocant, and any remaining children are arguments to
the method call.

=item pirop

Execute the PIR opcode given by the C<pirop> attribute.  See the
C<pirop> method below for details.  This is also the default
behavior when a C<pirop> attribute is set and C<pasttype> is
not.

=item inline

Execute the sequence of PIR statements given by the
node's C<inline> attribute (a string).  See the C<inline>
method below for details.

=item return

Generate a return exception, using the first child (if any) as
a return value.

=item try

Evaluates the first child, if any exceptions occur then they
are handled by the code given by the second child (if any).

=item xor

Evaluate the child nodes looking for exactly one true result
to be returned.  If two true results are encountered, the
operation immediately short-circuits and returns false.
Otherwise, after all children have been evaluated the result
of any true child is used as the result of the operation, or
the result of the last child if none of the children evaluated
to true.

=back

=item pirop([opcode])

Get/set the PIR opcode to be executed for this node.  Internally
the PAST and POST (Parrot Opcode Syntax Tree) implementations
understand the register types available for common PIR operations
and will handle any needed register or constant conversion of
operands automatically.  Note that except for the C<assign>
opcode, any destination is typically generated automatically
and should not be explicitly given as a child operand to the node.
The table of PIR opcodes that PAST "knows" about is given in
F<compilers/pct/src/PAST/Compiler.pir> .

=item lvalue([flag])

Get/set whether this node is an lvalue, or treats its first
child as an lvalue (e.g., for assignment).

=item inline([STRING code])

Get/set the code to be used for inline PIR when C<pasttype> is
set to "inline".  The C<code> argument is PIR text to be inserted in
the final generated code sequence.  Sequences of "%0", "%1",
"%2", ... "%9" in C<code> are replaced with the evaluated
results of the first, second, third, ..., tenth children nodes.
(If you need more than ten arguments to your inline PIR, consider
making it a subroutine call instead.)

The register to hold the result of the inline PIR operation is
given by "%r", "%t", or "%u" in the C<code> string:

  %r   - Generate a unique PMC register for the result.
  %t   - Generate a unique PMC register for the result,
         and initialize it with an object of type C<returns>
         {{Pm: or possibly C<viviself> }}
         before the execution of the inline PIR.
  %u   - Re-use the first child's PMC (%0) if it's a temporary
         result, otherwise same as %t above.

If none of %r, %t, or %u appear in C<code>, then the first
child's (%0) is used as the result of this operation.

=back

=head3 PAST::Block

C<PAST::Block> nodes represent lexical scopes within an abstract
syntax tree, and roughly translate to individual Parrot subroutines.
A C<PAST::Block> node nested within another C<PAST::Block> node
acts like a nested lexical scope.

If the block has a C<name> attribute, that becomes the name of
the resulting Parrot sub.  Otherwise, a unique name is automatically
generated for the block.

Each PAST::Block node can maintain its own local symbol table, see
the C<symbol> method below.

=over 4

=item blocktype([type])

Get/set the type of the block to C<type>.  The currently
understood values are 'declaration', 'immediate',  and 'method'.  
'Declaration' indicates that a block is simply being defined at 
this point, the result of the node is a reference to the block.  
A C<blocktype> of 'immediate' indicates a block that is to be immediately
executed when it is evaluated in the AST, and the result of the
last operation is used as the return value for the block.

=item closure([value])

Get/set the closure flag for the block to C<value>.  If the closure
flag on a (non-immediate) block is true, then the node returns a 
reference to a clone of the block that has captured the current
lexical context.

=item namespace([value])

Get/set the namespace for this particular block (and any nested
blocks that do not explicitly provide a namespace).  C<Value>
may either be a simple string or an array of strings representing
a nested namespace.

=item hll([value])

Get/set the HLL namespace for this block (and any nested blocks
that do not explicitly provide a C<hll>).

=item symbol(name, [attr1 => val1, attr2 => val2, ...])

Each PAST::Block node can use the C<symbol> method to maintain
its own compile-time notion of a local symbol table.  Each value
of C<name> is given its own hash to hold information about that
symbol for the block (i.e., the symbol table acts like a hash of
hashes indexed by symbol name).  If the C<symbol> method is called
with named arguments, then the method sets the entries in the hash
corresponding to C<name> in the block's symbol table.  If C<symbol>
is called with just a single C<name> argument, then the current hash
for local symbol C<name> is returned.

HLLs are free to place any values in the symbol hashes that
may be useful.  However, the C<scope> entry for a symbol is
typically used to provide the C<scope> attribute for any nested
C<PAST::Var> nodes that do not provide an explicit C<scope> attribute.

=item symbol_defaults([attr1 => val1, attr2 => val2, ...])

Sets default attributes for symbols that aren't explicitly
given by the C<symbol> method above.  For example, an abstract
syntax tree can use this method on a top-level C<PAST::Block>
to specify the C<scope> attribute to be used for all
variable nodes that don't otherwise provide one.

=item subid([value])

Get/set the unique subid to be used for this block.  If
no subid is explicitly set, a unique subid is randomly
generated for the block.

=item pirflags([value])

Get/set any PIR flags to be used when generating the block.

=item compiler([compiler_name])

Specify that the children nodes of this block are to be compiled
using C<compiler_name> instead of being treated as standard PAST
nodes.  This is useful when a program may contain components of
code written in other HLLs.  For example, the C<perl6> compiler
uses this feature to use PGE to compile pre-parsed Perl 6 regular
expressions, rather than trying to represent the semantics of
those expressions in PAST itself.

When code is generated from a C<PAST::Block> node having a
C<compiler> attribute, the compiler is invoked with C<name>
and C<outer> arguments so that any generated subs can have
names and lexical scoping appropriate to the block (it's the
responsibility of the external compiler to support this).

=back

=head3 PAST::CompUnit

C<PAST::CompUnit> nodes are used for the abstract representation
of compilation units.  Specific attributes and semantics are
yet to be determined.

=head2 References

None.

=cut

__END__
Local Variables:
  fill-column:78
End:
vim: expandtab shiftwidth=4: