Sophie

Sophie

distrib > Fedora > 15 > i386 > by-pkgid > 1e007a96761035f261351a68e7601417 > files > 375

parrot-docs-3.6.0-2.fc15.noarch.rpm

# Copyright (C) 2001-2008, Parrot Foundation.

=head1 TITLE

PIR and Parrot Programming for Compiler Developers - Frequently Asked Questions

=head1 GENERAL QUESTIONS

=head2 What is Parrot?

Wrong FAQ, start with the Parrot FAQ first. Then come back here because this is
where the fun is.

The Parrot FAQ : http://www.parrotcode.org/faq/


=head2 What is IMC, PIR and IMCC?

IMC stands for Intermediate Code. IMCC stands for Intermediate Code Compiler.
Most of the time you will encounter the term PIR which is for Parrot Intermediate Representation
and means the same thing as IMC. PIR files use the extension C<.pir>.

PIR is an intermediate language that compiles either directly to Parrot Byte
code. It is a possible target language for compilers targeting
the Parrot Virtual Machine. PIR is halfway between
a High Level Language (HLL) and Parrot Assembly (PASM).

IMCC is the current implementation of the PIR language.

=head2 What is the history of IMCC?

IMCC was a toy compiler written by Melvin Smith as a little 2-week experiment
for another toy language, Cola. It was not originally a part of Parrot, and
understandably wasn't designed for public consumption. Parrot's early alpha
versions (0.0.6 and earlier) included only the raw Parrot assembler that
compiled Parrot Assembly language. This was considered the reference
assembler. The Cola compiler, on the other hand, targeted its own little back
end compiler that included a register allocator, basic block tracking and
medium level expression parsing. The backend compiler was eventually named
IMCC and benefited from contributions from Angel Faus, Leo Toetsch, Steve Fink
and Sean O'Rourke. The first version of Perl6 written by Sean used IMCC as its
backend and that's how it currently exists.

Leopold Toetsch added, among many other things, the ability for IMCC to
compile PASM by proxying any instructions that were not valid IMCC through to
be assembled as PASM. This was a great improvement. As Parrot's calling
convention changed to a continuation style (PCC), and generally became more
complex, the PASM instructions required to call or declare subroutines became
just as complex. IMCC abstracted some of the convention and eventually the
core team stopped using the old reference assembler altogether. Leo integrated
IMCC into Parrot and now IMCC is B<the> front-end for the Parrot VM.

=head2 Parrot is a VM, why does it need IMCC builtin?

Static languages, such as Java, can run on VMs that are dedicated to execution
of pre-compiled bytecode with no problems. Languages such as Perl, Ruby and
Python are not so static. They have support for runtime evaluation and
compilation and their parsers are always available. These languages run on
their own "dynamic" interpreters.

Since Parrot is specialized to be a dynamic VM, it must be able to compile
code on the fly. For this reason, IMCC is written in C and integrated into
the VM. IMCC is fast since it does very little type checking, and since most
of Parrot's ops are polymorphic, IMCC punts most of the type checking and
method dispatch to runtime. This allows extremely fast compile times, which
is what scripters need.

=head2 How Is PIR different than Parrot Assembly language?

PASM is an assembly language, raw and low-level. PASM does exactly what you
say, and each PASM instruction represents a single VM opcode.  Assembly
language can be tough to debug, simply due to the amount of instructions that
a high-level compiler generates for a given construct. Assembly language
typically has no concept of basic blocks, namespaces, variable tracking, etc.
You must track your register usage and take care of saving/restoring values
in cases where you run out of registers. This is called spilling.

PIR is medium level and a bit more friendly to write or debug. IMCC also has
a builtin register allocator and spiller. PIR has the concept of a
"subroutine" unit, complete with local variables and high-level sub call
syntax. PIR also allows unlimited symbolic registers. It will take care of
assigning the appropriate register to your variables and will usually find the
most efficient mapping so as to use as few registers as possible for a given
piece of code. If you use more registers than are currently available, IMCC
will generate instructions to save/restore (spill) the registers for you.
This is a significant piece of every compiler.

While it is possible to write more efficient code by hand directly in PASM,
it is rare. PIR is still very close to PASM as far as granularity. It is also
common for IMCC to generate instructions that use less registers than
handwritten PASM. This is good for cache performance.

=head2 Why should I target PIR instead of PASM?

Several reasons. PIR is so much easier to read, understand and debug. When
passing snippets back and forth on the Parrot internals list, IMC is preferred
since the code is much shorter than the equivalent PASM. In some cases it is
necessary to debug the PASM code as bugs in IMCC are found.

Hand writing and debugging of code aside, most PIR code will be mostly
compiler generated. In this respect, the most important technical reason to
use PIR is the amount of abstraction it provides. PIR now completely hides
the Parrot calling conventions. This allows Parrot to change somewhat without
impacting existing compilers. The workload is balanced between the IMCC
team and the compiler authors. The term "modular" springs to mind.

Since development on the old assembler has stopped, IMCC will be the best way
to compile bytecode classes complete with metadata and externally linkable
symbols. It will still be possible to construct classes on the fly with PASM,
but PIR's higher level directives allow it to do compile time construction of
certain things and pack them into the bytecode in a way that does not have an
equivalent set of Parrot instructions. The PASM assembler may or may not ever
catch up with these features.

=head2 Shouldn't I rather target PAST?

Yes, preferably using the PCT, the Parrot Compiler Toolkit.

=head2 Can I use IMCC without Parrot?

Not yet. IMCC is currently tightly integrated to the Parrot bytecode format.
An old idea is to rework IMCC's modularity to make it easy to run separately, but
this is not a top priority since IMCC currently only targets Parrot.
Eventually IMCC will contain a config option to build without linking the
Parrot VM, but IMCC must be able to do lookups of opcodes so it will require
some sort of static opcode metadata.

=head1 PIR PROGRAMMING 101

=head2 Hello world?

The basic block of execution of an IMC program is the subroutine. Subs can be
simple, with no arguments or returns. Line comments are allowed in IMC using #.

=begin PIR

	# Hello world
	.sub main :main
	  print "Hello world.\n"
	.end

=end PIR

=head2 Tutorial

For more examples see the PIR tutorial in F<examples/tutorial>.

=head2 How do I compile and run a PIR module?

Parrot uses the filename extension to detect whether the file is a PIR file
(.pir), a Parrot Assembly file (.pasm) or a pre-compiled
bytecode file (.pbc).

	parrot hello.pir


=head2 How do I see the assembly code that PIR generates?

Use the -o option for Parrot. You can provide an output filename, or the -
character which indicates standard output. If the filename has a .pbc
extension, IMCC will compile the module and assemble it to bytecode.

Beware that compiling to PASM is not well supported and might produce broken PASM.

Examples:

=over 4

=item Create the PASM source from PIR.

	parrot -o hello.pasm hello.pir

=item Compile to bytecode from PIR.

	parrot -o hello.pbc hello.pir

=item Dump PASM to screen

	parrot -o - hello.pir

=back

=head2 Does IMCC do variable interpolation in strings?

No, and it shouldn't. PIR is an intermediate language for compiling high level
languages. Interpolation (print "$count items") is a high level concept and
the specifics are unique to each language. Perl6 already does interpolation
without special support from IMCC.

=head2 What are PIR variables?

PIR has 2 classes of variables, symbolic registers and named variables. Both
are mapped to real registers, but there are a few minor differences. Named
variables must be declared. They may be global or local, and may be qualified
by a namespace. Symbolic registers, on the other hand, do not need declaration,
but their scope never extends outside of a subroutine unit. Symbolic registers
basically give compiler front ends an easy way to generate code from their
parse trees or abstract syntax tree (AST). To generate expressions compilers
have to create temporaries.

=head2 Symbolic Registers (or Temporaries)

Symbolic registers have a $ sign for the first character, have a single letter
representing the register type [S(tring), N(umber), I(nteger) or P(MC)] for
the second character, and one or more digits for the rest.

Example:

=begin PIR_FRAGMENT

	$S1 = "hiya"
	$S2 = $S1 . "mel"
	$I1 = 1 + 2
	$I2 = $I1 * 3

=end PIR_FRAGMENT

This example uses symbolic STRING and INTVAL registers as temporaries. This is
the typical sort of code that compilers generate from the syntax tree.

=head2 Named Variables

Named variables are either local or namespace qualified. Currently
IMCC only supports locals transparently. However, globals are supported with
explicit syntax. The way to declare locals in a subroutine is with the
B<.local> directive. The B<.local> directive also requires a type (B<int>,
B<num>, B<string> or B<pmc>).

Example:

=begin PIR

	.sub _main :main
	   .local int i
	   .local num n
	   i = 7
	   n = 5.003
	.end

=end PIR

=head2 How do I declare global or package variables in PIR?

You can't. You can explicitly create global variables at runtime, however,
but it only works for PMC types, like so:

=begin PIR

	.sub _main :main
	   .local pmc i
	   .local pmc j
	   i = new 'Integer'
	   i = 123
	   # Create the global
	   set_global "i", i

	   # Refer to the global
	   j = get_global "i"
	.end

=end PIR

Happy Hacking.

=cut