# Copyright (C) 2001-2009, Parrot Foundation.

=head0 Introduction to Parrot

=head1 Welcome to Parrot

This document provides a gentle introduction to the Parrot virtual machine for
anyone considering writing code for Parrot by hand, writing a compiler that
targets Parrot, getting involved with Parrot development or simply wondering
what on earth Parrot is.

=head1 What is Parrot?

=head2 Virtual Machines

Parrot is a virtual machine. To understand what a virtual machine is, consider
what happens when you write a program in a language such as Perl, then run it
with the applicable interpreter (in the case of Perl, the perl executable).
First, the program you have written in a high level language is turned into
simple instructions, for example I<fetch the value of the variable named x>,
I<add 2 to this value>, I<store this value in the variable named y>, etc. A
single line of code in a high level language may be converted into tens of
these simple instructions. This stage is called I<compilation>.

The second stage involves executing these simple instructions. Some languages
(for example, C) are often compiled to instructions that are understood by the
CPU and as such can be executed by the hardware. Other languages, such as Perl,
Python and Java, are usually compiled to CPU-independent instructions.  A
I<virtual machine> (sometimes known as an I<interpreter>) is required to
execute those instructions.

While the central role of a virtual machine is to efficiently execute
instructions, it also performs a number of other functions. One of these is to
abstract away the details of the hardware and operating system that a program
is running on. Once a program has been compiled to run on a virtual machine, it
will run on any platform that the VM has been implemented on. VMs may also
provide security (by allowing more fine-grained limitations to be placed on a
program), memory management functionality and support for high level language
features (such as objects, data structures, types and subroutines).

=head2 Design goals

Parrot is designed with the needs of dynamically typed languages (such as Perl
and Python) in mind, and should be able to run programs written in these
languages more efficiently than VMs developed with static languages in mind
(JVM, .NET). Parrot is also designed to provide interoperability between
languages that compile to it. In theory, you will be able to write a class in
Perl, subclass it in Python and then instantiate and use that subclass in a Tcl
program.

Historically, Parrot started out as the runtime for Perl 6. Unlike Perl 5, the
Perl 6 compiler and runtime (VM) are to be much more clearly separated. The
name I<Parrot> was chosen after the 2001 April Fools' joke which had Perl and
Python collaborating on the next version of their languages. The name reflects
the intention to build a VM to run not just Perl 6, but also many other
languages.


=head1 Parrot concepts and jargon

=head2 Instruction formats

Parrot can currently accept instructions to execute in four forms. PIR (Parrot
Intermediate Representation) is designed to be written by people and generated
by compilers. It hides away some low-level details, such as the way parameters
are passed to functions. PASM (Parrot Assembly) is a level below PIR - it is
still human readable/writable and can be generated by a compiler, but the
author has to take care of details such as calling conventions and register
allocation. PAST (Parrot Abstract Syntax Tree) enables Parrot to accept an
abstract syntax tree style input - useful for those writing compilers.

All of the above forms of input are automatically converted inside Parrot to
PBC (Parrot Bytecode). This is much like machine code, but understood by the
Parrot interpreter. It is not intended to be human-readable or human-writable,
but unlike the other forms execution can start immediately, without the need
for an assembly phase. Parrot bytecode is platform independent.

=head2 The instruction set

The Parrot instruction set includes arithmetic and logical operators, compare
and branch/jump (for implementing loops, if...then constructs, etc), finding
and storing global and lexical variables, working with classes and objects,
calling subroutines and methods along with their parameters, I/O, threads and
more.

=head2 Registers and fundamental data types

The Parrot VM is register based. This means that, like a hardware CPU, it has a
number of fast-access units of storage called registers. There are 4 types of
register in Parrot: integers (I), numbers (N), strings (S) and PMCs (P). The
registers of each type are numbered from zero, giving names such as I0, I1, N0,
S0 and P0. Integer registers are the same size as a word on the machine Parrot
is running on and number registers also map to a native floating point type.
The number of registers needed is determined per subroutine at compile-time.

=head2 PMCs

PMC stands for Polymorphic Container. PMCs represent any complex data structure
or type, including aggregate data types (arrays, hash tables, etc). A PMC can
implement its own behavior for arithmetic, logical and string operations
performed on it, allowing for language-specific behavior to be introduced. PMCs
can be built in to the Parrot executable or dynamically loaded when they are
needed.
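
As a minimal sketch of PMCs in use (PIR syntax is introduced later in this
document), the following assumes the built-in C<ResizablePMCArray> and C<Hash>
PMC types. The names C<items> and C<lookup> are illustrative only.

=begin PIR

  .sub main :main
      # An array PMC that grows as elements are pushed onto it.
      .local pmc items
      items = new 'ResizablePMCArray'
      push items, "alpha"
      push items, 42

      # Ask the array PMC how many elements it holds.
      $I0 = elements items
      say $I0

      # A hash PMC, indexed here by a string key.
      .local pmc lookup
      lookup = new 'Hash'
      lookup["answer"] = 42
      $I1 = lookup["answer"]
      say $I1
  .end

=end PIR

The same C<push> and keyed-access syntax works on any PMC type that implements
the corresponding behavior, which is how language-specific aggregates can be
introduced.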

=head2 Garbage Collection

Parrot provides garbage collection, meaning that Parrot programs do not need
to free memory explicitly; it will be freed when it is no longer in use (that
is, no longer referenced) whenever the garbage collector runs.


=head1 Obtaining, building and testing Parrot

=head2 Where to get Parrot

See L<http://www.parrot.org/download> for several ways to get a recent
version of Parrot.

=head2 Building Parrot

The first step to building Parrot is to run the F<Configure.pl> program, which
looks at your platform and decides how Parrot should be built. This is done by
typing:

  perl Configure.pl

Once this is complete, run the C<make> program that C<Configure.pl> tells you
to use. When this completes, you will have a working C<parrot> executable.

Please report any problems that you encounter while building Parrot so the
developers can fix them. You can do this by creating a login and opening
a new ticket at L<https://trac.parrot.org>.  Please include the F<myconfig>
file that was generated as part of the build process and any errors that you
observed.

=head2 The Parrot test suite

Parrot has an extensive regression test suite. This can be run by typing:

  make test

Substitute the name of the make program on your platform for C<make>. The
output will look something like this:

 C:\Perl\bin\perl.exe t\harness --gc-debug 
   t\library\*.t  t\op\*.t  t\pmc\*.t  t\run\*.t  t\native_pbc\*.t
   imcc\t\*\*.t  t\dynpmc\*.t  t\p6rules\*.t t\src\*.t t\perl\*.t
 t\library\dumper...............ok
 t\library\getopt_long..........ok
 ...
 All tests successful, 4 test and 71 subtests skipped.
 Files=163, Tests=2719, 192 wallclock secs ( 0.00 cusr +  0.00 csys =  0.00 CPU)

It is possible that a number of tests may fail. If this is a small number, then
there is probably little to worry about, especially if you have the latest Parrot
sources from the Git repository. However, please do not let this discourage you
from reporting test failures, using the same method as described for reporting
build problems.


=head1 Some simple Parrot programs

=head2 Hello world!

Create a file called F<hello.pir> that contains the following code.

=begin PIR

  .sub main
      say "Hello world!"
  .end

=end PIR

Then run it by typing:

  parrot hello.pir

As expected, this will display the text C<Hello world!> on the console,
followed by a new line. 

Let's take the program apart. C<.sub main> states that the instructions that
follow make up a subroutine named C<main>, until a C<.end> is encountered. The
second line contains the C<say> instruction. In this case, we are calling the
variant of the instruction that accepts a constant string. The assembler takes
care of deciding which variant of the instruction to use for us.
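
For comparison, here is a sketch of the same program written with the C<print>
instruction, which is used in later examples. Unlike C<say>, C<print> does not
append a new line, so the string constant has to include one explicitly.

=begin PIR

  .sub main
      # print writes the string as-is; the trailing \n is our responsibility.
      print "Hello world!\n"
  .end

=end PIR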

=head2 Using registers

We can modify hello.pir to first store the string C<Hello world!> in a
register and then use that register with the C<say> instruction.

=begin PIR

  .sub main
      $S0 = "Hello world!"
      say $S0
  .end

=end PIR

PIR does not allow us to name a physical register directly. Instead, we prefix
the register name with C<$> when referring to a register. The compiler will map
$S0 to one of the available string registers, for example S0, and set the value.
This example also uses the syntactic sugar provided by the C<=> operator.  C<=>
is simply a more readable way of using the C<set> opcode.  
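
To illustrate, the following sketch spells out the same program with the
C<set> opcode instead of C<=>. It should behave identically to the version
above.

=begin PIR

  .sub main
      # Equivalent to: $S0 = "Hello world!"
      set $S0, "Hello world!"
      say $S0
  .end

=end PIR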

To make PIR even more readable, named registers can be used. These are later
mapped to real numbered registers.

=begin PIR

  .sub main
      .local string hello
      hello = "Hello world!"
      say hello
  .end

=end PIR

The C<.local> directive indicates that the named register is only needed inside
the current subroutine (that is, between C<.sub> and C<.end>). Following
C<.local> is a type. This can be C<int> (for I registers), C<num> (for N
registers), C<string> (for S registers), C<pmc> (for P registers) or the name
of a PMC type.
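
The sketch below declares one named register of each kind. The register names
are illustrative, the number register is declared with C<num> (the spelling
used for N registers throughout the PIR documentation), and the PMC register
is assumed to hold the built-in C<Integer> PMC type.

=begin PIR

  .sub main
      .local int    apples      # mapped to an I register
      .local num    ratio       # mapped to an N register
      .local string greeting    # mapped to an S register
      .local pmc    boxed       # mapped to a P register

      apples   = 3
      ratio    = 0.5
      greeting = "Parrot"

      # A PMC register must be given a PMC before a value is stored in it.
      boxed = new 'Integer'
      boxed = 42

      say apples
      say ratio
      say greeting
      say boxed
  .end

=end PIR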

=head2 PIR vs. PASM

PASM does not handle register allocation or provide support for named
registers.  It also does not have the C<.sub> and C<.end> directives, instead
replacing them with a label at the start of the instructions.
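
As a rough sketch of the difference, the register-based hello world might look
like this in PASM. It assumes the C<.pcc_sub :main main:> form of the opening
label, as used in Parrot's own PASM tests; older PASM examples may use a bare
label instead. Note the real register name C<S0>, without a C<$> prefix, and
the explicit C<end> opcode that stops execution.

=begin PASM

  .pcc_sub :main main:
      set S0, "Hello world!"
      say S0
      end

=end PASM

Saved as F<hello.pasm> (a hypothetical file name), this would be run in the
same way as the PIR version, with C<parrot hello.pasm>.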

=head2 Summing squares

This example introduces some more instructions and PIR syntax. Lines starting
with a C<#> are comments.

=begin PIR

  .sub main
      # State the number of squares to sum.
      .local int maxnum
      maxnum = 10

      # We'll use some named registers. Note that we can declare many
      # registers of the same type on one line.
      .local int i, total, temp
      total = 0

      # Loop to do the sum.
      i = 1
  loop:
      temp = i * i
      total += temp
      inc i
      if i <= maxnum goto loop

      # Output result.
      print "The sum of the first "
      print maxnum
      print " squares is "
      print total
      print ".\n"
  .end

=end PIR

PIR provides a bit of syntactic sugar that makes it look more high level than
assembly. For example:

=begin PIR_FRAGMENT

  .local pmc temp, i
  temp = i * i

=end PIR_FRAGMENT

Is just another way of writing the more assembly-ish:

=begin PIR_FRAGMENT

  .local pmc temp, i
  mul temp, i, i

=end PIR_FRAGMENT

And:

=begin PIR_FRAGMENT

  .local pmc i, maxnum
  if i <= maxnum goto loop
  # ...
  loop:

=end PIR_FRAGMENT

Is the same as:

=begin PIR_FRAGMENT

  .local pmc i, maxnum
  le i, maxnum, loop
  # ...
  loop:

=end PIR_FRAGMENT

And:

=begin PIR_FRAGMENT

  .local pmc temp, total
  total += temp

=end PIR_FRAGMENT

Is the same as:

=begin PIR_FRAGMENT

  .local pmc  temp, total
  add total, temp

=end PIR_FRAGMENT

As a rule, whenever a Parrot instruction modifies the contents of a register,
that will be the first register when writing the instruction in assembly form.

As is usual in assembly languages, loops and selection are implemented in terms
of conditional branch statements and labels, as shown above. Assembly
programming is one place where using goto is not bad form!
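
The loop above shows iteration. Selection follows the same pattern, as this
minimal sketch of an if/else built from a conditional branch, an unconditional
C<goto> and two labels shows. The label and register names are illustrative.

=begin PIR

  .sub main
      .local int n, remainder
      n = 7

      # Branch to the "even" label when n is divisible by 2.
      remainder = n % 2
      if remainder == 0 goto even

      # Fall through to here when the condition is false (the "else" part).
      say "odd"
      goto finish

  even:
      say "even"

  finish:
      say "done"
  .end

=end PIR

The unconditional C<goto finish> is what stops the "else" part from falling
through into the code under the C<even> label.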

=head2 Recursively computing factorial

In this example we define a factorial function and recursively call it to
compute factorial.

=begin PIR

  .sub factorial
      # Get input parameter.
      .param int n

      # return (n > 1 ? n * factorial(n - 1) : 1)
      .local int result

      if n > 1 goto recurse
      result = 1
      goto return

  recurse:
      $I0 = n - 1
      result = factorial($I0)
      result *= n

  return:
      .return (result)
  .end


  .sub main :main
      .local int f, i

      # We'll do factorial 0 to 10.
      i = 0
  loop:
      f = factorial(i)

      print "Factorial of "
      print i
      print " is "
      print f
      print ".\n"

      inc i
      if i <= 10 goto loop
  .end

=end PIR

The first line, C<.param int n>, specifies that this subroutine takes one
integer parameter and that we'd like to refer to the register it was passed in
by the name C<n> for the rest of the sub.

Much of what follows has been seen in previous examples, apart from the line
reading:

=begin PIR_FRAGMENT

  .local int result
  result = factorial($I0)

=end PIR_FRAGMENT

The last line of PIR actually represents a few lines of PASM. The assembler
builds a PMC that describes the signature, including which register the
arguments are held in. A similar process happens for providing the registers
that the return values should be placed in. Finally, the C<factorial> sub is 
invoked.

Right before the C<.end> of the C<factorial> sub, a C<.return> directive is
used to specify that the value held in the register named C<result> is to be
copied to the register that the caller is expecting the return value in.
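
The same mechanism extends to more than one return value. The following
sketch, using a hypothetical C<divide> sub that is not part of Parrot, returns
two integers and receives them in two named registers in the caller.

=begin PIR

  .sub divide
      .param int dividend
      .param int divisor

      .local int quotient, remainder
      quotient  = dividend / divisor
      remainder = dividend % divisor

      # Both values are copied to the registers the caller asked for.
      .return (quotient, remainder)
  .end

  .sub main :main
      .local int q, r

      # The parenthesised list names the registers for the return values.
      (q, r) = divide(17, 5)

      print q
      print " remainder "
      say r
  .end

=end PIR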

The call to C<factorial> in main works in just the same way as the recursive
call to C<factorial> within the sub C<factorial> itself. The only remaining
bit of new syntax is the C<:main>, written after C<.sub main>. By default,
PIR assumes that execution begins with the first sub in the file. This 
behavior can be changed by marking the sub to start in with C<:main>.
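
A minimal sketch of the effect, with illustrative sub names: although
C<helper> comes first in the file, execution begins in C<start> because it
carries the C<:main> flag.

=begin PIR

  .sub helper
      say "helper called"
  .end

  .sub start :main
      say "execution starts here"
      helper()
  .end

=end PIR

Without C<:main> on C<start>, the first sub in the file, C<helper>, would be
the one to run.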

=head2 Compiling to PBC

To compile PIR to bytecode, use the C<-o> flag and specify an output file with
the extension F<.pbc>.

  parrot -o factorial.pbc factorial.pir

=head1 Where next?

=head2 Documentation

What documentation you read next depends upon what you are looking to do with
Parrot. The opcodes reference and built-in PMCs reference are useful to dip
into for pretty much everyone. If you intend to write or compile to PIR then
there are a number of documents about PIR that are worth a read. For compiler
writers, the Compiler FAQ is essential reading. If you want to get involved
with Parrot development, the PDDs (Parrot Design Documents) contain some
details of the internals of Parrot; a few other documents fill in the gaps. One
way of helping Parrot development is to write tests, and there is a document
entitled I<Testing Parrot> that will help with this.

=head2 The Parrot Mailing List

Much Parrot development and discussion takes place on the
parrot-dev mailing list. You can subscribe by filling out the form at
L<http://lists.parrot.org/mailman/listinfo/parrot-dev> or read the NNTP
archive at L<http://groups.google.com/group/parrot-dev/>.

=head2 IRC

The Parrot IRC channel is hosted on irc.parrot.org and is named C<#parrot>.
Alternative IRC servers are at irc.pobox.com and irc.rhizomatic.net.

=cut