
\documentclass[11pt,a4paper,oneside]{report}
\usepackage{graphics,graphicx}
\usepackage{colortbl}
%
%\setcounter{secnumdepth}{2}
%\setcounter{tocdepth}{2}

\begin{document}
\tableofcontents
\chapter{Introduction to Parrot}

\section*{Welcome to Parrot}

This document provides a gentle introduction to the Parrot virtual machine for anyone considering writing code for Parrot by hand, writing a compiler that targets Parrot, getting involved with Parrot development or simply wondering what on earth Parrot is.

\section*{What is Parrot?}

\subsection*{Virtual Machines}

Parrot is a virtual machine. To understand what a virtual machine is, consider what happens when you write a program in a language such as Perl and then run it with the applicable interpreter (in the case of Perl, the \texttt{perl} executable). First, the program you have written in a high-level language is turned into simple instructions, for example \emph{fetch the value of the variable named x}, \emph{add 2 to this value}, \emph{store this value in the variable named y}, etc. A single line of code in a high-level language may be converted into tens of these simple instructions. This stage is called \emph{compilation}.

The second stage involves executing these simple instructions. Some languages (for example, C) are often compiled to instructions that are understood by the CPU and as such can be executed by the hardware. Other languages, such as Perl, Python and Java, are usually compiled to CPU-independent instructions. A \emph{virtual machine} (sometimes known as an \emph{interpreter}) is required to execute those instructions.
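
To make the two stages concrete, the following C sketch shows a toy virtual machine executing the three simple instructions listed above for the statement \texttt{y = x + 2}. The instruction names (\texttt{OP\_FETCH}, \texttt{OP\_ADD\_CONST}, \texttt{OP\_STORE}), the function \texttt{run} and the single accumulator are invented for illustration; they are not Parrot opcodes, and Parrot itself is register based rather than accumulator based.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  #include <assert.h>
  #include <stdio.h>

  /* Hypothetical instruction set for a toy accumulator VM (not Parrot's). */
  enum opcode { OP_FETCH, OP_ADD_CONST, OP_STORE, OP_HALT };

  struct instr {
      enum opcode op;
      int operand;        /* variable slot or constant, depending on op */
  };

  /* Execute instructions one at a time until OP_HALT. */
  void run(const struct instr *prog, int *vars)
  {
      int acc = 0;        /* the single working value */
      for (const struct instr *ip = prog; ip->op != OP_HALT; ip++) {
          switch (ip->op) {
          case OP_FETCH:     acc = vars[ip->operand]; break;
          case OP_ADD_CONST: acc += ip->operand;      break;
          case OP_STORE:     vars[ip->operand] = acc; break;
          default:           break;
          }
      }
  }

  enum { VAR_X, VAR_Y };  /* variable slots */

  int main(void)
  {
      /* "Compiled" form of the statement: y = x + 2 */
      const struct instr prog[] = {
          { OP_FETCH, VAR_X },    /* fetch the value of x */
          { OP_ADD_CONST, 2 },    /* add 2 to this value */
          { OP_STORE, VAR_Y },    /* store it in y */
          { OP_HALT, 0 }
      };
      int vars[2] = { 40, 0 };    /* x = 40, y = 0 */

      run(prog, vars);
      assert(vars[VAR_Y] == 42);
      printf("y = %d\n", vars[VAR_Y]);
      return 0;
  }\end{verbatim}
\vspace{-6pt}
\normalsize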

While the central role of a virtual machine is to ef\mbox{}ficiently execute instructions, it also performs a number of other functions. One of these is to abstract away the details of the hardware and operating system that a program is running on. Once a program has been compiled to run on a virtual machine, it will run on any platform that the VM has been implemented on. VMs may also provide security (by allowing f\mbox{}iner-grained limitations to be placed on a program), memory management, and support for high-level language features such as objects, data structures, types and subroutines.

\subsection*{Design goals}

Parrot is designed with the needs of dynamically typed languages (such as Perl and Python) in mind, and should be able to run programs written in these languages more ef\mbox{}ficiently than VMs developed with static languages in mind (JVM, .NET). Parrot is also designed to provide interoperability between languages that compile to it. In theory, you will be able to write a class in Perl, subclass it in Python and then instantiate and use that subclass in a Tcl program.

Historically, Parrot started out as the runtime for Perl 6. Unlike in Perl 5, the Perl 6 compiler and runtime (VM) were intended to be much more clearly separated. The name \emph{Parrot} was chosen after the 2001 April Fools' joke which had Perl and Python collaborating on the next version of their languages. The name ref\mbox{}lects the intention to build a VM to run not just Perl 6, but also many other languages.

\section*{Parrot concepts and jargon}

\subsection*{Instruction formats}

Parrot can currently accept instructions to execute in four forms. PIR (Parrot Intermediate Representation) is designed to be written by people and generated by compilers. It hides away some low-level details, such as the way parameters are passed to functions. PASM (Parrot Assembly) is a level below PIR; it is still human-readable and human-writable and can be generated by a compiler, but the author has to take care of details such as calling conventions and register allocation. PAST (Parrot Abstract Syntax Tree) enables Parrot to accept input in the form of an abstract syntax tree, which is useful for those writing compilers.

All of the above forms of input are automatically converted inside Parrot to the fourth form, PBC (Parrot Bytecode). PBC is much like machine code, except that it is understood by the Parrot interpreter rather than by a hardware CPU. It is not intended to be human-readable or human-writable but, unlike the other forms, it can be executed immediately, without the need for an assembly phase. Parrot bytecode is platform independent.

\subsection*{The instruction set}

The Parrot instruction set includes arithmetic and logical operators, compare and branch/jump (for implementing loops, if\ldots then constructs, etc), f\mbox{}inding and storing global and lexical variables, working with classes and objects, calling subroutines and methods along with their parameters, I/O, threads and more.

\subsection*{Registers and fundamental data types}

The Parrot VM is register based. This means that, like a hardware CPU, it has a number of fast-access units of storage called registers. There are four types of register in Parrot: integers (I), numbers (N), strings (S) and PMCs (P). Each type has its own set of registers, named I0, I1, \ldots, N0, N1, \ldots, and so on. Integer registers are the same size as a word on the machine Parrot is running on, and number registers map to a native f\mbox{}loating-point type. The number of registers of each type that a subroutine needs is determined at compile time.
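
One way to picture this is as a per-subroutine register frame whose size is fixed at compile time. The C sketch below is only an illustration: the \texttt{reg\_frame} layout, the helper names \texttt{new\_frame} and \texttt{free\_frame}, and the use of \texttt{intptr\_t} and \texttt{double} as stand-ins for Parrot's \texttt{INTVAL} and \texttt{FLOATVAL} types are all assumptions, not Parrot's internal definitions.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  #include <assert.h>
  #include <stdint.h>
  #include <stdlib.h>

  /* Stand-ins for Parrot's native types (assumptions for this sketch). */
  typedef intptr_t INTVAL;        /* word-sized integer */
  typedef double   FLOATVAL;      /* native floating-point type */
  typedef struct pmc PMC;         /* opaque; contents not needed here */

  /* Hypothetical register frame: one array per register type. */
  struct reg_frame {
      size_t       n_int, n_num, n_str, n_pmc;
      INTVAL      *I;             /* I0, I1, ... */
      FLOATVAL    *N;             /* N0, N1, ... */
      const char **S;             /* S0, S1, ... (C strings for brevity) */
      PMC        **P;             /* P0, P1, ... */
  };

  /* Allocate a frame using the per-type counts the compiler computed.
   * calloc(n + 1, ...) avoids zero-sized requests and zero-fills the
   * arrays, so integer registers start at 0. Error handling is minimal. */
  struct reg_frame *new_frame(size_t ni, size_t nn, size_t ns, size_t np)
  {
      struct reg_frame *f = malloc(sizeof *f);
      if (f == NULL)
          return NULL;
      f->n_int = ni; f->n_num = nn; f->n_str = ns; f->n_pmc = np;
      f->I = calloc(ni + 1, sizeof *f->I);
      f->N = calloc(nn + 1, sizeof *f->N);
      f->S = calloc(ns + 1, sizeof *f->S);
      f->P = calloc(np + 1, sizeof *f->P);
      return f;
  }

  void free_frame(struct reg_frame *f)
  {
      free(f->I);
      free(f->N);
      free(f->S);
      free(f->P);
      free(f);
  }

  int main(void)
  {
      /* The summing-squares example later in this chapter needs four
       * I registers (maxnum, i, total, temp) and nothing else. */
      struct reg_frame *f = new_frame(4, 0, 0, 0);
      assert(f != NULL && f->I != NULL);
      f->I[0] = 10;               /* e.g. maxnum */
      assert(f->I[0] == 10 && f->I[3] == 0);
      free_frame(f);
      return 0;
  }\end{verbatim}
\vspace{-6pt}
\normalsize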

\subsection*{PMCs}

PMC stands for Polymorphic Container. PMCs represent any complex data structure or type, including aggregate data types (arrays, hash tables, etc). A PMC can implement its own behavior for arithmetic, logical and string operations performed on it, allowing for language-specif\mbox{}ic behavior to be introduced. PMCs can be built into the Parrot executable or dynamically loaded when they are needed.

\subsection*{Garbage Collection}

Parrot provides garbage collection, meaning that Parrot programs do not need to free memory explicitly. Memory that is no longer in use (that is, no longer referenced) is reclaimed the next time the garbage collector runs.
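
The idea of reclaiming memory that is no longer referenced can be illustrated with a classic mark-and-sweep collector. The following C sketch shows that generic textbook technique, not Parrot's actual collector; the object layout (at most one outgoing reference), the fixed-size heap and the names \texttt{gc\_alloc} and \texttt{gc\_collect} are assumptions made for illustration.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>

  #define HEAP_SIZE 8

  /* Toy object: may reference at most one other object. */
  struct obj {
      bool in_use;
      bool marked;
      struct obj *ref;
  };

  static struct obj heap[HEAP_SIZE];

  /* Hand out a free slot, or NULL if the heap is full. */
  struct obj *gc_alloc(void)
  {
      for (size_t i = 0; i < HEAP_SIZE; i++) {
          if (!heap[i].in_use) {
              heap[i].in_use = true;
              heap[i].marked = false;
              heap[i].ref = NULL;
              return &heap[i];
          }
      }
      return NULL;
  }

  /* Mark phase: follow references starting from one root. */
  static void mark(struct obj *o)
  {
      while (o != NULL && !o->marked) {
          o->marked = true;
          o = o->ref;
      }
  }

  /* Mark everything reachable from the roots, then sweep (free) the
   * rest. Returns the number of objects reclaimed. */
  size_t gc_collect(struct obj **roots, size_t nroots)
  {
      for (size_t i = 0; i < nroots; i++)
          mark(roots[i]);
      size_t freed = 0;
      for (size_t i = 0; i < HEAP_SIZE; i++) {
          if (heap[i].in_use && !heap[i].marked) {
              heap[i].in_use = false;
              freed++;
          }
          heap[i].marked = false;     /* reset for the next collection */
      }
      return freed;
  }

  int main(void)
  {
      struct obj *a = gc_alloc();
      struct obj *b = gc_alloc();
      struct obj *c = gc_alloc();     /* never referenced from a root */
      a->ref = b;                     /* b is reachable through a */

      struct obj *roots[] = { a };
      assert(gc_collect(roots, 1) == 1);  /* only c is reclaimed */
      assert(a->in_use && b->in_use && !c->in_use);
      return 0;
  }\end{verbatim}
\vspace{-6pt}
\normalsize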

\section*{Obtaining, building and testing Parrot}

\subsection*{Where to get Parrot}

See http://www.parrot.org/download for several ways to get a recent version of Parrot.

\subsection*{Building Parrot}

The f\mbox{}irst step to building Parrot is to run the \emph{Conf\mbox{}igure.pl} program, which looks at your platform and decides how Parrot should be built. This is done by typing:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  perl Configure.pl\end{verbatim}
\vspace{-6pt}
\normalsize
Once this is complete, run the \texttt{make} program that \texttt{Conf\mbox{}igure.pl} tells you to use. When this completes, you will have a working \texttt{parrot} executable.

Please report any problems that you encounter while building Parrot so the developers can f\mbox{}ix them. You can do this by creating a login and opening a new ticket at https://trac.parrot.org. Please include the \emph{myconf\mbox{}ig} f\mbox{}ile that was generated as part of the build process and any errors that you observed.

\subsection*{The Parrot test suite}

Parrot has an extensive regression test suite. This can be run by typing:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  make test\end{verbatim}
\vspace{-6pt}
\normalsize
If the make program on your platform has a dif\mbox{}ferent name, substitute that name for \texttt{make}. The output will look something like this:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
 C:\Perl\bin\perl.exe t\harness --gc-debug 
   t\library\*.t  t\op\*.t  t\pmc\*.t  t\run\*.t  t\native_pbc\*.t
   imcc\t\*\*.t  t\dynpmc\*.t  t\p6rules\*.t t\src\*.t t\perl\*.t
 t\library\dumper...............ok
 t\library\getopt_long..........ok
 ...
 All tests successful, 4 test and 71 subtests skipped.
 Files=163, Tests=2719, 192 wallclock secs ( 0.00 cusr +  0.00 csys =  0.00 CPU)\end{verbatim}
\vspace{-6pt}
\normalsize
Some tests may fail. If only a few do, this is probably little cause for concern, especially if you have the latest Parrot sources from the Git repository. However, please do not let this discourage you from reporting test failures, using the same method as described for reporting build problems.

\section*{Some simple Parrot programs}

\subsection*{Hello world!}

Create a f\mbox{}ile called \emph{hello.pir} that contains the following code.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub main
      say "Hello world!"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
Then run it by typing:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  parrot hello.pir\end{verbatim}
\vspace{-6pt}
\normalsize
As expected, this will display the text \texttt{Hello world!} on the console, followed by a new line.

Let's take the program apart. \texttt{.sub main} states that the instructions that follow, up to the next \texttt{.end}, make up a subroutine named \texttt{main}. The second line contains the \texttt{say} instruction, which prints its argument followed by a new line. In this case, we are calling the variant of the instruction that accepts a constant string. The assembler takes care of deciding which variant of the instruction to use for us.

\subsection*{Using registers}

We can modify \emph{hello.pir} to f\mbox{}irst store the string \texttt{Hello world!} in a register and then use that register with the \texttt{say} instruction.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub main
      $S0 = "Hello world!"
      say $S0
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
PIR does not let us refer to a physical register directly; instead, we pref\mbox{}ix the register name with \texttt{\$}. The compiler will map \texttt{\$S0} to one of the available string registers, for example S0, and set the value. This example also uses the syntactic sugar provided by the \texttt{=} operator, which is simply a more readable way of using the \texttt{set} opcode.

To make PIR even more readable, named registers can be used. These are later mapped to real numbered registers.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub main
      .local string hello
      hello = "Hello world!"
      say hello
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{.local} directive indicates that the named register is only needed inside the current subroutine (that is, between \texttt{.sub} and \texttt{.end}). Following \texttt{.local} is a type. This can be \texttt{int} (for I registers), \texttt{f\mbox{}loat} (for N registers), \texttt{string} (for S registers), \texttt{pmc} (for P registers) or the name of a PMC type.

\subsection*{PIR vs. PASM}

PASM does not handle register allocation or provide support for named registers. It also does not have the \texttt{.sub} and \texttt{.end} directives, instead replacing them with a label at the start of the instructions.

\subsection*{Summing squares}

This example introduces some more instructions and PIR syntax. Lines starting with a \texttt{\#} are comments.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub main
      # State the number of squares to sum.
      .local int maxnum
      maxnum = 10

      # We'll use some named registers. Note that we can declare many
      # registers of the same type on one line.
      .local int i, total, temp
      total = 0

      # Loop to do the sum.
      i = 1
  loop:
      temp = i * i
      total += temp
      inc i
      if i <= maxnum goto loop

      # Output result.
      print "The sum of the first "
      print maxnum
      print " squares is "
      print total
      print ".\n"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
PIR provides a bit of syntactic sugar that makes it look more high level than assembly. For example:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .local pmc temp, i
  temp = i * i\end{verbatim}
\vspace{-6pt}
\normalsize
Is just another way of writing the more assembly-ish:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .local pmc temp, i
  mul temp, i, i\end{verbatim}
\vspace{-6pt}
\normalsize
And:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .local pmc i, maxnum
  if i <= maxnum goto loop
  # ...
  loop:\end{verbatim}
\vspace{-6pt}
\normalsize
Is the same as:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .local pmc i, maxnum
  le i, maxnum, loop
  # ...
  loop:\end{verbatim}
\vspace{-6pt}
\normalsize
And:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .local pmc temp, total
  total += temp\end{verbatim}
\vspace{-6pt}
\normalsize
Is the same as:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .local pmc  temp, total
  add total, temp\end{verbatim}
\vspace{-6pt}
\normalsize
As a rule, whenever a Parrot instruction modif\mbox{}ies the contents of a register, that register is written as the f\mbox{}irst operand when the instruction is written in assembly form.

As is usual in assembly languages, loops and selection are implemented in terms of conditional branch statements and labels, as shown above. Assembly programming is one place where using goto is not bad form!
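
For comparison, here is the summing-squares loop written in C with the same label-and-conditional-branch structure as the PIR version. It is only an illustrative translation; the function name \texttt{sum\_of\_squares} is ours. Each comment shows the corresponding assembly-style instruction from the examples above.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  #include <assert.h>
  #include <stdio.h>

  /* Same control flow as the PIR: a label plus a conditional goto. */
  int sum_of_squares(int maxnum)
  {
      int i, total, temp;
      total = 0;
      i = 1;
  loop:
      temp = i * i;           /* mul temp, i, i */
      total += temp;          /* add total, temp */
      i++;                    /* inc i */
      if (i <= maxnum)        /* le i, maxnum, loop */
          goto loop;
      return total;
  }

  int main(void)
  {
      int maxnum = 10;
      int total = sum_of_squares(maxnum);
      assert(total == 385);
      printf("The sum of the first %d squares is %d.\n", maxnum, total);
      return 0;
  }\end{verbatim}
\vspace{-6pt}
\normalsize
Like the PIR, this loop always executes its body at least once, so \texttt{sum\_of\_squares(0)} returns 1 rather than 0; a test before the label would be needed to handle that case.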

\subsection*{Recursively computing factorial}

In this example we def\mbox{}ine a factorial function and call it recursively to compute factorials.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub factorial
      # Get input parameter.
      .param int n

      # return (n > 1 ? n * factorial(n - 1) : 1)
      .local int result

      if n > 1 goto recurse
      result = 1
      goto return

  recurse:
      $I0 = n - 1
      result = factorial($I0)
      result *= n

  return:
      .return (result)
  .end


  .sub main :main
      .local int f, i

      # We'll do factorial 0 to 10.
      i = 0
  loop:
      f = factorial(i)

      print "Factorial of "
      print i
      print " is "
      print f
      print ".\n"

      inc i
      if i <= 10 goto loop
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
The f\mbox{}irst directive in \texttt{factorial}, \texttt{.param int n}, specif\mbox{}ies that this subroutine takes one integer parameter and that we'd like to refer to the register it was passed in by the name \texttt{n} for the rest of the sub.

Much of what follows has been seen in previous examples, apart from the line reading:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .local int result
  result = factorial($I0)\end{verbatim}
\vspace{-6pt}
\normalsize
The last line of PIR actually represents a few lines of PASM. The assembler builds a PMC that describes the signature, including which registers the arguments are held in. A similar process identif\mbox{}ies the registers that the return values should be placed in. Finally, the \texttt{factorial} sub is invoked.

Right before the \texttt{.end} of the \texttt{factorial} sub, a \texttt{.return} directive is used to specify that the value held in the register named \texttt{result} is to be copied to the register that the caller is expecting the return value in.

The call to \texttt{factorial} in \texttt{main} works in just the same way as the recursive call to \texttt{factorial} within the sub \texttt{factorial} itself. The only remaining bit of new syntax is the \texttt{:main} written after \texttt{.sub main}. By default, PIR assumes that execution begins with the f\mbox{}irst sub in the f\mbox{}ile. This behavior can be changed by marking the sub in which execution should start with \texttt{:main}.

\subsection*{Compiling to PBC}

To compile PIR to bytecode, use the \texttt{-o} f\mbox{}lag and specify an output f\mbox{}ile with the extension \emph{.pbc}.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  parrot -o factorial.pbc factorial.pir\end{verbatim}
\vspace{-6pt}
\normalsize
\section*{Where next?}

\subsection*{Documentation}

What documentation you read next depends upon what you are looking to do with Parrot. The opcodes reference and built-in PMCs reference are useful to dip into for pretty much everyone. If you intend to write or compile to PIR then there are a number of documents about PIR that are worth a read. For compiler writers, the Compiler FAQ is essential reading. If you want to get involved with Parrot development, the PDDs (Parrot Design Documents) contain some details of the internals of Parrot; a few other documents f\mbox{}ill in the gaps. One way of helping Parrot development is to write tests, and there is a document entitled \emph{Testing Parrot} that will help with this.

\subsection*{The Parrot Mailing List}

Much Parrot development and discussion takes place on the parrot-dev mailing list. You can subscribe by f\mbox{}illing out the form at http://lists.parrot.org/mailman/listinfo/parrot-dev or read the NNTP archive at http://groups.google.com/group/parrot-dev/.

\subsection*{IRC}

The Parrot IRC channel is hosted on irc.parrot.org and is named \texttt{\#parrot}. Alternative IRC servers are at irc.pobox.com and irc.rhizomatic.net.

\chapter{Overview}

\section*{The Parrot Interpreter}

This document is an introduction to the structure of and the concepts used by the Parrot shared bytecode compiler/interpreter system. We will primarily concern ourselves with the interpreter, since this is the target platform for which all compiler frontends should compile their code.

\section*{The Software CPU}

Like all interpreter systems of its kind, the Parrot interpreter is a virtual machine; this is another way of saying that it is a software CPU. However, unlike other VMs, the Parrot interpreter is designed to more closely mirror hardware CPUs.

For instance, the Parrot VM will have a register architecture, rather than a stack architecture. It will also have extremely low-level operations, more similar to Java's than the medium-level ops of Perl and Python and the like.

The main reason for this decision is that, because Parrot resembles the underlying hardware to some extent, Parrot bytecode can be compiled down to ef\mbox{}ficient native machine code.

Moreover, many programs in high-level languages consist of nested function and method calls, sometimes with lexical variables to hold intermediate results. Without a JIT, a stack-based VM will pop and then push the same operands many times, while a register-based VM will simply allocate the right number of registers and operate on them, which can signif\mbox{}icantly reduce the number of operations executed and the CPU time used.
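
To illustrate the difference, the C sketch below evaluates \texttt{(a + b) * c} on two toy machines and counts the instructions each one dispatches. Both instruction sets (\texttt{S\_PUSH}, \texttt{S\_ADD}, \texttt{R\_ADD} and so on) are invented for this example and are far simpler than Parrot's; the point is only that the register form needs fewer instructions because operands do not have to be pushed and popped.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  #include <assert.h>
  #include <stddef.h>

  /* ---- Toy stack machine ---- */
  enum s_op { S_PUSH, S_ADD, S_MUL, S_END };
  struct s_instr { enum s_op op; int var; };  /* var used by S_PUSH only */

  int run_stack(const struct s_instr *code, const int *vars, size_t *count)
  {
      int stack[16];
      size_t sp = 0;
      *count = 0;
      for (const struct s_instr *ip = code; ip->op != S_END; ip++) {
          (*count)++;
          switch (ip->op) {
          case S_PUSH: stack[sp++] = vars[ip->var]; break;
          case S_ADD:  sp--; stack[sp - 1] += stack[sp]; break;  /* pop 2, push 1 */
          case S_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
          default:     break;
          }
      }
      return stack[sp - 1];
  }

  /* ---- Toy register machine: three-operand instructions ---- */
  enum r_op { R_ADD, R_MUL, R_END };
  struct r_instr { enum r_op op; int dst, src1, src2; };

  int run_regs(const struct r_instr *code, int *regs, int result, size_t *count)
  {
      *count = 0;
      for (const struct r_instr *ip = code; ip->op != R_END; ip++) {
          (*count)++;
          switch (ip->op) {
          case R_ADD: regs[ip->dst] = regs[ip->src1] + regs[ip->src2]; break;
          case R_MUL: regs[ip->dst] = regs[ip->src1] * regs[ip->src2]; break;
          default:    break;
          }
      }
      return regs[result];
  }

  int main(void)
  {
      int vars[] = { 2, 3, 4 };               /* a = 2, b = 3, c = 4 */
      const struct s_instr s_code[] = {
          { S_PUSH, 0 }, { S_PUSH, 1 }, { S_ADD, 0 },
          { S_PUSH, 2 }, { S_MUL, 0 }, { S_END, 0 }
      };
      int regs[4] = { 2, 3, 4, 0 };           /* r0..r2 = a, b, c; r3 = result */
      const struct r_instr r_code[] = {
          { R_ADD, 3, 0, 1 }, { R_MUL, 3, 3, 2 }, { R_END, 0, 0, 0 }
      };

      size_t s_count, r_count;
      assert(run_stack(s_code, vars, &s_count) == 20);
      assert(run_regs(r_code, regs, 3, &r_count) == 20);
      assert(s_count == 5 && r_count == 2);   /* 5 dispatches vs 2 */
      return 0;
  }\end{verbatim}
\vspace{-6pt}
\normalsize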

To be more specif\mbox{}ic about the software CPU, it will contain a large number of registers. The current design provides for four groups of N registers; each group will hold a dif\mbox{}ferent data type: integers, f\mbox{}loating-point numbers, strings, and PMCs. (Polymorphic Containers, detailed below.)

Registers will be stored in register frames, which can be pushed onto and popped off the register stack. For instance, a subroutine or a block might need its own register frame.

\section*{The Operations}

The Parrot interpreter has a large number of very low level instructions, and it is expected that high-level languages will compile down to a medium-level language before outputting pure Parrot machine code.

Operations will be represented by several bytes of Parrot machine code; the f\mbox{}irst \texttt{INTVAL} will specify the operation number, and the remaining arguments will be operator-specif\mbox{}ic. Operations will usually be targeted at a specif\mbox{}ic data type and register type; so, for instance, the \texttt{dec\_i\_c} op takes two \texttt{INTVAL}s as arguments, and decrements the contents of the integer register designated by the f\mbox{}irst \texttt{INTVAL} by the value in the second \texttt{INTVAL}. Naturally, operations which act on \texttt{FLOATVAL} registers will use \texttt{FLOATVAL}s for constants; however, since the f\mbox{}irst argument is almost always a register \textbf{number} rather than actual data, even operations on string and PMC registers will take an \texttt{INTVAL} as the f\mbox{}irst argument.

As in Perl, Parrot ops will return a pointer to the next operation in the bytecode stream. Although each op will have a predetermined number and size of arguments, it's cheaper for the individual ops to skip over their own arguments and return the next operation than to look up, in a table, the number of bytes to skip over for a given opcode.
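
The following C sketch models the scheme described above: bytecode is an array of \texttt{INTVAL}-sized words, each op reads its own operands and returns a pointer to the next op, so the run loop never needs a table of operand sizes. Only the behavior of \texttt{dec\_i\_c} is taken from the text; \texttt{set\_i\_c}, the opcode numbers, the \texttt{op\_func} signature and the use of \texttt{intptr\_t} for \texttt{INTVAL} are assumptions for this illustration and do not reflect Parrot's real op library.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  typedef intptr_t INTVAL;        /* stand-in for Parrot's INTVAL */

  /* An op gets the address of its own opcode word plus the integer
   * registers, and returns the address of the next op (NULL = stop). */
  typedef const INTVAL *(*op_func)(const INTVAL *pc, INTVAL *I);

  /* set_i_c: I[arg1] = arg2 (hypothetical op) */
  static const INTVAL *op_set_i_c(const INTVAL *pc, INTVAL *I)
  {
      I[pc[1]] = pc[2];
      return pc + 3;              /* skip opcode + 2 operands */
  }

  /* dec_i_c: I[arg1] -= arg2, as described in the text */
  static const INTVAL *op_dec_i_c(const INTVAL *pc, INTVAL *I)
  {
      I[pc[1]] -= pc[2];
      return pc + 3;
  }

  /* end: stop the run loop */
  static const INTVAL *op_end(const INTVAL *pc, INTVAL *I)
  {
      (void)pc;
      (void)I;
      return NULL;
  }

  /* Opcode numbers index this table (numbering is hypothetical). */
  enum { OP_END, OP_SET_I_C, OP_DEC_I_C };
  static const op_func op_table[] = { op_end, op_set_i_c, op_dec_i_c };

  /* Run loop: dispatch on the first word; the op itself advances pc. */
  void run(const INTVAL *pc, INTVAL *I)
  {
      while (pc != NULL)
          pc = op_table[*pc](pc, I);
  }

  int main(void)
  {
      INTVAL I[4] = { 0 };
      const INTVAL code[] = {
          OP_SET_I_C, 0, 10,      /* I0 = 10 */
          OP_DEC_I_C, 0, 3,       /* I0 -= 3 */
          OP_END
      };
      run(code, I);
      assert(I[0] == 7);
      return 0;
  }\end{verbatim}
\vspace{-6pt}
\normalsize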

There will be global and private opcode tables; that is to say, an area of the bytecode can def\mbox{}ine a set of custom operations that it will use. These areas will roughly map to the subroutines of the original source; each precompiled module will have its own opcode table.

For a closer look at Parrot ops, see \emph{docs/pdds/pdd06\_pasm.pod}.

\section*{PMCs}

PMCs are roughly equivalent to the \texttt{SV}, \texttt{AV} and \texttt{HV} (and more complex types) def\mbox{}ined in Perl 5, and almost exactly equivalent to \texttt{PyObject} types in Python. They are a completely abstracted data type; they may represent a string, an integer, code or anything else. As we will see shortly, they can be expected to behave in certain ways when instructed to perform certain operations, such as incrementing by one or converting their value to an integer.

Because they are abstract, PMCs can be treated, roughly speaking, as a standard API for dealing with data. If we're executing Perl code, we can manufacture PMCs that behave like Perl scalars, and the operations we perform on them will do Perlish things; if we execute Python code, we can manufacture PMCs with Python operations, and the same underlying bytecode will now perform Pythonic activities.

For documentation on the specif\mbox{}ic PMCs that ship with Parrot, see the \emph{docs/pmc} directory.

\section*{Vtables}

The way we achieve this abstraction is to assign to each PMC a set of function pointers that determine how it ought to behave when asked to do various things. In a sense, you can regard a PMC as an object in an abstract virtual class; the PMC needs a set of methods to be def\mbox{}ined in order to respond to method calls. These sets of methods are called \textbf{vtables}.

A vtable is, more strictly speaking, a structure which expects to be f\mbox{}illed with function pointers. The PMC contains a pointer to the vtable structure which implements its behavior. Hence, when we ask a PMC for its length, we're essentially calling the \texttt{length} method on the PMC; this is implemented by looking up the \texttt{length} slot in the vtable that the PMC points to, and calling the resulting function pointer with the PMC as argument: essentially,

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    (pmc->vtable->length)(pmc);\end{verbatim}
\vspace{-6pt}
\normalsize
If our PMC is a string and has a vtable which implements Perl-like string operations, this will return the length of the string. If, on the other hand, the PMC is an array, we might get back the number of elements in the array. (If that's what we want it to do.)
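
A minimal C sketch of this dispatch, with two hypothetical PMC types that share the same vtable layout but implement \texttt{length} differently, is shown below. The struct layout and the names \texttt{string\_vtable} and \texttt{array\_vtable} are invented for illustration; Parrot's real vtable has many more slots and different signatures.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  #include <assert.h>
  #include <stddef.h>
  #include <string.h>

  struct pmc;

  /* A (tiny) vtable: one function-pointer slot per supported operation. */
  struct vtable {
      const char *name;
      size_t (*length)(struct pmc *self);
  };

  /* Every PMC points at the vtable that implements its behavior. */
  struct pmc {
      const struct vtable *vtable;
      union {
          const char *str;                                 /* string PMC */
          struct { const int *items; size_t count; } array; /* array PMC */
      } data;
  };

  /* String behavior: length in bytes (ASCII assumed for brevity). */
  static size_t string_length(struct pmc *self)
  {
      return strlen(self->data.str);
  }

  /* Array behavior: length is the number of elements. */
  static size_t array_length(struct pmc *self)
  {
      return self->data.array.count;
  }

  static const struct vtable string_vtable = { "String", string_length };
  static const struct vtable array_vtable  = { "Array",  array_length  };

  int main(void)
  {
      static const int items[] = { 1, 2, 3 };
      struct pmc s = { &string_vtable, { .str = "hello" } };
      struct pmc a = { &array_vtable,  { .array = { items, 3 } } };

      /* Same call shape, different behavior, chosen by the vtable. */
      assert((s.vtable->length)(&s) == 5);
      assert((a.vtable->length)(&a) == 3);
      return 0;
  }\end{verbatim}
\vspace{-6pt}
\normalsize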

Similarly, if we call the increment operator on a Perl string, we should get the next string in alphabetic sequence; if we call it on a Python value, we may well get an error to the ef\mbox{}fect that Python doesn't have an increment operator, which would suggest a bug in the compiler front end. Or it might use a ``super-compatible Python vtable'' that does the right thing anyway, to make it easier to share data between Python programs and other languages.

At any rate, the point is that vtables allow us to separate out the basic operations common to all programming languages (addition, length, concatenation, and so on) from the specif\mbox{}ic behavior demanded by individual languages. Perl 6 will be Perl by passing Parrot a set of Perlish vtables; Parrot will equally be able to run Python, Tcl, Ruby or whatever by linking in a set of vtables which implement the behaviors of values in those languages. Combining this with the custom opcode tables mentioned above, you should be able to see how Parrot is essentially a language-independent base for building runtimes for bytecompiled languages.

One interesting thing about vtables is that you can construct them dynamically. You can f\mbox{}ind out more about vtables in \emph{docs/vtables.pod}.

\section*{String Handling}

Parrot provides a programmer-friendly view of strings. The Parrot string handling subsystem handles all the work of memory allocation, expansion, and so on behind the scenes. It also deals with some of the encoding headaches that can plague Unicode-aware languages.

This is done primarily by a vtable system similar to the one used by PMCs; each encoding will specify functions such as the maximum number of bytes to allocate for a character, the length of a string in characters, the of\mbox{}fset of a given character in a string, and so on. Each encoding will, of course, also provide a transcoding function, either to the other encodings or just to Unicode for use as a pivot.
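
As an illustration of the encoding-table idea, the C sketch below gives two hypothetical encodings, fixed-width 8-bit and UTF-8, the same small function table and uses it to compute a string's length in characters. The names \texttt{struct encoding}, \texttt{pstring} and \texttt{string\_length} are assumptions for this example, not Parrot's string API, and transcoding is omitted.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  #include <assert.h>
  #include <stddef.h>

  /* Hypothetical per-encoding function table. */
  struct encoding {
      const char *name;
      size_t max_bytes_per_char;
      size_t (*char_count)(const unsigned char *s, size_t nbytes);
  };

  /* Fixed-width 8-bit: one byte per character. */
  static size_t fixed8_count(const unsigned char *s, size_t nbytes)
  {
      (void)s;
      return nbytes;
  }

  /* UTF-8: count every byte that is not a continuation byte (10xxxxxx). */
  static size_t utf8_count(const unsigned char *s, size_t nbytes)
  {
      size_t n = 0;
      for (size_t i = 0; i < nbytes; i++)
          if ((s[i] & 0xC0) != 0x80)
              n++;
      return n;
  }

  static const struct encoding fixed8 = { "fixed_8", 1, fixed8_count };
  static const struct encoding utf8   = { "utf8",    4, utf8_count };

  /* A string records its bytes and the encoding that interprets them. */
  struct pstring {
      const unsigned char *bytes;
      size_t nbytes;
      const struct encoding *enc;
  };

  /* Generic code: dispatch through the string's encoding table. */
  size_t string_length(const struct pstring *s)
  {
      return s->enc->char_count(s->bytes, s->nbytes);
  }

  int main(void)
  {
      /* "h e-acute l l o": e-acute is two bytes (0xC3 0xA9) in UTF-8. */
      static const unsigned char word[] = { 'h', 0xC3, 0xA9, 'l', 'l', 'o' };
      struct pstring as_utf8  = { word, sizeof word, &utf8 };
      struct pstring as_bytes = { word, sizeof word, &fixed8 };

      assert(string_length(&as_utf8) == 5);
      assert(string_length(&as_bytes) == 6);
      return 0;
  }\end{verbatim}
\vspace{-6pt}
\normalsize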

The string handling API is explained in \emph{docs/strings.pod}.

\section*{Bytecode format}

We have already explained the format of the main stream of bytecode: operations are followed by arguments packed in whatever format the individual operations require. This makes up the third section of a Parrot bytecode f\mbox{}ile; frozen representations of Parrot programs have the following structure.

Firstly, a magic number is presented to identify the bytecode f\mbox{}ile as Parrot code. Next comes the f\mbox{}ixup segment, which contains pointers to global variable storage and other memory locations required by the main opcode segment. On disk, the actual pointers will be zeroed out, and the bytecode loader will replace them by the memory addresses allocated by the running instance of the interpreter.

Similarly, the next segment def\mbox{}ines all string and PMC constants used in the code. The loader will reconstruct these constants, f\mbox{}ixing references to the constants in the opcode segment with the addresses of the newly reconstructed data.

As we know, the opcode segment is next. This is optionally followed by a code segment for debugging purposes, which contains a munged form of the original program f\mbox{}ile.
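
The following C sketch mimics two of the loading steps just described on an in-memory image: check a magic number, then patch the zeroed fixup slots with addresses chosen by the running interpreter. The magic value, the \texttt{toy\_image} layout and the function \texttt{load\_image} are hypothetical, and the constant segment is omitted for brevity; the real format is documented in \emph{docs/parrotbyte.pod}.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  #define TOY_MAGIC 0x50424321u   /* hypothetical magic number */

  /* Hypothetical frozen image: magic number, a fixup segment whose
   * pointer slots are zero on disk, then the opcode segment. */
  struct toy_image {
      uint32_t magic;
      size_t   n_fixups;
      void    *fixups[4];         /* zeroed when stored */
      int      opcodes[8];
  };

  /* Storage handed out by this interpreter instance
   * (stands in for global variable storage). */
  static int globals[4];

  /* Validate the magic number, then replace each zeroed fixup slot
   * with a real address. Returns 0 on success, -1 if unrecognized. */
  int load_image(struct toy_image *img)
  {
      if (img->magic != TOY_MAGIC)
          return -1;
      for (size_t i = 0; i < img->n_fixups && i < 4; i++)
          img->fixups[i] = &globals[i];
      return 0;
  }

  int main(void)
  {
      struct toy_image img = { TOY_MAGIC, 2, { NULL, NULL }, { 0 } };
      assert(load_image(&img) == 0);
      assert(img.fixups[0] == &globals[0] && img.fixups[1] == &globals[1]);
      assert(img.fixups[2] == NULL);      /* beyond n_fixups: untouched */

      struct toy_image bad = { 0, 0, { NULL }, { 0 } };
      assert(load_image(&bad) == -1);     /* unrecognized image */
      return 0;
  }\end{verbatim}
\vspace{-6pt}
\normalsize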

The bytecode format is fully documented in \emph{docs/parrotbyte.pod}.

\chapter{Submitting bug reports and patches}

\section*{ABSTRACT}

How to submit bug reports, patches and new f\mbox{}iles to Parrot.

\section*{How To Submit A Bug Report}

If you encounter an error while working with Parrot and don't understand what is causing it, create a bug report using the \emph{parrotbug} utility. The simplest way to use it is to run

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    % ./parrotbug\end{verbatim}
\vspace{-6pt}
\normalsize
in the distribution's root directory, and follow the prompts.

If you just want to use email to create the bug report, send an email to tickets@parrot.org.

If you know how to f\mbox{}ix the problem you encountered, then think about submitting a patch, or (see below) getting commit privileges.

\section*{A Note on Random Failures}

If you encounter errors that appear intermittently, it may be dif\mbox{}ficult or impossible for Parrot developers to diagnose and solve the problem. It is therefore recommended to control the sources of randomness in Parrot in an attempt to eliminate the intermittency of the bug. There are three common sources of randomness that should be considered.

\vspace{-5pt}

\begin{description}

\setlength{\topsep}{0pt}
\setlength{\itemsep}{0pt}
\item[] Pseudo-Random Number Generator

Direct use of a PRNG from within Parrot programs will lead to inconsistent results. If possible, isolate the bug from PRNG use, for example, by logging the random values which trigger the error and then hard coding them.

\item[] Address Space Layout Randomization

Several operating systems provide a security measure known as address space layout randomization. In bugs involving stray pointers, this can cause corruption in random Parrot subsystems. Temporarily disabling this feature may make the problem consistent and therefore debuggable.

\item[] Hash Seed

Parrot's hash implementation uses randomization of its seed as a precaution against attacks based on hash collisions. The seed used can be directly controlled using \texttt{parrot}'s \texttt{--hash-seed} parameter. To determine which seeds are causing the error, Parrot can be rebuilt with \texttt{DEBUG\_HASH\_SEED} set to \texttt{1}, which will cause \texttt{parrot} to output the hash seed being used on every invocation.

\end{description}
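
The C sketch below shows why pinning the hash seed makes behavior reproducible: with a seeded hash, the bucket a key lands in (and therefore the iteration order of a hash table) depends on the seed. The hash is a seeded variant of FNV-1a chosen purely for illustration, not Parrot's hash function, and the names \texttt{seeded\_hash} and \texttt{bucket\_of} are hypothetical.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  /* FNV-1a over the key bytes, with the seed mixed into the offset
   * basis. (Illustrative only; not Parrot's hash function.) */
  uint32_t seeded_hash(const char *key, uint32_t seed)
  {
      uint32_t h = 2166136261u ^ seed;
      for (const unsigned char *p = (const unsigned char *)key; *p; p++) {
          h ^= *p;
          h *= 16777619u;
      }
      return h;
  }

  /* Which bucket a key falls into for a table of nbuckets slots. */
  size_t bucket_of(const char *key, uint32_t seed, size_t nbuckets)
  {
      return seeded_hash(key, seed) % nbuckets;
  }

  int main(void)
  {
      /* Same seed: same placement on every run. */
      assert(bucket_of("parrot", 42, 16) == bucket_of("parrot", 42, 16));

      /* Different seeds: each FNV-1a step is a bijection on 32-bit
       * values, so distinct seeds always give distinct full hashes. */
      assert(seeded_hash("parrot", 1) != seeded_hash("parrot", 2));
      return 0;
  }\end{verbatim}
\vspace{-6pt}
\normalsize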

\vspace{-5pt}
\section*{How To Create A Patch}

Try to keep your patches specif\mbox{}ic to a single change, and ensure that your change does not break any tests. Do this by running \texttt{make test}. If there is no test for the f\mbox{}ixed bug, please provide one.

In the following examples, \emph{parrot} contains the Parrot distribution, and \emph{workingdir} contains \emph{parrot}. The name \emph{workingdir} is just a placeholder for whatever the distribution's parent directory is called on your machine.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    workingdir
        |
        +--> parrot
                |
                +--> LICENSE
                |
                +--> src
                |
                +--> tools
                |
                +--> ...\end{verbatim}
\vspace{-6pt}
\normalsize
\vspace{-5pt}

\begin{description}

\setlength{\topsep}{0pt}
\setlength{\itemsep}{0pt}
\item[] \texttt{git}

If you are working with a Git clone of Parrot, please generate your patch with \texttt{git dif\mbox{}f}.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    cd parrot
    git diff > my_contribution.patch\end{verbatim}
\vspace{-6pt}
\normalsize
\item[] Single \texttt{dif\mbox{}f}

If you are working from a released distribution of Parrot and the change you wish to make af\mbox{}fects only one or two f\mbox{}iles, then you can supply a \texttt{dif\mbox{}f} for each f\mbox{}ile. The \texttt{dif\mbox{}f} should be created in \emph{parrot}. Please be sure to create a unif\mbox{}ied dif\mbox{}f, with \texttt{dif\mbox{}f -u}.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    cd parrot
    diff -u docs/submissions.pod docs/submissions.new > submissions.patch\end{verbatim}
\vspace{-6pt}
\normalsize
Win32 users will probably need to specify \texttt{-ub}.

\item[] Recursive \texttt{dif\mbox{}f}

If the change is more wide-ranging, then create an identical copy of \emph{parrot} in \emph{workingdir} and rename it \emph{parrot.new}. Modify \emph{parrot.new} and run a recursive \texttt{dif\mbox{}f} on the two directories to create your patch. The \texttt{dif\mbox{}f} should be created in \emph{workingdir}.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    cd workingdir
    diff -ur --exclude='.git' parrot parrot.new > docs.patch\end{verbatim}
\vspace{-6pt}
\normalsize
Mac OS X users should also specify \texttt{--exclude=.DS\_Store}.

\item[] \texttt{CREDITS}

Each and every patch is an important contribution to Parrot and it's important that these ef\mbox{}forts are recognized. To that end, the \emph{CREDITS} f\mbox{}ile contains an informal list of contributors and their contributions made to Parrot. Patch submitters are encouraged to include a new or updated entry for themselves in \emph{CREDITS} as part of their patch.

The format for entries in \emph{CREDITS} is def\mbox{}ined at the top of the f\mbox{}ile.

\end{description}

\vspace{-5pt}
\section*{How To Submit A Patch}

\vspace{-5pt}

\begin{enumerate}

\setlength{\topsep}{0pt}
\setlength{\itemsep}{0pt}
\item Go to Parrot's ticket tracking system at https://trac.parrot.org/parrot/. Log in, or create an account if you don't have one yet.

\item If there is already a ticket for the bug or feature that your patch relates to, just attach the patch directly to the ticket.

\item Otherwise select ``New Ticket'' at the top of the site. https://trac.parrot.org/parrot/newticket

\item Give a clear and concise Summary. You do \textbf{NOT} need to pref\mbox{}ix the Summary with a \texttt{[PATCH]} identif\mbox{}ier. Instead, in the lower-right corner of the \emph{newticket} page, select status \texttt{new} in the \emph{Patch status} drop-down box.

\item The Description should contain an explanation of the purpose of the patch and a list of all f\mbox{}iles af\mbox{}fected, with a summary of the changes made in each f\mbox{}ile. Optionally, the output of the \texttt{dif\mbox{}fstat(1)} utility when run on your patch(es) may be included at the bottom of the message body.

\item Set the Type of the ticket to ``patch''. Set other relevant drop-down menus, such as Version (the version of Parrot where you encountered the problem), Platform, or Severity. As mentioned above, select status \texttt{new} in the \emph{Patch status} drop-down box.

\item Check the box for ``I have f\mbox{}iles to attach to this ticket''. Double-check that you've actually done this, because it's easy to forget.

\textbf{DO NOT} paste the patch f\mbox{}ile content into the Description.

\item Click the ``Create ticket'' button. On the next page attach your patch f\mbox{}ile(s).

\end{enumerate}

\vspace{-5pt}
\section*{Applying Patches}

You may wish to apply a patch submitted by someone else before the patch is incorporated into git.

For single \texttt{dif\mbox{}f} patches or \texttt{git} patches, copy the patch f\mbox{}ile to \emph{parrot}, and run:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    cd parrot
    git apply some.patch\end{verbatim}
\vspace{-6pt}
\normalsize
For recursive \texttt{dif\mbox{}f} patches, copy the patch f\mbox{}ile to \emph{workingdir}, and run:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    cd workingdir
    git apply some.patch\end{verbatim}
\vspace{-6pt}
\normalsize
To be on the safe side, run \texttt{make test} before actually committing the changes.

\subsection*{Conf\mbox{}iguration of f\mbox{}iles to ignore}

Sometimes new f\mbox{}iles will be created in the conf\mbox{}iguration and build process of Parrot. These f\mbox{}iles should not show up when checking the distribution with

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    git status\end{verbatim}
\vspace{-6pt}
\normalsize
or

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    perl tools/dev/manicheck.pl\end{verbatim}
\vspace{-6pt}
\normalsize
To keep the two dif\mbox{}ferent checks synchronized, regenerate the MANIFEST and MANIFEST.SKIP f\mbox{}iles with:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    perl tools/dev/mk_manifest_and_skip.pl\end{verbatim}
\vspace{-6pt}
\normalsize
\section*{How To Submit Something New}

If you have a new feature to add to Parrot, such as a new test, follow these steps:

\vspace{-5pt}

\begin{enumerate}

\setlength{\topsep}{0pt}
\setlength{\itemsep}{0pt}
\item Add your new f\mbox{}ile path(s), relative to \emph{parrot}, to the f\mbox{}ile MANIFEST. Create a patch for the MANIFEST f\mbox{}ile according to the instructions in \textbf{How To Submit A Patch}.

\item If you have a new test script ending in \texttt{.t}, some mailers may become confused and treat it as \texttt{application/x-trof\mbox{}f}. One way around this (for *nix users) is to dif\mbox{}f the f\mbox{}ile against \texttt{/dev/null}, like this:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    cd parrot
    diff -u /dev/null newfile.t > newfile.patch\end{verbatim}
\vspace{-6pt}
\normalsize
\item Go to Parrot's ticket tracking system at https://trac.parrot.org/parrot/. Log in, or create an account if you don't have one yet.

\item Select ``New Ticket'' (https://trac.parrot.org/parrot/newticket).

\item Give a clear and concise Summary.

Pref\mbox{}ix it with a \texttt{[NEW]} identif\mbox{}ier.

\item The Description should contain an explanation of the purpose of the feature you are adding. Optionally, include the output of the \texttt{dif\mbox{}fstat(1)} utility when run on your patch(es).

\item Set the Type of the ticket to ``patch''. Set other relevant drop-down menus, such as Version, Platform, or Severity.

\item Check the box for ``I have f\mbox{}iles to attach to this ticket''.

Double-check that you've actually done this, because it's easy to forget.

\textbf{DO NOT} paste the content of the new f\mbox{}ile or f\mbox{}iles into the body of the message.

\item Click the ``Create ticket'' button. On the next page attach the patch for MANIFEST and your new f\mbox{}ile(s).

\end{enumerate}

\vspace{-5pt}
\section*{What Happens Next?}

If you created a new ticket for the submission, you will be taken to the page for the new ticket, where you can check on the progress of your submission. Use the ticket number as the identif\mbox{}ier in all correspondence concerning the submission.

Everyone on Trac sees the submission and can comment on it. A developer with git commit privileges can commit it to git once it is clear that it is the right thing to do.

However, developers with commit privileges may not commit your changes immediately if they are large or complex, because such changes need time for peer review.

A list of active tickets can be found here: http://trac.parrot.org/parrot/report/1

A list of all the unresolved patches is at: http://trac.parrot.org/parrot/report/15

\section*{Patches for the Parrot website}

The http://www.parrot.org website is hosted in a Drupal CMS. Submit changes through the usual ticket interface in Trac.

\section*{Getting Commit Privileges}

If you are interested in getting commit privileges to Parrot, here is the procedure:

\vspace{-5pt}

\begin{enumerate}

\setlength{\topsep}{0pt}
\setlength{\itemsep}{0pt}
\item Submit several high quality patches (and have them committed) via the process described in this document. This process may take weeks or months.

\item Obtain a Trac account at https://trac.parrot.org/parrot

\item Submit a Parrot Contributor License Agreement; this document signif\mbox{}ies that you have the authority to license your work to Parrot Foundation for inclusion in their projects. You may need to discuss this with your employer if you contribute to Parrot on work time or with work resources, or depending on your employment agreement.

http://www.parrot.org/f\mbox{}iles/parrot\_cla.pdf

\item Request commit access via the \texttt{parrot-dev} mailing list, or via IRC (\#parrot on irc.parrot.org). The existing committers will discuss your request in the next couple of weeks.

If approved, a metacommitter will update the permissions to allow you to commit to Parrot; see \texttt{RESPONSIBLE\_PARTIES} for the current list. Welcome aboard!

\end{enumerate}

\vspace{-5pt}
Thanks for your help!

\chapter{Parrot's command line options}

\section*{OVERVIEW}

This document describes Parrot's command line options.

\section*{SYNOPSIS}

\vspace{-6pt}
\scriptsize
\begin{verbatim}
 parrot [-options] <file> [arguments ...]\end{verbatim}
\vspace{-6pt}
\normalsize
\section*{ENVIRONMENT}

\vspace{-5pt}

\begin{description}

\setlength{\topsep}{0pt}
\setlength{\itemsep}{0pt}
\item[] PARROT\_RUNTIME

If this environment variable is set, parrot uses this path as its runtime pref\mbox{}ix instead of the compiled-in path.

\item[] PARROT\_GC\_DEBUG

Turn on the \emph{--gc-debug} f\mbox{}lag.

\end{description}

\vspace{-5pt}
\section*{OPTIONS}

\subsection*{Assembler options}

\vspace{-5pt}

\begin{description}

\setlength{\topsep}{0pt}
\setlength{\itemsep}{0pt}
\item[] -a, --pasm

Assume PASM input on stdin.

\item[] -c, --pbc

Assume PBC f\mbox{}ile on stdin, run it.

\item[] -d, --imcc-debug [hexbits]

The \textbf{-d} switch takes an optional argument, which is interpreted as a hexadecimal value of debug bits. Without a value, debug is set to 1.

The individual bits can be listed on the command line by use of the \textbf{--help-debug} switch.

To produce really huge output on \emph{stderr}, run \texttt{``parrot \textbf{-d 0f\mbox{}ff\mbox{}f} \ldots ''}. Note: if the argument is separated from the \textbf{-d} switch by whitespace, it must start with a digit.

\item[] -h, --help

Print command line option summary.

\item[] --help-debug

Print debugging and tracing f\mbox{}lag bits summary.

\item[] -o outputf\mbox{}ile, --output=outputf\mbox{}ile

Act like an assembler. Don't run code, unless \textbf{-r} is given too. If the outputf\mbox{}ile ends with \emph{.pbc}, a PBC f\mbox{}ile is written. If it ends with \emph{.pasm}, a PASM output is generated, even from PASM input. This can be handy to check various optimizations, including \texttt{-Op}.

\item[] --output-pbc

Act like an assembler, but always output bytecode, even if the output f\mbox{}ile does not end in \emph{.pbc}.

\item[] -r, --run-pbc

Only useful after \texttt{-o} or \texttt{--output-pbc}. Run the program from the compiled in-memory image. If two \texttt{-r} options are given, the \emph{.pbc} f\mbox{}ile is read from disc and run. This is mainly needed for tests.

\item[] -v, --verbose

One \texttt{-v} shows which f\mbox{}iles are being processed and prints a summary of register usage and optimization statistics per \emph{subroutine}. With two \texttt{-v} switches, \texttt{parrot} also prints a line for each individual processing step.

\item[] -y, --yydebug

Turn on yydebug in \emph{yacc}/\emph{bison}.

\item[] -V, --version

Print version information and exit.

\item[] -Ox

Optimize

\vspace{-6pt}
\scriptsize
\begin{verbatim}
 -O0 no optimization (default)
 -O1 optimizations without life info (e.g. branches)
 -O  same
 -O2 optimizations with life info
 -Op rewrite I and N PASM registers most used first
 -Ot select fastest runcore
 -Oc turns on the optional/experimental tail call optimizations\end{verbatim}
\vspace{-6pt}
\normalsize
See \emph{docs/dev/optimizer.pod} for more information on the optimizer. Note that optimization is currently experimental and these options are likely to change.

\item[] -E, --pre-process-only

Preprocess source f\mbox{}ile (expand macros) and print result to stdout:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $ parrot -E t/op/macro_10.pasm
  $ parrot -E t/op/macro_10.pasm | parrot -- -\end{verbatim}
\vspace{-6pt}
\normalsize
\end{description}

\vspace{-5pt}
\subsection*{Runcore Options}

These options select the runcore, which is useful for performance tuning and debugging. See ``About runcores'' for details.

\vspace{-5pt}

\begin{description}

\setlength{\topsep}{0pt}
\setlength{\itemsep}{0pt}
\item[] -R, --runcore CORE

Select the runcore. The following cores are available in Parrot, but not all may be available on your system:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  slow, bounds  bounds checking core (default)
  gcdebug       performs a full GC run before every op dispatch (good for
                debugging GC problems)
  trace         bounds checking core w/ trace info (see 'parrot --help-debug')
  profiling     see docs/dev/profiling.pod\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{jit}, \texttt{switch-jit}, and \texttt{cgp-jit} options are currently aliases for the \texttt{fast}, \texttt{switch}, and \texttt{cgp} options, respectively. We do not recommend their use in new code; they will continue working for existing code per our deprecation policy.

\item[] -p, --prof\mbox{}ile

Run with the slow core and print an execution prof\mbox{}ile.

\item[] -t, --trace

Run with the slow core and print trace information to \textbf{stderr}. See \texttt{parrot --help-debug} for available f\mbox{}lag bits.

\end{description}

\vspace{-5pt}
\subsection*{VM Options}

\vspace{-5pt}

\begin{description}

\setlength{\topsep}{0pt}
\setlength{\itemsep}{0pt}
\item[] -w, --warnings

Turn on warnings. See \texttt{parrot --help-debug} for available f\mbox{}lag bits.

\item[] -D, --parrot-debug

Turn on interpreter debug f\mbox{}lag. See \texttt{parrot --help-debug} for available f\mbox{}lag bits.

\item[] --hash-seed <hexnum>

Sets the hash seed to the provided value. Only useful for debugging intermittent failures, and harmful in production.

\item[] --gc-debug

Turn on GC (Garbage Collection) debugging. This imposes some stress on the GC subsystem and can slow down execution considerably.

\item[] -G, --no-gc

This turns of\mbox{}f GC, which may be useful for f\mbox{}inding GC-related bugs. Don't use this option for longer-running programs: as memory is no longer recycled, it may quickly become exhausted.

\item[] --leak-test, --destroy-at-end

Free all memory of the last interpreter. This is useful when running leak checkers.

\item[] -., --wait

Read a keystroke before starting. This is useful when you want to attach a debugger on platforms such as Windows.

\item[] --runtime-pref\mbox{}ix

Print the runtime pref\mbox{}ix path and exit.

\end{description}

\vspace{-5pt}
\subsection*{<f\mbox{}ile>}

If the f\mbox{}ile ends in \emph{.pbc} it will be interpreted immediately.

If the f\mbox{}ile ends in \emph{.pasm}, then it is parsed as PASM code. Otherwise, it is parsed as PIR code. In both cases, it will then be run, unless the \texttt{-o} f\mbox{}lag was given.

If \texttt{f\mbox{}ile} is a single dash (\texttt{-}), Parrot reads the program from \texttt{stdin}.

\subsection*{[arguments \ldots ]}

Optional arguments passed to the running program as ARGV. The program is assumed to know what to do with these.
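
Inside the program, PIR receives these arguments as the f\mbox{}irst parameter of the sub marked \texttt{:main}. The following sketch, saved under a hypothetical name such as \emph{args.pir}, prints how many entries it received and then the f\mbox{}irst entry, which holds the program name rather than a user-supplied argument.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    .param pmc args          # all command-line arguments
    .local int count
    count = elements args    # number of entries, including the program name
    say count
    $S0 = args[0]            # element 0 is the program name
    say $S0
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
Running \texttt{parrot args.pir foo bar} should print 3, followed by the program name.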

\section*{Generated f\mbox{}iles}

\section*{About runcores}

The runcore (or runloop) tells Parrot how to f\mbox{}ind the C code that implements each instruction. Parrot provides more than one way to do this, partly because no single runcore will perform optimally on all architectures (or even for all problems on a given architecture), and partly because some of the runcores have specif\mbox{}ic debugging and tracing capabilities.

In the default ``slow'' runcore, each opcode is a separate C function. That's pretty easy in pseudocode:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    slow_runcore( op ):
        while ( op ):
            op = op_function( op )
            check_for_events()\end{verbatim}
\vspace{-6pt}
\normalsize
The GC debugging runcore is similar:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    gcdebug_runcore( op ):
        while ( op ):
            perform_full_gc_run()
            op = op_function( op )
            check_for_events()\end{verbatim}
\vspace{-6pt}
\normalsize
Of course, this is much slower, but is extremely helpful for pinning memory corruption problems that af\mbox{}fect GC down to single-instruction resolution. See http://www.oreillynet.com/onlamp/blog/2007/10/debugging\_gc\_problems\_in\_parro.html for more information.

The trace and prof\mbox{}ile cores are also based on the ``slow'' core, doing full bounds checking, and also printing runtime information to stderr.

\section*{Operation table}

\vspace{-6pt}
\scriptsize
\begin{verbatim}
 Command Line          Action         Output
 ---------------------------------------------
 parrot x.pir          run
 parrot x.pasm         run
 parrot x.pbc          run
 -o x.pasm x.pir       ass            x.pasm
 -o x.pasm y.pasm      ass            x.pasm
 -o x.pbc  x.pir       ass            x.pbc
 -o x.pbc  x.pasm      ass            x.pbc
 -o x.pbc -r x.pasm    ass/run pasm   x.pbc
 -o x.pbc -r -r x.pasm ass/run pbc    x.pbc
 -o x.o    x.pbc       obj\end{verbatim}
\vspace{-6pt}
\normalsize
\ldots where the possible actions are:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  run ... yes, run the program
  ass ... assemble sourcefile
  obj ... produce native (ELF) object file for the EXEC subsystem\end{verbatim}
\vspace{-6pt}
\normalsize
\section*{FILES}

\emph{main.c}

\chapter{PIR Guide}

\section{Introduction}

Parrot is a language-neutral virtual machine for dynamic languages such as Ruby, Python, PHP, and Perl. It hosts a powerful suite of compiler tools tailored to dynamic languages and a next generation regular expression engine. Its architecture dif\mbox{}fers from virtual machines such as the JVM or CLR, with optimizations for dynamic languages, the use of registers instead of stacks, and pervasive continuations used for all f\mbox{}low control.

The name ``Parrot'' was inspired by Monty Python's Parrot sketch. As an April Fools' Day joke in 2001, Simon Cozens published ``Programming Parrot'', a f\mbox{}ictional interview between Guido van Rossum and Larry Wall detailing their plans to merge Python and Perl into a new language called Parrot (\emph{http://www.perl.com/pub/a/2001/04/01/parrot.htm}).

Parrot Intermediate Representation (PIR) is Parrot's native low-level language. PIR is fundamentally an assembly language, but it has some higher-level features such as operator syntax, syntactic sugar for subroutine and method calls, automatic register allocation, and more friendly conditional syntax. Parrot libraries---including most of Parrot's compiler tools---are often written in PIR. Even so, PIR is more rigid and ``close to the machine'' than some higher-level languages like C, which makes it a good window into the inner workings of the virtual machine.

\subsection*{Parrot Resources}

\index{www.parrot.org website} The starting point for all things related to Parrot is the main website \emph{http://www.parrot.org/}. The site lists additional resources, as well as recent news and information about the project.

The Parrot Foundation holds the copyright over Parrot and helps support its development and community.

\subsubsection*{Documentation}

\index{docs.parrot.org website} \index{online documentation (docs.parrot.org)} Parrot includes extensive documentation in the distribution. The full documentation for the latest release is available online at \emph{http://docs.parrot.org/}.

\subsubsection*{Mailing Lists}

\index{parrot-dev mailing list} \index{mailing lists}

The primary mailing list for Parrot is \emph{parrot-dev}.\footnote{parrot-dev@lists.parrot.org} If you're interested in developing Parrot, the \emph{parrot-commits} and \emph{parrot-tickets} lists are useful. More information on the Parrot mailing lists, as well as subscription options, is available on the mailing list info page \emph{http://lists.parrot.org/mailman/listinfo}.

The archives for \emph{parrot-dev} are available on Google Groups at \emph{http://groups.google.com/group/parrot-dev} and as NNTP at \emph{nntp://news.gmane.org/gmane.comp.compilers.parrot.devel}.

\subsubsection*{IRC}

\index{\#parrot (IRC channel)} \index{IRC channel (\#parrot)}

Parrot developers and users congregate on IRC at \texttt{\#parrot} on the \emph{irc://irc.parrot.org} server. It's a good place to ask questions or discuss Parrot in real time.

\subsubsection*{Issue Tracking \& Wiki}

\index{trac.parrot.org website} \index{issue tracking (trac.parrot.org)}

Parrot developers track bugs, feature requests, and roadmap tasks at \emph{https://trac.parrot.org/}, the open source Trac issue tracker. Users can submit new tickets and track the status of existing tickets. The site also includes a wiki used in project development, a source code browser, and the project roadmap.

\subsection*{Parrot Development}

\index{development cycles}

Parrot's f\mbox{}irst release occurred in September 2001. The 1.0 release took place on March 17, 2009. The Parrot project makes releases on the third Tuesday of each month. Two releases a year, every January and July, are ``supported'' releases intended for production use. The other ten releases are development releases intended for language implementers and testers.

Development proceeds in cycles around releases. Activity just before a release focuses on closing tickets, f\mbox{}ixing bugs, reviewing documentation, and preparing for the release. Immediately after the release, larger changes occur: merging branches, adding large features, or removing deprecated features. This allows developers to ensure that changes have suf\mbox{}ficient testing time before the next release. These regular releases also encourage feedback from casual users and testers.

\subsection*{Licensing}

\index{license}

The Parrot Foundation supports the Parrot development community and holds trademarks and copyrights to Parrot. The project is available under the Artistic License 2.0, allowing free use in commercial and open source/free software contexts.

\section{Getting Started}

The simplest way to install Parrot is to use a pre-compiled binary for your operating system or distribution. Packages are available for many systems, including Debian, Ubuntu, Fedora, Mandriva, FreeBSD, Cygwin, and MacPorts. The Parrot website lists all known packages.\footnote{\emph{http://www.parrot.org/download}} A binary installer for Windows is also available from the Parrot Win32 project on SourceForge.\footnote{\emph{http://parrotwin32.sourceforge.net/}} If packages aren't available on your system, you can download a source tarball for the latest supported release from \emph{http://www.parrot.org/release/supported}.

You need a C compiler and a make utility to build Parrot from source code. These are usually \texttt{gcc} and \texttt{make}, but Parrot can build with standard compiler toolchains on dif\mbox{}ferent operating systems. Perl 5.8 is also a prerequisite for conf\mbox{}iguring and building Parrot.

\index{compiling} If you have these dependencies installed, build the core virtual machine and compiler toolkit and run the standard test suite with the commands:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $ perl Configure.pl
  $ make
  $ make test\end{verbatim}
\vspace{-6pt}
\normalsize
\index{installation} By default, Parrot installs to directories \emph{bin/}, \emph{lib/}, and \emph{include/} under \emph{/usr/local}. If you have privileges to write to these directories, install Parrot with:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $ make install\end{verbatim}
\vspace{-6pt}
\normalsize
To install Parrot in a dif\mbox{}ferent location, use the \texttt{--pref\mbox{}ix} option to \emph{Conf\mbox{}igure.pl}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $ perl Configure.pl --prefix=/home/me/parrot\end{verbatim}
\vspace{-6pt}
\normalsize
Setting the pref\mbox{}ix to \emph{/home/me/parrot} installs the Parrot executable in \emph{/home/me/parrot/bin/parrot}.

If you intend to develop a language on Parrot, install the Parrot developer tools as well:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $ make install-dev\end{verbatim}
\vspace{-6pt}
\normalsize
\index{.pir f\mbox{}iles} Once you've installed Parrot, create a test f\mbox{}ile called \emph{news.pir}.\footnote{Files containing PIR code use the \emph{.pir} extension.}

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'news'
    say "Here is the news for Parrots."
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
Now run this f\mbox{}ile with:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $ parrot news.pir\end{verbatim}
\vspace{-6pt}
\normalsize
which will print:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  Here is the news for Parrots.\end{verbatim}
\vspace{-6pt}
\normalsize
\section{Basic Syntax}

\label{CHP-3}

\index{PIR syntax} PIR has a relatively simple syntax. Every line is a comment, a label, a statement, or a directive. Each statement or directive stands on its own line. There is no end-of-line symbol (such as a semicolon in C).

\subsection*{Comments}

\index{comments} A comment begins with the \texttt{\#} symbol, and continues until the end of the line. Comments can stand alone on a line or follow a statement or directive.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    # This is a regular comment. The PIR
    # interpreter ignores this.\end{verbatim}
\vspace{-6pt}
\normalsize
\index{Pod documentation} PIR also treats inline documentation in Pod format as a comment. An equals sign as the f\mbox{}irst character of a line marks the start of a Pod block. A \texttt{=cut} marker signals the end of a Pod block.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  =head2

  This is Pod documentation, and is treated like a
  comment. The PIR interpreter ignores this.

  =cut\end{verbatim}
\vspace{-6pt}
\normalsize
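Both comment styles can appear in the same source f\mbox{}ile. In this minimal sketch the trailing comment and the Pod block are both ignored, so running it should only print the string. The Pod lines start in the f\mbox{}irst column, because the equals sign must be the f\mbox{}irst character of the line.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    say "Pining for the fjords"   # a trailing comment
  .end

=head1 NOTES

This Pod block is skipped by the PIR compiler.

=cut\end{verbatim}
\vspace{-6pt}
\normalsize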
\subsection*{Labels}

\index{labels} A label attaches a name to a line of code so other statements can refer to it. Labels can contain letters, numbers, and underscores. By convention, labels use all capital letters to stand out from the rest of the source code. It's f\mbox{}ine to put a label on the same line as a statement or directive:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    GREET: say "'Allo, 'allo, 'allo."\end{verbatim}
\vspace{-6pt}
\normalsize
Labels on separate lines improve readability, especially when outdented:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  GREET:
    say "'Allo, 'allo, 'allo."\end{verbatim}
\vspace{-6pt}
\normalsize
\subsection*{Statements}

\label{CHP-3-SECT-1}

\index{statements}\index{opcodes} A statement is either an opcode or syntactic sugar for one or more opcodes. An opcode is a native instruction for the virtual machine; it consists of the name of the instruction followed by zero or more arguments.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  say "Norwegian Blue"\end{verbatim}
\vspace{-6pt}
\normalsize
PIR also provides higher-level constructs, including symbolic operators:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I1 = 2 + 5\end{verbatim}
\vspace{-6pt}
\normalsize
\index{operators} These special statement forms are just syntactic sugar for regular opcodes. The \texttt{+} symbol corresponds to the \texttt{add} opcode, the \texttt{-} symbol to the \texttt{sub} opcode, and so on. The previous example is equivalent to:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  add $I1, 2, 5\end{verbatim}
\vspace{-6pt}
\normalsize
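To see the correspondence in a complete program, the following sketch computes the same values twice: once with operator syntax and once with the opcodes the operators stand for. It assumes that \texttt{*} maps to the \texttt{mul} opcode in the same way that \texttt{+} and \texttt{-} map to \texttt{add} and \texttt{sub}; both halves should print 16.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    # operator syntax
    $I1 = 2 + 5          # 7
    $I2 = $I1 - 3        # 4
    $I3 = $I2 * 4        # 16
    say $I3

    # the equivalent opcodes
    add $I1, 2, 5
    sub $I2, $I1, 3
    mul $I3, $I2, 4
    say $I3
  .end\end{verbatim}
\vspace{-6pt}
\normalsize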
\subsection*{Directives}

\index{directives} Directives resemble opcodes, but they begin with a period (\texttt{.}). Some directives specify actions that occur at compile time. Other directives represent complex operations that require the generation of multiple instructions. The \texttt{.local} directive, for example, declares a named variable.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .local string hello\end{verbatim}
\vspace{-6pt}
\normalsize
\subsection*{Literals}

\index{literals} Integers and f\mbox{}loating point numbers are numeric literals. They can be positive or negative.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = 42       # positive
  $I1 = -1       # negative\end{verbatim}
\vspace{-6pt}
\normalsize
\index{integers} Integer literals can also be binary, octal, or hexadecimal:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I1 = 0b01010  # binary
  $I2 = 0o72     # octal
  $I3 = 0xA5     # hexadecimal\end{verbatim}
\vspace{-6pt}
\normalsize
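A short sketch that prints these literals shows that they are ordinary integers once parsed; \texttt{say} should display each value in decimal.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    $I1 = 0b01010   # binary:      8 + 2      = 10
    $I2 = 0o72      # octal:       7 * 8 + 2  = 58
    $I3 = 0xA5      # hexadecimal: 10 * 16 + 5 = 165
    say $I1
    say $I2
    say $I3
  .end\end{verbatim}
\vspace{-6pt}
\normalsize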
\index{numbers (f\mbox{}loating-point)} Floating point number literals have a decimal point and can use scientif\mbox{}ic notation:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $N0 = 3.14
  $N2 = -1.2e+4\end{verbatim}
\vspace{-6pt}
\normalsize
\index{strings} String literals are enclosed in single or double-quotes.\footnote{See the section on Strings in Chapter 4 for an explanation of the dif\mbox{}ferences between the quoting types.}

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = "This is a valid literal string"
  $S1 = 'This is also a valid literal string'\end{verbatim}
\vspace{-6pt}
\normalsize
\subsection*{Variables}

\index{variables} PIR variables can store four dif\mbox{}ferent kinds of values---integers, numbers (f\mbox{}loating point), strings, and objects. Parrot's objects are called PMCs, for ``\emph{P}oly\emph{M}orphic \emph{C}ontainer''.

The simplest kind of variable is a register variable. The name of a register variable always starts with a dollar sign (\texttt{\$}), followed by a single character which specif\mbox{}ies the type of the variable---integer (\texttt{I}), number (\texttt{N}), string (\texttt{S}), or PMC (\texttt{P})---and ends with a unique number. You need not predeclare register variables:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = "Who's a pretty boy, then?"
  say $S0\end{verbatim}
\vspace{-6pt}
\normalsize
\index{named variables} PIR also has named variables; the \texttt{.local} directive declares them. As with register variables, there are four valid types: \texttt{int}, \texttt{num}, \texttt{string}, and \texttt{pmc}. You \emph{must} declare named variables; once declared, they behave exactly the same as register variables.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .local string hello
  hello = "'Allo, 'allo, 'allo."
  say hello\end{verbatim}
\vspace{-6pt}
\normalsize
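Once declared, named variables can be mixed freely with register variables in the same statement. In this sketch the names \texttt{parrots} and \texttt{extra} are only illustrative, and \texttt{.local} is assumed to accept a comma-separated list of names of the same type.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    .local int parrots, extra
    parrots = 3
    extra   = 4
    $I0 = parrots + extra   # named variables feed a register variable
    say $I0                 # 7
  .end\end{verbatim}
\vspace{-6pt}
\normalsize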
\subsection*{Constants}

\index{constants} The \texttt{.const} directive declares a named constant. Named constants are similar to named variables, but the values set in the declaration may never change. Like \texttt{.local}, \texttt{.const} takes a type and a name. It also requires a literal argument to set the value of the constant.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .const int    frog = 4                       # integer
  .const string name = "Superintendent Parrot" # string
  .const num    pi   = 3.14159                 # floating point\end{verbatim}
\vspace{-6pt}
\normalsize
You may use a named constant anywhere you may use a literal, but you must declare the named constant beforehand. This example declares a named string constant \texttt{hello} and prints the value:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .const string hello = "Hello, Polly."
  say hello\end{verbatim}
\vspace{-6pt}
\normalsize
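Because a named constant can appear wherever a literal can, it also works as an operand in arithmetic. A minimal sketch, reusing the \texttt{frog} constant from above:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    .const int frog = 4
    $I0 = frog * 3     # frog stands in for the literal 4
    say $I0            # 12
    $I1 = frog + frog
    say $I1            # 8
  .end\end{verbatim}
\vspace{-6pt}
\normalsize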
\subsection*{Keys}

\index{keys} A key is a special kind of constant used for accessing elements in complex variables (such as an array). A key is either an integer or a string, and it is always enclosed in square brackets (\texttt{[} and \texttt{]}). You do not have to declare literal keys. This code example, which assumes \$P0 already holds an array, stores the string ``foo'' in \$P0 as element 5 and then retrieves it.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0[5] = "foo"
  $S1    = $P0[5]\end{verbatim}
\vspace{-6pt}
\normalsize
PIR supports multi-part keys. Use a semicolon to separate each part.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0['my';'key'] = 472
  $I1             = $P0['my';'key']\end{verbatim}
\vspace{-6pt}
\normalsize
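The key examples above assume that \texttt{\$P0} already holds an aggregate. The following runnable sketch creates the containers f\mbox{}irst, assuming Parrot's built-in \texttt{ResizablePMCArray} and \texttt{Hash} PMC types. The multi-part key walks from the outer hash into an inner hash, which is created explicitly here rather than relying on automatic creation.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    # integer key into an array
    $P0 = new 'ResizablePMCArray'
    $P0[5] = "foo"            # the array grows to hold index 5
    $S1 = $P0[5]
    say $S1                   # foo

    # multi-part key into a hash of hashes
    $P1 = new 'Hash'
    $P2 = new 'Hash'
    $P1['my'] = $P2           # inner hash stored under 'my'
    $P1['my';'key'] = 472     # stores 472 under 'key' in the inner hash
    $I1 = $P1['my';'key']
    say $I1                   # 472
  .end\end{verbatim}
\vspace{-6pt}
\normalsize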
\subsection*{Control Structures}

\index{control structures}\index{goto instruction} Rather than providing a pre-packaged set of control structures like \texttt{if} and \texttt{while}, PIR gives you the building blocks to construct your own.\footnote{PIR has many advanced features, but at heart it \textbf{is} an assembly language.} The most basic of these building blocks is \texttt{goto}, which jumps to a named label.\footnote{This is not your father's \texttt{goto}. It can only jump inside a subroutine, and only to a named label.} In this code example, the \texttt{say} statement will run immediately after the \texttt{goto} statement:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    goto GREET
      # ... some skipped code ...
  GREET:
    say "'Allo, 'allo, 'allo."\end{verbatim}
\vspace{-6pt}
\normalsize
\index{conditional branch} Variations on the basic \texttt{goto} check whether a particular condition is true or false before jumping:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  if $I0 > 5 goto GREET\end{verbatim}
\vspace{-6pt}
\normalsize
You can construct any traditional control structure from PIR's built-in control structures.
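
For example, a \texttt{while} loop becomes a label at the top of the loop, a conditional \texttt{goto} that leaves the loop, and an unconditional \texttt{goto} back to the top. This sketch, with illustrative label names, should print the numbers 1 through 3 and then \texttt{done}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    .local int i
    i = 1
  LOOP:                     # while (i <= 3)
    if i > 3 goto DONE      # exit test
    say i
    inc i
    goto LOOP               # back to the exit test
  DONE:
    say "done"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize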

\subsection*{Subroutines}

\index{subroutines} A PIR subroutine starts with the \texttt{.sub} directive and ends with the \texttt{.end} directive. Parameter declarations use the \texttt{.param} directive; they resemble named variable declarations. This example declares a subroutine named \texttt{greeting} that takes a single string parameter named \texttt{hello}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'greeting'
      .param string hello
      say hello
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
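To call a subroutine, write its quoted name followed by a parenthesized argument list. The sketch below calls \texttt{greeting} from a \texttt{:main} sub and also def\mbox{}ines a second, hypothetical subroutine \texttt{double} that hands a value back with the \texttt{.return} directive. Each subroutine has its own register variables, so the \texttt{\$I0} inside \texttt{double} does not clash with the one in \texttt{main}.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    'greeting'("'Allo, 'allo, 'allo.")
    $I0 = 'double'(21)       # capture the single return value
    say $I0                  # 42
  .end

  .sub 'greeting'
      .param string hello
      say hello
  .end

  .sub 'double'              # hypothetical helper
      .param int n
      $I0 = n * 2            # a different $I0 from the one in 'main'
      .return ($I0)
  .end\end{verbatim}
\vspace{-6pt}
\normalsize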
\subsection*{That's All Folks}

You now know everything you need to know about PIR. Everything else you read or learn about PIR will use one of these fundamental language structures. The rest is vocabulary.

\begin{figure}[!h]
\begin{center}
\framebox{
\begin{minipage}{3.5in}
\vspace{3pt}

\begin{center}
\large{\bfseries{Parrot Assembly Language}}
\end{center}

Parrot Assembly Language (PASM) is another low-level language native to the virtual machine. PASM is a pure assembly language, with none of the syntactic sugar that makes PIR friendly for library development. PASM's primary purpose is to act as a plain English representation of the bytecode format. Its typical use is for debugging, rather than for writing libraries. Use PIR or a higher-level language for development tasks.

PASM f\mbox{}iles use the \emph{.pasm} f\mbox{}ile extension.

\vspace{3pt}
\end{minipage}
}
\end{center}
\end{figure}
\section{Variables}

Parrot is a register-based virtual machine. It has four typed register sets---integers, f\mbox{}loating-point numbers, strings, and objects. All variables in PIR are one of these four types. When you work with register variables or named variables, you're actually working directly with register storage locations in the virtual machine.

If you've ever worked with an assembly language before, you may immediately jump to the conclusion that \texttt{\$I0} is the zeroth integer register in the register set, but Parrot is a bit smarter than that. The number of a register variable does not necessarily correspond to the register used internally; Parrot's compiler maps registers as appropriate for speed and memory considerations. The only guarantee Parrot gives you is that you'll always get the same storage location when you use \texttt{\$I0} in the same subroutine.
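
Named variables are declared with the \texttt{.local} directive, which this section does not otherwise cover. The sketch below, assuming that directive, mixes named and register variables in one subroutine. Both kinds end up in the same underlying register sets, so they can be combined freely.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      .local int count        # a named integer variable
      .local string name      # a named string variable
      count = 3
      name = "Parrot"
      $I0 = count + 1         # register and named variables mix freely
      say $I0                 # prints 4
      say name                # prints "Parrot"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize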

\subsection*{Assignment}

\index{assignment} \index{= operator} The most basic operation on a variable is assignment using the \texttt{=} operator:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = 42        # set integer variable to the value 42
  $N3 = 3.14159   # set number variable to approximation of pi
  $I1 = $I0       # set $I1 to the value of $I0\end{verbatim}
\vspace{-6pt}
\normalsize
\index{null opcode} The \texttt{null} opcode sets an integer or number variable to a zero value, and undef\mbox{}ines a string or object.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  null $I0  # 0
  null $N0  # 0.0
  null $S0  # NULL
  null $P0  # PMCNULL\end{verbatim}
\vspace{-6pt}
\normalsize
\subsection*{Working with Numbers}

\index{integers}\index{numbers (f\mbox{}loating-point)} PIR has an extensive set of instructions that work with integers, f\mbox{}loating-point numbers, and numeric PMCs. Many of these instructions have a variant that modif\mbox{}ies the result in place:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = $I1 + $I2
  $I0 += $I1\end{verbatim}
\vspace{-6pt}
\normalsize
\index{+ operator} The f\mbox{}irst form of \texttt{+} stores the sum of the two arguments in the result variable, \texttt{\$I0}. The second variant, \texttt{+=}, adds the single argument to \texttt{\$I0} and stores the sum back in \texttt{\$I0}.
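
As a small illustration with arbitrary values, the three-argument and in-place forms can be used side by side. The same pattern applies to the other arithmetic operators:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $I1 = 7
      $I2 = 3
      $I0 = $I1 + $I2         # three-argument form: $I0 is 10
      $I0 += $I1              # in-place form: $I0 is now 17
      $I0 -= 2                # subtraction has an in-place form too
      say $I0                 # prints 15
  .end\end{verbatim}
\vspace{-6pt}
\normalsize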

The arguments can be Parrot literals, variables, or constants. If the result is an integer type, like \texttt{\$I0}, the arguments must also be integers. A number result, like \texttt{\$N0}, usually requires number arguments, but many numeric instructions also allow the f\mbox{}inal argument to be an integer. Instructions with a PMC result may accept an integer, f\mbox{}loating-point, or PMC f\mbox{}inal argument:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = $P1 * $P2
  $P0 = $P1 * $I2
  $P0 = $P1 * $N2
  $P0 *= $P1
  $P0 *= $I1
  $P0 *= $N1\end{verbatim}
\vspace{-6pt}
\normalsize
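For a runnable version of the PMC forms, the sketch below boxes two integers into \texttt{Integer} PMCs (the \texttt{box} opcode is described later in this chapter under Scalars) and multiplies them. The values are arbitrary.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P1 = box 6             # an Integer PMC holding 6
      $P2 = box 7             # an Integer PMC holding 7
      $P0 = new 'Integer'
      $P0 = $P1 * $P2         # PMC result from two PMC arguments
      say $P0                 # prints 42
      $P0 *= 2                # in-place form with an integer argument
      say $P0                 # prints 84
  .end\end{verbatim}
\vspace{-6pt}
\normalsize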
\subsubsection*{Unary numeric opcodes}

\index{unary numeric opcodes} Unary opcodes have a single argument. They either return a result or modify the argument in place. Some of the most common unary numeric opcodes are \texttt{inc} (increment)\index{inc opcode}, \texttt{dec} (decrement)\index{dec opcode}, \texttt{abs} (absolute value)\index{abs opcode}, and \texttt{neg} (negate)\index{neg opcode}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $N0 = abs -5.0  # the absolute value of -5.0 is 5.0
  $I0 = 120
  inc $I0         # 120 incremented by 1 is 121\end{verbatim}
\vspace{-6pt}
\normalsize
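Unary opcodes come in both forms. In this sketch, with arbitrary values, \texttt{dec} modifies its argument in place, while \texttt{neg} and \texttt{abs} store a result and leave their argument untouched:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $I0 = 10
      dec $I0                 # in place: $I0 is now 9
      $I1 = neg $I0           # result form: $I1 is -9, $I0 stays 9
      $I2 = abs $I1           # $I2 is 9
      say $I1                 # prints -9
      say $I2                 # prints 9
  .end\end{verbatim}
\vspace{-6pt}
\normalsize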
\subsubsection*{Binary numeric opcodes}

\index{binary numeric opcodes}

Binary opcodes have two arguments and a result. Parrot provides addition (\texttt{+}\index{+ operator} or \texttt{add}\index{add opcode}), subtraction (\texttt{-}\index{- operator} or \texttt{sub}\index{sub opcode}), multiplication (\texttt{*}\index{* operator} or \texttt{mul}\index{mul opcode}), division (\texttt{/}\index{/ operator} or \texttt{div}\index{div opcode}), modulus (\texttt{\%}\index{\% operator} or \texttt{mod}\index{mod opcode}), and exponent (\texttt{pow}\index{pow opcode}) opcodes, as well as \texttt{gcd}\index{gcd opcode} (greatest common divisor) and \texttt{lcm}\index{lcm opcode} (least common multiple).

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = 12 / 5    # 2: integer division discards the remainder
  $I0 = 12 % 5    # 2: the remainder\end{verbatim}
\vspace{-6pt}
\normalsize
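Integer division discards the remainder, and \texttt{\%} recovers it. The sketch below, with arbitrary values, checks that the quotient times the divisor plus the remainder gives back the original number. It also shows that an opcode name such as \texttt{mul} can stand in for its operator:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $I1 = 17
      $I2 = 5
      $I0 = $I1 / $I2         # quotient: 3
      $I3 = $I1 % $I2         # remainder: 2
      $I4 = mul $I0, $I2      # opcode form of *: 15
      $I4 += $I3              # 15 + 2
      say $I4                 # prints 17
  .end\end{verbatim}
\vspace{-6pt}
\normalsize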
\subsubsection*{Floating-point operations}

The most common f\mbox{}loating-point operations are \texttt{ln}\index{ln opcode} (natural log), \texttt{log2}\index{log2 opcode} (log base 2), \texttt{log10}\index{log10 opcode} (log base 10), and \texttt{exp}\index{exp opcode} ($e^x$), as well as a full set of trigonometric opcodes such as \texttt{sin}\index{sin opcode} (sine), \texttt{cos}\index{cos opcode} (cosine), \texttt{tan}\index{tan opcode} (tangent), \texttt{sec}\index{sec opcode} (secant), \texttt{sinh}\index{sinh opcode} (hyperbolic sine), \texttt{cosh}\index{cosh opcode} (hyperbolic cosine), \texttt{tanh}\index{tanh opcode} (hyperbolic tangent), \texttt{sech}\index{sech opcode} (hyperbolic secant), \texttt{asin}\index{asin opcode} (arc sine), \texttt{acos}\index{acos opcode} (arc cosine), \texttt{atan}\index{atan opcode} (arc tangent), \texttt{asec}\index{asec opcode} (arc secant), \texttt{exsec}\index{exsec opcode} (exsecant), \texttt{hav}\index{hav opcode} (haversine), and \texttt{vers}\index{vers opcode} (versine). All angle arguments for the \index{trigonometric opcodes} trigonometric opcodes are in radians:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'trans_ops'

  # ...

  $N0 = sin $N1
  $N0 = exp 2\end{verbatim}
\vspace{-6pt}
\normalsize
The majority of the f\mbox{}loating-point operations have a single argument and a single result. The arguments can generally be either an integer or number, but many of these opcodes require the result to be a number.
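
As a sketch, assuming \texttt{trans\_ops} is loaded at the top of the file (outside any subroutine) as in the example above, the following program evaluates a few of these opcodes. No printed values are claimed, since the exact digits shown for floating-point results depend on Parrot's number formatting:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'trans_ops'

  .sub 'main' :main
      $N1 = 0.5               # an angle in radians
      $N2 = sin $N1
      $N3 = cos $N1
      $N2 *= $N2              # sine squared
      $N3 *= $N3              # cosine squared
      $N0 = $N2 + $N3         # should be 1, up to rounding
      say $N0
      $N4 = exp 1.0           # e to the power 1
      $N5 = ln $N4            # natural log undoes exp: about 1
      say $N5
  .end\end{verbatim}
\vspace{-6pt}
\normalsize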

\subsubsection*{Logical and Bitwise Operations}

\index{logical opcodes} The logical opcodes evaluate the truth of their arguments. They are most useful for making decisions in control f\mbox{}low. Integers and numeric PMCs are false if they're 0 and true otherwise. Strings are false if they're the empty string or a single character ``0'', and true otherwise. PMCs are true when their \texttt{get\_bool}\index{get\_bool vtable function} vtable function returns a nonzero value.

The \texttt{and}\index{and opcode} opcode returns the f\mbox{}irst argument if it's false and the second argument otherwise:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = and 0, 1  # returns 0
  $I0 = and 1, 2  # returns 2\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{or}\index{or opcode} opcode returns the f\mbox{}irst argument if it's true and the second argument otherwise:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'bit_ops'

  # ...

  $I0 = or 1, 0  # returns 1
  $I0 = or 0, 2  # returns 2

  $P0 = or $P1, $P2\end{verbatim}
\vspace{-6pt}
\normalsize
Both \texttt{and} and \texttt{or} are short-circuiting ops. If they can determine what value to return from the f\mbox{}irst argument, they'll never evaluate the second. This is signif\mbox{}icant only for PMCs, as they might have side ef\mbox{}fects on evaluation.

The \texttt{xor}\index{xor opcode} opcode returns the f\mbox{}irst argument if it is the only true value, returns the second argument if it is the only true value, and returns false if both values are true or both are false:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = xor 1, 0  # returns 1
  $I0 = xor 0, 1  # returns 1
  $I0 = xor 1, 1  # returns 0
  $I0 = xor 0, 0  # returns 0\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{not}\index{not opcode} opcode returns a true value when the argument is false and a false value if the argument is true:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = not $I1
  $P0 = not $P1\end{verbatim}
\vspace{-6pt}
\normalsize
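Truthiness applies to branches as well as to the logical opcodes. The sketch below, with arbitrary values, uses \texttt{unless}, the negated form of \texttt{if} (an assumption here, since this section only shows \texttt{if}), to show that the string ``0'' counts as false. It then chains \texttt{and} and \texttt{not}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $S0 = "0"
      unless $S0 goto IS_FALSE  # the string "0" is false, so this jumps
      say "not reached"
    IS_FALSE:
      $I0 = and 5, 0            # 5 is true, so and returns its second argument
      say $I0                   # prints 0
      $I1 = not $I0             # $I0 is false, so not returns a true value
      say $I1                   # prints 1
  .end\end{verbatim}
\vspace{-6pt}
\normalsize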
\index{bitwise opcodes} The bitwise opcodes operate on their values a single bit at a time. \texttt{band}\index{band opcode}, \texttt{bor}\index{bor opcode}, and \texttt{bxor}\index{bxor opcode} return a value that is the logical AND, OR, or XOR of each bit in the source arguments. They each take two arguments.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'bit_ops'

  # ...

  $I0 = bor $I1, $I2
  $P0 = bxor $P1, $I2\end{verbatim}
\vspace{-6pt}
\normalsize
\texttt{band}, \texttt{bor}, and \texttt{bxor} also have variants that modify the result in place.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'bit_ops'

  # ...

  $I0 = band $I1     # $I0 becomes $I0 AND $I1
  $P0 = bor $P1      # $P0 becomes $P0 OR $P1\end{verbatim}
\vspace{-6pt}
\normalsize
\texttt{bnot}\index{bnot opcode} is the logical NOT of each bit in the source argument.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'bit_ops'

  # ...

  $I0 = bnot $I1\end{verbatim}
\vspace{-6pt}
\normalsize
\index{shl opcode} \index{shr opcode} \index{lsr opcode} The logical and arithmetic shift operations shift their values by a specif\mbox{}ied number of bits:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'bit_ops'

  # ...

  $I0 = shl $I1, $I2        # shift $I1 left by count $I2
  $I0 = shr $I1, $I2        # arithmetic shift right
  $P0 = lsr $P1, $P2        # logical shift right\end{verbatim}
\vspace{-6pt}
\normalsize
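A sketch with small, arbitrary values, showing the bit patterns in the comments. It loads \texttt{bit\_ops} at file level, as the examples above do:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'bit_ops'

  .sub 'main' :main
      $I1 = 12                # binary 1100
      $I2 = 10                # binary 1010
      $I0 = band $I1, $I2     # 1000
      say $I0                 # prints 8
      $I0 = bor $I1, $I2      # 1110
      say $I0                 # prints 14
      $I0 = bxor $I1, $I2     # 0110
      say $I0                 # prints 6
      $I3 = 1
      $I0 = shl $I3, 4        # 1 shifted left by 4 bits
      say $I0                 # prints 16
      $I0 = shr $I0, 2        # 16 shifted right by 2 bits
      say $I0                 # prints 4
  .end\end{verbatim}
\vspace{-6pt}
\normalsize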
\subsection*{Working with Strings}

\index{strings} Parrot strings are buf\mbox{}fers of variable-sized data. The most common use of strings is to store text data. Strings can also hold binary or other non-textual data, though this is rare.\footnote{In general, a custom PMC is more useful.} Parrot strings are f\mbox{}lexible and powerful enough to handle the complexity of human-readable (and computer-representable) text data. String operations work with string literals, variables, and constants, and with string-like PMCs.

\subsubsection*{Escape Sequences}

\index{string escapes} \index{escape sequences}

Strings in double-quotes allow escape sequences using backslashes. Strings in single-quotes only allow escapes for nested quotes:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = "This string is \n on two lines"
  $S0 = 'This is a \n one-line string with a backslash in it'\end{verbatim}
\vspace{-6pt}
\normalsize
Table 4.1 shows the escape sequences Parrot supports in double-quoted strings.

\begin{table}[!h]
\caption{String Escapes}
\begin{center}
\begin{tabular}{|l|l|}
\hline
\rowcolor[gray]{.9}
\textbf{\textsf{Escape}} & \textbf{\textsf{Meaning}}\\ \hline
\texttt{$\backslash$a} & An ASCII alarm character\\ \hline
\texttt{$\backslash$b} & An ASCII backspace character\\ \hline
\texttt{$\backslash$t} & A tab\\ \hline
\texttt{$\backslash$n} & A newline\\ \hline
\texttt{$\backslash$v} & A vertical tab\\ \hline
\texttt{$\backslash$f} & A form feed\\ \hline
\texttt{$\backslash$r} & A carriage return\\ \hline
\texttt{$\backslash$e} & An escape\\ \hline
\texttt{$\backslash$$\backslash$} & A backslash\\ \hline
\texttt{$\backslash$''} & A quote\\ \hline
\texttt{$\backslash$x}\emph{NN} & A character represented by 1-2 hexadecimal digits\\ \hline
\texttt{$\backslash$x\{}\emph{NNNNNNNN}\texttt{\}} & A character represented by 1-8 hexadecimal digits\\ \hline
\texttt{$\backslash$o}\emph{NNN} & A character represented by 1-3 octal digits\\ \hline
\texttt{$\backslash$u}\emph{NNNN} & A character represented by 4 hexadecimal digits\\ \hline
\texttt{$\backslash$U}\emph{NNNNNNNN} & A character represented by 8 hexadecimal digits\\ \hline
\texttt{$\backslash$c}\emph{X} & A control character \emph{X}\\ \hline
\end{tabular}
\end{center}
\end{table}
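A short sketch of the difference between the two quoting styles, using escapes from Table 4.1 (the strings themselves are arbitrary):

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $S0 = "col1\tcol2"      # \t becomes a real tab character
      say $S0
      $S1 = "\x41\x42\x43"    # hexadecimal escapes
      say $S1                 # prints ABC
      $S2 = 'no\tescape'      # single quotes keep the backslash
      $I0 = length $S2
      say $I0                 # prints 10: backslash and t are two characters
  .end\end{verbatim}
\vspace{-6pt}
\normalsize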
\subsubsection*{Heredocs}

\index{heredocs} If you need more f\mbox{}lexibility in def\mbox{}ining a string, use a heredoc string literal. The \texttt{<<} operator starts a heredoc. The string terminator immediately follows. All text until the terminator is part of the string. The terminator must appear on its own line, must appear at the beginning of the line, and may not have any trailing whitespace.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S2 = <<"End_Token"
This is a multi-line string literal. Notice that
it doesn't use quotation marks.
End_Token\end{verbatim}
\vspace{-6pt}
\normalsize
\subsubsection*{Concatenating strings}

\index{. operator} \index{strings;concatenation}

Use the \texttt{.} operator to concatenate strings. The following example concatenates the string ``cd'' onto the string ``ab'' and stores the result in \texttt{\$S1}.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = "ab"
  $S1 = $S0 . "cd"  # concatenates $S0 with "cd"
  say $S1           # prints "abcd"\end{verbatim}
\vspace{-6pt}
\normalsize
\index{.= operator} Concatenation has a \texttt{.=} variant to modify the result in place. In the next example, the \texttt{.=} operation appends ``xy'' onto the string ``abcd'' in \texttt{\$S1}.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S1 .= "xy"       # appends "xy" to $S1
  say $S1           # prints "abcdxy"\end{verbatim}
\vspace{-6pt}
\normalsize
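Because \texttt{.=} appends to the existing value, it is a natural fit for building a string piece by piece. This sketch, with arbitrary values, relies on the integer-to-string conversion mentioned under Type Conversions later in this chapter:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $S0 = ""
      $I0 = 0
    LOOP:
      if $I0 >= 3 goto DONE
      $S1 = $I0               # convert the integer to a string
      $S0 .= $S1              # append the digit
      $S0 .= ","              # append a separator
      inc $I0
      goto LOOP
    DONE:
      say $S0                 # prints "0,1,2,"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize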
\subsubsection*{Repeating strings}

\index{repeat opcode} The \texttt{repeat} opcode repeats a string a specif\mbox{}ied number of times:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = "a"
  $S1 = repeat $S0, 5
  say $S1              # prints "aaaaa"\end{verbatim}
\vspace{-6pt}
\normalsize
In this example, \texttt{repeat} generates a new string with ``a'' repeated f\mbox{}ive times and stores it in \texttt{\$S1}.
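
Combined with concatenation, \texttt{repeat} can draw simple text rules. A sketch with arbitrary characters and width:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $S0 = repeat "-", 10    # ten hyphens
      $S1 = "+" . $S0         # add a corner on the left
      $S1 .= "+"              # and one on the right
      say $S1                 # prints "+----------+"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize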

\subsubsection*{Length of a string}

\index{length opcode} The \texttt{length} opcode returns the length of a string in characters. For strings in a multibyte encoding, this may dif\mbox{}fer from the length in \emph{bytes}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = "abcd"
  $I0 = length $S0                # the length is 4
  say $I0\end{verbatim}
\vspace{-6pt}
\normalsize
\texttt{length} has no equivalent for PMC strings.

\subsubsection*{Substrings}

The simplest version of the \texttt{substr}\index{substr opcode} opcode takes three arguments: a source string, an of\mbox{}fset position, and a length. It returns a substring of the original string, starting from the of\mbox{}fset position (0 is the f\mbox{}irst character) and spanning the length:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = substr "abcde", 1, 2        # $S0 is "bc"\end{verbatim}
\vspace{-6pt}
\normalsize
This example extracts a two-character string from ``abcde'' at a one-character of\mbox{}fset from the beginning of the string (starting with the second character). It generates a new string, ``bc'', in the destination register \texttt{\$S0}.

When the of\mbox{}fset position is negative, it counts backward from the end of the string. Thus an of\mbox{}fset of -1 starts at the last character of the string.
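
The sketch below combines \texttt{substr} with \texttt{length}. A negative offset grabs the end of the string, and a loop walks the string one character at a time. The values are arbitrary:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $S0 = "abcd"
      $S1 = substr $S0, -2, 2     # negative offset: the last two characters
      say $S1                     # prints "cd"
      $I0 = length $S0            # 4
      $I1 = 0
    LOOP:
      if $I1 >= $I0 goto DONE
      $S2 = substr $S0, $I1, 1    # one character at offset $I1
      say $S2                     # prints a, b, c, d on separate lines
      inc $I1
      goto LOOP
    DONE:
      say "done"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize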

\texttt{substr} no longer has a four-argument form, because in-place string operations have been removed. Instead, the \texttt{replace}\index{replace opcode} opcode builds a new string and leaves the original unchanged. It takes a source string, an of\mbox{}fset, a count, and a replacement string, and returns a copy of the source in which the \emph{count} characters starting at the of\mbox{}fset are replaced by the replacement string.

This example replaces the substring ``bc'' in \texttt{\$S1} with the string ``XYZ'' and stores the result, ``aXYZde'', in \texttt{\$S0}. \texttt{\$S1} is not changed:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S1 = "abcde"
  $S0 = replace $S1, 1, 2, "XYZ"
  say $S0                        # prints "aXYZde"
  say $S1                        # prints "abcde"\end{verbatim}
\vspace{-6pt}
\normalsize
When the of\mbox{}fset in a \texttt{replace} equals the length of the original string (one position past its last character), \texttt{replace} appends the replacement string, just like the concatenation operator. If the replacement string is empty, the new string is the original with the selected characters removed. Storing the result back in the source register, as the following example does, gives the ef\mbox{}fect of the old in-place form:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S1 = "abcde"
  $S1 = replace $S1, 1, 2, "XYZ"
  say $S1                        # prints "aXYZde"\end{verbatim}
\vspace{-6pt}
\normalsize
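The append and delete cases described above can be sketched as follows. The input string is arbitrary, and \texttt{\$S1} itself is never changed:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $S1 = "abcde"
      $S0 = replace $S1, 5, 0, "fg"   # offset equals the length: appends
      say $S0                         # prints "abcdefg"
      $S0 = replace $S1, 1, 2, ""     # empty replacement: removes "bc"
      say $S0                         # prints "ade"
      say $S1                         # prints "abcde"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize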
\subsubsection*{Converting characters}

The \texttt{chr}\index{chr opcode} opcode takes an integer value and returns the corresponding character in the ASCII character set as a one-character string. The \texttt{ord}\index{ord opcode} opcode takes a single character string and returns the integer value of the character at the f\mbox{}irst position in the string. The integer value of the character will dif\mbox{}fer depending on the current encoding of the string:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = chr 65              # $S0 is "A"
  $I0 = ord $S0             # $I0 is 65, if $S0 is ASCII/UTF-8\end{verbatim}
\vspace{-6pt}
\normalsize
\texttt{ord} has a two-argument variant that takes a character of\mbox{}fset to select a single character from a multicharacter string. The of\mbox{}fset must be within the length of the string:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = ord "ABC", 2        # $I0 is 67\end{verbatim}
\vspace{-6pt}
\normalsize
A negative of\mbox{}fset counts backward from the end of the string, so -1 is the last character.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = ord "ABC", -1       # $I0 is 67\end{verbatim}
\vspace{-6pt}
\normalsize
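For ASCII text, \texttt{ord} and \texttt{chr} are inverses, so they can be combined to transform a string character by character. This sketch shifts each letter one place along the ASCII table. It assumes plain ASCII input and does no wrap-around at ``Z'':

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $S0 = "HAL"
      $S1 = ""
      $I2 = length $S0
      $I0 = 0
    LOOP:
      if $I0 >= $I2 goto DONE
      $I1 = ord $S0, $I0      # character code at offset $I0
      inc $I1                 # the next character code
      $S2 = chr $I1
      $S1 .= $S2
      inc $I0
      goto LOOP
    DONE:
      say $S1                 # prints "IBM"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize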
\subsubsection*{Formatting strings}

\index{strings;formatting}

The \texttt{sprintf}\index{sprintf opcode} opcode generates a formatted string from a series of values. It takes two arguments: a string specifying the format, and an array PMC containing the values to be formatted. The format string and the result can be either strings or PMCs:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = sprintf $S1, $P2
  $P0 = sprintf $P1, $P2\end{verbatim}
\vspace{-6pt}
\normalsize
The format string is similar to C's \texttt{sprintf} function with extensions for Parrot data types. Each format f\mbox{}ield in the string starts with a \texttt{\%} and ends with a character specifying the output format. Table 4.2 lists the available output format characters.

\begin{table}[!h]
\caption{Format characters}
\begin{center}
\begin{tabular}{|l|l|}
\hline
\rowcolor[gray]{.9}
\textbf{\textsf{Format}} & \textbf{\textsf{Meaning}}\\ \hline
\texttt{\%c} & A single character.\\ \hline
\texttt{\%d} & A decimal integer.\\ \hline
\texttt{\%i} & A decimal integer.\\ \hline
\texttt{\%u} & An unsigned integer.\\ \hline
\texttt{\%o} & An octal integer.\\ \hline
\texttt{\%x} & A hex integer, preceded by 0x (when \# is specif\mbox{}ied).\\ \hline
\texttt{\%X} & A hex integer with a capital X (when \# is specif\mbox{}ied).\\ \hline
\texttt{\%b} & A binary integer, preceded by 0b (when \# is specif\mbox{}ied).\\ \hline
\texttt{\%B} & A binary integer with a capital B (when \# is specif\mbox{}ied).\\ \hline
\texttt{\%p} & A pointer address in hex.\\ \hline
\texttt{\%f} & A f\mbox{}loating-point number.\\ \hline
\texttt{\%e} & A f\mbox{}loating-point number in scientif\mbox{}ic notation (displayed with a lowercase ``e'').\\ \hline
\texttt{\%E} & The same as \texttt{\%e}, but displayed with an uppercase E.\\ \hline
\texttt{\%g} & The same as \texttt{\%e} or \texttt{\%f}, whichever f\mbox{}its best.\\ \hline
\texttt{\%G} & The same as \texttt{\%g}, but displayed with an uppercase E.\\ \hline
\texttt{\%s} & A string.\\ \hline
\end{tabular}
\end{center}
\end{table}
Each format f\mbox{}ield supports several specif\mbox{}ier options: \emph{f\mbox{}lags}, \emph{width}, \emph{precision}, and \emph{size}. Table 4.3 lists the format f\mbox{}lags.

\begin{table}[!h]
\caption{Format f\mbox{}lags}
\begin{center}
\begin{tabular}{|l|l|}
\hline
\rowcolor[gray]{.9}
\textbf{\textsf{Flag}} & \textbf{\textsf{Meaning}}\\ \hline
\texttt{0} & Pad with zeros.\\ \hline
\emph{(space)} & Pad with spaces.\\ \hline
\texttt{+} & Pref\mbox{}ix numbers with a sign.\\ \hline
\texttt{-} & Align left.\\ \hline
\texttt{\#} & Pref\mbox{}ix a leading 0 for octal, 0x for hex, or force a decimal point.\\ \hline
\end{tabular}
\end{center}
\end{table}
The \emph{width} is a number def\mbox{}ining the minimum width of the output from a f\mbox{}ield. The \emph{precision} is the maximum width for strings or integers, and the number of decimal places for f\mbox{}loating-point f\mbox{}ields. If either \emph{width} or \emph{precision} is an asterisk (\texttt{*}), it takes its value from the next argument in the PMC.

The \emph{size} modif\mbox{}ier def\mbox{}ines the type of the argument the f\mbox{}ield takes. Table 4.4 lists the size f\mbox{}lags. The values in the aggregate PMC must have a type compatible with the specif\mbox{}ied \emph{size}.

\begin{table}[!h]
\caption{Size f\mbox{}lags}
\begin{center}
\begin{tabular}{|l|l|}
\hline
\rowcolor[gray]{.9}
\textbf{\textsf{Character}} & \textbf{\textsf{Meaning}}\\ \hline
\texttt{h} & short integer or single-precision f\mbox{}loat\\ \hline
\texttt{l} & long\\ \hline
\texttt{H} & huge value (long long or long double)\\ \hline
\texttt{v} & Parrot INTVAL or FLOATVAL\\ \hline
\texttt{O} & opcode\_t pointer\\ \hline
\texttt{P} & \texttt{PMC}\\ \hline
\texttt{S} & String\\ \hline
\end{tabular}
\end{center}
\end{table}
\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P2 = new 'ResizablePMCArray'
  push $P2, 42
  push $P2, 10.0
  $S0 = sprintf "int %#Px num %+2.3Pf", $P2
  say $S0       # prints "int 0x2a num +10.000"\end{verbatim}
\vspace{-6pt}
\normalsize
The format string of this \texttt{sprintf} example has two format f\mbox{}ields. The f\mbox{}irst, \texttt{\%\#Px}, extracts a PMC argument (\texttt{P}) from the aggregate \texttt{\$P2} and formats it as a hexadecimal integer (\texttt{x}) with a leading 0x (\texttt{\#}). The second format f\mbox{}ield, \texttt{\%+2.3Pf}, takes a PMC argument (\texttt{P}) and formats it as a f\mbox{}loating-point number (\texttt{f}) with a minimum f\mbox{}ield width of two characters, three decimal places (\texttt{2.3}), and a leading sign (\texttt{+}).

The test f\mbox{}iles \emph{t/op/string.t} and \emph{t/op/sprintf.t} have many more examples of format strings.
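
As a further sketch with arbitrary values, format fields without a size flag can also pull their values from an array PMC. This assumes that such fields convert each element to the type their format character expects. No output is claimed here:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = new 'ResizablePMCArray'
      push $P0, "Parrot"
      push $P0, 4
      push $P0, 3.14159
      $S0 = sprintf "%s has %d register sets; pi is about %.2f", $P0
      say $S0
  .end\end{verbatim}
\vspace{-6pt}
\normalsize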

\subsubsection*{Joining strings}

The \texttt{join}\index{join opcode} opcode joins the elements of an array PMC into a single string. The f\mbox{}irst argument separates the individual elements of the PMC in the f\mbox{}inal string result.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new "ResizablePMCArray"
  push $P0, "hi"
  push $P0, 0
  push $P0, 1
  push $P0, 0
  push $P0, "parrot"
  $S0 = join "__", $P0
  say $S0                # prints "hi__0__1__0__parrot"\end{verbatim}
\vspace{-6pt}
\normalsize
This example builds a \texttt{ResizablePMCArray} in \texttt{\$P0} with the values \texttt{``hi''}, \texttt{0}, \texttt{1}, \texttt{0}, and \texttt{``parrot''}. It then joins those values (separated by the string \texttt{``\_\_''}) into a single string stored in \texttt{\$S0}.

\subsubsection*{Splitting strings}

The \texttt{split}\index{split opcode} opcode splits a string at each occurrence of a delimiter string and returns a new array containing the resulting substrings.

This example splits the string ``abc'' into individual characters and stores them in an array in \texttt{\$P0}. It then prints out the f\mbox{}irst and third elements of the array.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = split "", "abc"
  $P1 = $P0[0]
  say $P1                # 'a'
  $P1 = $P0[2]
  say $P1                # 'c'\end{verbatim}
\vspace{-6pt}
\normalsize
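Because \texttt{split} and \texttt{join} undo each other, they combine naturally to change a delimiter. This sketch uses arbitrary strings and the \texttt{elements} opcode (not otherwise described in this section) to count the pieces:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = split ",", "a,b,c"    # three pieces: "a", "b", "c"
      $I0 = elements $P0
      say $I0                     # prints 3
      $S0 = join " | ", $P0       # rejoin with a different separator
      say $S0                     # prints "a | b | c"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize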
\subsubsection*{Testing for substrings}

The \texttt{index}\index{index opcode} opcode searches for a substring within a string. If it f\mbox{}inds the substring, it returns the position where the substring was found as a character of\mbox{}fset from the beginning of the string. If it fails to f\mbox{}ind the substring, it returns -1:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = index "Beeblebrox", "eb"
  say $I0                           # prints 2
  $I0 = index "Beeblebrox", "Ford"
  say $I0                           # prints -1\end{verbatim}
\vspace{-6pt}
\normalsize
\texttt{index} also has a three-argument version, where the f\mbox{}inal argument def\mbox{}ines an of\mbox{}fset position for starting the search.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = index "Beeblebrox", "eb", 3
  say $I0                           # prints 5\end{verbatim}
\vspace{-6pt}
\normalsize
This example f\mbox{}inds the second ``eb'' in ``Beeblebrox'' instead of the f\mbox{}irst, because the search skips the f\mbox{}irst three characters in the string.
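
The three-argument form makes it easy to find every occurrence of a substring: after each match, start the next search one character past it. A sketch:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $S0 = "Beeblebrox"
      $I0 = 0                         # where the next search starts
    LOOP:
      $I1 = index $S0, "eb", $I0
      if $I1 < 0 goto DONE            # -1 means no more matches
      say $I1                         # prints 2, then 5
      $I0 = $I1 + 1                   # resume just past this match
      goto LOOP
    DONE:
      say "done"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize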

\subsubsection*{Bitwise Operations}

The numeric bitwise opcodes also have string variants for AND, OR, and XOR: \texttt{bands}\index{bands opcode}, \texttt{bors}\index{bors opcode}, and \texttt{bxors}\index{bxors opcode}. These take string or string-like PMC arguments and perform the logical operation on each byte of the strings to produce the result string. Remember that in-place string operations are no longer available.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'bit_ops'

  # ...

  $P0 = bors $P1
  $P0 = bands $P1
  $S0 = bors $S1, $S2
  $P0 = bxors $P1, $S2\end{verbatim}
\vspace{-6pt}
\normalsize
The bitwise string opcodes produce meaningful results only when used with simple ASCII strings, because Parrot performs bitwise operations per byte.
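
For example, the ASCII codes of an uppercase letter and its lowercase form differ only in the bit with value 32, which is also the code of the space character. The sketch below assumes \texttt{bors} and \texttt{bxors} accept two string registers, as in the example above:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'bit_ops'

  .sub 'main' :main
      $S1 = "ABC"
      $S2 = "   "               # three spaces, each with code 32
      $S0 = bors $S1, $S2       # set that bit in every byte
      say $S0                   # prints "abc"
      $S3 = bxors $S0, $S2      # flip the same bit back
      say $S3                   # prints "ABC"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize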

\subsubsection*{Copy-On-Write}

Strings use copy-on-write (COW)\index{copy-on-write}\index{COW (copy-on-write)} optimizations. An assignment such as \texttt{\$S1 = \$S0} doesn't immediately make a copy of \texttt{\$S0}; it only makes both variables point to the same string. Parrot doesn't make a copy of the string until one of the two strings is modif\mbox{}ied.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = "Ford"
  $S1 = $S0
  $S1 = "Zaphod"
  say $S0                # prints "Ford"
  say $S1                # prints "Zaphod"\end{verbatim}
\vspace{-6pt}
\normalsize
Modifying one of the two variables causes Parrot to create a new string. This example preserves the existing value in \texttt{\$S0} and assigns the new value to the new string in \texttt{\$S1}. The benef\mbox{}it of copy-on-write is avoiding the cost of copying strings until the copies are necessary.

\subsubsection*{Encodings and Charsets}

\index{charset} \index{ASCII character set} \index{encoding} Years ago, strings only needed to support the ASCII character set (or charset), a mapping of 128 bit patterns to symbols and English-language characters. This worked as long as everyone using a computer read and wrote English and only used a small handful of punctuation symbols. In other words, it was woefully insuf\mbox{}ficient. A modern string system must manage charsets in order to make sense out of all the string data in the world. A modern string system must also handle dif\mbox{}ferent encodings---ways to represent various charsets in memory and on disk.

Every string in Parrot has an associated encoding and character set. The default format is 8-bit ASCII, which is almost universally supported. Double-quoted string constants can have an optional pref\mbox{}ix of the form \texttt{format:} specifying the string's format.\footnote{As you might suspect, single-quoted strings do not support this.} Parrot tracks information about encoding and charset internally, and automatically converts strings when necessary to preserve these characteristics.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = utf8:"Hello UTF-8 Unicode World!"
  $S1 = utf16:"Hello UTF-16 Unicode World!"
  $S2 = ascii:"This is 8-bit ASCII"
  $S3 = binary:"This is raw, unformatted binary data"\end{verbatim}
\vspace{-6pt}
\normalsize
\index{ISO 8859-1 character set} \index{Latin 1 character set} \index{UCS-2 encoding} \index{UTF-8 encoding} \index{UTF-16 encoding} Parrot supports the formats \texttt{ascii}, \texttt{binary}, \texttt{iso-8859-1} (Latin 1), \texttt{utf8}, \texttt{utf16}, \texttt{ucs2}, and \texttt{ucs4}.

The \texttt{binary} format treats the string as a buf\mbox{}fer of raw unformatted binary data. It isn't really a string per se, because binary data contains no readable characters. This exists to support libraries which manipulate binary data that doesn't easily f\mbox{}it into any other primitive data type.

When Parrot operates on two strings (as in concatenation or comparison), they must both use the same character set and encoding. Parrot will automatically upgrade one or both of the strings to the next highest compatible format as necessary. ASCII strings will automatically upgrade to UTF-8 strings if needed, and UTF-8 will upgrade to UTF-16. All of these conversions happen inside Parrot, so the programmer doesn't need to worry about the details.
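
The conversion can be observed with the \texttt{encoding} and \texttt{encodingname} opcodes, which this section does not otherwise cover. In this sketch, with arbitrary text, the encoding Parrot picks for the result is printed rather than asserted:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $S0 = ascii:"Hello, "
      $S1 = utf8:"World"
      $S2 = $S0 . $S1         # operands are brought to a common format
      say $S2                 # prints "Hello, World"
      $I0 = length $S2
      say $I0                 # prints 12
      $I1 = encoding $S2      # numeric id of the result's encoding
      $S3 = encodingname $I1
      say $S3                 # the name of the encoding Parrot chose
  .end\end{verbatim}
\vspace{-6pt}
\normalsize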

\subsection*{Working with PMCs}

\index{Polymorphic Containers (PMCs)} \index{PMCs (Polymorphic Containers)} Polymorphic Containers (PMCs) are the basis for complex data types and object-oriented behavior in Parrot. In PIR, any variable that isn't a low-level integer, number, or string is a PMC. PMC variables act much like the low-level variables, but you have to instantiate a new PMC object before you use it. The \texttt{new} opcode creates a new PMC object of the specif\mbox{}ied type.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new 'String'
  $P0 = "That's a bollard and not a parrot"
  say $P0\end{verbatim}
\vspace{-6pt}
\normalsize
This example creates a \texttt{String} object, stores it in the PMC register variable \texttt{\$P0}, assigns it the value ``That's a bollard and not a parrot'', and prints it.

Every PMC has a type that indicates what data it can store and what behavior it supports. The \texttt{typeof}\index{typeof opcode} opcode reports the type of a PMC. When the result is a string variable, \texttt{typeof} returns the name of the type:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new "String"
  $S0 = typeof $P0               # $S0 is "String"
  say $S0                        # prints "String"\end{verbatim}
\vspace{-6pt}
\normalsize
When the result is a PMC variable, \texttt{typeof} returns the \texttt{Class} PMC for that object type.
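
The same opcode works for any PMC type. A sketch with two more types that appear in this chapter:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = new 'Integer'
      $S0 = typeof $P0
      say $S0                 # prints "Integer"
      $P1 = new 'ResizablePMCArray'
      $S1 = typeof $P1
      say $S1                 # prints "ResizablePMCArray"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize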

\subsubsection*{Scalars}

\index{scalar PMCs} \index{PMCs (Polymorphic Containers);scalar} In most of the examples shown so far, PMCs duplicate the behavior of integers, numbers, and strings. Parrot provides a set of PMCs for this exact purpose. \texttt{Integer}, \texttt{Float}, and \texttt{String} are thin overlays on Parrot's low-level integers, numbers, and strings.

A previous example showed a string literal assigned to a PMC variable of type \texttt{String}. Direct assignment of a literal to a PMC works for all the low-level types and their PMC equivalents:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new 'Integer'
  $P0 = 5

  $P1 = new 'String'
  $P1 = "5 birds"

  $P2 = new 'Float'
  $P2 = 3.14\end{verbatim}
\vspace{-6pt}
\normalsize
\index{boxing}

You may also assign non-constant low-level integer, number, or string registers directly to a PMC. The PMC handles the conversion from the low-level type to its own internal storage.\footnote{This conversion of a simpler type to a more complex type is ``boxing''.}

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = 5
  $P0 = new 'Integer'
  $P0 = $I0

  $S1 = "5 birds"
  $P1 = new 'String'
  $P1 = $S1

  $N2 = 3.14
  $P2 = new 'Float'
  $P2 = $N2\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{box} opcode is a handy shortcut to create the appropriate PMC object from an integer, number, or string literal or variable.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = box 3    # $P0 is an "Integer"

  $P1 = box $S1  # $P1 is a "String"

  $P2 = box 3.14 # $P2 is a "Float"\end{verbatim}
\vspace{-6pt}
\normalsize
\index{unboxing} In the reverse situation, when assigning a PMC to an integer, number, or string variable, the PMC also has the ability to convert its value to the low-level type.\footnote{The reverse of ``boxing'' is ``unboxing''.}

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = box 5
  $S0 = $P0           # the string "5"
  $N0 = $P0           # the number 5.0
  $I0 = $P0           # the integer 5

  $P1 = box "5 birds"
  $S1 = $P1           # the string "5 birds"
  $I1 = $P1           # the integer 5
  $N1 = $P1           # the number 5.0

  $P2 = box 3.14
  $S2 = $P2           # the string "3.14"
  $I2 = $P2           # the integer 3
  $N2 = $P2           # the number 3.14\end{verbatim}
\vspace{-6pt}
\normalsize
This example creates \texttt{Integer}\index{Integer PMC}, \texttt{Float}\index{Float PMC}, and \texttt{String}\index{String PMC} PMCs, and shows the ef\mbox{}fect of assigning each one back to a low-level type.

Converting a string to an integer or number only makes sense when the contents of the string are a number. The \texttt{String} PMC will attempt to extract a number from the beginning of the string, and returns zero (a false value) if it f\mbox{}inds none.
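
A sketch of both cases, with arbitrary strings:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = box "42 is the answer"
      $I0 = $P0               # the leading number is used
      say $I0                 # prints 42
      $P1 = box "parrot"
      $I1 = $P1               # no leading number
      say $I1                 # prints 0
  .end\end{verbatim}
\vspace{-6pt}
\normalsize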

\begin{figure}[!h]
\begin{center}
\framebox{
\begin{minipage}{3.5in}
\vspace{3pt}

\begin{center}
\large{\bfseries{Type Conversions}}
\end{center}

\index{type conversions} Parrot also handles conversions between the low-level types where possible, converting integers to strings (\texttt{\$S0 = \$I1}), numbers to strings (\texttt{\$S0 = \$N1}), numbers to integers (\texttt{\$I0 = \$N1}), integers to numbers (\texttt{\$N0 = \$I1}), and even strings to integers or numbers (\texttt{\$I0 = \$S1} and \texttt{\$N0 = \$S1}).

\vspace{3pt}
\end{minipage}
}
\end{center}
\end{figure}
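You can try the low-level conversions listed under Type Conversions directly on registers. In this sketch, the values are made up and a \texttt{main} wrapper is added so it runs standalone. Each comment gives the value the register should end up holding. This assumes the usual behaviour in which converting a number to an integer drops the fractional part:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $I1 = 42
      $N1 = 2.75
      $S1 = "17 apples"

      $S0 = $I1           # integer to string: "42"
      $N0 = $I1           # integer to number: 42.0
      $I0 = $N1           # number to integer: 2 (fraction dropped)
      $I2 = $S1           # string to integer: 17
      $N2 = $S1           # string to number: 17.0

      say $S0
      say $I0
      say $I2
  .end\end{verbatim}
\vspace{-6pt}
\normalsize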
\subsubsection*{Aggregates}

\index{aggregate PMCs} \index{PMCs (Polymorphic Containers);aggregate} PMCs can def\mbox{}ine complex types that hold multiple values, commonly called aggregates. Two basic aggregate types are ordered arrays and associative arrays. The primary dif\mbox{}ference between these is that ordered arrays use integer keys for indexes and associative arrays use string keys.

Aggregate PMCs support the use of numeric or string keys. PIR also of\mbox{}fers an extensive set of operations for manipulating aggregate data types.

\subsubsection*{Ordered Arrays}

\index{arrays} \index{ordered arrays} Parrot provides several ordered array PMCs, dif\mbox{}ferentiated by whether the array should store booleans, integers, numbers, strings, or other PMCs, and whether the array should maintain a f\mbox{}ixed size or dynamically resize for the number of elements it stores.

The core array types are \texttt{FixedPMCArray}, \texttt{ResizablePMCArray}, \texttt{FixedIntegerArray}, \texttt{ResizableIntegerArray}, \texttt{FixedFloatArray}, \texttt{ResizableFloatArray}, \texttt{FixedStringArray}, \texttt{ResizableStringArray}, \texttt{FixedBooleanArray}, and \texttt{ResizableBooleanArray}. The array types that start with ``Fixed'' have a f\mbox{}ixed size and do not allow elements to be added outside their allocated size. The ``Resizable'' variants automatically extend themselves as more elements are added.\footnote{With some additional overhead for checking array bounds and reallocating array memory.} The array types that include ``String'', ``Integer'', or ``Boolean'' in the name use alternate packing methods for greater memory ef\mbox{}ficiency.

Parrot's core ordered array PMCs all have zero-based integer keys. Extracting or inserting an element into the array uses PIR's standard key syntax, with the key in square brackets after the variable name. An lvalue key sets the value for that key. An rvalue key extracts the value for that key in the aggregate to use as the argument value:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0    = new "ResizablePMCArray" # create a new array object
  $P0[0] = 10                      # set first element to 10
  $P0[1] = $I31                    # set second element to $I31
  $I0    = $P0[0]                  # get the first element\end{verbatim}
\vspace{-6pt}
\normalsize
Setting the array to an integer value directly (without a key) sets the number of elements of the array. Assigning an array directly to an integer retrieves the number of elements of the array.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = 2    # set array size
  $I1 = $P0  # get array size\end{verbatim}
\vspace{-6pt}
\normalsize
This is equivalent to using the \texttt{elements} opcode to retrieve the number of items currently in an array:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  elements $I0, $P0 # get element count\end{verbatim}
\vspace{-6pt}
\normalsize
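The dif\mbox{}ference between the ``Fixed'' and ``Resizable'' array types shows up as soon as you store an element. In this sketch, the values are illustrative and a \texttt{main} wrapper is added so it runs on its own. The \texttt{FixedIntegerArray} needs its size set before use. The \texttt{ResizableIntegerArray} grows to f\mbox{}it the highest index stored:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = new 'FixedIntegerArray'
      $P0 = 3                 # set the fixed size first
      $P0[0] = 10
      $P0[2] = 30
      $I0 = $P0[2]            # 30
      $I1 = elements $P0      # 3
      # Storing to $P0[3] is outside the allocated size and
      # is expected to throw an exception.

      $P1 = new 'ResizableIntegerArray'
      $P1[4] = 50             # grows to hold indexes 0..4
      $I2 = elements $P1      # 5

      say $I0
      say $I1
      say $I2
  .end\end{verbatim}
\vspace{-6pt}
\normalsize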
Some other useful instructions for working with ordered arrays are \texttt{push}, \texttt{pop}, \texttt{shift}, and \texttt{unshift}, to add or remove elements. \texttt{push} and \texttt{pop} work on the end of the array, the highest numbered index. \texttt{shift} and \texttt{unshift} work on the start of the array, adding or removing the zeroth element, and renumbering all the following elements.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  push $P0, 'banana' # add to end
  $S0 = pop $P0      # fetch from end

  unshift $P0, 74    # add to start
  $I0 = shift $P0    # fetch from start\end{verbatim}
\vspace{-6pt}
\normalsize
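Using all four opcodes on one array makes their ef\mbox{}fect on both ends easier to follow. Here is a short sketch with hypothetical fruit names, in a \texttt{main} wrapper added so it runs standalone. The comments track the array contents after each step:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = new 'ResizablePMCArray'
      push $P0, 'banana'      # banana
      push $P0, 'cherry'      # banana, cherry
      unshift $P0, 'apple'    # apple, banana, cherry

      $S0 = shift $P0         # "apple";  left: banana, cherry
      $S1 = pop $P0           # "cherry"; left: banana
      $I0 = elements $P0      # 1

      say $S0
      say $S1
      say $I0
  .end\end{verbatim}
\vspace{-6pt}
\normalsize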
\subsubsection*{Associative Arrays}

\index{associative arrays} \index{hashes} \index{dictionaries} An associative array is an unordered aggregate that uses string keys to identify elements. You may know them as ``hash tables'', ``hashes'', ``maps'', or ``dictionaries''. Parrot provides one core associative array PMC, called \texttt{Hash}. String keys work very much like integer keys: an lvalue key sets the value of an element, and an rvalue key extracts the value of an element. A literal string key must be in single or double quotes. A string register holding the key works as well.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  new $P1, "Hash"          # create a new associative array
  $P1["key"] = 10          # set key and value
  $I0        = $P1["key"]  # get value for key\end{verbatim}
\vspace{-6pt}
\normalsize
Assigning a \texttt{Hash}\index{Hash PMC} PMC (without a key) to an integer result fetches the number of elements in the hash.\footnote{You may not set a \texttt{Hash} PMC directly to an integer value.}

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I1 = $P1         # number of entries\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{exists}\index{exists opcode} opcode tests whether a keyed value exists in an aggregate. It returns 1 if it f\mbox{}inds the key in the aggregate and 0 otherwise. It doesn't care if the value itself is true or false, only that an entry exists for that key:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  new $P0, "Hash"
  $P0["key"] = 0
  exists $I0, $P0["key"] # does a value exist at "key"?
  say $I0                # prints 1\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{delete}\index{delete opcode} opcode removes an element from an associative array:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  delete $P0["key"]\end{verbatim}
\vspace{-6pt}
\normalsize
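The following sketch combines the \texttt{Hash} operations above. It shows keyed stores (including a key held in a string register), the element count, \texttt{exists}, and \texttt{delete}. The keys and values are made up for illustration, and the \texttt{main} wrapper is only there so the fragment runs on its own:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = new 'Hash'
      $P0['apple'] = 3
      $S0 = 'pear'
      $P0[$S0] = 0              # a string register works as a key too

      $I0 = $P0                 # 2 entries
      $I1 = exists $P0['pear']  # 1: the entry exists, although its value is 0
      delete $P0['pear']
      $I2 = exists $P0['pear']  # 0: the entry is gone
      $I3 = $P0                 # 1 entry left

      say $I0
      say $I1
      say $I2
      say $I3
  .end\end{verbatim}
\vspace{-6pt}
\normalsize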
\subsubsection*{Iterators}

\index{iterators} \index{PMCs (Polymorphic Containers); iterators} An iterator extracts values from an aggregate PMC one at a time. Iterators are most useful in loops which perform an action on every element in an aggregate. The \texttt{iter} opcode creates a new iterator from an aggregate PMC. It takes one argument, the PMC over which to iterate:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P1 = iter $P2\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{shift}\index{shift opcode} opcode extracts the next value from the iterator.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
      $P5 = shift $P1\end{verbatim}
\vspace{-6pt}
\normalsize
Evaluating the iterator PMC as a boolean tells you whether any elements remain. It is true while there are more elements to extract and false once the iterator has reached the end of the aggregate:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
      if $P1 goto iter_repeat\end{verbatim}
\vspace{-6pt}
\normalsize
Parrot provides predef\mbox{}ined constants for working with iterators. \texttt{.ITERATE\_FROM\_START} and \texttt{.ITERATE\_FROM\_END} constants select whether an ordered array iterator starts from the beginning or end of the array. These two constants have no ef\mbox{}fect on associative array iterators, as their elements are unordered.

Load the iterator constants with the \texttt{.include}\index{.include directive} directive to include the f\mbox{}ile \emph{iterator.pasm}. To use them, set the iterator PMC to the value of the constant:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
      .include "iterator.pasm"

      # ...

      $P1 = .ITERATE_FROM_START\end{verbatim}
\vspace{-6pt}
\normalsize
With all of those separate pieces in one place, this example loads the iterator constants, creates an ordered array of ``a'', ``b'', ``c'', and creates an iterator from that array. It then loops over the iterator, using a conditional \texttt{goto} to check the boolean value of the iterator and an unconditional \texttt{goto} to repeat the loop:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
      .include "iterator.pasm"
      $P2 = new "ResizablePMCArray"
      push $P2, "a"
      push $P2, "b"
      push $P2, "c"

      $P1 = iter $P2
      $P1 = .ITERATE_FROM_START

  iter_loop:
      unless $P1 goto iter_end
      $P5 = shift $P1
      say $P5                        # prints "a", "b", "c"
      goto iter_loop
  iter_end:\end{verbatim}
\vspace{-6pt}
\normalsize
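\texttt{.ITERATE\_FROM\_END} is the mirror image of this loop. The sketch below assumes, as Parrot's array iterator PMC does, that you consume an iterator set to \texttt{.ITERATE\_FROM\_END} with \texttt{pop} rather than \texttt{shift}. The loop then walks the array from its last element back to its f\mbox{}irst. The \texttt{main} wrapper is an addition for standalone use, and you should treat this as an illustration rather than a specification:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .include "iterator.pasm"

  .sub 'main' :main
      $P2 = new "ResizablePMCArray"
      push $P2, "a"
      push $P2, "b"
      push $P2, "c"

      $P1 = iter $P2
      $P1 = .ITERATE_FROM_END   # start after the last element

    rev_loop:
      unless $P1 goto rev_end   # false once no elements remain
      $P5 = pop $P1             # step backwards through the array
      say $P5
      goto rev_loop
    rev_end:
  .end\end{verbatim}
\vspace{-6pt}
\normalsize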
Associative array iterators work similarly to ordered array iterators. When iterating over associative arrays, the \texttt{shift} opcode extracts keys instead of values. The key looks up the value in the original hash PMC.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
      $P2      = new "Hash"
      $P2["a"] = 10
      $P2["b"] = 20
      $P2["c"] = 30

      $P1      = iter $P2

  iter_loop:
      unless $P1 goto iter_end
      $S5 = shift $P1          # the key "a", "b", or "c"
      $I9 = $P2[$S5]           # the value 10, 20, or 30
      say $I9
      goto iter_loop
  iter_end:\end{verbatim}
\vspace{-6pt}
\normalsize
This example creates an associative array \texttt{\$P2} that contains three keys ``a'', ``b'', and ``c'', assigning them the values 10, 20, and 30. It creates an iterator (\texttt{\$P1}) from the associative array using the \texttt{iter} opcode, and then starts a loop over the iterator. At the start of each loop, the \texttt{unless} instruction checks whether the iterator has any more elements. If there are no more elements, \texttt{goto} jumps to the end of the loop, marked by the label \texttt{iter\_end}. If there are more elements, the \texttt{shift} opcode extracts the next key. Keyed assignment stores the integer value of the element indexed by the key in \texttt{\$I9}. After printing the integer value, \texttt{goto} jumps back to the start of the loop, marked by \texttt{iter\_loop}.

\subsubsection*{Multi-level Keys}

\index{keys} \index{multi-level keys} Aggregates can hold any data type, including other aggregates. Accessing elements deep within nested data structures is a common operation, so PIR provides a way to do it in a single instruction. Complex keys specify a series of nested data structures, with each individual key separated by a semicolon.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0           = new "Hash"
  $P1           = new "ResizablePMCArray"
  $P1[2]        = 42
  $P0["answer"] = $P1

  $I1 = 2
  $I0 = $P0["answer";$I1]
  say $I0\end{verbatim}
\vspace{-6pt}
\normalsize
This example builds up a data structure of an associative array containing an ordered array. The complex key \texttt{["answer";\$I1]} retrieves an element of the array within the hash. You can also set a value using a complex key:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0["answer";0] = 5\end{verbatim}
\vspace{-6pt}
\normalsize
The individual keys are integer or string literals, or variables with integer or string values.
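
The nested aggregate doesn't have to be an array. In this sketch, the names are invented for illustration and a \texttt{main} wrapper is added so it runs standalone. A hash holds another hash, and a complex key combines a string register with a string literal, both to set the inner value and to fetch it:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = new "Hash"
      $P1 = new "Hash"
      $P0["parrot"] = $P1

      $S0 = "parrot"
      $P0[$S0;"color"] = "grey"         # set through both levels
      $S1 = $P0["parrot";"color"]       # fetch through both levels
      say $S1                           # prints "grey"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize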

\subsubsection*{Copying and Cloning}

\index{PMCs (Polymorphic Containers); copying vs. cloning} PMC registers don't directly store the data for a PMC. They only store a pointer to the structure that holds the data. As a result, the \texttt{=} operator doesn't copy the entire PMC; it only copies the pointer to the PMC data. Both registers then point to the same PMC, so if you later modify the PMC through the copy, the original changes too.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new "String"
  $P0 = "Ford"
  $P1 = $P0
  $P1 = "Zaphod"
  say $P0                # prints "Zaphod"
  say $P1                # prints "Zaphod"\end{verbatim}
\vspace{-6pt}
\normalsize
In this example, \texttt{\$P0} and \texttt{\$P1} are both pointers to the same internal data structure. Setting \texttt{\$P1} to the string literal ``Zaphod'' overwrites the previous value ``Ford'', so both \texttt{\$P0} and \texttt{\$P1} refer to the \texttt{String} PMC ``Zaphod''.

The \texttt{clone} \index{clone opcode} opcode makes a deep copy of a PMC, instead of copying the pointer like \texttt{=}\index{= operator} does.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new "String"
  $P0 = "Ford"
  $P1 = clone $P0
  $P0 = "Zaphod"
  say $P0        # prints "Zaphod"
  say $P1        # prints "Ford"\end{verbatim}
\vspace{-6pt}
\normalsize
This example creates an identical, independent clone of the PMC in \texttt{\$P0} and puts it in \texttt{\$P1}. Later changes to \texttt{\$P0} have no ef\mbox{}fect on the PMC in \texttt{\$P1}.\footnote{With low-level strings there is no dif\mbox{}ference: the copy created by \texttt{clone} is copy-on-write\index{copy-on-write}, exactly like the copy created by \texttt{=}.}

To assign the \emph{value} of one PMC to another PMC that already exists, use the \texttt{assign}\index{assign opcode} opcode:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new "Integer"
  $P1 = new "Integer"
  $P0 = 42
  assign $P1, $P0    # note: $P1 must exist already
  inc $P0
  say $P0            # prints 43
  say $P1            # prints 42\end{verbatim}
\vspace{-6pt}
\normalsize
This example creates two \texttt{Integer} PMCs, \texttt{\$P0} and \texttt{\$P1}, and gives the f\mbox{}irst one the value 42. It then uses \texttt{assign} to copy that integer value into \texttt{\$P1}. When \texttt{\$P0} increments afterwards, \texttt{\$P1} doesn't change. The destination of \texttt{assign} must already hold an object of the right type. This is because \texttt{assign} neither creates a new duplicate object (as \texttt{clone} does) nor reuses the source object (as \texttt{=} does).

\subsubsection*{Properties}

\index{properties} \index{PMCs (Polymorphic Containers); properties}

PMCs can have additional values attached to them as ``properties'' of the PMC. Most properties hold extra metadata about the PMC.

The \texttt{setprop}\index{setprop opcode} opcode sets the value of a named property on a PMC. It takes three arguments: the PMC on which to set a property, the name of the property, and a PMC containing the value of the property.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  setprop $P0, "name", $P1\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{getprop}\index{getprop opcode} opcode returns the value of a property. It takes two arguments: the name of the property and the PMC from which to retrieve the property value.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P2 = getprop "name", $P0\end{verbatim}
\vspace{-6pt}
\normalsize
The following example creates a \texttt{String} object in \texttt{\$P0} and an \texttt{Integer} object with the value 1 in \texttt{\$P1}. \texttt{setprop} sets a property named ``eric'' on the object in \texttt{\$P0} and gives the property the value of \texttt{\$P1}. \texttt{getprop} retrieves the value of the property ``eric'' on \texttt{\$P0} and stores it in \texttt{\$P2}.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new "String"
  $P0 = "Half-a-Bee"
  $P1 = new "Integer"
  $P1 = 1

  setprop $P0, "eric", $P1  # set a property on $P0
  $P2 = getprop "eric", $P0 # retrieve a property from $P0

  say $P2                   # prints 1\end{verbatim}
\vspace{-6pt}
\normalsize
Parrot stores PMC properties in an associative array where the name of the property is the key.

\texttt{delprop}\index{delprop opcode} deletes a property from a PMC.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  delprop $P1, "constant" # delete property\end{verbatim}
\vspace{-6pt}
\normalsize
You can fetch a complete hash of all properties on a PMC with \texttt{prophash}\index{prophash opcode}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = prophash $P1 # set $P0 to the property hash of $P1\end{verbatim}
\vspace{-6pt}
\normalsize
Fetching the value of a non-existent property returns an \texttt{Undef} PMC.
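
Here is how the property opcodes work together. This sketch reuses the ``eric'' property from above and adds a second, made-up ``half'' property. It counts the entries in the property hash before and after \texttt{delprop}. The \texttt{main} wrapper is an addition so it runs on its own:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = new "String"
      $P0 = "Half-a-Bee"
      $P1 = new "Integer"
      $P1 = 1

      setprop $P0, "eric", $P1
      setprop $P0, "half", $P1
      $P2 = prophash $P0
      $I0 = $P2                  # 2 properties

      delprop $P0, "half"
      $P3 = prophash $P0
      $I1 = $P3                  # 1 property left

      $P4 = getprop "eric", $P0
      say $I0
      say $I1
      say $P4                    # prints 1
  .end\end{verbatim}
\vspace{-6pt}
\normalsize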

\subsubsection*{Vtable Functions}

\index{vtable functions} You may have noticed that a simple operation sometimes has a dif\mbox{}ferent ef\mbox{}fect on dif\mbox{}ferent PMCs. Assigning a low-level integer value to an \texttt{Integer} PMC sets the PMC's integer value, but assigning that same integer to an ordered array sets the size of the array.

Every PMC def\mbox{}ines a standard set of low-level operations called vtable functions. When you perform an assignment like:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
   $P0 = 5\end{verbatim}
\vspace{-6pt}
\normalsize
\ldots Parrot calls the \texttt{set\_integer\_native} vtable function on the PMC referred to by register \texttt{\$P0}.

\index{polymorphic substitution} Parrot has a f\mbox{}ixed set of vtable functions, so that any PMC can stand in for any other PMC; they're polymorphic.\footnote{Hence the name ``Polymorphic Container''.} Every PMC def\mbox{}ines some behavior for every vtable function. The default behavior is to throw an exception reporting that the PMC doesn't implement that vtable function. The full set of vtable functions for a PMC def\mbox{}ines the PMC's basic interface, but PMCs may also def\mbox{}ine methods to extend their behavior beyond the vtable set.
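
The following sketch shows that polymorphism in action. The same integer assignment and integer fetch statements produce dif\mbox{}ferent results, because each PMC type implements the underlying vtable functions in its own way. The values are arbitrary, and the \texttt{main} wrapper is added for standalone use:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = new "Integer"
      $P0 = 3                  # set_integer_native: the value becomes 3
      $P1 = new "ResizablePMCArray"
      $P1 = 3                  # set_integer_native: the array gets 3 slots
      push $P1, "x"            # a fourth element goes after those slots

      $I0 = $P0                # get_integer: the stored value, 3
      $I1 = $P1                # get_integer: the element count, 4
      say $I0
      say $I1
  .end\end{verbatim}
\vspace{-6pt}
\normalsize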

\subsection*{Namespaces}

\index{namespaces} \index{global variables} Parrot performs operations on variables stored in small register sets local to each subroutine. For more complex tasks,\footnote{\ldots and for most high-level languages that Parrot supports.} it's also useful to have variables that live beyond the scope of a single subroutine. These variables may be global to the entire program or restricted to a particular library. Parrot stores long-lived variables in a hierarchy of namespaces.

The opcodes \texttt{set\_global}\index{set\_global opcode} and \texttt{get\_global}\index{get\_global opcode} store and fetch a variable in a namespace:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new "String"
  $P0 = "buzz, buzz"
  set_global "bee", $P0
  # ...
  $P1 = get_global "bee"
  say $P1                        # prints "buzz, buzz"\end{verbatim}
\vspace{-6pt}
\normalsize
The f\mbox{}irst two statements in this example create a \texttt{String} PMC in \texttt{\$P0} and assign it a value. In the third statement, \texttt{set\_global} stores that PMC as the named global variable \texttt{bee}. At some later point in the program, \texttt{get\_global} retrieves the global variable by name, and stores it in \texttt{\$P1} to print.

Namespaces can only store PMC variables. Parrot boxes all primitive integer, number, or string values into the corresponding PMCs before storing them in a namespace.

The name of every variable stored in a particular namespace must be unique. You can't store both an \texttt{Integer} PMC and an array PMC named ``bee'' in the same namespace.\footnote{You may wonder why anyone would want to do this. We wonder the same thing, but Perl 5 does it all the time. The Perl 6 implementation on Parrot includes type sigils in the names of the variables it stores in namespaces so each name is unique, e.g. \texttt{\$bee}, \texttt{@bee}\ldots .}

\subsubsection*{Namespace Hierarchy}

\index{hierarchical namespaces} \index{namespaces; hierarchy}

A single global namespace would be far too limiting for most languages or applications. The risk of accidental collisions---where two libraries try to use the same name for some variable---would be quite high for larger code bases. Parrot maintains a collection of namespaces arranged as a tree, with the \texttt{parrot} namespace as the root. Every namespace you declare is a child of the \texttt{parrot} namespace (or a child of a child\ldots .).

The \texttt{set\_global} and \texttt{get\_global} opcodes both have alternate forms that take a key name to access a variable in a particular namespace within the tree. This code example stores a variable as \texttt{bill} in the Duck namespace and retrieves it again:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  set_global ["Duck"], "bill", $P0
  $P1 = get_global ["Duck"], "bill"\end{verbatim}
\vspace{-6pt}
\normalsize
The key name for the namespace can have multiple levels, which correspond to levels in the namespace hierarchy. This example stores a variable as \texttt{bill} in the Electric namespace under the General namespace in the hierarchy.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  set_global ["General";"Electric"], "bill", $P0
  $P1 = get_global ["General";"Electric"], "bill"\end{verbatim}
\vspace{-6pt}
\normalsize
\index{root namespace} \index{namespaces; root}

The \texttt{set\_global} and \texttt{get\_global} opcodes operate on the currently selected namespace. The default top-level namespace is the ``root'' namespace. The \texttt{.namespace}\index{.namespace directive} directive allows you to declare any namespace for subsequent code. If you select the General Electric namespace and then store or retrieve the \texttt{bill} variable without specifying a namespace, you will work with the General Electric bill, not the Duck bill.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .namespace ["General";"Electric"]
  #...
  set_global "bill", $P0
  $P1 = get_global "bill"\end{verbatim}
\vspace{-6pt}
\normalsize
Passing an empty key to the \texttt{.namespace} directive resets the selected namespace to the root namespace. The brackets are required even when the key is empty.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .namespace [ ]\end{verbatim}
\vspace{-6pt}
\normalsize
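Here is a runnable sketch of the currently selected namespace at work. It assumes the default \texttt{parrot} HLL. Under that assumption, \texttt{set\_hll\_global} with the key \texttt{["General";"Electric"]} reaches the same namespace that the \texttt{.namespace} directive selects for the hypothetical \texttt{read\_bill} subroutine. The variable values are invented:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = box "quack"
      set_hll_global ["Duck"], "bill", $P0
      $P1 = box "watts"
      set_hll_global ["General";"Electric"], "bill", $P1

      $P2 = get_hll_global ["General";"Electric"], "read_bill"
      $P2()                          # prints "watts", not "quack"
  .end

  .namespace ["General";"Electric"]

  .sub 'read_bill'
      $P0 = get_global "bill"        # looks in General;Electric
      say $P0
  .end\end{verbatim}
\vspace{-6pt}
\normalsize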
When you need to be absolutely sure you're working with the root namespace regardless of what namespace is currently active, use the \texttt{set\_root\_global}\index{set\_root\_global opcode} and \texttt{get\_root\_global}\index{get\_root\_global opcode} opcodes instead of \texttt{set\_global} and \texttt{get\_global}. This example sets and retrieves the variable \texttt{bill} in the Dollar namespace, which is directly under the root namespace:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  set_root_global ["Dollar"], "bill", $P0
  $P1 = get_root_global ["Dollar"], "bill"\end{verbatim}
\vspace{-6pt}
\normalsize
\index{HLL namespaces} \index{namespaces; hll} To prevent further collisions, each high-level language running on Parrot operates within its own virtual namespace root. The default virtual root is \texttt{parrot}, and the \texttt{.HLL}\index{.HLL directive} directive (for \emph{H}igh-\emph{L}evel \emph{L}anguage) selects an alternate virtual root for a particular high-level language:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .HLL 'ruby'\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{set\_hll\_global}\index{set\_hll\_global opcode} and \texttt{get\_hll\_global}\index{get\_hll\_global opcode} opcodes are like \texttt{set\_root\_global} and \texttt{get\_root\_global}, except they always operate on the virtual root for the currently selected HLL. This example stores and retrieves a \texttt{bill} variable in the Euro namespace, under the Dutch HLL namespace root:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .HLL 'Dutch'
  #...
  set_hll_global ["Euro"], "bill", $P0
  $P1 = get_hll_global ["Euro"], "bill"\end{verbatim}
\vspace{-6pt}
\normalsize
\subsubsection*{NameSpace PMC}

\index{NameSpace PMC} Namespaces are just PMCs. They implement the standard vtable functions and a few extra methods. The \texttt{get\_namespace}\index{get\_namespace opcode} opcode retrieves the currently selected namespace as a PMC object:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = get_namespace\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{get\_root\_namespace}\index{get\_root\_namespace opcode} opcode retrieves the namespace object for the root namespace. The \texttt{get\_hll\_namespace}\index{get\_hll\_namespace opcode} opcode retrieves the virtual root for the currently selected HLL.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = get_root_namespace
  $P0 = get_hll_namespace\end{verbatim}
\vspace{-6pt}
\normalsize
Each of these three opcodes can take a key argument to retrieve a namespace under the currently selected namespace, root namespace, or HLL root namespace:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = get_namespace ["Duck"]
  $P0 = get_root_namespace ["General";"Electric"]
  $P0 = get_hll_namespace ["Euro"]\end{verbatim}
\vspace{-6pt}
\normalsize
Once you have a namespace object you can use it to retrieve variables from the namespace instead of using a keyed lookup. This example f\mbox{}irst looks up the Euro namespace in the currently selected HLL, then retrieves the \texttt{bill} variable from that namespace:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = get_hll_namespace ["Euro"]
  $P1 = get_global $P0, "bill"\end{verbatim}
\vspace{-6pt}
\normalsize
Namespaces also provide a set of methods for more complex behavior than the standard vtable functions allow. The \texttt{get\_name}\index{get\_name method} method returns the name of the namespace as a \texttt{ResizableStringArray}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P3 = $P0.'get_name'()\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{get\_parent}\index{get\_parent method} method retrieves a namespace object for the parent namespace that contains this one:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P5 = $P0.'get_parent'()\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{get\_class}\index{get\_class method} method retrieves any Class PMC associated with the namespace:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P6 = $P0.'get_class'()\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{add\_var}\index{add\_var method} and \texttt{f\mbox{}ind\_var}\index{f\mbox{}ind\_var method} methods store and retrieve variables in a namespace in a language-neutral way:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0.'add_var'("bee", $P3)
  $P1 = $P0.'find_var'("bee")\end{verbatim}
\vspace{-6pt}
\normalsize
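Here is a sketch of the object-based approach. It stores a variable with \texttt{set\_hll\_global} and fetches the Duck namespace as a PMC. It then uses \texttt{f\mbox{}ind\_var} and \texttt{add\_var} on that PMC. The sketch assumes the default \texttt{parrot} HLL and that both methods take the variable name as a string, as in the calls above. The variable names and values are illustrative:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = box "quack"
      set_hll_global ["Duck"], "bill", $P0
      $P1 = get_hll_namespace ["Duck"]        # the NameSpace PMC itself

      $P2 = $P1.'find_var'("bill")
      say $P2                                # prints "quack"

      $P3 = box "webbed"
      $P1.'add_var'("feet", $P3)
      $P4 = get_hll_global ["Duck"], "feet"
      say $P4                                # prints "webbed"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize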
The \texttt{f\mbox{}ind\_namespace}\index{f\mbox{}ind\_namespace method} method looks up a namespace, just like the \texttt{get\_namespace} opcode:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P1 = $P0.'find_namespace'("Duck")\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{add\_namespace}\index{add\_namespace method} method adds a new namespace as a child of the namespace object:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0.'add_namespace'($P1)\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{make\_namespace}\index{make\_namespace method} method looks up a namespace as a child of the namespace object and returns it. If the requested namespace doesn't exist, \texttt{make\_namespace} creates a new one and adds it under that name:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P1 = $P0.'make_namespace'("Duck")\end{verbatim}
\vspace{-6pt}
\normalsize
\subsubsection*{Aliasing}

\index{aliasing} Just like regular assignment, the various operations to store a variable in a namespace only store a pointer to the PMC. If you modify the local PMC after storing it in a namespace, those changes will also appear in the stored global. To store a true copy of the PMC, \texttt{clone} it before you store it.

Leaving the global variable as an alias for a local variable has its advantages. If you retrieve a stored global into a register and modify it:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P1 = get_global "feather"
  inc $P1\end{verbatim}
\vspace{-6pt}
\normalsize
\ldots you modify the value of the stored global, so you don't need to call \texttt{set\_global} again.
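
The dif\mbox{}ference between storing an alias and storing a clone is easiest to see side by side. In this sketch, the global names \texttt{feather} and \texttt{quill} are made up for illustration. The \texttt{main} wrapper is added so it runs on its own:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = box 1
      set_global "feather", $P0      # stores a pointer to $P0's PMC
      inc $P0
      $P1 = get_global "feather"
      say $P1                        # prints 2: the global saw the change

      $P2 = clone $P0
      set_global "quill", $P2        # stores an independent copy
      inc $P0                        # $P0 is now 3
      $P3 = get_global "quill"
      say $P3                        # prints 2: the clone is unaffected
  .end\end{verbatim}
\vspace{-6pt}
\normalsize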

\section{Control Structures}

The semantics of control structures in high-level languages vary broadly. Rather than dictating one particular set of semantics for control structures, or attempting to provide multiple implementations of common control structures to f\mbox{}it the semantics of all major target languages, PIR provides a simple set of conditional and unconditional branch instructions.\footnote{In fact, all control structures in all languages ultimately compile down to conditional and unconditional branches, so you're just getting a peek into the inner workings of your software.}

\subsection*{Conditionals and Unconditionals}

\index{goto instruction} \index{unconditional branch} An unconditional branch always jumps to a specif\mbox{}ied label. PIR has only one unconditional branch instruction, \texttt{goto}. In this example, the f\mbox{}irst \texttt{say} statement never runs because the \texttt{goto} always skips over it to the label \texttt{skip\_all\_that}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
      goto skip_all_that
      say "never printed"

  skip_all_that:
      say "after branch"\end{verbatim}
\vspace{-6pt}
\normalsize
\index{conditional branch} A conditional branch jumps to a specif\mbox{}ied label only when a particular condition is true. The condition may be as simple as checking the truth of a particular variable or as complex as a comparison operation.

In this example, the \texttt{if/goto}\index{if instruction} skips to the label \texttt{maybe\_skip} only if the value stored in \texttt{\$I0} is true. If \texttt{\$I0} is false, it will print ``might be printed'' and then print ``after branch'':

\vspace{-6pt}
\scriptsize
\begin{verbatim}
      if $I0 goto maybe_skip
      say "might be printed"
  maybe_skip:
      say "after branch"\end{verbatim}
\vspace{-6pt}
\normalsize
\subsubsection*{Boolean Truth}

\index{boolean truth} Parrot's \texttt{if} and \texttt{unless} instructions evaluate a variable as a boolean to decide whether to jump. In PIR, an integer is false if it's 0 and true if it's any non-zero value. A number is false if it's 0.0 and true otherwise. A string is false if it's the empty string (\texttt{``''}) or a string containing only a zero (\texttt{``0''}), and true otherwise. Evaluating a PMC as a boolean calls the vtable function \texttt{get\_bool}\index{get\_bool vtable function} to check if it's true or false, so each PMC is free to determine what its boolean value should be.
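
The string rule is narrower than it may look: only the empty string and the single-character string ``0'' are false. This sketch uses arbitrary test strings and a \texttt{main} wrapper added for standalone use. It prints only the lines for the true cases:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $S0 = "0"
      unless $S0 goto check_zero_point_zero
      say '"0" is true'               # skipped: "0" is false
    check_zero_point_zero:
      $S1 = "0.0"
      unless $S1 goto check_space
      say '"0.0" is true'             # printed
    check_space:
      $S2 = " "
      unless $S2 goto done
      say 'a single space is true'    # printed
    done:
  .end\end{verbatim}
\vspace{-6pt}
\normalsize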

\subsubsection*{Comparisons}

\index{comparison operators} In addition to a simple check for the truth of a variable, PIR provides a collection of comparison operations for conditional branches. These jump when the comparison is true.

This example compares \texttt{\$I0} to \texttt{\$I1} and jumps to the label \texttt{success} if \texttt{\$I0} is less than \texttt{\$I1}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
      if $I0 < $I1 goto success
      say "comparison false"
  success:
      say "comparison true"\end{verbatim}
\vspace{-6pt}
\normalsize
The full set of comparison operators in PIR is \texttt{==} (equal), \texttt{!=} (not equal), \texttt{<} (less than), \texttt{<=} (less than or equal), \texttt{>} (greater than), and \texttt{>=} (greater than or equal).

\subsubsection*{Complex Conditions}

PIR disallows nested expressions. You cannot embed a statement within another statement. If you have a more complex condition than a simple truth test or comparison, you must build up your condition with a series of instructions that produce a f\mbox{}inal, single truth value.

This example performs two operations, addition and multiplication, then uses \texttt{and}\index{and opcode} to check whether the results of both operations are true. The \texttt{and} opcode is a short-circuiting logical and. It stores its f\mbox{}irst operand in \texttt{\$I2} if that operand is false, and its second operand otherwise, so \texttt{\$I2} is true only when both results are true. The code then uses this value in an ordinary truth test:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $I0 = 4 + 5
    $I1 = 63 * 0
    $I2 = and $I0, $I1

    if $I2 goto true
    say "maybe printed"
  true:\end{verbatim}
\vspace{-6pt}
\normalsize
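To see what the logical opcodes actually store, this sketch reuses the values from the example above and adds \texttt{or} and \texttt{not}. It assumes Parrot's short-circuit semantics:
\begin{itemize}
\item \texttt{and} yields its f\mbox{}irst operand if that operand is false, and its second operand otherwise.
\item \texttt{or} yields its f\mbox{}irst operand if that operand is true, and its second operand otherwise.
\end{itemize}
The \texttt{main} wrapper is an addition for standalone use:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $I0 = 4 + 5             # 9
      $I1 = 63 * 0            # 0
      $I2 = and $I0, $I1      # 0: $I0 is true, so and yields $I1
      $I3 = or $I0, $I1       # 9: $I0 is true, so or yields $I0
      $I4 = not $I1           # 1: the negation of a false value

      say $I2
      say $I3
      say $I4
  .end\end{verbatim}
\vspace{-6pt}
\normalsize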
\subsection*{If/Else Construct}

\index{if control structure} High-level languages often use the keywords \emph{if} and \emph{else} for simple conditional control structures. These control structures perform an action when a condition is true and skip the action when the condition is false. PIR's \texttt{if} instruction can build up simple conditionals.

This example checks the truth of the condition \texttt{\$I0}. If \texttt{\$I0} is true, it jumps to the \texttt{do\_it} label, and runs the body of the conditional construct. If \texttt{\$I0} is false, it continues on to the next statement, a \texttt{goto} instruction that skips over the body of the conditional to the label \texttt{dont\_do\_it}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    if $I0 goto do_it
    goto dont_do_it
  do_it:
    say "in the body of the if"
  dont_do_it:\end{verbatim}
\vspace{-6pt}
\normalsize
The control f\mbox{}low of this example may seem backwards. In a high-level language, \emph{if} often means \emph{``if the condition is true, run the next few lines of code''}. In an assembly language, it's often more straightforward to write \emph{``if the condition is true, \textbf{skip} the next few lines of code''}. Because of the reversed logic, you may f\mbox{}ind it easier to build a simple conditional construct using the \texttt{unless}\index{unless instruction} instruction instead of \texttt{if}.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    unless $I0 goto dont_do_it
    say "in the body of the if"
  dont_do_it:\end{verbatim}
\vspace{-6pt}
\normalsize
This example produces the same output as the previous example, but the logic is simpler. When \texttt{\$I0} is true, \texttt{unless} does nothing and the body of the conditional runs. When \texttt{\$I0} is false, \texttt{unless} skips over the body of the conditional by jumping to \texttt{dont\_do\_it}.

\index{else control structure} An \emph{if/else} control structure is easier to build using the \texttt{if} instruction than \texttt{unless}. To build an \emph{if/else}, insert the body of the else right after the f\mbox{}irst \texttt{if} instruction.

This example checks if \texttt{\$I0} is true. If so, it jumps to the label \texttt{true} and runs the body of the \emph{if} construct. If \texttt{\$I0} is false, the \texttt{if} instruction does nothing, and the code continues to the body of the \emph{else} construct. When the body of the else has f\mbox{}inished, the \texttt{goto} jumps to the end of the \emph{if/else} control structure by skipping over the body of the \emph{if} construct:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    if $I0 goto true
    say "in the body of the else"
    goto done
  true:
    say "in the body of the if"
  done:\end{verbatim}
\vspace{-6pt}
\normalsize
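As a complete sketch, the following hypothetical \texttt{main} subroutine uses the \emph{if/else} pattern to classify a number as odd or even. It assumes the \texttt{mod} opcode, which computes the remainder of integer division:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      .local int n
      .local string kind
      n = 7

      $I0 = mod n, 2          # remainder: 1 for odd, 0 for even
      if $I0 goto odd
      kind = "even"           # body of the else
      goto done
    odd:
      kind = "odd"            # body of the if
    done:
      say kind                # prints odd
  .end\end{verbatim}
\vspace{-6pt}
\normalsize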
\subsection*{Switch Construct}

\index{switch control structure} A \emph{switch} control structure selects one action from a list of possible actions by comparing a single variable to a series of values until it f\mbox{}inds one that matches. The simplest way to achieve this in PIR is with a series of \texttt{unless} instructions:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $S0 = 'a'

  option1:
    unless $S0 == 'a' goto option2
    say "matched: a"
    goto end_of_switch

  option2:
    unless $S0 == 'b' goto default
    say "matched: b"
    goto end_of_switch

  default:
    say "I don't understand"

  end_of_switch:\end{verbatim}
\vspace{-6pt}
\normalsize
This example uses \texttt{\$S0} as the \emph{case} of the switch construct. It compares that case against the f\mbox{}irst value, \texttt{a}. If they match, it prints the string ``matched: a'', then jumps to the end of the switch at the label \texttt{end\_of\_switch}. If the case doesn't match \texttt{a}, the \texttt{unless} instruction jumps to the label \texttt{option2}, which compares the case against the second value, \texttt{b}. If they match, it prints the string ``matched: b'', then jumps to the end of the switch. If the case doesn't match the second value either, \texttt{unless} jumps to the \texttt{default} label, which prints ``I don't understand'' and falls through to the end of the switch.
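
An alternative layout puts all the comparisons together at the top and the case bodies after them. Each \texttt{if} jumps directly to the body of the matching case, and an unconditional \texttt{goto} handles the default. In this sketch, the labels \texttt{case\_a}, \texttt{case\_b}, and \texttt{case\_default} are illustrative names:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      .local string s
      s = 'b'

      # dispatch: one comparison per case
      if s == 'a' goto case_a
      if s == 'b' goto case_b
      goto case_default

    case_a:
      say "matched: a"
      goto end_of_switch
    case_b:
      say "matched: b"
      goto end_of_switch
    case_default:
      say "I don't understand"
    end_of_switch:
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
With \texttt{s} set to \texttt{b}, this prints ``matched: b''. Grouping the tests makes the list of cases easy to scan.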

\subsection*{Do-While Loop}

A \emph{do-while}\index{do-while loop} loop runs the body of the loop once, then checks a condition at the end to decide whether to repeat it. A single conditional branch can build this style of loop:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $I0 = 0                 # counter

  redo:                     # start of loop
    inc $I0
    say $I0
    if $I0 < 10 goto redo   # end of loop\end{verbatim}
\vspace{-6pt}
\normalsize
This example prints the numbers 1 to 10. The f\mbox{}irst time through, it executes all statements up to the \texttt{if} instruction. If the condition evaluates as true (\texttt{\$I0} is less than 10), it jumps to the \texttt{redo} label and runs the loop body again. The loop ends when the condition evaluates as false.

Here's a slightly more complex example that calculates the factorial \texttt{5!}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
      .local int product, counter

      product = 1
      counter = 5

  redo:                         # start of loop
      product *= counter
      dec counter
      if counter > 0 goto redo  # end of loop

      say product\end{verbatim}
\vspace{-6pt}
\normalsize
Each time through the loop, it multiplies \texttt{product} by the current value of \texttt{counter} and decrements the counter; while the counter is still greater than 0, the \texttt{if} jumps back to the start of the loop. The loop ends when \texttt{counter} has counted down to 0, at which point \texttt{product} holds 120.

\subsection*{While Loop}

\index{while loop} A \emph{while} loop tests the condition at the start of the loop instead of at the end. This style of loop needs a conditional branch combined with an unconditional branch. This example also calculates a factorial, but with a \emph{while} loop:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
      .local int product, counter
      product = 1
      counter = 5

  redo:                         # start of loop
      if counter <= 0 goto end_loop
      product *= counter
      dec counter
      goto redo
  end_loop:                     # end of loop

      say product\end{verbatim}
\vspace{-6pt}
\normalsize
This code tests \texttt{counter} at the start of the loop. If it is less than or equal to 0, the \texttt{if} jumps to the \texttt{end\_loop} label and the loop ends; otherwise, the body multiplies the current product by the counter and decrements the counter. At the end of the body, the \texttt{goto} unconditionally jumps back to the start of the loop to test the condition again. If \texttt{counter} is zero or negative before the loop starts, the body of the loop never executes.
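
This \emph{while} pattern suits any loop that may need to run zero times. As an illustration, this sketch computes the greatest common divisor of 48 and 18 with Euclid's algorithm. The loop stops as soon as \texttt{b} reaches 0, and its body never runs if \texttt{b} starts at 0:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      .local int a, b, t
      a = 48
      b = 18

    redo:                         # start of loop
      if b == 0 goto end_loop     # test before the body
      t = mod a, b
      a = b
      b = t
      goto redo
    end_loop:                     # end of loop

      say a                       # prints 6
  .end\end{verbatim}
\vspace{-6pt}
\normalsize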

\subsection*{For Loop}

\index{for loop} A \emph{for} loop is a counter-controlled loop with three declared components: a starting value, a condition to determine when to stop, and an operation to step the counter to the next iteration. A \emph{for} loop in C looks something like:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  for (i = 1; i <= 10; i++) {
    ...
  }\end{verbatim}
\vspace{-6pt}
\normalsize
where \texttt{i} is the counter, \texttt{i = 1} sets the start value, \texttt{i <= 10} checks the stop condition, and \texttt{i++} steps to the next iteration. A \emph{for} loop in PIR requires one conditional branch and two unconditional branches.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  loop_init:
    .local int counter
    counter = 1

  loop_test:
    if counter <= 10 goto loop_body
    goto loop_end

  loop_body:
    say counter

  loop_continue:
    inc counter
    goto loop_test

  loop_end:\end{verbatim}
\vspace{-6pt}
\normalsize
The f\mbox{}irst time through the loop, this example sets the initial value of the counter in \texttt{loop\_init}. It then tests the loop condition in \texttt{loop\_test}. If the condition is true (\texttt{counter} is less than or equal to 10), it jumps to \texttt{loop\_body} and executes the body of the loop. If the condition is false, the \texttt{goto} jumps straight to \texttt{loop\_end} and the loop ends. The body of the loop prints the current counter, then falls through to \texttt{loop\_continue}, which increments the counter and jumps back up to \texttt{loop\_test} for the next iteration. Each iteration through the loop tests the condition and increments the counter, ending the loop when the condition is false. If the condition is false on the very f\mbox{}irst iteration, the body of the loop will never run.
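
A common use of a \emph{for} loop is stepping through an array by index. This sketch f\mbox{}ills a \texttt{ResizablePMCArray} with \texttt{push} and uses the \texttt{elements} opcode to get its length. It also inverts the test (\texttt{if i >= n goto loop\_end}) so that the body follows the test directly, which saves one unconditional branch:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      .local pmc items
      .local int i, n
      items = new 'ResizablePMCArray'
      push items, 'apple'
      push items, 'banana'
      push items, 'cherry'
      n = elements items

      i = 0                       # loop_init
    loop_test:
      if i >= n goto loop_end
      $S0 = items[i]              # loop_body
      say $S0
      inc i                       # loop_continue
      goto loop_test
    loop_end:
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
This prints \texttt{apple}, \texttt{banana}, and \texttt{cherry} on separate lines.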

\section{Subroutines}

\index{subroutines} Subroutines in PIR are roughly equivalent to the subroutines or methods of a high-level language. They're the most basic building block of code reuse in PIR. Each high-level language has dif\mbox{}ferent syntax and semantics for def\mbox{}ining and calling subroutines, so Parrot's subroutines need to be f\mbox{}lexible enough to handle a broad array of behaviors.

A subroutine declaration starts with the \texttt{.sub}\index{.sub directive} directive and ends with the \texttt{.end}\index{.end directive} directive. This example def\mbox{}ines a subroutine named \texttt{hello} that prints a string ``Hello, Polly.'':

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'hello'
      say "Hello, Polly."
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
The quotes around the subroutine name are optional as long as the name of the subroutine uses only plain alphanumeric ASCII characters. You must use quotes if the subroutine name uses Unicode characters, characters from some other character set or encoding, or is otherwise an invalid PIR identif\mbox{}ier.

A subroutine call consists of the name of the subroutine to call followed by a list of (zero or more) arguments in parentheses. You may precede the call with a list of (zero or more) return values. This example calls the subroutine \texttt{fact} with two arguments and assigns the result to \texttt{\$I0}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = 'fact'(count, product)\end{verbatim}
\vspace{-6pt}
\normalsize
\subsection*{Modif\mbox{}iers}

\index{modif\mbox{}iers} \index{subroutines; modif\mbox{}iers} A modif\mbox{}ier is an annotation to a basic subroutine declaration\footnote{or parameter declaration} that selects an optional feature. Modif\mbox{}iers all start with a colon (\texttt{:}). A subroutine can have multiple modif\mbox{}iers.

When you execute a PIR f\mbox{}ile as a program, Parrot normally runs the f\mbox{}irst subroutine it encounters, but you can mark any subroutine as the f\mbox{}irst one to run with the \texttt{:main}\index{:main subroutine modif\mbox{}ier} modif\mbox{}ier:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'first'
      say "Polly want a cracker?"
  .end

  .sub 'second' :main
      say "Hello, Polly."
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
This code prints ``Hello, Polly.'' but not ``Polly want a cracker?''. The \texttt{f\mbox{}irst} subroutine is f\mbox{}irst in the source code, but \texttt{second} has the \texttt{:main} modif\mbox{}ier. Parrot will never call \texttt{f\mbox{}irst} in this program. If you remove the \texttt{:main} modif\mbox{}ier, the code will print ``Polly want a cracker?'' instead.

The \texttt{:load}\index{:load subroutine modif\mbox{}ier} modif\mbox{}ier tells Parrot to run the subroutine when it loads the current f\mbox{}ile as a library. The \texttt{:init}\index{:init subroutine modif\mbox{}ier} modif\mbox{}ier tells Parrot to run the subroutine only when it executes the f\mbox{}ile as a program (and \emph{not} as a library). The \texttt{:immediate}\index{:immediate subroutine modif\mbox{}ier} modif\mbox{}ier tells Parrot to run the subroutine as soon as it gets compiled. The \texttt{:postcomp}\index{:postcomp subroutine modif\mbox{}ier} modif\mbox{}ier also runs the subroutine right after compilation, but only if the subroutine was declared in the main program f\mbox{}ile (when \emph{not} loaded as a library).
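
Modif\mbox{}iers can be combined. A common idiom marks an initialization subroutine with both \texttt{:load} and \texttt{:init}, so that it runs whether the f\mbox{}ile is loaded as a library or executed as a program. In this sketch, the name \texttt{setup} is hypothetical; when you run the f\mbox{}ile directly, \texttt{setup} runs before \texttt{main}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'setup' :load :init
      say "setting up"
  .end

  .sub 'main' :main
      say "in main"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize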

By default, Parrot stores all subroutines in the namespace currently active at the point of their declaration. The \texttt{:anon}\index{:anon subroutine modif\mbox{}ier} modif\mbox{}ier tells Parrot not to store the subroutine in the namespace. The \texttt{:nsentry}\index{:nsentry subroutine modif\mbox{}ier} modif\mbox{}ier stores the subroutine in the currently active namespace with a dif\mbox{}ferent name. For example, Parrot will store this subroutine in the current namespace as \texttt{bar}, not \texttt{foo}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'foo' :nsentry('bar')
    #...
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
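To see the ef\mbox{}fect of \texttt{:nsentry}, this sketch looks the subroutine up under both names with \texttt{get\_global}, which returns a null PMC when nothing is stored under the requested name. The \texttt{if null} form of \texttt{if} branches when its PMC is null:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = get_global 'bar'      # stored under its :nsentry name
      $P0()                       # prints "in foo"

      $P1 = get_global 'foo'      # the declared name is not used
      if null $P1 goto no_foo
      say "found foo"
      .return ()
    no_foo:
      say "nothing stored as foo"
  .end

  .sub 'foo' :nsentry('bar')
      say "in foo"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize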
Chapter 7 on \emph{``Classes and Objects''} explains other subroutine modif\mbox{}iers.

\subsection*{Parameters and Arguments}

\index{subroutines; parameters} \index{.param directive} The \texttt{.param} directive def\mbox{}ines the parameters for the subroutine and creates local named variables for them (similar to \texttt{.local}):

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .param int c\end{verbatim}
\vspace{-6pt}
\normalsize
\index{.return directive} The \texttt{.return} directive returns control f\mbox{}low to the calling subroutine. To return results, pass them as arguments to \texttt{.return}.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .return($P0)\end{verbatim}
\vspace{-6pt}
\normalsize
This example implements the factorial algorithm using two subroutines, \texttt{main} and \texttt{fact}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  # factorial.pir
  .sub 'main' :main
     .local int count
     .local int product
     count   = 5
     product = 1

     $I0 = 'fact'(count, product)

     say $I0
  .end

  .sub 'fact'
     .param int c
     .param int p

  loop:
     if c <= 1 goto fin
     p = c * p
     dec c
     branch loop
  fin:
     .return (p)
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
This example def\mbox{}ines two local named variables, \texttt{count} and \texttt{product}, and assigns them the values 5 and 1. It calls the \texttt{fact} subroutine with both variables as arguments. The \texttt{fact} subroutine uses the \texttt{.param} directive to retrieve these parameters and the \texttt{.return} directive to return the result. The f\mbox{}inal printed result is 120.
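
A subroutine can also return more than one value. Pass several values to \texttt{.return} and list the same number of targets in parentheses at the call site. In this sketch, \texttt{divide} is a hypothetical subroutine that returns both the quotient and the remainder, using the integer \texttt{div} and \texttt{mod} opcodes:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      ($I0, $I1) = 'divide'(17, 5)
      say $I0                     # prints 3
      say $I1                     # prints 2
  .end

  .sub 'divide'
      .param int a
      .param int b
      $I0 = div a, b              # quotient
      $I1 = mod a, b              # remainder
      .return ($I0, $I1)
  .end\end{verbatim}
\vspace{-6pt}
\normalsize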

\subsubsection*{Positional Parameters}

\index{positional parameters} The default way of matching the arguments passed in a subroutine call to the parameters def\mbox{}ined in the subroutine's declaration is by position. If you declare three parameters---an integer, a number, and a string:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'foo'
    .param int a
    .param num b
    .param string c
    # ...
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
\ldots then calls to this subroutine must also pass three arguments---an integer, a number, and a string:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  'foo'(32, 5.9, "bar")\end{verbatim}
\vspace{-6pt}
\normalsize
Parrot will assign each argument to the corresponding parameter in order from f\mbox{}irst to last. Changing the order of the arguments or leaving one out is an error.

\subsubsection*{Named Parameters}

\index{named parameters} Named parameters are an alternative to positional parameters. Instead of matching arguments to parameters by their position in the argument list, Parrot assigns arguments to parameters by their name. Consequently you may pass named arguments in any order. Declare named parameters with the \texttt{:named}\index{:named parameter modif\mbox{}ier} modif\mbox{}ier.

This example declares two named parameters in the subroutine \texttt{shoutout}, \texttt{name} and \texttt{years}. Each is declared with the \texttt{:named} modif\mbox{}ier, followed by the string name to use when passing arguments. The string name can match the parameter name (as with the \texttt{name} parameter), but it can also be dif\mbox{}ferent (as with the \texttt{years} parameter, which callers pass as \texttt{age}):

\vspace{-6pt}
\scriptsize
\begin{verbatim}
 .sub 'shoutout'
    .param string name :named("name")
    .param string years :named("age")
    $S0  = "Hello " . name
    $S1  = "You are " . years
    $S1 .= " years old"
    say $S0
    say $S1
 .end\end{verbatim}
\vspace{-6pt}
\normalsize
Pass named arguments to a subroutine as a series of name/value pairs, with the elements of each pair separated by an arrow \texttt{=>}.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
 .sub 'main' :main
    'shoutout'("age" => 42, "name" => "Bob")
 .end\end{verbatim}
\vspace{-6pt}
\normalsize
The order of the arguments does not matter:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
 .sub 'main' :main
    'shoutout'("name" => "Bob", "age" => 42)
 .end\end{verbatim}
\vspace{-6pt}
\normalsize
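Positional and named parameters can appear in the same subroutine. Declare the positional parameters f\mbox{}irst, and pass the positional arguments before any named ones. This sketch uses a hypothetical \texttt{greet} subroutine:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      'greet'("Hello", "name" => "Bob")
  .end

  .sub 'greet'
      .param string greeting                  # positional
      .param string name :named("name")       # named
      $S0 = greeting . ", "
      $S0 .= name
      say $S0                                 # prints Hello, Bob
  .end\end{verbatim}
\vspace{-6pt}
\normalsize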
\subsubsection*{Optional Parameters}

\index{optional parameters} Another alternative to the required positional parameters is optional parameters. Some parameters are unnecessary for certain calls. Parameters marked with the \texttt{:optional}\index{:optional parameter modif\mbox{}ier} modif\mbox{}ier do not produce errors about invalid parameter counts if they are not present. A subroutine with optional parameters should gracefully handle the missing argument, either by providing a default value or by performing an alternate action that doesn't need that value.

Checking the value of the optional parameter isn't enough to know whether the call passed such an argument, because the user might have passed a null or false value intentionally. PIR also provides an \texttt{:opt\_f\mbox{}lag}\index{:opt\_f\mbox{}lag parameter modif\mbox{}ier} modif\mbox{}ier for a boolean check whether the caller passed an argument:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .param string name     :optional
  .param int    has_name :opt_flag\end{verbatim}
\vspace{-6pt}
\normalsize
When an integer parameter with the \texttt{:opt\_f\mbox{}lag} modif\mbox{}ier immediately follows an \texttt{:optional} parameter, it will be true if the caller passed the argument and false otherwise.

This example demonstrates how to provide a default value for an optional parameter:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    .param string name     :optional
    .param int    has_name :opt_flag

    if has_name goto we_have_a_name
    name = "default value"
  we_have_a_name:\end{verbatim}
\vspace{-6pt}
\normalsize
When the \texttt{has\_name} parameter is true, the \texttt{if} control statement jumps to the \texttt{we\_have\_a\_name} label, leaving the \texttt{name} parameter unmodif\mbox{}ied. When \texttt{has\_name} is false (when the caller passed no argument for \texttt{name}) the \texttt{if} statement does nothing. The next line sets the \texttt{name} parameter to a default value.

The \texttt{:opt\_f\mbox{}lag} parameter never takes an argument from the passed-in argument list. It's purely for bookkeeping within the subroutine.

Optional parameters can be positional or named. Optional positional parameters must appear at the end of the list of positional parameters, after all the required ones. Whether it's named or positional, an optional parameter must immediately precede its \texttt{:opt\_f\mbox{}lag} parameter:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'question'
    .param int value     :named("answer") :optional
    .param int has_value :opt_flag
    #...
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
You can call this subroutine with a named argument or with no argument:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  'question'("answer" => 42)
  'question'()\end{verbatim}
\vspace{-6pt}
\normalsize
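Putting these pieces together, here is a sketch of a complete subroutine with an optional positional parameter and a default value. The names \texttt{greet} and \texttt{have\_name} are illustrative:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      'greet'()                   # prints Hello, stranger
      'greet'("Polly")            # prints Hello, Polly
  .end

  .sub 'greet'
      .param string name     :optional
      .param int    has_name :opt_flag

      if has_name goto have_name
      name = "stranger"           # default when no argument was passed
    have_name:
      $S0 = "Hello, " . name
      say $S0
  .end\end{verbatim}
\vspace{-6pt}
\normalsize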
\subsubsection*{Aggregating Parameters}

\index{aggregating parameters} \index{:slurpy parameter modif\mbox{}ier} Another alternative to a sequence of positional parameters is an aggregating parameter which bundles a list of arguments into a single parameter. The \texttt{:slurpy} modif\mbox{}ier creates a single array parameter containing all the provided arguments:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .param pmc args :slurpy
  $P0 = args[0]           # first argument
  $P1 = args[1]           # second argument\end{verbatim}
\vspace{-6pt}
\normalsize
Because an aggregating parameter consumes all remaining positional arguments, you can combine it with other positional parameters only by declaring it after all of them:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .param string first
  .param int second
  .param pmc the_rest :slurpy

  $P0 = the_rest[0]           # third argument
  $P1 = the_rest[1]           # fourth argument\end{verbatim}
\vspace{-6pt}
\normalsize
When you combine \texttt{:named} and \texttt{:slurpy} on a parameter, the result is a single associative array containing the named arguments passed into the subroutine call:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .param pmc all_named :slurpy :named

  $P0 = all_named['name']             # 'name' => 'Bob'
  $P1 = all_named['age']              # 'age'  => 42\end{verbatim}
\vspace{-6pt}
\normalsize
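Because a \texttt{:slurpy} parameter is an ordinary array PMC, the subroutine can use \texttt{elements} to f\mbox{}ind out how many arguments it received and then loop over them. This sketch of a hypothetical \texttt{sum} subroutine adds up any number of integer arguments:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $I0 = 'sum'(1, 2, 3, 4)
      say $I0                     # prints 10
  .end

  .sub 'sum'
      .param pmc nums :slurpy
      .local int total, i, n
      total = 0
      i = 0
      n = elements nums
    loop:
      if i >= n goto done
      $I1 = nums[i]               # each argument in turn
      total += $I1
      inc i
      goto loop
    done:
      .return (total)
  .end\end{verbatim}
\vspace{-6pt}
\normalsize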
\subsubsection*{Flattening Arguments}

\index{f\mbox{}lattening arguments} \index{:f\mbox{}lat argument modif\mbox{}ier} A f\mbox{}lattening argument breaks up a single argument to f\mbox{}ill multiple parameters. It's the complement of an aggregating parameter. The \texttt{:f\mbox{}lat} modif\mbox{}ier splits arguments (and return values) into a f\mbox{}lattened list. Passing an array PMC to a subroutine with \texttt{:f\mbox{}lat}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new "ResizablePMCArray"
  $P0[0] = "Bob"
  $P0[1] = 42
  'foo'($P0 :flat)\end{verbatim}
\vspace{-6pt}
\normalsize
\ldots allows the elements of that array to f\mbox{}ill the required parameters:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .param string name  # Bob
  .param int age      # 42\end{verbatim}
\vspace{-6pt}
\normalsize
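Here is the same idea as a complete sketch. The hypothetical subroutine \texttt{describe} declares two ordinary positional parameters, and the caller f\mbox{}ills both from a single array with \texttt{:f\mbox{}lat}. Parrot converts each array element to the type of the matching parameter:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = new "ResizablePMCArray"
      push $P0, "Bob"
      push $P0, 42
      'describe'($P0 :flat)
  .end

  .sub 'describe'
      .param string name          # first array element
      .param int    age           # second array element
      $S0 = age                   # integer to string
      $S1 = name . " is "
      $S1 .= $S0
      say $S1                     # prints Bob is 42
  .end\end{verbatim}
\vspace{-6pt}
\normalsize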
\subsubsection*{Arguments on the Command Line}

\index{command-line arguments}

Arguments passed to a PIR program on the command line are available to the \texttt{:main} subroutine of that program as strings in a \texttt{ResizableStringArray} PMC. If you call a program \emph{args.pir}, passing it three arguments:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $ parrot args.pir foo bar baz\end{verbatim}
\vspace{-6pt}
\normalsize
\ldots they will be accessible at index 1, 2, and 3 of the PMC parameter.\footnote{Index 0 holds the name of the program itself.}

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    .param pmc all_args
    $S1 = all_args[1]   # foo
    $S2 = all_args[2]   # bar
    $S3 = all_args[3]   # baz
    # ...
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
Because \texttt{all\_args} is a \texttt{ResizableStringArray} PMC, you can loop over the results, access them individually, or even modify them.
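
For example, this sketch prints every command-line argument on its own line. It starts at index 1 to skip the entry at index 0, and uses \texttt{elements} to f\mbox{}ind the end of the array:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  # args.pir
  .sub 'main' :main
      .param pmc all_args
      .local int i, n
      n = elements all_args
      i = 1
    loop:
      if i >= n goto done
      $S0 = all_args[i]
      say $S0
      inc i
      goto loop
    done:
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
Running \texttt{parrot args.pir foo bar baz} prints \texttt{foo}, \texttt{bar}, and \texttt{baz} on separate lines.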

\subsection*{Compiling and Loading Libraries}

\index{libraries} In addition to running PIR f\mbox{}iles on the command line, you can also load a library of pre-compiled bytecode into a running PIR program. The \texttt{load\_bytecode}\index{load\_bytecode opcode} opcode takes a single argument: the name of the bytecode f\mbox{}ile to load. If you create a f\mbox{}ile named \emph{foo\_f\mbox{}ile.pir} containing a single subroutine:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  # foo_file.pir
  .sub 'foo_sub'              # .sub stores a global sub
     say "in foo_sub"
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
\ldots and compile it to bytecode using the \texttt{-o} command-line switch\index{-o command-line switch}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $ parrot -o foo_file.pbc foo_file.pir\end{verbatim}
\vspace{-6pt}
\normalsize
\ldots you can then load the compiled bytecode into \emph{main.pir} and directly call the subroutine def\mbox{}ined in \emph{foo\_f\mbox{}ile.pir}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  # main.pir
  .sub 'main' :main
    load_bytecode "foo_file.pbc"    # compiled foo_file.pir
    foo_sub()
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{load\_bytecode} opcode also works with source f\mbox{}iles, as long as Parrot has a compiler registered for that type of f\mbox{}ile:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  # main2.pir
  .sub 'main' :main
    load_bytecode "foo_file.pir"  # PIR source code
    foo_sub()
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
\subsection*{Sub PMC}

\index{Sub PMC} Subroutines are a PMC type in Parrot. You can store them in PMC registers and manipulate them just as you do with other PMCs. Parrot stores subroutines in namespaces; retrieve them with the \texttt{get\_global}\index{get\_global opcode} opcode:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = get_global "my_sub"\end{verbatim}
\vspace{-6pt}
\normalsize
To f\mbox{}ind a subroutine in a dif\mbox{}ferent namespace, f\mbox{}irst look up the appropriate namespace object, then use that as the f\mbox{}irst parameter to \texttt{get\_global}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = get_namespace ["My";"Namespace"]
  $P1 = get_global $P0, "my_sub"\end{verbatim}
\vspace{-6pt}
\normalsize
You can invoke a Sub object directly:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0(1, 2, 3)\end{verbatim}
\vspace{-6pt}
\normalsize
You can get or even \emph{change} its name:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = $P0               # Get the current name
  $P0 = "my_new_sub"      # Set a new name\end{verbatim}
\vspace{-6pt}
\normalsize
\index{inspect opcode} You can get a hash of the complete metadata for the subroutine:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P1 = inspect $P0\end{verbatim}
\vspace{-6pt}
\normalsize
\ldots which contains the f\mbox{}ields:

\vspace{-5pt}

\begin{itemize}

\setlength{\topsep}{0pt}
\setlength{\itemsep}{0pt}
\item pos\_required

The number of required positional parameters

\item pos\_optional

The number of optional positional parameters

\item named\_required

The number of required named parameters

\item named\_optional

The number of optional named parameters

\item pos\_slurpy

True if the sub has an aggregating parameter for positional args

\item named\_slurpy

True if the sub has an aggregating parameter for named args

\end{itemize}

\vspace{-5pt}
Instead of fetching the entire inspection hash, you can also request individual pieces of metadata:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P1 = inspect $P0, "pos_required"\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{arity}\index{arity method} method on the sub object returns the total number of def\mbox{}ined parameters of all varieties:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = $P0.'arity'()\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{get\_namespace}\index{get\_namespace method} method on the sub object fetches the namespace PMC which contains the Sub:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P1 = $P0.'get_namespace'()\end{verbatim}
\vspace{-6pt}
\normalsize
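The following sketch combines several of these operations. It fetches a hypothetical \texttt{add} subroutine from the current namespace, invokes it through the PMC register, and asks how many positional parameters it requires:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = get_global 'add'        # the Sub PMC for 'add'
      $I0 = $P0(2, 3)               # invoke it like any subroutine
      say $I0                       # prints 5

      $P1 = inspect $P0, "pos_required"
      say $P1                       # count of required positional params
  .end

  .sub 'add'
      .param int a
      .param int b
      $I0 = a + b
      .return ($I0)
  .end\end{verbatim}
\vspace{-6pt}
\normalsize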
\subsection*{Evaluating a Code String}

\index{code strings, evaluating} One way of producing a code object during a running program is by compiling a code string. The result is a \index{bytecode segment object}bytecode segment object.

The f\mbox{}irst step is to fetch a compiler object for the target language:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P1 = compreg "PIR"\end{verbatim}
\vspace{-6pt}
\normalsize
Parrot registers a compiler for PIR by default, so it's always available. The following example fetches the PIR compiler object into the named variable \texttt{compiler}. It then calls \texttt{compiler} as a subroutine to compile the string \texttt{source} into a bytecode segment object, stores that object in the named variable \texttt{generated}, and invokes it as a subroutine:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .local pmc compiler, generated
  .local string source
  source    = ".sub foo\n$S1 = 'in eval'\nsay $S1\n.end\n"
  compiler  = compreg "PIR"
  generated = compiler(source)
  generated()
  say "back again"\end{verbatim}
\vspace{-6pt}
\normalsize
You can register a compiler or assembler for any language inside the Parrot core and use it to compile and invoke code from that language.

In the following example, the \texttt{compreg} opcode registers the subroutine-like object \texttt{\$P10} as a compiler for the language ``MyLanguage'':

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  compreg "MyLanguage", $P10\end{verbatim}
\vspace{-6pt}
\normalsize
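Since a compiler is registered as a subroutine-like object, a plain subroutine is enough for experimenting with \texttt{compreg}. In this toy sketch, \texttt{toy\_compiler} is a hypothetical ``compiler'' for a made-up language named ``Toy''. It does not generate any code; it only prints the source string it receives. The example registers it with the two-argument form of \texttt{compreg} and fetches it back with the one-argument form:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P10 = get_global 'toy_compiler'
      compreg "Toy", $P10           # register under the name "Toy"

      $P0 = compreg "Toy"           # look it up again
      $P0("hello")                  # prints would compile: hello
  .end

  .sub 'toy_compiler'
      .param string source
      $S0 = "would compile: " . source
      say $S0
  .end\end{verbatim}
\vspace{-6pt}
\normalsize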
\subsection*{Lexicals}

\index{lexical variables} \index{scope} Variables stored in a namespace are global variables. They're accessible from anywhere in the program if you specify the right namespace path. High-level languages also have lexical variables which are only accessible from the local section of code (or \emph{scope}) where they appear, or in a section of code embedded within that scope.\footnote{A scope is roughly equivalent to a block in C.} In PIR, the section of code between a \texttt{.sub} and a \texttt{.end} def\mbox{}ines a scope for lexical variables.

While Parrot stores global variables in namespaces, it stores lexical variables in lexical pads\footnote{Think of a pad like a box to hold a collection of lexical variables.}. Each lexical scope has its own pad. The \texttt{store\_lex} opcode stores a lexical variable in the current pad. The \texttt{f\mbox{}ind\_lex} opcode retrieves a lexical variable by name, looking f\mbox{}irst in the current pad and then in the pads of enclosing scopes:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new "Integer"       # create a variable
  $P0 = 10                  # assign value to it
  store_lex "foo", $P0      # store with lexical name "foo"
  # ...
  $P1 = find_lex "foo"      # get the lexical "foo" into $P1
  say $P1                   # prints 10\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{.lex}\index{.lex directive} directive def\mbox{}ines a local variable that follows these scoping rules:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
      .local pmc foo
      .lex 'foo', foo\end{verbatim}
\vspace{-6pt}
\normalsize
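In a complete subroutine, a register bound with \texttt{.lex} and the \texttt{f\mbox{}ind\_lex} and \texttt{store\_lex} opcodes all refer to the same lexical variable. This sketch declares \texttt{foo} with \texttt{.lex} before using the opcodes. Storing a new PMC under the name with \texttt{store\_lex} also updates the bound register:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      .local pmc foo
      .lex 'foo', foo             # bind the name 'foo' to this register
      foo = new 'Integer'
      foo = 10

      $P1 = find_lex 'foo'        # look the lexical up by name
      say $P1                     # prints 10

      $P2 = new 'Integer'
      $P2 = 11
      store_lex 'foo', $P2        # replace the lexical's value
      say foo                     # prints 11
  .end\end{verbatim}
\vspace{-6pt}
\normalsize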
\subsubsection*{LexPad and LexInfo PMCs}

\index{LexPad PMC} \index{LexInfo PMC} Parrot uses two dif\mbox{}ferent PMCs to store information about a subroutine's lexical variables: the \texttt{LexPad} PMC and the \texttt{LexInfo} PMC. Neither of these PMC types is meant to be manipulated directly from PIR code; Parrot uses them internally to store information about lexical variables.

\texttt{LexInfo} PMCs store information about lexical variables at compile time. Parrot generates this read-only information during compilation to represent what it knows about lexical variables. Not all subroutines get a \texttt{LexInfo} PMC by default; subroutines need to indicate to Parrot that they require a \texttt{LexInfo} PMC. One way to do this is with the \texttt{.lex} directive. Of course, the \texttt{.lex} directive only works for languages that know the names of their lexical variables at compile time. Languages where this information is not available can mark the subroutine with \texttt{:lex} instead.

\texttt{LexPad} PMCs store run-time information about lexical variables, including their current values and type information. Each time Parrot invokes a subroutine that has a \texttt{LexInfo} PMC, it creates a new \texttt{LexPad} PMC for that invocation, which allows for recursive subroutine calls without overwriting lexical variables.

The \texttt{get\_lexinfo}\index{get\_lexinfo method} method on a sub retrieves its associated \texttt{LexInfo} PMC:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = get_global "MySubroutine"
  $P1 = $P0.'get_lexinfo'()\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{LexInfo} PMC supports a few introspection operations. The \texttt{elements} opcode retrieves the number of elements it contains. String key access operations retrieve entries from the \texttt{LexInfo} PMC as if it were an associative array.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = elements $P1    # number of lexical variables
  $P0 = $P1["name"]     # lexical variable "name"\end{verbatim}
\vspace{-6pt}
\normalsize
There is no easy way to retrieve the current \texttt{LexPad} PMC in a given subroutine, but \texttt{LexPad} PMCs are of limited use in PIR anyway.

\subsubsection*{Nested Scopes}

\index{nested lexical scopes} PIR has no separate syntax for blocks or lexical scopes; subroutines def\mbox{}ine lexical scopes in PIR. Because PIR disallows nested \texttt{.sub}/\texttt{.end} declarations, it needs a way to identify which lexical scopes are the parents of inner lexical scopes. The \texttt{:outer}\index{:outer subroutine modif\mbox{}ier} modif\mbox{}ier declares a subroutine as a nested inner lexical scope of another existing subroutine. The modif\mbox{}ier takes one argument, the name of the outer subroutine:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'foo'
    # defines lexical variables
  .end

  .sub 'bar' :outer('foo')
    # can access foo's lexical variables
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
Sometimes a name alone isn't suf\mbox{}ficient to uniquely identify the outer subroutine. The \texttt{:subid}\index{:subid subroutine modif\mbox{}ier} modif\mbox{}ier allows the outer subroutine to declare a truly unique name usable with \texttt{:outer}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'foo' :subid('barsouter')
    # defines lexical variables
  .end

  .sub 'bar' :outer('barsouter')
    # can access foo's lexical variables
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{get\_outer}\index{get\_outer method} method on a \texttt{Sub} PMC retrieves its \texttt{:outer} sub.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P1 = $P0.'get_outer'()\end{verbatim}
\vspace{-6pt}
\normalsize
If there is no \texttt{:outer} sub, this will return a null PMC. The \texttt{set\_outer} method on a \texttt{Sub} object sets the \texttt{:outer} sub:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0.'set_outer'($P1)\end{verbatim}
\vspace{-6pt}
\normalsize
\subsubsection*{Scope and Visibility}

High-level languages such as Perl, Python, and Ruby allow nested scopes, or blocks within blocks that have their own lexical variables. This construct is common even in C:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  {
      int x = 0;
      int y = 1;
      {
          int z = 2;
          /* x, y, and z are all visible here */
      }

      /* only x and y are visible here */
  }\end{verbatim}
\vspace{-6pt}
\normalsize
In the inner block, all three variables are visible. The variable \texttt{z} is only visible inside that block. The outer block has no knowledge of \texttt{z}. A na\"ive translation of this code to PIR might be:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .local int x
  .local int y
  .local int z
  x = 0
  y = 1
  z = 2
  #...\end{verbatim}
\vspace{-6pt}
\normalsize
This PIR code is similar, but the handling of the variable \texttt{z} is dif\mbox{}ferent: \texttt{z} is visible throughout the entire current subroutine. It was not visible throughout the entire C function. A more accurate translation of the C scopes uses \texttt{:outer} PIR subroutines instead:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'MyOuter'
      .local pmc x, y
      .lex 'x', x
      .lex 'y', y
      x = new 'Integer'
      x = 10
      'MyInner'()
      # only x and y are visible here
      say y                             # prints 20
  .end

  .sub 'MyInner' :outer('MyOuter')
      .local pmc x, new_y, z
      .lex 'z', z
      find_lex x, 'x'
      say x                            # prints 10
      new_y = new 'Integer'
      new_y = 20
      store_lex 'y', new_y
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{f\mbox{}ind\_lex} and \texttt{store\_lex} opcodes do not simply access a variable in the scope where it is declared; they interact with the \texttt{LexPad} PMC to f\mbox{}ind lexical variables in outer lexical scopes as well. All lexical variables from an outer lexical scope are visible from the inner lexical scope.

Note that you can only store PMCs---not primitive types---as lexicals.
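
As a small sketch of that rule (the sub names \texttt{outer} and \texttt{inner} and the lexical name \texttt{count} are illustrative), the outer sub below boxes an integer into an \texttt{Integer} PMC before binding it as a lexical. The inner sub, declared with \texttt{:outer}, looks up that PMC with \texttt{f\mbox{}ind\_lex} and increments it in place, so after two calls the outer sub should see the value 2:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'outer' :main
      .local pmc count
      count = box 0            # lexicals must be PMCs, so box the integer
      .lex 'count', count
      'inner'()
      'inner'()
      say count                # 2: inner updated the shared PMC
  .end

  .sub 'inner' :outer('outer')
      .local pmc c
      c = find_lex 'count'     # fetch the outer sub's lexical
      inc c                    # modify the shared Integer PMC in place
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
Because both subs share one PMC, no \texttt{store\_lex} is needed here; \texttt{store\_lex} is only necessary when the inner sub binds a dif\mbox{}ferent PMC to the name.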

\subsection*{Multiple Dispatch}

\index{multiple dispatch} \index{subroutines; signatures} Multiple dispatch subroutines (or \emph{multis}) have several variants with the same name but dif\mbox{}ferent sets of parameters. The set of parameters for a subroutine is its \emph{signature}. When a multi is called, the dispatch operation compares the arguments passed in to the signatures of all the variants and invokes the subroutine with the best match.

Parrot stores all multiple dispatch subs with the same name in a namespace within a single PMC called a \texttt{MultiSub}\index{MultiSub PMC}. The \texttt{MultiSub} is an invokable list of subroutines. When a multiple dispatch sub is called, the \texttt{MultiSub} PMC searches its list of variants for the best matching candidate.

The \texttt{:multi}\index{:multi subroutine modif\mbox{}ier} modif\mbox{}ier on a \texttt{.sub} declares a \texttt{MultiSub}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'MyMulti' :multi()
      # does whatever a MyMulti does
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
Each variant in a \texttt{MultiSub} must have a unique type or number of parameters declared, so the dispatcher can calculate a best match. If you had two variants that both took four integer parameters, the dispatcher would never be able to decide which one to call when it received four integer arguments.

\index{multi signature} The \texttt{:multi} modif\mbox{}ier takes one or more arguments def\mbox{}ining the \emph{multi signature}. The multi signature tells Parrot what particular combination of input parameters the multi accepts:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'Add' :multi(I, I)
    .param int x
    .param int y
    $I0 = x + y
    .return($I0)
  .end

  .sub 'Add' :multi(N, N)
    .param num x
    .param num y
    $N0 = x + y
    .return($N0)
  .end

  .sub 'Start' :main
    $I0 = Add(1, 2)      # 3
    $N0 = Add(3.14, 2.0) # 5.14
    $S0 = Add("a", "b")  # ERROR! No (S, S) variant!
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
Multis can take I, N, S, and P types, but they can also use \texttt{\_} (underscore) to denote a wildcard, and a string which names a PMC type:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'Add' :multi(I, I)          # two integers
    #...
  .end

  .sub 'Add' :multi(I, 'Float')    # integer and Float PMC
    #...
  .end

  .sub 'Add' :multi('Integer', _)  # Integer PMC and wildcard
    #...
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
When you call a \texttt{MultiSub}, Parrot tries to dispatch to the most specif\mbox{}ic matching variant, but falls back to more general variants if it cannot f\mbox{}ind a perfect match. If you call \texttt{Add} with \texttt{(1, 2)}, Parrot dispatches to the \texttt{(I, I)} variant. If you call it with \texttt{(1, ``hi'')}, Parrot matches the \texttt{('Integer', \_)} variant: the string in the second argument matches neither \texttt{I} nor \texttt{Float}, so Parrot promotes the integer \texttt{1} to an \texttt{Integer} PMC and lets the wildcard accept the string. In general, Parrot can promote an I, N, or S value to an \texttt{Integer}, \texttt{Float}, or \texttt{String} PMC.

\index{Manhattan Distance} To make the decision about which multi variant to call, Parrot calculates the \emph{Manhattan Distance} between the argument signature and the parameter signature of each variant. Every dif\mbox{}ference between each element counts as one step. A dif\mbox{}ference can be a promotion from a primitive type to a PMC, the conversion from one primitive type to another, or the matching of an argument to a \texttt{\_} wildcard. After Parrot calculates the distance to each variant, it calls the one with the lowest distance. Notice that it's possible to def\mbox{}ine a variant that is impossible to call: for every potential combination of arguments there is a better match. This is uncommon, but possible in systems with many multis and a limited number of data types.
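
The following sketch shows this fallback with PMC arguments only. It assumes two hypothetical variants of a sub named \texttt{describe}: one declared for \texttt{Integer} PMCs and one using the \texttt{\_} wildcard. An \texttt{Integer} argument matches the f\mbox{}irst variant exactly, while a \texttt{String} PMC can only match the wildcard:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'describe' :multi('Integer')   # exact match for Integer PMCs
      .param pmc x
      .return ('an Integer PMC')
  .end

  .sub 'describe' :multi(_)           # wildcard: accepts any PMC
      .param pmc x
      .return ('something else')
  .end

  .sub 'main' :main
      $P0 = box 1                     # an Integer PMC
      $S0 = describe($P0)
      say $S0                         # an Integer PMC
      $P1 = box 'hi'                  # a String PMC
      $S0 = describe($P1)
      say $S0                         # something else
  .end\end{verbatim}
\vspace{-6pt}
\normalsize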

\subsection*{Continuations}

\index{continuations} \index{subroutines; continuations} Continuations are subroutines that take snapshots of control f\mbox{}low. They are frozen images of the current execution state of the VM. Once you have a continuation, you can invoke it to return to the point where the continuation was f\mbox{}irst created. It's like a magical timewarp that allows the developer to arbitrarily move control f\mbox{}low back to any previous point in the program.

Continuations are like any other PMC; create one with the \texttt{new} opcode:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new 'Continuation'\end{verbatim}
\vspace{-6pt}
\normalsize
The new continuation starts in an undef\mbox{}ined state. If you attempt to invoke a new continuation without initializing it, Parrot will throw an exception. To prepare the continuation for use, assign it a destination label with the \texttt{set\_addr}\index{set\_addr opcode} opcode:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P0 = new 'Continuation'
    set_addr $P0, my_label

  my_label:
    # ...\end{verbatim}
\vspace{-6pt}
\normalsize
To jump to the continuation's stored label and return the context to the state it was in \emph{at the point of its creation}, invoke the continuation:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0()\end{verbatim}
\vspace{-6pt}
\normalsize
Even though you can use the subroutine call notation \texttt{\$P0()} to invoke the continuation, you cannot pass arguments or obtain return values.
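
Here is a minimal sketch that puts these pieces together in one sub (the labels and the counter are illustrative). The continuation is created once, pointed at \texttt{again} with \texttt{set\_addr}, and invoked to jump back until the counter reaches 3. The counter lives in an \texttt{Integer} PMC that is updated in place, so its value carries over each time the continuation jumps back, and the sub should print 1, 2, and 3 before \texttt{done}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      .local pmc cont, count
      count = box 0              # a PMC, so in-place updates persist
      cont = new 'Continuation'
      set_addr cont, again       # resume point for the continuation

    again:
      inc count
      say count
      $I0 = count
      if $I0 >= 3 goto finished
      cont()                     # jump back to 'again'
    finished:
      say 'done'
  .end\end{verbatim}
\vspace{-6pt}
\normalsize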

\subsubsection*{Continuation Passing Style}

\index{continuation passing style (CPS)} \index{CPS (continuation passing style)} Parrot uses continuations internally for control f\mbox{}low. When Parrot invokes a subroutine, it creates a continuation representing the current point in the program. It passes this continuation as an invisible parameter to the subroutine call. To return from that subroutine, Parrot invokes the continuation to return to the point of creation of that continuation. If you have a continuation, you can invoke it to return to its point of creation any time you want.

This type of f\mbox{}low control---invoking continuations instead of performing bare jumps---is called Continuation Passing Style (CPS).

\subsubsection*{Tailcalls}

Many subroutines set up and call another subroutine and then return the result of the second call directly. This is a \index{tailcall} tailcall, and is an important opportunity for optimization. Here's a contrived example in pseudocode:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  call add_two(5)

  subroutine add_two(value)
    value = add_one(value)
    return add_one(value)\end{verbatim}
\vspace{-6pt}
\normalsize
In this example, the subroutine \texttt{add\_two} makes two calls to \texttt{add\_one}. The result of the second call becomes the return value of \texttt{add\_two}: \texttt{add\_one} gets called, and its result goes straight back to the caller of \texttt{add\_two}. Nothing in \texttt{add\_two} uses that value directly.

A simple optimization is available for this type of code. The second call to \texttt{add\_one} can return to the same place that \texttt{add\_two} returns; it's perfectly safe and correct to use the same return continuation that \texttt{add\_two} uses. The two subroutine calls can share a return continuation.

\index{.tailcall directive} PIR provides the \texttt{.tailcall} directive to identify similar situations. Use it in place of the \texttt{.return} directive. \texttt{.tailcall} performs this optimization by reusing the return continuation of the parent subroutine to make the tailcall:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      .local int value
      value = add_two(5)
      say value
  .end

  .sub 'add_two'
      .param int value
      .local int val2
      val2 = add_one(value)
      .tailcall add_one(val2)
  .end

  .sub 'add_one'
      .param int a
      .local int b
      b = a + 1
      .return (b)
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
This example prints the correct value \texttt{7}.
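
Tailcalls matter most for recursion, where each pending call would otherwise hold on to its caller's context. As a sketch (the sub \texttt{fact} and its accumulator parameter are illustrative), this factorial carries the running product in an accumulator, so the recursive call is in tail position and can use \texttt{.tailcall}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $I0 = fact(10, 1)
      say $I0                  # 3628800
  .end

  .sub 'fact'
      .param int n
      .param int acc           # running product so far
      if n > 1 goto recurse
      .return (acc)
    recurse:
      $I0 = n - 1
      $I1 = acc * n
      .tailcall fact($I0, $I1) # reuse this sub's return continuation
  .end\end{verbatim}
\vspace{-6pt}
\normalsize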

\subsection*{Coroutines}

\index{coroutines} \index{subroutines; coroutines} Coroutines are similar to subroutines except that they have an internal notion of \emph{state}. In addition to performing a normal \texttt{.return} to return control f\mbox{}low back to the caller and destroy the execution environment of the subroutine, coroutines may also perform a \texttt{.yield} operation. \texttt{.yield} returns a value to the caller like \texttt{.return} can, but it does not destroy the execution state of the coroutine. The next call to the coroutine continues execution from the point of the last \texttt{.yield}, not at the beginning of the coroutine.

Inside a coroutine continuing from a \texttt{.yield}, the entire execution environment is the same as it was when the coroutine \texttt{.yield}ed. This means that the parameter values don't change, even if the next invocation of the coroutine had dif\mbox{}ferent arguments passed in.

Coroutines look like ordinary subroutines. They do not require any special modif\mbox{}ier or any special syntax to mark them as being a coroutine. What sets them apart is the use of the \texttt{.yield}\index{.yield directive} directive. \texttt{.yield} plays several roles:

\vspace{-5pt}

\begin{itemize}

\setlength{\topsep}{0pt}
\setlength{\itemsep}{0pt}
\item Identif\mbox{}ies coroutines

When Parrot sees a \texttt{.yield}, it knows to create a Coroutine PMC object instead of a \texttt{Sub} PMC.

\item Creates a continuation

\texttt{.yield} creates a continuation in the coroutine and stores the continuation object in the coroutine object for later resuming from the point of the \texttt{.yield}.

\item Returns a value

\texttt{.yield} can return a value \footnote{\ldots or many values, or no values.} to the caller. It is basically the same as a \texttt{.return} in this regard.

\end{itemize}

\vspace{-5pt}
Here is a simple coroutine example:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'MyCoro'
    .yield(1)
    .yield(2)
    .yield(3)
    .return(4)
  .end

  .sub 'main' :main
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
This contrived example demonstrates how the coroutine stores its state. When Parrot encounters the \texttt{.yield}, the coroutine stores its current execution environment. At the next call to the coroutine, it picks up where it left of\mbox{}f.
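
Because a coroutine keeps its execution environment between calls, it can also act as an open-ended generator. In this sketch (the name \texttt{evens} is illustrative) the loop never returns; each call resumes after the \texttt{.yield}, adds 2 to the local \texttt{i}, and yields the next even number:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'evens'
      .local int i
      i = 0
    loop:
      .yield (i)               # hand back i, keep the state
      i += 2                   # runs when the coroutine is resumed
      goto loop
  .end

  .sub 'main' :main
      $I0 = evens()
      say $I0                  # 0
      $I0 = evens()
      say $I0                  # 2
      $I0 = evens()
      say $I0                  # 4
  .end\end{verbatim}
\vspace{-6pt}
\normalsize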

\subsection*{Native Call Interface}

The \index{NCI (native call interface)} Native Call Interface (NCI) is a special version of the Parrot calling conventions for calling functions in shared C libraries with a known signature. This is a simplif\mbox{}ied version of the f\mbox{}irst test in \emph{t/pmc/nci.t}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    .local pmc library
    library = loadlib "libnci_test"         # library object
    say "loaded"

    .local pmc ddfunc
    ddfunc = dlfunc library, "nci_dd", "dd" # function object
    say "dlfunced"

    .local num result
    result = ddfunc( 4.0 )                  # call the function

    ne result, 8.0, nok_1
    say "ok 1"
    end
  nok_1:
    say "not ok 1"

    #...\end{verbatim}
\vspace{-6pt}
\normalsize
This example shows two new opcodes: \texttt{loadlib} and \texttt{dlfunc}. The \texttt{loadlib}\index{loadlib opcode} opcode obtains a handle for a shared library. It searches for the shared library in the current directory, in \emph{runtime/parrot/dynext}, and in a few other conf\mbox{}igured directories. It also tries to load the provided f\mbox{}ilename unaltered and with appended extensions like \emph{.so} or \emph{.dll}. Which extensions it tries depends on the operating system Parrot is running on.

The \texttt{dlfunc}\index{dlfunc opcode} opcode gets a function object from a previously loaded library (second argument) of a specif\mbox{}ied name (third argument) with a known function signature (fourth argument). The function signature is a string whose f\mbox{}irst character describes the return value and whose remaining characters describe the function parameters, in order. For example, the signature \texttt{dd} used above describes a function that takes one \texttt{double} and returns a \texttt{double}. Table 6-1 lists the characters used in NCI function signatures.

\begin{table}[!h]
\caption{Function signature letters}
\begin{center}
\label{CHP-6-TABLE-1}

\begin{tabular}{|l|l|l|}
\hline
\rowcolor[gray]{.9}
\textbf{\textsf{Character}} & \textbf{\textsf{Register}} & \textbf{\textsf{C type}}\\ \hline
\texttt{v} & - & void (no return value)\\ \hline
\texttt{c} & \texttt{I} & char\\ \hline
\texttt{s} & \texttt{I} & short\\ \hline
\texttt{i} & \texttt{I} & int\\ \hline
\texttt{l} & \texttt{I} & long\\ \hline
\texttt{f} & \texttt{N} & f\mbox{}loat\\ \hline
\texttt{d} & \texttt{N} & double\\ \hline
\texttt{t} & \texttt{S} & char *\\ \hline
\texttt{p} & \texttt{P} & void * (or other pointer)\\ \hline
\texttt{I} & - & Parrot\_Interp *interpreter\\ \hline
\texttt{C} & - & a callback function pointer\\ \hline
\texttt{D} & - & a callback function pointer\\ \hline
\texttt{Y} & \texttt{P} & the subroutine \texttt{C} or \texttt{D} calls into\\ \hline
\texttt{Z} & \texttt{P} & the argument for \texttt{Y}\\ \hline
\end{tabular}
\end{center}
\end{table}
\section{Classes and Objects}

Many of Parrot's core classes---such as \texttt{Integer}, \texttt{String}, or \texttt{ResizablePMCArray}---are written in C, but you can also write your own classes in PIR. PIR doesn't have the shiny syntax of high-level object-oriented languages, but it provides the necessary features to construct well-behaved objects every bit as powerful as those of high-level object systems.

\index{objects} Parrot developers often use the word ``PMCs'' to refer to the objects def\mbox{}ined in C classes and ``objects'' to refer to the objects def\mbox{}ined in PIR. In truth, all PMCs are objects and all objects are PMCs, so the distinction is a community tradition with no of\mbox{}ficial meaning.

\subsection*{Class Declaration}

\index{classes} The \texttt{newclass}\index{newclass opcode} opcode def\mbox{}ines a new class. It takes a single argument, the name of the class to def\mbox{}ine.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P0 = newclass 'Foo'\end{verbatim}
\vspace{-6pt}
\normalsize
Just as with Parrot's core classes, the \texttt{new}\index{new opcode} opcode instantiates a new object of a named class.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P1 = new 'Foo'\end{verbatim}
\vspace{-6pt}
\normalsize
In addition to a string name for the class, \texttt{new} can also instantiate an object from a class object or from a keyed namespace name.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = newclass 'Foo'
  $P1 = new $P0

  $P2 = new ['Bar';'Baz']\end{verbatim}
\vspace{-6pt}
\normalsize
\subsection*{Attributes}

\index{attributes} \index{classes;attributes} The \texttt{addattribute} opcode def\mbox{}ines a named attribute---or \emph{instance variable}---in the class:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = newclass 'Foo'
  addattribute $P0, 'bar'\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{setattribute}\index{setattribute} opcode sets the value of a declared attribute. You must declare an attribute before you may set it. The value of an attribute is always a PMC, never an integer, number, or string.\footnote{Though it can be an \texttt{Integer}, \texttt{Float}, or \texttt{String} PMC.}

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P6 = box 42
    setattribute $P1, 'bar', $P6\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{getattribute}\index{getattribute opcode} opcode fetches the value of a named attribute. It takes an object and an attribute name as arguments and returns the attribute PMC:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P10 = getattribute $P1, 'bar'\end{verbatim}
\vspace{-6pt}
\normalsize
Because PMCs are containers, you may modify an object's attribute by retrieving the attribute PMC and modifying its value. You don't need to call \texttt{setattribute} for the change to stick:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P10 = getattribute $P1, 'bar'
    $P10 = 5\end{verbatim}
\vspace{-6pt}
\normalsize
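Putting the attribute opcodes together, this self-contained sketch def\mbox{}ines the class \texttt{Foo} with one attribute, stores a boxed integer in it, and then changes the attribute through the PMC returned by \texttt{getattribute}, without calling \texttt{setattribute} again:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = newclass 'Foo'
      addattribute $P0, 'bar'

      $P1 = new 'Foo'
      $P2 = box 42
      setattribute $P1, 'bar', $P2

      $P3 = getattribute $P1, 'bar'
      $P3 = 5                      # updates the stored Integer PMC in place
      $P4 = getattribute $P1, 'bar'
      say $P4                      # 5
  .end\end{verbatim}
\vspace{-6pt}
\normalsize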
\subsection*{Instantiation}

With a created class, we can use the \texttt{new} opcode to instantiate an object of that class in the same way we can instantiate a new PMC.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P0 = newclass "Foo"
    $P1 = new $P0\end{verbatim}
\vspace{-6pt}
\normalsize
Or, if we don't have the class object handy, we can do it by name too:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P1 = new "Foo"\end{verbatim}
\vspace{-6pt}
\normalsize
PMCs have two vtable functions for instantiating a new object: \texttt{init} and \texttt{init\_pmc}. Parrot calls the former when it creates a new PMC without arguments, and the latter when it creates a new PMC with an initialization argument.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    .namespace ["Foo"]
    .sub 'init' :vtable
        say "Creating a new Foo"
    .end

    .sub 'init_pmc' :vtable
        .param pmc args
        print "Creating a new Foo with argument "
        say args
    .end

    .namespace []
    .sub 'main' :main
        $P0 = newclass 'Foo'     # the class must exist before new
        $P1 = new ['Foo']        # init
        $P2 = box 42
        $P3 = new ['Foo'], $P2   # init_pmc, called with args = 42
    .end\end{verbatim}
\vspace{-6pt}
\normalsize
\subsection*{Methods}

\index{methods} \index{classes;methods} \index{subroutines;methods} Methods in PIR are subroutines stored in the class object. Def\mbox{}ine a method with the \texttt{.sub} directive and the \texttt{:method}\index{:method subroutine modif\mbox{}ier} modif\mbox{}ier:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub half :method
    $P0 = getattribute self, 'bar'
    $P1 = $P0 / 2
    .return($P1)
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
This method returns the integer value of the \texttt{bar} attribute of the object divided by two. Notice that the code never declares the named variable \texttt{self}. Methods always make the invocant object---the object on which the method was invoked---available in a local variable called \texttt{self}\index{self variable}.

The \texttt{:method} modif\mbox{}ier adds the subroutine to the class object associated with the currently selected namespace, so every class def\mbox{}inition f\mbox{}ile must contain a \texttt{.namespace}\index{.namespace directive} declaration. Class f\mbox{}iles for languages may also contain an \texttt{.HLL}\index{.HLL directive} declaration to associate the namespace with the appropriate high-level language:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .HLL 'php'
  .namespace [ 'Foo' ]\end{verbatim}
\vspace{-6pt}
\normalsize
Method calls in PIR use a period (\texttt{.}) to separate the object from the method name. The method name is either a literal string in quotes or a string variable. The method call looks up the method in the invocant object using the string name:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P0 = $P1.'half'()

    $S2 = 'double'
    $P0 = $P1.$S2()\end{verbatim}
\vspace{-6pt}
\normalsize
You can also pass a method object to the method call instead of looking it up by string name:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P2 = get_global 'triple'
    $P0 = $P1.$P2()\end{verbatim}
\vspace{-6pt}
\normalsize
Parrot always treats a PMC used in the method position as a method object, so you can't pass a \texttt{String} PMC as the method name.

Methods can have multiple arguments and multiple return values just like subroutines:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  ($P0, $S1) = $P2.'method'($I3, $P4)\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{can}\index{can opcode} opcode checks whether an object has a particular method. It returns 0 (false) or 1 (true):

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $I0 = can $P3, 'add'\end{verbatim}
\vspace{-6pt}
\normalsize
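The following sketch combines the pieces of this section into one f\mbox{}ile. It assumes the same \texttt{Foo} class with a \texttt{bar} attribute; unlike the \texttt{half} method shown earlier, this version converts the attribute to a native integer before dividing, so the method returns an integer:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .namespace [ 'Foo' ]

  .sub 'half' :method
      $P0 = getattribute self, 'bar'
      $I0 = $P0                    # read the attribute as an integer
      $I0 = $I0 / 2                # integer division
      .return ($I0)
  .end

  .namespace []

  .sub 'main' :main
      $P0 = newclass 'Foo'
      addattribute $P0, 'bar'
      $P1 = new 'Foo'
      $P2 = box 42
      setattribute $P1, 'bar', $P2

      $I0 = $P1.'half'()           # method name as a literal string
      say $I0                      # 21
      $S0 = 'half'
      $I0 = $P1.$S0()              # method name held in a string register
      say $I0                      # 21
      $I1 = can $P1, 'half'
      say $I1                      # 1
  .end\end{verbatim}
\vspace{-6pt}
\normalsize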
\subsection*{Inheritance}

\index{inheritance} \index{classes;inheritance} The \texttt{subclass}\index{subclass opcode} opcode creates a new class that inherits methods and attributes from another class. It takes two arguments: the name of the parent class and the name of the new class:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P3 = subclass 'Foo', 'Bar'\end{verbatim}
\vspace{-6pt}
\normalsize
\texttt{subclass} can also take a class object as the parent class instead of a class name:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P3 = subclass $P2, 'Bar'\end{verbatim}
\vspace{-6pt}
\normalsize
\index{multiple inheritance} The \texttt{addparent}\index{addparent opcode} opcode also adds a parent class to a subclass. This is especially useful for multiple inheritance, as the \texttt{subclass} opcode only accepts a single parent class:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P4 = newclass 'Baz'
  $P5 = newclass 'Qux'   # hypothetical second extra parent
  addparent $P3, $P4
  addparent $P3, $P5\end{verbatim}
\vspace{-6pt}
\normalsize
To override an inherited method in the child class, def\mbox{}ine a method with the same name in the subclass. This example code overrides the \texttt{who\_am\_i} method that \texttt{Bar} inherits from \texttt{Foo}, returning a more meaningful name:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .namespace [ 'Bar' ]

  .sub 'who_am_i' :method
    .return( 'I am proud to be a Bar' )
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
\index{new opcode} Object creation for subclasses is the same as for ordinary classes:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P5 = new 'Bar'\end{verbatim}
\vspace{-6pt}
\normalsize
Calls to inherited methods are just like calls to methods def\mbox{}ined in the class:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P1.'increment'()\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{isa} opcode checks whether an object is an instance of or inherits from a particular class. It returns 0 (false) or 1 (true):

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $I0 = isa $P5, 'Foo'
    $I0 = isa $P5, 'Bar'\end{verbatim}
\vspace{-6pt}
\normalsize
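A complete sketch of inheritance and overriding follows. \texttt{Foo} def\mbox{}ines two methods; \texttt{Bar} inherits from \texttt{Foo} and overrides only \texttt{who\_am\_i}. The method \texttt{greet} and the returned strings are illustrative:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .namespace [ 'Foo' ]

  .sub 'who_am_i' :method
      .return ('I am a Foo')
  .end

  .sub 'greet' :method
      .return ('hello from Foo')
  .end

  .namespace [ 'Bar' ]

  .sub 'who_am_i' :method          # overrides Foo's who_am_i
      .return ('I am proud to be a Bar')
  .end

  .namespace []

  .sub 'main' :main
      $P0 = newclass 'Foo'
      $P1 = subclass $P0, 'Bar'
      $P2 = new 'Bar'

      $S0 = $P2.'who_am_i'()       # found in Bar
      say $S0                      # I am proud to be a Bar
      $S0 = $P2.'greet'()          # inherited from Foo
      say $S0                      # hello from Foo
      $I0 = isa $P2, 'Foo'
      say $I0                      # 1
  .end\end{verbatim}
\vspace{-6pt}
\normalsize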
\subsection*{Overriding Vtable Functions}

\index{overriding vtable functions} \index{vtable functions;overriding} The \texttt{Object} PMC\index{Object PMC} is a core PMC written in C that provides basic object-like behavior. Every object instantiated from a PIR class inherits a default set of vtable functions from \texttt{Object}, but you can override them with your own PIR subroutines.

The \texttt{:vtable}\index{:vtable subroutine modif\mbox{}ier} modif\mbox{}ier marks a subroutine as a vtable override. As it does with methods, Parrot stores vtable overrides in the class associated with the currently selected namespace:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'init' :vtable
    $P6 = new 'Integer'
    setattribute self, 'bar', $P6
    .return()
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
Subroutines acting as vtable overrides must either have the name of an actual vtable function or include the vtable function name in the \texttt{:vtable} modif\mbox{}ier:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub foozle :vtable('init')
    # ...
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
You must call methods on objects explicitly, but Parrot calls vtable functions implicitly in multiple contexts. For example, creating a new object with \texttt{\$P3 = new 'Foo'} will call \texttt{init} with the new \texttt{Foo} object.

As an example of some of the common vtable overrides, the \texttt{=}\index{= operator} operator (or \texttt{set}\index{set opcode} opcode) calls \texttt{Foo}'s vtable function \texttt{set\_integer\_native} when its left-hand side is a \texttt{Foo} object and the argument is an integer literal or integer variable:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P3 = 30\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{+}\index{+ operator} operator (or \texttt{add}\index{add opcode} opcode) calls \texttt{Foo}'s \texttt{add} vtable function when it adds two \texttt{Foo} objects:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P3 = new 'Foo'
    $P3 = 3
    $P4 = new 'Foo'
    $P4 = 1774

    $P5 = $P3 + $P4
    # or:
    add $P5, $P3, $P4\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{inc}\index{inc opcode} opcode calls \texttt{Foo}'s \texttt{increment} vtable function when it increments a \texttt{Foo} object:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    inc $P3\end{verbatim}
\vspace{-6pt}
\normalsize
Parrot calls \texttt{Foo}'s \texttt{get\_integer} and \texttt{get\_string} vtable functions to retrieve an integer or string value from a \texttt{Foo} object:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $I10 = $P5  # get_integer
    say $P5     # get_string\end{verbatim}
\vspace{-6pt}
\normalsize
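To see several overrides working together, here is a sketch of a hypothetical \texttt{Counter} class that keeps its state in an attribute named \texttt{value}. It overrides \texttt{init}, \texttt{increment}, \texttt{get\_integer}, and \texttt{get\_string}, so the ordinary \texttt{new}, \texttt{inc}, \texttt{set}, and \texttt{say} opcodes all run PIR code:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .namespace [ 'Counter' ]

  .sub 'init' :vtable
      $P0 = box 0
      setattribute self, 'value', $P0
  .end

  .sub 'increment' :vtable
      $P0 = getattribute self, 'value'
      inc $P0                      # bump the stored Integer in place
  .end

  .sub 'get_integer' :vtable
      $P0 = getattribute self, 'value'
      $I0 = $P0
      .return ($I0)
  .end

  .sub 'get_string' :vtable
      $P0 = getattribute self, 'value'
      $S0 = $P0
      $S0 = concat 'Counter at ', $S0
      .return ($S0)
  .end

  .namespace []

  .sub 'main' :main
      $P0 = newclass 'Counter'
      addattribute $P0, 'value'
      $P1 = new 'Counter'          # calls init
      inc $P1                      # calls increment
      inc $P1
      $I0 = $P1                    # calls get_integer
      say $I0                      # 2
      say $P1                      # calls get_string: Counter at 2
  .end\end{verbatim}
\vspace{-6pt}
\normalsize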
\subsection*{Introspection}

\index{introspection} \index{classes;introspection} Classes def\mbox{}ined in PIR using the \texttt{newclass} opcode are instances of the \texttt{Class} PMC\index{Class PMC}. This PMC contains all the meta-information for the class, such as attribute def\mbox{}initions, methods, vtable overrides, and its inheritance hierarchy. The opcode \texttt{inspect}\index{inspect opcode} provides a way to peek behind the curtain of encapsulation to see what makes a class tick. When called with only a class object, \texttt{inspect} returns an associative array containing data on all characteristics of the class that it chooses to reveal:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P1 = inspect $P0
  $P2 = $P1['attributes']\end{verbatim}
\vspace{-6pt}
\normalsize
When called with a string argument, \texttt{inspect} only returns the data for a specif\mbox{}ic characteristic of the class:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = inspect $P1, 'parents'\end{verbatim}
\vspace{-6pt}
\normalsize
Table 7-1 shows the introspection characteristics supported by \texttt{inspect}.

\begin{table}[!h]
\caption{Class Introspection}
\begin{center}
\begin{tabular}{|l|l|}
\hline
\rowcolor[gray]{.9}
\textbf{\textsf{Characteristic}} & \textbf{\textsf{Description}}\\ \hline
\texttt{attributes} & Information about the attributes the class will instantiate in its objects. An associative array, where the keys are the attribute names and the values are hashes of metadata.\\ \hline
\texttt{f\mbox{}lags} & An \texttt{Integer} PMC containing any integer f\mbox{}lags set on the class object.\\ \hline
\texttt{methods} & A list of methods provided by the class. An associative array where the keys are the method names and the values are the invocable method objects.\\ \hline
\texttt{name} & A \texttt{String} PMC containing the name of the class.\\ \hline
\texttt{namespace} & The \texttt{NameSpace} PMC associated with the class.\\ \hline
\texttt{parents} & An array of \texttt{Class} objects that this class inherits from directly (via \texttt{subclass} or \texttt{add\_parent}). Does not include indirectly inherited parents.\\ \hline
\texttt{roles} & An array of \texttt{Role} objects composed into the class.\\ \hline
\texttt{vtable\_overrides} & A list of vtable overrides def\mbox{}ined by the class. An associative array where the keys are the vtable names and the values are the invocable sub objects.\\ \hline
\end{tabular}
\end{center}
\end{table}
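
As a short sketch, this snippet asks a freshly def\mbox{}ined class for two of the characteristics listed in the table. The attribute names \texttt{bar} and \texttt{baz} are illustrative:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
      $P0 = newclass 'Foo'
      addattribute $P0, 'bar'
      addattribute $P0, 'baz'

      $P1 = inspect $P0, 'attributes'   # hash keyed by attribute name
      $I0 = elements $P1
      say $I0                           # 2

      $P2 = inspect $P0, 'name'         # String PMC
      say $P2                           # Foo
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
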
\section{I/O}

\index{FileHandle PMC} Parrot handles all I/O with a set of PMCs. The \texttt{FileHandle} PMC takes care of reading from and writing to f\mbox{}iles and f\mbox{}ile-like streams. The \texttt{Socket} PMC takes care of network I/O.

\subsection*{FileHandle Opcodes}

The \texttt{open}\index{open opcode} opcode opens a new f\mbox{}ilehandle. It takes a string argument, which is the path to the f\mbox{}ile:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'io_ops'

  # ...

  $P0 = open 'my/file/name.txt'\end{verbatim}
\vspace{-6pt}
\normalsize
By default, it opens the f\mbox{}ilehandle as read-only, but an optional second string argument can specify the mode for the f\mbox{}ile. The modes are \texttt{r} for read, \texttt{w} for write, \texttt{a} for append, and \texttt{p} for pipe:\footnote{The read, write, and append modes resemble the mode strings of the C \texttt{fopen} function, so they may be familiar.}

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'io_ops'

  # ...

  $P0 = open 'my/file/name.txt', 'a'

  $P0 = open 'myfile.txt', 'r'\end{verbatim}
\vspace{-6pt}
\normalsize
You can combine modes; a handle that can read and write uses the mode string \texttt{rw}. A handle that can read and write but will not overwrite the existing contents uses \texttt{ra} instead.

The \texttt{close}\index{close opcode} opcode closes a f\mbox{}ilehandle when it's no longer needed. Closing a f\mbox{}ilehandle doesn't destroy the object; it only makes that f\mbox{}ilehandle object available for opening a dif\mbox{}ferent f\mbox{}ile.\footnote{It's generally not a good idea to manually close the standard input, standard output, or standard error f\mbox{}ilehandles, though you can recreate them.}

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'io_ops'

  # ...

  close $P0\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{print}\index{print opcode} opcode prints a string argument or the string form of an integer, number, or PMC to a f\mbox{}ilehandle:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  print $P0, 'Nobody expects'\end{verbatim}
\vspace{-6pt}
\normalsize
It also has a one-argument variant that always prints to standard output:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  print 'the Spanish Inquisition'\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{say}\index{say opcode} opcode also prints to standard output, but it appends a trailing newline to whatever it prints. Another opcode worth mentioning is the \texttt{printerr}\index{printerr opcode} opcode, which prints an argument to the standard error instead of standard output:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  say 'Turnip'

  # ...

  .loadlib 'io_ops'

  # ...

  printerr 'Blancmange'\end{verbatim}
\vspace{-6pt}
\normalsize
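A minimal complete program using all three output opcodes might look like the sketch below. As in the example above, the \texttt{.loadlib} line is there for \texttt{printerr}. Unlike \texttt{say}, neither \texttt{print} nor \texttt{printerr} adds a newline, so the last line supplies its own:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'io_ops'

  .sub 'main' :main
    print 'Nobody expects '         # standard output, no newline
    say 'the Spanish Inquisition'   # standard output, newline appended
    printerr "Blancmange\n"         # standard error, explicit newline
  .end\end{verbatim}
\vspace{-6pt}
\normalsize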
The \texttt{read}\index{read opcode} and \texttt{readline}\index{readline opcode} opcodes read values from a f\mbox{}ilehandle. \texttt{read} takes an integer count and returns a string of up to that many characters, or fewer if the f\mbox{}ilehandle runs out of data. \texttt{readline} reads a line of input from a f\mbox{}ilehandle and returns it as a string, including the trailing newline if the line had one:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'io_ops'

  $S0 = read $P0, 10

  $S0 = readline $P0\end{verbatim}
\vspace{-6pt}
\normalsize
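Because \texttt{read} may return fewer characters than requested, a loop that reads a whole f\mbox{}ile in f\mbox{}ixed-size pieces should stop on an empty result rather than rely on full chunks. The sketch below assumes that \texttt{read} returns an empty string once nothing is left; the chunk size of 16 is arbitrary:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'io_ops'

  .sub 'main' :main
    $P0 = open 'myfile.txt', 'r'
  chunk_loop:
    $S0 = read $P0, 16         # ask for up to 16 characters
    $I0 = length $S0
    unless $I0 goto done       # empty string: nothing was left
    print $S0
    goto chunk_loop
  done:
    close $P0
  .end\end{verbatim}
\vspace{-6pt}
\normalsize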
The \texttt{read} opcode has a one-argument variant that reads from standard input:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'io_ops'

  # ...

  $S0 = read 10\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{getstdin}\index{getstdin opcode}, \texttt{getstdout}\index{getstdout opcode}, and \texttt{getstderr}\index{getstderr opcode} opcodes fetch the f\mbox{}ilehandle objects for the standard streams: standard input, standard output, and standard error:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'io_ops'

  # ...

  $P0 = getstdin    # Standard input handle
  $P1 = getstdout   # Standard output handle
  $P2 = getstderr   # Standard error handle\end{verbatim}
\vspace{-6pt}
\normalsize
Once you have the f\mbox{}ilehandle for one of the standard streams, you can use it just like any other f\mbox{}ilehandle object:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'io_ops'

  # ...

  $P0 = getstdout
  print $P0, 'hello'\end{verbatim}
\vspace{-6pt}
\normalsize
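For instance, this sketch copies standard input to standard output one line at a time, using exactly the opcodes shown earlier for ordinary f\mbox{}iles. It assumes that \texttt{readline} returns an empty string only when the input is exhausted:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'io_ops'

  .sub 'main' :main
    $P0 = getstdin
    $P1 = getstdout
  echo_loop:
    $S0 = readline $P0
    $I0 = length $S0
    unless $I0 goto done       # empty string: end of input
    print $P1, $S0             # same opcode as for any file
    goto echo_loop
  done:
    .return ()
  .end\end{verbatim}
\vspace{-6pt}
\normalsize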
The following example reads data from the f\mbox{}ile \emph{myf\mbox{}ile.txt} one line at a time using the \texttt{readline} opcode. As it loops over the lines of the f\mbox{}ile, it checks the boolean value of the read-only f\mbox{}ilehandle \texttt{\$P1} to test whether the f\mbox{}ilehandle has reached the end of the f\mbox{}ile:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'io_ops'

  .sub 'main'
    $P0 = getstdout
    $P1 = open 'myfile.txt', 'r'
    loop_top:
      $S0 = readline $P1
      print $P0, $S0
      if $P1 goto loop_top
    close $P1
  .end\end{verbatim}
\vspace{-6pt}
\normalsize
\subsection*{FileHandle Methods}

The methods available on a f\mbox{}ilehandle object are mostly duplicates of the opcodes, though sometimes they provide more options. Behind the scenes many of the opcodes call the f\mbox{}ilehandle's methods anyway, so the choice between the two is more a matter of style preference than anything else.

\subsubsection*{open}

The \texttt{open}\index{open method} method opens a stream in an existing f\mbox{}ilehandle object. It takes two optional string arguments: the name of the f\mbox{}ile to open and the open mode.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new 'FileHandle'
  $P0.'open'('myfile.txt', 'r')\end{verbatim}
\vspace{-6pt}
\normalsize
The \texttt{open} opcode internally creates a new f\mbox{}ilehandle PMC and calls its \texttt{open} method. The opcode version is shorter to write, but it creates a new PMC on every call, whereas the method can reopen an existing f\mbox{}ilehandle PMC with a new f\mbox{}ile.

When reopening a f\mbox{}ilehandle, Parrot will reuse the previous f\mbox{}ilename associated with the f\mbox{}ilehandle unless you provide a dif\mbox{}ferent f\mbox{}ilename. The same goes for the mode.
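
The sketch below reuses a single \texttt{FileHandle} PMC: it writes a f\mbox{}ile, reopens the same object to append to it, and f\mbox{}inally reads the result back with \texttt{readall}, without ever creating a second PMC. The f\mbox{}ile name \emph{log.txt} is a placeholder, and each \texttt{open} call passes its arguments explicitly rather than relying on the remembered f\mbox{}ilename and mode:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    $P0 = new 'FileHandle'

    # First use: create the file and write a line
    $P0.'open'('log.txt', 'w')
    $P0.'print'("entry 1\n")
    $P0.'close'()

    # Second use: the same PMC, reopened in append mode
    $P0.'open'('log.txt', 'a')
    $P0.'print'("entry 2\n")
    $P0.'close'()

    # readall on a closed handle opens, reads, and closes the file
    $S0 = $P0.'readall'('log.txt')
    print $S0
  .end\end{verbatim}
\vspace{-6pt}
\normalsize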

\subsubsection*{close}

The \texttt{close}\index{close method} method closes the f\mbox{}ilehandle. This does not destroy the f\mbox{}ilehandle object; you can reopen it with the \texttt{open} method later.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0.'close'()\end{verbatim}
\vspace{-6pt}
\normalsize
\subsubsection*{is\_closed}

The \texttt{is\_closed}\index{is\_closed method} method checks if the f\mbox{}ilehandle is closed. It returns true if the f\mbox{}ilehandle has been closed or was never opened, and false if it is currently open:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = $P0.'is_closed'()\end{verbatim}
\vspace{-6pt}
\normalsize
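One use for \texttt{is\_closed} is a cleanup routine that is safe to call more than once. The subroutine name \texttt{close\_if\_open} in this sketch is hypothetical:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'close_if_open'
    .param pmc fh
    $I0 = fh.'is_closed'()
    if $I0 goto already_closed   # nothing to do
    fh.'close'()
  already_closed:
    .return ()
  .end\end{verbatim}
\vspace{-6pt}
\normalsize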
\subsubsection*{print}

The \texttt{print}\index{print method} method prints a given value to the f\mbox{}ilehandle. The argument can be an integer, number, string, or PMC.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0.'print'('Hello!')\end{verbatim}
\vspace{-6pt}
\normalsize
\subsubsection*{puts}

The \texttt{puts}\index{puts method} method is similar to \texttt{print}, but it only takes a string argument.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0.'puts'('Hello!')\end{verbatim}
\vspace{-6pt}
\normalsize
\subsubsection*{read}

The \texttt{read}\index{read method} method reads a specif\mbox{}ied number of bytes from the f\mbox{}ilehandle object and returns them in a string.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = $P0.'read'(10)\end{verbatim}
\vspace{-6pt}
\normalsize
If fewer bytes remain in the f\mbox{}ilehandle than were requested, \texttt{read} returns a string containing only the remaining bytes.

\subsubsection*{readline}

The \texttt{readline}\index{readline method} method reads an entire line up to a newline character or the end-of-f\mbox{}ile mark from the f\mbox{}ilehandle object and returns it in a string.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = $P0.'readline'()\end{verbatim}
\vspace{-6pt}
\normalsize
\subsubsection*{readline\_interactive}

The \texttt{readline\_interactive}\index{readline\_interactive method} method is useful for command-line scripts. It writes the single argument to the method as a prompt to the screen, then reads back a line of input.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = $P0.'readline_interactive'('Please enter your name:')\end{verbatim}
\vspace{-6pt}
\normalsize
\subsubsection*{readall}

The \texttt{readall}\index{readall method} method reads an entire f\mbox{}ile. If the f\mbox{}ilehandle is closed, it opens the f\mbox{}ile named by the string argument, reads the entire f\mbox{}ile, and then closes the f\mbox{}ilehandle.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = $P0.'readall'('myfile.txt')\end{verbatim}
\vspace{-6pt}
\normalsize
If the f\mbox{}ilehandle is already open, \texttt{readall} will read the contents of the f\mbox{}ile, and won't close the f\mbox{}ilehandle when it's f\mbox{}inished. Don't pass the name argument when working with a f\mbox{}ile you've already opened.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = $P0.'readall'()\end{verbatim}
\vspace{-6pt}
\normalsize
\subsubsection*{mode}

The \texttt{mode}\index{mode method} method returns the current f\mbox{}ile access mode for the f\mbox{}ilehandle object.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = $P0.'mode'()\end{verbatim}
\vspace{-6pt}
\normalsize
\subsubsection*{encoding}

The \texttt{encoding}\index{encoding method} method sets or retrieves the string encoding behavior of the f\mbox{}ilehandle.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
 $P0.'encoding'('utf8')
 $S0 = $P0.'encoding'()\end{verbatim}
\vspace{-6pt}
\normalsize
See ``Encodings and Charsets'' in Chapter 4 for more details on the encodings that Parrot supports.
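
As a sketch, the program below reads a f\mbox{}ile as UTF-8 and compares its length in characters with its length in bytes. It assumes that \emph{unicode.txt} (a placeholder) contains UTF-8 text and that \texttt{readall} decodes the data using the encoding set on the handle; under those assumptions the two counts dif\mbox{}fer whenever the text contains non-ASCII characters:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    $P0 = new 'FileHandle'
    $P0.'open'('unicode.txt', 'r')
    $P0.'encoding'('utf8')     # treat the data as UTF-8 text
    $S0 = $P0.'readall'()
    $P0.'close'()

    $I0 = length $S0           # length in characters
    $I1 = bytelength $S0       # length in bytes
    print $I0
    print ' characters, '
    print $I1
    say ' bytes'
  .end\end{verbatim}
\vspace{-6pt}
\normalsize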

\subsubsection*{buf\mbox{}fer\_type}

The \texttt{buf\mbox{}fer\_type}\index{buf\mbox{}fer\_type method} method sets or retrieves the buf\mbox{}fering behavior of the f\mbox{}ilehandle object. The argument or return value is one of: \texttt{unbuf\mbox{}fered} to disable buf\mbox{}fering, \texttt{line-buf\mbox{}fered} to read or write when the f\mbox{}ilehandle encounters a line ending, or \texttt{full-buf\mbox{}fered} to read or write bytes when the buf\mbox{}fer is full.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0.'buffer_type'('full-buffered')
  $S0 = $P0.'buffer_type'()\end{verbatim}
\vspace{-6pt}
\normalsize
\subsubsection*{buf\mbox{}fer\_size}

The \texttt{buf\mbox{}fer\_size}\index{buf\mbox{}fer\_size method} method sets or retrieves the buf\mbox{}fer size of the f\mbox{}ilehandle object.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0.'buffer_size'(1024)
  $I0 = $P0.'buffer_size'()\end{verbatim}
\vspace{-6pt}
\normalsize
The buf\mbox{}fer size set on the f\mbox{}ilehandle is only a suggestion. Parrot may allocate a larger buf\mbox{}fer, but it will never allocate a smaller buf\mbox{}fer.

\subsubsection*{f\mbox{}lush}

The \texttt{f\mbox{}lush}\index{f\mbox{}lush method} method f\mbox{}lushes the buf\mbox{}fer if the f\mbox{}ilehandle object is working in a buf\mbox{}fered mode.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0.'flush'()\end{verbatim}
\vspace{-6pt}
\normalsize
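A typical pattern is to choose full buf\mbox{}fering for a f\mbox{}ile that receives many small writes and to call \texttt{f\mbox{}lush} wherever the data must actually reach the f\mbox{}ile. In this sketch, \emph{output.log} and the 8192-byte size are arbitrary choices:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    $P0 = new 'FileHandle'
    $P0.'open'('output.log', 'w')
    $P0.'buffer_type'('full-buffered')
    $P0.'buffer_size'(8192)    # only a hint; Parrot may use more
    $P0.'print'("first record\n")
    $P0.'flush'()              # write the buffered bytes now
    $P0.'print'("second record\n")
    $P0.'close'()
  .end\end{verbatim}
\vspace{-6pt}
\normalsize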
\subsubsection*{eof}

The \texttt{eof}\index{eof method} method checks whether a f\mbox{}ilehandle object has reached the end of the current f\mbox{}ile. It returns true if the f\mbox{}ilehandle is at the end of the current f\mbox{}ile and false otherwise.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = $P0.'eof'()\end{verbatim}
\vspace{-6pt}
\normalsize
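The \texttt{eof} method supports a method-based version of the line-reading loop shown earlier with opcodes. This sketch assumes the end-of-f\mbox{}ile condition is only detected once a read actually runs out of data, so the f\mbox{}inal pass may read and print an empty string, which is harmless:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    $P0 = new 'FileHandle'
    $P0.'open'('myfile.txt', 'r')
  line_loop:
    $I0 = $P0.'eof'()
    if $I0 goto done
    $S0 = $P0.'readline'()
    print $S0                  # may be empty on the final pass
    goto line_loop
  done:
    $P0.'close'()
  .end\end{verbatim}
\vspace{-6pt}
\normalsize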
\subsubsection*{isatty}

The \texttt{isatty}\index{isatty method} method returns a boolean value indicating whether the f\mbox{}ilehandle is connected to a TTY (a terminal).

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = $P0.'isatty'()\end{verbatim}
\vspace{-6pt}
\normalsize
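A common use is to adapt output to whether a person is watching. The sketch below tests standard output; running it directly and then with its output redirected to a f\mbox{}ile should exercise both branches:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .loadlib 'io_ops'

  .sub 'main' :main
    $P0 = getstdout
    $I0 = $P0.'isatty'()
    if $I0 goto on_terminal
    say 'standard output is redirected'
    .return ()
  on_terminal:
    say 'standard output is a terminal'
  .end\end{verbatim}
\vspace{-6pt}
\normalsize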
\subsubsection*{get\_fd}

The \texttt{get\_fd}\index{get\_fd method} method returns the integer f\mbox{}ile descriptor of the current f\mbox{}ilehandle object. Not all operating systems use integer f\mbox{}ile descriptors; on those that don't, the method simply returns \texttt{-1}.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = $P0.'get_fd'()\end{verbatim}
\vspace{-6pt}
\normalsize
\section{Exceptions}

\index{exceptions} Exceptions provide a way of subverting the normal f\mbox{}low of control. Their main use is error reporting and cleanup tasks, but sometimes exceptions are just a funny way to jump from one code location to another one. Parrot uses a robust exception mechanism and makes it available to PIR.

Exceptions are objects that hold essential information about an exceptional situation: the error message, the severity and type of the error, the location of the error, and backtrace information about the chain of calls that led to the error. Exception handlers are subroutine-like objects, but user code never calls them directly. Instead, Parrot invokes an appropriate exception handler to catch a thrown exception.

\subsection*{Throwing Exceptions}

\index{exceptions; throwing} The \texttt{throw}\index{throw opcode} opcode throws an exception object. This example creates a new \texttt{Exception}\index{Exception PMC} object in \texttt{\$P0} and throws it:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new 'Exception'
  throw $P0\end{verbatim}
\vspace{-6pt}
\normalsize
Setting the string value of an exception object sets its error message:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = new 'Exception'
  $P0 = "I really had my heart set on halibut."
  throw $P0\end{verbatim}
\vspace{-6pt}
\normalsize
Other parts of Parrot throw their own exceptions. The \texttt{die}\index{die opcode} opcode throws a fatal (that is, uncatchable) exception. Many opcodes throw exceptions to indicate error conditions. The \texttt{/} operator (the \texttt{div} opcode), for example, throws an exception on attempted division by zero.

When no appropriate handlers are available to catch an exception, Parrot treats it as a fatal error and exits, printing the exception message followed by a backtrace showing the location of the thrown exception:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  I really had my heart set on halibut.
  current instr.: 'main' pc 6 (pet_store.pir:4)\end{verbatim}
\vspace{-6pt}
\normalsize
\subsection*{Catching Exceptions}

\index{exception handlers} \index{exceptions; catching} Exception handlers catch exceptions, making it possible to recover from errors in a controlled way, instead of terminating the process entirely.

The \texttt{push\_eh}\index{push\_eh opcode} opcode creates an exception handler object and stores it in the list of currently active exception handlers. The body of the exception handler is a labeled section of code inside the same subroutine as the call to \texttt{push\_eh}. The opcode takes one argument, the name of the label:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  push_eh my_handler
    $P0 = new 'Exception'
    throw $P0

    say 'never printed'

  my_handler:
    say 'caught an exception'\end{verbatim}
\vspace{-6pt}
\normalsize
This example creates an exception handler with a destination address of the \texttt{my\_handler} label, then creates a new exception and throws it. At this point, Parrot checks to see if there are any appropriate exception handlers in the currently active list. It f\mbox{}inds \texttt{my\_handler} and runs it, printing ``caught an exception''. The ``never printed'' line never runs, because the exceptional control f\mbox{}low skips right over it.

Because Parrot scans the list of active handlers from newest to oldest, you don't want to leave exception handlers lying around when you're done with them. The \texttt{pop\_eh}\index{pop\_eh opcode} opcode removes an exception handler from the list of currently active handlers:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  push_eh my_handler
    $I0 = $I1 / $I2
    pop_eh

    say 'maybe printed'

    goto skip_handler

  my_handler:
    say 'caught an exception'
    pop_eh

  skip_handler:\end{verbatim}
\vspace{-6pt}
\normalsize
This example creates the exception handler \texttt{my\_handler} and then performs a division that throws a ``division by zero'' exception when \texttt{\$I2} is 0. In that case the handler catches the exception, prints ``caught an exception'', and then removes itself with \texttt{pop\_eh}. When \texttt{\$I2} is non-zero, no exception occurs: the code removes the handler with \texttt{pop\_eh} and prints ``maybe printed''. The \texttt{goto} then skips over the handler's code, which is just a labeled unit of code within the subroutine and would otherwise run next.
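
Wrapping the division from the example above in its own subroutine gives a complete program. The name \texttt{safe\_div} and the fallback value of 0 are illustrative choices, not part of Parrot:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    $I0 = 'safe_div'(10, 2)
    say $I0                  # prints 5
    $I0 = 'safe_div'(1, 0)
    say $I0                  # prints 0, the fallback value
  .end

  .sub 'safe_div'
    .param int a
    .param int b
    push_eh div_failed
    $I0 = a / b              # throws if b is 0
    pop_eh
    .return ($I0)
  div_failed:
    pop_eh                   # remove the handler before leaving
    .return (0)
  .end\end{verbatim}
\vspace{-6pt}
\normalsize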

The exception object provides access to various attributes of the exception for additional information about what kind of error it was, and what might have caused it. The directive \texttt{.get\_results}\index{.get\_results directive} retrieves the \texttt{Exception} object from inside the handler:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  my_handler:
    .get_results($P0)\end{verbatim}
\vspace{-6pt}
\normalsize
Not all handlers are able to handle all kinds of exceptions. If a handler determines that it's caught an exception it can't handle, it can \texttt{rethrow} the exception to the next handler in the list of active handlers:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  my_handler:
    .get_results($P0)
    rethrow $P0\end{verbatim}
\vspace{-6pt}
\normalsize
If none of the active handlers can handle the exception, the exception becomes a fatal error. Parrot will exit, just as if it could f\mbox{}ind no handlers.

\index{exceptions;resuming} \index{resumable exceptions}

An exception handler creates a return continuation with a snapshot of the current interpreter context. If the handler is successful, it can resume running at the instruction immediately after the one that threw the exception. This resume continuation is available from the \texttt{resume} attribute of the exception object. To resume after the exception handler is complete, invoke the resume continuation like an ordinary subroutine:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  my_handler:
    .get_results($P0)
    $P1 = $P0['resume']
    $P1()\end{verbatim}
\vspace{-6pt}
\normalsize
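Putting these pieces together, the following sketch catches an exception, reports its message, and then resumes at the \texttt{say} that follows the \texttt{throw}, so the program carries on as though nothing had gone wrong:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    push_eh handler
    $P0 = new 'Exception'
    $P0 = 'a recoverable problem'
    throw $P0
    say 'back after the throw'   # reached only by resuming
    pop_eh
    .return ()

  handler:
    .get_results($P1)
    $S0 = $P1                    # the exception's message
    print 'handled: '
    say $S0
    $P2 = $P1['resume']
    $P2()                        # continue after the throw
  .end\end{verbatim}
\vspace{-6pt}
\normalsize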
\subsection*{Exception PMC}

\index{Exception PMC} \index{Exception PMC;message} \texttt{Exception} objects contain several useful pieces of information about the exception. To set and retrieve the exception message, use the \texttt{message} key on the exception object:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0            = new 'Exception'
  $P0['message'] = "this is an error message for the exception"\end{verbatim}
\vspace{-6pt}
\normalsize
\ldots or set and retrieve the string value of the exception object directly:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $S0 = $P0\end{verbatim}
\vspace{-6pt}
\normalsize
\index{Exception PMC;severity} \index{Exception PMC;type} The severity and type of the exception are both integer values:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0['severity'] = 1
  $P0['type']     = 2\end{verbatim}
\vspace{-6pt}
\normalsize
\index{Exception PMC;payload} The payload holds any user-def\mbox{}ined data attached to the exception object:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0['payload'] = $P2\end{verbatim}
\vspace{-6pt}
\normalsize
The attributes of the exception are useful in the handler for making decisions about how and whether to handle an exception and report its results:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  my_handler:
    .get_results($P2)
    $S0 = $P2['message']
    print 'caught exception: "'
    print $S0
    $I0 = $P2['type']
    print '", of type '
    say $I0\end{verbatim}
\vspace{-6pt}
\normalsize
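The sketch below ties these attributes together. It attaches a \texttt{Hash} payload to an exception, and the handler checks the \texttt{type} attribute to decide whether to handle the exception or \texttt{rethrow} it. The type 42, the severity 2, and the \texttt{path} payload key are arbitrary values chosen for illustration:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    push_eh handler
    $P0             = new 'Exception'
    $P0['message']  = 'disk full'
    $P0['severity'] = 2
    $P0['type']     = 42
    $P1             = new 'Hash'
    $P1['path']     = 'out.txt'
    $P0['payload']  = $P1
    throw $P0
    pop_eh
    .return ()

  handler:
    .get_results($P2)
    pop_eh
    $I0 = $P2['type']
    if $I0 != 42 goto not_mine   # only type 42 is handled here
    $P3 = $P2['payload']
    $S0 = $P3['path']
    print 'could not write '
    say $S0
    .return ()
  not_mine:
    rethrow $P2                  # pass it on to an outer handler
  .end\end{verbatim}
\vspace{-6pt}
\normalsize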
\subsection*{ExceptionHandler PMC}

\index{ExceptionHandler PMC} Exception handlers are subroutine-like PMC objects, derived from Parrot's \texttt{Continuation} type. When you use \texttt{push\_eh} with a label to create an exception handler, Parrot creates the handler PMC for you. You can also create it directly by creating a new \texttt{ExceptionHandler} object, and setting its destination address to the label of the handler using the \texttt{set\_addr} opcode\index{set\_addr opcode}:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P0 = new 'ExceptionHandler'
    set_addr $P0, my_handler
    push_eh $P0
    # ...

  my_handler:
    # ...\end{verbatim}
\vspace{-6pt}
\normalsize
\index{can\_handle method} \texttt{ExceptionHandler} PMCs have several methods for setting or checking handler attributes. The \texttt{can\_handle} method reports whether the handler is willing or able to handle a particular exception. It takes one argument, the exception object to test:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = $P0.'can_handle'($P1)\end{verbatim}
\vspace{-6pt}
\normalsize
\index{min\_severity method} \index{max\_severity method} The \texttt{min\_severity} and \texttt{max\_severity} methods set and retrieve the severity attributes of the handler, allowing it to refuse to handle any exceptions whose severity is too high or too low. Both take a single optional integer argument to set the severity; both return the current value of the attribute as a result:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0.'min_severity'(5)
  $I0 = $P0.'max_severity'()\end{verbatim}
\vspace{-6pt}
\normalsize
\index{handle\_types method} \index{handle\_types\_except method} The \texttt{handle\_types} and \texttt{handle\_types\_except} methods tell the exception handler what types of exceptions it should or shouldn't handle. Both take a list of integer types, which correspond to the \texttt{type} attribute set on an exception object:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0.'handle_types'(5, 78, 42)\end{verbatim}
\vspace{-6pt}
\normalsize
The following example creates an exception handler that only handles exception types 1 and 2. Instead of having \texttt{push\_eh} create the exception handler object, it creates a new \texttt{ExceptionHandler} object manually. It then calls \texttt{handle\_types} to identify the exception types it will handle:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
    $P0 = new 'ExceptionHandler'
    set_addr $P0, my_handler
    $P0.'handle_types'(1, 2)
    push_eh $P0
    # ...
  my_handler:
    # ...\end{verbatim}
\vspace{-6pt}
\normalsize
This handler can only handle exception objects with a type of 1 or 2. Parrot will skip over this handler for all other exception types:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P1         = new 'Exception'
  $P1['type'] = 2
  throw $P1                     # caught

  $P1         = new 'Exception'
  $P1['type'] = 3
  throw $P1                     # uncaught\end{verbatim}
\vspace{-6pt}
\normalsize
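Because Parrot tries the newest handler f\mbox{}irst, a type-restricted handler can sit in front of a general one. In this sketch the inner handler accepts only types 1 and 2, so the type 3 exception passes it by and reaches the outer handler created by the plain \texttt{push\_eh}, which places no restriction on types:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    push_eh outer_handler           # accepts any exception type

    $P0 = new 'ExceptionHandler'
    set_addr $P0, inner_handler
    $P0.'handle_types'(1, 2)
    push_eh $P0                     # newest, so Parrot tries it first

    $P1         = new 'Exception'
    $P1['type'] = 3
    $P1         = 'a type 3 problem'
    throw $P1                       # inner_handler refuses type 3

    pop_eh
    pop_eh
    .return ()

  inner_handler:
    say 'inner handler'
    .return ()

  outer_handler:
    .get_results($P2)
    $S0 = $P2
    print 'outer handler caught: '
    say $S0
  .end\end{verbatim}
\vspace{-6pt}
\normalsize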
\subsection*{Annotations}

\index{bytecode annotations} Annotations are pieces of metadata stored in a bytecode f\mbox{}ile. They are especially important when dealing with high-level languages, where annotations contain information about the HLL's source code such as the current line number and f\mbox{}ile name.

Create an annotation with the \texttt{.annotate}\index{.annotate directive} directive. An annotation consists of a key/value pair, where the key is a string and the value is an integer or a string. Bytecode stores annotations as constants in the compiled bytecode, so the value cannot be a PMC.

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .annotate 'file', 'mysource.lang'
  .annotate 'line', 42
  .annotate 'compiletime', '0.3456'\end{verbatim}
\vspace{-6pt}
\normalsize
Annotations exist, or are ``in force'', throughout the entire subroutine or until they are redef\mbox{}ined. Creating a new annotation with the same name as an old one overwrites it with the new value. The \texttt{annotations}\index{annotations opcode} opcode retrieves the current hash of annotations:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .annotate 'line', 1
  $P0 = annotations # {'line' => 1}

  .annotate 'line', 2
  $P0 = annotations # {'line' => 2}\end{verbatim}
\vspace{-6pt}
\normalsize
To retrieve a single annotation, pass its name to the \texttt{annotations} opcode:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P0 = annotations 'line'\end{verbatim}
\vspace{-6pt}
\normalsize
Exception objects contain information about the annotations that were in force when the exception was thrown. Retrieve them with the \texttt{annotations}\index{annotations method} method on the exception PMC object:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $I0 = $P0.'annotations'('line')  # only the 'line' annotation
  $P1 = $P0.'annotations'()        # hash of all annotations\end{verbatim}
\vspace{-6pt}
\normalsize
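This is how a compiler for a high-level language can report source positions from inside an exception handler. In the sketch below, the annotations stand in for what such a compiler might emit (\emph{mysource.lang} and the line numbers are made up), and the handler reads them back from the exception:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    .annotate 'file', 'mysource.lang'
    .annotate 'line', 10
    push_eh handler

    .annotate 'line', 11
    $P0 = new 'Exception'
    $P0 = 'something broke'
    throw $P0                  # 'line' is 11 at this point
    .return ()

  handler:
    .get_results($P1)
    pop_eh
    $S0 = $P1
    $P2 = $P1.'annotations'('file')
    $P3 = $P1.'annotations'('line')
    print $S0
    print ' at '
    print $P2
    print ':'
    say $P3
  .end\end{verbatim}
\vspace{-6pt}
\normalsize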
Exceptions can also include a backtrace\index{backtrace method} to display the program f\mbox{}low to the point of the throw:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  $P1 = $P0.'backtrace'()\end{verbatim}
\vspace{-6pt}
\normalsize
The backtrace PMC is an array of hashes. Each element in the array corresponds to a function in the current call chain. Each hash has two elements: \texttt{annotation} (the hash of annotations in ef\mbox{}fect at that point) and \texttt{sub} (the Sub PMC of that function).
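
As a sketch, the handler below walks the backtrace of an exception thrown from a nested call and prints the name of each subroutine in the chain. It assumes that a Sub PMC's string value is its name and skips any entry whose \texttt{sub} element is null; the subroutine name \texttt{inner} is hypothetical:

\vspace{-6pt}
\scriptsize
\begin{verbatim}
  .sub 'main' :main
    push_eh handler
    'inner'()
    pop_eh
    .return ()

  handler:
    .get_results($P0)
    pop_eh
    $P1 = $P0.'backtrace'()
    $I0 = elements $P1         # one entry per call frame
    $I1 = 0
  frame_loop:
    if $I1 >= $I0 goto frames_done
    $P2 = $P1[$I1]
    $P3 = $P2['sub']
    if null $P3 goto next_frame
    $S0 = $P3                  # assumed: a Sub stringifies to its name
    say $S0
  next_frame:
    inc $I1
    goto frame_loop
  frames_done:
    .return ()
  .end

  .sub 'inner'
    $P0 = new 'Exception'
    $P0 = 'thrown from inner'
    throw $P0
  .end\end{verbatim}
\vspace{-6pt}
\normalsize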

\end{document}