Sophie

Sophie

distrib > Mageia > 1 > i586 > by-pkgid > d92aa75c2d384ff9f513aed09a46f703 > files > 378

parrot-doc-3.1.0-2.mga1.i586.rpm

# Copyright (C) 2001-2010, Parrot Foundation.

=head1 [DRAFT] PDD 8: PMC Keys

=head2 Abstract

This PDD aims to clear up the confusion regarding the implementation of keyed
access to PMCs in Parrot.

=head2 Version

$Revision$

=head2 Description

First, let's define some terminology. An C<aggregate PMC> is one which stores
or references other values as elements.  The aggregate PMC allows indexed
access to element by implementing some of the C<_keyed> variants of VTABLE
functions.  These variants are called C<indexing> operations, as they act on a
specific indexed element of an aggregate PMC.  Examples of a aggregate PMCs
include C<Hash>, C<FixedIntegerArray> and C<ResizablePMCArray>.

Non-aggregates may also support C<_keyed> variants of the VTABLE functions,
but they not do anything particularly clever.  For instance, PMC types
implementing Perl references will merely pass the index on to the referent.
These aren't aggregates because they don't directly store or reference
elements.

Indexing operations take one or more aggregate B<keys>.  At runtime these
operations will index into the B<aggregate> using the C<key> and return a
B<value>.  Here is a well-known indexing operation in Perl 6:

    @a[12] = $b;

The B<key> here is the constant integer C<12>  The aggregate is the
C<Perl6Array> C<@a>.  In the process of this assignment, Parrot will have to
extract the PMC in element 12 of the array, producing a C<value>.
C<$b> is then assigned to this value.

Now, how does this all get implemented?

=head2 Implementation

=head3 The key structure

The key structure must bundle multiple keys.  This is to allow indexing into
multidimensional aggregate PMCs.  These keys may be specified as integer,
string, number or PMC.

For this reason the following structure was produced.  Individual keys (e.g. a
single C<Integer> or C<String>) are stored in a C<Key> PMC.  The type of the
key is encoded in the private flags of the PMC as specified below.   The value
of the C<Key> PMC is stored in the PMC's data structure internally and can be
accessed using the appropriate get_* VTABLE functions.

For example, indexing the multidimensional array C<@foo[$a,12;"hi"]>
produces three PMCs; one with a PMC type, one with an integer type
and one with a string type.

The key type is encoded in the PMC flags using 8 bits based on the following
scheme (from includes/parrot/key.h):

    typedef enum {
        KEY_integer_FLAG        = PObj_private0_FLAG,
        KEY_number_FLAG         = PObj_private1_FLAG,
        KEY_hash_iterator_FLAGS = PObj_private0_FLAG | PObj_private1_FLAG,
        KEY_string_FLAG         = PObj_private2_FLAG,
        KEY_pmc_FLAG            = PObj_private3_FLAG,
        KEY_register_FLAG       = PObj_private4_FLAG,

        KEY_type_FLAGS          = KEY_integer_FLAG   |
                                  KEY_number_FLAG    |
                                  KEY_string_FLAG    |
                                  KEY_pmc_FLAG       |
                                  KEY_register_FLA G |
                                  KEY_hash_iterator_FLAGS
    } KEY_flags


The C<KEY_register_FLAG> is used to indicate that value if the key is in a
register.  In this case, the Key PMC's data contains the integer number of the
appropriate register in the current context.

Parrot must also have a way to combine multiple keys into a key that can be
treated as a single unit.  This is done by forming a singly linked list such
that each key points at the next.  Within a single Key PMC, the pointer to the
next key is stored in C<PMC_data(key)>.  The linked list structure allows the
use of partial keys in multidimensional lookups, since the next key can be
generated while the aggregate PMC is being traversed.

These definitions, along with declarations of support routines used to
manipulate keys, can be found in F<include/parrot/key.h>

=head3 Aggregate and non-aggregate PMCs

We've already said that what separates the aggregate PMCs from the
non-aggregates is their implementation of the C<_keyed> vtable functions. So
it is Hereby Decreed that the default vtable which everyone inherits from
defines the C<_keyed> forms to throw an exception.

=over 3

=item Todo

Need discussion on whether C<EXCEPTION_OUT_OF_BOUNDS> is a good exception for
this, or whether something else should be used. It's really a compiler
screw-up, since code which indexes a non-aggregate shouldn't be generated.

=back

=head3 C<_keyed> vtable functions

So what of these magical C<_keyed> vtable functions? They are generated when
you add the C<keyed> tag to the appropriate entry in F<src/vtable.tbl>. They
are constructed by following B<every> C<PMC> argument with a second C<PMC>
argument which acts as the key for that argument; the name of the second
C<PMC> argument is formed by adding C<_key> onto the end of the first C<PMC>
argument.

The reason why every PMC argument has an associated key is twofold. Firstly,
it means that

    @a[$b] = $c

and

    $a = @b[$c]

use the same vtable function, reducing the multiplicity of methods. Secondly,
a three-argument C<assign> as suggested by the code above would be ambiguous -
the code above uses 3 PMCs in different ways.

Also, operations which take an aggregate key for one of their arguments should
take aggregate keys for B<all> of their arguments. This is to avoid the
following:

    void foo_keyed_i(PMC* x, PMC* x_key, INT a)
    void foo_keyed_n(PMC* x, PMC* x_key, NUM a)
    void foo_keyed_p(PMC* x, PMC* x_key, PMC a)
    void foo_keyed_p_keyed(PMC* x, PMC* x_key, PMC* a, PMC* a_key)

These are all replaced with the single entry

    void foo_keyed(PMC* x, PMC* a_key, PMC* a, PMC* a_key)

(Think how much worse it gets when there are three or more PMCs in an
entry...)

Yes. This means that you may need to turn some things into C<PMC>s that you
didn't want to. Since the alternative is mega pollution and duplication in the
vtable table, and since the majority of things that you'll deal with in a real
world situation are expected to be C<PMC>s anyway, this shouldn't be too much
of a problem.

So, if you have a PMC in a C<_keyed> method which you don't want to index,
pass in C<NULL> instead of a real key. Code implementing these methods should
understand C<PMC* foo, PMC* NULL> as meaning the entirety of C<foo> in some
sense; this is trivial to understand if C<foo> is non-aggregate, and
implementation-defined if C<foo> is aggregate. If you remember that a key PMC
is really a linked list, you'll notice that after traversing down through the
list, you'll reach a C<NULL> which again means the entirety of whatever object
you traversed to.

Similarly, non-C<_keyed> methods on aggregates are implementation defined; for
instance, a C<set_integer> on a C<PerlArray> may be understood as setting
C<@array.length>.

Historically, we first implemented keys as two separate keyed methods per
applicable method - C<..._index> and C<..._index_s> for integer and string
indexing respectively. However, this didn't give us the flexibility and
scalability that key structures give us.

=head3 Input to the assembler

There are several different valid specifications of an aggregate key to the
assembler. These are:

    op arg, P1[1234]  # Constant integer key
    op arg, P1[I1]    # Integer key

    op arg, P1[12.34] # Constant number key - handled as constant key
    op arg, P1["foo"] # Constant string key - handled as constant key
    op arg, P1[I1;I2] # Multi-level key - handled as constant key

    op arg, P1[P1]    # Register key

(Rationale: fits programmer's expectation, easier to understand at a glance
than C<op P1, P2, P3>. Also, is C<op P1, P2, P3> the same as C<op P1[P2], P3>
or C<op P1, P2[P3]>, or are these three separate PMCs?)

In all there are four types of key. The first two are integer keys and
constant integer keys which are optimisations for the common case of single
level integer keys.

The other two are constant keys, which can handle any combination of constants
and registers with any number of levels; and register keys which are
represented by a single PMC register that is assumed to  contain a PMC of the
Key class.

=head3 What the assembler did next

When the assembler sees an aggregate key, it "detaches" the key to form a
separate argument. It then decides on the type of key. For  integer keys (both
constant and register) the data is encoded in the same way as an ordinary
integer argument. For register keys the data is encoded as for an ordinary PMC
register argument, while for constant keys a key constant is generated that
encodes the list of constants and registers that make up the key and an
appropriate index into the constant table is encoded as the argument.

Next it selects the appropriate op. Register keys have the signature C<k> and
constant keys have the signature C<kc>. Integer register and constant keys are
encoded as C<ki> and C<kic> respectively.

=begin PIR_FRAGMENT

    set $P1["hi"], 1234

=end PIR_FRAGMENT

finds an op named C<set_p_kc_i>. On the other hand,

=begin PIR_FRAGMENT

    set $P1[$P1], 1234

=end PIR_FRAGMENT

produces an op named C<set_p_k_i>. Likewise, this:

=begin PIR_FRAGMENT

    set $P1[1], 1234

=end PIR_FRAGMENT

produces an op named C<set_p_kic>, and this:

=begin PIR_FRAGMENT

    set $P1[$I1], 1234

=end PIR_FRAGMENT

produces an op named C<set_p_ki>.

=head3 Bytecode representation

The bytecode representation of these keys are as follows: constant keys are
treated just like another constant, and are an index into the packfile's
constant table.

Each key in that constant table consists of one word specifying its length in
terms of number of keys. For instance, C<["hi"]> has length 1;
C<["hi";P1;S1;123]> has length 4. Next, each key is specified using two words.
The first word is a type specifier:

    1 - Integer constant
    2 - Number constant
    4 - String constant
    7 - Integer register
    8 - Number register
    9 - PMC register
   10 - String register

and the second word is either a value (for integer constants), a register
number (for registers) or an index into the appropriate constant table.

The type values shown above are actually the C<PARROT_ARG_*> values taken from
F<include/parrot/op.h>.

=head2 References

None.

=cut

__END__
Local Variables:
  fill-column:78
End:
vim: expandtab shiftwidth=4: