Sophie: gap-system-4.4.12-5mdv2010.0 x86

gap-system-4.4.12-5mdv2010.0.x86_64.rpm

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%W  pargap3.tex            ParGAP documentation            Gene Cooperman
%%
%H  $Id: pargap3.tex,v 1.5 2001/11/16 15:57:00 gap Exp $
%%
%Y  Copyright (C) 1999-2001  Gene Cooperman
%Y    See included file, COPYING, for conditions for copying
%%

\Chapter{Basic Concepts for the TOP-C model (MasterSlave)}

TOP-C stands for *Task-Oriented Parallel  C*~\cite{Coo96}.   The  ``TOP-C
model'' is the specific master slave model implemented here.  That  model
has  been  adapted  for  use  in  {\ParGAP}.  The  implementation  is  in
`masslave.g' in {\ParGAP}'s `lib' directory. Note that the functions  and
variables with names `TOPC...' are intended as internal  functions  only,
and should not be used by the {\GAP} programmer.

For the impatient, you may type `MSexample();'  in  a  {\ParGAP}  session
now. If you prefer further hands-on learning in a tutorial style, you may
wish to next read Chapter~"MasterSlave Tutorial". Eventually, if you wish
a deeper understanding of the TOP-C model, you will  need  to  read  this
current section and those that follow.

The initial {\GAP} process is the *master* process, and  all  others  are
*slave* processes. It allows most of the CPU-intensive computations to be
carried  out  on  slave  processes,  which  typically  reside  on  remote
processors. A well-developed  TOP-C  application  should  find  that  the
master process is almost never busy when a slave process is idle, waiting
for a new computation to carry  out.  This  provides  a  natural  way  of
maximizing utilization and load balancing.

The TOP-C model depends on three concepts:

\beginitems

\index{task}
the *task*:&
    a function that takes an arbitrary object  as  its  single  argument,
    reads some or all of the global  shared data,  and  then  returns  an
    arbitrary object as its value. The task typically corresponds to  the
    inner loop of a typical application.

\index{shared data}
the *shared data*:&
    global data, shared among all processes. This data  can  be  read  as
    part of the computation of a task. However, after  initialization  of
    the shared data, this data must be written  (modified)  *only*  by  a
    particular user-provided application routine, `<UpdateSharedData>()'.

\index{action}
the *action*:&
    After the output of a task has been produced, an application  routine
    must choose one of four actions to determine how the output is used.

\enditems

\index{task!input}\index{task!output}
The *task input* is defined to be the argument of the *task*  (considered
as a function), and the *task output* is the return value of the *task*.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Basic TOP-C (Master-Slave) commands}

\atindex{TOP-C model}{@TOP-C model}\index{master slave model}
There is only one core TOP-C command, a  utility  function,  and  several
constants. A TOP-C command must be evaluated on the  master  and  on  all
slaves. We shall  describe  the  commands  in  detail  in  the  following
sections, but a short list of the essentials and a small example will  be
helpful to set the context.

\>MasterSlave( <SubmitTaskInput>,<DoTask>[,<CheckTaskResult>%
               [,<UpdateSharedData>[,<taskAgglomCount>]]] ) F

See Section~"Other TOP-C Commands" for a description of `MasterSlave'.

\>`NOTASK' V

\index{actions}
\>`NO_ACTION' V
\>`UPDATE_ACTION' V
\>`REDO_ACTION' V
\>CONTINUATION_ACTION( <taskContinuation> ) F

`CONTINUATION_ACTION()' is described in Section~"The  GOTO  statement  of
the TOP-C model".

\>IsUpToDate() F

\>ParInstallTOPCGlobalFunction( <string>, <function> ) F
\>ParInstallTOPCGlobalFunction( <gvar>, <function> ) F

A short example shows one possible implementation of `ParList()'.

\beginexample
gap> ParInstallTOPCGlobalFunction( "MyParListWithAgglom",
> function( list, fnc )
>   local result, i;
>   result := []; i := 0;
>   MasterSlave( function() if i >= Length(list) then return NOTASK;
>                           else i := i+1; return i; fi; end,
>                fnc,
>                function(input,output) result[input] := output;
>                                       return NO_ACTION; end,
>                Error
>              );
>   return result;
> end );
\endexample

(Of course rather than  type  such  code  in  a  {\ParGAP}  session  it's
generally more convenient to have it in a file and `Read' it in.)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Other TOP-C Commands}

A master-slave computation is invoked when a {\GAP}  program  issues  the
command `MasterSlave()'. As given earlier, the typical form is:

\){\kernttindent}MasterSlave( <SubmitTaskInput>, <DoTask> %
  [, <CheckTaskResult>[, <UpdateSharedData>[, <taskAgglom>]]] )

where the first four arguments of `MasterSlave()' are also functions, but
they must be defined by the application writer. Their calling  syntax  is
defined by the following {\GAP} code, which also  provides  a  simplified
description of how  a  sequential  (non-parallel)  `MasterSlave()'  would
invoke these functions if there were  only  a  single  process.  (A  more
sophisticated version of this routine is provided in {\ParGAP}  to  allow
one to debug within a  single  process  first.)  The  use  of  the  fifth
argument, <taskAgglom>, is deferred  until  section~"Agglomerating  tasks
for efficiency (ParSemiEchelonMat revisited again)".

In this section, we define `MasterSlave()' and describe the  use  of  its
four  arguments  in  a  purely  sequential  environment.  The  issues  of
parallelism and passing of messages between processes is covered  in  the
next section. The call to `MasterSlave()' in {\ParGAP}, above, will  have
the same result  as  if  `MasterSlave()'  were  defined  equivalently  to
`SeqMasterSlave()' below, and then run in a standard,  sequential  {\GAP}
(a  single  process).  The  next  section  describes  the   multi-process
implementation of `MasterSlave()' in {\ParGAP}, in which  `taskInput'  is
computed on the master process and sent as a message to a slave  process,
while `taskOutput' is computed on a slave process and sent as  a  message
to the master process.

\indextt{SeqMasterSlave!simplified pseudo-code}
\begintt
SeqMasterSlave :=
  function(SubmitTaskInput, DoTask, CheckTaskResult, UpdateSharedData)
  local taskInput, taskOutput, action;
  while true do
    taskInput := SubmitTaskInput();
    if taskInput = NOTASK then break; fi;
    repeat
      taskOutput := DoTask( taskInput );
      action := CheckTaskResult( taskOutput, taskInput );
    until action <> REDO_ACTION;
    if action = UPDATE_ACTION then
      # Modify the shared data (global data structures) here
      # Called on all processes, master and slaves
      UpdateSharedData( taskOutput, taskInput );
    fi;
  od;
end;
\endtt

One can also follow the life of  a  single  task  in  a  multi-processing
environment through the diagram below.

\begintt
               MASTER               ||               SLAVE
_________________________________________________________________________
                                    ||  
      +---------------------+       ||
      || GenerateTaskInput() ||       ||
      +---------------------+       ||
                             \input ||
                              \______________
                                    ||        ||
                                    ||        v
                                    ||       +---------------+
                                    ||       || DoTask(input) ||
                                    ||       +---------------+
                                    ||output/  ^
                                 _________/   ||
                                ||   ||         ||
                                v   ||         ||
 +--------------------------------+ ||         ||
 || CheckTaskResult(input, output) ||___________||
 +--------------------------------+ (if action == REDO)
                         ||          ||
                         || (if action == UPDATE)
                         v          ||
                    +---------------------------------+
                    || UpdateSharedData(input, output) ||
                    +---------------------------------+
                                    ||
                         TOP-C Programmers' Model
                          (Life Cycle of a Task)
 
\endtt
% {\footnotesize
% \begin{figure}[htb]\label{topc-fig}
% \begin{center}\label{diagram}
% \setlength{\unitlength}{3.0pt}
% % \setlength{\unitlength}{2pt} % for 2-column mode
% \begin{picture}(100,95)
% \put(25,90){\makebox(0,0)[b]{\bf MASTER}}
% \put(75,90){\makebox(0,0)[b]{\bf SLAVE}}
% \put(50,00){\line(0,1){4}}
% \multiput(50,15)(0,5){16}{\line(0,1){4}} % 20 times, delta = (0,5)
% \thicklines
% \put(5,88){\line(1,0){95}}
% \put(25,75){\oval(20,10)}  % corner is (35,70)
% \put(25,75){\makebox(0,0){`SubmitTaskInput()'}}
% \put(75,55){\oval(25,10)} % corner is (62,5, 50)
% \put(75,55){\makebox(0,0){`DoTask(taskInput)'}}
% \put(25,32){\oval(45,10)} % corner is (47.5, 26)
% \put(25,32){\makebox(0,0){\parbox[t]{90pt}{`CheckTaskResult' \hbox{\ \ }`(taskOutput,~taskInput)'}}}
% \put(50,10){\oval(75,10)}
% \put(50,10){\makebox(0,0){`UpdateSharedData(taskOutput, taskInput)'}}
% \thinlines
% % text is near beginning of vector
% % ref. point is one corner of oval
% \thicklines
% \put(34,69){\vector(3,-1){27}} % base is length 20
% \put(36,69){\makebox(0,0)[bl]{`taskInput'}}
% \put(62,51){\vector(-3,-2){18}} % base is length 20
% \put(60,51){\makebox(0,0)[br]{`taskOutput'}}
% \thinlines
% \put(48,37){\vector(3,2){15}} % base is length 20
% \put(48,37){\makebox(0,0)[tl]{(if action == {\tt REDO\_ACTION})}}
% \put(25,25){\vector(3,-1){25}} % base is length 20
% \put(36,22){\makebox(0,0)[bl]{(if action == {\tt UPDATE\_ACTION})}}
% \end{picture}\break
%   \hbox{\large TOP-C Programmer's Model}
% \end{center}
% \end{figure}
% }

Although not explicit in the code,  the  application  writer  should  add
comments to define the shared data. The *shared data*  is  defined  as  a
global  data   structure   that   is   treated   as   ``read-write''   by
`<UpdateSharedData>()',  while  being   treated   as   ``read-only''   by
`<SubmitTaskInput>()', `<DoTask>()', and `<CheckTaskResult>()'. Note also
that an application writer may use different names for the four functions
`<SubmitTaskInput>()', etc. It is only a convention within this manual to
give  those  functions  the   names,   above.   Similarly,   <taskInput>,
<taskOutput> and <action> are the conventional names used in this manual,
and a given application may use different names.

In a correct {\ParGAP} application, the shared data should be initialized
to  the  same  value  on  all  processes  before  the  application  calls
`MasterSlave()'. `MasterSlave()' is then called on all  processes.  After
that,  the  shared data   can   be   modified   only   by   a   call   to
`<UpdateSharedData>()', and `MasterSlave()' arranges  for  each  call  to
`<UpdateSharedData>()'  to  be  executed  on  all   processes.   Further,
`<UpdateSharedData>()' has access only to <taskInput>, <taskOutput>,  and
the previous value of the shared data.  Thus,  `MasterSlave()'  maintains
the same shared data uniformly on all processes.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Simple Usage of MasterSlave()}

This section is  concerned  with  formal  definitions  for  the  routines
associated  with  {\ParGAP}.  It  is  important  to  keep  in  mind   the
pseudo-code   of   Chapter~"Basic   Concepts   for   the   TOP-C    model
(MasterSlave)". Since `MasterSlave()' uses all the  {\ParGAP}  processes,
the user must invoke it on all processes. This is typically done  through
some function provided by the slave listener layer, such  as  `ParEval()'
(see~"ParEval"). It may be instructive for the reader  to  run  {\ParGAP}
and type `MSexample();'  now,  or  else  to  look  at  some  examples  of
{\ParGAP}  applications  in  the  section~"MasterSlave  Tutorial".   This
demonstrates the use of `MasterSlave()' in a typical session.

The   four   functions   written   by   the   application   writer   are:
`<SubmitTaskInput>()',    `<DoTask>()',    `<CheckTaskResult>()',     and
`<UpdateSharedData>()'.   `<DoTask>()'   is   executed   on   a    slave.
`<SubmitTaskInput>()'  and  `<CheckTaskResult>()'  are  executed  on  the
master, where a <taskInput> is generated and a corresponding <taskOutput>
is  received.  Finally,  `<UpdateSharedData>()'  is   executed   on   all
processes. {\ParGAP}  arranges  to  automatically  pass  <taskInput>  and
<taskOutput> between the master and a slave.

Since the  single  master  process  is  responsible  for  generating  all
<taskInput>s  and  receiving  all  <taskOutput>s,  it  is  critical  that
computation on the master process should not become a  bottleneck  for  a
well-designed {\ParGAP} application. Accordingly, the application  writer
should arrange for  `<SubmitTaskInput>()'  and  `<CheckTaskResult>()'  to
execute  quickly,  even  if  this   means   additional   computation   by
`<DoTask>()' or `<UpdateSharedData>()'.

As seen in the examples, `<SubmitTaskInput>()' may use global  variables  on
the  master  to  ``remember''  the  last  <taskInput>  or   other   state
information. Note that such  global  variables  cannot  be  part  of  the
shared data, since they are modified outside of `<UpdateSharedData>()'.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Efficient Parallelism in MasterSlave() using CheckTaskResult()}

It is instructive to review the logic for the  lifetime  of  a  task,  as
described by the pseudo-code for `SeqMasterSlave' in Section~"Other TOP-C
commands". Initially, `MasterSlave()' calls `<SubmitTaskInput>()' on  the
master, which returns an application-defined {\GAP} object,  <taskInput>.
`MasterSlave()' then copies <taskInput> to an  arbitrary  slave  process,
and `MasterSlave()' then calls `<DoTask>( <taskInput> )'  on  the  slave.
This returns an application-defined {\GAP}  object,  <taskOutput>,  which
`MasterSlave()'  copies  to  the   master   process.   On   the   master,
`MasterSlave()' then calls `<CheckTaskResult>( <taskInput>,  <taskOutput>
)', which returns an action. (Recall that <taskInput>,  <taskOutput>  and
`<CheckTaskResult>()' are defined by the application writer,  and  so  an
application program may give them different names.)

There are four possible  *actions*  ({\ParGAP}  constants):  `NO_ACTION',
`UPDATE_ACTION', `REDO_ACTION', `CONTINUATION_ACTION(  <taskContinuation>
)'.   A   standard   language   idiom   in   {\ParGAP}   is   to   define
`<CheckTaskResult>()'       as       the        {\ParGAP}        function
`DefaultCheckTaskResult()', whose code is as follows:

\begintt
DefaultCheckTaskResult := function( taskOutput, taskInput )
  if taskOutput = false then return NO_ACTION;
  elif IsUpToDate() then return UPDATE_ACTION;
  else return REDO_ACTION;
  fi;
end;
\endtt

\indextt{NO_ACTION!definition}
In the simplest case, `<CheckTaskResult>()' returns `NO_ACTION', in which
case there is no further computation related to the original <taskInput>.
`<CheckTaskResult>()'  may  record  global  information  on  the   master
process, based on the <taskOutput>, but the shared data,  and  hence  the
state of the slave processes, will not be modified.

\indextt{UPDATE_ACTION!definition}
In  the  second   most   common   case,   `<CheckTaskResult>()'   returns
`UPDATE_ACTION'.   This   action   causes   `MasterSlave()'    to    call
`<UpdateSharedData>(  <taskOutput>,  <taskInput>  )'  on  all   processes
(master and slaves). This is the *only* way in which the shared data  can
be modified by a correct {\ParGAP} program.

\indextt{REDO_ACTION!definition}
In the third most common case, `<CheckTaskResult>()' returns `REDO_ACTION'.
When a `REDO_ACTION' action is generated, the  value  of  <taskInput>  is
re-sent to the same slave that executed `<DoTask>( <taskInput> )' for the
current task. An application will typically invoke `REDO_ACTION'  if  the
shared data has changed, and this changed shared data will produce a  new
<taskOutput>. As  before,  `<DoTask>()'  then  returns  a  new  value  of
<taskOutput>. Then, <taskInput> and the new <taskOutput> are again passed
to `<CheckTaskResult>()'.

Note that `MasterSlave()' guarantees that `REDO_ACTION' causes  the  task
to be re-sent to the same slave process. This allows the  application  to
cache in a  global  variable  some  information  computed  by  the  first
invocation of `<DoTask>()'. A second invocation of `<DoTask>()' caused by
the `REDO_ACTION' allows the task to test if the <taskInput> is the  same
as  the  last  invocation.  In   that   case,   the   application-defined
`<DoTask>()' routine can recognize that this is a `REDO_ACTION',  and  it
can take advantage of the cached global variable  to  avoid  re-computing
certain quantities that would not be changed by the altered shared  data.
In order to make this strategy possible, `MasterSlave()' also  guarantees
that in the case of `REDO_ACTION', the slave process will not  have  seen
any intervening calls to `<DoTask>()' with values  of  `taskInput'  other
than the current value.

In typical usage, the application-defined routine, `<CheckTaskResult>()',
will first call `IsUpToDate()'. `IsUpToDate()' tests if the  shared  data
has  been  modified  since  the  current  <taskInput>  corresponding   to
`<CheckTaskResult>()' was originally generated by  `<SubmitTaskInput>()'.
The times of the relevant events are recorded as when seen on the  master
process. It is an error to call  `IsUpToDate()'  outside  of  a  call  to
`<CheckTaskResult>()'  by  `MasterSlave()'.  `IsUpToDate()'   returns   a
boolean value, `true' or `false'.

The last possible action, `CONTINUATION_ACTION( <taskContinuation> )', is
provided for unusual cases. As with advice about the use of ``goto'',  it
is recommended to avoid `CONTINUATION_ACTION()' where possible.

A favorite aphorism of this author is, ``The source code is the  ultimate
documentation''. With this in mind, the reader  may  also  wish  to  read
`lib/masslave.g', for which readability of the code was one of the design
criteria.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Modifying Task Output or Input (a dirty trick)}

At this point, it should be noted that  it  explicitly  *is*  allowed  to
modify the input or output of a task from  within  `<CheckTaskResult>()'.
This is  not  recommended  in  general,  but  there  may  be  times  when
`<CheckTaskResult>()' returns an `UPDATE_ACTION' and must also be used to
pass additional information to `<UpdateSharedData>()'. In order to modify
a previous input or output, it is  important  that  the  application  has
chosen a representation of the input or output as a list or record, which
can be modified in place, such that the  code  excerpt  succeeds  without
error.

\begintt
oldOutput := taskOutput;
# Modify taskOutput here
if ( IsIdenticalObj( oldOutput, taskOutput ) ) = false then
  Error( "MasterSlave() will see only oldOutput, not current taskOutput" );
fi;
return UPDATE_ACTION;
\endtt

In principle, a *dirty trick* like this would also work in  the  case  of
returning a `REDO_ACTION'. However, this is  not  recommended.  For  that
functionality,   the   code   will   be   clearer    if    an    explicit
`CONTINUATION_ACTION(   <modifiedTaskOutput>   )'   is   returned.    See
Section~"The GOTO statement of the TOP-C model" for further discussion on
the use of `CONTINUATION_ACTION()'.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{The GOTO statement of the TOP-C model}

\>CONTINUATION_ACTION( <taskContinuation> )!{definition}

The  `CONTINUATION_ACTION()',  like  the   <goto>   statement,   is   not
recommended for ordinary programs,  but  it  may  be  useful  in  unusual
circumstances. This  is  a  parametrized  action.  When  the  application
routine  `<CheckTaskResult>()'   returns   this   action,   `MasterSlave()'
guarantees to invoke `<DoTask>()' on the same slave process  as  for  the
original task. There will have been no intervening calls to  `<DoTask>()'
on that slave, although there  may  have  been  an  intervening  call  to
`<UpdateSharedData>()' on that slave.

This action allows arbitrary, repeated communication between  the  master
and a  single  slave  process.  The  slave  process  executes  `<DoTask>(
<taskInput>  )'  and  communicates  with  the  master  by   returning   a
<taskOutput>.   The   master   process    executes    `<CheckTaskResult>(
<taskInput>,  <taskOutput>  )'  and  returns  a  <taskContinuation>.  The
original  slave  process  then  receives  another  call   to   `<DoTask>(
<taskInput> )', this time with <taskInput> bound to <taskContinuation>.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Being nice to other users (Nice, Alarm and LimitRss)}

When you are running a long job on a network of  workstations,  you  will
often be sharing it with others. Making your parallel job as  unintrusive
as possible will leave you with a warmer welcome the next time  that  you
want to use that  network  of  workstations.  Accordingly,  three  useful
functions are provided.

\>UNIX_Nice( <priority> )!{definition} F

This is  similar  to  the  `nice'  command  of  many  UNIX  shells.  UNIX
priorities are in a range from $-20$ to 20 with $-20$ being the  highest.
Users typically start at priority  0.  You  can  give  yourself  a  lower
priority by specifying a priority of 5, for example. Usually,  priorities
19 and 20 are *absolute* priorities. Any process with a  priority  higher
than 19 that wishes to run will always have precedence. Other  priorities
are *relative* priorities. Your process will still receive some CPU  time
even if other processes with higher priorities are running. You  can  set
your priority lower, but you cannot raise it back to its  original  value
after that. The return value is the previous priority of your process.

\>UNIX_Alarm( <seconds> )!{definition} F

This causes the process to kill itself after that many seconds. This is a
useful safety measure, since it is unfortunately too easy for  a  runaway
slave process to continue if the master process  is  killed  without  the
normal `quit;'. You might consider  adding  something  like  `UNIX_Alarm(
25000 );' (about 6 hours) to your `.gaprc' file. Executing `UNIX_Alarm( 0
);' cancels any previous alarm. The return value is the number of seconds
remaining under the previous setting of the alarm.

\>UNIX_LimitRss( <size> )!{definition} [ = setrlimit(RLIMIT_RSS, ...) ] F

Many dialects of UNIX (and their shells)  offer  a  `limit'  or  `ulimit'
command to limit the resources  available  to  the  shell.  This  command
limits the size of the RSS (resident set size), or the amount of physical
RAM used by your process. The size limit is in bytes. Unfortunately, some
UNIX dialects may not allow or even silently ignore this request to limit
the RSS. A UNIX command such as `top' can show you if your process RSS is
staying below your requested limit.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Converting legacy sequential code to the TOP-C model}

The (tutorial contains a section~"Raw MasterSlave (ParMultMat revisited)",
about <raw> version of `MasterSlave()'  that  is  useful  for  converting
legacy sequential code to the TOP-C model. However,  that  model  is  not
recommended for writing new code, for stylistic reasons.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%E