Sophie: gap-system-4.4.12-5mdv2010.0 x86

gap-system-4.4.12-5mdv2010.0.x86_64.rpm

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%W  pargap6.tex            ParGAP documentation            Gene Cooperman
%%
%H  $Id: pargap6.tex,v 1.5 2001/07/28 23:49:06 gap Exp $
%%
%Y  Copyright (C) 1999-2001  Gene Cooperman
%Y    See included file, COPYING, for conditions for copying
%%

\Chapter{MPI commands and UNIX system calls in ParGAP}

This chapter can  be  safely  ignored  on  a  first  reading,  and  maybe
permanently. It is for application programmers who wish to develop  their
own low-level message-based parallel  application.  The  additional  UNIX
system calls in {\ParGAP} may also be useful in some applications.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Tutorial introduction to the MPI C library}

\atindex{MPI}{@MPI}
\atindex{Message Passing Interface}{@Message Passing Interface}
\atindex{MPI model}{@MPI model}\atindex{tutorial!MPI}{@tutorial!MPI}
This section lists some of the  more  common  message  passing  commands,
followed by a short MPI example.  The  next  section  ("Other  low  level
commands") contains more (but by no means all) of the  MPI  commands  and
some UNIX system calls. The {\ParGAP} binding provides a simplified  form
that makes interactive usage easier. This section describes the  original
MPI binding in~C, with  some  comments  about  the  interactive  versions
provided in {\ParGAP}. (The MPI standard includes a binding both to~C and
to~FORTRAN.)

Even if your ultimate goal is a standalone  C-based  application,  it  is
useful to prototype your application with  equivalent  commands  executed
interactively within {\ParGAP}. Note that this  distribution  includes  a
subdirectory `mpinu', which provides a  subset  MPI  implementation  in~C
with a small <footprint>. It consists of a C~library, `libmpi.a', and  an
include  file  `mpi.h'.  The  library  is   approximately   150~KB.   The
subdirectory can be consulted for further details.

We first briefly explain some MPI concepts.

\beginitems

*rank*:& 
    The *rank* of an MPI process is a unique ID  number  associated  with
    the process. By convention, the console (master) process has  rank~0.
    The ranks of the process are guaranteed by MPI to form a consecutive,
    ascending sequence of integers, starting with~0.

*tag*:& 
    Each message has associated with  it  a  non-negative  integer  *tag*
    specified by the application. Our interface allows you to ignore tags
    by letting them take on default values. Typical application uses  for
    tags are either to choose consecutive integers in order to  guarantee
    that all messages can be re-assembled in sequence,  or  to  choose  a
    fixed set of constant tags, each  constant  associated  with  another
    type of message. In the latter case, one might have  integers  for  a
    `QUIT_TAG', an `INTEGER_ARRAY_TAG', `START_TASK2_TAG', etc. In  fact,
    our  implementation  of  the  Slave  Listener   and   `MasterSlave()'
    specifically uses certain tags of value 1000  and  higher  for  these
    purposes.  Hence,  application  routines  that  do  use  tags  should
    restrict themselves to tags `[0..999]'.

*communicator*:& 
    A *communicator* in MPI serves the purpose of a namespace.  Most  MPI
    commands require a communicator argument to  specify  the  namespace.
    MPI  starts  up  with  a  default  namespace,  `MPI_COMM_WORLD'.  The
    {\ParGAP} implementation always  assumes  that  single  namespace.  A
    namespace is important in MPI to build modules and library  routines,
    so that a thread may distinguish messages meant  for  itself,  or  to
    catch errors of cross-communication between two modules.

*message*:&
    Each message in MPI is typically implemented to  include  fields  for
    the source rank,  destination  rank  (optional),  tag,  communicator,
    count, and an array of data. The <count> field specifies  the  length
    of the array. MPI guarantees that messages are non-overtaking, in the
    sense that if two messages are sent from a single source  process  to
    the same destination process, then it is guaranteed  that  the  first
    process sent will be the first one to arrive, and will be received or
    probed first from the queue.

*other*:&
    MPI also has concepts of  <datatype>,  <derived  datatype>,  <group>,
    <topology>, etc. This implementation defaults those values,  so  that
    <datatype> is always  a  character  (hence  the  use  of  strings  in
    {\ParGAP}), no <derived datatypes> are implemented, <group> is always
    consistent  with  `MPI_COMM_WORLD',  and  <topology>  is  the   fully
    connected topology.

*communication*:&
    This  implementation  implements  only  point-to-point  communication
    (always blocking receives, except for `MPI_Iprobe', and sends can  be
    blocking or not, according to the default underlying sockets).

*collective communication*:& 
    The MPI standard also provides for *collective communication*,  which
    sets up a barrier in which all process within the named  communicator
    must participate. One process is distinguished as the *root*  process
    in cases of  asymmetric  usage.  {\ParGAP}  does  not  implement  any
    collective communication (although you can easily emulate it using  a
    sequence of point-to-point commands). The MPI subset distribution (in
    {\ParGAP}'s  `mpinu'  directory)  does  provide  some  commands   for
    collective communication. Examples of  MPI  collective  communication
    commands are `MPI_Bcast' (broadcast), `MPI_Gather'  (place  an  entry
    from each  process  in  an  array  residing  on  the  root  process),
    `MPI_Scatter'  (inverse   of   gather),   `MPI_Reduce'   (execute   a
    commutative, associative function with an entry from each process and
    store on root; example functions are `sum', `and', `xor', etc.

*dynamic processes*:& 
    The newer MPI-2 standard allows  for  the  dynamic  creation  of  new
    processes on new  processors  in  an  ongoing  MPI  computation.  The
    standard is silent on whether an MPI session should be aborted if one
    of its member processes  dies,  and  the  MPI  standard  provides  no
    mechanism to recognize such a dead process. Part of  the  reason  for
    this silence is that much of the ancestry of MPI  lies  in  dedicated
    parallel computers for which it would be unusual for one  process  or
    processor to die.

\enditems

Here is a short  extract  of  MPI  code  to  illustrate  its  flavor.  It
illustrates the C equivalents of the following {\ParGAP}  commands.  Note
that the {\ParGAP} versions noted here take fewer parameters  than  their
C-based cousins,  and  {\ParGAP}  includes  defaults  for  some  optional
parameters.

\>MPI_Init()!{example} %
              [ called for you automatically when ParGAP is loaded ] F
\>MPI_Finalize()!{example} [ called for you automatically when GAP quits ] F
\>MPI_Comm_rank()!{example} F
\>MPI_Get_count()!{example} F
\>MPI_Get_source()!{example} F
\>MPI_Get_tag()!{example} F
\>MPI_Comm_size()!{example} F
\>MPI_Send( <string buf>, <int dest>[, <int tag = 0> ] )!{example} F
\>MPI_Recv( <string buf> [, <int source = MPI_ANY_SOURCE>[, %
            <int tag = MPI_ANY_TAG> ]  ] )!{example} F
\>MPI_Probe( [ <int source = MPI_ANY_SOURCE>[, %
             <int tag = MPI_ANY_TAG> ] ] )!{example} F

Many  of  the  above  commands  have  analogues  at  a  higher  level  in
section~"Slave    Listener     Commands"     as     `GetLastMsgSource()',
`GetLastMsgTag()', `MPI_Comm_size() = TOPCnumSlaves  +  1',  `SendMsg()',
`RecvMsg()' and `ProbeMsg()'.

\begintt
#include <stdlib.h>
#include <mpi.h>

#define MYCOUNT 5
#define INT_TAG 1

main( int argc, char *argv[] )
{
  int myrank;
  MPI_Init( &argc, &argv );
  MPI_Comm_rank( MPI_COMM_WORLD, &myrank );

  if ( myrank == 0 ) {
    int mysize, dest, i;
    int buf;
    printf("My rank (master):  %d\n", myrank);
    for ( i=0; i<MYCOUNT; i++ )
      buf = 5;
    MPI_Comm_size( MPI_COMM_WORLD, &mysize );
    printf("Size:  %d\n", mysize);
    for ( dest=1; dest< mysize; dest++ )
      MPI_Send( &buf, MYCOUNT, MPI_INT, dest, INT_TAG, MPI_COMM_WORLD );
  } else {
    int i;
    MPI_Status status;
    int source;
    int count;
    int *buf;
    printf("My rank (slave):  %d\n", myrank);

    MPI_Probe( MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status );
    printf( "Message pending with rank %d and tag %d.\n",
            status.MPI_SOURCE, status.MPI_TAG );
    if ( status.MPI_TAG != INT_TAG )
      printf("Error: Bad tag.\n"); exit(1);
    MPI_Get_count( &status, MPI_INT, &count );
    printf( "The count of how many data units (MPI_INT) is:  %d.\n", count );
    buf = (int *)malloc( count * sizeof(int) );

    source = status.MPI_SOURCE;
    MPI_Recv( buf, count, MPI_INT, source, INT_TAG, MPI_COMM_WORLD, &status );
    for ( i=0; i<MYCOUNT; i++ )
      if ( *buf != 5 ) printf("error:  buf[%d] != 5\n", i);
    printf("slave %d done.\n", myrank);
    }
  MPI_Finalize();
  exit(0);
}
\endtt

Even in this simplistic example, it was important to specify

\begintt
MPI_Recv( buf, count, MPI_INT, source, INT_TAG, MPI_COMM_WORLD, &status );
\endtt

and not to use `MPI_ANY_SOURCE' instead of  the  known  source.  Although
this alternative would often work, there is a danger that there might  be
a second incoming message from a different source  that  arrives  between
the calls to `MPI_Probe()' and `MPI_Recv()'. In such an event, MPI  would
be free to receive the second message in `MPI_Recv()',  even  though  the
appropriate count of the second message is likely to be  different,  thus
risking an overflow of the `buf' buffer.

Other typical bugs in MPI programs are:

\beginlist

\item{$\bullet$}
    Incorrectly matching corresponding sends and receives or having  more
    or fewer sends than receives due to the logic of multiple  sends  and
    receives within distinct loops.

\item{$\bullet$}
    Reaching deadlock because all active processes have blocking calls to
    `MPI_Recv()' while no process has  yet  reached  code  that  executes
    `MPI_Send()'.

\item{$\bullet$}
    Incorrect use of barriers in collective  communication,  whereby  one
    process might execute:

\begintt
MPI_Send( buf, count, datatype, dest, tag, COMM_1 );
MPI_Bcast( buffer, count, datatype, root, COMM_2 );
\endtt

\item{}
    and a second executes

\begintt
MPI_Bcast( buffer, count, datatype, root, COMM_2 );
MPI_Recv( buf, count, datatype, dest, tag, COMM_1, status );
\endtt

\item{}
    If the call to `MPI_Send()' is blocking (as  is  the  case  for  long
    messages in the case of many implementations), then the first process
    will block at `MPI_Send()' while the second blocks at  'MPI_Bcast()'.
    This happens even though they use  distinct  communicators,  and  the
    send-receive communication  would  not  normally  interact  with  the
    broadcast communication.

\endlist

Much of the TOP-C method in {\ParGAP} (see chapters~"Basic  Concepts  for
the TOP-C model (MasterSlave)" and~"MasterSlave Tutorial") was  developed
precisely to make errors like those above syntactically  impossible.  The
slave listener layer also does some additional work to keep track of  the
`status' that was last received and other bookkeeping. Additionally,  the
TOP-C method was  designed  to  provide  a  higher  level,  task-oriented
``language'', which would naturally lead the application programmer  into
designing an efficient high level algorithm.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Other low level commands}

\atindex{MPI commands!All ParGAP bindings}{@MPI commands!All ParGAP bindings}
\atindex{UNIX system calls!All ParGAP bindings}%
{@UNIX system calls!All ParGAP bindings}
Here is a complete  listing  of  the  low  level  commands  available  in
{\ParGAP}.  Some  of  these  commands  were  documented  elsewhere.   The
remaining ones are not recommended for most users. Nevertheless,  it  may
be useful to others for more sophisticated applications.

For most of these commands, the source code is the ultimat documentation.
However, you may be able to guess at the meaning of many of them based on
their names and their similarity  UNIX  system  calls  (in  the  case  of
`UNIX_...') or MPI commands (in  the  case  of  `MPI...').  Some  of  the
commands will also show you their calling parameters if called  with  the
wrong number of arguments. Many  of  the  MPI  commands  have  simplified
calling parameters with certain arguments optional or  set  to  defaults,
making them easier for interactive use.

\atindex{UNIX functions}{@UNIX functions}
\atindex{functions!UNIX}{@functions!UNIX}
\>UNIX_MakeString( <len> ) F
\>UNIX_DirectoryCurrent() [ Defined in `pkg/pargap/lib/slavelist.g' ] F
\>UNIX_Chdir( <string> ) F
\>UNIX_FflushStdout() F
\>UNIX_Catch( <function>, <return_val> ) F
\>UNIX_Throw() F
\>UNIX_Getpid() F
\>UNIX_Hostname() F
\>UNIX_Alarm( <seconds> ) F
\>UNIX_Realtime() F
\>UNIX_Nice( <priority> ) F
\>UNIX_LimitRss( <bytes_of_ram> ) [ = setrlimit(RLIMIT_RSS, ...) ] F

\atindex{MPI functions}{@MPI functions}
\atindex{functions!MPI}{@functions!MPI}
\>MPI_Init() F
\>MPI_Initialized() F
\>MPI_Finalize() F
\>MPI_Comm_rank() F
\>MPI_Get_count() F
\>MPI_Get_source() F
\>MPI_Get_tag() F
\>MPI_Comm_size() F
\>MPI_World_size() F
\>MPI_Error_string( <errorcode> ) F
\>MPI_Get_processor_name() F
\>MPI_Attr_get( <keyval> ) F
\>MPI_Abort( <errorcode> ) F
\>MPI_Send( <string buf>, <int dest>[, <int tag = 0> ] ) F
\>MPI_Recv( <string buf> [, <int source = MPI_ANY_SOURCE>[, <int tag = MPI_ANY_TAG> ] ] ) F
\>MPI_Probe( [ <int source = MPI_ANY_SOURCE>[, <int tag = MPI_ANY_TAG> ] ] ) F
\>MPI_Iprobe( [ <int source = MPI_ANY_SOURCE>[, <int tag = MPI_ANY_TAG> ] ] ) F

\atindex{MPI global constants}{@MPI global constants}
\atindex{constants!MPI, global}{@constants!MPI, global}
\>`MPI_ANY_SOURCE' V
\>`MPI_ANY_TAG' V
\>`MPI_COMM_WORLD' V
\>`MPI_TAG_UB' V
\>`MPI_HOST' V
\>`MPI_IO' V

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%E