%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% %W pargap1.tex ParGAP documentation Gene Cooperman %% %H $Id: pargap1.tex,v 1.8 2001/11/16 15:57:00 gap Exp $ %% %Y Copyright (C) 1999-2001 Gene Cooperman %Y See included file, COPYING, for conditions for copying %% \pretolerance=500 % Will tolerate badness of 500 before trying hyphenations% \tolerance=1600 % Will tolerate stretching line up to badness of 1600% \hbadness=4000 % Seems to affect overfull boxes reported by TeX% \hfuzz=5pt % If still no good break, can stick out into margin by 5 pt.% \overfullrule=0pt % Lines sticking out more than 10 pt should not% % contain the black box marking it.% \Chapter{Writing Parallel Programs in GAP Easily} \indextt{ParGAP} The {\ParGAP} (Parallel {\GAP}) package provides a way of writing parallel programs using the {\GAP} language. Former names of the package were \package{ParGAP/MPI} and \package{GAP/MPI}; the word <MPI> refers to <Message Passing Interface>, a well-known standard for parallelism. {\ParGAP} is based on the MPI standard, and this distribution includes a subset implementation of MPI, to provide a portable layer with a high level interface to BSD sockets. Since knowledge of MPI is not required for use of this software, we now refer to the package as simply {\ParGAP}. For more information visit the author's {\ParGAP} home page at: \URL{http://www.ccs.neu.edu/home/gene/pargap.html} For some background reading, see~\cite{Coo95} and \cite{Coo97}. 
This first chapter is intended to help a new user set up {\ParGAP} and run through some quick examples: see \beginlist%unordered \item{$\bullet$} Section~"Overview of ParGAP" for an overview of the features of {\ParGAP} and a general discussion of how it's implemented; \item{$\bullet$} Section~"Installing ParGAP" for how to install {\ParGAP}; \item{$\bullet$} Section~"Running ParGAP" for how to run {\ParGAP} (*not* by using `RequirePackage'); and \item{$\bullet$} Section~"Extended Example" for some introductory {\ParGAP} examples. \endlist The later chapters present detailed explanations of the facilities of {\ParGAP}. Because parallel programming is sufficiently different from sequential programming, this author recommends printing out at least Chapters~1 through~"MasterSlave Tutorial", and skimming through those chapters for areas of interest, before returning to the terminal to try out some of the ideas. This document can be found in `.../pkg/pargap/doc/manual.dvi' of the software distribution. You may also want to print the index at the end of `manual.dvi'. In particular, the heading `example' in the index, or `??example' from within {\GAP}, should be useful. If you prefer postscript, the UNIX command `dvips' will convert that file to postscript form. The development of {\ParGAP} was partially supported by National Science Foundation grants CCR-9509783 and CCR-9732330. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \Section{Overview of ParGAP} {\ParGAP} is currently functional only on UNIX installations. (Cygwin for Windows is also an option, if you would like to port it.) {\ParGAP} can be installed on top of an existing {\GAP} installation. See Section~"Installing ParGAP" for instructions on installation of {\ParGAP}. At the time that {\ParGAP} is invoked, a special ``procgroup'' file must be available to tell {\ParGAP} which processors to use for slave processors. 
See sections~"Installing ParGAP" and~"Extended Example" for instructions
on invoking {\ParGAP}.

If there are questions or bugs concerning {\ParGAP}, please write to:
\Mailto{gene@ccs.neu.edu}

If one wishes only to try out the parallel features, the first five pages
of this manual (through the section on the slave listener) will suffice
for installing and using {\ParGAP}. For the more advanced user who wishes
to design new parallel algorithms or port old sequential code to a parallel
environment, it is strongly recommended to also read the sections following
on from Section~"Basic Concepts for the TOP-C model (MasterSlave)".

{\ParGAP} should be invoked via the script `bin/pargap.sh' created by the
installation process, which invokes `<GAP_ROOT_DIR>/bin/<ARCH>/pargapmpi',
where <ARCH> depends on your system but is the same directory in which the
`gap' binary is found. MPI and the higher layers will not be available if
the binary is invoked in the standard way as `gap'. This is a feature,
since a single binary and source distribution serves both for the standard
{\GAP} and for {\ParGAP}.

{\ParGAP} is implemented in three layers: 1)~MPI, 2)~Slave~Listener, and
3)~Master~Slave (TOP-C abstraction). Most users will find that the two
highest layers (Slave Listener and Master Slave) meet all their needs.

\beginitems
`1) MPI:'&
The lowest layer is MPI. Most users can ignore this layer. MPI is a
standard for message-based parallel computation. A subset of the original
MPI commands is provided. The syntax is modified from the original C
binding to make a {\GAP} binding in an interpreted environment more
convenient. This includes default arguments, useful return values, and
`Error' break in the presence of errors. `MPI_Init()' (see~"MPI_Init") and
`MPI_Finalize()' (see~"MPI_Finalize") are invoked automatically by
{\ParGAP}.

`'&
The MPI layer is not documented, since most users will not be using it.
From {\GAP} level, you can type: `MPI_<tab><tab>' to see all implemented MPI functions and variables. However, typing the symbol name alone (e.g.: `MPI_Send;' ) will cause it to display the calling syntax. The same information is displayed after an incorrect call. The return value is typically obvious. MPI is implemented in `src/pargap.c'. The standard distribution uses a simple, subset implementation of MPI in `pkg/gapmpi/mpinu/', which is implemented on top of a standard sockets interface. It is possible to substitute other implementations of MPI. \atindex{MPI!standard}{@MPI!standard} `'& For those who wish to directly use the MPI interface, the meanings of the MPI calls are best found from the standard MPI documentation: `'&MPI Forum: \URL{http://www.mpi-forum.org/} `'&MPI Standard (version 1.1): \URL{http://www.mpi-forum.org/docs/mpi-11-html/mpi-report.html} `'&UNIX style man pages: \URL{http://www-c.mcs.anl.gov/mpi/www/} `2) Slave Listener:'& This layer provides basic message passing facilities for communication among multiple {\ParGAP} processes in a form that is more convenient for programming than the lower MPI layer. This will be the most useful entry point to {\ParGAP} for most users. This is the default mode for {\ParGAP}. Each remote (slave) process is in a receive-eval-send loop, in which the slave receives a {\GAP} command from the local or master, the slave evaluates the {\GAP} command, and the slave then sends the result back to the master as a {\GAP} object. `'& Almost all commands in the slave listener are of the form `*Msg*' e.g. `SendMsg()' (see~"SendMsg"), `RecvMsg()' (see~"RecvMsg"), `ProbeMsg()' (see~"ProbeMsg"). Since the slave is in a receive-eval-send loop, every `SendMsg(<cmd>)' on the master must be balanced by a later `RecvMsg()'. `SendRecvMsg()' (see~"SendRecvMsg") is provided to combine these steps. A few parallel utilities are also included, such as `ParRead()' ("ParRead"), `ParList()' ("ParList"), `ParEval()' ("ParEval"), etc. 
`'& Messages are arbitrary {\GAP} objects. Note that arguments to any {\GAP} function are evaluated before being passed to the function. Hence, any argument to `SendMsg()' or `ParEval()' would be evaluated locally before being sent across the network. For this reason, arguments can also be given as strings, to delay evaluation until reaching the destination process. Hence, real strings must be quoted: `ParEval("x:=\"abc\";");' Additionally, multiple commands are valid, and the final ```;''' of the string is optional. So, one can write: \begintt BroadcastMsg("x:=\"abc\"; Print(Length(x), \"\\n\")");; \endtt `'& A full description is contained in Chapter~"Slave Listener". `3) Master Slave:'& The Master Slave facility is provided both for writing complex parallel software, and as an easier way to parallelize previous or ``legacy'' sequential code. While the Slave Listener may be sufficient for simple parallel requirements, more complex software requires a higher level abstraction. The fundamental abstractions of the master slave layer are the *task* and the *shared data*. \beginlist \itemitem{`1)'} The task typically corresponds to the procedure or inner body of a loop in a sequential program. This is the part that must be repetitively computed in parallel. \itemitem{`2)'} The shared data typically corresponds to the data of a sequential program that is not within the local scope of the task. Often this is a global data structure. In the case that the task is the inner body of a loop, the shared data may be a local data structure that is outside the local scope of the loop. \endlist `'& It is usually quite easy to identify the task and the shared data of a sequential program or algorithm, which is the first step in parallelizing an algorithm. `'& The Master Slave parallel model described here has also been successfully used in~C and in LISP. 
It has been used both in distributed memory and shared memory environments, although this version in {\GAP} currently works only in a distributed environment. In the C~language, this parallel model is known as TOP-C (Task Oriented Parallel~C). For examples of the use of the TOP-C model see \cite{Coo98}, \cite{CFTY94}, \cite{CH97}, \cite{CHLM97}, \cite{CLMW96}, and \cite{CT96}. `'& While no parallel software can eliminate the problem of designing an algorithm that is efficient in a parallel environment, the TOP-C abstraction eases the job by eliminating programmer concerns about lower level details, such as message passing, migration and replication of data, load balancing, etc. This leaves the programmer to concentrate on the primary goal: maximizing the concurrency or parallelism. \enditems %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \Section{Installing ParGAP} \index{installation} Installing {\ParGAP} should be relatively simple. However, since there are many interactions both with the {\GAP} kernel and with the UNIX operating system, in a minority of cases, manual intervention will be necessary. If you are part of this minority, please see the section "Problems with Installation". The most common problem is the local security policy; {\ParGAP} is more pleasant to use when you don't have to manually provide the password for each slave. See section "Problems with Passwords (Getting Around Security)" for suggestions in this respect. To install the {\ParGAP} package, move the file `pargap-<XXX>.zoo' or `pargap-<XXX>.tar.gz' (for some version number <XXX> of {\ParGAP}) into the `pkg' directory in which you plan to install {\ParGAP}. 
Usually, this will be the directory `pkg' in the hierarchy of your version
of {\GAP}~4 (in fact, currently it is not possible to have the `pkg'
directory separate from {\GAP}'s `pkg' directory; we hope to remedy this in
future versions of {\ParGAP} so that it will also be possible to keep an
additional `pkg' directory in your private directories; section
"ref:Installing GAP Packages" of the GAP 4 reference manual gives details
on how to do this, when it's possible).

Now change into the `pkg' directory in which you plan to install {\ParGAP}.
If you got a `.zoo' file, unpack it with:

\){\kernttindent}unzoo -x pargap-<XXX>

If you got a `.tar.gz' file and your `tar' command supports the `z' option,
unpack it with:

\){\kernttindent}tar zxf pargap-<XXX>.tar.gz

or otherwise unpack in two steps with:

\){\kernttindent}gunzip pargap-<XXX>.tar.gz
\){\kernttindent}tar xvf pargap-<XXX>.tar

Whether you got the `.zoo' or `.tar.gz' archive you should now have a new
directory `pargap'. As for a generic {\GAP} package, do:

\begintt
cd pargap
./configure ../..
make
\endtt

If your version of {\GAP} is earlier than {\GAP}~4.3 you will first need to
adjust {\GAP}'s `lib/init.g' file; see item~0.\ of Section~"Problems with
Installation".

Your {\ParGAP} should now be ready to use. Now read the next section which
describes how to run {\ParGAP} (if you are reading this from {\GAP}'s
on-line help, type: `?>').

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Running ParGAP}

After doing the `configure' and `make' steps of {\ParGAP}'s installation
process (see Section~"Installing ParGAP"), you should find in {\ParGAP}'s
`bin' subdirectory a script

\begintt
pargap.sh
\endtt

which you should use to start {\ParGAP}. ({\ParGAP} can *not* be started by
starting {\GAP}~4 in the usual way, and using `RequirePackage'; doing so
will result in `Info'-ed advice to read this section.)
Edit the `pargap.sh' script if necessary, copy it to a standard path and
rename it according to how you intend to call {\ParGAP} (e.g. rename it:
`pargap'). Also, in the `bin' subdirectory is a sample `procgroup' file
which defines the master and slave processes that will be used by
{\ParGAP}. When {\ParGAP} is started it looks for a file called `procgroup'
in the current directory, unless the `-p4pg' option is used. Thus if you
renamed your shell script `pargap', the following are valid ways of
starting {\ParGAP}:

\begintt
pargap
\endtt

(if the current directory contains the file: `procgroup'), or

\){\kernttindent}pargap -p4pg <myprocgroupfile>

(where <myprocgroupfile> is the complete path of your procgroup file --
there is no restriction on how you name it).

If you had trouble installing {\ParGAP}, see the section~"Problems with
Installation". Otherwise continue on to Section~"Extended Example" and try
out {\ParGAP}.

*Note:* The script `pargap.sh' defines the program that runs {\ParGAP} as
`pargapmpi'. In fact, after installation `pargapmpi' is a symbolic link to
the {\GAP} binary named `gap'. The same binary runs both {\GAP} and
{\ParGAP}; when the binary is invoked as `gap' {\GAP} runs in the usual way
without any parallel features; only when the binary is invoked as
`pargapmpi' are the parallel features incorporated. See Section~"Modifying
the GAP kernel" for more details.

Now you are ready to test your installation; try the example in the
following section (if you are reading this from {\GAP}'s on-line help,
type: `?>').

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Extended Example}

After installation, try it out. Invoke {\ParGAP} as described in
Section~"Running ParGAP" and try the example below (but substitute your own
program where you see `"/home/gene/myprogram.g"'). The commands in this
first example are also found in the `README' file.
So, you may wish to copy text from the `README' file and paste it into a `ParGAP' session. If you are using the unmodified `procgroup' file, your *remote slaves* will be other processes on your local machine. It is a good idea to run only on your local machine for your first experiments and while you are debugging parallel programs. When you wish to experiment with using remote machines, you can then proceed to the following section, "Invoking ParGAP with Remote Slaves". \atindex{example!Slave Listener}{@example!Slave Listener} \atindex{Slave Listener!example}{@Slave Listener!example} \beginexample gap> # This assumes your procgroup file includes two slave processes. gap> PingSlave(1); #a `true' response indicates Slave 1 is alive true gap> # Print() on slave appears on standard output gap> # i.e. after the master's prompt. gap> SendMsg( "Print(3+4)" ); gap> 7 gap> # A <return> was input above to get a fresh prompt. gap> # gap> # To get special characters (including newline: `\n') gap> # into a string, escape them with a `\'. gap> SendMsg( "Print(3+4,\"\\n\")" ); gap> 7 gap> # Again, a <return> was input above after the 7 and new-line gap> # were printed to get a fresh prompt. gap> # gap> # Each SendMsg() is normally balanced by a RecvMsg(). gap> SendMsg( "3+4", 2); gap> RecvMsg( 2 ); 7 gap> # The following is equivalent to the two previous commands. gap> SendRecvMsg( "3+4", 2); 7 gap> # Flush any messages that are pending. The response is gap> # the number of messages flushed. (Above, the two gap> # SendMsg("Print...") (to the default slave: 1) did not gap> # have a corresponding RecvMsg() command.) gap> FlushAllMsgs(); 2 gap> # As with Print() the result of Exec() appears on standard gap> # output. Print() and Exec() are each `no-value' functions, gap> # and so the result of a RecvMsg() in these cases gap> # is "<no_return_val>". 
gap> SendRecvMsg( "Exec(\"pwd\")" ); # Your pwd will differ :-) /home/gene "<no_return_val>" gap> # Put default slave into an infinite loop. gap> SendMsg("while true do od"); gap> # Default slave can't execute the next command until it's gap> # finished with the previous command. gap> SendMsg("Print(\"WAKE UP\\n\")"); gap> # Check to see if a message is waiting to be collected but gap> # return immediately (i.e. don't get blocked by waiting for gap> # a message to appear). A `false' response indicates the gap> # infinite loop hasn't terminated and produced a value yet! gap> ProbeMsgNonBlocking(); false gap> # Send an interrupt to each slave, slave 1 will see the gap> # following command and print `WAKE UP', and then all gap> # pending messages are flushed. gap> ParReset(); ... resetting ... WAKE UP 0 gap> # The return value, 0, from ParReset() indicates there gap> # were 0 pending messages flushed, confirming correctness gap> # of ProbeMsgNonBlocking() when it returned "false" gap> SendRecvMsg( "a:=45; 3+4", 1 ); 7 gap> # Note "a" is defined on slave 1, not slave 2. gap> SendMsg( "a", 2 ); # Slave prints error, output on master gap> Variable: 'a' must have a value gap> # <return> entered to get fresh prompt. gap> RecvMsg( 2 ); # No value for last SendMsg() command "<no_return_val>" gap> RecvMsg( 1 ); 45 gap> myfnc := function() return 42; end;; gap> # Use PrintToString() to define myfnc on all slave processes gap> BroadcastMsg( PrintToString( "myfnc := ", myfnc ) ); gap> SendRecvMsg( "myfnc()", 1 ); 42 gap> FlushAllMsgs(); # There are no messages pending. 0 gap> # Execute analogue of GAP's List() in parallel on slaves. 
gap> squares := ParList( [1..100], x->x^2 ); [ 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801, 10000 ] gap> # Ensure problem shared data is read into master and slaves. gap> # Try one of your GAP program files instead. gap> ParRead( "/home/gene/myprogram.g"); \endexample Now that you have done a fairly rudimentary test of {\ParGAP} you should be ready to do something a little bit more interesting: \beginexample gap> ParInstallTOPCGlobalFunction( "MyParList", > function( list, fnc ) > local result, iter; > result := []; > iter := Iterator(list); > MasterSlave( function() if IsDoneIterator(iter) then return NOTASK; > else return NextIterator(iter); fi; end, > fnc, > function(input,output) result[input] := output; > return NO_ACTION; end, > Error > ); > return result; > end ); gap> MyParList( [1..25], x->x^3 ); master -> 1: 1 master -> 2: 2 2 -> master: 8 1 -> master: 1 master -> 1: 3 master -> 2: 4 2 -> master: 64 1 -> master: 27 master -> 1: 5 master -> 2: 6 2 -> master: 216 1 -> master: 125 master -> 1: 7 master -> 2: 8 2 -> master: 512 1 -> master: 343 master -> 1: 9 master -> 2: 10 2 -> master: 1000 1 -> master: 729 master -> 1: 11 master -> 2: 12 2 -> master: 1728 1 -> master: 1331 master -> 1: 13 master -> 2: 14 2 -> master: 2744 1 -> master: 2197 master -> 1: 15 master -> 2: 16 2 -> master: 4096 1 -> master: 3375 master -> 1: 17 master -> 2: 18 2 -> master: 5832 1 -> master: 4913 master -> 1: 19 master -> 2: 20 2 -> master: 8000 1 -> master: 6859 master -> 1: 21 
master -> 2: 22 2 -> master: 10648 1 -> master: 9261 master -> 1: 23 master -> 2: 24 2 -> master: 13824 1 -> master: 12167 master -> 1: 25 1 -> master: 15625 [ 1, 8, 27, 64, 125, 216, 343, 512, 729, 1000, 1331, 1728, 2197, 2744, 3375, 4096, 4913, 5832, 6859, 8000, 9261, 10648, 12167, 13824, 15625 ] gap> ParInstallTOPCGlobalFunction( "MyParListWithAglom", > function( list, fnc, aglomCount ) > local result, iter; > result := []; > iter := Iterator(list); > MasterSlave( function() if IsDoneIterator(iter) then return NOTASK; > else return NextIterator(iter); fi; end, > fnc, > function(input,output) > local i; > for i in [1..Length(input)] do > result[input[i]] := output[i]; > od; > return NO_ACTION; > end, > Error, # Never called, can specify anything > aglomCount > ); > return result; > end ); gap> MyParListWithAglom( [1..25], x->x^3, 4 ); master -> 1: (AGGLOM_TASK): [ 1, 2, 3, 4 ] master -> 2: (AGGLOM_TASK): [ 5, 6, 7, 8 ] 1 -> master: [ 1, 8, 27, 64 ] 2 -> master: [ 125, 216, 343, 512 ] master -> 1: (AGGLOM_TASK): [ 9, 10, 11, 12 ] master -> 2: (AGGLOM_TASK): [ 13, 14, 15, 16 ] 1 -> master: [ 729, 1000, 1331, 1728 ] 2 -> master: [ 2197, 2744, 3375, 4096 ] master -> 1: (AGGLOM_TASK): [ 17, 18, 19, 20 ] master -> 2: (AGGLOM_TASK): [ 21, 22, 23, 24 ] 1 -> master: [ 4913, 5832, 6859, 8000 ] 2 -> master: [ 9261, 10648, 12167, 13824 ] master -> 1: (AGGLOM_TASK): [ 25 ] 1 -> master: [ 15625 ] [ 1, 8, 27, 64, 125, 216, 343, 512, 729, 1000, 1331, 1728, 2197, 2744, 3375, 4096, 4913, 5832, 6859, 8000, 9261, 10648, 12167, 13824, 15625 ] \endexample If you wish an accelerated introduction to the models of parallel programming provided here, you might wish to read the beginning of Chapter~"Slave Listener" through section~"Slave Listener Commands", and then proceed immediately to Chapter~"Basic Concepts for the TOP-C model (MasterSlave)". 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \Section{Author} The {\ParGAP} package was designed and written by Gene Cooperman, College of Computer Science, Northeastern University, Boston, MA, U.S.A. If you use {\ParGAP} to solve a problem then please send a short email to \Mailto{gene@ccs.neu.edu} about it, and cite the {\ParGAP} package as follows: \begintt \bibitem[Coo99]{Coo99} Cooperman, Gene, {\sl Parallel GAP/MPI (ParGAP/MPI)}, Version 1, College of Computer Science, Northeastern University, 1999, \verb+http://www.ccs.neu.edu/home/gene/pargap.html+. \endtt %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \Section{Invoking ParGAP with Remote Slaves} {\ParGAP}, unlike {\GAP}, must be invoked under a separate name. After {\ParGAP} has been installed, a script `bin/pargap.sh' will have been created which (after any changes you needed to make; see Section~"Installing ParGAP") you should use to invoke {\ParGAP}. This is similar to `<GAP_ROOT_DIR>/bin/gap.sh' that is used to invoke the non-parallel {\GAP}. Installers are encouraged to treat `pargap.sh' in analogy to `gap.sh'. For example, if your site has copied `gap.sh' to `/usr/local/bin/gap', then you should also look for the `pargap.sh' script as `/usr/local/bin/pargap'. In addition, when `pargap' (we'll assume that's how {\ParGAP} is invoked at your site) is called, there must be a file, `procgroup', in the current directory, or alternatively, if you wish to use a single procgroup file for all jobs, and that procgroup file is in `/home/joe', then you can alias `pargap' to `pargap -p4pg /home/joe/procgroup'. The procgroup file has a simple syntax, taken from the MPICH implementation of MPI (inherited from P4). A `\#' in column~1 introduces a comment line. The first non-comment line should be `local 0', verbatim. This line declares the master process as the local process. Other lines are of the form: \){\kernttindent}<host-machine> 1 <pargap-script> e.g. 
\begintt regulus.ccs.neu.edu 1 /usr/local/bin/pargap \endtt The first field is the hostname for a remote process. The second field specifies one thread per process. ({\ParGAP} recognizes only the value~1 for the second field.) The third field is an absolute pathname for {\ParGAP}, as it would be called on the remote process. Note that you can repeat the same line twice if you want two remote {\ParGAP} processes on the same processor. The default `procgroup' provided in the distribution will have lines of form: \){\kernttindent}localhost 1 <path-of-provided-pargap.sh> If you change <path-of-provided-pargap.sh> to just, say, `pargap', this will work only if `pargap' is in your path on the remote machine shell (`localhost' in this case), using your default shell. On most machines, `localhost' is an alias for the local processor. This is a good default for debugging, so that you don't disturb users on other machines. MPI will use a line \){\kernttindent}<host-machine> 1 <pargap-script> to create a UNIX subprocess executing: \){\kernttindent}rsh <host-machine> <pargap-script> Suppose <host-machine> is `regulus.ccs.neu.edu' and <pargap-script> is `/usr/local/bin/pargap' as in the above example, and we were to have trouble invoking {\ParGAP}, then it would be a good idea to try invoking `rsh regulus.ccs.neu.edu' from a UNIX prompt and if that succeeds, to then try executing the full `rsh' command. A typical problem is that the remote processor requires a password to login. MPI requires a login without passwords. Typically, `/etc/hosts.equiv' has not been set up to remove the password requirement for your remote host. Sometimes this can be solved by an appropriate `.rhosts' file in your home directory on the remote host. Sometimes, PAM is also used for user authentication (see `/etc/pam.conf'). `man in.rshd' also has helpful information. Consult your system staff for further analysis. 
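
The way a `procgroup' file translates into `rsh' commands, as described
above, can be sketched as a small Bourne shell function. This is only an
illustration under stated assumptions, not the code that {\ParGAP} or its
MPI layer actually runs: the function name `procgroup_commands' is
hypothetical, and the function only prints the commands (a dry run) rather
than executing them.

\begintt
# Hypothetical helper, not part of ParGAP: print the rsh command implied
# by each slave line of a procgroup file. Nothing is executed remotely.
procgroup_commands() {
  while read -r host count script; do
    case "$host" in
      '')    continue ;;   # blank line
      '#'*)  continue ;;   # comment line (# in column 1)
      local) continue ;;   # the "local 0" line declares the master
    esac
    # count is the second field (always 1 for ParGAP); it is not used here
    echo "rsh $host $script"
  done < "$1"
}
\endtt

Calling `procgroup_commands procgroup' in the directory holding your
`procgroup' file lists one command per slave line; running each of them by
hand from a UNIX prompt is a quick way to find the host that asks for a
password.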
In these days of hyper-security, `rsh' may be disabled at your site and you may have to use `ssh' instead; if so, there is a solution here: add the lines \begintt ############################################################################# ## ## RSH . . . .. . . . . . . . . . . . . . . . . remote shell used by ParGAP ## ## RSH=ssh export RSH \endtt before the `GAP' block with the `exec' line. (Of course, the `\#' lines are not needed; they are comments.) Note that the remote {\ParGAP} process will not read from standard input, although signals such as SIGINT (`\^{}C') may be received by the remote process. However, the remote {\ParGAP} process will write to standard output, which is relayed to the local process. So, \beginexample gap> SendMsg("Exec(\"hostname\")", 2); \endexample will execute and print from the remote process. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \Section{Problems with Installation} If you still have problems, here is a list of things to check. \beginlist \item{0.} In versions of {\GAP} earlier than {\GAP}~4.3 some {\ParGAP} ``hooks'' need to be added to {\GAP}'s `lib/init.g' file. Please add: \begintt PAR_GAP_SLAVE_START := fail; \endtt \item{} before the line: \begintt READ(GAP_RC_FILE); \endtt \item{} and add: \begintt if PAR_GAP_SLAVE_START <> fail then PAR_GAP_SLAVE_START(); fi; \endtt \item{} at the end of the file. \item{1.} Do you have enough swap space to support multiple {\GAP} processes? A simple way to check this is with the UNIX command, `top'. The Linux version of `top' sorts by memory usage if you type `M'. \item{2.} `make' tries to automatically create: \begintt pkg/pargap/bin/pargap.sh \endtt \item{} and copy the parameters from `<GAP_ROOT>/bin/gap.sh'. <GAP_ROOT> was specified when you executed `./configure <GAP_ROOT>' to install ParGAP. This can be error-prone if your site has an unusual setup. If you execute `<GAP_ROOT>/bin/gap.sh', does gap come up? 
If so, compare it with `pargap.sh' and check for correct settings in
`.../pkg/pargap/bin/pargap.sh'.
\item{3.}
Did {\ParGAP} find your `procgroup' file? [It looks in the current
directory for `procgroup', or for:

\){\kernttindent}... -p4pg <PATH>/procgroup

\item{}
on the command line.]
\item{4.}
Were the remote slave processes able to start up? If so, could they connect
back to the master? To test connectivity problems, try manually starting a
remote slave by executing a line in the script. Try a simple
`rsh <remote-hostname>' to see if the issue is with security. If your site
uses `ssh' instead of `rsh', then there is a security issue. Read
Section~"Problems with Passwords (Getting Around Security)", and possibly
`man sshd'.
\item{5.}
If the previous step failed due to security issues, such as requesting a
password, you have several options. `man rshd' tells you the security model
at your site (or possibly `man ssh' if you use that). Then read
Section~"Problems with Passwords (Getting Around Security)".
\item{6.}
Is the `procgroup' file in your current directory set correctly? Test it.
If you are calling it on a remote host, manually type:

\){\kernttindent}rsh <HOSTNAME> <ParGAP>

\item{}
where <HOSTNAME> and <ParGAP> appear exactly as in `procgroup', e.g.

\){\kernttindent}rsh denali.ccs.neu.edu /usr/local/gap4r3/bin/pargap.sh

\item{}
In some cases, `exec' is used to save process overhead. Also try:

\){\kernttindent}rsh <HOSTNAME> exec <ParGAP>

\item{}
If you plan to call it on localhost, try just: <ParGAP>
\item{}
Note that if not all the slave processes succeed in connecting to the
master, then {\ParGAP} writes out a file:

\begintt
/tmp/pargapmpi--rsh.$$
\endtt

\item{}
where `\$\$' is replaced by the process id of the {\ParGAP} process.
\item{7.}
Is `pargap' listed in `.../pkg/ALLPKG'? [It's needed to autostart slaves.]
\item{8.}
Inside {\ParGAP}, has MPI been successfully initialized?
Try:

\beginexample
gap> MPI_Initialized();
\endexample

\item{9.}
A remote (slave) {\ParGAP} process starts in your home directory and tries
to `cd' to a directory of the same name as your local directory. Check your
assumptions about the remote machine. Try:

\beginexample
gap> SendRecvMsg("Exec(\"pwd\")"); SendRecvMsg("UNIX_Hostname()");
gap> SendRecvMsg("UNIX_Getpid()");
\endexample

\item{10.}
If the connection dies at random, after some period of time: You can
experiment with `SO_KEEPALIVE' and variants. (See `man setsockopt'.) This
periodically sends *null messages* so the remote machine does not think
that the originating machine is dead. However, if the remote machine fails
to reply, the local process sends a SIGPIPE signal to notify current
processes of a broken socket, even though there might have been only a
temporary lapse in connectivity. `ssh' specifies `KeepAlive yes' by
default, but setting `KeepAlive no' might get you through some transient
lapses in connectivity due to high congestion. You may also want to
experiment with: `setenv RSH "rsh -n"'
\item{11.}
Read the documentation for further possible problems.
\endlist

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Problems with Hosts on Multiple Networks}

If a host is on multiple networks, it will have multiple IP addresses and
usually multiple hostnames. In this case, the master process cannot always
guess correctly which IP address (which internet address) should be passed
to the slave process, so that the slave process can call back to the
master. In such cases, you may need to tell {\ParGAP} which hostname or IP
address to use for the callback. This is done by setting the UNIX
environment variable, `CALLBACK_HOST', as in the example below.

\begintt
# [ in sh/bash/... ]
CALLBACK_HOST=denali.ccs.neu.edu; export CALLBACK_HOST
# [ in csh/tcsh/... ]
setenv CALLBACK_HOST denali.ccs.neu.edu
\endtt

The appropriate line for your shell can be placed in your shell
initialization file. Alternatively, you can set this up for all users by
placing the Bourne shell version (for `sh') somewhere between the first and
last line of `.../pkg/pargap/bin/pargap.sh'.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Problems with Passwords (Getting Around Security)}

There is a simple test to see if you need to read this section. Pick a
remote machine, <HOSTNAME>, that you wish to execute on, and type:
`rsh <HOSTNAME>'. If this did not work, also try `ssh <HOSTNAME>'. If you
were asked for your password, then you and your system administrator may
need to talk about security policy. If you were successful with `ssh' and
not with `rsh' then set the environment variable, `RSH', to the value
`ssh', as described in item~3 below.

\beginlist
\item{(1)}
Ask your systems administrator to put the machines in a `hosts.equiv' file,
so that logging in from one to the other does not require a password.
(`man hosts.equiv')
\item{(2)}
Add a `.rhosts' file to your home directory (or `.shosts' for `ssh').
\item{(3)}
Hack around the problem: By default, the startup script uses `rsh' to start
remote processes. However, if the environment variable `RSH' was set, the
script uses the value of the environment variable instead of `rsh'. This
may be useful, if you have your own script, `myrsh', that automatically
gets around the security issues. Then just type:

\begintt
RSH=myrsh; export RSH  # [ in sh/bash/... ]
setenv RSH myrsh  # [ in csh/tcsh/... ]
\endtt

\item{}
The appropriate line for your shell can be placed in your shell
initialization file. Alternatively, you can set this up for all users by
placing the Bourne shell version (for `sh') somewhere between the first and
last line of `.../pkg/pargap/bin/pargap.sh'. (The example for `ssh' was
given earlier.)

\item{(4)}
`ssh': `man ssh' mentions some possibilities for giving the password the
first time, and then having `ssh' remember that future logins to that
machine are authorized for the duration of the session. Don't overlook
the use of `\$HOME/.ssh/config' to set special parameters, such as
specifying a different login name on the remote machine. Some parameters
of interest might be `KeepAlive', `RSAAuthentication', and `UseRsh'. You
may also find useful information in `man sshd'.

\item{(5)}
After starting {\ParGAP}, manually call
\begintt
/tmp/pargapmpi--rsh.$$
\endtt
\item{}
and repeatedly type in the password for each slave process. If you find
yourself doing this, you may want to talk with your system
administrator, since it actually hurts system security to have you
repeatedly typing passwords, with a concomitant risk that someone else
will find out your password.
\endlist

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Modifying the GAP kernel}

Note that this package modifies the {\GAP} `src' and `bin' files, and
creates a new {\GAP} kernel. This new {\GAP} kernel can be shared by
traditional users of the old, sequential {\GAP} kernel, and by those
doing parallel processing. The {\GAP} kernel will have identical
behavior to the old {\GAP} kernel when invoked through the `gap.sh'
script or the `bin/@GAParch@/gap' binary. The new {\ParGAP} variables
will appear to the end user *ONLY* if the {\GAP} binary was invoked as
`pargapmpi': a symbolic link to the actual {\GAP} binary. The script,
`pargap.sh', does this. So, in a multi-user environment, traditional
users can continue to use `gap.sh' without noticing any difference. Only
an invocation of `pargap.sh' will add the new features.

In a future version of {\GAP}, it is hoped that the {\GAP} kernel will
have enough ``hooks'', so that no modification of the {\GAP} kernel is
required. At that time, it will also be possible to speed up the startup
time for {\ParGAP}.
Much of the startup time is caused by waiting for {\GAP} to read its
library files. It will be possible to use the {\GAP} function,
`SaveWorkspace()', to save a version with the {\GAP} library pre-loaded.
That saved version can then be used to start up {\ParGAP}. This is not
currently possible, because {\ParGAP} needs to get at the command line
of {\GAP} before the {\GAP} kernel sees it.

Comments and contributions to a {\ParGAP} user library, or any other
type of assistance, are gratefully accepted.

Gene Cooperman

\Mailto{gene@ccs.neu.edu}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%E