Running Elan MPI jobs on QsNet Clusters
---------------------------------------

If built with --with-qshell or --with-mqshell, pdsh may be used to run
MPI jobs on a QsNet interconnect.  These options require that the Elan
user space libraries be installed (the qsnetlibs or qswelan-r RPM for
Linux) and that your kernel be patched to run the 'elan3' or 'elan4' and
'rms' device drivers. Pdsh can run independently of the RMS product: the
'rms' kernel module used by pdsh is a distinct facility from the RMS
product.

Quadrics has provided a PDSH FAQ which may answer some common questions
about getting the qshell module to run MPI jobs. Please see:

  http://web1.quadrics.com/twiki/bin/view/FAQs/SetupPDSH


rms pdsh module
---------------------------------------
Pdsh can also be run via the Quadrics RMS 'allocate' command. In that
case allocate takes care of the node reservations and passes a batch ID
to pdsh via the RMS_RESOURCEID environment variable.  Pdsh then retrieves
the list of allocated nodes from the RMS database using the rmsquery
command. This functionality is provided by the "rms" pdsh module
(--with-rms).
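
Because the rms module depends on RMS_RESOURCEID being present in the
environment, a wrapper script may want to check for it before calling
pdsh. The following is only a sketch: rms_resource is a hypothetical
helper name, not part of pdsh or RMS, and it inspects nothing except the
environment variable named above.

```shell
# Hypothetical helper (not part of pdsh or RMS): report whether an RMS
# allocation is visible through the RMS_RESOURCEID environment variable.
rms_resource() {
    if [ -n "${RMS_RESOURCEID:-}" ]; then
        # allocate exported a batch ID for this environment.
        printf 'RMS resource: %s\n' "$RMS_RESOURCEID"
    else
        # Nothing was exported; report that and fail.
        printf 'no RMS allocation\n'
        return 1
    fi
}
```

A script could call rms_resource first and invoke pdsh only when it
returns success.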

slurm pdsh module
---------------------------------------
Similar to the rms pdsh module, the slurm module allows pdsh to target
nodes based on SLURM allocations, either by targeting an already running
job or by running under ``srun --allocate''. The SLURM jobid can be passed
to pdsh using the `-j' option provided by the module, or via the
SLURM_JOBID environment variable, which is set by --allocate.
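
The two ways of supplying a jobid can be sketched as a small shell
helper. This is an illustration only: slurm_jobid is a hypothetical
function name, and preferring an explicit argument over SLURM_JOBID is a
choice made for this sketch, not documented pdsh behavior.

```shell
# Hypothetical helper (not part of pdsh or SLURM): choose a SLURM job ID,
# taking an explicit argument first and falling back to SLURM_JOBID.
slurm_jobid() {
    if [ -n "${1:-}" ]; then
        # Job ID given on the command line.
        printf '%s\n' "$1"
    elif [ -n "${SLURM_JOBID:-}" ]; then
        # Job ID inherited from the environment.
        printf '%s\n' "$SLURM_JOBID"
    else
        echo "no SLURM job ID available" >&2
        return 1
    fi
}
```

With pdsh built with the slurm module, the result could then be handed to
the `-j' option, for example: pdsh -j "$(slurm_jobid 1234)" hostname
(1234 is a placeholder jobid).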


The `/etc/elanhosts' config file
---------------------------------------
Pdsh uses a simple config file, /etc/elanhosts, to describe the hosts
that contain Elan adapters and on which Elan MPI jobs may be run. The
config file is also used by the daemons qshd and mqshd to initialize
the Elan network error resolver thread. The /etc/elanhosts file is
parsed by the libelanhosts(3) library, on which pdsh depends when built
for QsNet. See the elanhosts(5) man page for a description of the
/etc/elanhosts file format.

The libelanhosts package may be obtained from

  ftp://ftp.llnl.gov/pub/linux/libelanhosts/