nettee (NETwork TEE)
Version 0.1.6  12 MAY 2005
David Mathog <mathog@caltech.edu>
License: GPL 2

[ nettee is derived from dolly 0.58C by Felix Rauch <rauch@inf.ethz.ch> ]

nettee is a network version of the Unix "tee" program. See the man page
nettee.1 for more info.

To compile, use

   gcc -Wall -D_LARGEFILE64_SOURCE -o nettee nettee.c

The return status is "EXIT_SUCCESS" if there are no errors and
"EXIT_FAILURE" if any node fails in a manner that cannot be handled.

The pdist*.sh scripts are example wrappers that use nettee. For instance:

   pdist_file.sh \
     /usr/common/tmp/thelist.txt \
     /usr/common/bin/pdist_store.sh \
     /tmp/foobar.txt <big_input_file

would copy big_input_file to every node listed in thelist.txt.

   pdist_shell.sh \
     /usr/common/tmp/thelist.txt \
     /tmp/cmdfifo

sets up a fifo (cmdfifo) to which commands may be written. From there they
are passed to all other nodes on the chain and executed there more or less
in parallel. Terminate by sending "EOS" to the fifo. The environment
variable NEXTNODE, if included in such a command, will be translated
correctly for each node. Be sure to escape $ as needed in the command
strings!

On a Linux 2.6.8 test system (one master, 20 slaves in a chain, 100baseT
switched network) this command

   accudate ; echo "accudate" >>/tmp/cmdfifo

indicates that there is a delay of about 0.007 s before the first node
executes and about 0.009 s before the last node executes. The test nodes
were synched with ntp to the master node, which issued the above command.

This is a more efficient way to use nettee to distribute small files. For
example:

   echo "nettee -next \$NEXTNODE -out wherever" >>/tmp/cmdfifo
   nettee -in smallfile -next $FIRSTNODEINCHAIN

only suffers about 0.01 s delay in setting up the chain. Conversely, using
pdist_file.sh to distribute small files (one at a time) is slow due to the
rsh setup overhead.
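The advice about escaping $ can be illustrated with a small sketch. It does
not run nettee or touch a real fifo; it only shows what text the issuing
shell would end up writing. The value given to NEXTNODE below is a made-up
stand-in, and the variable names escaped, unescaped and single are
hypothetical.

```shell
# Quoting demonstration only; nothing is sent to a fifo or to nettee.
NEXTNODE="master-side-value"   # hypothetical value set in the issuing shell

# Escaped: the literal text $NEXTNODE survives and can be translated later
# on each node.
escaped="nettee -next \$NEXTNODE -out wherever"

# Unescaped: the issuing shell substitutes its own value before the command
# string is ever written.
unescaped="nettee -next $NEXTNODE -out wherever"

# Single quotes also keep the text literal.
single='nettee -next $NEXTNODE -out wherever'

printf '%s\n' "$escaped" "$unescaped" "$single"
```

Only the escaped and single-quoted forms leave NEXTNODE for the receiving
side to fill in; the unescaped form would send the issuing shell's value to
every node.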

execinput, accudate, and extract, which are used by these scripts, are
available in source code here:

   ftp://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/

Instructions for using nettee with SystemImager are in the files
SI_METHOD1.TXT and SI_METHOD2.TXT.

WARNING: On AMD Athlon processors (the 32 bit ones, not the Athlon64) the
"athcool" program, if active, will degrade throughput significantly
wherever nettee must read from the net and then write to disk. When
transferring large files on these machines it is best to first do
"athcool off", then transfer, then "athcool on".

Please send comments to the email address above.

Change log:

1.8  Added -connf flag and failovers for -next.
     Modified formatting of messages slightly.

1.7  Minor change to get a clean compilation on Solaris.

1.6  Added nettee.spec file from Dag Wieers. The RPM page for nettee, for
     the OSes that are maintained there, is:

        http://dag.wieers.com/packages/nettee/

     Modified error handling on bad count. Previously, if node K+1
     returned a bad byte count, node K would return that bad byte count,
     and the error condition was propagated upward in that manner. In
     this version node K returns its own byte count but sets an error
     flag that propagates upward instead. Also, the CONWF, COLWF, etc.
     error announcements now include the name of the preceding node if
     that node generated the error. The general idea is to allow the
     distribution chain to continue functioning when certain types of
     errors occur on individual nodes. Theoretically these error messages
     could be piped into something like "socket" or "logger" and then
     logged to a central server. Without that they would need to be
     stored some place like /var/run/nettee_messages.txt on each node and
     processed in a post mortem. If -v 1 is used and no errors occur this
     would generate no traffic, so it shouldn't interfere with throughput
     if everything is working.

     Added changes to handle compilation warnings noticed by Tru Huynh on
     an X86_64 machine.

     Fixed a bug in beowulf.master which sometimes resulted in the end
     node not being able to set its HOSTNAME.