CHPOX - CHeckPOinter for linuX
==============================

Copyright (C) 2002, Olexander O. Sudakov <csaa@mail.univ.kiev.ua>
Copyright (C) 2002, Eugeniy S. Meshcheryakov <eugen@univ.kiev.ua>

This program is licensed under the GNU General Public License.
See COPYING for more details.

-----------------------------

CHPOX is a set of Linux kernel modules for transparently dumping
specified processes into a disk file and restarting them.

FEATURES
========

Dumping of one process, or of a process together with all its children,
into a disk file.

Restarting of a process or a group of processes from a file.

CHPOX supports dumping of virtual memory, regular files, terminal state,
current working directory, pipes, Unix sockets, and multiple
non-interacting processes.

It does not crash on openMosix [3], and it is SMP safe.

It works as a kernel module, so you do not need to recompile the Linux
kernel.

INSTALLATION
============

Before installation you must have configured and compiled kernel sources
and the System.map file for that kernel.

1. Unzip and untar the archive with the CHPOX sources:

     tar -xzf chpox-<version>.tar.gz

2. 'cd' to the directory containing the source code and run the
   './configure' script:

     cd chpox-<version>
     ./configure

   You can pass the path to the Linux sources to 'configure':

     ./configure --with-linux=/path/to/kernel/sources

   If there is no System.map file in the Linux source directory, you must
   specify the path to it:

     ./configure --with-sysmap=/path/to/System.map

   See the output of

     ./configure --help

   for other configure options.

3. Run 'make', 'make install' and 'depmod -ae':

     make
     make install
     depmod -ae

!!!WARNING: Compile CHPOX with the same compiler that was used to compile
            the kernel.
!!!WARNING: Recompile CHPOX after recompiling the kernel.

USING
=====

Before checkpointing and restoring processes you must load the CHPOX
module:

  modprobe chpox_mod

or

  insmod chpox_mod

Checkpointing of processes is controlled through the proc interface.
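Each request to the proc interface is one line of text written to a file
under /proc/chpox, so the `echo ... > file' commands in this file can also
be issued from a C program. The following is a minimal sketch, not part of
CHPOX itself: the function name chpox_write is hypothetical, and appending
a trailing newline (as echo does) is an assumption about what the module
accepts.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write one request line to a CHPOX control file; the C counterpart of
 *   echo "<request>" > <path>
 * Like echo, it appends a newline (an assumption about what the module
 * expects).  Returns 0 on success and -1 if the file cannot be opened
 * (for example because chpox_mod is not loaded) or the write fails.
 * The name chpox_write is hypothetical, not part of CHPOX.
 */
int chpox_write(const char *path, const char *request)
{
    assert(path != NULL && request != NULL);
    assert(strchr(request, '\n') == NULL);   /* one request per line */

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%s\n", request) >= 0;
    /* stdio buffers the data, so the actual write() and any error the
       module returns for it may only be seen when the stream is closed. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A program passes the same strings that are given to echo; a return value
of -1 means the control file could not be opened or the write was
rejected.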
You can register or unregister processes by writing a string of the form
<pid>:<signal>:<arg>:<dump file> into the file '/proc/chpox/register'.
This registers or unregisters the process with identifier <pid>
(possibly together with all its child processes); <signal> is the number
of the signal that will trigger a checkpoint. Passing <arg> == 0
unregisters the process with the given <pid> (if <pid> is 0, all
registered processes are unregistered). If <arg> is not 0, the given
process (or group of processes) is registered, and checkpoints are
written to the file named <dump file>.

The meaning of <arg>:

  - if bit 2 (value 2) is set, the executable file is included in the
    dump. This may be useful for transferring processes between different
    machines with the same shared libraries.
  - if bit 3 (value 4) is set, the libraries from the registered library
    list (see "/proc/chpox/libs" below) are included in the dump file.
  - if bit 4 (value 8) is set, CHPOX also checkpoints all child processes
    of the given process.

Examples:

  Registration of the process with PID 1234 for checkpointing with
  signal 31 (SIGSYS):

    echo "1234:31:1:/tmp/proc.dump" > /proc/chpox/register

  Registration of a process with all its child processes:

    echo "1234:31:9:/tmp/proc.dump" > /proc/chpox/register

  Unregistering one process:

    echo "1234:0:0:" > /proc/chpox/register

  Unregistering all registered processes:

    echo "0:0:0:" > /proc/chpox/register

After a process has been registered, you can checkpoint it as many times
as necessary by sending it the registered signal:

  kill -31 1234

The file "/proc/chpox/info" holds information about registered processes
in the form:

  <pid>:<signal>:<arg> [<flag>|<number of checkpoints>] -> <filename> [<last checkpoint time>]

The flag is one of:

  C - new entry
  S - checkpoint is in progress
  O - last request was processed correctly
  E - last request caused an error

Example:

  2108:31:9 [O|1] -> /tmp/proc.dump [1040130768.799984]

The file "/proc/chpox/libs" holds the list of libraries registered for
inclusion into the dumps of processes that use them (see also the
`chpoxctl' program).
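The strings written to /proc/chpox/register and the lines read from
/proc/chpox/info have a fixed layout, so a program can build and parse
them directly. A minimal C sketch follows; all names in it are
hypothetical, the numeric <arg> values assume that bits are counted from
1 (as the example values 1 and 9 = 1 + 8 indicate), and the checkpoint
time is assumed to be in seconds since the Epoch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical names for the <arg> bits.  Assumption: bits are counted
 * from 1, so bit 2 = 2, bit 3 = 4, bit 4 = 8; this matches the examples,
 * where 1 registers one process and 9 (= 1 + 8) adds its children.
 */
enum {
    CHPOX_ARG_REGISTER = 1,  /* any non-zero <arg> registers          */
    CHPOX_ARG_EXE      = 2,  /* bit 2: include the executable file    */
    CHPOX_ARG_LIBS     = 4,  /* bit 3: include registered libraries   */
    CHPOX_ARG_CHILDREN = 8   /* bit 4: checkpoint child processes too */
};

/*
 * Build "<pid>:<signal>:<arg>:<dump file>" for /proc/chpox/register.
 * Returns the string length, or -1 if buf is too small or the dump
 * file name contains a newline (which would end the request early).
 */
int chpox_register_line(char *buf, size_t len, int pid, int sig,
                        int arg, const char *dump)
{
    assert(buf != NULL && dump != NULL);
    if (strchr(dump, '\n') != NULL)
        return -1;
    int n = snprintf(buf, len, "%d:%d:%d:%s", pid, sig, arg, dump);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* One parsed line of /proc/chpox/info. */
struct chpox_info {
    int pid, sig, arg;
    char flag;              /* C, S, O or E                        */
    int checkpoints;        /* number of checkpoints taken so far  */
    char file[256];         /* dump file name                      */
    double last_time;       /* assumed: seconds since the Epoch    */
};

/*
 * Parse a line such as
 *   2108:31:9 [O|1] -> /tmp/proc.dump [1040130768.799984]
 * Returns the number of fields converted (or EOF); 7 means the whole
 * line was understood.  Dump file names containing spaces are not
 * handled by this sketch.
 */
int chpox_parse_info(const char *line, struct chpox_info *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%d:%d:%d [%c|%d] -> %255s [%lf]",
                  &out->pid, &out->sig, &out->arg, &out->flag,
                  &out->checkpoints, out->file, &out->last_time);
}
```

For unregistering, chpox_register_line(buf, sizeof buf, 1234, 0, 0, "")
yields "1234:0:0:", the same string as in the unregistering example.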
You can add a library to the list by writing a string of the form
"+<library file name>" to this file:

  echo "+/lib/ld-linux.so.2" > /proc/chpox/libs

To remove a library from the list, write a string of the form
"-<library file name>":

  echo "-/lib/ld-linux.so.2" > /proc/chpox/libs

You can clear the list of libraries by writing the string "-":

  echo "-" > /proc/chpox/libs

Any manipulation of the library list requires root privileges.

The file "/proc/chpox/version" contains the version number of the chpox
module.

Another way to control CHPOX is the user-level executable `chpoxctl',
which uses the ioctl interface of chpox. See the output of the command

  chpoxctl --help

for details.

Examples:

  Registration of the process with PID 1234 for checkpointing with
  signal 31 (SIGSYS):

    chpoxctl add 1234 31 1 /tmp/proc.dump

  Registration of a process with all its child processes:

    chpoxctl add 1234 31 9 /tmp/proc.dump

  Unregistering one process:

    chpoxctl del 1234

  Unregistering all registered processes:

    chpoxctl clear

  Adding a library to the list:

    chpoxctl addlib /lib/ld-linux.so.2

  Removing one library from the list:

    chpoxctl dellib /lib/ld-linux.so.2

  Removing all libraries from the list:

    chpoxctl clearlibs

  Displaying the list of registered libraries:

    chpoxctl liblist

RESTORING OF PROCESSES
======================

To restore a checkpointed process, run the program `ld-chpox' while the
chpox module is loaded:

  ld-chpox /tmp/proc.dump

See the output of the command

  ld-chpox --help

for details. See also the next section.

REGISTERING CHPOX FILE FORMAT AS EXECUTABLE
===========================================

You will need to enable misc binary format support in the Linux kernel
(CONFIG_BINFMT_MISC option).

In order to run chpox dumps as executable files, you can register the
dump format as a misc binary format.
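The binfmt_misc registrations for CHPOX recognise a dump by the magic
string "CHPOX" at offset 0 of the file. The same test can be done in a
program, for example to check a file before handing it to ld-chpox. The C
sketch below relies only on that five-byte magic; the function name
chpox_is_dump is hypothetical, and a file that passes the check is not
necessarily a complete dump.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Magic string used by the binfmt_misc registrations, at offset 0. */
#define CHPOX_MAGIC "CHPOX"

/*
 * Return 1 if the file at path starts with "CHPOX", 0 if it does not
 * (including files shorter than the magic), and -1 if it cannot be
 * opened.
 */
int chpox_is_dump(const char *path)
{
    assert(path != NULL);
    char head[sizeof CHPOX_MAGIC - 1];      /* 5 bytes, no terminator */

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(head, 1, sizeof head, f);
    fclose(f);

    return got == sizeof head &&
           memcmp(head, CHPOX_MAGIC, sizeof head) == 0;
}
```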
On a Debian system you can install the package `binfmt-support' and
execute the following command as root:

  update-binfmts --install chpox /usr/local/bin/ld-chpox --magic "CHPOX"

On other systems you can register the chpox format by writing the string
":chpox:M:0:CHPOX::/usr/local/bin/ld-chpox:" into the file
"/proc/sys/fs/binfmt_misc/register" (the fields are the name, type M for
matching by magic, offset 0, the magic "CHPOX", an empty mask, and the
interpreter). In that case, however, you will need to repeat this after
each reboot.

EXAMINING CHPOX DUMP FILES
==========================

To see information about a chpox dump file you can use the file(1)
program. Execute the command:

  file -m chpox.magic <dump-file>

and you will see the following information about the dump file: the chpox
file format version, the architecture for which the dump was created,
whether the file is complete or was corrupted during checkpointing, and
the number of child processes dumped.

The file chpox.magic is included in the chpox distribution.
Alternatively, you can merge it into the /etc/magic file.

OPERATION DETAILS
=================

Registering a process for checkpointing blocks the specified signal for
that process. The signal is blocked by installing a notifier function
that returns 1 and clears the signal queue before returning. When the
specified signal is sent to the process, the notifier is executed in the
context of the process being checkpointed. During execution of the
notifier, the VMAs and the information about open files are dumped. If
MOSIX is configured, the process returns to its home node before dumping.

This kind of operation is similar to EPCKPT [1]. EPCKPT is very powerful
(it supports almost everything except sockets), but it requires patching
the kernel and thus does not work with MOSIX. CRAK tries to stop the
process before checkpointing and then operates on its structures. CRAK
is not designed for SMP (at least the version for Linux 2.4.4), but it
is declared to support sockets (though not in the version for Linux
2.4.4).

AUTHORS
=======

Olexander O. Sudakov <csaa@univ.kiev.ua>
Eugeniy S. Meshcheryakov <eugen@univ.kiev.ua>

CHPOX is based on

  VMADUMP by Erik Hendriks <erik@hendriks.cx>
  EPCKPT by Eduardo Pinheiro
  CRAK by Hua Zhong <huaz@cs.columbia.edu>

VERSION
=======

This is version 0.6-2 beta. It has been tested with Linux kernel 2.4.21
with the openMosix patch, with Linux kernel 2.4.22 without the openMosix
patch, and with Linux kernel 2.4.22 on PowerPC.

BUGS
====

This version does not work correctly with interactive programs.
There are probably many more.

SUPPORTED ARCHITECTURES
=======================

This version of CHPOX has been tested on machines with i386 and PowerPC
(G3) architectures. The VMADUMP module, however, also has code supporting
the Sparc and Alpha architectures. On Sparc, dumping and restoring of the
FPU state is not supported.

If you succeed in using CHPOX on Sparc or Alpha machines, please let us
know. Please also let us know if you have trouble compiling or using
CHPOX.

SOCKETS SUPPORT
===============

CHPOX supports connected and listening stream Unix (local) sockets. It
does not yet support Internet sockets. CHPOX does not save socket flags.

When restoring, CHPOX creates connected sockets using the socketpair
system call. It restores a listening socket under the name it had at
checkpoint time, if possible. If that is not possible, CHPOX tries to
rename the socket by changing the last letter of the file name. If that
also fails, CHPOX creates the socket in the abstract namespace. So if you
want a listening socket restored under its old name, remove the old
socket file first.

TODO
====

Support for Internet sockets, shared memory, System V IPC, and processes
with multiple threads. Better integration with openMosix.

LINKS
=====

1. http://www.chechpointing.org/
   Checkpointing.org: The home to checkpointing packages
2. http://www.beowulf.org/software/bproc.html
   BPROC: Beowulf Distributed Process Space
3. http://www.openmosix.org/
   The openMosix Project