<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
 <TITLE>Coda File System  User and System Administrators Manual: The Backup System                              </TITLE>
 <LINK HREF="manual-13.html" REL=next>
 <LINK HREF="manual-11.html" REL=previous>
 <LINK HREF="manual.html#toc12" REL=contents>
</HEAD>
<BODY>
<A HREF="manual-13.html">Next</A>
<A HREF="manual-11.html">Previous</A>
<A HREF="manual.html#toc12">Contents</A>
<HR>
<H2><A NAME="Backup"></A> <A NAME="s12">12. The Backup System                              </A></H2>

<P>
<P>
<H2><A NAME="ss12.1">12.1 Introduction: Design of the Coda Backup Subsystem</A>
</H2>

<P>
<P>As the use of the Coda file system increased, the need for a reliable
backup storage system with a large capacity and minimal loss of service
became apparent.  A single-operation backup system was determined to be
infeasible given the volume of data in Coda, the nature of a
distributed file system, and the long downtime that would normally be required
to back up the system in one operation.
<P>In order to meet the goals of high availability and reliability inherent in Coda's
design, and to make efficient use of backup hardware and materials, the volume
was chosen as the unit of data, and 24 hours was chosen as the time unit
for system management and administration.  The result of these design
considerations is a volume-by-volume backup mechanism that occurs in
three phases:
<P><EM>cloning</EM>
<P>The cloning phase consists of freezing the (replicated) volume, creating a
read-only clone of each of the replicas, and then unfreezing the volume.
This allows mutating operations on the replicated volume to occur while
maintaining a snapshot to backup.  Once the cloning phase has been completed,
normal read-write services can be resumed without fear of data corruption
due to mutating operations on an active file system.
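<P>For illustration only, the same effect can be achieved by hand with the
<B>volutil</B> utility, assuming a <CODE>clone</CODE> subcommand and the
hypothetical replica id 0x1000001 (the <B>backup</B> program normally drives
this step itself):
<BLOCKQUOTE><CODE>
<PRE>
# Hypothetical sketch: create a read-only clone of one replica.
# The backup program normally issues this call automatically.
volutil clone 0x1000001
</PRE>
</CODE></BLOCKQUOTE>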
<P><EM>dumping to disk files on a backup spool machine</EM>
<P>
<P>The dumping phase consists of converting the read-only volume clones
to disk images stored as regular disk files on a spool machine.  A
dump can either be full, in which all files are dumped, or
incremental, in which only those files or directories which have
changed since the last successful backup are included in the dump.
This allows for a system in which only a subset of volumes needs a full
backup at any one time (with incrementals done between full backups),
thus reducing the amount of offline storage and network bandwidth
needed at any one time, while still allowing the re-creation of data at a
granularity of 24 hours when the full dumps are combined with the
incremental dumps.  Incremental dumps, however, are only supported for
replicated volumes; since there is little need for non-replicated volumes,
only full dumps are supported for them.
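<P>As a hedged sketch, dumping a clone to a spool file by hand might look like
the following, assuming <B>volutil</B>'s <CODE>dump</CODE> subcommand and
hypothetical names (again, <B>backup</B> normally drives this step):
<BLOCKQUOTE><CODE>
<PRE>
# Hypothetical sketch: dump a cloned replica to a local disk file.
volutil dump 0x1000001 /backup1/0x1000001.dump
</PRE>
</CODE></BLOCKQUOTE>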
<P>
<P><EM>saving to media</EM>
<P>The last phase consists of writing all the dump files spooled on local
partitions to an archival medium such as tape.  Any
standard backup system can be used for this phase.  At CMU, we use the BSD
dump and restore utilities to write and retrieve the disk images of Coda
volumes to and from tape.
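<P>Any archiving tool will do for this phase.  For example, the spool area
could be written to tape with BSD <B>dump</B>; the tape device name below is
hypothetical:
<BLOCKQUOTE><CODE>
<PRE>
# Write the spool partition holding the dump files to tape.
# /dev/nst0 is a hypothetical non-rewinding tape device.
dump -0u -f /dev/nst0 /backup
</PRE>
</CODE></BLOCKQUOTE>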
<P>In practice, this system has been implemented as a series of tasks.  The
first two tasks are carried out by the <B>backup</B> program, the last
by a Coda-independent perl script (<CODE>tape.pl</CODE>).
<P>
<P>
<UL>
<LI>Saving:
<OL>
<LI>creating a read-only clone<P>
</LI>
<LI>dumping the read-only clone to a local disk<P>
</LI>
<LI>backing up the dumped data to a suitable archival medium<P>
</LI>
</OL>
</LI>
<LI>Restoring:
<OL>
<LI>Retrieving the appropriate full and incremental dumps from the archival media.<P>
</LI>
<LI>Merging the full and incremental dumps up to the desired restoration point.<P>
</LI>
<LI>Restoring the fully integrated backup to the Coda file system.<P>
</LI>
</OL>
<P>
</LI>
</UL>
<P>Remember that, in practice, many restores are the result of a user accidentally
deleting or corrupting their own files.  In this case, users may use
the <CODE>cfs</CODE> mechanism to retrieve files from the last 24
hours.  For example:
<P><CODE>cfs mkmount OldFiles u.hmpierce.0.backup</CODE>
<P>will mount hmpierce's backup volume from replica 0 at the
OldFiles mount point.  The file can then be copied out (backup volumes
are read-only), as shown below.  Restores from tape normally need to be done
only if the restoration needs are older than 24 hours or some
catastrophic event outside of the user's control occurs.
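<P>For example, assuming the lost file was called <CODE>thesis.tex</CODE>
(a hypothetical name):
<BLOCKQUOTE><CODE>
<PRE>
cfs mkmount OldFiles u.hmpierce.0.backup
cp OldFiles/thesis.tex .      # copy the lost file back out
cfs rmmount OldFiles          # remove the mount point when done
</PRE>
</CODE></BLOCKQUOTE>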
<P>Several tools have been developed to help in the creation, analysis, and
restoration of data backups.  Some of these tools have been developed by
the Coda team (those tools concerning Coda FS to local disk conversion), such
as <CODE>backup</CODE> and <CODE>tape.pl</CODE> (used to coordinate the efforts
of <CODE>backup</CODE>, <CODE>dump</CODE>, etc.);
others employ off-the-shelf software such as the traditional UNIX
<B>dump</B> or <B>tar</B> to transfer the disk images created in the dump
phase to the backup media.  Coda, however, provides a perl script frontend
to <B>dump</B>.
<P>
<H2><A NAME="ss12.2">12.2 Installing a Coda Backup Coordinator Machine</A>
</H2>

<P>
<P>The Coda Backup Coordinator should be a trusted machine.  It should be able
to fetch all the files that exist in the <CODE>/vice</CODE> subtree on the servers,
although it is not necessary to run a fileserver on the backup coordinator
(nor is it recommended that the backup machine be a fileserver).
<P>Assuming the Backup Coordinator has been set up with an appropriate
operating system, the steps are as follows:
<P><B>NEEDS SECTION RECOMMENDING DATA/LOCAL DISK SPACE RATIO</B>
<P>On the Coda File Server designated as the SCM, create a file called 
<CODE>/vice/db/dumplist</CODE>.  The dumplist contains three fields:
volume id specified as its hex value, the full/incremental backup schedule,
and a comment which is generally the human readable volume name.  
For example:
<P>
<PRE>
7f000001        IFIIIII         s:coda
7f000002        IIIIIFI         u:satya
</PRE>
<P>The first column specifies the volume id to be backed up, the second
column specifies the backup schedule by the day of the week beginning
Sunday, and the third column is a comment, usually the volume name
in human-readable form.  So, the volume id 7f000001 is scheduled for a
full backup every Monday and incrementals Tuesday through Sunday; from
the comment, we know this is a system volume called "coda".  Likewise,
the second volume, 7f000002, is scheduled for a full backup on Friday with
incrementals done Saturday through Thursday, and is a user volume
called "satya".
<P>On the SCM, modify <CODE>/vice/db/vicetab</CODE> to indicate
which host is acting as the backup coordinator and which partitions 
on the backup server are to be used by the backup coordinator to store
the dump files.  On a triply replicated system, <CODE>vicetab</CODE> might
look like this:
<P>
<BLOCKQUOTE><CODE>
<PRE>
tye             /vicepa         ftree  width=8,depth=5
taverner        /vicepa         ftree  width=8,depth=5
tallis          /vicepa         ftree  width=8,depth=5
dvorak          /backup1        backup
dvorak          /backup2        backup
dvorak          /backup3        backup
</PRE>
</CODE></BLOCKQUOTE>
<P><CODE>vicetab</CODE>, in addition to listing information on the servers providing
replicated data, must also include information on the backup coordinator, with
the backup coordinator's name in the first column, the backup partitions in the
second column, and the designation "backup" in the third column.  The fourth
column is not used by the backup sub-system.  Please see the
<EM>vicetab(5)</EM> man page for additional information.
<P>Note that the number of partitions available for dumping may be
controlled by the system administrator.  Because the volume of data
may be both large and variable, the <B>backup</B> program intelligently
decides where to store individual dump files, based on size, across the
specified backup partitions.  The directories in the sample <B>vicetab</B>
are assumed to be separate local disk partitions.  An organized central
symbolic link tree is created by the backup.sh script in the directory
<CODE>/backup</CODE>; it points to the actual files scattered across
<CODE>/backup1</CODE>, <CODE>/backup2</CODE>, and <CODE>/backup3</CODE> in this example.
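<P>As a purely hypothetical illustration, after a nightly run the link tree
might look like this (date and volume ids made up):
<BLOCKQUOTE><CODE>
<PRE>
/backup/07Jan1999/7f000001.1000001 -> /backup1/07Jan1999/7f000001.1000001
/backup/07Jan1999/7f000001.1000002 -> /backup2/07Jan1999/7f000001.1000002
/backup/07Jan1999/7f000002.2000001 -> /backup3/07Jan1999/7f000002.2000001
</PRE>
</CODE></BLOCKQUOTE>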
<P>
<OL>
<LI>On the Backup Coordinator, the backup binaries and shell scripts need
to be installed.  The following directories should have been created
under <CODE>/vice</CODE> upon the installation of the Coda backup package:</LI>
</OL>
<P>
<BLOCKQUOTE><CODE>
<PRE>
/backup
/vice/backup
/vice/backuplogs
/vice/db
/vice/vol    
/vice/lib
/vice/bin
/vice/spool
/vice/srv
</PRE>
</CODE></BLOCKQUOTE>
<P>In addition, the file <CODE>/vice/UpdateMonitor</CODE> should be created once
the update monitor is run for the first time.  The primary binaries
that should be installed under /vice/bin to get started are:
<P>
<BLOCKQUOTE><CODE>
<PRE>
backup
backup.sh
bldvldb.sh
merge
updateclnt
updatesrv
updfetch
tape.pl
</PRE>
</CODE></BLOCKQUOTE>
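<P>The update daemons keep the coordinator's copy of the <CODE>/vice/db</CODE>
databases in sync with the SCM.  A minimal sketch of starting the client side,
assuming the SCM is reachable under the hypothetical host name <CODE>scm</CODE>
(the exact flags may differ between releases; see the <EM>updateclnt(8)</EM>
man page):
<BLOCKQUOTE><CODE>
<PRE>
# On the backup coordinator: fetch /vice/db updates from the SCM.
# "scm" is a hypothetical host name.
updateclnt -h scm
</PRE>
</CODE></BLOCKQUOTE>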
<P>Once it has been verified that the backup system is installed, the files
<P>
<BLOCKQUOTE><CODE>
<PRE>
/vice/db/hosts 
/vice/db/files
</PRE>
</CODE></BLOCKQUOTE>
<P>on the SCM should be manually copied to the same location on
the Backup Coordinator.  These are needed by the <CODE>updateclnt</CODE>
daemon the first time it runs.  Also, Coda currently relies
on the BSD dump and restore commands to manipulate the tapes.  A copy of the dump
package should be installed on the backup coordinator.  BSD dump is available
for all UNIX and UNIX-like operating systems on which we have successfully run Coda.
Please check with your OS vendor if you need help obtaining a copy.
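<P>The initial copy can be made with any trusted file-transfer tool, for
example <B>scp</B>, again assuming the SCM is reachable as <CODE>scm</CODE>:
<BLOCKQUOTE><CODE>
<PRE>
# One-time bootstrap copy from the SCM to the backup coordinator.
scp scm:/vice/db/hosts /vice/db/hosts
scp scm:/vice/db/files /vice/db/files
</PRE>
</CODE></BLOCKQUOTE>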
<P>Upon completion, <B>backup</B> will print which
volumes were successfully backed up, the volumes on which backup failed,
and those volumes which were not specified for backup.
<P>The <B>merge</B> program allows a system administrator to update the state seen
in a full dump with the partial state in an incremental dump.  This is useful when
a user wishes to restore a state that was captured by a full and some number
of incremental dumps (for instance, in the middle of the week).  The <B>merge</B>
program applies an incremental to a full dump, producing a new full dump file.
<P>An incremental is a partial snapshot <EM>with respect to</EM> the previous 
dump.
The Coda backup facility maintains an order on dumps for a volume.  The merge
program will only allow an incremental to be applied to its predecessor in
the order.  This predecessor may be a full dump or the output of the
<B>merge</B> program.
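<P>A hedged sketch of a <B>merge</B> invocation, assuming the argument order
&lt;new full dump&gt; &lt;old full dump&gt; &lt;incremental dump&gt; and
hypothetical file names:
<BLOCKQUOTE><CODE>
<PRE>
# Apply Tuesday's incremental to Monday's full dump, producing
# a new full dump.  Argument order is an assumption; check the
# merge usage message on your system.
merge /backup/merged.full /backup/monday.full /backup/tuesday.incr
</PRE>
</CODE></BLOCKQUOTE>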
<P>Once the administrator has created or retrieved the full dump which contains
the desired state of a volume, she can create a read-only copy of that state
by using the <B>volutil restore</B> facility.  This <B>volutil</B> command
creates a new read-only volume on a server.  The new volume can
be mounted like any other Coda volume.  Regular Unix file operations can
then be used to extract the desired old data.  The obvious exception is that
mutating operations will fail on files in a read-only volume.
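<P>A hedged sketch of such a restore, assuming <B>volutil</B> accepts a target
server via <CODE>-h</CODE> and a &lt;dump file&gt; &lt;partition&gt;
[&lt;volume name&gt;] argument order (all names hypothetical):
<BLOCKQUOTE><CODE>
<PRE>
# Restore the merged dump as a new read-only volume on server tye,
# rebuild the VLDB, and mount the restored volume for the user.
volutil -h tye restore /backup/merged.full /vicepa u.hmpierce.restore
bldvldb.sh tye
cfs mkmount /coda/usr/hmpierce/restored u.hmpierce.restore
</PRE>
</CODE></BLOCKQUOTE>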
<P>
<H2><A NAME="ss12.3">12.3 Incremental Dumps</A>
</H2>

<P>
<P>Every dump (full or incremental) produces a file
containing the version vectors and <EM>StoreIds</EM> of every vnode in the volume.  These files
have names of the form <CODE>/vice/backup/&lt;groupid&gt;.&lt;volid&gt;.newlist</CODE> for replicated
volumes and <CODE>/vice/backup/&lt;volid&gt;.newlist</CODE> for non-replicated volumes.  When
the backup coordinator is convinced that the backup of a volume has completed,
the *.newlist file is renamed to be *.ancient via the <B>volutil ancient</B> call.  
These files are stored in a human-readable format for convenience.
<P>When creating an incremental dump, the server looks for the .ancient file
corresponding to the volume.  If it does not exist, a full dump is created.  If
it does exist, it is used to determine which files have changed since the last
successful backup.  The server iterates through the vnode lists for the volume
and the version vector lists from the <EM>*.ancient</EM> file, comparing the
version vectors and StoreIds.  A discrepancy between the two implies that the
file has changed and should be included in the incremental dump.
Since version vectors are not maintained for non-replicated volumes, incremental
dumps of such volumes are not supported by the Coda backup facility.
<P>By comparing the sequence numbers in the vnode lists, it
can also be determined whether a file or directory has been deleted (since the
vnode is no longer in use).  Vnodes that are freed and then reallocated between
dumps look like vnodes which have been modified, and so are safely included
in the incremental dump.
<P>It is also important to maintain an ordering on the incremental dumps.  To
correctly restore to a particular day, each
incremental dump must be applied to the appropriate full dump.  To ensure that
this happens, each dump is labeled with a uniquifier, and each incremental
is labeled with the uniquifier of the dump with respect to which it was taken.
During merge, the full dump's uniquifier is compared with the uniquifier
of the dump used to create the incremental.  If they do not match, the
incremental should not be applied to the full dump.
<P>
<H2><A NAME="ss12.4">12.4 Tape files</A>
</H2>

<P>Once the dump files have been created, they must be written to tape,
since disk space is usually a limited
commodity.  The basic mechanism for the writing is the UNIX <B>tar(1)</B>
facility.
<P>Each tape contains a series of tar files, the first and last of which are labels.
The start and end labels are identical, and
contain version information, the date the backup was taken,
and an index which maps
individual dump files to offsets on the tape.
Thus the Coda backup tapes are self-identifying, for easy sanity checks.
The label is a tar file which contains only a simple UNIX file called
<CODE>TAPELABEL</CODE>.
<P>The dump files are first sequenced by size.  They are then broken down into groups,
where the total size of each group must be larger than a certain size,
currently 0.5 megabytes.  Each group is stored as a single tar file on the tape.
These data tar files are the 2nd through (n-1)st records on the tape, the first
and nth being tar files containing just the tape label.
<P>This structure was chosen for several reasons.  The first is that it is easy to
implement: tar has been used for many years and has proven to be reliable.
The second is easy access to information on the tape.  Using a single monolithic
tar file would often require hours of waiting to retrieve a single dump file;
this way, most of the data can be skipped over using <B>mt(1)</B> and its fast-forward
facility.  Finally, it provides a simple and effective end-to-end check to validate
that all the information has made it to tape.
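<P>For example, the label and a given data record can be inspected with the
standard tools; <CODE>/dev/nst0</CODE> is a hypothetical non-rewinding tape
device:
<BLOCKQUOTE><CODE>
<PRE>
mt -f /dev/nst0 rewind        # position at the tape label
tar tvf /dev/nst0             # list the label record (TAPELABEL)
mt -f /dev/nst0 fsf 1         # fast-forward past the next record
tar xvf /dev/nst0             # extract the dump files in that group
</PRE>
</CODE></BLOCKQUOTE>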
<P>At CMU, we have created a convention for capturing sufficient information
for reliability, while trying to avoid excess use of tapes.  Full backups are taken
once a week.  However, since our staging disks are not large enough to hold full 
dumps for all the replicas of all the volumes, we stagger the full
backups across the week.
<P>There are three kinds of requests for restorations: users who have
mistakenly trashed a file, users who lost data but did not know it,
and bugs which require us to roll back to a substantially earlier
state.  The first class of restores can typically be handled by
yesterday's state, which we keep on-line in the form of read-only backup clones.
Thus almost all such requests never reach the system administrator
at all.  To give users easy access to the previous day's backup, we
create a directory, OldFiles, in their Coda directory, and mount each
of the backups in the OldFiles directory, as shown below.
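<P>For example, for the user hmpierce on a triply replicated volume, the
mounts might look like this (names hypothetical):
<BLOCKQUOTE><CODE>
<PRE>
mkdir /coda/usr/hmpierce/OldFiles
cfs mkmount /coda/usr/hmpierce/OldFiles/0 u.hmpierce.0.backup
cfs mkmount /coda/usr/hmpierce/OldFiles/1 u.hmpierce.1.backup
cfs mkmount /coda/usr/hmpierce/OldFiles/2 u.hmpierce.2.backup
</PRE>
</CODE></BLOCKQUOTE>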
<P>
<P>If the user did not catch the loss of data immediately, it is reasonable to expect
that they will catch it before a week has passed.
We keep all incrementals and fulls to guarantee that we can restore state
from any day in the last week.  This requires 14 tapes, or two weekly sets.
One week's
worth is not sufficient, because state from later incrementals relies on earlier
incrementals in order to be restored.  Thus, as soon as the first incremental tape
is over-written (say, Monday's), the state from the remainder of the last week is
lost (last Tuesday's, Wednesday's, etc.).
<P>The third class of data loss is due either to infrequently used files or to
catastrophe.  (We have actually been forced to rely on the backup system to restore
<EM>all Coda state due to major bugs in the servers</EM>.)
Since it is unreasonable to keep
all the tapes around, we only save tapes containing full dumps.  Weekly tapes are
saved for a month, and monthly tapes are saved for eternity.
<P>
<H2><A NAME="ss12.5">12.5 Restoring a backup clone</A>
</H2>

<P>A basic assumption of performing backups is that eventually someone will need
to restore the old state of a volume.  To do this they should contact the system
administrator, specifying the volume (groupid and repid for replicated volumes
or just the volid for non-replicated volumes) and the date of the state they
wish to restore.
<P>The system administrator must then determine which dump files contain the state.
There could be more than one involved, since the state may have been captured
by a full and some incremental dumps.  Once the administrator knows the dates
of the backups involved, she must get the appropriate tapes and extract the
dump files (via the <B>extract.sh</B> script).
<P>The administrator then creates the full state to be restored by iteratively
applying the incrementals to the full state via the <B>merge</B> program.  Once the
state for the date in question has been restored, a read-only clone is created
by choosing a server to hold the clone, and invoking the <B>volutil restore</B>
operation, directing the call to the chosen server.  Once the clone has been
restored, the administrator should build a new VLDB, and mount the volume
in the Coda name space so the
user can access it.  When the user has finished with it, she should notify the
administrator in order for the clone to be purged.
<P>
<H2><A NAME="ss12.6">12.6 Backup Scripts</A>
</H2>

<P>Although the <B>backup</B> program handles all the tricky details involved in Coda
backup, some issues still remain to be handled, most notably the saving
of the dump files to tape.  This is done by a series of scripts: <B>backup.sh</B>,
<B>writetotape.sh</B>, and <B>checktape.sh</B>.  The job of extracting dump files from
tape is handled by <B>extract.sh</B>.
<P><B>backup.sh</B> takes the name of the directory in which to run backups.  It creates
a subdirectory whose name indicates the date on which the backup was run.  It then runs
the <B>backup</B> program, using the dumplist file in the directory specified in
the arguments, saving the output of <B>backup</B> in a logfile in the newly created
subdirectory.
It copies in the current Coda databases (so they will be saved to tape
along with the dump files).
It then invokes <B>writetotape.sh</B> and <B>checktape.sh</B>
to write the files and verify that they have been safely recorded.
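<P>A typical nightly invocation, assuming the backups are rooted at
<CODE>/backup</CODE> (for instance from cron), might simply be:
<BLOCKQUOTE><CODE>
<PRE>
# Run the nightly backup; backup.sh creates a date-stamped
# subdirectory under /backup and logs the run there.
backup.sh /backup
</PRE>
</CODE></BLOCKQUOTE>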
<P><B>writetotape.sh</B> performs the work of saving the files on tape.  It takes
the directory in which the backup was taken (the subdirectory generated by
<B>backup.sh</B>) and the device name of the tape drive.  It first checks
that the tape to be used is either empty or has the correct label.  For
Coda at CMU this means checking that the tape was last used on the same day of
the week.
It then gathers the dump files and databases into groups and generates the tape
label for this backup.  Finally, it writes the tape label and all the groups to
the tape via the <B>tar(1)</B> facility, marking the end of the tape with another
copy of the tape label.
<P><B>checktape.sh</B> verifies that <B>writetotape.sh</B> did its job correctly.  Like
<B>writetotape.sh</B>, it takes the backup directory and the name of the tape drive
as input.  It first reads off the tape label, comparing it with the one stored
in the backup directory.  It then scans all the data tar files, comparing their
actual contents with what is expected, and finally reads the tape label at the
end, comparing it with the saved value.
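<P>Assuming the argument order described above, a hypothetical date-stamped
directory, and a hypothetical tape device, the write/verify pair might be
invoked as:
<BLOCKQUOTE><CODE>
<PRE>
# Write the day's dump files to tape, then verify the tape.
writetotape.sh /backup/07Jan1999 /dev/nst0
checktape.sh /backup/07Jan1999 /dev/nst0
</PRE>
</CODE></BLOCKQUOTE>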
<P><B>extract.sh</B> is used to extract a dump file from a Coda backup tape.  
It takes the name of the tape device, the date the backup was taken, and the 
identifier of the volume to be restored.  The date should be specified in the 
form DDMMMYYYY, as in 10Feb1992.  Volume identifiers are "groupid.repid" for 
replicated volumes and "volid" for non-replicated volumes.
<B>extract.sh</B> will locate the correct group by reading the
tape label, fast-forward the tape to the correct tape file, and extract the 
dump file for the volume.
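<P>For example, to pull the dump of replica 1000001 of replicated volume
7f000001 from the backup taken on 10Feb1992 (tape device hypothetical,
argument order assumed from the description above):
<BLOCKQUOTE><CODE>
<PRE>
# Extract one volume's dump file from the tape in /dev/nst0.
extract.sh /dev/nst0 10Feb1992 7f000001.1000001
</PRE>
</CODE></BLOCKQUOTE>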
<HR>
<A HREF="manual-13.html">Next</A>
<A HREF="manual-11.html">Previous</A>
<A HREF="manual.html#toc12">Contents</A>
</BODY>
</HTML>