<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
 <TITLE>The Coda File Server: Volumes </TITLE>
 <LINK HREF="server-3.html" REL=next>
 <LINK HREF="server-1.html" REL=previous>
 <LINK HREF="server.html#toc2" REL=contents>
</HEAD>
<BODY>
<A HREF="server-3.html">Next</A>
<A HREF="server-1.html">Previous</A>
<A HREF="server.html#toc2">Contents</A>
<HR>
<H2><A NAME="s2">2. Volumes </A></H2>

<P>
<P>
<H2><A NAME="ss2.1">2.1 Volume Data Structures</A>
</H2>

<P>
<P>When a server runs for the first time, it initializes a structure at a
fixed RVM address, from which all data access proceeds. The structure is
of type <EM>coda_recoverable_segment</EM>, defined in
<CODE>coda_globals.h</CODE>.  For volume handling it holds the following
information (sketched in code after this list):
<P>
<OL>
<LI> an array <EM>VolumeList</EM> of size <EM>MAXVOLS</EM> of <EM>VolHead</EM>
structures.  
</LI>
<LI> a field <EM>MaxVolId</EM> holding the maximum allocated volume
id. This field is overloaded: its top 8 bits hold
<EM>ThisServerId</EM>.</LI>
</OL>
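<P>In outline, the relevant part of the recoverable segment looks
roughly as follows (a simplified sketch; the field order and exact
types in <CODE>coda_globals.h</CODE> may differ):
<PRE>
/* Sketch only -- the real definition lives in coda_globals.h. */
struct coda_recoverable_segment {
    /* ... other recoverable state ... */
    VolumeId MaxVolId;          /* max allocated volume id; the top
                                   8 bits hold ThisServerId */
    struct VolHead VolumeList[MAXVOLS];   /* one slot per volume */
    /* ... */
};
</PRE>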
<P>We will now explore how the <EM>VolHead</EM> structures lead to the
other volume structures.  Several data structures link these together;
they are pictured in a sketch after the following list:
<P>
<DL>
<P>
<DT><B>VolHead:</B><DD><P>defined in <CODE>camprivate.h</CODE>. An RVM structure
containing a structure <EM>VolumeHeader</EM> and <EM>VolumeData</EM>. 
<P>
<DT><B>VolumeData:</B><DD><P>defined in <CODE>camprivate.h</CODE>. An RVM structure
that points to the <EM>VolumeDiskData</EM> and to the <EM>SmallVnodeLists</EM>
and <EM>BigVnodeLists</EM> (together with some constants related to these).
<P>
<DT><B>VolumeHeader:</B><DD><P>defined in <CODE>volume.h</CODE>. Contains the
<EM>volumeid, type</EM> and <EM>parentid.</EM>
<P>
<DT><B>VolumeDiskData:</B><DD><P>defined in <CODE>volume.h</CODE>. An RVM structure
holding the persistent data associated with the volume.
<P>
</DL>
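<P>The containment relations can be pictured with the following
simplified declarations (a sketch, not the literal definitions in
<CODE>camprivate.h</CODE>; the vnode list type names are used here only
for illustration):
<PRE>
/* Sketch of the RVM-side containment. */
struct VolHead {                     /* camprivate.h */
    struct VolumeHeader header;      /* volume id, type, parent id */
    struct VolumeData   data;
};

struct VolumeData {                  /* camprivate.h */
    struct VolumeDiskData *volumeInfo;       /* persistent volume data */
    struct SmallVnodeList *smallVnodeLists;  /* illustrative type names */
    struct BigVnodeList   *bigVnodeLists;
    /* ... constants describing these lists ... */
};
</PRE>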
<P>These RVM structures are copied into VM when a volume is accessed; we
will describe this in detail below.  In VM we have a hash table
<EM>VolumeHashTable</EM> for <EM>Volume</EM> structures, keyed by
<EM>VolumeId</EM>.  It is used in conjunction with a doubly linked list
<EM>volumeLRU</EM> of <EM>volHeader</EM> structures, which was probably
introduced to avoid keeping all <EM>volHeader</EM> structures in VM, since
they are large.
<P>
<DL>
<DT><B>Volume:</B><DD><P>defined in <CODE>volume.h</CODE>. This structure is the
principal access point to a volume.  These VM structures are held in the
hash table. It contains a pointer to a <EM>volHeader</EM> (which holds the
cached RVM data), the device, partition, vol_index, vnodeIndex, locks
and other run-time data.
<P>
<DT><B>volHeader:</B><DD><P>defined in <CODE>volume.h</CODE>.  A VM structure sitting on
a dlist, with a backpointer to a <EM>Volume</EM>. Contains a
<EM>VolumeDiskData</EM> structure. This is the VM cached copy of the RVM
<EM>VolumeDiskData</EM> structure.
</DL>
<P>Notice that in RVM volumes are identified principally by their index,
i.e. their position in the static <EM>VolumeList</EM> array of <EM>VolHead</EM>
structures. Elsewhere volumes are mostly accessed by their volume id.
Mapping an index to a volume id proceeds through the <EM>VolumeHeader</EM>
structures held inside the <EM>VolHead</EM> structures in the
<EM>VolumeList</EM>.
<P>The reverse mapping, from a <EM>VolumeId</EM> to an index, is done
through an auxiliary hash table <EM>VolTable</EM> of type <EM>vhashtab</EM>,
defined in <CODE>volhash.cc</CODE>.
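<P>A minimal sketch of the two mappings (the name of the hash lookup
helper is illustrative, not the actual code in <CODE>volhash.cc</CODE>):
<PRE>
/* Illustrative only: mapping between RVM index and volume id. */
VolumeId IndexToVolumeId(int index)
{
    /* read the VolumeHeader held inside the VolHead */
    return VolumeList[index].header.id;
}

int VolumeIdToIndex(VolumeId volid)
{
    return HashLookup(&amp;VolTable, volid);   /* hypothetical helper */
}
</PRE>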
<P>It is informative to know the sizes (in bytes) of all these structures:
<UL>
<LI> VolumeDiskData: 636</LI>
<LI> VolHead: 88</LI>
<LI> volHeader: 648</LI>
<LI> VolumeHeader: 20</LI>
<LI> Volume: 96</LI>
</UL>
<P>
<P>
<H2><A NAME="ss2.2">2.2 Initializing the volume package </A>
</H2>

<P>
<P>The routine <EM>VInitVolumePackage</EM> sets up many structures related
to volumes and vnodes; the overall sequence is sketched in code after
this list.
<DL>
<P>
<DT><B>InitLRU:</B><DD><P>calloc a sequence of (normally 50) <EM>volHeader</EM>
structures, then call ReleaseVolumeHeader on each to put it at the head
of the <EM>volumeLRU</EM>.
<P>
<DT><B>InitVolTable:</B><DD><P>sets up the <EM>VolTable</EM>, hashing volume ids
to get the index in the RVM <EM>VolumeList</EM>.
<P>
<DT><B>VolumeHashTable:</B><DD><P>Hash table used to look up volumes by id. It
stores pointers to the <EM>Volume</EM> structures.
<P>
<DT><B>VInitVnodes:</B><DD><P>set up the vnode VM lru caches for small and
large vnodes. Summary information for both small and large vnodes is
stored in the <EM>VnodeClassInfoArray</EM>; the allocated vnode arrays are
reached through this array.
<P>
<DT><B>InitLogStorage:</B><DD><P>go through the VolumeList and assign VM
resolution log memory for every volume; store the pointer in
VolLog[i]. The number of rlentries assigned is stored in:
<PRE>
VolumeList[i].data.volumeInfo->maxlogentries.
</PRE>
<P>It is not yet clear if RVM resolution is using this.
<P>
<DT><B>CheckVLDB:</B><DD><P>See if there is a VLDB. 
<P>
<DT><B>DP_Init</B><DD><P>Find server partitions.
<P>
<DT><B>S_VolSalvage:</B><DD><P>This analyzes all inodes on a Coda server
partition and matches the data against directory contents. After this
has completed, the volume <EM>inUse</EM> bit and <EM>needsSalvage</EM> bit
are cleared (stored in the disk data).
<P>
<DT><B>FSYNC_fsInit:</B><DD><P>This interface is discussed below. 
<P>
<DT><B>Attach volumes</B><DD><P>Now iterate through all volumes and call
<EM>VAttachVolumeById</EM>. The details are far from clear, but the intent
is to add the information for each volume to the VM hash tables.
<P>
<DT><B>VListVolumes</B><DD><P>write out the <CODE>/vice/vol/VolumeList</CODE>
file.
<P>
<DT><B>Vinit</B><DD><P>This variable is now set to 1.
</DL>
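<P>Put together, the initialization performs roughly the following
sequence (a condensed sketch; argument names such as
<CODE>nVolumeHeaders</CODE> are illustrative):
<PRE>
/* Condensed sketch of VInitVolumePackage; error handling omitted. */
InitLRU(nVolumeHeaders);     /* VM cache of volHeaders, normally 50 */
InitVolTable();              /* volume id -> RVM VolumeList index */
/* set up the VolumeHashTable: volume id -> Volume (VM) */
VInitVnodes(vLarge, nLarge); /* VM lru cache for large vnodes */
VInitVnodes(vSmall, nSmall); /* ... and for small vnodes */
InitLogStorage();            /* VM resolution log per volume */
CheckVLDB();                 /* see if there is a VLDB */
DP_Init();                   /* find server partitions */
S_VolSalvage();              /* salvage every server partition */
FSYNC_fsInit();              /* listen for volume utilities */
for (each volume)            /* pseudocode */
    VAttachVolumeById();     /* fill the VM hash tables */
VListVolumes();              /* write /vice/vol/VolumeList */
Vinit = 1;
</PRE>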
<P>
<H2><A NAME="ss2.3">2.3 Attaching volumes </A>
</H2>

<P>
<P>This code is sheer madness.
<P>Attaching volumes is the following process. First a <EM>VGetVolume</EM> is
done.  If this returns a volume, it was found in the hash table.  If
that volume is in use, it is detached with <EM>VDetachVolume</EM> (any error
here is ignored).  Attaching is not supposed to happen with anything
already in the hash table.
<P>Next the <EM>VolumeHeader</EM> is extracted from RVM.  If the program is
running as a <EM>volumeUtility</EM> then <EM>FSYNC_askfs</EM> is used to
request the volume.  We then check the partition and call
<EM>attach2</EM>.
<P>Attach2 allocates a new <EM>Volume</EM> structure, initializes the locks
in this structure and fills in the partition information. A
<EM>volHeader</EM> is found from the LRU list and the routine
<EM>VolDiskInfoById</EM> reads in the RVM data; this can only go wrong if
the volume id cannot be matched to a valid index.  If the
<EM>needsSalvage</EM> field is set, we return the NULL pointer (leaving all
memory allocated). Attach2 continues by checking for a variety of bad
conditions, such as the volume apparently having crashed.  If it is
<EM>blessed</EM>, <EM>in service</EM> and the <EM>salvage</EM> flag is not set,
then we are ready to go and the volume is put in the hash table. If the
volume is writable the bitmaps for vnodes are filled in.
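<P>In outline, the attach path looks like this (a heavily condensed
sketch; the locals, argument lists and helper names such as
<CODE>ExtractVolHeader</CODE> and <CODE>AddVolumeToHashTable</CODE> are
illustrative):
<PRE>
/* Condensed sketch of VAttachVolumeById and attach2. */
Volume *VAttachVolumeById(Error *ec, char *part, VolumeId volid, int mode)
{
    Volume *vp = VGetVolume(ec, volid);
    if (vp &amp;&amp; vp->inUse)
        VDetachVolume(ec, vp);            /* error ignored */

    struct VolumeHeader header;
    ExtractVolHeader(volid, &amp;header);     /* read from RVM */
    if (programType == volumeUtility)
        FSYNC_askfs(volid, FSYNC_NEEDVOLUME, mode);
    /* check the partition, then: */
    return attach2(ec, part, &amp;header);
}

static Volume *attach2(Error *ec, char *part, struct VolumeHeader *hdr)
{
    Volume *vp = AllocVolume(hdr);        /* init locks, partition info */
    GetVolumeHeader(vp);                  /* a volHeader from the LRU */
    if (VolDiskInfoById(ec, hdr->id, vp) != 0 || NeedsSalvage(vp))
        return NULL;                      /* memory stays allocated! */
    if (!Blessed(vp) || !InService(vp))   /* bad conditions */
        return NULL;
    AddVolumeToHashTable(vp, hdr->id);
    if (Writable(vp))
        FillVnodeBitmaps(vp);             /* bitmaps for vnodes */
    return vp;
}
</PRE>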
<P><EM>VDetachVolume</EM> is also very complicated.  It starts by taking the
volume out of the hash table (forcefully).  This means that
<EM>VGetVolume</EM> will no longer find it.
<P>The <EM>shuttingDown</EM> flag is set to one, and <EM>VPutVolume</EM> is
called. This frees the volume and signals, via <EM>LWP_signal</EM>, any
threads waiting on the <EM>VPutVolume</EM> condition. [It is not clear that
any process is waiting, since processes normally only appear to wait
when the <EM>goingOffline</EM> flag is set, not when the
<EM>shuttingDown</EM> flag is set.]
<P>
<P>
<H2><A NAME="ss2.4">2.4 State of volumes </A>
</H2>

<P>
<P>There are numerous flags indicating the state of volumes; their net
effect on attaching is summarized in a sketch after this list:
<P>
<DL>
<P>
<DT><B>inUse</B><DD><P>Cleared by <EM>attach2</EM>, but only if the volume is
<EM>blessed</EM>, <EM>inService</EM> and not <EM>needsSalvaged</EM>.
<P>
<DT><B>inService</B><DD><P>This flag has some use in the volume utility
package, but not much else.
<P>
<P>
<DT><B>goingOffline</B><DD><P>Set by <EM>VOffline</EM>; cleared by
<EM>VPutVolume</EM>, <EM>VForceOffline</EM> and <EM>attach2</EM>.  Taking a
volume <EM>offline</EM> (as in <EM>VOffline</EM>) means writing the volume
header to disk with the <EM>inUse</EM> bit turned off; a copy is kept
around in VM.  When this flag is set, <EM>VPutVolume</EM> sets
<EM>inUse</EM> to 0, and <EM>VForceOffline</EM> does the same and also sets
the <EM>needsSalvaged</EM> flag.
<P>
<DT><B>shuttingDown</B><DD><P>Used in <EM>VDetachVolume</EM>.
<P>
<DT><B>blessed</B><DD><P>Heavily manipulated by volume utilities while creating
volumes, backing them up etc.  Its absence probably indicates that the
volume is not in an internally consistent state.
<P>
<DT><B>needsSalvage</B><DD><P>This is an error condition. Cleared by
<EM>VolSalvage</EM>, set by <EM>VForceOffline</EM>.
<P>
</DL>
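<P>The net effect of these flags on attaching can be summarized in one
test (a paraphrase of the checks in <EM>attach2</EM>, not the literal
code):
<PRE>
/* Paraphrase: a volume is attachable when it is blessed and in
   service, and does not need salvaging. */
int VolumeIsAttachable(struct VolumeDiskData *vdd)
{
    return vdd->blessed &amp;&amp; vdd->inService &amp;&amp; !vdd->needsSalvaged;
}
</PRE>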
<P>
<H2><A NAME="ss2.5">2.5 The FSYNC interface </A>
</H2>

<P>
<P>Requests for volume operations, such as <EM>VGetVolume</EM>, can come from:
<P>
<UL>
<LI> The fileserver</LI>
<LI> A volume utility</LI>
<LI> The salvager</LI>
</UL>
<P>The FSYNC package allows a volume utility to register itself. The call
<EM>VConnectFS</EM> is made during <EM>VInitVolutil</EM> and makes available
an array for the calling thread, named <EM>OffLineVolumes</EM>. This array
is cleaned up during <EM>VDisconnectFS</EM> and contains a list of volumes
that have been requested to be offline.  
<P>One request that can be made is <EM>FSYNC_ON</EM>, to get a volume back
online.  This clears the spot in the <EM>OffLineVolumes</EM>, calls
<EM>VAttachVolume</EM> and finally writes out changes by calling
<EM>VPutVolume</EM>.
<P>The other requests are <EM>FSYNC_OFF</EM> and <EM>FSYNC_NEEDVOLUME</EM>.  If the
volume is not yet in the thread's <EM>OffLineVolumes</EM> then a spot is
found for the volume.  If the request is not for cloning, the volume is
marked as <EM>BUSY</EM> in the <EM>specialStatus</EM> field of the <EM>Volume</EM>
structure.  In <EM>VOffline</EM> the volume header is written to disk,
with the <EM>inUse</EM> bit turned off.  A copy of the header is maintained
in memory, however (which is why this is VOffline, not VDetach).
<P>The FSYNC package can also watch over relocation sites. This is not
functional anymore, and should probably be removed.
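<P>From a volume utility's point of view, the interface is used roughly
as follows (a sketch; the argument lists of <EM>FSYNC_askfs</EM> are
abbreviated):
<PRE>
/* Sketch of a volume utility taking a volume offline and back. */
VConnectFS();                     /* register; get an OffLineVolumes array */
FSYNC_askfs(volid, FSYNC_NEEDVOLUME, mode);
                                  /* reserve a spot; mark BUSY unless cloning */
/* ... operate on the volume ... */
FSYNC_askfs(volid, FSYNC_ON, 0);  /* VAttachVolume, then VPutVolume */
VDisconnectFS();                  /* clean up OffLineVolumes */
</PRE>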
<P>
<H2><A NAME="ss2.6">2.6 Interface to the <EM>volumeLRU</EM> </A>
</H2>

<P>
<P>This doubly linked list of <EM>volHeader</EM> structures is one of the
more confusing areas in the code.  It is the interface between VM and
RVM data, and the invariants are not very clear.  A typical combined use
of the two routines is sketched after the list below.
<P>
<DL>
<P>
<DT><B>AvailVolumeHeader(struct Volume *)</B><DD><P>See if there is a
<EM>volHeader</EM> available for the volume passed.
<P>
<DT><B>GetVolumeHeader(struct Volume *)</B><DD><P>Assign a <EM>volHeader</EM> to the
<EM>Volume</EM> structure. This routine does <B>not</B> fill in the data. 
</DL>
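<P>A typical caller combines the two as follows (illustrative; the real
callers also handle locking and writing a reused header back to RVM, and
the argument list of <EM>VolDiskInfoById</EM> is an assumption):
<PRE>
/* Illustrative use of the volumeLRU interface. */
if (vp->header == NULL) {            /* no cached volHeader yet */
    GetVolumeHeader(vp);             /* assign one; data NOT filled in */
    VolDiskInfoById(ec, volid, vp);  /* read the RVM data into it */
}
</PRE>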
<P>
<H2><A NAME="ss2.7">2.7 Interface to the <EM>VolumeHashTable</EM>. </A>
</H2>

<P>
<P>When a volume is needed, a call is made to <EM>VGetVolume</EM>.  This
routine is given a <EM>volumeid</EM> to search for and finds the
<EM>Volume</EM> structure in the <EM>VolumeHashTable</EM>.  If this is not
found it returns <EM>VNOVOL</EM>.
<P>If there are already users of this volume, then we are guaranteed that
the <EM>volHeader</EM> structure for this volume is available too.  The
<EM>header</EM> field in the volume structure may still be set; otherwise
a <EM>volHeader</EM> must be found in the LRU, which possibly involves
writing an old one back to RVM.
<P>If the <EM>header</EM> field in the volume structure is not null we are
OK; if it is null we would like to use the first available header in
the <EM>volumeLRU</EM> list.  This is done by checking the <EM>back</EM>
pointer in the <EM>volHeader</EM>.
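<P>The lookup logic can be summarized as follows (a sketch of
<EM>VGetVolume</EM>; the hash lookup helper is named here only for
illustration, and reference counting is omitted):
<PRE>
/* Sketch of the VGetVolume lookup path. */
Volume *VGetVolume(Error *ec, VolumeId volid)
{
    Volume *vp = HashLookup(VolumeHashTable, volid);
    if (!vp) {
        *ec = VNOVOL;
        return NULL;
    }
    if (!vp->header)           /* cached volHeader was reclaimed */
        GetVolumeHeader(vp);   /* first available one in volumeLRU */
    return vp;
}
</PRE>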
<P>
<P>
<H2><A NAME="ss2.8">2.8 Creating a volume </A>
</H2>

<P>
<P>Volume creation starts in <CODE>vol-create.cc</CODE> in the volutil
directory; the main steps, sketched in code after this list, are:
<OL>
<LI> We first initialize the volume id with <EM>VAllocateVolumeId</EM>.</LI>
<LI> <EM>VCreateVolume</EM> (<CODE>vol/vutil.cc</CODE>) now builds a
<EM>VolumeDiskData</EM> structure <EM>vol</EM> in VM.  The recoverable
resolution log <EM>vol.log</EM> is generated with <EM>recov_vol_log</EM>,
which does an RVMLIB_REC_MALLOC to allocate a certain number of
pointers to recle's. </LI>
</OL>
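<P>Schematically, creation proceeds as follows (a sketch of the path
through <CODE>vol-create.cc</CODE> and <CODE>vol/vutil.cc</CODE>; the
argument lists are abbreviated, and how <EM>recov_vol_log</EM> is invoked
is shown only schematically):
<PRE>
/* Sketch of volume creation; error handling omitted. */
Error ec;
VolumeId volid = VAllocateVolumeId(&amp;ec);  /* bump MaxVolId in RVM */
/* VCreateVolume builds a VolumeDiskData in VM, then commits it: */
struct VolumeDiskData vol;
memset(&amp;vol, 0, sizeof(vol));
/* ... fill in ids, type, partition ... */
vol.log = new recov_vol_log(volid);  /* RVMLIB_REC_MALLOC's an array
                                        of pointers to recle's */
VCreateVolume(&amp;ec, partition, volid, parentid, 0, volumeType);
</PRE>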
<P>
<HR>
<A HREF="server-3.html">Next</A>
<A HREF="server-1.html">Previous</A>
<A HREF="server.html#toc2">Contents</A>
</BODY>
</HTML>