<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> <HTML> <HEAD> <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9"> <TITLE>The Coda File Server: Volumes </TITLE> <LINK HREF="server-3.html" REL=next> <LINK HREF="server-1.html" REL=previous> <LINK HREF="server.html#toc2" REL=contents> </HEAD> <BODY> <A HREF="server-3.html">Next</A> <A HREF="server-1.html">Previous</A> <A HREF="server.html#toc2">Contents</A> <HR> <H2><A NAME="s2">2. Volumes </A></H2> <P> <P> <H2><A NAME="ss2.1">2.1 Volume Data Structures</A> </H2> <P> <P>When a server is initialized during its first run, a structure at a fixed RVM address is initialized, from which all data access is initiated. The structure is of type <EM>coda_recoverable_segment</EM>, defined in <CODE>coda_globals.h</CODE>. For volume handling it holds the following information: <P> <OL> <LI> an array <EM>VolumeList</EM> of size <EM>MAXVOLS</EM> of <EM>VolHead</EM> structures. </LI> <LI> a field <EM>MaxVolId</EM> holding the maximum allocated volume id. This field is overloaded: its top 8 bits hold <EM>ThisServerId</EM>.</LI> </OL> <P>We will now explore how the <EM>VolHead</EM> structures lead to other volume structures. Several data structures link these together: <P> <DL> <P> <DT><B>VolHead:</B><DD><P>defined in <CODE>camprivate.h</CODE>. An RVM structure containing a <EM>VolumeHeader</EM> structure and a <EM>VolumeData</EM> structure. <P> <DT><B>VolumeData:</B><DD><P>defined in <CODE>camprivate.h</CODE>. An RVM structure; points to <EM>VolumeDiskData</EM> and to <EM>SmallVnodeLists</EM> (with some constants related to these, and to <EM>BigVnodeLists</EM>). <P> <DT><B>VolumeHeader:</B><DD><P>defined in <CODE>volume.h</CODE>. Contains the <EM>volumeid, type</EM> and <EM>parentid.</EM> <P> <DT><B>VolumeDiskData:</B><DD><P>defined in <CODE>volume.h</CODE>. An RVM structure holding the persistent data associated with the volume. 
<P> </DL> <P>These RVM structures are copied into VM when a volume is accessed - we will describe this in detail. In VM we have a hash table <EM>VolumeHashTable</EM> for <EM>Volume</EM> structures, keyed by the <EM>VolumeId</EM>. This is used in conjunction with a doubly linked list <EM>volumeLRU</EM> of <EM>volHeader</EM> structures, which was probably created to avoid keeping all <EM>volHeader</EM>'s in VM, since the latter are large. <P> <DL> <DT><B>Volume:</B><DD><P>defined in <CODE>volume.h</CODE>. This structure is the principal access point to a volume. These VM structures are held in a hash table. It contains quite a lot of information, such as a pointer to a <EM>volHeader</EM> (which holds the cached RVM data), the device, partition, vol_index, vnodeIndex, locks and other run-time data. <P> <DT><B>volHeader:</B><DD><P>defined in <CODE>volume.h</CODE>. A VM structure sitting on a dlist, with a backpointer to a <EM>Volume</EM>. Contains a <EM>VolumeDiskData</EM> structure; this is the VM cached copy of the RVM <EM>VolumeDiskData</EM> structure. </DL> <P>Notice that in RVM volumes are identified principally by their index, which is the index in the static <EM>VolumeList</EM> array of <EM>VolHead</EM> structures. Otherwise volumes are mostly accessed by their volume id. Mapping an index to a volume id proceeds through the <EM>VolumeHeader</EM> structures held in the <EM>VolHead</EM> structures in the <EM>VolumeList</EM>. <P>The reverse mapping, to get an index from a <EM>VolumeId</EM>, is done through an auxiliary hashtable <EM>VolTable</EM> of type <EM>vhashtab</EM>, defined in <CODE>volhash.cc</CODE>. 
<P>It is informative to know the sizes of all these structures: <UL> <LI> VolumeDiskData: 636</LI> <LI> VolHead: 88</LI> <LI> volHeader: 648</LI> <LI> VolumeHeader: 20</LI> <LI> Volume: 96</LI> </UL> <P> <P> <H2><A NAME="ss2.2">2.2 Initializing the volume package </A> </H2> <P> <P><EM>VInitVolumePackage</EM> sets up a number of other structures related to volumes and vnodes. <DL> <P> <DT><B>InitLRU:</B><DD><P>callocs a sequence of (normally 50) <EM>volHeader</EM>'s, then calls <EM>ReleaseVolumeHeader</EM> to put each at the head of the <EM>volumeLRU</EM>. <P> <DT><B>InitVolTable:</B><DD><P>sets up the <EM>VolTable</EM>, hashing volid's to get the index in the RVM <EM>VolumeList</EM>. <P> <DT><B>VolumeHashTable:</B><DD><P>hash table used to look up volumes by id. It stores pointers to the <EM>Volume</EM> structures. <P> <DT><B>VInitVnodes:</B><DD><P>sets up the vnode VM LRU caches for small and large vnodes, and stores summary information in the <EM>VnodeClassInfoArray</EM> for both. The way to reach allocated vnode arrays is through the <EM>VnodeClassInfoArray</EM>. <P> <DT><B>InitLogStorage:</B><DD><P>goes through the <EM>VolumeList</EM> and assigns VM resolution log memory for every volume; the pointer is stored in <EM>VolLog[i]</EM>. The number of rlentries assigned is stored in: <PRE> VolumeList[i].data.volumeInfo->maxlogentries. </PRE> <P>It is not yet clear if RVM resolution is using this. <P> <DT><B>CheckVLDB:</B><DD><P>sees if there is a VLDB. <P> <DT><B>DP_Init:</B><DD><P>finds server partitions. <P> <DT><B>S_VolSalvage:</B><DD><P>analyzes all inodes on a Coda server partition and matches data against directory contents. After this has completed, the volume <EM>InUse</EM> bit and <EM>NeedsSalvage</EM> bit are cleared (stored in the disk data). <P> <DT><B>FSYNC_fsInit:</B><DD><P>this interface is discussed below. <P> <DT><B>Attach volumes:</B><DD><P>now iterate through all volumes and call <EM>VAttachVolumeById</EM>. The details are far from clear, but the intent is to add the information for each volume to the VM hash tables. 
<P> <DT><B>VListVolumes:</B><DD><P>writes out the <CODE>/vice/vol/VolumeList</CODE> file. <P> <DT><B>Vinit:</B><DD><P>this variable is now set to 1. </DL> <P> <H2><A NAME="ss2.3">2.3 Attaching volumes </A> </H2> <P> <P>This code is sheer madness. <P>Attaching volumes is the following process. First a <EM>VGetVolume</EM> is done. If this returns a volume, it has found it in the hash table. If that volume is in use, it is detached with <EM>VDetachVolume</EM> (an error here is ignored). Attaching is not supposed to happen with anything in the hashtable already. <P>Next the <EM>VolumeHeader</EM> is extracted from RVM. If the program is running as a <EM>volumeUtility</EM> then <EM>FSYNC_askfs</EM> is used to request the volume. We continue by checking the partition and calling <EM>attach2</EM>. <P>Attach2 allocates a new <EM>Volume</EM> structure, initializes the locks in this structure and fills in the partition information. A <EM>volHeader</EM> is found from the LRU list and the routine <EM>VolDiskInfoById</EM> reads in the RVM data - this can only go wrong if the volume id cannot be matched to a valid index. If the <EM>needsSalvage</EM> field is set, we return the NULL pointer (leaving all memory allocated). Attach2 continues by checking for a variety of bad conditions, such as the volume apparently having crashed. If it is <EM>blessed</EM>, <EM>in service</EM> and the <EM>salvage</EM> flag is not set, then we are ready to go and put the volume in the hash table. If the volume is writable the bitmaps for vnodes are filled in. <P><EM>VDetachVolume</EM> is also very complicated. It starts by taking the volume out of the hash table (forcefully). This means that <EM>VGetVolume</EM> will no longer find it. <P>The <EM>shuttingDown</EM> flag is set to one, and <EM>VPutVolume</EM> is called. This frees the volume and <EM>LWP_signal</EM>s threads waiting on the <EM>VPutVolume</EM> condition. 
[It is not clear that any process is waiting, since they normally only appear to wait when the <EM>goingOffline</EM> flag is set, not when the <EM>shuttingDown</EM> flag is set.] <P> <P> <H2><A NAME="ss2.4">2.4 State of volumes </A> </H2> <P> <P>There are numerous flags indicating the state of volumes: <P> <DL> <P> <DT><B>inUse</B><DD><P>Cleared by <EM>attach2</EM>, but only if the volume is <EM>blessed</EM>, <EM>inService</EM> and not <EM>needsSalvaged</EM>. <P> <DT><B>inService</B><DD><P>This flag has some use in the volume utility package, not much else. <P> <P> <DT><B>goingOffline</B><DD><P>Taking a volume <EM>offline</EM> (as in <EM>VOffline</EM>) means writing the volume header to disk with the <EM>inUse</EM> bit turned off. A copy is kept around in VM. <P>Set by <EM>VOffline</EM>; cleared by <EM>VPutVolume</EM>, <EM>VForceOffline</EM> and <EM>attach2</EM>. When clearing it, <EM>VPutVolume</EM> sets <EM>inUse</EM> to 0, and <EM>VForceOffline</EM> does that and also sets the <EM>needsSalvaged</EM> flag. <P> <DT><B>shuttingDown</B><DD><P>Used in <EM>VDetachVolume</EM>. <P> <DT><B>blessed</B><DD><P>Heavily manipulated by volume utilities while creating volumes, backing them up etc. Its absence probably indicates that the volume is not in an internally consistent state. <P> <DT><B>needsSalvage</B><DD><P>This is an error condition. Cleared by <EM>VolSalvage</EM>, set by <EM>VForceOffline</EM>. <P> </DL> <P> <H2><A NAME="ss2.5">2.5 The FSYNC interface </A> </H2> <P> <P>Requests for volume operations, such as <EM>VGetVolume</EM>, can come from: <P> <UL> <LI> The fileserver</LI> <LI> A volume utility</LI> <LI> The salvager</LI> </UL> <P>The FSYNC package allows a volume utility to register itself. The call <EM>VConnectFS</EM> is made during <EM>VInitVolutil</EM> and makes available an array for the calling thread, named <EM>OffLineVolumes</EM>. 
This array is cleaned up during <EM>VDisconnectFS</EM> and contains a list of volumes that have been requested to be offline. <P>One request that can be made is <EM>FSYNC_ON</EM>, to get a volume back online. This clears the spot in the <EM>OffLineVolumes</EM>, calls <EM>VAttachVolume</EM> and finally writes out changes by calling <EM>VPutVolume</EM>. <P>The other requests are <EM>FSYNC_OFF</EM> and <EM>FSYNC_NEEDVOLUME</EM>. If the volume is not yet in the thread's <EM>OffLineVolumes</EM> then a spot is found for the volume. If this is not for cloning, the volume is marked as <EM>BUSY</EM> in the <EM>specialStatus</EM> field of the <EM>Volume</EM> structure. In <EM>VOffline</EM> the volume header is written to disk, with the <EM>inUse</EM> bit turned off. A copy of the header is maintained in memory, however (which is why this is VOffline, not VDetach). <P>The FSYNC package can also watch over relocation sites. This is not functional anymore, and should probably be removed. <P> <H2><A NAME="ss2.6">2.6 Interface to the <EM>volumeLRU</EM> </A> </H2> <P> <P>This doubly linked list of <EM>volHeader</EM>'s is one of the more confusing areas in the code. It is the interface between VM and RVM data, and the invariants are not very clear. <P> <DL> <P> <DT><B>AvailVolumeHeader(struct Volume *)</B><DD><P>See if there is a <EM>volHeader</EM> available for the volume passed. <P> <DT><B>GetVolumeHeader(struct Volume *)</B><DD><P>Assign a <EM>volHeader</EM> to the <EM>Volume</EM> structure. This routine does <B>not</B> fill in the data. </DL> <P> <H2><A NAME="ss2.7">2.7 Interface to the <EM>VolumeHashTable</EM>. </A> </H2> <P> <P>When a volume is needed, a call is made to <EM>VGetVolume</EM>. This routine is given a <EM>volumeid</EM> to search for and finds the <EM>Volume</EM> structure in the <EM>VolumeHashTable</EM>. If this is not found it returns <EM>VNOVOL</EM>. 
<P>If there are users of this volume already, then we are guaranteed that the <EM>volHeader</EM> structure for this volume is available too. The <EM>header</EM> field in the <EM>Volume</EM> structure may still be set; otherwise a <EM>volHeader</EM> must be found in the LRU, which possibly involves writing an old one back to RVM. <P>If the <EM>header</EM> field in the <EM>Volume</EM> structure is not null we are ok; if it is null we would like to use the first available header in the <EM>volumeLRU</EM> list. This is done by checking the <EM>back</EM> pointer in the <EM>volHeader</EM>. <P> <P> <H2><A NAME="ss2.8">2.8 Creating a volume </A> </H2> <P> <P>The start of this is found in <CODE>vol-create.cc</CODE> in the volutil directory. <OL> <LI> We first allocate the volume id with <EM>VAllocateVolumeId</EM>.</LI> <LI> <EM>VCreateVolume</EM> (<CODE>vol/vutil.cc</CODE>) now builds a <EM>VolumeDiskData</EM> structure <EM>vol</EM> in VM. The recoverable resolution log <EM>vol.log</EM> is generated with <EM>recov_vol_log</EM>, which does an <EM>RVMLIB_REC_MALLOC</EM> to allocate a certain number of pointers to recle's. </LI> </OL> <P> <HR> <A HREF="server-3.html">Next</A> <A HREF="server-1.html">Previous</A> <A HREF="server.html#toc2">Contents</A> </BODY> </HTML>