<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
 <TITLE> RPC2 User Guide and Reference Manual: SFTP Internals</TITLE>
 <LINK HREF="rpc2_manual-17.html" REL=next>
 <LINK HREF="rpc2_manual-15.html" REL=previous>
 <LINK HREF="rpc2_manual.html#toc16" REL=contents>
</HEAD>
<BODY>
<A HREF="rpc2_manual-17.html">Next</A>
<A HREF="rpc2_manual-15.html">Previous</A>
<A HREF="rpc2_manual.html#toc16">Contents</A>
<HR>
<H2><A NAME="s16">16. SFTP Internals</A></H2>

<P>
<A NAME="SFTPInternals"></A> <P>
<H2><A NAME="ss16.1">16.1 Background</A>
</H2>

<P>
<P>An SFTP file transfer can originate from either an RPC2 server or an
RPC2 client. To avoid confusion we will refer to the
transmitting entity as the <EM>source</EM> and the receiving entity as the <EM>sink</EM>.
RPC2 clients and servers are not regarded as peers. While an RPC2
client might be someone's personal workstation, an RPC2 server
could be serving a large user community. To improve
scalability as more clients are added to the system, the servers
handle all SFTP flow control, regardless of whether they are the source
or the sink.
<P>An RPC2 client can use SFTP to transfer a file simultaneously to more
than one RPC2 server using IP multicasting. Multicast file transfers
are only possible when the source is an RPC2 client. The sinks will
send flow control information to the source, and it will adapt to the
requirements of the slowest sink.
<P>An SFTP file transfer is basically a cyclic exchange of data and
acknowledgements. At the beginning of each cycle, the source will send
a block of data packets. It will then wait for an acknowledgement to
arrive. The acknowledgement will specify which packets the sink has
received. The cycle then repeats. The source will now retransmit any
packets that it knows that the sink did not receive, followed by a
block of new packets.
<P>When the source has transmitted a block of data packets, it will wait
for the arrival of an acknowledgement. If the source is an RPC2
server and the acknowledgement does not arrive after a predetermined
time, the source will retransmit the block of data packets. It
basically acts as if it received an acknowledgement that indicated
that the entire block of data packets had been lost.
<P>If the source is an RPC2 client, however, it will wait passively for
an acknowledgement to arrive. If the sink does not receive more data
packets after a predetermined period of time, it will conclude that
its acknowledgement was lost in transit and retransmit it.
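<P>The data and acknowledgement cycle described above can be sketched as a small simulation. This is a toy model and not the SFTP implementation: the function name <EM>transfer</EM>, its parameters, and the rule that a packet can be lost at most once are assumptions made purely for illustration.

```python
def transfer(total_packets, block_size, lost_first_try):
    """Simulate the data/ACK cycle and return the number of cycles used.

    lost_first_try is a set of sequence numbers that are dropped the first
    time they are sent. In this toy model a retransmission always succeeds.
    """
    received = set()           # what the sink holds
    pending_retransmit = []    # packets the last ACK reported missing
    dropped_once = set()
    next_new = 1               # next sequence number never sent before
    cycles = 0
    while len(received) < total_packets:
        cycles += 1
        last_new = min(next_new + block_size, total_packets + 1)
        new = list(range(next_new, last_new))
        next_new += len(new)
        # Retransmissions go out first, followed by a block of new packets.
        block = pending_retransmit + new
        for seq in block:
            if seq in lost_first_try and seq not in dropped_once:
                dropped_once.add(seq)      # lost in transit
            else:
                received.add(seq)
        # The ACK tells the source which packets of this block arrived.
        pending_retransmit = [s for s in block if s not in received]
    return cycles
```

With eight packets and a block size of four, a loss in the first block is repaired in the second cycle at no extra cost, while a loss of the very last packet forces a third cycle that carries only the retransmission.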
<P>
<H2><A NAME="ss16.2">16.2 SFTP Code Structure</A>
</H2>

<P>In this section we describe the SFTP code structure.  
Our description assumes that the reader is already familiar with the
description of basic RPC2 internals in Chapter 
<A HREF="rpc2_manual-14.html#RPC2Int">14</A>.
<P>
<P>
<H3>Thread creation and Initialization</H3>

<P>In the base RPC2, the RPC2 client and server communicate via Internet Sockets.
Both at the client and at the server, the socket is created during
initialization by calling RPC2_Init.  A SocketListener thread is present at
both ends to monitor these sockets.
<P>When SFTP is used, a second set of sockets is created in addition to the above: one at the
client and another at the server.  These sockets are monitored by the sftp_Listener.  Both
are created during the initialization of the SFTP package.
<P>Note that there are two independent channels of communication between the client and the server.
The first channel (which we will refer to as the <EM>RPC2 channel</EM>)
is associated with the base RPC2 and is used for making simple RPCs, for
making RPCs that request file transfers, and for retransmissions and BUSYs.  All other
exchanges related to the file transfer are handled by the second channel, which we will refer
to as the <EM>sftp channel</EM>.
<P>As previously mentioned, a file can be transferred from a client to a server
or from a server to a client.
In the first case, the server is the sink.
In the second case, the client is the sink. 
The code however is not symmetric; i.e., the code executed when
a client is the sink is slightly different from the code executed
when a server is the sink.  We describe both cases below.
<P>
<H3>Data Structures used in SFTP</H3>

<P>In addition to the data structures used in RPC2 (on the RPC2 channel), SFTP uses a data 
structure called SFTP_Entry which is given in sftp.h.  It contains fields  relevant to the sftp
channel such as the LocalHandle.  Other fields include the state of the file transfer,
the packet size, the window size and a number of others.  It is created by the call
sftp_AllocSEntry.
<P>
<H3>File transfer from server to client</H3>

<P>This is the case in which the server is the source and the client is the sink.
The client makes a request for the file by doing an <EM>RPC2_MakeRPC</EM>
on the RPC2 channel. 
This is received by the server's SocketListener, which wakes up a suitable
LWP blocked on an <EM>RPC2_GetRequest</EM>.
The LWP then calls the routine that is meant to handle this request.
This routine contains calls to two routines, <EM>RPC2_InitSE</EM>
and <EM>RPC2_CheckSE</EM>.
<EM>RPC2_InitSE</EM> initializes certain internal data structures.
<EM>RPC2_CheckSE</EM> handles the actual file transfer.
The main procedure in <EM>RPC2_CheckSE</EM> which deals with a
SERVERTOCLIENT file transfer is the <EM>PutFile</EM> routine.
<P>The <EM>PutFile</EM> routine sets some of the fields of the data structure SEntry,
sets the transfer state of SEntry to <EM>XferInProgress</EM> (transfer in progress)
and calls <EM>sftp_SendStrategy</EM>. This routine sends a set of packets,
using a strategy described in the next section.
After sending the first set of packets,
a <EM>while</EM> loop is entered and executed as long as the transfer state
of SEntry is still <EM>XferInProgress</EM>.
In the <EM>while</EM> loop,
<EM>AwaitPacket</EM> and <EM>sftp_SendStrategy</EM> are called alternately.
The <EM>AwaitPacket</EM> routine waits for an ACK, a NAK or
a timeout.  If a timeout occurs, the packets are retransmitted
using <EM>sftp_SendStrategy</EM>.
If an ACK is received, the <EM>sftp_AckArrived</EM> routine is called.
This routine advances the transmission window and checks to see
if the transfer is complete.  If so, it sets the SEntry's transfer
state to <EM>XferCompleted</EM>, and the <EM>while</EM> loop is exited. 
Otherwise, the next set of packets is transmitted, after which control
is yielded.
Note that all these packets are sent on the sftp channel, not the main
RPC2 channel.
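<P>The shape of the loop just described can be sketched as follows. This is a simplified model under stated assumptions, not the code of <EM>PutFile</EM>: the function <EM>put_file_loop</EM> is hypothetical, a scripted list of events stands in for <EM>AwaitPacket</EM>, every ACK is assumed to acknowledge a contiguous prefix of the file, and NAKs are not modelled.

```python
def put_file_loop(events, total_packets, send_ahead):
    """Replay a scripted event sequence through a PutFile-style loop.

    events is an iterable of "TIMEOUT" or ("ACK", highest_contiguous_seq).
    Returns the final transfer state and a log of (action, first, last).
    """
    log = []
    state = "XferInProgress"
    acked = 0                                # highest acknowledged packet
    sent = min(send_ahead, total_packets)    # highest packet sent so far
    log.append(("send", 1, sent))            # first set of packets
    pending = iter(events)
    while state == "XferInProgress":
        event = next(pending)                # stands in for AwaitPacket
        if event == "TIMEOUT":
            # Timeout: retransmit everything that is still unacknowledged.
            log.append(("resend", acked + 1, sent))
            continue
        acked = max(acked, event[1])         # stands in for sftp_AckArrived
        if acked >= total_packets:
            state = "XferCompleted"          # leaves the while loop
        elif sent < total_packets:
            first, sent = sent + 1, min(sent + send_ahead, total_packets)
            log.append(("send", first, sent))
    return state, log
```

For example, replaying one timeout followed by two ACKs over a six-packet file shows the initial send, one retransmission of the same set, and one further send before the state becomes <EM>XferCompleted</EM>.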
<P>At the client end, the sftp_Listener detects a packet on the socket,
receives it and processes it by calling <EM>sftp_ProcessPacket</EM>.
After receiving the packet, this routine calls the <EM>ExaminePacket</EM> routine,
which sanity checks the packet and identifies it as a 
DATA packet.  It then calls <EM>sftp_DataArrived</EM>, which sends the requested
ACKs and writes the data to disk by calling the <EM>WriteStrategy</EM> routine.
The sftp_Listener yields after each packet it processes.
<P>The packet from the client is received by the sftp_Listener at the server,
which then calls the <EM>ServerPacket</EM> routine to modify
the appropriate SEntry.  It then does an <EM>IOMGR_Select</EM>
and yields control.  Control is then transferred to the LWP waiting
on this packet, and the cycle continues.
<P>
<H3>File transfer from client to server</H3>

<P>This is the case in which the server is the sink and the client is the
source.  As in the previous case, the client makes a request for the file
by doing an <EM>RPC2_MakeRPC</EM> on the RPC2 channel. This is received by
the server's SocketListener, which wakes up a suitable LWP blocked on
an <EM>RPC2_GetRequest</EM>.  The LWP then calls the routine that is meant to
handle this request.  This routine contains calls to the two routines
<EM>RPC2_InitSE</EM> and <EM>RPC2_CheckSE</EM>.
<EM>RPC2_InitSE</EM> initializes some of the fields of the data structure.
The main routine in <EM>RPC2_CheckSE</EM> which deals with a
CLIENTTOSERVER file transfer is the <EM>GetFile</EM> routine.
<P>
<P>The <EM>GetFile</EM> routine sets some of the fields of the data structure SEntry,
sets the transfer state of SEntry to <EM>XferInProgress</EM> and sends
a <EM>START</EM> packet to the client to tell the
client that the server is ready to receive the file.  It then enters a
<EM>while</EM> loop which is executed as long as the transfer state of SEntry
is still in <EM>XferInProgress</EM>.  In the <EM>while</EM> loop, 
<EM>AwaitPacket</EM> and <EM>sftp_DataArrived</EM> are called alternately.
The <EM>AwaitPacket</EM> routine waits for either a packet to arrive or for
a timeout.  If a timeout occurs, the ACK is retransmitted.
If a DATA packet is received, the <EM>sftp_DataArrived</EM> routine is called.
This routine in turn calls the <EM>sftp_WriteStrategy</EM>.
When the file transfer is eventually completed, the transfer state of
the SEntry is set to <EM>XferCompleted</EM>, and the loop is exited.
<P>The sftp_Listener at the client end receives the packet, decodes it, and calls the
<EM>ClientPacket</EM> routine, which in turn identifies the packet as an <EM>SFTP_START</EM>
packet.  It then calls <EM>sftp_StartArrived</EM>, which sets some of the fields in the SEntry
data structure and calls the <EM>sftp_SendStrategy</EM> routine described above.  The sftp_Listener then
blocks on an IOMGR_Select.  Note that it waits passively for an ACK from the server, and
does not retransmit if it does not receive an ACK within a given time.  What prevents
the client from waiting forever is that communication continues between the client and the server
on the RPC2 channel in the form of retransmissions and BUSYs.
When an ACK arrives, the client transmits the next set of packets.
<P>The sftp_Listener at the server end receives the packet and calls the <EM>ServerPacket</EM>
routine.  This routine wakes up the appropriate LWP (which is blocked in the
<EM>AwaitPacket</EM> call).
<P>(Note that although the client sends
a number of packets, the sftp_Listener receives and processes them one at a time,
yielding control after each one.  The same applies at the server end.)
<P>Note that the role of the sftp_Listener is different at the client end and at the server end.
At the client end, the whole sftp transfer is handled by the sftp_Listener.  At the server
end, the sftp_Listener only receives and decodes the packet; most of the sftp transfer is handled by the LWP thread.
<P>
<H2><A NAME="ss16.3">16.3 Packet formats</A>
</H2>

<P>
<P>All packets carry 32-bit sequence numbers. Data packets and control
packets have independent sequence numbers. The sequence number series
of the source and sink(s) are also independent of each other.
<P>There are thus at least 4 sequences in a connection:
<UL>
<LI>Source to Sink, Data</LI>
<LI>Source to Sink, Control</LI>
<LI>Sink to Source, Data</LI>
<LI>Sink to Source, Control</LI>
</UL>

The Sink to Source sequence space is currently never used. When doing
a multicast file transfer, each sink will have its own independent sequence
number series.
<P>The sequence number for a particular packet type is incremented by one for
each new packet of its type that is sent.
<P>The <EM>MOREDATA</EM> flag will be set in each data packet except for the very
last one. This is to facilitate end of file detection.
If the <EM>ACKME</EM> flag is set on a data packet it requests an
acknowledgement, <EM>ACK</EM>, from all of the servers.
<P>Each <EM>ACK</EM> packet describes which packets have been received by the
particular server. There should be little or no need 
to transmit an acknowledgement packet for each data packet. It is of
particular benefit to limit the number of <EM>ACK</EM> packets given our single
channel operating environment. The acknowledgement packets will
contend with data packets going in the other direction. 
<P>Each acknowledgement packet has a 64-bit wide bitmask and an offset
counter, <EM>GotEmAll</EM>. This counter is the highest sequence number
of a data packet such that it and all preceding data packets have
been received. The bitmask indicates which of the data packets with
sequence numbers greater than <EM>GotEmAll</EM> have been received.
Each bit in the bitmask represents a single packet.
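<P>A minimal sketch of how such an acknowledgement could be built and interpreted is given here. The bit ordering is an assumption: in this sketch the least significant bit stands for packet <EM>GotEmAll</EM>+1, which need not match the layout used on the wire. Sequence numbers are assumed to start at 1, and the function names are hypothetical.

```python
BITMASK_WIDTH = 64  # width of the ACK bitmask, as stated in the text


def encode_ack(received):
    """Return (got_em_all, bitmask) for a set of received sequence numbers."""
    got_em_all = 0
    while got_em_all + 1 in received:
        got_em_all += 1              # extend the contiguous prefix
    bitmask = 0
    for seq in received:
        offset = seq - got_em_all - 1
        if 0 <= offset < BITMASK_WIDTH:
            bitmask |= 1 << offset   # assumed layout: bit 0 is GotEmAll + 1
    return got_em_all, bitmask


def decode_ack(got_em_all, bitmask):
    """Return the set of packets beyond GotEmAll that the bitmask reports."""
    return {
        got_em_all + 1 + i
        for i in range(BITMASK_WIDTH)
        if (bitmask >> i) & 1
    }


def missing_packets(got_em_all, bitmask, most_recent_sent):
    """List the packets the source should consider lost."""
    beyond = decode_ack(got_em_all, bitmask)
    return [
        seq
        for seq in range(got_em_all + 1, most_recent_sent + 1)
        if seq not in beyond
    ]
```

Because packet <EM>GotEmAll</EM>+1 is by definition not yet received, the lowest bit is always clear under this layout. A single ACK is enough for the source to work out every gap in the window, which is why one ACK per block of data packets suffices.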
<P>
<H3>Protocol details</H3>

<P>
<P>If the source is an RPC2 client it must first wait for permission from
the sink(s) before it can transmit. This permission is granted by a
special <EM>START</EM> packet.
<P>Two counters are of relevance to the SFTP source protocol
machine: <EM>SendLastContig</EM>, which is the sequence number of the
latest packet to be moved out of the transmission window, and
<EM>SendMostRecent</EM>, which is the sequence number of the data packet
last sent. There are also three important transmission parameters: the
transmission window size, <EM>AckPoint</EM>, and the size of the <EM>SendAhead</EM>
set.
<P>When an SFTP source begins the transfer, <EM>SendLastContig</EM> and
<EM>SendMostRecent</EM> will be equal. The packets in the <EM>SendAhead</EM> set are
transmitted, and <EM>SendMostRecent</EM> is increased by the size of
<EM>SendAhead</EM>. Only one of these packets will have the <EM>ACKME</EM> flag
set. The relative position of this packet in the <EM>SendAhead</EM> set is
given by <EM>AckPoint</EM>. <EM>AckPoint</EM> must thus be less than or equal to
the size of the <EM>SendAhead</EM> set.
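<P>The construction of one <EM>SendAhead</EM> set can be sketched as follows. The function <EM>build_send_ahead</EM> and the dictionary representation of a packet are hypothetical, and <EM>AckPoint</EM> is assumed to be a 1-based position within the set. What happens when the final set of a file is shorter than <EM>AckPoint</EM> is not covered by the text, so this sketch simply sets no <EM>ACKME</EM> flag in that case.

```python
def build_send_ahead(send_most_recent, send_ahead, ack_point, last_seq):
    """Build the next SendAhead set and return it with the new SendMostRecent.

    last_seq is the sequence number of the final data packet of the file.
    """
    if not 1 <= ack_point <= send_ahead:
        raise ValueError("AckPoint must lie within the SendAhead set")
    packets = []
    for position in range(1, send_ahead + 1):
        seq = send_most_recent + position
        if seq > last_seq:
            break                          # nothing left to send
        packets.append({
            "seq": seq,
            "ACKME": position == ack_point,   # exactly one packet asks for an ACK
            "MOREDATA": seq < last_seq,       # clear only on the very last packet
        })
    return packets, send_most_recent + len(packets)
```

With a set size of 4 and an <EM>AckPoint</EM> of 2, the second packet of each set requests the acknowledgement, so the ACK can already be on its way back while the remaining two packets are still being sent.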
<P>Packets which have been sent and for which an <EM>ACK</EM> has been
requested but not yet received fall into two categories: the
<EM>NeedAck</EM> set and the <EM>Worried</EM> set. They are distinguished by
whether or not a retransmission timeout has occurred since they were
sent. Packets in the <EM>NeedAck</EM> set have been sent and an <EM>ACK</EM> has
been requested, but not enough time has passed to be worried about the
fact that an <EM>ACK</EM> has not been received.  Packets which have been
sent but for which an <EM>ACK</EM> has not yet been requested, if any, are
called the <EM>InTransit</EM> set. The <EM>InTransit</EM> set will always be
empty if <EM>AckPoint</EM> equals the <EM>SendAhead</EM> size.
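<P>The three sets can be illustrated with a small classification routine. This is a sketch only: the tuple layout, the function name <EM>classify</EM> and the use of per-packet send times are assumptions chosen for clarity, and the real implementation may track the sets quite differently (for instance with boundary counters).

```python
def classify(sent_packets, now, retry_interval):
    """Sort unacknowledged packets into InTransit, NeedAck and Worried.

    sent_packets is a list of (seq, sent_at, ack_requested) tuples, where
    ack_requested says whether an ACK covering the packet has been asked for.
    """
    sets = {"InTransit": [], "NeedAck": [], "Worried": []}
    for seq, sent_at, ack_requested in sent_packets:
        if not ack_requested:
            sets["InTransit"].append(seq)    # no ACK requested yet
        elif now - sent_at >= retry_interval:
            sets["Worried"].append(seq)      # timeout expired, no ACK seen
        else:
            sets["NeedAck"].append(seq)      # ACK requested, still patient
    return sets
```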
<P>The source then waits for an <EM>ACK</EM> packet from the sink.
Our implementation uses the waiting time to prefetch more data
from the disk.
Under ideal conditions the source will proceed only after having received
the <EM>ACK</EM> it is waiting for.
In practice, however, it may time out and retransmit data
packets if it is operating as an RPC2 server.
<P>At this point, the source will revise the <EM>Worried</EM> set. Any packets
that have been acknowledged will be taken off the <EM>Worried</EM> set.
The transmit window is shifted by increasing the
<EM>SendLastContig</EM> counter. It will be set to one less than the smallest
sequence number of a member in the <EM>Worried</EM> set. 
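<P>The revision step can be sketched as follows. The function <EM>revise_worried</EM> is hypothetical. The text does not say what happens when every worried packet has been acknowledged; in that case this sketch advances <EM>SendLastContig</EM> to the highest sequence number that was in the set, which is an assumption.

```python
def revise_worried(worried, acked, send_last_contig):
    """Drop acknowledged packets from Worried and shift the window.

    Returns (remaining_worried, new_send_last_contig).
    """
    remaining = sorted(set(worried) - set(acked))
    if remaining:
        # One less than the smallest sequence number still in Worried.
        new_last_contig = remaining[0] - 1
    elif worried:
        # Assumption: the whole set was acknowledged, so skip past it.
        new_last_contig = max(worried)
    else:
        new_last_contig = send_last_contig
    # The window only ever moves forward.
    return remaining, max(new_last_contig, send_last_contig)
```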
<P>All the packets that are in the <EM>Worried</EM> set are retransmitted,
followed by <EM>SendAhead</EM> new packets. No packets will have the
<EM>ACKME</EM> flag set, except for the member of the <EM>SendAhead</EM> set whose
index is given by <EM>AckPoint</EM>. The packets in the new <EM>SendAhead</EM>
set are then added either to the <EM>NeedAck</EM> set or to the
<EM>InTransit</EM> set, depending upon the <EM>AckPoint</EM> value, as described
above. Packets are placed in the <EM>Worried</EM> set only after a
retransmission interval has expired.  The procedure is repeated until
the file has been completely transferred. At no point, however, will
the protocol have more packets outstanding than the
transmit window size allows. Whenever the sum of the number of packets in the
various sets is greater than the transmit window size, only the first
packet in the <EM>Worried</EM> set will be sent.
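<P>The window check can be sketched as a single decision function. This is an interpretation and not the code of <EM>sftp_SendStrategy</EM>: the function name <EM>send_strategy</EM> is hypothetical, and the sketch assumes that the prospective <EM>SendAhead</EM> set is counted together with the <EM>Worried</EM>, <EM>NeedAck</EM> and <EM>InTransit</EM> sets when comparing against the window size.

```python
def send_strategy(worried, need_ack, in_transit, send_ahead,
                  window_size, send_most_recent):
    """Return the list of sequence numbers to put on the wire this round."""
    outstanding = len(worried) + len(need_ack) + len(in_transit)
    if outstanding + send_ahead > window_size:
        # Window would overflow: probe with the first worried packet only.
        return sorted(worried)[:1]
    first_new = send_most_recent + 1
    new_packets = list(range(first_new, first_new + send_ahead))
    # Retransmit the whole Worried set, then send SendAhead new packets.
    return sorted(worried) + new_packets
```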
<P>
<H3>Sink side operation</H3>

<P>
<P>The sink keeps two counters, <EM>RecvLastContig</EM> and <EM>RecvMostRecent</EM>, which
are similar to their counterparts at the source side. <EM>RecvLastContig</EM>
is the sequence number of the last data packet where it and all previous data
packets have been received. It is used as the <EM>GotEmAll</EM> counter when
sending an <EM>ACK</EM> packet. <EM>RecvMostRecent</EM> is the highest sequence number
of a packet received so far.
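<P>The way the two sink counters move can be sketched with a small class. The class <EM>SinkWindow</EM> and its attribute names are hypothetical, sequence numbers are assumed to start at 1, and a Python set stands in for whatever bookkeeping the implementation uses for out-of-order packets.

```python
class SinkWindow:
    """Track RecvLastContig and RecvMostRecent as data packets arrive."""

    def __init__(self):
        self.recv_last_contig = 0   # all packets up to here have arrived
        self.recv_most_recent = 0   # highest sequence number seen so far
        self.out_of_order = set()   # received packets beyond the prefix

    def data_arrived(self, seq):
        """Record a data packet. Return False if it is a duplicate."""
        if seq <= self.recv_last_contig or seq in self.out_of_order:
            return False
        self.out_of_order.add(seq)
        self.recv_most_recent = max(self.recv_most_recent, seq)
        # Slide the left edge over any gap that has just been filled.
        while self.recv_last_contig + 1 in self.out_of_order:
            self.recv_last_contig += 1
            self.out_of_order.remove(self.recv_last_contig)
        return True
```

If packets 1, 2, 4 and 5 arrive, <EM>RecvLastContig</EM> stays at 2 while <EM>RecvMostRecent</EM> is 5; the late arrival of packet 3 moves <EM>RecvLastContig</EM> straight to 5.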
<P>Before the file transfer takes place, the source will inform the sink
about the parameters <EM>RetryInterval</EM>, <EM>RetryCount</EM> and <EM>DupThreshold</EM>.
If the sink is an RPC2 server it will grant the source permission to
transmit data packets by sending a <EM>START</EM> packet. It will then
start waiting for data packets. If the sink is an RPC2 server and no
data packet has arrived after the time specified by <EM>RetryInterval</EM>,
it will send an <EM>ACK</EM> (or <EM>START</EM>) packet, trying to cause the
source to retransmit its data. If this fails <EM>RetryCount</EM> number of
times, without any valid data packet being received, the sink will
consider the connection unusable.
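<P>The retry behaviour of a server sink can be sketched as follows. The function <EM>wait_for_data</EM> is hypothetical, and each element of its input stands for one <EM>RetryInterval</EM>: true if a valid data packet arrived within it, false if the interval expired. The sketch assumes that the connection is declared unusable once <EM>RetryCount</EM> prompts have gone unanswered.

```python
def wait_for_data(arrivals, retry_count):
    """Wait for data, prompting the source after each silent interval.

    Returns (outcome, prompts_sent), where outcome is "data", "unusable",
    or "waiting" if the scripted input ran out first.
    """
    prompts_sent = 0
    for arrived in arrivals:
        if arrived:
            return "data", prompts_sent
        if prompts_sent == retry_count:
            # Every permitted prompt has failed to produce data.
            return "unusable", prompts_sent
        prompts_sent += 1   # here the sink would resend its ACK or START
    return "waiting", prompts_sent
```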
<P>The sink keeps track of the number of duplicate data packets that
have arrived since the last time an <EM>ACK</EM> was sent. If that number
exceeds <EM>DupThreshold</EM>, the sink will send an <EM>ACK</EM> in an attempt
to inform the source about the situation.
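<P>The duplicate counter can be sketched as a small helper. The class <EM>DuplicateMonitor</EM> and its method names are hypothetical; the only behaviour taken from the text is that duplicates are counted since the last <EM>ACK</EM> and that an <EM>ACK</EM> is sent once the count exceeds <EM>DupThreshold</EM>.

```python
class DuplicateMonitor:
    """Count duplicate data packets since the last ACK was sent."""

    def __init__(self, dup_threshold):
        self.dup_threshold = dup_threshold
        self.duplicates = 0

    def ack_sent(self):
        """Call whenever an ACK goes out, for whatever reason."""
        self.duplicates = 0

    def duplicate_arrived(self):
        """Record one duplicate. Return True if an ACK should be sent now."""
        self.duplicates += 1
        if self.duplicates > self.dup_threshold:
            self.ack_sent()   # the forced ACK resets the counter
            return True
        return False
```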
<P>
<H3>Client and server invariants</H3>

<P>
<P>The state of the counters at the source and sink can be summarized
by the following invariant relations, where
SendAckLimit and SendWorriedLimit are upper bounds of the 
NeedAck and Worried sets, respectively.
<P>
<P>
<H3>Invariants when transfer is in progress </H3>

<P>
<OL>
<LI> SendLastContig &lt;= SendWorriedLimit &lt;= SendAckLimit &lt;= SendMostRecent<BR>
(SendMostRecent - SendLastContig) &lt;= WindowSize<BR>
(SendMostRecent - SendAckLimit) &lt;= SendAhead
</LI>
<LI>RecvLastContig &lt;= RecvMostRecent<BR>
(RecvMostRecent - RecvLastContig) &lt;= WindowSize</LI>
</OL>
<P>
<H3>Invariants when transfer is completed, aborted or not started </H3>

<P>
<OL>
<LI>SendLastContig (at source) = SendMostRecent (at source)
</LI>
<LI>RecvLastContig (at sink) = RecvMostRecent (at sink)
</LI>
<LI>SendLastContig (at source) = RecvLastContig (at sink)</LI>
</OL>
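<P>These invariants translate directly into assertion-style checks, which can be handy when debugging a trace of counter values. The helper names and the use of dictionaries keyed by counter name are assumptions made for this sketch; the relations themselves are the ones listed above.

```python
def source_in_progress_ok(c, window_size, send_ahead):
    """Check the source-side invariants during a transfer."""
    return (
        c["SendLastContig"] <= c["SendWorriedLimit"]
        <= c["SendAckLimit"] <= c["SendMostRecent"]
        and c["SendMostRecent"] - c["SendLastContig"] <= window_size
        and c["SendMostRecent"] - c["SendAckLimit"] <= send_ahead
    )


def sink_in_progress_ok(c, window_size):
    """Check the sink-side invariants during a transfer."""
    return (
        c["RecvLastContig"] <= c["RecvMostRecent"]
        and c["RecvMostRecent"] - c["RecvLastContig"] <= window_size
    )


def idle_ok(source, sink):
    """Check the invariants for a completed, aborted or unstarted transfer."""
    return (
        source["SendLastContig"] == source["SendMostRecent"]
        and sink["RecvLastContig"] == sink["RecvMostRecent"]
        and source["SendLastContig"] == sink["RecvLastContig"]
    )
```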
<P>
<H2><A NAME="ss16.4">16.4 Adjusting the Retransmission Interval</A>
</H2>

<P>
<P>SFTP uses the retransmission interval to determine when it should be
worried about packets for which it has not received an <EM>ACK</EM>.
Initially, the retransmission interval is set to <EM>SFTP_RetryInterval</EM> (2 seconds), but it varies depending on RTT observations collected during
file transfers.  When a timeout occurs, the retransmission timer is
backed off.  The backed off timer is independent of the RTT state in
the <EM>sEntry</EM>.
<P>Like RPC2, SFTP collects RTT observations by using packet timestamps.
The timestamps and the storage of RTT state are the same as presented in
Chapter 
<A HREF="rpc2_manual-15.html#RetryChapter">15</A>.  In SFTP, timestamping is <EM>two-way</EM>:
both source and sink collect RTT observations during a
transfer. Both timestamp fields in the packet header are used: one for
the current timestamp, and one for the timestamp being echoed back to
the other side.
<P>The source collects observations as follows: it timestamps outgoing
<EM>DATA</EM> packets.  The sink echoes a timestamp back on the <EM>ACK</EM>
packet. When the <EM>ACK</EM> arrives at the source, the source computes
the RTT for that send-ahead set, and updates its RTO accordingly.
<P>The sink collects observations as follows: it timestamps <EM>START</EM>,
<EM>ACK</EM>, and <EM>TRIGGER</EM> packets. (A trigger packet is an <EM>ACK</EM> that
is sent by the server because it has timed out on the client.)
The source echoes the timestamp back on the first <EM>DATA</EM> packet sent
in response to such a packet. When that <EM>DATA</EM> packet arrives at the
sink, the sink computes the RTT and updates its RTO.  If the first
packet gets lost, no update is performed. If it is delayed, the update
is performed when it arrives.
<P>All that is needed for state in the <EM>sEntry</EM> is a single word,
<EM>TimeEcho</EM>, to hold the timestamp that will next be echoed on a
packet.  Each packet may carry up to two timestamps - one is the time
at which the sender sent it, and the other is the echoed timestamp. (Only <EM>ACK</EM> packets and certain <EM>DATA</EM> packets actually use both
fields.) The spare2 and spare3 fields of the packet header are used
for these fields, called <EM>TimeStamp</EM> and <EM>TimeEcho</EM>.  These fields
were previously reserved for bitmask fields, but were not being used.
<P>Packets are timestamped as they are sent out in the following routines:
<UL>
<LI> sftp_SendSendAhead, sftp_ResendWorried, sftp_SendFirstUnacked (data)
</LI>
<LI> sftp_SendAck
</LI>
<LI> sftp_SendStart</LI>
</UL>
<P>The packet TimeStamp field is stashed in sEntry-&gt;TimeEcho as appropriate
when a packet with a timestamp is received. This is the timestamp that
will be echoed back to the other side eventually. This occurs in:
<UL>
<LI>sftp_DataArrived, on the sink, if the packet advances the left edge
of the window (Header.SeqNumber == sEntry-&gt;RecvLastContig+1).
</LI>
<LI>sftp_StartArrived, on the source, whether the transfer
has started or not.  Data will be sent in response to the <EM>START</EM> packet
either way.
</LI>
<LI>sftp_AckArrived, on the source. If there is more data to send, the
source will send it in response to this packet.</LI>
</UL>
<P>The value in sEntry-&gt;TimeEcho is then placed in the Header.TimeEcho field 
in the following routines:
<UL>
<LI>sftp_SendAck, on the sink.
</LI>
<LI>sftp_SendSendAhead, sftp_ResendWorried, or sftp_SendFirstUnacked, on the
source. The timestamp is echoed on the <EM>first</EM> packet sent out by these
collectively (the one corresponding to sEntry-&gt;SendLastContig+1). 
All other <EM>DATA</EM> packets carry a TimeEcho of 0. A special case occurs in 
the first set of <EM>DATA</EM> packets on a server-to-client transfer, from PutFile. 
In this case there is no timestamp to echo, because the source does not 
hear from the sink before sending data. In this case, sEntry-&gt;TimeEcho is 
set to 0 at the top of PutFile. A second special case also occurs in PutFile,
when the server times out. Again, there is no timestamp to echo, because the
data is not being sent in response to a packet from the sink.</LI>
</UL>
<P>RTT measurements are computed from Header.TimeEcho in the following routines:
<UL>
<LI>sftp_AckArrived, on the source, if the <EM>ACK</EM> is not a trigger.  Triggers are
sent when the server times out during a client-to-server transfer. They do 
not represent real observations because there was no transmission from the 
source that caused them. Triggers are marked so that the source can 
distinguish them from real <EM>ACK</EM>s. 
</LI>
<LI>sftp_DataArrived, on the sink.</LI>
</UL>
<P>Any zero TimeStamp or TimeEcho is ignored, and the RTO remains
unchanged. This is chiefly for compatibility with versions of SFTP
that do not use packet timestamps.  RTT state in the sEntry is
initialized on the client in <EM>SFTP_Bind2</EM>, using the BindTime
supplied by RPC2. On the server, it is initialized in
<EM>SFTP_GetRequest</EM>, using the same BindTime shipped to the server on
the first request on the connection.
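<P>The timestamp echo and the zero rule can be sketched as follows. The helper functions and the dictionary packet representation are hypothetical, and timestamps are assumed to be integer clock readings in a common unit such as milliseconds. How an observation is folded into the RTO is not shown here.

```python
def make_data_packet(seq, now, time_echo=0):
    """A DATA packet carries the sender's clock and, optionally, an echo."""
    return {"type": "DATA", "seq": seq, "TimeStamp": now, "TimeEcho": time_echo}


def make_ack(now, stashed_echo):
    """An ACK echoes the timestamp previously stashed from a DATA packet."""
    return {"type": "ACK", "TimeStamp": now, "TimeEcho": stashed_echo}


def rtt_observation(packet, now):
    """Return an RTT sample, or None if the packet carries no echo.

    A zero TimeEcho is ignored, so the caller leaves its RTO unchanged.
    """
    echo = packet["TimeEcho"]
    if echo == 0:
        return None
    return now - echo
```

For instance, if the source stamps a <EM>DATA</EM> packet at time 1000 and the <EM>ACK</EM> echoing that value arrives at time 1045, the sample is 45. The time at which the sink sent the <EM>ACK</EM> does not enter the calculation, so the two clocks need not be synchronized.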
<P>
<H2><A NAME="ss16.5">16.5 Performance</A>
</H2>

<P>
<P>RPC2 and SFTP perform well over a wide range of network speeds.
Figure 
RPC2TableXXX compares the performance of SFTP and TCP over
three different networks: Ethernet, a WaveLan wireless network, and a
modem over a phone line.  In almost all cases, SFTP's performance
equals or exceeds that of TCP.
<P>
<P>
<P>
<P>
<HR>
<A HREF="rpc2_manual-17.html">Next</A>
<A HREF="rpc2_manual-15.html">Previous</A>
<A HREF="rpc2_manual.html#toc16">Contents</A>
</BODY>
</HTML>