<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
 <TITLE> A Guide to the S-Lang Language: File Input/Output</TITLE>
 <LINK HREF="slang-17.html" REL=next>
 <LINK HREF="slang-15.html" REL=previous>
 <LINK HREF="slang.html#toc16" REL=contents>
</HEAD>
<BODY>
<A HREF="slang-17.html">Next</A>
<A HREF="slang-15.html">Previous</A>
<A HREF="slang.html#toc16">Contents</A>
<HR>
<H2><A NAME="s16">16. File Input/Output</A></H2>

<P> 
<P><B>S-Lang</B> provides built-in support for two different I/O facilities.
The simplest interface is modeled upon the C language <CODE>stdio</CODE>
streams interface and consists of functions such as <CODE>fopen</CODE>,
<CODE>fgets</CODE>, etc.  The other interface is modeled on a lower level
POSIX interface consisting of functions such as <CODE>open</CODE>,
<CODE>read</CODE>, etc.  In addition to permitting more control, the lower
level interface permits one to access network objects as well as disk
files.
<P>
<H2><A NAME="ss16.1">16.1 Input/Output via stdio</A>
</H2>

<P>
<H3>Stdio Overview</H3>

<P>The <CODE>stdio</CODE> interface consists of the following functions:
<UL>
<LI> <CODE>fopen</CODE>, which opens a file for reading or writing.
</LI>
<LI> <CODE>fclose</CODE>, which closes a file opened by <CODE>fopen</CODE>.
</LI>
<LI> <CODE>fgets</CODE>, used to read a line from the file.
</LI>
<LI> <CODE>fputs</CODE>, which writes text to the file.
</LI>
<LI> <CODE>fprintf</CODE>, used to write formatted text to the file.
</LI>
<LI> <CODE>fwrite</CODE>, which may be used to write objects to the
file.
</LI>
<LI> <CODE>fread</CODE>, which reads a specified number of objects from
the file.
</LI>
<LI> <CODE>feof</CODE>, which is used to test whether the file pointer is at the
end of the file.
</LI>
<LI> <CODE>ferror</CODE>, which is used to see whether or not the stream
associated with the file has an error.
  </LI>
<LI> <CODE>clearerr</CODE>, which clears the end-of-file and error
indicators for the stream.
</LI>
<LI> <CODE>fflush</CODE>, used to force all buffered data associated with
the stream to be written out.
</LI>
<LI> <CODE>ftell</CODE>, which is used to query the file position indicator
of the stream.
       </LI>
<LI> <CODE>fseek</CODE>, which is used to set the position of the file
position indicator of the stream.
</LI>
<LI> <CODE>fgetslines</CODE>, which reads all the lines in a text file and
returns them as an array of strings.
</LI>
</UL>
<P>In addition, the interface supports the <CODE>popen</CODE> and <CODE>pclose</CODE>
functions on systems where the corresponding C functions are available.
<P>Before reading or writing to a file, it must first be opened using
the <CODE>fopen</CODE> function.  The only exceptions to this rule involve
use of the pre-opened streams: <CODE>stdin</CODE>, <CODE>stdout</CODE>, and
<CODE>stderr</CODE>.  <CODE>fopen</CODE> accepts two arguments: a file name and a
string argument that indicates how the file is to be opened, e.g.,
for reading, writing, update, etc.  It returns a <CODE>File_Type</CODE>
stream object that is used as an argument to all other functions of
the <CODE>stdio</CODE> interface.  Upon failure, it returns <CODE>NULL</CODE>.  See the
reference manual for more information about <CODE>fopen</CODE>.
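<P>As a minimal sketch of this pattern, the following function opens a
file for writing, writes two lines, and checks every return value.  The
function name <CODE>write_greeting</CODE> and the text it writes are
illustrative assumptions, not part of the library:
<BLOCKQUOTE><CODE>
<PRE>
    define write_greeting (file)
    {
       variable fp;

       fp = fopen (file, "w");    % "w" creates or truncates the file
       if (fp == NULL)
         verror ("Unable to open %s for writing", file);

       % fputs and fprintf return -1 upon failure
       if (-1 == fputs ("Hello\n", fp))
         verror ("Write to %s failed", file);

       if (-1 == fprintf (fp, "%d + %d = %d\n", 1, 2, 1 + 2))
         verror ("Write to %s failed", file);

       if (-1 == fclose (fp))
         verror ("Error closing %s", file);
    }
</PRE>
</CODE></BLOCKQUOTE>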
<P>
<H3>Stdio Examples</H3>

<P>
<P>In this section, some simple examples of the use of the <CODE>stdio</CODE>
interface are presented.  It is important to realize that all the
functions of the interface return something, and that return value
must be dealt with.
<P>The first example involves writing a function to count the number of
lines in a text file.  To do this, we shall read in the lines, one by
one, and count them:
<BLOCKQUOTE><CODE>
<PRE>
    define count_lines_in_file (file)
    {
       variable fp, line, count;
       
       fp = fopen (file, "r");    % Open the file for reading
       if (fp == NULL)
         verror ("%s failed to open", file);
         
       count = 0;
       while (-1 != fgets (&amp;line, fp))
         count++;
         
       () = fclose (fp);
       return count;
    }
</PRE>
</CODE></BLOCKQUOTE>

Note that <CODE>&amp;line</CODE> was passed to the <CODE>fgets</CODE> function.  When
<CODE>fgets</CODE> returns, <CODE>line</CODE> will contain the line of text read in
from the file.  Also note how the return value from <CODE>fclose</CODE> was
handled.  
<P>Although the preceding example closed the file via <CODE>fclose</CODE>,
there is no need to explicitly close a file because <B>S-Lang</B> will
automatically close the file when it is no longer referenced.  Since
the only variable to reference the file is <CODE>fp</CODE>, it would have
automatically been closed when the function returned.
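<P>For text files of modest size, the same count can also be sketched
with <CODE>fgetslines</CODE>, which returns all of the lines as an array
of strings.  The function name <CODE>count_lines_via_fgetslines</CODE>
is an illustrative assumption, and the sketch relies on the automatic
close described above instead of calling <CODE>fclose</CODE>.  Note
that this approach holds the whole file in memory:
<BLOCKQUOTE><CODE>
<PRE>
    define count_lines_via_fgetslines (file)
    {
       variable fp, lines;

       fp = fopen (file, "r");
       if (fp == NULL)
         verror ("%s failed to open", file);

       lines = fgetslines (fp);   % array of strings, or NULL upon failure
       if (lines == NULL)
         verror ("Unable to read lines from %s", file);

       return length (lines);     % fp is closed when it goes out of scope
    }
</PRE>
</CODE></BLOCKQUOTE>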
<P>Suppose that it is desired to count the number of characters in the
file instead of the number of lines.  To do this, the <CODE>while</CODE>
loop could be modified to count the characters as follows:
<BLOCKQUOTE><CODE>
<PRE>
      while (-1 != fgets (&amp;line, fp))
        count += strlen (line);
</PRE>
</CODE></BLOCKQUOTE>

The main difficulty with this approach is that it will not work for
binary files, i.e., files that contain null characters.  For such
files, the file should be opened in <EM>binary</EM> mode via
<BLOCKQUOTE><CODE>
<PRE>
      fp = fopen (file, "rb");
</PRE>
</CODE></BLOCKQUOTE>

and then the data read in using the <CODE>fread</CODE> function:
<BLOCKQUOTE><CODE>
<PRE>
      while (-1 != fread (&amp;line, Char_Type, 1024, fp))
           count += bstrlen (line);
</PRE>
</CODE></BLOCKQUOTE>

The <CODE>fread</CODE> function requires two additional arguments: the type
of object to read (<CODE>Char_Type</CODE> in this case), and the number of
such objects to read.  The function returns the number of objects
actually read, or -1 upon failure.  The <CODE>bstrlen</CODE> function was
used to compute the length of <CODE>line</CODE> because for <CODE>Char_Type</CODE>
or <CODE>UChar_Type</CODE> objects, the <CODE>fread</CODE> function assigns a
<EM>binary</EM> string (<CODE>BString_Type</CODE>) to <CODE>line</CODE>.
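<P>Putting these pieces together, a complete byte-counting function
might look like the following sketch.  The function name
<CODE>count_bytes_in_file</CODE> and the 1024-byte chunk size are
illustrative choices:
<BLOCKQUOTE><CODE>
<PRE>
    define count_bytes_in_file (file)
    {
       variable fp, buf, count;

       fp = fopen (file, "rb");   % binary mode
       if (fp == NULL)
         verror ("%s failed to open", file);

       count = 0;
       % Read up to 1024 bytes at a time into the binary string buf
       while (-1 != fread (&amp;buf, Char_Type, 1024, fp))
         count += bstrlen (buf);

       () = fclose (fp);
       return count;
    }
</PRE>
</CODE></BLOCKQUOTE>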
<P>The <CODE>foreach</CODE> construct also works with <CODE>File_Type</CODE> objects.
For example, the number of characters in a file may be counted via
<BLOCKQUOTE><CODE>
<PRE>
     foreach (fp) using ("char")
     {
        ch = ();
        count++;
     }
</PRE>
</CODE></BLOCKQUOTE>

To count the number of lines along with the characters, one can
iterate line by line:
<BLOCKQUOTE><CODE>
<PRE>
     foreach (fp) using ("line")
     {
        line = ();
        num_lines++;
        count += strlen (line);
     }
</PRE>
</CODE></BLOCKQUOTE>
<P>Finally, it should be mentioned that neither of these examples should
be used to count the number of characters in a file when that
information is more readily accessible by another means.  For
example, it is preferable to get this information via the
<CODE>stat_file</CODE> function:
<BLOCKQUOTE><CODE>
<PRE>
     define count_chars_in_file (file)
     {
        variable st;
        
        st = stat_file (file);
        if (st == NULL)
          error ("stat_file failed.");
        return st.st_size;
     }
</PRE>
</CODE></BLOCKQUOTE>
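<P>Another way to obtain the size without reading any data is to move
the file position indicator to the end of the stream with
<CODE>fseek</CODE> and query it with <CODE>ftell</CODE>.  The following
is only a sketch; the function name <CODE>file_size_via_seek</CODE> is
an illustrative assumption:
<BLOCKQUOTE><CODE>
<PRE>
     define file_size_via_seek (file)
     {
        variable fp, size;

        fp = fopen (file, "rb");
        if (fp == NULL)
          verror ("%s failed to open", file);

        % Seek to an offset of 0 bytes from the end of the file
        if (-1 == fseek (fp, 0, SEEK_END))
          verror ("fseek failed on %s", file);

        size = ftell (fp);        % position at end of file == size in bytes
        () = fclose (fp);
        return size;
     }
</PRE>
</CODE></BLOCKQUOTE>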
<P>
<H2><A NAME="ss16.2">16.2 POSIX I/O</A>
</H2>

<P>
<P>
<H2><A NAME="ss16.3">16.3 Advanced I/O techniques</A>
</H2>

<P>
<P>The previous examples illustrate how to read and write objects of a
single data-type from a file, e.g.,
<BLOCKQUOTE><CODE>
<PRE>
      num = fread (&amp;a, Double_Type, 20, fp);
</PRE>
</CODE></BLOCKQUOTE>

would result in a <CODE>Double_Type[num]</CODE> array being assigned to
<CODE>a</CODE> if successful.  However, suppose that the binary data file
consists of numbers in a specified byte-order.  How can one read
such objects with the proper byte swapping?  The answer is to use
the <CODE>fread</CODE> function to read the objects as <CODE>Char_Type</CODE> and
then <EM>unpack</EM> the resulting string into the specified data type,
or types.  This process is facilitated using the <CODE>pack</CODE> and
<CODE>unpack</CODE> functions.
<P>The <CODE>pack</CODE> function follows the syntax
<BLOCKQUOTE><CODE>
BString_Type pack (<EM>format-string</EM>, <EM>item-list</EM>);
</CODE></BLOCKQUOTE>

and combines the objects in the <EM>item-list</EM> according to
<EM>format-string</EM> into a binary string and returns the result.
Likewise, the <CODE>unpack</CODE> function may be used to convert a binary
string into separate data objects:
<BLOCKQUOTE><CODE>
(<EM>variable-list</EM>) = unpack (<EM>format-string</EM>, <EM>binary-string</EM>);
</CODE></BLOCKQUOTE>
<P>The format string consists of one or more data-type specification
characters, and each may be followed by an optional decimal length
specifier. Specifically, the data-types are specified according to
the following table:
<BLOCKQUOTE><CODE>
<PRE>
     c     char
     C     unsigned char
     h     short
     H     unsigned short
     i     int
     I     unsigned int
     l     long
     L     unsigned long
     j     16 bit int
     J     16 bit unsigned int
     k     32 bit int
     K     32 bit unsigned int
     f     float
     d     double
     F     32 bit float
     D     64 bit float
     s     character string, null padded
     S     character string, space padded
     x     a null pad character
</PRE>
</CODE></BLOCKQUOTE>

A decimal length specifier may follow the data-type specifier. With
the exception of the <CODE>s</CODE> and <CODE>S</CODE> specifiers, the length
specifier indicates how many objects of that data type are to be
packed or unpacked from the string.  When used with the <CODE>s</CODE> or
<CODE>S</CODE> specifiers, it indicates the field width to be used.  If the
length specifier is not present, the length defaults to one.
<P>With the exception of <CODE>c</CODE>, <CODE>C</CODE>, <CODE>s</CODE>, <CODE>S</CODE>, and
<CODE>x</CODE>, each of these may be prefixed by a character that indicates
the byte-order of the object:
<BLOCKQUOTE><CODE>
<PRE>
     &gt;    big-endian order (network order)
     &lt;    little-endian order
     =    native byte-order
</PRE>
</CODE></BLOCKQUOTE>

The default is native byte order.
<P>Here are a few examples that should make this more clear:
<BLOCKQUOTE><CODE>
<PRE>
     a = pack ("cc", 'A', 'B');         % ==&gt; a = "AB";
     a = pack ("c2", 'A', 'B');         % ==&gt; a = "AB";
     a = pack ("xxcxxc", 'A', 'B');     % ==&gt; a = "\0\0A\0\0B";
     a = pack ("h2", 'A', 'B');         % ==&gt; a = "\0A\0B" or "A\0B\0"
     a = pack ("&gt;h2", 'A', 'B');        % ==&gt; a = "\0A\0B"
     a = pack ("&lt;h2", 'A', 'B');        % ==&gt; a = "A\0B\0"
     a = pack ("s4", "AB", "CD");       % ==&gt; a = "AB\0\0"
     a = pack ("s4s2", "AB", "CD");     % ==&gt; a = "AB\0\0CD"
     a = pack ("S4", "AB", "CD");       % ==&gt; a = "AB  "
     a = pack ("S4S2", "AB", "CD");     % ==&gt; a = "AB  CD"
</PRE>
</CODE></BLOCKQUOTE>
<P>When unpacking, if the length specifier is greater than one, then an
array of that length will be returned.  In addition, trailing
whitespace and null characters are stripped when unpacking an object
given by the <CODE>S</CODE> specifier.  Here are a few examples:
<BLOCKQUOTE><CODE>
<PRE>
    (x,y) = unpack ("cc", "AB");         % ==&gt; x = 'A', y = 'B'
    x = unpack ("c2", "AB");             % ==&gt; x = ['A', 'B']
    x = unpack ("x&lt;H", "\0\xAB\xCD");    % ==&gt; x = 0xCDABuh
    x = unpack ("xxs4", "a b c\0d e f");  % ==&gt; x = "b c\0"
    x = unpack ("xxS4", "a b c\0d e f");  % ==&gt; x = "b c"
</PRE>
</CODE></BLOCKQUOTE>
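<P>As a short sketch of how the two functions fit together, the
following packs a 32 bit and a 16 bit integer in big-endian order and
then unpacks them again.  The variable names and values are
illustrative.  Because the byte-order is given explicitly, the packed
string should be the same regardless of the native byte-order of the
machine:
<BLOCKQUOTE><CODE>
<PRE>
    variable s, n, m;

    s = pack ("&gt;k &gt;j", 0x01020304, 0x0506);   % 4 + 2 = 6 bytes
    (n, m) = unpack ("&gt;k &gt;j", s);
    % bstrlen (s) is 6; n is 0x01020304 and m is 0x0506
</PRE>
</CODE></BLOCKQUOTE>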
<P>
<H3>Example: Reading /var/log/utmp</H3>

<P>
<P>Consider the task of reading the Unix system file
<CODE>/var/log/utmp</CODE>, which contains login records about who logged
onto the system.  This file format is documented in section 5 of the
online Unix man pages, and consists of a sequence of entries
formatted according to the C structure <CODE>utmp</CODE> defined in the
<CODE>utmp.h</CODE> C header file.  The actual details of the structure
may vary from one version of Unix to the other.  For the purposes of
this example, consider its definition under the Linux operating
system running on an Intel processor:
<BLOCKQUOTE><CODE>
<PRE>
    struct utmp {
       short ut_type;              /* type of login */
       pid_t ut_pid;               /* pid of process */
       char ut_line[12];           /* device name of tty - "/dev/" */
       char ut_id[2];              /* init id or abbrev. ttyname */
       time_t ut_time;             /* login time */
       char ut_user[8];            /* user name */
       char ut_host[16];           /* host name for remote login */
       long ut_addr;               /* IP addr of remote host */
    };
</PRE>
</CODE></BLOCKQUOTE>

On this system, <CODE>pid_t</CODE> is defined to be an <CODE>int</CODE> and
<CODE>time_t</CODE> is a <CODE>long</CODE>.  Hence, a format specifier for the
<CODE>pack</CODE> and <CODE>unpack</CODE> functions is easily constructed to be:
<BLOCKQUOTE><CODE>
<PRE>
     "h i S12 S2 l S8 S16 l"
</PRE>
</CODE></BLOCKQUOTE>

However, this particular definition is naive because it does not
allow for structure padding performed by the C compiler in order to
align the data types on suitable word boundaries.  Fortunately, the
intrinsic function <CODE>pad_pack_format</CODE> may be used to modify a
format by adding the correct amount of padding in the right places.
In fact, <CODE>pad_pack_format</CODE> applied to the above format on an
Intel-based Linux system produces the result:
<BLOCKQUOTE><CODE>
<PRE>
     "h x2 i S12 S2 x2 l S8 S16 l"
</PRE>
</CODE></BLOCKQUOTE>

Here we see that 4 bytes of padding were added.
<P>The other missing piece of information is the size of the structure.
This is useful because we would like to read in one structure at a
time using the <CODE>fread</CODE> function.  Knowing the size of the
various data types makes this easy; however it is even easier to use
the <CODE>sizeof_pack</CODE> intrinsic function, which returns the size (in
bytes) of the structure described by the pack format.
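<P>A minimal sketch of these two intrinsics used together follows; the
variable names are illustrative.  Assuming the Intel-based Linux layout
described above, with a 2 byte <CODE>short</CODE> and 4 byte
<CODE>int</CODE> and <CODE>long</CODE>, the padded format accounts for
2+2+4+12+2+2+4+8+16+4 = 56 bytes; other systems may report a different
size:
<BLOCKQUOTE><CODE>
<PRE>
    variable fmt, nbytes;

    fmt = pad_pack_format ("h i S12 S2 l S8 S16 l");  % add alignment padding
    nbytes = sizeof_pack (fmt);                       % bytes per record

    () = fprintf (stdout, "format \"%s\" describes %d bytes\n", fmt, nbytes);
</PRE>
</CODE></BLOCKQUOTE>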
<P>So, with all the pieces in place, it is rather straightforward to
write the code:
<BLOCKQUOTE><CODE>
<PRE>
    variable format, size, fp, buf;

    % A structure with one field per member of the C utmp structure
    typedef struct
    {
       ut_type, ut_pid, ut_line, ut_id,
       ut_time, ut_user, ut_host, ut_addr
    } UTMP_Type;

    % Add the compiler padding, then compute the size of one record
    format = pad_pack_format ("h i S12 S2 l S8 S16 l");
    size = sizeof_pack (format);

    define print_utmp (u)
    {
       () = fprintf (stdout, "%-16s %-12s %-16s %s\n",
                     u.ut_user, u.ut_line, u.ut_host, ctime (u.ut_time));
    }

    fp = fopen ("/var/log/utmp", "rb");
    if (fp == NULL)
      error ("Unable to open utmp file");

    () = fprintf (stdout, "%-16s %-12s %-16s %s\n",
                  "USER", "TTY", "FROM", "LOGIN@");

    variable U = @UTMP_Type;

    % Read one record at a time and unpack it into the structure fields
    while (-1 != fread (&amp;buf, Char_Type, size, fp))
      {
         set_struct_fields (U, unpack (format, buf));
         print_utmp (U);
      }

    () = fclose (fp);
</PRE>
</CODE></BLOCKQUOTE>

A few comments about this example are in order.  First of all, note
that a new data type called <CODE>UTMP_Type</CODE> was created, although
this was not really necessary.  We also opened the file in binary
mode, but this too is optional under a Unix system where there is no
distinction between binary and text modes. The <CODE>print_utmp</CODE>
function does not print all of the structure fields.  Finally, the
return values from <CODE>fprintf</CODE> and <CODE>fclose</CODE>
were dealt with.
<P>
<P>
</BODY>
</HTML>