Tobi's tips and tricks for writing portable code
================================================

I am maintaining a Husky installation on a Tru64 Unix system. I also
sometimes try to compile the code on other exotic platforms, like AIX,
FreeBSD, OS/2, WinNT, Alpha-Linux, and so on. When doing so, I frequently
encounter pieces of code in the Husky source that are not really portable,
but rely in some subtle form on the peculiarities of GNU C and/or the Linux
operating system and/or the Intel processor. I then have to fix or rewrite
that code.

I don't want to complain about this; after all, it was my free choice to work
on making the code more portable. But when I fix the same thing for the
seventh time, it starts to annoy me a little. Therefore, I decided
to create this document. It is intended as a guideline for every Husky
programmer (not just for beginners, for everybody) on writing portable
code, i.e. code that compiles on the different flavours of Unix (like
Linux, SysV, BSD) and on DOS/OS2/NT, on both 32 and 64 bit machines, with
GNU C as well as with other C compilers. Every Husky programmer is
encouraged to read this file before writing larger amounts of code. It
really helps. :-)

Contents
--------

Section 1: Reading binary files
Section 2: Working with case sensitive file systems
Section 3: Miscellaneous Tips and Tricks


Section 1: Reading binary files
-------------------------------

This section is about reading structured binary files. Imagine you want to
read a binary file that contains a 16 bit integer followed by a 32 bit
integer, followed by some other elements. People often write code like this
for this purpose:

 >  struct some_struct { unsigned short element1; unsigned long element2; };
 >  struct some_struct s;
 >  fread(&s, sizeof(s), 1, f);

Code like this is not portable at all. The problem is that it requires the
internal structure layout that the compiler uses to store a some_struct
to be exactly the same as the layout in the binary file. However, ANSI C does
NOT guarantee anything about how a structure will be stored in memory. The
code above will work on DOS and Linux only with #pragma pack(1),
which is not ANSI C.

The code above cannot work on Sparc or PowerPC processors, because
big-endian byte order stores multi-byte integers in the reverse order
compared to Intel's little-endian byte order. It cannot work on Alpha or Merced
processors either, because there a long int is 64 bits, not 32. It will
probably not even work on Intel processors with compilers other than GNU or
Microsoft C, because some of them simply cannot handle structure packing in
MS-DOS-ish style, even with #pragma pack. Plus, #pragma pack in general causes
all sorts of problems if used inconsistently.
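
If you want to see how far the in-memory layout drifts from the 6-byte record
in the file, a small check like the following can help. It is only a sketch:
the helper name print_layout is made up for this illustration, and the numbers
it prints depend entirely on your compiler and platform.

```c
#include <stddef.h>   /* offsetof */
#include <stdio.h>

struct some_struct { unsigned short element1; unsigned long element2; };

/* Hypothetical helper: report how the compiler really lays out some_struct.
   The record in the file is always 2 + 4 = 6 bytes; the struct often is not. */
void print_layout(FILE *out)
{
    fprintf(out, "bytes in the file record  : 6\n");
    fprintf(out, "sizeof(struct some_struct): %lu\n",
            (unsigned long) sizeof(struct some_struct));
    fprintf(out, "offset of element2        : %lu\n",
            (unsigned long) offsetof(struct some_struct, element2));
    fprintf(out, "sizeof(unsigned long)     : %lu\n",
            (unsigned long) sizeof(unsigned long));
}
```

Calling print_layout(stdout) on a 64-bit Linux system with GNU C will most
likely report a 16 byte structure with element2 at offset 8, which has
nothing in common with the 6 bytes in the file.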

To put it short: Code like the fread() call above in an open source project
that calls itself "portable" is COMPLETE CRAP and not a single new line of
code should be written in this style. DON'T EVER use read/farread/fread to
read directly from a binary file into a structure, or write/farwrite/fwrite
to write a structure directly. PLEASE!

Instead, to read some_struct, proceed as follows:

  unsigned char buffer[6]; /* has room for 6 bytes == 16 bit + 32 bit */
  fread(buffer, 6, 1, f);

This works for sure, because ANSI C guarantees that a char is exactly one
byte, and that array elements are adjacent to each other. The buffer is
declared unsigned char so that byte values above 127 do not turn negative
when they are converted to wider integers later on. This code does not
rely on some obscure sizeof operator, but is simply hard coded to read
exactly 6 bytes (one 16 bit integer and one 32 bit integer) from the file.

Now, we need to interpret the buffer and store it into the integer
variables. This goes as follows:

  s.element1 = (unsigned short)  buffer[0] +
               ((unsigned short)(buffer[1]) << 8L);
  s.element2 = (unsigned long) buffer[2] +
               ((unsigned long)(buffer[3]) << 8L)  +
               ((unsigned long)(buffer[4]) << 16L) +
               ((unsigned long)(buffer[5]) << 24L);

This code assumes that the binary values in the file are stored in
little-endian (Intel-style) byte order, which is true for just about every
Fidonet file format that I know. This code reads in one byte at a time and
weights it
properly. The code works because the "<<" operator is defined to shift bits
in the direction of highest significance (no matter if this actually means
"left" or "right" on your hardware).

Storing integers works in the same way:

  buffer[0] = s.element1 & 0xFF;
  buffer[1] = (s.element1 >> 8) & 0xFF;
  buffer[2] = s.element2 & 0xFF;
  buffer[3] = (s.element2 >> 8) & 0xFF;
  buffer[4] = (s.element2 >> 16) & 0xFF;
  buffer[5] = (s.element2 >> 24) & 0xFF;
  fwrite(buffer, 6, 1, f);
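
The reading and writing steps above can be wrapped in a few small helper
functions, so that the shifting is written only once. The following is a
sketch, not the code from SMAPI: the names get_le16, get_le32, put_le16,
put_le32, read_some_struct and write_some_struct are made up for this
illustration.

```c
#include <stdio.h>

/* Hypothetical helpers: decode little-endian integers one byte at a time. */
unsigned short get_le16(const unsigned char *p)
{
    return (unsigned short) (p[0] | ((unsigned int) p[1] << 8));
}

unsigned long get_le32(const unsigned char *p)
{
    return  (unsigned long) p[0]
         | ((unsigned long) p[1] << 8)
         | ((unsigned long) p[2] << 16)
         | ((unsigned long) p[3] << 24);
}

/* Hypothetical helpers: encode integers as little-endian bytes. */
void put_le16(unsigned char *p, unsigned short v)
{
    p[0] = (unsigned char) (v & 0xFF);
    p[1] = (unsigned char) ((v >> 8) & 0xFF);
}

void put_le32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char) (v & 0xFF);
    p[1] = (unsigned char) ((v >> 8) & 0xFF);
    p[2] = (unsigned char) ((v >> 16) & 0xFF);
    p[3] = (unsigned char) ((v >> 24) & 0xFF);
}

/* Read the 6 byte record; returns 1 on success, 0 on a short read or error. */
int read_some_struct(FILE *f, unsigned short *e1, unsigned long *e2)
{
    unsigned char buffer[6];

    if (fread(buffer, 1, 6, f) != 6)
        return 0;
    *e1 = get_le16(buffer);
    *e2 = get_le32(buffer + 2);
    return 1;
}

/* Write the 6 byte record; returns 1 on success, 0 on error. */
int write_some_struct(FILE *f, unsigned short e1, unsigned long e2)
{
    unsigned char buffer[6];

    put_le16(buffer, e1);
    put_le32(buffer + 2, e2);
    return fwrite(buffer, 1, 6, f) == 6;
}
```

Unlike the short snippets above, these functions check the return values of
fread and fwrite, so a truncated file is reported instead of silently
producing garbage values.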

If you need more examples, have a look at the file structrw.c in the SMAPI
directory. It contains routines for reading and writing the binary files in
the SQUISH, FIDO *.MSG, and JAM message bases. It also defines macros that
handle the bit shifting for you, so writing read/write code for binary
structures with these macros becomes a lot easier.


Section 2: Working with case sensitive file systems
---------------------------------------------------

UNIX file systems are case sensitive. This means that if you have a file
"NODELIST.281" and try to open the file "nodelist.281", the call fails. This
is a problem, because in Fidonet file names were traditionally treated
case-insensitively.

For Husky, the general rule is that all new files are created with lower case
spelling. However, you may have to handle files that you receive from other
systems that are not in lower case. Just imagine you write a TIC processor and
receive a TIC file that contains a "filename SOMEFILE.zip" in it, but the
corresponding file that you receive is actually called "SomeFile.ZIP". I have
seen such things happen, and it is a problem.

For this reason, I have added the adaptcase() routine to fidoconfig. Its
purpose is to search for a file case-insensitively. Suppose you know the name
of a file, but because you got it from a DOSish system (e.g. inside a TIC
file), you do not know if the spelling of the file is correct: the TIC file
may contain the name in mixed case while your filesystem has it in lower
case, or in mixed case, or in upper case. Suppose the filename is stored in
fn, then after doing

  char *fn;

  fn = get_filename_from_somewhere();
  adaptcase(fn);

the variable fn will contain the string in exactly the spelling that it has
on your hard disk (if the file is there; you still need to use fexist to
check this). If the file is not there, the file name will be all in lower
case.

Adaptcase uses opendir/readdir/closedir and insensitive pattern matching to
find the correct spelling. As readdir is f*cking slow, adaptcase also builds
up a cache of the directories recently visited to speed things up. The cache
makes the code quite unreadable, but it has proven to work in Msged for
nearly a year now and really speeds things up.
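
To illustrate the basic idea (without the cache), here is a minimal sketch of
a case-insensitive directory lookup. This is NOT the adaptcase source: the
function names names_match_nocase and find_actual_name are made up for this
illustration, the sketch only looks at a single directory, and it assumes a
POSIX system that provides opendir/readdir/closedir in <dirent.h>.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* assumption: POSIX system with <dirent.h> */
#endif
#include <ctype.h>
#include <dirent.h>
#include <string.h>

/* Compare two names ignoring case; returns 1 if they match, 0 otherwise. */
int names_match_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* both strings must end at the same place */
}

/* Hypothetical sketch: look for "name" in directory "dir", ignoring case.
   On a match, copy the on-disk spelling into out (outsize bytes) and
   return 1. Return 0 if nothing matches, if the directory cannot be read,
   or if the on-disk name does not fit into out. */
int find_actual_name(const char *dir, const char *name,
                     char *out, size_t outsize)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    int found = 0;

    if (d == NULL)
        return 0;
    while (!found && (e = readdir(d)) != NULL) {
        if (names_match_nocase(e->d_name, name) &&
            strlen(e->d_name) < outsize) {
            strcpy(out, e->d_name);
            found = 1;
        }
    }
    closedir(d);
    return found;
}
```

Every call scans the whole directory again, which is exactly the cost that
the cache in adaptcase is there to avoid.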

The problem with the cache is that it does not expire
automatically. Therefore, if you add files to a directory (imagine unpacking
a Nodediff archive into a temporary directory) and after that you want to
match these file names with adaptcase, you first need to call
adaptcase_refresh_dir on that particular directory to force the cache to expire:

  /* arrays, not string literals: adaptcase changes the names in place */
  char nodediff[] = "/var/spool/fido/inbound/nodediff.a81";
  char tempdir[]  = "/var/spool/fido/temp";
  char nodediff_contents[] = "/var/spool/fido/temp/nodediff.281";

  adaptcase(nodediff); adaptcase(tempdir);
  call_unpacker(nodediff, tempdir);
  adaptcase_refresh_dir(tempdir);  /* IMPORTANT! */
  adaptcase(nodediff_contents);

You should also sometimes call adaptcase_refresh_dir if you are running an
interactive program, or a program that runs for a long time, and want to see
files that were not there when your program was started, but appeared at a
later time because they were created by another process.


Section 3: Miscellaneous Tips and Tricks
----------------------------------------

UNSIGNED AND SIGNED

Please don't mix signed and unsigned integers or characters without thinking
about which format makes sense, and always cast them explicitly. Some
compilers generate huge amounts of warnings if you cast between (char *) and
(unsigned char *) implicitly. This is not really a problem, but the warnings
might distract from other warnings that are about real problems.
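
As an illustration, here is a small sketch of what "decide once, cast
explicitly" can look like. The function names byte_sum and string_byte_sum
are made up for this example; they are not taken from the Husky source.

```c
#include <string.h>

/* Hypothetical helper: sum of all bytes in a buffer, modulo 256.
   The parameter is unsigned char so that bytes >= 0x80 count as 128..255
   on every platform, whether plain char is signed there or not. */
unsigned int byte_sum(const unsigned char *data, size_t len)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum = (sum + data[i]) & 0xFFu;
    return sum;
}

/* Text usually lives in plain char arrays, so cast once, explicitly,
   at the point where the text is handed over as raw bytes. */
unsigned int string_byte_sum(const char *s)
{
    return byte_sum((const unsigned char *) s, strlen(s));
}
```

The single explicit cast in string_byte_sum documents the decision and keeps
the compiler quiet, so the remaining warnings are the ones worth reading.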


PREPROCESSOR DIRECTIVES

Preprocessor directives must always start in column 1. Do
#include <somestuff.h>
You should never do
  #include <somestuff.h>
and you'd better also not do
#  include <somestuff.h>
or else you will get problems with some compilers not recognizing this as a
preprocessor directive.


FOPEN: B AND T FLAGS

In fopen, you can set some flags in the second argument, like 
fopen(fn, flags). The flags string contains standard chars like 'r' for
reading and 'w' for writing, and there are optional 'b' (binary) and 't'
(text) arguments. Under Unix, it does not matter if you use the 'b' or 't'
flag or not - they are simply ignored. However, this is not so on other
systems, so PLEASE do strictly use the following scheme:

Use the 'b' flag for all files that are binary, i.e. that are not expected
to be used with a text editor, and where you rely on the fact that every
byte from 0 .. 255 is written to the file exactly as you specify it.
Examples are PKT files, Fido *.MSG files or any other message base storage
format, and so on.  Please also read Section 1 of this document for binary
files.
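
A minimal sketch of the binary case: write every byte value from 0 to 255
with the 'b' flag and count the bytes when reading them back. The function
names write_all_bytes and count_bytes are made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: write all 256 byte values to the file "fn" in
   binary mode. Returns 1 on success, 0 on failure. */
int write_all_bytes(const char *fn)
{
    unsigned char buffer[256];
    FILE *f;
    int i, ok;

    for (i = 0; i < 256; i++)
        buffer[i] = (unsigned char) i;
    f = fopen(fn, "wb");              /* 'b': no CRLF translation */
    if (f == NULL)
        return 0;
    ok = fwrite(buffer, 1, 256, f) == 256;
    if (fclose(f) != 0)
        ok = 0;
    return ok;
}

/* Hypothetical helper: count the bytes of "fn" by reading it in binary
   mode. Returns the number of bytes, or -1 if the file cannot be opened. */
long count_bytes(const char *fn)
{
    FILE *f = fopen(fn, "rb");
    long n = 0;

    if (f == NULL)
        return -1;
    while (getc(f) != EOF)
        n++;
    fclose(f);
    return n;
}
```

With the 'b' flag on both sides, count_bytes should report 256 on every
system. Without it, systems that translate line endings may write or read a
different number of bytes.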

Never use the 't' flag. It is not ANSI C. ANSI C specifies that the absence
of the 'b' flag designates a text file, so 't' is just a synonym for "not
'b'". Some compilers will have problems when you use the 't' flag.

Use neither 't' nor 'b' for files that users should be able to view
and/or edit with an ordinary text editor of their operating system.
Examples are log files, *.?LO files in a Binkley outbound (!), and other
types of lists that do not contain non-printable characters. For text files,
on DOS, Windows and OS/2 a CRLF translation will occur, i.e. when you write
a \n the file will contain \r\n, and when you read \r\n, your C program will
only see a \n. For files that you open without the 'b' flag, i.e. in text
mode, you should NOT rely on any mechanism for obtaining the file size (they
just don't work; you must read the file line by line without knowing its
final size), and you should not use fseek and ftell for purposes other than
seeking to the very beginning or the very end of the file. It usually also
works to get an offset with ftell and later seek to exactly the same offset,
but NEVER do any arithmetic like adding anything to an offset obtained by
ftell. This will NOT reliably work for text mode streams on non-Unix systems.
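
The two safe patterns for text mode streams can be sketched as follows:
counting lines by actually reading them, and using an ftell value only as an
opaque bookmark for a later fseek. The function names count_lines and
read_line_twice are made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: count the lines of a text mode stream by reading
   it from the current position to the end. No file size is needed. */
long count_lines(FILE *f)
{
    long lines = 0;
    int c, last = '\n';

    while ((c = getc(f)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != '\n')
        lines++;            /* last line without a trailing newline */
    return lines;
}

/* Hypothetical helper: read the next line into a, seek back to where it
   started, and read the same line again into b. The ftell value is used
   unchanged, never added to or subtracted from.
   Returns 1 on success, 0 on any error. */
int read_line_twice(FILE *f, char *a, char *b, int size)
{
    long mark = ftell(f);   /* opaque bookmark */

    if (mark == -1L)
        return 0;
    if (fgets(a, size, f) == NULL)
        return 0;
    if (fseek(f, mark, SEEK_SET) != 0)
        return 0;
    if (fgets(b, size, f) == NULL)
        return 0;
    return 1;
}
```

Both functions expect a stream that was opened without the 'b' flag, for
example with fopen(fn, "r").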

[EOF]