Tobi's tips and tricks for writing portable code ================================================ I am maintaining a Husky installation on a Tru64 Unix system. I also sometimes try to compile the code on other exotic platforms, like AIX, FreeBSD, OS/2, WinNT, Alpha-Linux, etc. pp. When doing so, I frequently encounter pieces of code in the Husky source that are not really portable, but rely in some sublte form on the peculiarities of GNU C and/or the Linux operating system and/or the Intel processor. I then have to fix or rewrite that code. I don't want to complain about this, after all it was my free choice to work on making the code more portable, but sometimes when i fix the same time for the seventh time, things start to annoy me a little bit. Therefore, I decided to create this document. It is intended as a guideline for every Husky programmer (not just for the beginners, for everybody) on writing portable code, i.E. code that compiles both on the different flavours of Unix (like Linux, SysV, BSD), and on DOS/OS2/NT, both on 32 and 64 bit machines, both with GNU C as well as with other C compilers. Every Husky programmer is encouraged to read this file before writing greater amounts of code. It really helps. :-) Contents -------- Section 1: Reading binary files Section 2: Working with case sensitive file systems Section 3: Misceallaneous tips Section 1: Reading binary files ------------------------------- This section is about reading structured binary files. Imagine you want to read a binary file that contains a 16 bit integer followed by a 32 bit integer, followed by some other elements. People often write code like this for this purpose> > struct some_struct { unsigned short element1; unsigned long element2; }; > struct some_struct s; > fread(&s, sizeof(s), 1, f); Code like this is not portable at all. The problem is that this requires that the internal structure layout that the compiler uses to store a some_struct be exactly the same as the layout in the binary file. However, ANSI C does NOT guarantee anything about how a structure will be stored in memory. The code from above will work on DOS and Linux, but it requries #pragma pack(1), which is Non-ANSI C. The cod from above cannot work on Sparc, or on Power PC processors, because the big-endian byte order stores multi-byte integers in the reverse order as compared to Intel little-endian byte order. It cannot work on Alpha or Merced processors either, because there a long int is 64 bit, not 32. It will probably not even work on Intel processors using other compilers than GNU or Microsoft C, because some of them simply cannot handle structure packing in a MS-DOS-ish style, even with pragma pack. Plus, #pragma pack in general causes all sorts of problems if used inconsitently. To put it short: Code like the line above in a open source project that calls itself "portable" is COMPLETE CRAP and not a single new line of code should be written in this style. DON'T EVER use read/farread/fread to directly read from a binary file into a structure, or write/farwrite/fwrite to write. PLEASE! Instead, to read some_struct, proceed as follows: char buffer[6]; /* has room for 6 bytes == 16 bit + 32 bit */ fread(buffer, 6, 1, f); This works for sure, because ANSI C guarantees that a char is exactly one byte, and that array elements are adjacent to each other. This code does not rely on some obscure sizeof operator, but is simply hard coded to read exactly 6 bytes (one 16 bit integer and one 32 bit integer) from the file. Now, we need to interpret the buffer and store it into the integer variables. This goes as follows: s.element1 = (unsigned short) buffer[0] + ((unsigned short)(buffer[1]) << 8L); s.element2 = (unsigned long) buffer[2] + ((unsigned long)(buffer[3]) << 8L) + ((unsigned long)(buffer[4]) << 16L) + ((unsigned long)(buffer[5]) << 24L); This code assumes that the binary values in the file are stored in little-endian (Intel-style) byte order, which is true for about every fidonet file format that I know. This code reads in one byte at a time and weights it properly. The code works because the "<<" operator is defined to shift bits in the direction of highest significance (no matter if this actually means "left" or "right" on your hardware). Storing integers works in the same way: buffer[0] = s.element1 & 0xFF; buffer[1] = (s.element1 >> 8) & 0xFF; buffer[2] = s.element2 & 0xFF; buffer[3] = (s.element2 >> 8) & 0xFF; buffer[4] = (s.element2 >> 16) & 0xFF; buffer[5] = (s.element2 >> 24) & 0xFF; fwrite(buffer, 6, 1, f); If you need more examples, have a look at the file structrw.c in the SMAPI directory. It contains routines for reading and writing the binary files in the SQUISH, FIDO *.MSG, and JAM message bases. It also defines macros that handle the bit shifiting for you, so writing read/write code for binary structures with these macros becomes a lot easier. Section 2: Working with case sensitive file systems --------------------------------------------------- UNIX file systems are case senstive. This means that if you have a file "NODELIST.281" and try to open the file "nodelist.281", the call fails. This is a problem, because in Fidonet case was traditionally handled case-insensitive. For Husky, the general rule is that all new files are created with lower case spelling. However, you may have to handle files that you receive from other systems that are not in lower case. Just image you write a TIC processor and receive a TIC file that contains a "filename SOMEFILE.zip" in it, but the corresponding file that you receive is actually called "SomeFile.ZIP". I have seen such things happen, and it is a problem. For this reason, I have added the adaptcase() routine to fidoconfig. It's purpose is to search a file case-insensitively. Suppose you know the name of a file, but because you got it from a DOSish system (e.g. inside a TIC file), you do not know if the spelling of the file is correct, or maybe if the TIC file contains the name in mixed case but your filesystem has it in lower case, or in mixed case, or in upper case. Suppose the filename is stored in fn, then the after doing char *fn; fn = get_filename_from_somewhere() adaptcase(fn); The variable fn will contain the string in exactly the spelling as it is on your hard disk (if it is there, you still need to use fexist to check this). If the file is not there, the file name will be all in lower case. Adaptcase uses opendir/readdir/closedir and insensitive pattern matching to find the correct spelling. As readdir is f*cking slow, adaptcase also builds up a cache of the directories recently visited to speed things up. The cache makes the code quite unreadable, but is has proven to work in Msged for nearly a year now and really speeds things up. The problem with the cache is that it does not expire automatically. Therefore, if you add files to a directory (imagine unpacking a Nodediff archive into a temporary directory) and after that you want to match these file names with adaptcase, you first need to call adaptcase_refresh_dir on that particular directory to cause a cache expire: char *nodediff = "/var/spool/fido/inbound/nodediff.a81"; char *tempdir = "/var/spool/fido/temp"; char *nodediff_contents = "/var/spool/fido/temp/nodediff.281"; adaptcase(nodediff); adaptcase(tempdir); call_unpacker(nodediff, tempdir); adaptcase_refresh_dir(tempdir); /* IMPORTANT! */ adaptcase(nodediff_contents); You also should simetimes call adaptcase_refresh_dir if you are running an interactive program, or a program that runs for a long time, and want to see files that were not there when your program was started, but appeared at a later time because they were created by another process. Section 3: Miscellaneous Tips and Tricks ---------------------------------------- UNSIGNED AND SIGNED Please don't mix signed and unsigned integers or characters without thinking about which format makes sense, and always cast them explicitly. Some compilers generate huge amounts of warnings if you cast between (char *) and (unsigned char *) implicitly. This is not really a problem, but the warnings might distract the view from other warnings that are about real problems. PREPROCESSOR DIRECTIVES Preprocessor directives always must start in column 1. do #include <somestuff.h> you should never do #include <somestuff.h> and you'd better also not do # include <somestuff.h> or else you will get problems with some compilers not recognizing this as a preprocessor directives. FOPEN: B AND T FLAGS In fopen, you can set some flags in the second argument, like fopen(fn, flags). The flags string contains standard chars like 'r' for reading and 'w' for writing, and there are optional 'b' (binary) and 't' (text) arguments. Under Unix, it does not matter if you use the 'b' or 't' flag or not - they are simply ignored. However, this is no so on other systems, so PLEASE do strictly use the following scheme: Use the 'b' flag for all files that are binary, i.E. that are not expected to be used with a text editor, and where you rely on the fact that every byte from 0 .. 255 is exactly written to the file as you specify it. Examples are PKT files, Fido *.MSG files or any other message base storage format, and so on. Please also read Section 1 of this document for binary files. Never use the 't' flag. It is not ANSI C. ANSI C specifies that the absence of the 'b' flag designates a text file, so 't' is just a synonym for "not 'b'". Some compilers will have problems when you use the 't' flag. Do not use neither 't' nor 'b' for files that users should be able to view and/or edit with an ordinary text editor of their operating systems. Examples are log files,*.?LO files in a Binkley outbound (!), and other types of lists that do not contain non-printable characters. For text files, on DOS, Windows and OS/2 a CRLF translation will occur, i.E. when you put a \n the file will contain \r\n, and when you read \r\n, your C program will only see a \n. For files that you open without the 'b' flag, i.E. in text mode, you should NOT rely on any mechanism for obtaining the file size (they just don't work, you must read the file line by line without knowing its final size), and you should not use fseek and ftell for other purposes than seeking to the very beginning or the very end of the file. It usually also works to get an offset with ftell and later seek to exactly the same offset - but NEVER do any arithmetic like adding anything to an offset obtained by ftell. This will NOT reliably work for text mode streams on non-Unix systems. [EOF]