Notes on the Foreign Function Interface (ffi) - 12 Feb 2001

This release includes a partial implementation of the Haskell foreign
function interface definition:

  http://www.haskell.org/hdirect/ffi.html
  http://www.haskell.org/hdirect/ffi-a4.ps.gz
  http://www.haskell.org/hdirect/ffi-letter.ps.gz
  http://www.haskell.org/hdirect/ffi-a4.dvi.gz
  http://www.haskell.org/hdirect/ffi-letter.dvi.gz

with two minor caveats (excruciating details appended at the end):

o "foreign export static" is not implemented but, fortunately, this is
  one of the least used parts of the ffi and can be worked around.

o "foreign export dynamic" is implemented only for the x86
  architecture, but it should be easy for any experienced assembly
  language programmer to port.

Suppose you have some C functions in test.c and some ffi declarations
for those functions in Test.hs. You can use them with Hugs as follows:

  # Generate Test.c (note that it is _not_ test.c)
  #
  # [For every Haskell file loaded which contains ffi declarations,
  #  this will generate a .c file _in the current working directory_.]
  hugs +G Test.hs

  # Compile and partially link Test.c and test.c, putting the
  # result in Test.so.
  #
  # Details on how to partially link files vary from one platform to
  # another.
  # Most Unixen:
  cc -shared -I/usr/local/share/hugs/include Test.c test.c -o Test.so
  # MacOS X:
  cc -bundle -I/usr/local/share/hugs/include Test.c test.c -o Test.so

  # Run Hugs as normal - when Test.hs is loaded, it will load Test.so
  hugs Test.hs

  # And now try using the imported or exported functions.

Enjoy!

--
Alastair Reid        reid@cs.utah.edu        http://www.cs.utah.edu/~reid/


Known limitations:

o foreign export static is not implemented.
  You can code around this by writing:

    foreign import dynamic foo_dynamic :: Addr -> (A -> B -> C)
    foreign label foo_addr :: Addr
    foo = foo_dynamic foo_addr

  instead of:

    foreign import foo :: A -> B -> C

  Ideally Hugs would do this for you, but there are some tricky
  interactions between the ffi and type classes which baffle me. Sorry.

o foreign export dynamic is only implemented for the x86 architecture.

  The following information is intended for those brave souls who try
  to port the implementation to other architectures and can be safely
  ignored by everyone else.

  To make foreign export dynamic work for other architectures, you
  have to modify the function mkThunk in hugs98/src/builtin.c to
  generate a short sequence of machine code (and then send your fix to
  hugs-bugs@haskell.org for inclusion in the next release).

  The goal of the code is (more or less) to implement this C function

    rty f(ty1 a1, ... tym am) { return (*app)(s, a1, ... am); }

  where rty, ty1, ... tym are C types, app is an "apply" function
  generated by running "hugs +G", and "s" is a "stable pointer" to the
  Haskell function being exported.

  The reasons the function is written in machine code are:

  o For foreign export dynamic, the function has to be generated
    dynamically, and neither ANSI C nor any extensions we know of let
    you generate C functions at runtime. The alternative of invoking
    the C compiler and loader at runtime is not attractive.

  o The code has to be placed next to a data structure in memory. The
    data structure has this type:

      struct thunk_data {
          struct thunk_data* next;
          struct thunk_data* prev;
          HugsStablePtr      stable;
          char               code[16];
      };

    The next and prev pointers implement a doubly-linked list that the
    garbage collector uses to keep track of all dynamically exported
    functions. The stable field holds a stable pointer to the Haskell
    function being exported; it is also used by the garbage collector.
    The code field stores the machine code.
    The size of the code field will probably have to be changed for
    other architectures.

  o By writing in assembly/machine code, it is possible to use the
    same code sequence no matter what the function type is. This works
    because the C calling convention on most machines has the stack
    looking something like this (the stack grows downwards in this
    picture):

          |   ...  |
          +--------+
          |  argm  |
          +--------+
             ...
          +--------+
          |  arg2  |
          +--------+
          |  arg1  |
          +--------+
          |ret_addr|
          +--------+

    This calling convention is more or less imposed by the need to
    support vararg functions in C.

    To implement the above function, all we need to do is adjust the
    stack to look like this:

          |   ...  |
          +--------+
          |  argm  |
          +--------+
             ...
          +--------+
          |  arg2  |
          +--------+
          |  arg1  |
          +--------+
          |    s   |
          +--------+
          |ret_addr|
          +--------+

    and jump to (tail call) the start of app. On the x86, you can do
    this with the following code sequence:

          pushl (%esp)        ; move the return address "up"
          movl  $s,4(%esp)    ; stick the stable pointer "under" it
          jmp   app           ; tail call app

    On architectures with very different calling conventions, you can
    (hopefully) get things working by passing the stable pointer in a
    global variable or, perhaps, a callee-saves register, and tweaking
    the "app" function (which is generated by implementForeignExport
    in ffi.c) to expect "s" there instead of on the stack.

  o It is machine code instead of assembly code because we don't want
    to invoke an assembler and linker/loader at runtime. Having
    determined which assembly code sequence to use, use "as -a" (or
    equivalent) to view the corresponding machine code, and then write
    C code which inserts that code into the code field of a thunk. For
    the x86, the code looks like this:
      #if defined(__i386__)
          /* pc points at thunk->code; memcpy needs <string.h> */
          int rel;

          /* 3 bytes: pushl (%esp) */
          *pc++ = 0xff; *pc++ = 0x34; *pc++ = 0x24;

          /* 8 bytes: movl $s,4(%esp) - opcode, ModRM, SIB, disp8, imm32 */
          *pc++ = 0xc7; *pc++ = 0x44; *pc++ = 0x24; *pc++ = 0x04;
          memcpy(pc, &s, 4);            /* 4-byte stable pointer on x86 */
          pc += 4;

          /* 5 bytes: jmp app - the rel32 displacement is measured from
             the end of the jmp, which is the end of the 16-byte field */
          *pc++ = 0xe9;
          rel = (char*)app - (char*)&thunk->code[16];
          memcpy(pc, &rel, 4);
          pc += 4;
      #else
          ...
      #endif

    This code contains a copy of the stable pointer because that is
    convenient on the x86. On architectures such as the Sparc, where
    32-bit immediate loads are more painful, it may be easier to load
    the copy of the stable pointer stored in the thunk, which sits at a
    fixed offset from the code. Likewise, it may be convenient to add a
    copy of "app" to the thunk struct.
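For anyone checking a port, the x86 byte sequence can be emitted and inspected without running it. The C sketch below is an illustration, not the Hugs source: emit_x86_thunk and put_le32 are hypothetical names, and the code-field and app addresses are passed in as plain 32-bit integers (an assumption) so the encoding can be checked on any host. Immediates are written byte by byte in x86 (little-endian) order, so the result does not depend on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v as 4 little-endian bytes (x86 byte order) and advance pc. */
static unsigned char* put_le32(unsigned char* pc, uint32_t v)
{
    *pc++ = (unsigned char)(v & 0xff);
    *pc++ = (unsigned char)((v >> 8) & 0xff);
    *pc++ = (unsigned char)((v >> 16) & 0xff);
    *pc++ = (unsigned char)((v >> 24) & 0xff);
    return pc;
}

/* Hypothetical helper: fill a 16-byte code field with
       pushl (%esp)
       movl  $s,4(%esp)
       jmp   app
   code_addr is the address the code field will occupy and app_addr is
   the address of the apply function, both as 32-bit integers. */
size_t emit_x86_thunk(unsigned char code[16], uint32_t code_addr,
                      uint32_t s, uint32_t app_addr)
{
    unsigned char* pc = code;

    /* 3 bytes: pushl (%esp) = FF /6, ModRM 0x34, SIB 0x24 */
    *pc++ = 0xff; *pc++ = 0x34; *pc++ = 0x24;

    /* 8 bytes: movl $s,4(%esp) = C7 /0, ModRM 0x44, SIB 0x24, disp8 4, imm32 */
    *pc++ = 0xc7; *pc++ = 0x44; *pc++ = 0x24; *pc++ = 0x04;
    pc = put_le32(pc, s);

    /* 5 bytes: jmp rel32; the displacement is relative to the end of the
       jmp (code_addr + 16); unsigned wraparound yields two's complement */
    *pc++ = 0xe9;
    pc = put_le32(pc, app_addr - (code_addr + 16u));

    assert(pc == code + 16);    /* the sequence exactly fills the field */
    return (size_t)(pc - code);
}
```

For example, with the code field at 0x1000 and app at 0x2000, the jmp displacement is 0x2000 - 0x1010 = 0xff0, stored as the bytes f0 0f 00 00.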
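To make the goal of a thunk concrete, here is a rough C sketch of what one thunk computes for a single fixed type, int (int, int), next to the global-variable alternative suggested for other architectures. Everything here is hypothetical: app is a stand-in for a generated apply function, the handle value 7 is made up, HugsStablePtr is assumed to be an int, and thunk_f, thunk_f_global and thunk_stable are illustrative names. A real thunk is machine code created at runtime, which plain C cannot do.

```c
#include <assert.h>

/* Assumption: a stable pointer is an int-sized handle, as on x86. */
typedef int HugsStablePtr;

/* Hypothetical stand-in for an "apply" function generated by "hugs +G"
   for an exported Haskell function of type Int -> Int -> Int. It gets
   the stable pointer as an extra first argument; in this sketch the
   made-up handle 7 is the only valid one and denotes addition. */
static int app(HugsStablePtr s, int a1, int a2)
{
    assert(s == 7);
    return a1 + a2;
}

/* What one x86 thunk behaves like, written out in C for this single
   type: call app with s prepended to the caller's arguments. */
int thunk_f(int a1, int a2)
{
    const HugsStablePtr s = 7;
    return (*app)(s, a1, a2);
}

/* Global-variable alternative: the thunk leaves the arguments alone and
   stores s where a modified apply function expects to find it. */
static HugsStablePtr thunk_stable;    /* hypothetical global */

static int app_global(int a1, int a2)
{
    return app(thunk_stable, a1, a2);
}

int thunk_f_global(int a1, int a2)
{
    thunk_stable = 7;                 /* what the generated code would store */
    return app_global(a1, a2);        /* then jump to the apply function */
}
```

Both versions return the same result; the difference is only where the apply function finds s, which is what makes the global (or callee-saves register) variant usable when arguments are not passed on the stack.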
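The thunk_data list described above can be sketched in C as follows. This is an assumption-laden illustration rather than the Hugs code: the sentinel head thunk_list and the functions thunk_alloc, thunk_free and thunk_count are hypothetical, and HugsStablePtr is assumed to be an int. A doubly-linked list lets a thunk be unlinked in constant time without searching for its predecessor. The sketch only manages the list; it does not make the memory holding the code field executable.

```c
#include <assert.h>
#include <stdlib.h>

typedef int HugsStablePtr;   /* assumption: int-sized handle */

struct thunk_data {
    struct thunk_data* next;
    struct thunk_data* prev;
    HugsStablePtr      stable;
    char               code[16];
};

/* Hypothetical sentinel head: an empty list points to itself. */
static struct thunk_data thunk_list = { &thunk_list, &thunk_list, 0, {0} };

/* Allocate a thunk for stable pointer s and link it in at the front. */
struct thunk_data* thunk_alloc(HugsStablePtr s)
{
    struct thunk_data* t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->stable = s;
    /* splice in between the sentinel and the current first element */
    t->next = thunk_list.next;
    t->prev = &thunk_list;
    thunk_list.next->prev = t;
    thunk_list.next = t;
    return t;
}

/* Unlink a thunk in O(1) using its own prev/next pointers, then free it. */
void thunk_free(struct thunk_data* t)
{
    assert(t != &thunk_list);         /* never free the sentinel */
    t->prev->next = t->next;
    t->next->prev = t->prev;
    free(t);
}

/* Walk every live thunk; a collector could visit each t->stable the same way. */
int thunk_count(void)
{
    int n = 0;
    const struct thunk_data* t;
    for (t = thunk_list.next; t != &thunk_list; t = t->next)
        n++;
    return n;
}
```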