The Binding Generator C->Haskell

Manuel M. T. Chakravarty, chak@cse.unsw.edu.au
v0.7, 18 June 2001

C->Haskell is an interface generator that simplifies the development of Haskell bindings to C libraries. The tool processes existing C header files, which determine data layout and function signatures on the C side, in conjunction with Haskell modules that specify Haskell-side type signatures and marshalling details. Hooks embedded in the Haskell code signal access to C structures and functions; the interfacing tool expands them based on information from the corresponding C header file. Another noteworthy property is the lightweight nature of the approach. More background information is available in a research paper discussing C->Haskell, which is at <http://www.cse.unsw.edu.au/~chak/papers/papers.html#c2hs>.

______________________________________________________________________

Table of Contents

1. Installation
  1.1 Where is the Source?
  1.2 What Else Do I Need?
  1.3 I Got Everything, and Now?
2. Usage of C->Haskell
  2.1 Usage of c2hs
  2.2 Compilation of a Generated Haskell API
3. Implementation of Haskell Binding Modules
  3.1 Import Hooks
  3.2 Context Hooks
  3.3 Type Hooks
  3.4 Sizeof Hooks
  3.5 Enumeration Hooks
  3.6 Call Hooks
  3.7 Get Hooks
  3.8 Set Hooks
  3.9 Pointer Hooks
  3.10 Grammar Rules
4. The Haskell FFI Marshalling Library
5. Bug Reports and Suggestions
6. Copyright
7. GNU Free Documentation License
8. Release Notes
  8.1 Version 0.9.9 "Blue Ginger"
  8.2 Version 0.8.2 "Gentle Moon"
  8.3 Version 0.8.1 "Gentle Moon"
  8.4 Version 0.7.10 "Afterthought"
  8.5 Version 0.7.9 "Afterthought"
  8.6 Version 0.7.8
  8.7 Version 0.7.7
  8.8 Version 0.7.6
  8.9 Version 0.7.5

______________________________________________________________________

Copyright & Distribution

Copyright (c) [1999..2001] by Manuel M. T. Chakravarty. The manual is distributed under the terms of the GNU Free Documentation License, available from <http://www.fsf.org/copyleft/fdl.html>.
The master copy of this document is at <http://www.cse.unsw.edu.au/~chak/haskell/c2hs/>; the source is in SGML, which allows you to produce a selection of standard formats, including HTML and PostScript.

Contributions

If you have any comments, suggestions, or contributions, please send them to chak@cse.unsw.edu.au.

1. Installation

What follows is a brief discussion of installation from source. There is, however, a file INSTALL in the source distribution, which is updated more frequently and should be consulted in any case.

1.1. Where is the Source?

The master site of C->Haskell is at <http://www.cse.unsw.edu.au/~chak/haskell/c2hs/>. It has all the latest information and sources. Furthermore, it explains how to get anonymous CVS access to C->Haskell's repository and may have precompiled binaries for easier installation.

1.2. What Else Do I Need?

You need a Haskell system supported by C->Haskell. Currently, this is only the Glasgow Haskell Compiler (GHC), which you can obtain from <http://haskell.org/ghc/>. You need a fairly recent version of the compiler.

C->Haskell uses a compiler support library called the Compiler Toolkit. In the main distribution, the Compiler Toolkit is already contained in the source tarball; be sure to download a file named c2hs-x.y.z.tar.gz, where x.y.z is the version number of the package.

To build the documentation, you will also need the SGML Tools, which you can find at your nearest sunsite or Linux mirror or at <ftp://ftp.lip6.fr/pub/sgml-tools/>. On an up-to-date Linux system, the tools are probably already installed.

1.3. I Got Everything, and Now?
The short answer is

  % gzip -cd c2hs-x.y.z.tar.gz | tar xvf -   # unpack the sources
  % cd c2hs-x.y.z                            # change to the toplevel directory
  % ./configure                              # run the `configure' script
  % make                                     # build everything
  [ Become root if necessary ]
  % make install                             # install the tool

The INSTALL file contains more details. Optionally, you can build the documentation by issuing make doc.

2. Usage of C->Haskell

Let's have a brief look at how to call the tool and how to use the generated interfaces.

2.1. Usage of c2hs

C->Haskell is implemented by the executable c2hs. It is usually called as

  c2hs lib.h Lib.chs

where lib.h is the header file and Lib.chs the Haskell binding module, which define the C-side and Haskell-side interface, respectively. If no errors occur, the result is a pure Haskell module Lib.hs, which implements the Haskell API of the library.

The executable c2hs has a couple more options:

  Usage: c2hs [ option... ] header-file binding-file

    -C CPPOPTS  --cppopts=CPPOPTS   pass CPPOPTS to the C preprocessor
    -c CPP      --cpp=CPP           use executable CPP to invoke C preprocessor
    -d TYPE     --dump=TYPE         dump internal information (for debugging)
    -h, -?      --help              brief help (the present message)
    -i INCLUDE  --include=INCLUDE   include paths for .chi files
    -k          --keep              keep pre-processed C header
    -o FILE     --output=FILE       output result to FILE (should end in .hs)
    -v          --version           show version information
                --old-ffi[=OLDFFI]  use the FFI without `Ptr a'

  The header file must be a C header file matching the given binding file.
  The dump TYPE can be
    trace   -- trace compiler phases
    genbind -- trace binding generation
    ctrav   -- trace C declaration traversal
    chs     -- dump the binding file (adds `.dump' to the name)

The most useful of these is probably --cppopts= (or -C). If the C header file needs any special options (like -D or -I) to go through the C preprocessor, this is the place to pass them.
A call may look like this:

  c2hs --cppopts='-I/some/obscure/dir -DEXTRA' lib.h Lib.chs

Do not forget the quotes if you have more than one option that you want to pass to the preprocessor.

Often, lib.h will not be in the current directory, but in one of the header file directories. Apart from the current directory, C->Haskell looks in two places for the header: first, in the standard include directories of the system used, usually /usr/include and /usr/local/include; and second, in every directory that is mentioned in a -IDIR option passed to the preprocessor via --cppopts.

If the compiled binding module contains import hooks, C->Haskell needs to find the .chi (C->Haskell interface) files produced while compiling the corresponding binding modules. By default, they are searched for in the current working directory. If they are located elsewhere, the --include=INCLUDE option has to be used to indicate their location, where INCLUDE is a colon-separated list of directories. Multiple such options are admissible; later paths are searched first.

2.2. Compilation of a Generated Haskell API

C->Haskell comes with a marshalling library, called C2HS, which is imported by virtually all library bindings. Consequently, when you compile a generated interface, you have to tell the Haskell compiler where to find the interface files of C2HS, and you have to tell the linker where to find its library archive. To simplify this process, which usually depends on the operating system and the compiler, C->Haskell comes with a simple configuration manager in the form of the executable c2hs-config. It can be used to obtain the information needed for compilation and linking and to pass that information on to the Haskell compiler. The call

  c2hs-config --cflags

returns all flags that need to be given to the Haskell compiler for compilation, and

  c2hs-config --lib

returns all flags necessary for linking.
Overall, you may want to use a call like the following to compile a generated library module:

  ghc `c2hs-config --cflags` -c Lib.hs

The backquotes cause the shell to call c2hs-config and substitute the call by the flags returned. This, of course, also works in a makefile. Furthermore, c2hs-config can also be used to locate the executable of the tool itself, by calling

  c2hs-config --c2hs

This slightly simplifies configuration management of libraries generated by C->Haskell, as it is sufficient to know the location of c2hs-config to access all other components of C->Haskell.

3. Implementation of Haskell Binding Modules

A discussion of binding modules, the principles behind the tool, and related work can be found in a research paper located at <http://www.cse.unsw.edu.au/~chak/papers/papers.html#c2hs>. All features described in the paper, except enum define hooks, are implemented in the tool; moreover, the tool has been extended further since the publication of the paper. The distribution also contains examples that illustrate the use of C->Haskell. In the source distribution, these examples are located in the directories tests and examples. The latter contains a binding for the Gnome <http://www.gnome.org> HTTP 1.1 library ghttp. The sources of the marshalling library C2HS are in the directory lib and contain a fair amount of comments, which should help get you started.

Since version 0.8.1, the interface of the marshalling library C2HS has changed. The new interface essentially consists of the new Haskell FFI Marshalling Library. More details about this library are provided in the next section. For backward compatibility, the old interface (i.e., the pre-0.8.1 interface) can still be used by importing C2HSDeprecated instead of C2HS.

The remainder of this section describes the hooks that are available in binding modules.

3.1.
Import Hooks

  {#import [qualified] modid#}

This hook is translated into the same syntactic form in Haskell, which implies that it may be followed by an explicit import list. Moreover, it implies that the module modid is also generated by C->Haskell, and it instructs the tool to read the file modid.chi.

If an explicit output file name is given (--output option), this name determines the basename for the .chi file of the currently translated module.

Currently, only pointer hooks generate information that is stored in a .chi file and needs to be incorporated into any client module that makes use of these pointer types. It is, however, regarded as good style to use import hooks for any module generated by C->Haskell.

3.2. Context Hooks

  {#context [header = header] [lib = lib] [prefix = prefix]#}

Context hooks define a set of global configuration options. Currently, there are three parameters, all of which are strings:

  o  header is the C header file containing the definitions that are bound in the current binding module.

  o  lib is a dynamic library that contains symbols needed by the present binding.

  o  prefix is an identifier prefix that may be omitted in the lexemes of identifiers referring to C definitions in any binding hook. This is useful as C libraries often use a prefix, such as gtk_, as a poor man's form of name spaces. Any underscores between a prefix and the main part of an identifier must also be dropped. Case is not relevant in a prefix. If the abbreviation conflicts with an explicitly defined identifier, the explicit definition takes precedence.

All three parameters are optional. An example of a context hook is the following:

  {#context header = "gtkwidget.h" prefix = "gtk"#}

If a binding module contains a context hook, it must be the first hook in the module.

3.3. Type Hooks

  {#type ident#}

A type hook maps a C type to a Haskell type.
As an example, consider

  type GInt = {#type gint#}

The type must be a defined type; primitive types, such as int, are not admissible.

3.4. Sizeof Hooks

  {#sizeof ident#}

A sizeof hook maps a C type to its size in bytes. As an example, consider

  gIntSize :: Int
  gIntSize  = {#sizeof gint#}

The type must be a defined type; primitive types, such as int, are not admissible. The size of primitive types can always be obtained using Storable.sizeOf.

3.5. Enumeration Hooks

  {#enum cid [as hsid] {alias1 , ... , aliasn} [with prefix = pref] [deriving (clid1 , ... , clidn)]#}

An enumeration hook rewrites the C enumeration called cid into a Haskell data type declaration, which is made an instance of Enum such that the ordinals match those of the enumeration values in C. This takes explicit enumeration values in the C definitions into account. If hsid is given, it is the name of the Haskell data type. The identifiers clid1 to clidn are added to the deriving clause of the Haskell type.

By default, the names of the C enumerators are used for the constructors in Haskell. If alias1 is underscoreToCase, the original C names are capitalised, and each underscore is removed with the letter following it capitalised. Moreover, alias1 to aliasn may be aliases of the form cid as hsid, which map individual C names to Haskell names. Instead of the global prefix introduced by a context hook, a local prefix pref can optionally be specified.

As an example, consider

  {#enum WindowType {underscoreToCase} deriving (Eq)#}

Note: The enum define hooks described in the C->Haskell paper are not implemented yet.

3.6. Call Hooks

  {#call [fun] [unsafe] cid [as hsid]#}

A call hook rewrites to a call to the C function cid and also ensures that the appropriate foreign import declaration is generated.
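To give a rough idea of the kind of declaration a call hook arranges for, the following hand-written module binds the C library function sin directly. It is only a sketch: it uses the foreign import ccall syntax of current GHC rather than the exact output of this version of C->Haskell, and the names c_sin and sinC are made up for the illustration. The pure result type and the unsafe keyword play the roles of the fun and unsafe tags, respectively.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Sketch only: a hand-written binding of the kind a hook such as
-- {#call fun unsafe sin as c_sin#} sets up (names are illustrative).
module Main (main) where

-- The newtype constructor must be in scope to use CDouble in a foreign import.
import Foreign.C.Types (CDouble(..))

-- No IO in the result type: the C function is treated as pure (cf. `fun').
-- `unsafe': the C function is assumed never to call back into Haskell.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- Haskell-side wrapper that marshals between Double and CDouble.
sinC :: Double -> Double
sinC = realToFrac . c_sin . realToFrac

main :: IO ()
main = print (sinC 0)
```

Compiled with GHC, the program prints 0.0. A generated binding would typically leave the conversion between Haskell and C types to the marshalling library, as the wrapper sinC does by hand here.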
The tags fun and unsafe specify that the external function is purely functional and that it cannot re-enter the Haskell runtime, respectively. If hsid is present, it is used as the identifier for the foreign declaration, which otherwise defaults to cid. As an example, consider

  sin :: Float -> Float
  sin  = {#call fun sin as "_sin"#}

3.7. Get Hooks

  {#get apath#}

A get hook supports accessing a member value of a C structure. The hook itself yields a function that, when given the address of a structure of the right type, performs the structure access. The member that is to be extracted is specified by the access path apath. Access paths are formed as follows (following a subset of the C expression syntax):

  o  The root of any access path is a simple identifier, which denotes either a type name or a struct tag.

  o  An access path of the form *apath denotes dereferencing of the pointer yielded by accessing the access path apath.

  o  An access path of the form apath.cid specifies that the value of the struct member called cid should be accessed.

  o  Finally, an access path of the form apath->cid, as in C, specifies a combination of dereferencing and member selection.

For example, we may have

  visualGetType              :: Visual -> IO VisualType
  visualGetType (Visual vis)  = liftM cToEnum $ {#get Visual->type#} vis

3.8. Set Hooks

  {#set apath#}

Set hooks are formed in the same way as get hooks, but yield a function that assigns a value to a member of a C structure. These functions expect a pointer to the structure as the first argument and the value to be assigned as the second argument. For example, we may have

  {#set sockaddr_in.sin_family#} addr_in (cFromEnum AF_INET)

3.9. Pointer Hooks

  {#pointer [*] cid [as hsid] [foreign | stable] [newtype | -> hsid2]#}

A pointer hook facilitates the mapping of C to Haskell pointer types.
In particular, it enables the use of ForeignPtr and StablePtr types and defines type name translations for pointers to non-basic types. In general, such a hook establishes an association between the C type cid or *cid and the Haskell type hsid, where the latter defaults to cid if not explicitly given. The identifier cid will usually be a type name, but in the case of *cid may also be a struct, union, or enum tag. If both a type name and a tag of the same name are available, the type name takes precedence.

Optionally, the Haskell representation of the pointer can be a ForeignPtr or StablePtr instead of a plain Ptr. If the newtype tag is given, the Haskell type hsid is defined as a newtype rather than a transparent type synonym. In the case of a newtype, the type argument to the Haskell pointer type will be hsid, which gives a cyclic definition, but here the type argument is really only used as a unique type tag. Without newtype, the default type argument is (), but another type can be specified after the symbol ->.

For example, we may have

  {#pointer *GtkObject as Object foreign newtype#}

This will generate a new type Object as follows:

  newtype Object = Object (ForeignPtr Object)

which allows Object to be exported as an abstract type and facilitates type checking at call sites of imported functions using the encapsulated foreign pointer. The latter is achieved by C->Haskell as follows. The tool remembers the association of the C type *GtkObject with the Haskell type Object, and so, for the C function

  void gtk_unref_object (GtkObject *obj);

it generates the import declaration

  foreign import gtk_unref_object :: Object -> IO ()

This function can obviously only be applied to pointers of the right type, and thus, it protects against the common mistake of confusing the order of pointer arguments in function calls.
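The type-tag idea can be tried out in plain Haskell. The following sketch assumes nothing about the real GTK+ API: objectIsValid is a hypothetical stand-in for an imported C function taking a GtkObject pointer, Widget is a second hypothetical wrapped type, and the storage comes from mallocForeignPtrBytes rather than from a C constructor. Because Object and Widget are distinct newtypes, passing one where the other is expected is rejected at compile time.

```haskell
-- Sketch only: mirrors the newtype produced for
-- {#pointer *GtkObject as Object foreign newtype#}; other names are hypothetical.
module Main (main) where

import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (nullPtr)

-- The type argument of ForeignPtr is the newtype itself; it serves only as a tag.
newtype Object = Object (ForeignPtr Object)

-- A second wrapped pointer type (hypothetical); it cannot be confused with Object.
newtype Widget = Widget (ForeignPtr Widget)

-- Hypothetical stand-in for a foreign function expecting a GtkObject *.
-- Here it merely inspects the raw address.
objectIsValid :: Object -> IO Bool
objectIsValid (Object fp) = withForeignPtr fp (\p -> return (p /= nullPtr))

main :: IO ()
main = do
  fp <- mallocForeignPtrBytes 16   -- placeholder storage, not a real GtkObject
  ok <- objectIsValid (Object fp)
  print ok
  -- objectIsValid (Widget w) would not type-check: Widget is not Object.
```

Running the program prints True. In a real binding, the ForeignPtr would wrap a pointer obtained from the C library, typically with a finaliser attached.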
However, as the Haskell FFI does not allow ForeignPtrs to be returned from function calls, the tool will use the type Ptr HsName in this case, where HsName is the Haskell name of the type. In the above example, that would be Ptr Object.

As an example that does not represent the pointer as an abstract type, consider the C type declaration:

  typedef struct {int x, y;} *point;

We can represent it in Haskell as

  data Point = Point {x :: Int, y :: Int}
  {#pointer point as PointPtr -> Point#}

which will translate to

  data Point = Point {x :: Int, y :: Int}
  type PointPtr = Ptr Point

and establish a type association between point and PointPtr.

Restriction: The name cid cannot be a basic C type (such as int); it must be a defined name.

3.10. Grammar Rules

The following grammar rules define the syntax of binding hooks:

  hook     -> `{#' inner `#}'
  inner    -> `import' [`qualified'] ident
            | `context' ctxt
            | `type' ident
            | `sizeof' ident
            | `enum' idalias trans [`with' prefix] [deriving]
            | `call' [`fun'] [`unsafe'] idalias
            | `get' apath
            | `set' apath
            | `pointer' [`*'] idalias ptrkind
  ctxt     -> [`header' `=' string] [`lib' `=' string] [prefix]
  idalias  -> ident [`as' ident]
  prefix   -> `prefix' `=' string
  deriving -> `deriving' `(' ident_1 `,' ... `,' ident_n `)'
  apath    -> ident
            | `*' apath
            | apath `.' ident
            | apath `->' ident
  trans    -> `{' alias_1 `,' ... `,' alias_n `}'
  alias    -> `underscoreToCase'
            | ident `as' ident
  ptrkind  -> [`foreign' | `stable'] [`newtype' | `->' ident]

4. The Haskell FFI Marshalling Library

The Haskell FFI Marshalling Library is a proposed standard library for foreign function interoperability. As of version 0.8.1 of the tool, the interface of the C2HS marshalling library is a slight extension of the Haskell FFI Marshalling Library, which is documented in the following.

The library is partitioned into a language-independent and a C-specific component.
All features of the former are available from the module Foreign and all features of the latter from CForeign. Nevertheless, the following module hierarchy is part of the interface definition:

  o  Foreign
     o  Int
     o  Word
     o  Ptr
     o  ForeignPtr
     o  StablePtr
     o  Storable
     o  MarshalAlloc
     o  MarshalArray
     o  MarshalError
     o  MarshalUtils
  o  CForeign
     o  CTypes
     o  CTypesISO
     o  CError
     o  CString

It is recommended to access this functionality in C->Haskell binding modules by merely importing C2HS.

5. Bug Reports and Suggestions

Please address any bug reports and suggestions to chak@cse.unsw.edu.au. A good bug report contains information on the operating system and Haskell compiler used, as well as the version of C->Haskell that you have been using. You can obtain the version information by running c2hs-config --version. If possible, a concise example illustrating your problem would be appreciated.

6. Copyright

C->Haskell is Copyright (C) [1999..2001] Manuel M. T. Chakravarty

  This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

  This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

  You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

This manual is Copyright (c) [2000..2001] by Manuel M. T. Chakravarty.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

7. GNU Free Documentation License

The GNU Free Documentation License is available at <http://www.fsf.org/copyleft/fdl.html>.

8. Release Notes

Important changes (especially those affecting the semantics of the tool) are documented in the following.

8.1. Version 0.9.9 "Blue Ginger"

  o  Bug fixes
  o  Library names in foreign imports have been removed until the convention of the new FFI is implemented (they are currently silently omitted)
  o  Added sizeof hooks; sizeof of type names is now also supported in constant expressions
  o  Local prefix for enum hooks; courtesy of Armin Sander
  o  Added import hooks
  o  The documentation includes a description of binding hooks
  o  Added pointer hooks, which were derived from code for a similar feature by Axel Simon; this includes proper treatment of parametrised pointers
  o  Integrated deriving option for enum hooks, which was contributed by Axel Simon
  o  Adapted to GHC 5.0

8.2. Version 0.8.2 "Gentle Moon"

  o  Adaptation layer for legacy StablePtr interface
  o  Fixed missing export of FunPtr and associated functions from C2HS
  o  Fixed missing export of some names from C2HSDeprecated
  o  Added support for gcc's __builtin_va_list

8.3. Version 0.8.1 "Gentle Moon"

  o  Library adapted to the new FFI; the old interface can still be used by importing C2HSDeprecated
  o  FFI Library specification added to the documentation

8.4.
Version 0.7.10 "Afterthought"

  o  CygWin support; based on suggestions by Anibal Maffioletti Rodrigues de DEUS <anibaldedeus@email.com>
  o  IntConv instances for Int8, Word8, and Char

8.5. Version 0.7.9 "Afterthought"

  o  Debugged the stripping of prefixes from enumerators; prefixes are now generally stripped, independent of whether they can be stripped from all enumerators of a given enumeration type
  o  Comma now correctly required after underscoreToCase. WARNING: This breaks source compatibility with previous versions.

8.6. Version 0.7.8

  o  Provisional support for GHC 4.08
  o  Corrected constant folding

8.7. Version 0.7.7

Ignores any occurrence of #pragma.

8.8. Version 0.7.6

Bug fixes and support for long long.

8.9. Version 0.7.5

This is mainly a bug fix release. In particular, the space behaviour of C->Haskell has been significantly improved.

IMPORTANT NOTE: From this release on, library names in lib tags in context hooks should not contain a suffix (i.e., omit .so etc).