<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <title>Poly/ML Interface to the C Programming Language</title> </head> <body> <h1>Poly/ML Interface to the C Programming Language</h1> <h2>Nick Chapman June 6, 1994</h2> <ol> <li><a href="CInterface.html#1 Introduction">Introduction</a></li> <li><a href="CInterface.html#2 Dynamic Libraries">Dynamic Libraries</a></li> <li><a href="CInterface.html#3 Creating a Dynamic Library">Creating a Dynamic Library</a></li> <li><a href="CInterface.html#4 Calling Simple C-functions">Calling Simple C-functions</a></li> <li><a href="CInterface.html#5 Calln functions">A family of <tt>call</tt><i>n</i> functions</a></li> <li><a href="CInterface.html#6 Predefined Conversions">Predefined <tt>Conversion</tt>s</a></li> <li><a href="CInterface.html#7 Volatile Types">Volatile Types: <tt>vol</tt>, <tt>sym</tt> and <tt>dylib</tt>.</a></li> <li><a href="CInterface.html#8 Calling C-functions with return-parameters">Calling C-functions with <em>return-parameters</em></a></li> <li><a href="CInterface.html#9 A family of callnretr functions">A family of <tt>call</tt><i>n</i><tt>ret</tt><i>r</i> functions</a></li> <li><a href="CInterface.html#10 C structures">C structures</a></li> <li><a href="CInterface.html#11 A family of structn Conversionals">A family of <tt>struct</tt><i>n</i> Conversionals</a></li> <li><a href="CInterface.html#12 Lower Level Calling Mechanism: call_sym">Lower Level Calling Mechanism: <tt>call_sym</tt></a></li> <li><a href="CInterface.html#13 Creating New Conversions">Creating New <tt>Conversion</tt>s</a></li> <li><a href="CInterface.html#14 Enumerated Types">Enumerated Types</a></li> <li><a href="CInterface.html#15 C Programming Primitives">C Programming Primitives</a></li> <li><a href="CInterface.html#16 Example: Quicksort">Example: Quicksort</a></li> <li><a href="CInterface.html#17 Volatile Implementation">Volatile Implementation</a></li> </ol> <h2><a name="1 Introduction">1 Introduction</a></h2> 
<p>It is now possible for Poly/ML to call functions which have been written in the C programming language. These functions are accessed from a dynamic library, and so do not have to be statically linked into the Poly/ML runtime system. The C interface is contained in the structure <b><tt>CInterface</tt></b>, which is built into every ML database. The facilities available allow dynamic libraries to be loaded and symbols to be extracted from these libraries. Symbols which represent C-functions can then be executed.</p>
<p>The arguments to a C-function need to be in a format which the C-function can understand. Similarly, the return value from a C-function will be in a standard C format. All such C-values are represented in ML using the abstract type <b><tt>vol</tt></b>. Values of this type are volatile because they do not persist from one ML session to the next. There are facilities to convert between ML-values and <b><tt>vol</tt></b>s, together with a collection of 'C-programming' primitives to manipulate <b><tt>vol</tt></b>s.</p>
<h2><a name="2 Dynamic Libraries">2 Dynamic Libraries</a></h2>
<p><b><tt>exception Foreign of string<br> val load_lib : string -> dylib<br> val load_sym : dylib -> string -> sym<br> val get_sym : string -> string -> sym</tt></b></p>
<p>The function <b><tt>load_lib</tt></b> takes an ML string containing the pathname of a dynamic library. This should preferably be a full pathname. A relative pathname is interpreted with respect to the directory in which the ML session was started. The return value is a <b><tt>dylib</tt></b> representing the dynamic library. If the dynamic library cannot be found, the exception <b><tt>Foreign</tt></b> is raised with a string describing the problem.</p>
<p><i>If the named file exists but is not in the correct format for a dynamic library, the underlying C-function</i> <b><tt>dlopen</tt></b> <i>prints an error message and then kills the ML session.
So far, I have been unable to catch this error.</i></p>
<p>Once a library has been opened, a symbol may be extracted from the library with the function <b><tt>load_sym</tt></b>. This takes a <b><tt>dylib</tt></b> representing the dynamic library and an ML string naming the symbol. The return value is a <b><tt>sym</tt></b> representing the symbol. If the symbol is not contained in the dynamic library, the exception <b><tt>Foreign</tt></b> is raised with a string describing the problem.</p>
<p>Often the return value of the function <b><tt>load_lib</tt></b> is passed directly to the function <b><tt>load_sym</tt></b>. This combination is captured by the function <b><tt>get_sym</tt></b>, which takes two strings naming the dynamic library and the symbol, and returns the <b><tt>sym</tt></b> representing the symbol, or raises the exception <b><tt>Foreign</tt></b>.</p>
<p><b><tt>fun get_sym lib sym = load_sym (load_lib lib) sym;</tt></b></p>
<p>Values of type <b><tt>dylib</tt></b> and <b><tt>sym</tt></b> share the volatile nature of <b><tt>vol</tt></b>; they do not persist from one ML session to the next. This is explained in more detail in <a href="CInterface.html#7 Volatile Types">Section 7</a>.</p>
<h2><a name="3 Creating a Dynamic Library">3 Creating a Dynamic Library</a></h2>
<p>Suppose we have written a C-function called <b><tt>difference</tt></b>, which computes the difference of two integers. The function is contained in a file named <b><tt>sample.c</tt></b>.</p>
<p><tt><strong>int difference (int x, int y) {<br> return x > y ? x - y : y - x;<br> }</strong></tt></p>
<p>To create a dynamic library containing this function we carry out the following steps at the shell prompt:</p>
<p><tt><b>Pinky$ gcc -c sample.c -o sample.o<br> Pinky$ ld -o sample.so sample.o</b></tt></p>
<p>These steps create a dynamic library named <b><tt>sample.so</tt></b>.
Often many symbols will be retrieved from the same dynamic library, and so it is useful to partially apply the function <b><tt>get_sym</tt></b> to the name of the common library. Most of the examples in this document use symbols retrieved from the library <b><tt>sample.so</tt></b>.</p>
<p><tt><strong>val get = get_sym "sample.so";</strong></tt></p>
<h2><a name="4 Calling Simple C-functions">4 Calling Simple C-functions</a></h2>
<p>To call the C-function <b><tt>difference</tt></b> we use the function <b><tt>call2</tt></b> from the structure <b><tt>CInterface</tt></b>. This function allows us to call C-functions that take two arguments:</p>
<p><tt><b>val call2 : sym -> 'a Conversion * 'b Conversion -> 'c Conversion<br> -> 'a * 'b -> 'c</b></tt></p>
<p>The first parameter of <b><tt>call2</tt></b> is the <b><tt>sym</tt></b> representing the symbol that we wish to call. This is usually obtained from a call to <b><tt>get_sym</tt></b>. The second parameter is a pair of <b><tt>Conversion</tt></b>s describing the two arguments to the C-function; the third parameter is a <b><tt>Conversion</tt></b> describing the return value of the C-function. The fourth parameter is a pair containing the actual arguments to be passed to the C-function. Notice how the type of each argument matches the type variable contained in the corresponding <b><tt>Conversion</tt></b> parameter.</p>
<p>The purpose of a <b><tt>Conversion</tt></b> is twofold. Firstly, it specifies the C-type required by the C-function. This needs to be known at the lowest level so that the correct argument passing and return conventions can be used when calling the C-function. Secondly, the <b><tt>Conversion</tt></b> performs the conversion between a C-value (in this case a C integer) and an ML-value.
The conversion necessary to call the example C-function <b><tt>difference</tt></b> is <b><tt>INT</tt></b>, which has type <b><tt>int Conversion</tt></b>. We can now define an ML function as a wrapper around the underlying C-function.</p>
<p><tt><strong>val diff = call2 (get "difference") (INT,INT) INT;</strong></tt></p>
<p>Because the Conversion <b><tt>INT</tt></b> has type <b><tt>int Conversion</tt></b>, the type of <b><tt>diff</tt></b> is constrained to be <b><tt>int * int -> int</tt></b>, which is just what we require. We can now apply the ML function, for example: <b><tt>(diff (13,50))</tt></b>, which evaluates to <b><tt>37</tt></b>.</p>
<h2><a name="5 Calln functions">5 A family</a> of <tt>call</tt><i>n</i> functions</h2>
<p>There is a family of <tt><b>call</b></tt><i>n</i> functions from <b><tt>call0</tt></b> to <b><tt>call9</tt></b>.</p>
<p><tt><strong>val calln :<br> sym -> 'a<small><small>1</small></small> Conversion * ... * 'a<small><small>n</small></small> Conversion<br> -> 'b Conversion<br> -> 'a<small><small>1</small></small> * ... * 'a<small><small>n</small></small> -> 'b </strong></tt></p>
<p>We need a collection of functions because we cannot give a legal ML type to a function which takes a list of <b><tt>Conversion</tt></b>s without forcing them all to have the same type parameter. C-functions with more than nine parameters can still be called, but the lower level calling mechanism must be used; see <a href="CInterface.html#12 Lower Level Calling Mechanism: call_sym">Section 12</a>.</p>
<h2><a name="6 Predefined Conversions">6 Predefined</a> <tt>Conversion</tt>s</h2>
<p>In the structure <b><tt>CInterface</tt></b>, there are various predefined <b><tt>Conversion</tt></b>s.
The name of each <b><tt>Conversion</tt></b> indicates the C-type required/returned, whereas the ML type of the <b><tt>Conversion</tt></b> constrains the resulting type when the <b><tt>Conversion</tt></b> is used as an argument to a <b><tt>call</tt></b><i>n</i> function.</p>
<p><tt><strong>val CHAR : char Conversion<br> val DOUBLE : real Conversion<br> val FLOAT : real Conversion<br> val INT : int Conversion<br> val LONG : int Conversion<br> val SHORT : int Conversion<br> val STRING : string Conversion<br> val VOID : unit Conversion<br> val BOOL : bool Conversion<br> val POINTER : vol Conversion</strong></tt></p>
<p>The <b><tt>Conversion</tt></b>s <b><tt>CHAR, DOUBLE, FLOAT, INT, LONG</tt></b> and <b><tt>SHORT</tt></b> are primitive in the sense that they convert between small fixed-size C types and the corresponding ML values.</p>
<p>The <b><tt>Conversion STRING</tt></b> converts between an ML string and a C pointer; the pointer points at a null terminated array of characters. This <b><tt>Conversion</tt></b> is built out of the <b><tt>CHAR Conversion</tt></b> and the C programming primitives; see <a href="CInterface.html#15 C Programming Primitives">Section 15</a>.</p>
<p>The <b><tt>Conversion VOID</tt></b> is really a one way <b><tt>Conversion</tt></b> intended for the result of C-functions that return <b><tt>void</tt></b>. Attempts to use this <b><tt>Conversion</tt></b> the other way around raise the exception <b><tt>Foreign</tt></b> with an appropriate message.</p>
<p>The <b><tt>Conversion BOOL</tt></b> is built on top of the <b><tt>Conversion INT</tt></b>. It converts between an ML <b><tt>bool</tt></b> and a C integer.</p>
<p>The <b><tt>Conversion POINTER</tt></b> is basically the identity <b><tt>Conversion</tt></b>. No conversion is performed and the underlying <b><tt>vol</tt></b> becomes accessible.</p>
<h2><a name="7 Volatile Types">7 Volatile Types</a>: <tt>vol</tt>, <tt>sym</tt> and <tt>dylib</tt>.</h2>
<p>There is a problem with the definition of the ML-function <b><tt>diff</tt></b> given above.
The call to <b><tt>get_sym</tt></b> (within the partial application <b><tt>get</tt></b>) returns a value of type <b><tt>sym</tt></b> which, like values of type <b><tt>vol</tt></b>, does not persist from one ML session to the next. If after the definition of <b><tt>diff</tt></b> we were to commit the database and leave the ML session, we would find that on restarting the ML session, the function <b><tt>diff</tt></b> no longer operates as expected, but instead causes the exception <b><tt>Foreign</tt></b> to be raised:</p>
<p><tt><strong>> commit();<br> > diff (13,50);<br> val it = 37<br> > quit();<br> Pinky$ ml<br> > diff (13,50);<br> Exception- Foreign "Invalid volatile" raised</strong></tt></p>
<p>One solution is to redefine the ML function <b><tt>diff</tt></b> as:</p>
<p><strong><tt>fun diff args =<br> call2 (get "difference") (INT,INT) INT args;</tt></strong></p>
<p>The new version of <b><tt>diff</tt></b> is very similar to the old version, except that the subexpression <b><tt>get "difference"</tt></b> will be executed every time the function is applied to the tuple of arguments, instead of just once. This causes the library and symbol to be reloaded on every invocation of the function <b><tt>diff</tt></b>, ensuring that the volatile is valid. In terms of efficiency this is not as horrific as it sounds. The underlying dynamic library manipulation functions appear to cache what has already been loaded, and so do little work on subsequent calls to load the same library or symbol.</p>
<h2><a name="8 Calling C-functions with return-parameters">8 Calling C-functions with <em>return-parameters</em></a></h2>
<p>Although C is strictly a <i>call-by-value</i> language, <i>call-by-reference</i> is often simulated with the use of parameters of a pointer type. When a function is called with a parameter that has a pointer type, the called function can then modify the value pointed at by the pointer.
For example, the C-function <b><tt>diff_sum</tt></b> below computes both the difference and the sum of two integers. The function has four parameters: two input parameters and two return-parameters.</p>
<p><tt><strong>void diff_sum (int x, int y, int *diff, int *sum) {<br> *diff = x > y ? x - y : y - x;<br> *sum = x+y;<br> }</strong></tt></p>
<p>With C, this function would be invoked with something like:</p>
<p><tt><strong>{<br> int diff,sum;<br> diff_sum(x,y,&amp;diff,&amp;sum);<br> }</strong></tt></p>
<p>To call the C-function <b><tt>diff_sum</tt></b> from ML we use the function <b><tt>call4ret2</tt></b>. This allows us to call C-functions that have four parameters, the last two being return-parameters.</p>
<p><tt><strong>val call4ret2 : sym<br> -> 'a Conversion * 'b Conversion -> 'c Conversion * 'd Conversion<br> -> 'a * 'b -> 'c * 'd</strong></tt></p>
<p>Now we can write an ML wrapper function:</p>
<p><strong><tt>fun diff_sum x y =<br> call4ret2 (get "diff_sum") (INT,INT) (INT,INT) (x,y);</tt></strong></p>
<p>Evaluating <b><tt>(diff_sum 13 50)</tt></b> results in <b><tt>(37,63)</tt></b>.</p>
<h2><a name="9 A family of callnretr functions">9 A family of <tt>call</tt><i>n</i><tt>ret</tt><i>r</i> functions</a></h2>
<p>There is a limited family of <b><tt>call</tt><i>n</i><tt>ret</tt><i>r</i></b> functions defined to call C-functions that have <i>n - r input-parameters</i> followed by <i>r return-parameters</i>. This family contains functions for <i>n</i> ranging from 1 to 5, with <i>r</i> as either 1 or 2. (Exception: there is no <b><tt>call1ret2</tt></b>, because a function with one parameter cannot have two return-parameters.)</p>
<p><tt><b>val call1ret1 : sym -> unit -> 'a Conversion -> unit -> 'a<br> val call<em>n</em>ret<em>r</em> :<br> sym -> 'a<small>1</small> Conversion * ... * 'a<small>n-r</small> Conversion<br> -> 'a<small>n-r+1</small> Conversion * ... * 'a<small>n</small> Conversion<br> -> 'a<small>1</small> * ... * 'a<small>n-r</small> -> 'a<small>n-r+1</small> * ...
* 'a<small>n</small></b></tt></p>
<p>For other combinations of <i>n</i> and <i>r</i>, for a return-parameter that is not at the end of the parameter list, or when the actual return result is needed together with return-parameters, the lower level calling mechanism can be used (<a href="CInterface.html#12 Lower Level Calling Mechanism: call_sym">Section 12</a>).</p>
<h2><a name="10 C structures">10 C structures</a></h2>
<p>C-functions which take or return C structure values may also be called. For example, the following piece of C defines a <b><tt>typedef</tt></b>ed structure called <b><tt>Point</tt></b>, and a function called <b><tt>addPoint</tt></b> which manipulates these <b><tt>Point</tt></b>s.</p>
<p><b><tt>typedef struct {int x; int y;} Point;</tt></b></p>
<p><b><tt>Point addPoint (Point p1, Point p2) {<br> p1.x += p2.x;<br> p1.y += p2.y;<br> return p1;<br> }</tt></b></p>
<p>To create the necessary <b><tt>Conversion</tt></b> for <b><tt>Point</tt></b>s we can use the <b><tt>Conversional</tt></b> <b><tt>STRUCT2</tt></b>. This function takes a pair of <b><tt>Conversion</tt></b>s and returns a new <b><tt>Conversion</tt></b> suitable for a C structure containing those types. The type of <b><tt>STRUCT2</tt></b> is:</p>
<p><b><tt>val STRUCT2 : 'a Conversion * 'b Conversion -> ('a * 'b) Conversion</tt></b></p>
<p>We now define an ML wrapper function for <b><tt>addPoint</tt></b>:</p>
<p><tt><strong>val POINT = STRUCT2 (INT,INT);<br> fun addPoint p1 p2 =<br> call2 (get "addPoint") (POINT,POINT) POINT (p1, p2);</strong></tt></p>
<p>Now, <b><tt>(addPoint (5, 6) (8, 9))</tt></b> evaluates to <b><tt>(13, 15)</tt></b>.</p>
<h2><a name="11 A family of structn Conversionals">11 A family of <tt>struct</tt><i>n</i> Conversionals</a></h2>
<p>There is a family of <b><tt>struct</tt></b><i>n</i> functions from <b><tt>struct2</tt></b> to <b><tt>struct9</tt></b>.</p>
<p><tt><strong>val structn : 'a<small>1</small> Conversion * ...
* 'a<small>n</small> Conversion<br> -> ('a<small>1</small> * ... * 'a<small>n</small>) Conversion</strong></tt></p>
<p>Manipulation of structures with more than nine components can be achieved with the use of the lower level calling mechanism; see <a href="CInterface.html#12 Lower Level Calling Mechanism: call_sym">Section 12</a>.</p>
<h2><a name="12 Lower Level Calling Mechanism: call_sym">12 Lower Level Calling Mechanism: <tt>call_sym</tt></a></h2>
<p>Occasionally it is necessary to access the dynamic calling mechanism at a lower level. The collection of functions <b><tt>call</tt></b><i>n</i> and <b><tt>call</tt><i>n</i><tt>ret</tt><i>r</i></b> are all defined in terms of the function <b><tt>call_sym</tt></b>, which has the following type:</p>
<p><b><tt>val call_sym : sym -> (Ctype * vol) list -> Ctype -> vol</tt></b></p>
<p>The second argument to <b><tt>call_sym</tt></b> is a list of <b><tt>Ctype/vol</tt></b> pairs, which allows C-functions of any number of arguments to be called. The third argument is the <b><tt>Ctype</tt></b> of the result. This function is more cumbersome to use than the <b><tt>call</tt><i>n</i></b> and <b><tt>call</tt><i>n</i><tt>ret</tt><i>r</i></b> functions because two stages have been separated: specification of the C-type, and conversion between ML-values and C-values (<b><tt>vol</tt></b>s).
The specification of the C-type is achieved by using a constructor of the datatype <b><tt>Ctype</tt></b>:</p>
<p><tt><strong>datatype Ctype =<br> Cchar | Cdouble | Cfloat | Cint | Clong | Cshort | Cvoid<br> | Cpointer of Ctype<br> | Cstruct of Ctype list<br> | Cfunction of Ctype list * Ctype</strong></tt></p>
<p>The following collection of functions is used to convert from and to values of type <b><tt>vol</tt></b>.</p>
<p><tt><b>val fromCstring : vol -> string<br> val fromCchar : vol -> char<br> val fromCdouble : vol -> real<br> val fromCfloat : vol -> real<br> val fromCint : vol -> int<br> val fromClong : vol -> int<br> val fromCshort : vol -> int<br> val toCstring : string -> vol<br> val toCchar : char -> vol<br> val toCdouble : real -> vol<br> val toCfloat : real -> vol<br> val toCint : int -> vol<br> val toClong : int -> vol<br> val toCshort : int -> vol</b></tt></p>
<p>For example, this is how to define <b><tt>diff</tt></b> directly in terms of <b><tt>call_sym</tt></b>.</p>
<p><tt><strong>fun diff x y =<br> fromCint (call_sym (get "difference")<br> [(Cint, toCint x),(Cint, toCint y)] Cint)</strong></tt></p>
<p>Manipulation of C structures is achieved with the following two functions:</p>
<p><tt><b>val make_struct : (Ctype * vol) list -> vol<br> val break_struct : Ctype list -> vol -> vol list</b></tt></p>
<h2><a name="13 Creating New Conversions">13 Creating New <tt>Conversion</tt>s</a></h2>
<p>Recall that a <b><tt>Conversion</tt></b> encapsulates three things: an underlying C-type; a function to convert from the C-value (of type <b><tt>vol</tt></b>) to an ML value of a given type; and a function which converts from the ML value back into the C-value (of type <b>vol).
</b>Sometimes it is useful to be able to create new <b><tt>Conversions</tt></b>, or to retrieve the components from an existing <b><tt>Conversion</tt></b>.</p> <p><tt><b>val mkConversion</b> : <b>(vol -> 'a) -> ('a -> vol) -> Ctype</b> -> <b>'a Conversion <br> val breakConversion</b> : <b>'a Conversion -> (vol -> 'a) * ('a</b> -> <b>vol) * Ctype</b></tt></p> <p>The function <b><tt>mkConversion</tt></b> creates a new <b><tt>Conversion</tt></b> from its three components. The function <b><tt>breakConversion</tt></b> takes an existing <b><tt>Conversion</tt></b> and returns a triple containing the components. For example, the standard conversion <b><tt>INT</tt></b> might be defined as:</p> <p><strong><tt>val INT = mkConversion fromCint toCint Cint</tt></strong></p> <p>A good reason for creating a new <b><tt>Conversion</tt></b> is to give a different ML type to values of type <b><tt>vol</tt></b> which are to be used in a particular way. For example, we may be interfacing to a collection of C-functions that take/return pointers which are being used to implement a particular abstract type, for example a tree node. By creating a new conversion we can use the ML type system to avoid mixing values of this new type with other normal <b><tt>vol</tt></b>s.</p> <p><strong><tt>abstype node = Node of vol<br> with val NODE = mkConversion Node (fn (Node n) => n) (Cpointer Cvoid)<br> end</tt></strong></p> <p><strong><tt>fun lookupNode s = call1 (get "lookupNode") STRING NODE s<br> fun printNode n = call1 (get "printNode") NODE VOID n</tt></strong></p> <p>The types of these two functions are:</p> <p><tt><b>val lookupNode</b> : <b>string -> node<br> val printNode</b> : <b>node -> unit</b></tt></p> <h2><a name="14 Enumerated Types">14 Enumerated Types</a></h2> <p>Another reason for creating a new <b>Conversion</b> is for when we want to call a C-function that takes/returns values of an enumerated type. 
For example, suppose <b><tt>colour</tt></b> is declared as:</p>
<p><tt><strong>typedef enum {<br> white,<br> red = 5,<br> green,<br> blue,<br> /* leave room for extra colours in the future */<br> black = 100<br> } colour;</strong></tt></p>
<p>This example shows that C enumerations are just sugar for integers, so much so that we can even specify which constructors correspond to which integer values. When an enumeration specifies integer values for only some of its constructors (as in <b><tt>colour</tt></b> above), the first constructor is assigned 0 if it is unspecified, and every other unspecified constructor is assigned one more than the value of the constructor before it, e.g. <b><tt>green</tt></b> is 6.</p>
<p>We would like to convert C-enumerations like <b><tt>colour</tt></b> into an equivalent ML datatype, together with functions to convert between values of the datatype and ML integers. This can be achieved automatically by using the script <b><tt>proc-enums</tt></b>, contained in the scripts subdirectory of the source tree.</p>
<p><tt><strong>Usage: proc-enums &lt;struct-name&gt; {&lt;filename&gt;}+</strong></tt></p>
<p>The first parameter to <b><tt>proc-enums</tt></b> is the name of the generated ML structure. The remaining parameters specify C-files in which to search for C <b><tt>typedef</tt></b>ed enumeration declarations. No formatting conventions are assumed, i.e. arbitrary white space and comments are allowed within the declaration. Other declarations and definitions are ignored. The generated file is named <b><tt>&lt;struct-name&gt;.ML</tt></b>.</p>
<p>For the colour example, we would type <b><tt>'proc-enums colour colour.h'</tt></b> at the shell prompt.
This would generate a file <b><tt>colour.ML</tt></b> containing the following ML definitions.</p>
<p><strong><tt>structure colour = struct</tt></strong></p>
<p><strong><tt>datatype colour<br> = white<br> | red<br> | green<br> | blue<br> | black</tt></strong></p>
<p><strong><tt>exception Int2colour</tt></strong></p>
<p><strong><tt>fun int2colour i = case i of <br> 0 => white<br> | 5 => red<br> | 6 => green<br> | 7 => blue<br> | 100 => black<br> | _ => raise Int2colour</tt></strong></p>
<p><strong><tt>fun colour2int i = case i of <br> white => 0<br> | red => 5<br> | green => 6<br> | blue => 7<br> | black => 100</tt></strong></p>
<p><strong><tt>end (* struct *)</tt></strong></p>
<p>Once these definitions have been generated we can create a new <b><tt>Conversion</tt></b>:</p>
<p><strong><tt>val COLOUR =<br> mkConversion (int2colour o fromCint) (toCint o colour2int) Cint;</tt></strong></p>
<p>Now, suppose we have a C-function <b><tt>nameOfColour</tt></b>,</p>
<p><tt><strong>#include "colour.h"<br> char* nameOfColour (colour c) {<br> switch (c) {<br> case white: return "white";<br> case red: return "red";<br> case green: return "green";<br> case blue: return "blue";<br> case black: return "black";<br> default: return "Error: No such colour";<br> }<br> }</strong></tt></p>
<p>we can write an ML wrapper for this function as:</p>
<p><tt><strong>fun nameOfColour c =<br> call1 (get "nameOfColour") COLOUR STRING c;</strong></tt></p>
<p>Now we can execute <b><tt>(nameOfColour blue)</tt></b>, which evaluates to the ML string <b><tt>"blue"</tt></b>.</p>
<h2><a name="15 C Programming Primitives">15 C Programming Primitives</a></h2>
<p>Occasionally, we need to manipulate C-values in greater detail.
The following example shows how an ML wrapper can be written for the C-function <b><tt>diff_sum</tt></b>, without using a <b><tt>call</tt><i>n</i><tt>ret</tt><i>r</i></b> function.</p>
<p><tt><strong>fun diff_sum x y =<br> let val diff = alloc 1 Cint<br> val sum = alloc 1 Cint<br> in<br> call4 (get "diff_sum") (INT,INT,POINTER,POINTER) VOID<br> (x, y, address diff, address sum);<br> (fromCint diff, fromCint sum)<br> end</strong></tt></p>
<p>This example uses two of a collection of six ML functions allowing basic C-programming.</p>
<p><tt><strong>val sizeof : Ctype -> int<br> val alloc : int -> Ctype -> vol<br> val address : vol -> vol<br> val deref : vol -> vol<br> val assign : Ctype -> vol -> vol -> unit<br> val offset : int -> Ctype -> vol -> vol</strong></tt></p>
<p><i>These functions are intrinsically unsafe: incorrect usage can cause the ML session to die.</i></p>
<p>The application <b><tt>(sizeof</tt></b> <i>t</i><b><tt>)</tt></b> returns the size (in bytes) of the <b><tt>Ctype</tt></b> <i>t</i>.</p>
<p>The application <b><tt>(alloc</tt></b> <i>n t</i><b><tt>)</tt></b> returns a <b><tt>vol</tt></b> encapsulating some freshly allocated memory of size <b><tt>(</tt></b><i>n</i>*<b><tt>sizeof</tt></b> <i>t</i><b><tt>)</tt></b> bytes. Unlike allocation facilities in C, which return a pointer to the newly allocated space, the result of <b><tt>alloc</tt></b> encapsulates the space directly.</p>
<p><i>The underlying implementation of</i> <b><tt>alloc</tt></b> <i>does in fact use</i> <b><tt>malloc</tt></b> <i>to gain some newly allocated space, and does in fact consist of a pointer to this space. However, all the above ML functions work at an extra level of indirection to the corresponding C-operation. This extra indirection is removed before the C-value is passed to a real C-function.</i></p>
<p>The application <b><tt>(address</tt></b> <i>v</i><b><tt>)</tt></b> returns a new <b><tt>vol</tt></b> containing the address of <i>v</i>.
This function corresponds to the C operator <b><tt>&amp;</tt></b>.</p>
<p>The application <b><tt>(deref</tt></b> <i>v</i><b><tt>)</tt></b> returns a <b><tt>vol</tt></b> which is the result of dereferencing the address contained in <i>v</i>. This function corresponds to the C operator <b><tt>*</tt></b>. If <i>v</i> is not a valid address, the ML session will die with a segmentation error.</p>
<p>The application <b><tt>(assign</tt></b> <i>t v w</i><b><tt>)</tt></b> copies <b><tt>(sizeof</tt></b> <i>t</i><b><tt>)</tt></b> bytes of data from <i>w</i> into <i>v</i>. This function corresponds to the C operator <b><tt>=</tt></b>, or the standard C function <b><tt>memcpy</tt></b>.</p>
<p>The application <b><tt>(offset</tt></b> <i>i t v</i><b><tt>)</tt></b> returns a new <b><tt>vol</tt></b> that is offset <b><tt>(</tt></b><i>i</i>*<b><tt>sizeof</tt></b> <i>t</i><b><tt>)</tt></b> bytes in memory from <i>v</i>. The closest corresponding operator in C is structure member access <tt>(.)</tt>. Pointer arithmetic can be achieved by combining the function <b><tt>offset</tt></b> with the functions <b><tt>address</tt></b> and <b><tt>deref</tt></b>.</p>
<p>The functions <b><tt>address</tt></b> and <b><tt>deref</tt></b> create the same aliasing as the corresponding C operators. For example, the following sequence of C statements causes the final value of <b><tt>i</tt></b> to be 123:</p>
<p><tt><strong>{<br> int i = 0;<br> int *p = &amp;i;<br> *p = 123;<br> }</strong></tt></p>
<p>Likewise, the following sequence of ML statements leaves the value 123 in <b><tt>i</tt></b>:</p>
<p><tt><strong>> val i = toCint 0;<br> > val p = address i;<br> > assign Cint (deref p) (toCint 123);<br> > fromCint i;<br> val it = 123</strong></tt></p>
<h2><a name="16 Example: Quicksort">16 Example: Quicksort</a></h2>
<p>The following example shows how the C-programming primitives are intended to be used. The example involves interfacing to the standard C-function <b><tt>qsort</tt></b>.
On many Unix systems this function can be retrieved from a dynamic library in <b><tt>/usr/lib</tt></b>.</p>
<p><strong><tt>val getC = get_sym "/usr/lib/libc.so.1.7";</tt></strong></p>
<p>The function <b><tt>qsort</tt></b> takes four parameters.</p>
<p><strong><tt>void qsort (void *base, int nel, int width, int (*compar)());</tt></strong></p>
<p>The first parameter, <b><tt>base</tt></b>, is a pointer to an array of elements to be sorted; the second parameter, <b><tt>nel</tt></b>, is the number of elements in the array; the third parameter, <b><tt>width</tt></b>, is the size (in bytes) of each element; the fourth parameter, <b><tt>compar</tt></b>, is a comparison function which is passed pointers to two elements and must return an integer less than, equal to, or greater than zero. See the <b><tt>qsort</tt></b> manual page for more details.</p>
<p>In our example we wish to sort pairs of strings. The first string is the key to sort on, while the second string is arbitrary data. In C we would represent this pair as a structure, and would write the comparison function <b><tt>compare</tt></b> using <b><tt>strcmp</tt></b>.</p>
<p><strong><tt>typedef struct {<br> char *key;<br> char *data;<br> } pair;</tt></strong></p>
<p><strong><tt>int compare (pair *x, pair *y) {<br> return strcmp(x->key, y->key);<br> }</tt></strong></p>
<p>We want to define an ML wrapper <b><tt>qsort</tt></b> which takes a list of string pairs and returns the sorted list. Other than the C-programming primitives, the only additional function needed is <b><tt>volOfSym</tt></b>. This is needed to supply the fourth argument to <b><tt>qsort</tt></b>, a pointer to a comparison function.
The application <b><tt>(volOfSym</tt></b> <i>s</i><b><tt>)</tt></b> returns the <b><tt>vol</tt></b> encapsulated in the symbol <i>s</i>.</p>
<p><strong><tt>val volOfSym : sym -> vol</tt></strong></p>
<p>We can now define <b><tt>qsort</tt></b>, together with two auxiliary functions <b><tt>fill</tt></b> and <b><tt>read</tt></b>.</p>
<p><strong><tt>val (fromPair,toPair,pairType) = breakConversion (STRUCT2 (STRING,STRING));</tt></strong></p>
<p><strong><tt>fun fill p [] = ()<br> | fill p ((key,data)::xs) =<br> (assign pairType p (toPair (key,data)); <br> fill (offset 1 pairType p) xs)</tt></strong></p>
<p><strong><tt>fun read p 0 = []<br> | read p n = fromPair p :: read (offset 1 pairType p) (n-1)</tt></strong></p>
<p><strong><tt>fun qsort xs =<br> let<br> val len = length xs<br> val table = alloc len pairType<br> val compare = volOfSym (get "compare")<br> val sort = call4 (getC "qsort") (POINTER,INT,INT,POINTER) VOID<br> in<br> fill table xs;<br> sort (address table, len, sizeof pairType, compare);<br> read table len<br> end</tt></strong></p>
<p>The function <b><tt>fill</tt></b> takes a pointer into some allocated space (which must be big enough), and a string pair list. It fills the array with structures created from the list. The function <b><tt>offset</tt></b> is used to move along the allocated area.</p>
<p>The function <b><tt>read</tt></b> is the inverse of <b><tt>fill</tt></b>. It takes an array of structures and an integer <i>n</i> and reconstructs a list of <i>n</i> string pairs.</p>
<p>The ML function <b><tt>qsort</tt></b> operates by first allocating enough space for the array of structures, then using <b><tt>fill</tt></b> to fill this array from the argument list <b><tt>xs</tt></b>. A call to the C-function <b><tt>qsort</tt></b> is made to sort this array. Notice how the first argument to <b><tt>sort</tt></b> is <b><tt>(address table)</tt></b>, which generates the required array pointer for the C-function <b><tt>qsort</tt></b>.
Finally, a list is reconstructed from the sorted array using <b><tt>read</tt></b>.</p>
<p>Now we can evaluate the following:</p>
<p><tt><strong>> qsort [("one","fred"), ("two", "dave"), ("three", "bob"), ("four", "mary")];<br> val it =<br> [("four", "mary"), ("one", "fred"), ("three", "bob"), ("two", "dave")]</strong></tt></p>
<h2><a name="17 Volatile Implementation">17 Volatile Implementation</a></h2>
<p>The C-data contained in a volatile is managed in a separate space from normal ML data, which is stored in the heap. There are two reasons for this. First, data contained in the ML heap is liable to change its address during garbage collection, and C-functions cannot cope with this. The second reason is safety. We do not want foreign C-functions to obtain a pointer into the ML heap. Because the C-function is running in the same Unix process, it is always possible for it to corrupt the ML heap; however, the most usual cause of corruption is an <i>off-by-one</i> error. If the C-data were stored in the ML heap, such an error would corrupt a neighbouring heap cell.</p>
<p>Every ML value of type <b><tt>vol</tt></b> has two components: (1) an ML heap cell; (2) a slot in the <b><tt>vols</tt></b> array, a runtime system variable declared and managed in the file <b>Driver/foreign.c</b>. The ML heap cell indexes a slot in the <b><tt>vols</tt></b> array. This slot contains three items: (1) a back pointer, pointing at the corresponding ML heap cell; (2) a C-pointer, pointing to the actual C-data; (3) a boolean, indicating whether this volatile <i>owns</i> the space pointed to by the C-pointer.</p>
<p>The combination of the <b><tt>vols</tt></b> array index and the back pointer found there enables the validity of a volatile to be checked when it is dereferenced. If the volatile is invalid then the exception <b><tt>Foreign</tt></b> is raised.</p>
<p>The collection of functions that convert ML values into <b><tt>vol</tt></b>s (e.g.
<b><tt>toCint</tt></b> and <b><tt>toCfloat</tt></b>), together with the functions <b><tt>alloc</tt></b> and <b><tt>address</tt></b>, create new volatiles; that is, volatiles that <i>own</i> the space pointed to by the C-pointer in their <b><tt>vols</tt></b> array slot. This space is obtained from a call to <tt><b>malloc</b></tt>. There is always exactly one owner of any piece of <b><tt>malloc</tt></b>ed space. The <b><tt>deref</tt></b> and <b><tt>offset</tt></b> functions create <b><tt>vol</tt></b>s that point to previously allocated space and so are not regarded as owners.</p>
<p>Volatiles are garbage collected in such a way that <b><tt>malloc</tt></b>ed space is freed when there are no remaining references to the ML cell which owns that space. However, by itself this scheme is too aggressive. For example:</p>
<p><strong><tt>val a = address (toCint 999);</tt></strong></p>
<p>When a garbage collection occurs, although the space owned by <b><tt>a</tt></b> (containing the pointer) will be preserved, the space allocated to hold the C-integer 999 will be reclaimed because there are no references to its owner, the anonymous expression <b><tt>(toCint 999)</tt></b>.</p>
<p>If we now evaluate the expression <b><tt>(fromCint (deref a))</tt></b>, it will result in whatever garbage happens to be pointed to by the now dangling C-pointer contained in the volatile <b><tt>a</tt></b>. What is needed is a way to ensure that the volatile <b><tt>a</tt></b> holds an ML reference to the anonymous volatile <b><tt>(toCint 999)</tt></b> for the duration of its lifetime. In a similar manner, any volatile that does not own its own space, for example the result of the expression <b><tt>(deref (address (toCint 999)))</tt></b>, needs to hold a reference to the owner of the space it points at.
This scheme of maintaining references is implemented in <b><tt>Volatile.ML</tt></b> in the directory <b><tt>Prelude/Foreign</tt></b>, and is completely transparent to the user.</p>
<p>In some unusual situations we might want to allocate some space which persists after all ML references to it have disappeared. For example, we might have to allocate space for a buffer, and then hand a pointer to this buffer over to a foreign C-function. This can be handled in two ways. We could carefully maintain an ML reference to the <b><tt>vol</tt></b> encapsulating the buffer, so that its space is never reclaimed. Alternatively, we could use the dynamic library manipulation functions to call the real C-function <b><tt>malloc</tt></b> directly.</p> </body> </html>