<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <HTML> <HEAD> <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> <META name="GENERATOR" content="hevea 1.06-7 of 2001-11-14"> <TITLE> The core language </TITLE> </HEAD> <BODY TEXT=black BGCOLOR=white> <A HREF="manual002.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A> <A HREF="index.html"><IMG SRC ="contents_motif.gif" ALT="Contents"></A> <A HREF="manual004.html"><IMG SRC ="next_motif.gif" ALT="Next"></A> <HR> <TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%"> <TR><TD BGCOLOR="#2de52d"><DIV ALIGN=center><TABLE> <TR><TD><A NAME="htoc2"><B><FONT SIZE=6>Chapter 1</FONT></B></A></TD> <TD WIDTH="100%" ALIGN=center><B><FONT SIZE=6>The core language</FONT></B></TD> </TR></TABLE></DIV></TD> </TR></TABLE> <A NAME="c:core-xamples"></A> <BR> This part of the manual is a tutorial introduction to the Objective Caml language. Good familiarity with programming in a conventional language (say, Pascal or C) is assumed, but no prior exposure to functional languages is required. The present chapter introduces the core language. Chapter <A HREF="manual005.html#c:objectexamples">3</A> deals with the object-oriented features, and chapter <A HREF="manual004.html#c:moduleexamples">2</A> with the module system.<BR> <BR> <TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%"> <TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE> <TR><TD><A NAME="htoc3"><B><FONT SIZE=5>1.1</FONT></B></A></TD> <TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Basics</FONT></B></TD> </TR></TABLE></DIV></TD> </TR></TABLE> <BR> For this overview of Caml, we use the interactive system, which is started by running <TT>ocaml</TT> from the Unix shell, or by launching the <TT>OCamlwin.exe</TT> application under Windows. 
This tutorial is presented as the transcript of a session with the interactive system: lines starting with <TT>#</TT> represent user input; the system responses are printed below, without a leading <TT>#</TT>.<BR> <BR> Under the interactive system, the user types Caml phrases, terminated by <TT>;;</TT>, in response to the <TT>#</TT> prompt, and the system compiles them on the fly, executes them, and prints the outcome of evaluation. Phrases are either simple expressions, or <TT>let</TT> definitions of identifiers (either values or functions). <PRE><FONT COLOR=black>#<FONT COLOR=blue>1+2*3;; <FONT COLOR=maroon>- : int = 7 <FONT COLOR=black>#<FONT COLOR=blue>let pi = 4.0 *. atan 1.0;; <FONT COLOR=maroon>val pi : float = 3.14159265359 <FONT COLOR=black>#<FONT COLOR=blue>let square x = x *. x;; <FONT COLOR=maroon>val square : float -> float = <fun> <FONT COLOR=black>#<FONT COLOR=blue>square(sin pi) +. square(cos pi);; <FONT COLOR=maroon>- : float = 1. </FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></PRE> The Caml system computes both the value and the type for each phrase. Even function parameters need no explicit type declaration: the system infers their types from their usage in the function. Notice also that integers and floating-point numbers are distinct types, with distinct operators: <TT>+</TT> and <TT>*</TT> operate on integers, but <TT>+.</TT> and <TT>*.</TT> operate on floats. 
<PRE><FONT COLOR=black>#<FONT COLOR=blue><U>1.0</U> * 2;; <FONT COLOR=maroon>This expression has type float but is here used with type int </FONT></FONT></FONT></PRE> Recursive functions are defined with the <TT>let rec</TT> binding: <PRE><FONT COLOR=black>#<FONT COLOR=blue>let rec fib n = if n < 2 then 1 else fib(n-1) + fib(n-2);; <FONT COLOR=maroon>val fib : int -> int = <fun> <FONT COLOR=black>#<FONT COLOR=blue>fib 10;; <FONT COLOR=maroon>- : int = 89 </FONT></FONT></FONT></FONT></FONT></FONT></PRE> <TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%"> <TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE> <TR><TD><A NAME="htoc4"><B><FONT SIZE=5>1.2</FONT></B></A></TD> <TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Data types</FONT></B></TD> </TR></TABLE></DIV></TD> </TR></TABLE> <BR> In addition to integers and floating-point numbers, Caml offers the usual basic data types: booleans, characters, and character strings. <PRE><FONT COLOR=black>#<FONT COLOR=blue>(1 < 2) = false;; <FONT COLOR=maroon>- : bool = false <FONT COLOR=black>#<FONT COLOR=blue>'a';; <FONT COLOR=maroon>- : char = 'a' <FONT COLOR=black>#<FONT COLOR=blue>"Hello world";; <FONT COLOR=maroon>- : string = "Hello world" </FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></PRE> Predefined data structures include tuples, arrays, and lists. General mechanisms for defining your own data structures are also provided. They will be covered in more detail later; for now, we concentrate on lists. Lists are either given in extension as a bracketed list of semicolon-separated elements, or built from the empty list <TT>[]</TT> (pronounced ``nil'') by adding elements in front using the <TT>::</TT> (``cons'') operator. 
<PRE><FONT COLOR=black>#<FONT COLOR=blue>let l = ["is"; "a"; "tale"; "told"; "etc."];; <FONT COLOR=maroon>val l : string list = ["is"; "a"; "tale"; "told"; "etc."] <FONT COLOR=black>#<FONT COLOR=blue>"Life" :: l;; <FONT COLOR=maroon>- : string list = ["Life"; "is"; "a"; "tale"; "told"; "etc."] </FONT></FONT></FONT></FONT></FONT></FONT></PRE> As with all other Caml data structures, lists do not need to be explicitly allocated and deallocated from memory: all memory management is entirely automatic in Caml. Similarly, there is no explicit handling of pointers: the Caml compiler silently introduces pointers where necessary.<BR> <BR> As with most Caml data structures, inspecting and destructuring lists is performed by pattern-matching. List patterns have the exact same shape as list expressions, with identifiers representing unspecified parts of the list. As an example, here is insertion sort on a list: <PRE><FONT COLOR=black>#<FONT COLOR=blue>let rec sort lst = match lst with [] -> [] | head :: tail -> insert head (sort tail) and insert elt lst = match lst with [] -> [elt] | head :: tail -> if elt <= head then elt :: lst else head :: insert elt tail ;; <FONT COLOR=maroon>val sort : 'a list -> 'a list = <fun> val insert : 'a -> 'a list -> 'a list = <fun> <FONT COLOR=black>#<FONT COLOR=blue>sort l;; <FONT COLOR=maroon>- : string list = ["a"; "etc."; "is"; "tale"; "told"] </FONT></FONT></FONT></FONT></FONT></FONT></PRE> The type inferred for <TT>sort</TT>, <TT>'a list -> 'a list</TT>, means that <TT>sort</TT> can actually apply to lists of any type, and returns a list of the same type. The type <TT>'a</TT> is a <EM>type variable</EM>, and stands for any given type. The reason why <TT>sort</TT> can apply to lists of any type is that the comparisons (<TT>=</TT>, <TT><=</TT>, etc.) are <EM>polymorphic</EM> in Caml: they operate between any two values of the same type. This makes <TT>sort</TT> itself polymorphic over all list types. 
<PRE><FONT COLOR=black>#<FONT COLOR=blue>sort [6;2;5;3];; <FONT COLOR=maroon>- : int list = [2; 3; 5; 6] <FONT COLOR=black>#<FONT COLOR=blue>sort [3.14; 2.718];; <FONT COLOR=maroon>- : float list = [2.718; 3.14] </FONT></FONT></FONT></FONT></FONT></FONT></PRE> The <TT>sort</TT> function above does not modify its input list: it builds and returns a new list containing the same elements as the input list, in ascending order. There is actually no way in Caml to modify a list in place once it is built: we say that lists are <EM>immutable</EM> data structures. Most Caml data structures are immutable, but a few (most notably arrays) are <EM>mutable</EM>, meaning that they can be modified in place at any time.<BR> <BR> <TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%"> <TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE> <TR><TD><A NAME="htoc5"><B><FONT SIZE=5>1.3</FONT></B></A></TD> <TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Functions as values</FONT></B></TD> </TR></TABLE></DIV></TD> </TR></TABLE> <BR> Caml is a functional language: functions in the full mathematical sense are supported and can be passed around freely just like any other piece of data. For instance, here is a <TT>deriv</TT> function that takes any float function as argument and returns an approximation of its derivative function: <PRE><FONT COLOR=black>#<FONT COLOR=blue>let deriv f dx = function x -> (f(x +. dx) -. f(x)) /. 
dx;; <FONT COLOR=maroon>val deriv : (float -> float) -> float -> float -> float = <fun> <FONT COLOR=black>#<FONT COLOR=blue>let sin' = deriv sin 1e-6;; <FONT COLOR=maroon>val sin' : float -> float = <fun> <FONT COLOR=black>#<FONT COLOR=blue>sin' pi;; <FONT COLOR=maroon>- : float = -1.00000000014 </FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></PRE> Even function composition is definable: <PRE><FONT COLOR=black>#<FONT COLOR=blue>let compose f g = function x -> f(g(x));; <FONT COLOR=maroon>val compose : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b = <fun> <FONT COLOR=black>#<FONT COLOR=blue>let cos2 = compose square cos;; <FONT COLOR=maroon>val cos2 : float -> float = <fun> </FONT></FONT></FONT></FONT></FONT></FONT></PRE> Functions that take other functions as arguments are called ``functionals'', or ``higher-order functions''. Functionals are especially useful to provide iterators or similar generic operations over a data structure. For instance, the standard Caml library provides a <TT>List.map</TT> functional that applies a given function to each element of a list, and returns the list of the results: <PRE><FONT COLOR=black>#<FONT COLOR=blue>List.map (function n -> n * 2 + 1) [0;1;2;3;4];; <FONT COLOR=maroon>- : int list = [1; 3; 5; 7; 9] </FONT></FONT></FONT></PRE> This functional, along with a number of other list and array functionals, is predefined because it is often useful, but there is nothing magic with it: it can easily be defined as follows. 
<PRE><FONT COLOR=black>#<FONT COLOR=blue>let rec map f l = match l with [] -> [] | hd :: tl -> f hd :: map f tl;; <FONT COLOR=maroon>val map : ('a -> 'b) -> 'a list -> 'b list = <fun> </FONT></FONT></FONT></PRE> <TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%"> <TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE> <TR><TD><A NAME="htoc6"><B><FONT SIZE=5>1.4</FONT></B></A></TD> <TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Records and variants</FONT></B></TD> </TR></TABLE></DIV></TD> </TR></TABLE> <A NAME="s:tut-recvariants"></A><BR> User-defined data structures include records and variants. Both are defined with the <TT>type</TT> declaration. Here, we declare a record type to represent rational numbers. <PRE><FONT COLOR=black>#<FONT COLOR=blue>type ratio = {num: int; denum: int};; <FONT COLOR=maroon>type ratio = { num : int; denum : int; } <FONT COLOR=black>#<FONT COLOR=blue>let add_ratio r1 r2 = {num = r1.num * r2.denum + r2.num * r1.denum; denum = r1.denum * r2.denum};; <FONT COLOR=maroon>val add_ratio : ratio -> ratio -> ratio = <fun> <FONT COLOR=black>#<FONT COLOR=blue>add_ratio {num=1; denum=3} {num=2; denum=5};; <FONT COLOR=maroon>- : ratio = {num = 11; denum = 15} </FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></PRE> The declaration of a variant type lists all possible shapes for values of that type. Each case is identified by a name, called a constructor, which serves both for constructing values of the variant type and inspecting them by pattern-matching. Constructor names are capitalized to distinguish them from variable names (which must start with a lowercase letter). 
For instance, here is a variant type for doing mixed arithmetic (integers and floats): <PRE><FONT COLOR=black>#<FONT COLOR=blue>type number = Int of int | Float of float | Error;; <FONT COLOR=maroon>type number = Int of int | Float of float | Error </FONT></FONT></FONT></PRE> This declaration expresses that a value of type <TT>number</TT> is either an integer, a floating-point number, or the constant <TT>Error</TT> representing the result of an invalid operation (e.g. a division by zero).<BR> <BR> Enumerated types are a special case of variant types, where all alternatives are constants: <PRE><FONT COLOR=black>#<FONT COLOR=blue>type sign = Positive | Negative;; <FONT COLOR=maroon>type sign = Positive | Negative <FONT COLOR=black>#<FONT COLOR=blue>let sign_int n = if n >= 0 then Positive else Negative;; <FONT COLOR=maroon>val sign_int : int -> sign = <fun> </FONT></FONT></FONT></FONT></FONT></FONT></PRE> To define arithmetic operations for the <TT>number</TT> type, we use pattern-matching on the two numbers involved: <PRE><FONT COLOR=black>#<FONT COLOR=blue>let add_num n1 n2 = match (n1, n2) with (Int i1, Int i2) -> (* Check for overflow of integer addition *) if sign_int i1 = sign_int i2 && sign_int(i1 + i2) <> sign_int i1 then Float(float i1 +. float i2) else Int(i1 + i2) | (Int i1, Float f2) -> Float(float i1 +. f2) | (Float f1, Int i2) -> Float(f1 +. float i2) | (Float f1, Float f2) -> Float(f1 +. f2) | (Error, _) -> Error | (_, Error) -> Error;; <FONT COLOR=maroon>val add_num : number -> number -> number = <fun> <FONT COLOR=black>#<FONT COLOR=blue>add_num (Int 123) (Float 3.14159);; <FONT COLOR=maroon>- : number = Float 126.14159 </FONT></FONT></FONT></FONT></FONT></FONT></PRE> The most common usage of variant types is to describe recursive data structures. 
Consider for example the type of binary trees: <PRE><FONT COLOR=black>#<FONT COLOR=blue>type 'a btree = Empty | Node of 'a * 'a btree * 'a btree;; <FONT COLOR=maroon>type 'a btree = Empty | Node of 'a * 'a btree * 'a btree </FONT></FONT></FONT></PRE> This definition reads as follows: a binary tree containing values of type <TT>'a</TT> (an arbitrary type) is either empty, or is a node containing one value of type <TT>'a</TT> and two subtrees also containing values of type <TT>'a</TT>, that is, two <TT>'a btree</TT>.<BR> <BR> Operations on binary trees are naturally expressed as recursive functions following the same structure as the type definition itself. For instance, here are functions performing lookup and insertion in ordered binary trees (elements increase from left to right): <PRE><FONT COLOR=black>#<FONT COLOR=blue>let rec member x btree = match btree with Empty -> false | Node(y, left, right) -> if x = y then true else if x < y then member x left else member x right;; <FONT COLOR=maroon>val member : 'a -> 'a btree -> bool = <fun> <FONT COLOR=black>#<FONT COLOR=blue>let rec insert x btree = match btree with Empty -> Node(x, Empty, Empty) | Node(y, left, right) -> if x <= y then Node(y, insert x left, right) else Node(y, left, insert x right);; <FONT COLOR=maroon>val insert : 'a -> 'a btree -> 'a btree = <fun> </FONT></FONT></FONT></FONT></FONT></FONT></PRE> <TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%"> <TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE> <TR><TD><A NAME="htoc7"><B><FONT SIZE=5>1.5</FONT></B></A></TD> <TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Imperative features</FONT></B></TD> </TR></TABLE></DIV></TD> </TR></TABLE> <BR> Though all examples so far were written in purely applicative style, Caml is also equipped with full imperative features. This includes the usual <TT>while</TT> and <TT>for</TT> loops, as well as mutable data structures such as arrays. 
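As a minimal sketch of the two loop constructs, the source file below sums the integers from 1 to <TT>n</TT> with a <TT>for</TT> loop and counts integer halvings with a <TT>while</TT> loop. The names <TT>sum_to</TT> and <TT>halvings</TT> are hypothetical and not part of the standard library; the sketch also relies on references (<TT>ref</TT>, <TT>!</TT>, <TT>:=</TT>), which are described later in this section. <PRE>
(* Hypothetical example: sum_to and halvings are illustrative names only. *)

(* Sum of the integers 1..n, accumulated in a reference by a for loop. *)
let sum_to n =
  let total = ref 0 in
  for i = 1 to n do
    total := !total + i
  done;
  !total;;

(* Number of integer halvings needed to bring n down to 0 (while loop). *)
let halvings n =
  let count = ref 0 in
  let m = ref n in
  while !m > 0 do
    m := !m / 2;
    count := !count + 1
  done;
  !count;;

print_int (sum_to 10); print_newline ();;    (* prints 55 *)
print_int (halvings 8); print_newline ();;   (* prints 4 *)
</PRE> Both loop bodies are expressions of type <TT>unit</TT>; the value of each function is the final contents of its reference. 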
Arrays are either given in extension between <TT>[|</TT> and <TT>|]</TT> brackets, or allocated and initialized with the <TT>Array.create</TT> function, then filled up later by assignments. For instance, the function below sums two vectors (represented as float arrays) componentwise. <PRE><FONT COLOR=black>#<FONT COLOR=blue>let add_vect v1 v2 = let len = min (Array.length v1) (Array.length v2) in let res = Array.create len 0.0 in for i = 0 to len - 1 do res.(i) <- v1.(i) +. v2.(i) done; res;; <FONT COLOR=maroon>val add_vect : float array -> float array -> float array = <fun> <FONT COLOR=black>#<FONT COLOR=blue>add_vect [| 1.0; 2.0 |] [| 3.0; 4.0 |];; <FONT COLOR=maroon>- : float array = [|4.; 6.|] </FONT></FONT></FONT></FONT></FONT></FONT></PRE> Record fields can also be modified by assignment, provided they are declared <TT>mutable</TT> in the definition of the record type: <PRE><FONT COLOR=black>#<FONT COLOR=blue>type mutable_point = { mutable x: float; mutable y: float };; <FONT COLOR=maroon>type mutable_point = { mutable x : float; mutable y : float; } <FONT COLOR=black>#<FONT COLOR=blue>let translate p dx dy = p.x <- p.x +. dx; p.y <- p.y +. dy;; <FONT COLOR=maroon>val translate : mutable_point -> float -> float -> unit = <fun> <FONT COLOR=black>#<FONT COLOR=blue>let mypoint = { x = 0.0; y = 0.0 };; <FONT COLOR=maroon>val mypoint : mutable_point = {x = 0.; y = 0.} <FONT COLOR=black>#<FONT COLOR=blue>translate mypoint 1.0 2.0;; <FONT COLOR=maroon>- : unit = () <FONT COLOR=black>#<FONT COLOR=blue>mypoint;; <FONT COLOR=maroon>- : mutable_point = {x = 1.; y = 2.} </FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></PRE> Caml has no built-in notion of variable -- identifiers whose current value can be changed by assignment. (The <TT>let</TT> binding is not an assignment, it introduces a new identifier with a new scope.) 
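The difference between rebinding and assignment can be seen in a small sketch (the names <TT>x</TT> and <TT>show</TT> are hypothetical): a second <TT>let</TT> defines a new <TT>x</TT>, while a function defined earlier still sees the old one. <PRE>
(* Hypothetical example: x and show are illustrative names only. *)
let x = 1;;
let show () = x;;      (* refers to the binding of x in scope at this point *)
let x = x + 1;;        (* a new identifier x; the first one is untouched *)

print_int x; print_newline ();;          (* prints 2 *)
print_int (show ()); print_newline ();;  (* prints 1 *)
</PRE> 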
However, the standard library provides references, which are mutable indirection cells (or one-element arrays), with operators <TT>!</TT> to fetch the current contents of the reference and <TT>:=</TT> to assign the contents. Variables can then be emulated by <TT>let</TT>-binding a reference. For instance, here is an in-place insertion sort over arrays: <PRE><FONT COLOR=black>#<FONT COLOR=blue>let insertion_sort a = for i = 1 to Array.length a - 1 do let val_i = a.(i) in let j = ref i in while !j > 0 && val_i < a.(!j - 1) do a.(!j) <- a.(!j - 1); j := !j - 1 done; a.(!j) <- val_i done;; <FONT COLOR=maroon>val insertion_sort : 'a array -> unit = <fun> </FONT></FONT></FONT></PRE> References are also useful to write functions that maintain a current state between two calls to the function. For instance, the following pseudo-random number generator keeps the last returned number in a reference: <PRE><FONT COLOR=black>#<FONT COLOR=blue>let current_rand = ref 0;; <FONT COLOR=maroon>val current_rand : int ref = {contents = 0} <FONT COLOR=black>#<FONT COLOR=blue>let random () = current_rand := !current_rand * 25713 + 1345; !current_rand;; <FONT COLOR=maroon>val random : unit -> int = <fun> </FONT></FONT></FONT></FONT></FONT></FONT></PRE> Again, there is nothing magic with references: they are implemented as a one-field mutable record, as follows. <PRE><FONT COLOR=black>#<FONT COLOR=blue>type 'a ref = { mutable contents: 'a };; <FONT COLOR=maroon>type 'a ref = { mutable contents : 'a; } <FONT COLOR=black>#<FONT COLOR=blue>let (!) r = r.contents;; <FONT COLOR=maroon>val ( ! ) : 'a ref -> 'a = <fun> <FONT COLOR=black>#<FONT COLOR=blue>let (:=) r newval = r.contents <- newval;; <FONT COLOR=maroon>val ( := ) : 'a ref -> 'a -> unit = <fun> </FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></PRE> In some special cases, you may need to store a polymorphic function in a data structure, keeping its polymorphism. 
Without user-provided type annotations, this is not allowed, as polymorphism is only introduced on a global level. However, you can give explicitly polymorphic types to record fields. <PRE><FONT COLOR=black>#<FONT COLOR=blue>type idref = { mutable id: 'a. 'a -> 'a };; <FONT COLOR=maroon>type idref = { mutable id : 'a. 'a -> 'a; } <FONT COLOR=black>#<FONT COLOR=blue>let r = {id = fun x -> x};; <FONT COLOR=maroon>val r : idref = {id = <fun>} <FONT COLOR=black>#<FONT COLOR=blue>let g s = (s.id 1, s.id true);; <FONT COLOR=maroon>val g : idref -> int * bool = <fun> <FONT COLOR=black>#<FONT COLOR=blue>r.id <- (fun x -> print_string "called id\n"; x);; <FONT COLOR=maroon>- : unit = () <FONT COLOR=black>#<FONT COLOR=blue>g r;; <FONT COLOR=maroon>called id called id - : int * bool = (1, true) </FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></PRE> <TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%"> <TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE> <TR><TD><A NAME="htoc8"><B><FONT SIZE=5>1.6</FONT></B></A></TD> <TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Exceptions</FONT></B></TD> </TR></TABLE></DIV></TD> </TR></TABLE> <BR> Caml provides exceptions for signalling and handling exceptional conditions. Exceptions can also be used as a general-purpose non-local control structure. Exceptions are declared with the <TT>exception</TT> construct, and signalled with the <TT>raise</TT> operator. For instance, the function below for taking the head of a list uses an exception to signal the case where an empty list is given. 
<PRE><FONT COLOR=black>#<FONT COLOR=blue>exception Empty_list;; <FONT COLOR=maroon>exception Empty_list <FONT COLOR=black>#<FONT COLOR=blue>let head l = match l with [] -> raise Empty_list | hd :: tl -> hd;; <FONT COLOR=maroon>val head : 'a list -> 'a = <fun> <FONT COLOR=black>#<FONT COLOR=blue>head [1;2];; <FONT COLOR=maroon>- : int = 1 <FONT COLOR=black>#<FONT COLOR=blue>head [];; <FONT COLOR=maroon>Exception: Empty_list. </FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></PRE> Exceptions are used throughout the standard library to signal cases where the library functions cannot complete normally. For instance, the <TT>List.assoc</TT> function, which returns the data associated with a given key in a list of (key, data) pairs, raises the predefined exception <TT>Not_found</TT> when the key does not appear in the list: <PRE><FONT COLOR=black>#<FONT COLOR=blue>List.assoc 1 [(0, "zero"); (1, "one")];; <FONT COLOR=maroon>- : string = "one" <FONT COLOR=black>#<FONT COLOR=blue>List.assoc 2 [(0, "zero"); (1, "one")];; <FONT COLOR=maroon>Exception: Not_found. </FONT></FONT></FONT></FONT></FONT></FONT></PRE> Exceptions can be trapped with the <TT>try</TT>...<TT>with</TT> construct: <PRE><FONT COLOR=black>#<FONT COLOR=blue>let name_of_binary_digit digit = try List.assoc digit [0, "zero"; 1, "one"] with Not_found -> "not a binary digit";; <FONT COLOR=maroon>val name_of_binary_digit : int -> string = <fun> <FONT COLOR=black>#<FONT COLOR=blue>name_of_binary_digit 0;; <FONT COLOR=maroon>- : string = "zero" <FONT COLOR=black>#<FONT COLOR=blue>name_of_binary_digit (-1);; <FONT COLOR=maroon>- : string = "not a binary digit" </FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></PRE> The <TT>with</TT> part is actually a regular pattern-matching on the exception value. Thus, several exceptions can be caught by one <TT>try</TT>...<TT>with</TT> construct. 
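As a minimal sketch of a handler with several cases, the source file below redeclares <TT>Empty_list</TT> and <TT>head</TT> from above so that it is self-contained, and traps both <TT>Empty_list</TT> and <TT>Not_found</TT> in a single <TT>try</TT>...<TT>with</TT>. The function <TT>describe_first</TT> is a hypothetical helper introduced only for this illustration. <PRE>
exception Empty_list;;

let head l =
  match l with
    [] -> raise Empty_list
  | hd :: _ -> hd;;

(* Hypothetical helper: looks up the first key of [keys] in the
   association list [table]; each exception gets its own case. *)
let describe_first keys table =
  try
    List.assoc (head keys) table
  with
    Empty_list -> "no key given"
  | Not_found -> "unknown key";;

let table = [(0, "zero"); (1, "one")];;

print_endline (describe_first [1; 2] table);;   (* prints one *)
print_endline (describe_first [] table);;       (* prints no key given *)
print_endline (describe_first [7] table);;      (* prints unknown key *)
</PRE> An exception that matches none of the cases is not caught and propagates to the caller. 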
Also, finalization can be performed by trapping all exceptions, performing the finalization, then re-raising the exception: <PRE><FONT COLOR=black>#<FONT COLOR=blue>let temporarily_set_reference ref newval funct = let oldval = !ref in try ref := newval; let res = funct () in ref := oldval; res with x -> ref := oldval; raise x;; <FONT COLOR=maroon>val temporarily_set_reference : 'a ref -> 'a -> (unit -> 'b) -> 'b = <fun> </FONT></FONT></FONT></PRE> <TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%"> <TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE> <TR><TD><A NAME="htoc9"><B><FONT SIZE=5>1.7</FONT></B></A></TD> <TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Symbolic processing of expressions</FONT></B></TD> </TR></TABLE></DIV></TD> </TR></TABLE> <BR> We finish this introduction with a more complete example representative of the use of Caml for symbolic processing: formal manipulations of arithmetic expressions containing variables. The following variant type describes the expressions we shall manipulate: <PRE><FONT COLOR=black>#<FONT COLOR=blue>type expression = Const of float | Var of string | Sum of expression * expression (* e1 + e2 *) | Diff of expression * expression (* e1 - e2 *) | Prod of expression * expression (* e1 * e2 *) | Quot of expression * expression (* e1 / e2 *) ;; <FONT COLOR=maroon>type expression = Const of float | Var of string | Sum of expression * expression | Diff of expression * expression | Prod of expression * expression | Quot of expression * expression </FONT></FONT></FONT></PRE> We first define a function to evaluate an expression given an environment that maps variable names to their values. For simplicity, the environment is represented as an association list. 
<PRE><FONT COLOR=black>#<FONT COLOR=blue>exception Unbound_variable of string;; <FONT COLOR=maroon>exception Unbound_variable of string <FONT COLOR=black>#<FONT COLOR=blue>let rec eval env exp = match exp with Const c -> c | Var v -> (try List.assoc v env with Not_found -> raise(Unbound_variable v)) | Sum(f, g) -> eval env f +. eval env g | Diff(f, g) -> eval env f -. eval env g | Prod(f, g) -> eval env f *. eval env g | Quot(f, g) -> eval env f /. eval env g;; <FONT COLOR=maroon>val eval : (string * float) list -> expression -> float = <fun> <FONT COLOR=black>#<FONT COLOR=blue>eval [("x", 1.0); ("y", 3.14)] (Prod(Sum(Var "x", Const 2.0), Var "y"));; <FONT COLOR=maroon>- : float = 9.42 </FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></PRE> Now for some real symbolic processing, we define the derivative of an expression with respect to a variable <TT>dv</TT>: <PRE><FONT COLOR=black>#<FONT COLOR=blue>let rec deriv exp dv = match exp with Const c -> Const 0.0 | Var v -> if v = dv then Const 1.0 else Const 0.0 | Sum(f, g) -> Sum(deriv f dv, deriv g dv) | Diff(f, g) -> Diff(deriv f dv, deriv g dv) | Prod(f, g) -> Sum(Prod(f, deriv g dv), Prod(deriv f dv, g)) | Quot(f, g) -> Quot(Diff(Prod(deriv f dv, g), Prod(f, deriv g dv)), Prod(g, g)) ;; <FONT COLOR=maroon>val deriv : expression -> string -> expression = <fun> <FONT COLOR=black>#<FONT COLOR=blue>deriv (Quot(Const 1.0, Var "x")) "x";; <FONT COLOR=maroon>- : expression = Quot (Diff (Prod (Const 0., Var "x"), Prod (Const 1., Const 1.)), Prod (Var "x", Var "x")) </FONT></FONT></FONT></FONT></FONT></FONT></PRE> <TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%"> <TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE> <TR><TD><A NAME="htoc10"><B><FONT SIZE=5>1.8</FONT></B></A></TD> <TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Pretty-printing and parsing</FONT></B></TD> </TR></TABLE></DIV></TD> </TR></TABLE> <BR> As shown in the examples above, the internal representation (also called <EM>abstract syntax</EM>) of 
expressions quickly becomes hard to read and write as the expressions get larger. We need a printer and a parser to go back and forth between the abstract syntax and the <EM>concrete syntax</EM>, which in the case of expressions is the familiar algebraic notation (e.g. <TT>2*x+1</TT>).<BR> <BR> For the printing function, we take into account the usual precedence rules (i.e. <TT>*</TT> binds tighter than <TT>+</TT>) to avoid printing unnecessary parentheses. To this end, we maintain the current operator precedence and print parentheses around an operator only if its precedence is less than the current precedence. <PRE><FONT COLOR=black>#<FONT COLOR=blue>let print_expr exp = (* Local function definitions *) let open_paren prec op_prec = if prec > op_prec then print_string "(" in let close_paren prec op_prec = if prec > op_prec then print_string ")" in let rec print prec exp = (* prec is the current precedence *) match exp with Const c -> print_float c | Var v -> print_string v | Sum(f, g) -> open_paren prec 0; print 0 f; print_string " + "; print 0 g; close_paren prec 0 | Diff(f, g) -> open_paren prec 0; print 0 f; print_string " - "; print 1 g; close_paren prec 0 | Prod(f, g) -> open_paren prec 2; print 2 f; print_string " * "; print 2 g; close_paren prec 2 | Quot(f, g) -> open_paren prec 2; print 2 f; print_string " / "; print 3 g; close_paren prec 2 in print 0 exp;; <FONT COLOR=maroon>val print_expr : expression -> unit = <fun> <FONT COLOR=black>#<FONT COLOR=blue>let e = Sum(Prod(Const 2.0, Var "x"), Const 1.0);; <FONT COLOR=maroon>val e : expression = Sum (Prod (Const 2., Var "x"), Const 1.) <FONT COLOR=black>#<FONT COLOR=blue>print_expr e; print_newline();; <FONT COLOR=maroon>2. * x + 1. - : unit = () <FONT COLOR=black>#<FONT COLOR=blue>print_expr (deriv e "x"); print_newline();; <FONT COLOR=maroon>2. * 1. + 0. * x + 0. 
- : unit = () </FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></PRE> Parsing (transforming concrete syntax into abstract syntax) is usually more delicate. Caml offers several tools to help write parsers: on the one hand, Caml versions of the lexer generator Lex and the parser generator Yacc (see chapter <A HREF="manual026.html#c:ocamlyacc">12</A>), which handle LALR(1) languages using push-down automata; on the other hand, a predefined type of streams (of characters or tokens) and pattern-matching over streams, which facilitate the writing of recursive-descent parsers for LL(1) languages. An example using <TT>ocamllex</TT> and <TT>ocamlyacc</TT> is given in chapter <A HREF="manual026.html#c:ocamlyacc">12</A>. Here, we will use stream parsers. The syntactic support for stream parsers is provided by the Camlp4 preprocessor, which can be loaded into the interactive toplevel via the <TT>#load</TT> directive below. <PRE><FONT COLOR=black>#<FONT COLOR=blue>#load "camlp4o.cma";; <FONT COLOR=maroon> Camlp4 Parsing version 3.05 (2002-07-22) <FONT COLOR=black>#<FONT COLOR=blue>open Genlex;; let lexer = make_lexer ["("; ")"; "+"; "-"; "*"; "/"];; <FONT COLOR=maroon>val lexer : char Stream.t -> Genlex.token Stream.t = <fun> </FONT></FONT></FONT></FONT></FONT></FONT></PRE> For the lexical analysis phase (transformation of the input text into a stream of tokens), we use a ``generic'' lexer provided in the standard library module <TT>Genlex</TT>. The <TT>make_lexer</TT> function takes a list of keywords and returns a lexing function that ``tokenizes'' an input stream of characters. Tokens are either identifiers, keywords, or literals (integers, floats, characters, strings). Whitespace and comments are skipped. 
<PRE><FONT COLOR=black>#<FONT COLOR=blue>let token_stream = lexer(Stream.of_string "1.0 +x");; <FONT COLOR=maroon>val token_stream : Genlex.token Stream.t = <abstr> <FONT COLOR=black>#<FONT COLOR=blue>Stream.next token_stream;; <FONT COLOR=maroon>- : Genlex.token = Float 1. <FONT COLOR=black>#<FONT COLOR=blue>Stream.next token_stream;; <FONT COLOR=maroon>- : Genlex.token = Kwd "+" <FONT COLOR=black>#<FONT COLOR=blue>Stream.next token_stream;; <FONT COLOR=maroon>- : Genlex.token = Ident "x" </FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></PRE> The parser itself operates by pattern-matching on the stream of tokens. As usual with recursive-descent parsers, we use several intermediate parsing functions to reflect the precedence and associativity of operators. Pattern-matching over streams is more powerful than on regular data structures, as it allows recursive calls to parsing functions inside the patterns, for matching sub-components of the input stream. See chapter <A HREF="manual021.html#c:extensions">7</A> for more details.<BR> <BR> With the Camlp4 preprocessor loaded into the toplevel as shown earlier, we are ready to define our parser. 
<PRE><FONT COLOR=black>#<FONT COLOR=blue>let rec parse_expr = parser [< e1 = parse_mult; e = parse_more_adds e1 >] -> e and parse_more_adds e1 = parser [< 'Kwd "+"; e2 = parse_mult; e = parse_more_adds (Sum(e1, e2)) >] -> e | [< 'Kwd "-"; e2 = parse_mult; e = parse_more_adds (Diff(e1, e2)) >] -> e | [< >] -> e1 and parse_mult = parser [< e1 = parse_simple; e = parse_more_mults e1 >] -> e and parse_more_mults e1 = parser [< 'Kwd "*"; e2 = parse_simple; e = parse_more_mults (Prod(e1, e2)) >] -> e | [< 'Kwd "/"; e2 = parse_simple; e = parse_more_mults (Quot(e1, e2)) >] -> e | [< >] -> e1 and parse_simple = parser [< 'Ident s >] -> Var s | [< 'Int i >] -> Const(float i) | [< 'Float f >] -> Const f | [< 'Kwd "("; e = parse_expr; 'Kwd ")" >] -> e;; <FONT COLOR=maroon>val parse_expr : Genlex.token Stream.t -> expression = <fun> val parse_more_adds : expression -> Genlex.token Stream.t -> expression = <fun> val parse_mult : Genlex.token Stream.t -> expression = <fun> val parse_more_mults : expression -> Genlex.token Stream.t -> expression = <fun> val parse_simple : Genlex.token Stream.t -> expression = <fun> <FONT COLOR=black>#<FONT COLOR=blue>let parse_expression = parser [< e = parse_expr; _ = Stream.empty >] -> e;; <FONT COLOR=maroon>val parse_expression : Genlex.token Stream.t -> expression = <fun> </FONT></FONT></FONT></FONT></FONT></FONT></PRE> Composing the lexer and parser, we finally obtain a function to read an expression from a character string: <PRE><FONT COLOR=black>#<FONT COLOR=blue>let read_expression s = parse_expression(lexer(Stream.of_string s));; <FONT COLOR=maroon>val read_expression : string -> expression = <fun> <FONT COLOR=black>#<FONT COLOR=blue>read_expression "2*(x+y)";; <FONT COLOR=maroon>- : expression = Prod (Const 2., Sum (Var "x", Var "y")) </FONT></FONT></FONT></FONT></FONT></FONT></PRE> A small puzzle: why do we get different results in the following two examples? 
<PRE><FONT COLOR=black>#<FONT COLOR=blue>read_expression "x - 1";; <FONT COLOR=maroon>- : expression = Diff (Var "x", Const 1.) <FONT COLOR=black>#<FONT COLOR=blue>read_expression "x-1";; <FONT COLOR=maroon>Exception: Stream.Error "". </FONT></FONT></FONT></FONT></FONT></FONT></PRE> Answer: the generic lexer provided by <TT>Genlex</TT> recognizes negative integer literals as one integer token. Hence, <TT>x-1</TT> is read as the token <TT>Ident "x"</TT> followed by the token <TT>Int(-1)</TT>; this sequence does not match any of the parser rules. On the other hand, the second space in <TT>x - 1</TT> causes the lexer to return the three expected tokens: <TT>Ident "x"</TT>, then <TT>Kwd "-"</TT>, then <TT>Int(1)</TT>.<BR> <BR> <TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%"> <TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE> <TR><TD><A NAME="htoc11"><B><FONT SIZE=5>1.9</FONT></B></A></TD> <TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Standalone Caml programs</FONT></B></TD> </TR></TABLE></DIV></TD> </TR></TABLE> <BR> All examples given so far were executed under the interactive system. Caml code can also be compiled separately and executed non-interactively using the batch compilers <TT>ocamlc</TT> or <TT>ocamlopt</TT>. The source code must be put in a file with extension <TT>.ml</TT>. It consists of a sequence of phrases, which will be evaluated at runtime in their order of appearance in the source file. Unlike in interactive mode, types and values are not printed automatically; the program must call printing functions explicitly to produce some output. Here is a sample standalone program to print Fibonacci numbers: <PRE> (* File fib.ml *) let rec fib n = if n < 2 then 1 else fib(n-1) + fib(n-2);; let main () = let arg = int_of_string Sys.argv.(1) in print_int(fib arg); print_newline(); exit 0;; main ();; </PRE><TT>Sys.argv</TT> is an array of strings containing the command-line parameters. <TT>Sys.argv.(1)</TT> is thus the first command-line parameter. 
The program above is compiled and executed with the following shell commands: <PRE> $ ocamlc -o fib fib.ml $ ./fib 10 89 $ ./fib 20 10946 </PRE> <HR> <A HREF="manual002.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A> <A HREF="index.html"><IMG SRC ="contents_motif.gif" ALT="Contents"></A> <A HREF="manual004.html"><IMG SRC ="next_motif.gif" ALT="Next"></A> </BODY> </HTML>