<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> <meta name="robots" content="index,nofollow"> <title>Unicode - MLton Standard ML Compiler (SML Compiler)</title> <link rel="stylesheet" type="text/css" charset="iso-8859-1" media="all" href="common.css"> <link rel="stylesheet" type="text/css" charset="iso-8859-1" media="screen" href="screen.css"> <link rel="stylesheet" type="text/css" charset="iso-8859-1" media="print" href="print.css"> <link rel="Start" href="Home"> </head> <body lang="en" dir="ltr"> <script src="http://www.google-analytics.com/urchin.js" type="text/javascript"> </script> <script type="text/javascript"> _uacct = "UA-833377-1"; urchinTracker(); </script> <table bgcolor = lightblue cellspacing = 0 style = "border: 0px;" width = 100%> <tr> <td style = " border: 0px; color: darkblue; font-size: 150%; text-align: left;"> <a class = mltona href="Home">MLton MLTONWIKIVERSION</a> <td style = " border: 0px; font-size: 150%; text-align: center; width: 50%;"> Unicode <td style = " border: 0px; text-align: right;"> <table cellspacing = 0 style = "border: 0px"> <tr style = "vertical-align: middle;"> </table> <tr style = "background-color: white;"> <td colspan = 3 style = " border: 0px; font-size:70%; text-align: right;"> <a href = "Home">Home</a> <a href = "TitleIndex">Index</a> </table> <div id="content" lang="en" dir="ltr"> The current release of MLton does not support Unicode. We are working on adding support. <ul> <li> <p> <tt>WideChar</tt> structure. </p> </li> <li> <p> UTF-8 encoded source files. </p> </li> </ul> <p> There is no real support for Unicode in the <a href="DefinitionOfStandardML">Definition</a>; there are only a few throw-away sentences along the lines of "ASCII must be a subset of the character set in programs". </p> <p> Neither is there real support for Unicode in the <a href="BasisLibrary">Basis Library</a>. The general consensus (which includes the opinions of the editors of the Basis Library) is that the <tt>WideChar</tt> structure is insufficient for the purposes of Unicode. There is no <tt>LargeChar</tt> structure, which in itself is a deficiency, since a programmer can not program against the largest supported character size. </p> <p> MLton has some preliminary support for 16 and 32 bit characters and strings. It is even possible to include arbitrary Unicode characters in 32-bit strings using a <tt>\Uxxxxxxxx</tt> escape sequence. (This longer escape sequence is a minor extension over the Definition which only allows <tt>\uxxxx</tt>.) This is by no means completely satisfactory in terms of support for Unicode, but it is what is currently available. </p> <p> There are periodic flurries of questions and discussion about Unicode in MLton/SML. In December 2004, there was a discussion that led to some seemingly sound design decisions. The discussion started at: </p> <ul> <a href="http://mlton.org/pipermail/mlton/2004-December/026396.html"><img src="moin-www.png" alt="[WWW]" height="11" width="11">http://mlton.org/pipermail/mlton/2004-December/026396.html</a> </ul> <p> There is a good summary of points at: </p> <ul> <a href="http://mlton.org/pipermail/mlton/2004-December/026440.html"><img src="moin-www.png" alt="[WWW]" height="11" width="11">http://mlton.org/pipermail/mlton/2004-December/026440.html</a> </ul> <p> In November 2005, there was a followup discussion and the beginning of some coding. </p> <ul> <a href="http://mlton.org/pipermail/mlton/2005-November/028300.html"><img src="moin-www.png" alt="[WWW]" height="11" width="11">http://mlton.org/pipermail/mlton/2005-November/028300.html</a> </ul> <p> We are optimistic that support will appear in the next MLton release. </p> <h2 id="head-a4bc8bf5caf54b18cea9f58e83dd4acb488deb17">Also see</h2> <p> The <a href="fxp">fxp</a> XML parser has some support for dealing with Unicode documents. </p> </div> <p> <hr> Last edited on 2007-08-15 22:07:35 by <span title="fenrir.uchicago.edu"><a href="MatthewFluet">MatthewFluet</a></span>. </body></html>