<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<link rel="stylesheet" href="style.css" type="text/css">
<meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type">
<link rel="Start" href="index.html">
<link rel="previous" href="Netstream.html">
<link rel="next" href="Netmime.html">
<link rel="Up" href="index.html">
<link title="Index of types" rel=Appendix href="index_types.html">
<link title="Index of exceptions" rel=Appendix href="index_exceptions.html">
<link title="Index of values" rel=Appendix href="index_values.html">
<link title="Index of class attributes" rel=Appendix href="index_attributes.html">
<link title="Index of class methods" rel=Appendix href="index_methods.html">
<link title="Index of classes" rel=Appendix href="index_classes.html">
<link title="Index of class types" rel=Appendix href="index_class_types.html">
<link title="Index of modules" rel=Appendix href="index_modules.html">
<link title="Index of module types" rel=Appendix href="index_module_types.html">
<link title="Uq_gtk" rel="Chapter" href="Uq_gtk.html">
<link title="Equeue" rel="Chapter" href="Equeue.html">
<link title="Unixqueue" rel="Chapter" href="Unixqueue.html">
<link title="Uq_engines" rel="Chapter" href="Uq_engines.html">
<link title="Uq_socks5" rel="Chapter" href="Uq_socks5.html">
<link title="Unixqueue_mt" rel="Chapter" href="Unixqueue_mt.html">
<link title="Equeue_intro" rel="Chapter" href="Equeue_intro.html">
<link title="Uq_ssl" rel="Chapter" href="Uq_ssl.html">
<link title="Uq_tcl" rel="Chapter" href="Uq_tcl.html">
<link title="Netcgi_common" rel="Chapter" href="Netcgi_common.html">
<link title="Netcgi" rel="Chapter" href="Netcgi.html">
<link title="Netcgi_ajp" rel="Chapter" href="Netcgi_ajp.html">
<link title="Netcgi_scgi" rel="Chapter" href="Netcgi_scgi.html">
<link title="Netcgi_cgi" rel="Chapter" href="Netcgi_cgi.html">
<link title="Netcgi_fcgi" rel="Chapter" href="Netcgi_fcgi.html">
<link title="Netcgi_dbi" rel="Chapter" href="Netcgi_dbi.html">
<link title="Netcgi1_compat" rel="Chapter" href="Netcgi1_compat.html">
<link title="Netcgi_test" rel="Chapter" href="Netcgi_test.html">
<link title="Netcgi_porting" rel="Chapter" href="Netcgi_porting.html">
<link title="Netcgi_plex" rel="Chapter" href="Netcgi_plex.html">
<link title="Http_client" rel="Chapter" href="Http_client.html">
<link title="Telnet_client" rel="Chapter" href="Telnet_client.html">
<link title="Ftp_data_endpoint" rel="Chapter" href="Ftp_data_endpoint.html">
<link title="Ftp_client" rel="Chapter" href="Ftp_client.html">
<link title="Nethttpd_types" rel="Chapter" href="Nethttpd_types.html">
<link title="Nethttpd_kernel" rel="Chapter" href="Nethttpd_kernel.html">
<link title="Nethttpd_reactor" rel="Chapter" href="Nethttpd_reactor.html">
<link title="Nethttpd_engine" rel="Chapter" href="Nethttpd_engine.html">
<link title="Nethttpd_services" rel="Chapter" href="Nethttpd_services.html">
<link title="Nethttpd_plex" rel="Chapter" href="Nethttpd_plex.html">
<link title="Nethttpd_intro" rel="Chapter" href="Nethttpd_intro.html">
<link title="Netplex_types" rel="Chapter" href="Netplex_types.html">
<link title="Netplex_mp" rel="Chapter" href="Netplex_mp.html">
<link title="Netplex_mt" rel="Chapter" href="Netplex_mt.html">
<link title="Netplex_log" rel="Chapter" href="Netplex_log.html">
<link title="Netplex_controller" rel="Chapter" href="Netplex_controller.html">
<link title="Netplex_container" rel="Chapter" href="Netplex_container.html">
<link title="Netplex_sockserv" rel="Chapter" href="Netplex_sockserv.html">
<link title="Netplex_workload" rel="Chapter" href="Netplex_workload.html">
<link title="Netplex_main" rel="Chapter" href="Netplex_main.html">
<link title="Netplex_config" rel="Chapter" href="Netplex_config.html">
<link title="Netplex_kit" rel="Chapter" href="Netplex_kit.html">
<link title="Rpc_netplex" rel="Chapter" href="Rpc_netplex.html">
<link title="Netplex_cenv" rel="Chapter" href="Netplex_cenv.html">
<link title="Netplex_intro" rel="Chapter" href="Netplex_intro.html">
<link title="Netshm" rel="Chapter" href="Netshm.html">
<link title="Netshm_data" rel="Chapter" href="Netshm_data.html">
<link title="Netshm_hashtbl" rel="Chapter" href="Netshm_hashtbl.html">
<link title="Netshm_array" rel="Chapter" href="Netshm_array.html">
<link title="Netshm_intro" rel="Chapter" href="Netshm_intro.html">
<link title="Netconversion" rel="Chapter" href="Netconversion.html">
<link title="Netchannels" rel="Chapter" href="Netchannels.html">
<link title="Netstream" rel="Chapter" href="Netstream.html">
<link title="Mimestring" rel="Chapter" href="Mimestring.html">
<link title="Netmime" rel="Chapter" href="Netmime.html">
<link title="Netsendmail" rel="Chapter" href="Netsendmail.html">
<link title="Neturl" rel="Chapter" href="Neturl.html">
<link title="Netaddress" rel="Chapter" href="Netaddress.html">
<link title="Netbuffer" rel="Chapter" href="Netbuffer.html">
<link title="Netdate" rel="Chapter" href="Netdate.html">
<link title="Netencoding" rel="Chapter" href="Netencoding.html">
<link title="Netulex" rel="Chapter" href="Netulex.html">
<link title="Netaccel" rel="Chapter" href="Netaccel.html">
<link title="Netaccel_link" rel="Chapter" href="Netaccel_link.html">
<link title="Nethtml" rel="Chapter" href="Nethtml.html">
<link title="Netstring_str" rel="Chapter" href="Netstring_str.html">
<link title="Netstring_pcre" rel="Chapter" href="Netstring_pcre.html">
<link title="Netstring_mt" rel="Chapter" href="Netstring_mt.html">
<link title="Netmappings" rel="Chapter" href="Netmappings.html">
<link title="Netaux" rel="Chapter" href="Netaux.html">
<link title="Nethttp" rel="Chapter" href="Nethttp.html">
<link title="Netchannels_tut" rel="Chapter" href="Netchannels_tut.html">
<link title="Netmime_tut" rel="Chapter" href="Netmime_tut.html">
<link title="Netsendmail_tut" rel="Chapter" href="Netsendmail_tut.html">
<link title="Netulex_tut" rel="Chapter" href="Netulex_tut.html">
<link title="Neturl_tut" rel="Chapter" href="Neturl_tut.html">
<link title="Netsys" rel="Chapter" href="Netsys.html">
<link title="Netpop" rel="Chapter" href="Netpop.html">
<link title="Rpc_auth_dh" rel="Chapter" href="Rpc_auth_dh.html">
<link title="Rpc_key_service" rel="Chapter" href="Rpc_key_service.html">
<link title="Rpc_time" rel="Chapter" href="Rpc_time.html">
<link title="Rpc_auth_local" rel="Chapter" href="Rpc_auth_local.html">
<link title="Rtypes" rel="Chapter" href="Rtypes.html">
<link title="Xdr" rel="Chapter" href="Xdr.html">
<link title="Rpc" rel="Chapter" href="Rpc.html">
<link title="Rpc_program" rel="Chapter" href="Rpc_program.html">
<link title="Rpc_portmapper_aux" rel="Chapter" href="Rpc_portmapper_aux.html">
<link title="Rpc_packer" rel="Chapter" href="Rpc_packer.html">
<link title="Rpc_transport" rel="Chapter" href="Rpc_transport.html">
<link title="Rpc_client" rel="Chapter" href="Rpc_client.html">
<link title="Rpc_simple_client" rel="Chapter" href="Rpc_simple_client.html">
<link title="Rpc_portmapper_clnt" rel="Chapter" href="Rpc_portmapper_clnt.html">
<link title="Rpc_portmapper" rel="Chapter" href="Rpc_portmapper.html">
<link title="Rpc_server" rel="Chapter" href="Rpc_server.html">
<link title="Rpc_auth_sys" rel="Chapter" href="Rpc_auth_sys.html">
<link title="Rpc_intro" rel="Chapter" href="Rpc_intro.html">
<link title="Rpc_mapping_ref" rel="Chapter" href="Rpc_mapping_ref.html">
<link title="Rpc_ssl" rel="Chapter" href="Rpc_ssl.html">
<link title="Rpc_xti_client" rel="Chapter" href="Rpc_xti_client.html">
<link title="Shell_sys" rel="Chapter" href="Shell_sys.html">
<link title="Shell" rel="Chapter" href="Shell.html">
<link title="Shell_uq" rel="Chapter" href="Shell_uq.html">
<link title="Shell_mt" rel="Chapter" href="Shell_mt.html">
<link title="Shell_intro" rel="Chapter" href="Shell_intro.html">
<link title="Netsmtp" rel="Chapter" href="Netsmtp.html"><link title="Parsing and Printing Mail Headers" rel="Section" href="#headers">
<link title="Parsing Structured Values" rel="Section" href="#structured_values">
<link title="Parsing Certain Forms of Structured Values" rel="Section" href="#parsers_for_structured_values">
<link title="Printing Structured Values" rel="Section" href="#printers_for_structured_values">
<link title="Scanning MIME Messages" rel="Section" href="#scanning_mime">
<link title="Helpers for MIME Messages" rel="Section" href="#helpers_mime">
<title>Ocamlnet 2 Reference Manual : Mimestring</title>
</head>
<body>
<div class="navbar"><a href="Netstream.html">Previous</a>
&nbsp;<a href="index.html">Up</a>
&nbsp;<a href="Netmime.html">Next</a>
</div>
<center><h1>Module <a href="type_Mimestring.html">Mimestring</a></h1></center>
<br>
<pre><span class="keyword">module</span> Mimestring: <code class="code">sig</code> <a href="Mimestring.html">..</a> <code class="code">end</code></pre>Low-level functions to parse and print mail and MIME messages 
<p>

 <code class="code">Mimestring</code> contains many functions to scan and print strings
 formatted as MIME messages. For a higher-level view on this topic,
 see the <code class="code">Netmime</code> module.
<p>

 <b>Contents</b><ul>
<li><a href="Mimestring.html#headers"><i>Parsing and Printing Mail Headers</i></a></li>
<li><a href="Mimestring.html#structured_values"><i>Parsing Structured Values</i></a></li>
<li><a href="Mimestring.html#parsers_for_structured_values"><i>Parsing Certain Forms of Structured Values</i></a></li>
<li><a href="Mimestring.html#printers_for_structured_values"><i>Printing Structured Values</i></a></li>
<li><a href="Mimestring.html#scanning_mime"><i>Scanning MIME Messages</i></a></li>
<li><a href="Mimestring.html#helpers_mime"><i>Helpers for MIME Messages</i></a>
</li>
</ul>
<br>
<hr width="100%">
<br>
<a name="headers"></a>
<h1>Parsing and Printing Mail Headers</h1><br>
<pre><span class="keyword">val</span> <a name="VALscan_header"></a>scan_header : <code class="type">?downcase:bool -><br>       ?unfold:bool -><br>       ?strip:bool -><br>       string -> start_pos:int -> end_pos:int -> (string * string) list * int</code></pre><div class="info">
<code class="code">let params, header_end_pos = scan_header s ~start_pos ~end_pos</code>:
<p>

 Scans the mail header that begins at position <code class="code">start_pos</code> in the string
 <code class="code">s</code> and that must end somewhere before position <code class="code">end_pos</code>.
 The intended value for <code class="code">end_pos</code> is the position of the character
 following the end of the body of the MIME message.
<p>

 Returns the parameters of the header as <code class="code">(name,value)</code> pairs (in
 <code class="code">params</code>), and in <code class="code">header_end_pos</code> the position of the character following
 directly after the header (i.e. after the blank line separating
 the header from the body).
<p>

 Depending on the optional arguments (see below), the following normalizations are applied to the returned values:<ul>
<li>(D) The names are converted to lowercase characters</li>
<li>(U) Newline characters (CR and LF) in the middle of the header fields
     have been removed</li>
<li>(S) Whitespace at the beginning and at the end of field values has been
     removed </li>
</ul>

 The default is to apply all three normalizations (D), (U), and (S)
 (for historic reasons). The three arguments <code class="code">downcase</code>, <code class="code">unfold</code>,
 and <code class="code">strip</code> control which normalizations are performed (and for
 historic reasons, too, this is not what you would expect - backwards
 compatibility can sometimes be a burden):
<p>
<ul>
<li>If <code class="code">downcase</code>, do (D); if <code class="code">not downcase</code>, don't do (D).</li>
<li>If <code class="code">unfold</code>, do (U); if <code class="code">not unfold</code>, don't do (U).</li>
<li>If <code class="code">unfold || strip</code>, do (S); if <code class="code">not unfold &amp;&amp; not strip</code>,
   don't do (S)</li>
<li>Defaults: <code class="code">downcase</code>, <code class="code">unfold</code>, <code class="code">not strip</code>.</li>
</ul>

 This means that <code class="code">unfold</code> not only removes CR/LF from the field value,
 but also removes whitespace at the beginning and at the end of the
 field value. <code class="code">strip</code> alone does not remove CR/LF occurring
 somewhere within the field value, but all whitespace (including
 CR/LF) at the beginning and at the end of the field value is still
 deleted. Note that if you only want (S),
 you have to pass <code class="code">~unfold:false</code> and <code class="code">~strip:true</code>.
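<p>
 The flag rules above can be restated as a small function. This is a minimal sketch in plain OCaml, not code from Mimestring; the record type and the name <code class="code">effective_normalizations</code> are hypothetical and only encode the documented mapping, including the documented defaults.

```ocaml
(* Restates the documented flag rules of scan_header. The record and
   effective_normalizations are hypothetical, not part of Mimestring. *)
type normalizations = {
  lowercase_names : bool;  (* (D) *)
  unfold_lines : bool;     (* (U) *)
  strip_ws : bool;         (* (S) *)
}

(* Defaults mirror the documented ones: downcase, unfold, not strip *)
let effective_normalizations ?(downcase = true) ?(unfold = true)
    ?(strip = false) () =
  { lowercase_names = downcase;
    unfold_lines = unfold;
    (* (S) is performed whenever unfold or strip is set *)
    strip_ws = unfold || strip }
```

For instance, <code class="code">effective_normalizations ~unfold:false ~strip:true ()</code> yields (D) and (S) but not (U), because <code class="code">downcase</code> keeps its default.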
<p>

 The rules to postprocess mail messages in MIME format are <b>not</b>
 applied (e.g. encoding transformations as indicated by RFC 2047).
<p>

 The function fails if the header seriously violates the header format.
 (Some minor deviations are tolerated; e.g. it is sufficient
 to separate lines by LF only instead of CRLF.)
<p>

 <b>The Format of Mail Messages</b>
<p>

 Messages
 consist of a header and a body; the first empty line separates the two
 parts. The header contains lines "<i>param-name</i><code class="code">:</code> <i>param-value</i>" where
 the param-name must begin in column 0 of the line, and the "<code class="code">:</code>"
 separates the name from the value. So the format is roughly:
<p>

 <pre><code class="code"> param1-name: param1-value
 ...
 paramN-name: paramN-value
 _
 body </code></pre>
<p>

 (Where "_" denotes an empty line.)
<p>

 This function expects in <code class="code">start_pos</code> the position of the first character of
 <code class="code">param1-name</code> in the string, and in <code class="code">end_pos</code> the position of the character
 following <code class="code">body</code>. It returns as <code class="code">header_end_pos</code> the position where
 <code class="code">body</code> begins, and in <code class="code">params</code> all parameters the
 function finds in the header.
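<p>
 To make the format concrete, here is a minimal sketch in plain OCaml (standard library only, no Ocamlnet) of a header splitter with the same calling convention. It is a simplification under stated assumptions, not Mimestring's implementation: the name <code class="code">simple_scan_header</code> is hypothetical, it always applies (D), (U), and (S), and it raises <code class="code">Failure</code> on malformed input instead of tolerating deviations.

```ocaml
(* Simplified, hypothetical header splitter (not Mimestring's code).
   Accepts LF or CRLF line ends and always applies (D), (U) and (S). *)
let is_ws c = c = ' ' || c = '\t'

let simple_scan_header s ~start_pos ~end_pos =
  (* Line starting at [pos] without its CR/LF, plus the next line's position *)
  let next_line pos =
    let eol =
      match String.index_from_opt s pos '\n' with
      | Some i when i < end_pos -> i
      | _ -> failwith "simple_scan_header: header is not terminated"
    in
    let stop = if eol > pos && s.[eol - 1] = '\r' then eol - 1 else eol in
    (String.sub s pos (stop - pos), eol + 1)
  in
  let rec loop pos acc =
    let line, next = next_line pos in
    if line = "" then (List.rev acc, next)   (* empty line ends the header *)
    else if is_ws line.[0] then begin
      (* continuation line: append it without the removed CR/LF (U) *)
      match acc with
      | (name, value) :: rest -> loop next ((name, value ^ line) :: rest)
      | [] -> failwith "simple_scan_header: continuation without field"
    end else
      match String.index_opt line ':' with
      | None -> failwith "simple_scan_header: missing ':'"
      | Some i ->
        let name = String.lowercase_ascii (String.sub line 0 i) in  (* (D) *)
        let value = String.sub line (i + 1) (String.length line - i - 1) in
        loop next ((name, value) :: acc)
  in
  let fields, body_pos = loop start_pos [] in
  (* (S): strip whitespace at both ends of each value *)
  (List.map (fun (n, v) -> (n, String.trim v)) fields, body_pos)
```

With <code class="code">"Subject: Hello\r\n  world\r\n\r\nBody"</code>, the sketch returns <code class="code">[("subject", "Hello  world")]</code> together with the position of <code class="code">Body</code>; the double space shows that unfolding only removes CR/LF and keeps the continuation whitespace.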
<p>

 <b>Details</b>
<p>

 Note that parameter values are restricted; you cannot represent
 arbitrary strings. The following problems can arise:<ul>
<li>Values cannot begin with whitespace characters, because there
   may be an arbitrary number of whitespaces between the "<code class="code">:</code>" and the
   value.</li>
<li>Values (and names of parameters, too) must only be formed of
   7 bit ASCII characters. (If this is not enough, the MIME standard
   knows the extension RFC 2047 that allows that header values may
   be composed of arbitrary characters of arbitrary character sets.
   See below how to decode such characters in values returned by
   this function.)</li>
<li>Header values may be broken into several lines. Continuation
   lines must begin with whitespace characters. This means that values
   must not contain line breaks as semantic part of the value.
   And it may mean that <i>one</i> whitespace character is not distinguishable
   from <i>several</i> whitespace characters.</li>
<li>Header lines must not be longer than 78 characters (soft limit) or
   998 characters (hard limit). Values that
   would result in longer lines must be broken into several lines.
   This means that you cannot represent strings that contain too few
   whitespace characters.
   (Note: The soft limit avoids problems that user agents have
   with long lines. The hard limit exists because transfer agents sometimes
   do not transfer longer lines correctly.)</li>
<li>Some old gateways pad the lines with spaces at the end of the lines.</li>
</ul>
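<p>
 The two length limits can be checked mechanically. The following sketch is hypothetical (<code class="code">check_line_lengths</code> is not an Ocamlnet function) and assumes the header is given as one string with LF or CRLF line ends; a trailing CR is not counted towards the limit.

```ocaml
(* Hypothetical check of the line-length limits quoted above
   (78 soft, 998 hard, excluding the line end). Not part of Mimestring. *)
type violation = Soft of int | Hard of int  (* argument: 1-based line number *)

let check_line_lengths (header : string) : violation list =
  String.split_on_char '\n' header
  |> List.mapi (fun i line ->
         let len = String.length line in
         (* do not count the CR of a CRLF line end *)
         let len = if len > 0 && line.[len - 1] = '\r' then len - 1 else len in
         if len > 998 then Some (Hard (i + 1))
         else if len > 78 then Some (Soft (i + 1))
         else None)
  |> List.filter_map (fun x -> x)
```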

 This implementation of a mail scanner tolerates a number of
 deviations from the standard: long lines are not rejected; 8 bit
 values are generally accepted; lines may be ended only with LF instead of
 CRLF.
<p>

 Furthermore, the transformations (D), (U), and (S) can be performed,
 resulting in values that are simpler to process.
<p>

 <b>Compatibility</b>
<p>

 This function can parse all mail headers that conform to RFC 822 or
 RFC 2822.
<p>

 However, problems may still arise, because RFC 822 allows some unusual
 representations that are not used in practice.
 In particular, RFC 822 allows backslashes to be used to "indicate"
 that a CRLF sequence is semantically meant as a line break. As this
 function normally deletes CRLFs, such indicators cannot be recognized
 in the result of the function.<br>
</div>
<pre><span class="keyword">val</span> <a name="VALread_header"></a>read_header : <code class="type">?downcase:bool -><br>       ?unfold:bool -><br>       ?strip:bool -> <a href="Netstream.in_obj_stream.html">Netstream.in_obj_stream</a> -> (string * string) list</code></pre><div class="info">
This function expects that the current position of the passed
 <code class="code">in_obj_stream</code> is the first byte of the header. The function scans the
 header and returns it. After that, the stream position is after
 the header and the terminating empty line (i.e. at the beginning of
 the message body).
<p>

 The options <code class="code">downcase</code>, <code class="code">unfold</code>, and <code class="code">strip</code> have the same meaning
 as in <code class="code">scan_header</code>.
<p>

 <b>Example</b>
<p>

 To read the mail message "<code class="code">file.txt</code>":
<p>

 <pre><code class="code"> let ch = new Netchannels.input_channel (open_in "file.txt") in
 let stream = new Netstream.input_stream ch in
 let header = read_header stream in
 stream#close_in();  (* no need to close ch *)
 header
 </code></pre><br>
</div>
<pre><span class="keyword">val</span> <a name="VALwrite_header"></a>write_header : <code class="type">?soft_eol:string -><br>       ?eol:string -> <a href="Netchannels.out_obj_channel.html">Netchannels.out_obj_channel</a> -> (string * string) list -> unit</code></pre><div class="info">
This function writes the header to the passed <code class="code">out_obj_channel</code>. The
 empty line following the header is also written.
<p>

 Exact output format: 
 <ul>
<li>The header is not folded, i.e. no additional CRLF sequences
    are inserted into the header to avoid long header lines.
    In order to produce correct headers, the necessary CRLF bytes
    must already exist in the field values. (You can use the
    function <code class="code">write_value</code> below for this.)</li>
<li>However, this function helps to get some details right. First,
    whitespace at the beginning of field values is suppressed.
<p>

    <b>Example:</b>
<p>

    <code class="code">write_header ch ["x","Field value"; "y","   Other value"]</code> outputs:
 <pre><code class="code"> x: Field value\r\n
 y: Other value\r\n
 \r\n</code></pre></li>
<li>The end-of-line sequences LF and CRLF inside a field value are
    replaced by the passed <code class="code">soft_eol</code> string. The following
    line must begin with a space or tab character; if it does not, an
    additional space character is inserted.
<p>

    <b>Example:</b>
<p>

    <code class="code">write_header ch ["x","Field\nvalue"; "y","Other\r\n\tvalue"]</code> outputs:
 <pre><code class="code"> x: Field\r\n
  value\r\n
 y: Other\r\n
 \tvalue\r\n
 \r\n</code></pre></li>
<li>Empty lines (and lines only consisting of whitespace) are suppressed
    if they occur inside the header.
<p>

    <b>Example:</b>
<p>

    <code class="code">write_header ch ["x","Field\n\nvalue"]</code> outputs:
 <pre><code class="code"> x: Field\r\n
  value\r\n
 \r\n</code></pre></li>
<li>Whitespace at the end of a header field is suppressed. One field
    is separated from the next field by printing <code class="code">eol</code> once.</li>
</ul>

<p>

 These rules ensure that the printed header will be well-formed with
 two exceptions:<ul>
<li>Long lines (&gt; 72 characters) are neither folded nor rejected</li>
<li>True 8 bit characters are neither properly encoded nor rejected</li>
</ul>

 These two problems cannot be addressed without taking the syntax
 of the header fields into account. See below for how to create
 proper header fields from <code class="code">s_token</code> lists.<br>
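<p>
 The per-field rules listed above can be summarized by a small pure-OCaml sketch that builds the output as a string instead of writing to an <code class="code">out_obj_channel</code>. It is a simplified restatement under assumptions, not Mimestring's implementation: <code class="code">format_field</code> and <code class="code">format_header</code> are hypothetical names, the defaults of <code class="code">"\r\n"</code> for <code class="code">soft_eol</code> and <code class="code">eol</code> are assumed from the examples, and, like <code class="code">write_header</code>, it neither folds long lines nor encodes 8 bit characters.

```ocaml
(* Hypothetical, simplified restatement of the write_header rules
   (not Mimestring's code): no folding, no encoding of 8 bit data. *)
let is_ws c = c = ' ' || c = '\t'

(* Remove leading spaces and tabs *)
let ltrim s =
  let n = String.length s in
  let i = ref 0 in
  while !i < n && is_ws s.[!i] do incr i done;
  String.sub s !i (n - !i)

(* Remove trailing whitespace, including CR and LF *)
let rtrim s =
  let j = ref (String.length s) in
  while !j > 0 && (is_ws s.[!j - 1] || s.[!j - 1] = '\r' || s.[!j - 1] = '\n') do
    decr j
  done;
  String.sub s 0 !j

let format_field ?(soft_eol = "\r\n") ?(eol = "\r\n") (name, value) =
  let lines =
    String.split_on_char '\n' value
    (* drop the CR of a CRLF line end *)
    |> List.map (fun l ->
           let n = String.length l in
           if n > 0 && l.[n - 1] = '\r' then String.sub l 0 (n - 1) else l)
    (* empty or whitespace-only lines inside the value are suppressed *)
    |> List.filter (fun l -> String.trim l <> "")
  in
  let body =
    match lines with
    | [] -> ""
    | first :: rest ->
      (* leading whitespace of the value is suppressed *)
      ltrim first
      ^ String.concat ""
          (List.map
             (fun l ->
               (* continuation lines must start with a space or tab *)
               soft_eol ^ (if is_ws l.[0] then l else " " ^ l))
             rest)
  in
  (* trailing whitespace is suppressed; eol ends the field *)
  name ^ ": " ^ rtrim body ^ eol

(* All fields, followed by the empty line that ends the header *)
let format_header ?soft_eol ?eol fields =
  let eol_str = match eol with Some e -> e | None -> "\r\n" in
  String.concat "" (List.map (format_field ?soft_eol ?eol) fields) ^ eol_str
```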
</div>
<br>
<a name="structured_values"></a>
<h1>Parsing Structured Values</h1><br>
<br>
The following types and functions let you build scanners for
 structured mail and MIME values in a highly configurable way.
<p>

 <b>Structured Values</b>
<p>

 RFC 822 (together with some other RFCs) defines lexical rules for
 dividing formal mail header values into tokens. Formal
 mail headers are those headers that are formed according to some
 grammar, e.g. mail addresses or MIME types.
<p>

    Some of the characters separate phrases of the value; these are
 the "special" characters. For example, '@' is normally a special
 character for mail addresses, because it separates the user name
 from the domain name (as in <code class="code">user@domain</code>). RFC 822 defines a fixed set
 of special characters, but other RFCs use different sets. Because of this,
 the following functions let you configure the set of special characters.
<p>

    Every sequence of characters may be enclosed in double quotes,
 which means that the sequence is meant as a literal data item;
 special characters are not recognized inside a quoted string. You may
 use the backslash to insert any character (including double quotes)
 verbatim into the quoted string (e.g. "He said: \"Give it to me!\"").
 The sequence of a backslash character and another character is called
 a quoted pair.
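<p>
 As a minimal sketch of the quoting rule, the following hypothetical function (not part of Mimestring) turns an arbitrary string into a quoted string by writing every double quote and every backslash as a quoted pair:

```ocaml
(* Hypothetical helper, not a Mimestring function: wraps [s] in double
   quotes and escapes '"' and '\\' as quoted pairs. *)
let quote (s : string) : string =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (fun c ->
      if c = '"' || c = '\\' then Buffer.add_char b '\\';
      Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b
```

Applied to the text <code class="code">He said: "Give it to me!"</code>, it produces the quoted string shown in the example above.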
<p>

    Structured values may contain comments. The beginning of a comment
 is indicated by '(', and the end by ')'. Comments may be nested.
 Comments may contain quoted pairs. A
 comment counts as if a space character were written instead of it.
<p>

    Control characters are the ASCII characters 0 to 31, and 127.
 RFC 822 demands that mail headers are 7 bit ASCII strings. Because
 of this, this module also counts the characters 128 to 255 as
 control characters.
<p>

    Domain literals are strings enclosed in '[' and ']'; such literals
 may contain quoted pairs. Today, domain literals are (rarely) used to specify
 IP addresses, e.g. <code class="code">user@[192.168.0.44]</code>.
<p>

    Every character sequence not falling in one of the above categories
 is an atom (a sequence of non-special and non-control characters).
 When recognized, atoms may be encoded in a character set other than
 US-ASCII; such atoms are called encoded words (see RFC 2047).
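<p>
 An encoded word such as <code class="code">=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=</code> consists of a charset, an encoding (<code class="code">Q</code> or <code class="code">B</code>), and the encoded text. As a sketch of what decoding involves, the hypothetical function below handles the RFC 2047 <code class="code">Q</code> encoding only: <code class="code">_</code> stands for a space, and <code class="code">=XX</code> stands for the byte with hexadecimal code XX. It is not Ocamlnet's decoder, and it does not convert the result out of the word's charset.

```ocaml
(* Hypothetical sketch of RFC 2047 "Q" decoding (not Ocamlnet's code).
   The result stays in the encoded word's charset, e.g. ISO-8859-1. *)
let q_decode (s : string) : string =
  let n = String.length s in
  let b = Buffer.create n in
  let rec go i =
    if i < n then
      match s.[i] with
      | '_' -> Buffer.add_char b ' '; go (i + 1)
      | '=' when i + 2 < n ->
        (* "=XX": two hex digits give one byte; bad digits raise Failure *)
        Buffer.add_char b (Char.chr (int_of_string ("0x" ^ String.sub s (i + 1) 2)));
        go (i + 3)
      | c -> Buffer.add_char b c; go (i + 1)
  in
  go 0;
  Buffer.contents b
```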
<p>

 <b>Scanning Using the Extended Interface</b>
<p>

 In order to scan a string containing a structured value, you must first
 create a <code class="code">mime_scanner</code> using the function <code class="code">create_mime_scanner</code>.
 The scanner contains a reference to the scanned string, and a
 specification of how the string is to be scanned. The specification
 consists of the lists <code class="code">specials</code> and <code class="code">scan_options</code>.
<p>

 The character list <code class="code">specials</code> specifies the set of special characters.
 These are the characters that are not regarded as part of atoms,
 because they work as delimiters that separate atoms (like <code class="code">@</code> in the
 above example). In addition, if '"', '(', and '[' should not start
 quoted strings, comments, and domain literals, respectively, these
 characters must also be added to <code class="code">specials</code>. In detail,
 these rules apply:
<p>

 <ul>
<li><b>Spaces:</b><ul>
<li>If <code class="code">' '</code> <i>in</i> <code class="code">specials</code>: A space character is returned as <code class="code">Special ' '</code>.
       Note that there may also be an effect on how comments are returned
       (see below).</li>
<li>If <code class="code">' '</code> <i>not in</i> <code class="code">specials</code>: Spaces are not returned, although
      they still delimit atoms.</li>
</ul>

   </li>
<li><b>Tabs, CRs, LFs:</b><ul>
<li>If <code class="code">'\t'</code> <i>in</i> <code class="code">specials</code>: A tab character is returned as 
      <code class="code">Special '\t'</code>.</li>
<li>If <code class="code">'\t'</code> <i>not in</i> <code class="code">specials</code>: Tabs are not returned, although
      they still delimit atoms.</li>
<li>If <code class="code">'\r'</code> <i>in</i> <code class="code">specials</code>: A CR character is returned as 
      <code class="code">Special '\r'</code>.</li>
<li>If <code class="code">'\r'</code> <i>not in</i> <code class="code">specials</code>: CRs are not returned, although
      they still delimit atoms.</li>
<li>If <code class="code">'\n'</code> <i>in</i> <code class="code">specials</code>: An LF character is returned as
      <code class="code">Special '\n'</code>.</li>
<li>If <code class="code">'\n'</code> <i>not in</i> <code class="code">specials</code>: LFs are not returned, although
      they still delimit atoms.</li>
</ul>

   </li>
<li><b>Comments:</b>
    <ul>
<li>If <code class="code">'('</code> <i>in</i> <code class="code">specials</code>: Comments are not recognized. The 
       character '(' is returned as <code class="code">Special '('</code>.</li>
<li>If <code class="code">'('</code> <i>not in</i> <code class="code">specials</code>: Comments are recognized. How comments
       are returned depends on the following:<OL>
<li>If <code class="code">Return_comments</code> <i>in</i> <code class="code">scan_options</code>: Outer comments are
         returned as <code class="code">Comment</code> (note that inner comments are recognized but
         are not returned as tokens)</li>
<li>If otherwise <code class="code">' '</code> <i>in</i> <code class="code">specials</code>: Outer comments are returned as
         <code class="code">Special ' '</code></li>
<li>Otherwise: Comments are recognized but not returned at all.</li>
</OL>

       </li>
</ul>

  </li>
<li><b>Quoted strings:</b><ul>
<li>If <code class="code">'"'</code> <i>in</i> <code class="code">specials</code>: Quoted strings are not recognized, and
      double quotes are returned as <code class="code">Special '"'</code>.</li>
<li>If <code class="code">'"'</code> <i>not in</i> <code class="code">specials</code>: Quoted strings are returned as
      <code class="code">QString</code> tokens.</li>
</ul>

   </li>
<li><b>Domain literals:</b>
    <ul>
<li>If '[' <i>in</i> <code class="code">specials</code>: Domain literals are not recognized, and
       left brackets are returned as <code class="code">Special '['</code>.</li>
<li>If '[' <i>not in</i> <code class="code">specials</code>: Domain literals are returned as
       <code class="code">DomainLiteral</code> tokens.</li>
</ul>

   </li>
</ul>

<p>

 If recognized, quoted strings are returned as <code class="code">QString s</code>, where
 <code class="code">s</code> is the string without the embracing quotes, and with already
 decoded quoted pairs.
<p>

 Control characters <code class="code">c</code> are returned as <code class="code">Control c</code>.
<p>

 If recognized, comments may either be returned as spaces (in the case
 you are not interested in the contents of comments), or as <code class="code">Comment</code> tokens.
 The contents of comments are not further scanned; you must start a
 subscanner to analyze comments as structured values.
<p>

 If recognized, domain literals are returned as <code class="code">DomainLiteral s</code>, where
 <code class="code">s</code> is the literal without brackets, and with decoded quoted pairs.
<p>

 Atoms are returned as <code class="code">Atom s</code> where <code class="code">s</code> is the longest sequence of
 atomic characters (all characters which are neither special nor control
 characters nor delimiters for substructures). If the option
 <code class="code">Recognize_encoded_words</code> is on, atoms which look like encoded words
 are returned as <code class="code">EncodedWord</code> tokens. (Important note: Neither '?' nor
 '=' may be special; otherwise this functionality is not available.)
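<p>
 The rules above can be condensed into a simplified scanner. This is a sketch under stated assumptions, not Mimestring's scanner: <code class="code">scan</code> is a hypothetical name, the flag <code class="code">return_comments</code> stands in for the <code class="code">Return_comments</code> option, encoded words and position information are not handled, and the local <code class="code">token</code> type mirrors the constructors of <code class="code">s_token</code> so that the code compiles without Ocamlnet.

```ocaml
(* Simplified, hypothetical scanner following the rules above; it is
   not Mimestring's scanner. Encoded words and position information
   (as in s_extended_token) are omitted. *)
type token =
  | Atom of string
  | QString of string
  | Control of char
  | Special of char
  | DomainLiteral of string
  | Comment

let scan ?(return_comments = false) (specials : char list) (s : string) =
  let n = String.length s in
  let is_special c = List.mem c specials in
  (* 0-31 and 127, plus 128-255 as described above *)
  let is_control c = Char.code c < 32 || Char.code c >= 127 in
  let is_ws c = c = ' ' || c = '\t' || c = '\r' || c = '\n' in
  (* Contents of a quoted string or domain literal starting at [i],
     with quoted pairs decoded; also returns the position after the
     closing delimiter. *)
  let literal close i =
    let b = Buffer.create 16 in
    let rec go i =
      if i >= n then failwith "scan: unterminated literal"
      else if s.[i] = '\\' && i + 1 < n then begin
        Buffer.add_char b s.[i + 1]; go (i + 2)
      end else if s.[i] = close then (Buffer.contents b, i + 1)
      else begin Buffer.add_char b s.[i]; go (i + 1) end
    in
    go i
  in
  (* Position after a (possibly nested) comment; [i] is just after '(' *)
  let rec skip_comment depth i =
    if i >= n then failwith "scan: unterminated comment"
    else
      match s.[i] with
      | '\\' -> skip_comment depth (i + 2)
      | '(' -> skip_comment (depth + 1) (i + 1)
      | ')' -> if depth = 1 then i + 1 else skip_comment (depth - 1) (i + 1)
      | _ -> skip_comment depth (i + 1)
  in
  let is_atom_char c =
    not (is_special c || is_ws c || is_control c || c = '"' || c = '(' || c = '[')
  in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let c = s.[i] in
      if is_special c then go (i + 1) (Special c :: acc)
      else if is_ws c then go (i + 1) acc  (* delimits atoms, not returned *)
      else if c = '"' then
        (let v, j = literal '"' (i + 1) in go j (QString v :: acc))
      else if c = '[' then
        (let v, j = literal ']' (i + 1) in go j (DomainLiteral v :: acc))
      else if c = '(' then begin
        let j = skip_comment 1 (i + 1) in
        if return_comments then go j (Comment :: acc)
        else if is_special ' ' then go j (Special ' ' :: acc)
        else go j acc
      end
      else if is_control c then go (i + 1) (Control c :: acc)
      else begin
        (* atom: longest run of atomic characters *)
        let j = ref i in
        while !j < n && is_atom_char s.[!j] do incr j done;
        go !j (Atom (String.sub s i (!j - i)) :: acc)
      end
  in
  go 0 []
```

On the examples that follow, except the one using <code class="code">Recognize_encoded_words</code>, the sketch produces the same token lists; for instance, <code class="code">scan ['@'; '.'] "user@domain.com"</code> gives <code class="code">[Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com"]</code>.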
<p>

 After the <code class="code">mime_scanner</code> has been created, you can scan the tokens by
 invoking <code class="code">scan_token</code> which returns one token at a time, or by invoking
 <code class="code">scan_token_list</code> which returns all following tokens.
<p>

 There are two token types: <code class="code">s_token</code> is the base type and is intended to
 be used for pattern matching. <code class="code">s_extended_token</code> is a wrapper that
 additionally contains information about where the token occurs.
<p>

 <b>Scanning Using the Simple Interface</b>
<p>

 Instead of creating a <code class="code">mime_scanner</code> and calling the scan functions,
 you may also invoke <code class="code">scan_structured_value</code>. This function returns the
 list of tokens directly; however, it is restricted to <code class="code">s_token</code>.
<p>

 <b>Examples</b>
<p>
<ul>
<li>Simple address: <pre><code class="code"> scan_structured_value "user@domain.com" [ '@'; '.' ] []
   = [ Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
 </code></pre></li>
<li>Spaces are not returned: <pre><code class="code"> scan_structured_value "user @ domain . com" [ '@'; '.' ] []
   = [ Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
 </code></pre></li>
<li>Comments are not returned: <pre><code class="code"> scan_structured_value "user(Do you know him?)@domain.com" [ '@'; '.' ] []
   = [ Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
 </code></pre></li>
<li>Comments are indicated if requested: <pre><code class="code"> scan_structured_value "user(Do you know him?)@domain.com" [ '@'; '.' ] 
     [ Return_comments ]
   = [ Atom "user"; Comment; Special '@'; Atom "domain"; Special '.'; 
       Atom "com" ]
 </code></pre></li>
<li>Spaces are returned if special: <pre><code class="code"> scan_structured_value "user (Do you know him?) @ domain . com" 
     [ '@'; '.'; ' ' ] []
   = [ Atom "user"; Special ' '; Special ' '; Special ' '; Special '@'; 
       Special ' '; Atom "domain";
       Special ' '; Special '.'; Special ' '; Atom "com" ]
 </code></pre></li>
<li>Both spaces and comments are requested: <pre><code class="code"> scan_structured_value "user (Do you know him?) @ domain . com" 
     [ '@'; '.'; ' ' ] [ Return_comments ]
   = [ Atom "user"; Special ' '; Comment; Special ' '; Special '@'; 
       Special ' '; Atom "domain";
       Special ' '; Special '.'; Special ' '; Atom "com" ]
 </code></pre></li>
<li>Another case: <pre><code class="code"> scan_structured_value "user @ domain . com" [ '@'; '.'; ' ' ] []
   = [ Atom "user"; Special ' '; Special '@'; Special ' '; Atom "domain";
       Special ' '; Special '.'; Special ' '; Atom "com" ]
 </code></pre></li>
<li>'(' is special: <pre><code class="code"> scan_structured_value "user(Do you know him?)@domain.com" ['@'; '.'; '(']
     []
   = [ Atom "user"; Special '('; Atom "Do"; Atom "you"; Atom "know";
       Atom "him?)"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
 </code></pre></li>
<li>Quoted strings: <pre><code class="code"> scan_structured_value "\"My.name\"@domain.com" [ '@'; '.' ] []
   = [ QString "My.name"; Special '@'; Atom "domain"; Special '.';
       Atom "com" ]
 </code></pre></li>
<li>Encoded words are not recognized by default: <pre><code class="code"> scan_structured_value "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?="
     [ ] [ ] 
   = [ Atom "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=" ]
 </code></pre></li>
<li>Encoded words are returned if requested: <pre><code class="code"> scan_structured_value "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=" 
     [ ] [ Recognize_encoded_words ] 
   = [ EncodedWord(("ISO-8859-1",""), "Q", "Keld_J=F8rn_Simonsen") ]
 </code></pre></li>
</ul>
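<p>
 Because <code class="code">s_token</code> is meant for pattern matching, a token list like the ones above can be taken apart with ordinary OCaml patterns. The sketch below is hypothetical (<code class="code">split_address</code> is not part of Mimestring), uses a local subset of the constructors so that it compiles on its own, and accepts only a local part consisting of a single <code class="code">Atom</code> or <code class="code">QString</code>.

```ocaml
(* Hypothetical post-processing of token lists like the ones above. The
   type mirrors a subset of s_token so the sketch compiles without
   Ocamlnet; the local part is limited to a single Atom or QString. *)
type token = Atom of string | QString of string | Special of char | Comment

(* Atoms separated by Special '.', e.g. domain . com *)
let rec domain_parts = function
  | [ Atom a ] -> Some [ a ]
  | Atom a :: Special '.' :: rest ->
    (match domain_parts rest with Some l -> Some (a :: l) | None -> None)
  | _ -> None

(* Split local@domain into its two parts, ignoring Comment tokens *)
let split_address tokens =
  match List.filter (fun t -> t <> Comment) tokens with
  | (Atom local | QString local) :: Special '@' :: rest ->
    (match domain_parts rest with
     | Some parts -> Some (local, String.concat "." parts)
     | None -> None)
  | _ -> None
```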
<br>
<br><code><span class="keyword">type</span> <a name="TYPEs_token"></a><code class="type"></code>s_token = </code><table class="typetable">
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span class="constructor">Atom</span> <span class="keyword">of</span> <code class="type">string</code></code></td>

</tr>
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span class="constructor">EncodedWord</span> <span class="keyword">of</span> <code class="type">((string * string) * string * string)</code></code></td>
<td class="typefieldcomment" align="left" valign="top" ><code>(*</code></td><td class="typefieldcomment" align="left" valign="top" >Args: <code class="code">((charset,lang),encoding,encoded_word)</code></td><td class="typefieldcomment" align="left" valign="bottom" ><code>*)</code></td>
</tr>
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span class="constructor">QString</span> <span class="keyword">of</span> <code class="type">string</code></code></td>

</tr>
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span class="constructor">Control</span> <span class="keyword">of</span> <code class="type">char</code></code></td>

</tr>
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span class="constructor">Special</span> <span class="keyword">of</span> <code class="type">char</code></code></td>

</tr>
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span class="constructor">DomainLiteral</span> <span class="keyword">of</span> <code class="type">string</code></code></td>

</tr>
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span class="constructor">Comment</span></code></td>

</tr>
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span class="constructor">End</span></code></td>

</tr></table>

<div class="info">
A token may be one of:<ul>
<li><code class="code">QString s</code>: The quoted string <code class="code">s</code>, i.e. a string between double
   quotes. Quoted pairs are already decoded in <code class="code">s</code>.</li>
<li><code class="code">Control c</code>: The control character <code class="code">c</code> (0-31, 127, 128-255)</li>
<li><code class="code">Special c</code>: The special character <code class="code">c</code>, i.e. a character from 
   the <code class="code">specials</code> list</li>
<li><code class="code">DomainLiteral s</code>: The bracketed string <code class="code">s</code>, i.e. a string between
   brackets.  Quoted pairs are already decoded in <code class="code">s</code>.</li>
<li><code class="code">Comment</code>: A string between parentheses. This kind of token is only
   generated when the option <code class="code">Return_comments</code> is in effect.</li>
<li><code class="code">EncodedWord((charset,lang),encoding,encoded_word)</code>: An RFC-2047 style
   encoded word: <code class="code">charset</code> is the name of the character set; <code class="code">lang</code> is
   the language specifier (from RFC 2231) or ""; <code class="code">encoding</code> is either
   "Q" or "B"; and <code class="code">encoded_word</code> is the word encoded in <code class="code">charset</code> and
   <code class="code">encoding</code>. This kind of token is only generated when the option
   <code class="code">Recognize_encoded_words</code> is in effect (if not, <code class="code">Atom</code> is generated
   instead).</li>
<li><code class="code">Atom s</code>: A string which is neither quoted nor bracketed nor 
   written in RFC 2047 notation, and which is not a control or special
   character, i.e. the "rest"</li>
<li><code class="code">End</code>: The end of the string</li>
</ul>
<br>
</div>

<br><code><span class="keyword">type</span> <a name="TYPEs_option"></a><code class="type"></code>s_option = </code><table class="typetable">
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span class="constructor">No_backslash_escaping</span></code></td>
<td class="typefieldcomment" align="left" valign="top" ><code>(*</code></td><td class="typefieldcomment" align="left" valign="top" >Do not handle backslashes in quoted strings and comments as escape
 characters; backslashes are handled as normal characters.
 For example, the malformed quoted string <code class="code">"C:\dir\file"</code> is returned as
 <code class="code">QString "C:\dir\file"</code> when this option is in effect, and not as
 <code class="code">QString "C:dirfile"</code> as by default. 
 Unescaped backslashes like these are a common error in many MIME implementations.</td><td class="typefieldcomment" align="left" valign="bottom" ><code>*)</code></td>
</tr>
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span class="constructor">Return_comments</span></code></td>
<td class="typefieldcomment" align="left" valign="top" ><code>(*</code></td><td class="typefieldcomment" align="left" valign="top" >Comments are returned as token <code class="code">Comment</code> (unless '(' is included
 in the list of special characters, in which case comments are
 not recognized at all).
 You may get the exact location of the comment by applying
 <code class="code">get_pos</code> and <code class="code">get_length</code> to the extended token.</td><td class="typefieldcomment" align="left" valign="bottom" ><code>*)</code></td>
</tr>
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span class="constructor">Recognize_encoded_words</span></code></td>
<td class="typefieldcomment" align="left" valign="top" ><code>(*</code></td><td class="typefieldcomment" align="left" valign="top" >Encoded words are recognized and returned as
 <code class="code">EncodedWord</code> instead of <code class="code">Atom</code>.</td><td class="typefieldcomment" align="left" valign="bottom" ><code>*)</code></td>
</tr></table>


<pre><span class="keyword">type</span> <a name="TYPEs_extended_token"></a><code class="type"></code>s_extended_token </pre>
<div class="info">
An opaque type containing the information of <code class="code">s_token</code> plus:<ul>
<li>where the token occurs</li>
<li>RFC-2047 access functions</li>
</ul>
<br>
</div>

<pre><span class="keyword">val</span> <a name="VALget_token"></a>get_token : <code class="type"><a href="Mimestring.html#TYPEs_extended_token">s_extended_token</a> -> <a href="Mimestring.html#TYPEs_token">s_token</a></code></pre><div class="info">
Return the <code class="code">s_token</code> within the <code class="code">s_extended_token</code><br>
</div>
<pre><span class="keyword">val</span> <a name="VALget_decoded_word"></a>get_decoded_word : <code class="type"><a href="Mimestring.html#TYPEs_extended_token">s_extended_token</a> -> string</code></pre><pre><span class="keyword">val</span> <a name="VALget_charset"></a>get_charset : <code class="type"><a href="Mimestring.html#TYPEs_extended_token">s_extended_token</a> -> string</code></pre><div class="info">
Return the decoded word (the contents of the word after decoding the
 "Q" or "B" representation), and the character set of the decoded word
 (uppercase).
<p>

 These functions also work for tokens other than <code class="code">EncodedWord</code>. For the
 other kinds of token, <code class="code">get_decoded_word</code> returns:<ul>
<li><code class="code">Atom</code>: Returns the atom without decoding it</li>
<li><code class="code">QString</code>: Returns the characters inside the double quotes, and
   ensures that any quoted pairs are decoded</li>
<li><code class="code">Control</code>: Returns the one-character string</li>
<li><code class="code">Special</code>: Returns the one-character string</li>
<li><code class="code">DomainLiteral</code>: Returns the characters inside the brackets, and
   ensures that any quoted pairs are decoded</li>
<li><code class="code">Comment</code>: Returns <code class="code">""</code></li>
</ul>

 The function <code class="code">get_charset</code> returns <code class="code">"US-ASCII"</code> for them.<br>
</div>
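<div class="info">
To make the "Q" case concrete, here is a minimal stdlib-only sketch of
 RFC 2047 "Q" decoding applied to the <code class="code">encoded_word</code> component. It assumes
 the RFC rules ('_' stands for a space, <code class="code">=XX</code> for the byte with hexadecimal
 code XX); the helpers <code class="code">hex_value</code> and <code class="code">decode_q</code> are hypothetical and
 not part of <code class="code">Mimestring</code>. No character set conversion is attempted.
<p>

 <pre><code class="code">(* Sketch of RFC 2047 "Q" decoding (hypothetical helper, not Mimestring) *)

(* Value of one hexadecimal digit; fails on anything else *)
let hex_value (c : char) : int =
  match c with
  | '0' .. '9' -> Char.code c - Char.code '0'
  | 'A' .. 'F' -> Char.code c - Char.code 'A' + 10
  | 'a' .. 'f' -> Char.code c - Char.code 'a' + 10
  | _ -> failwith "hex_value: not a hexadecimal digit"

(* Decode the encoded_word part of a "Q"-encoded word:
   underscore becomes a space, =XX becomes the byte 0xXX,
   and every other character stands for itself *)
let decode_q (s : string) : string =
  let n = String.length s in
  let b = Buffer.create n in
  let rec loop i =
    if i >= n then ()
    else
      match s.[i] with
      | '_' ->
          Buffer.add_char b ' ';
          loop (i + 1)
      | '=' ->
          if i + 2 >= n then failwith "decode_q: truncated =XX escape";
          let code = (16 * hex_value s.[i + 1]) + hex_value s.[i + 2] in
          Buffer.add_char b (Char.chr code);
          loop (i + 3)
      | c ->
          Buffer.add_char b c;
          loop (i + 1)
  in
  loop 0;
  Buffer.contents b
</code></pre>
 Applied to <code class="code">"Keld_J=F8rn_Simonsen"</code> from the examples at the top of this
 section, it should give <code class="code">"Keld J\xF8rn Simonsen"</code>, where <code class="code">\xF8</code> is a single
 ISO-8859-1 byte.<br>
</div>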
<pre><span class="keyword">val</span> <a name="VALget_language"></a>get_language : <code class="type"><a href="Mimestring.html#TYPEs_extended_token">s_extended_token</a> -> string</code></pre><div class="info">
Returns the language if the token is an <code class="code">EncodedWord</code>, and <code class="code">""</code> for
 all other tokens.<br>
</div>
<pre><span class="keyword">val</span> <a name="VALget_pos"></a>get_pos : <code class="type"><a href="Mimestring.html#TYPEs_extended_token">s_extended_token</a> -> int</code></pre><div class="info">
Return the byte position where the token starts in the string 
 (the first byte has position 0)<br>
</div>
<pre><span class="keyword">val</span> <a name="VALget_line"></a>get_line : <code class="type"><a href="Mimestring.html#TYPEs_extended_token">s_extended_token</a> -> int</code></pre><div class="info">
Return the line number where the token starts (numbering usually
 begins with 1)<br>
</div>
<pre><span class="keyword">val</span> <a name="VALget_column"></a>get_column : <code class="type"><a href="Mimestring.html#TYPEs_extended_token">s_extended_token</a> -> int</code></pre><div class="info">
Return the column of the line where the token starts (first column
 is number 0)<br>
</div>
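<div class="info">
The relation between position, line and column can be illustrated with a
 small hypothetical helper for a scanner created with the defaults
 (<code class="code">pos</code> 0, <code class="code">line</code> 1, <code class="code">column</code> 0). It assumes that only LF ends a
 line; it illustrates the numbering conventions only, not how <code class="code">Mimestring</code>
 counts lines internally.
<p>

 <pre><code class="code">(* Hypothetical helper, not part of Mimestring: the (line, column) of a
   byte position, using the documented conventions (positions and
   columns count from 0, lines count from 1). Only LF ends a line. *)
let line_and_column (s : string) (pos : int) : int * int =
  if pos > String.length s || 0 > pos then invalid_arg "line_and_column";
  let line = ref 1 and column = ref 0 in
  for i = 0 to pos - 1 do
    if s.[i] = '\n' then begin
      incr line;
      column := 0
    end
    else incr column
  done;
  (!line, !column)
</code></pre>
 For instance, the '@' in <code class="code">"user\n @ domain"</code> is at byte 6, which this helper
 reports as line 2, column 1.<br>
</div>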
<pre><span class="keyword">val</span> <a name="VALget_length"></a>get_length : <code class="type"><a href="Mimestring.html#TYPEs_extended_token">s_extended_token</a> -> int</code></pre><div class="info">
Return the length of the token in bytes<br>
</div>
<pre><span class="keyword">val</span> <a name="VALseparates_adjacent_encoded_words"></a>separates_adjacent_encoded_words : <code class="type"><a href="Mimestring.html#TYPEs_extended_token">s_extended_token</a> -> bool</code></pre><div class="info">
True iff the current token is white space (i.e. <code class="code">Special ' '</code>, 
 <code class="code">Special '\t'</code>, <code class="code">Special '\r'</code> or <code class="code">Special '\n'</code>) and the last
 non-white space token was <code class="code">EncodedWord</code> and the next non-white
 space token will be <code class="code">EncodedWord</code>.
<p>

 The background of this function is that white space between
 encoded words does not have a meaning, and must be ignored
 by any application interpreting encoded words.<br>
</div>
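<div class="info">
The rule behind this function can be sketched over a plain token list. The
 sketch below copies the documented <code class="code">s_token</code> type locally so that it compiles
 without <code class="code">Mimestring</code>, and removes every run of white space that has an
 <code class="code">EncodedWord</code> on both sides. The function name is hypothetical; with the real
 library one would test each extended token with
 <code class="code">separates_adjacent_encoded_words</code> instead.
<p>

 <pre><code class="code">(* Local copy of the documented s_token type, so that the sketch
   compiles on its own *)
type s_token =
  | Atom of string
  | EncodedWord of ((string * string) * string * string)
  | QString of string
  | Control of char
  | Special of char
  | DomainLiteral of string
  | Comment
  | End

let is_white = function
  | Special (' ' | '\t' | '\r' | '\n') -> true
  | _ -> false

let is_encoded_word = function EncodedWord _ -> true | _ -> false

(* Remove each run of white-space tokens whose last preceding and first
   following non-white tokens are both EncodedWord *)
let drop_separating_white (tokens : s_token list) : s_token list =
  (* Split off the leading run of white-space tokens *)
  let rec take_white run = function
    | t :: rest when is_white t -> take_white (t :: run) rest
    | rest -> (List.rev run, rest)
  in
  let rec loop prev_is_ew acc = function
    | [] -> List.rev acc
    | t :: _ as all when is_white t -> (
        let run, after = take_white [] all in
        let next_is_ew =
          match after with t' :: _ -> is_encoded_word t' | [] -> false
        in
        match (prev_is_ew, next_is_ew) with
        | true, true -> loop prev_is_ew acc after
        | _ -> loop prev_is_ew (List.rev_append run acc) after)
    | t :: rest -> loop (is_encoded_word t) (t :: acc) rest
  in
  loop false [] tokens
</code></pre>
 White space between an <code class="code">Atom</code> and an <code class="code">EncodedWord</code> is kept, because it is
 meaningful there.<br>
</div>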
<pre><span class="keyword">type</span> <a name="TYPEmime_scanner"></a><code class="type"></code>mime_scanner </pre>
<div class="info">
The opaque type of a scanner for structured values<br>
</div>

<pre><span class="keyword">val</span> <a name="VALcreate_mime_scanner"></a>create_mime_scanner : <code class="type">specials:char list -><br>       scan_options:<a href="Mimestring.html#TYPEs_option">s_option</a> list -><br>       ?pos:int -> ?line:int -> ?column:int -> string -> <a href="Mimestring.html#TYPEmime_scanner">mime_scanner</a></code></pre><div class="info">
Creates a new <code class="code">mime_scanner</code> scanning the passed string.
<p>

<br>
</div>
<div class="param_info"><code class="code">specials</code> : The list of characters recognized as special characters.</div>
<div class="param_info"><code class="code">scan_options</code> : The list of global options modifying the behaviour
   of the scanner</div>
<div class="param_info"><code class="code">pos</code> : The position of the byte where the scanner starts in the
   passed string. Defaults to 0.</div>
<div class="param_info"><code class="code">line</code> : The line number of this first byte. Defaults to 1.</div>
<div class="param_info"><code class="code">column</code> : The column number of this first byte. Defaults to 0.</div>
<br>
Note for <code class="code">create_mime_scanner</code>:
<p>

 The optional parameters <code class="code">pos</code>, <code class="code">line</code>, <code class="code">column</code> are intentionally placed after
 <code class="code">scan_options</code> and before the string argument, so that you can build
 scanner specifications by partially applying <code class="code">create_mime_scanner</code>
 before a particular string is known. Because such an application is not
 total, the labels must be written out:
 <pre><code class="code"> let my_scanner_spec =
   create_mime_scanner ~specials:my_specials ~scan_options:my_options in
 ...
 let my_scanner = my_scanner_spec my_string in 
 ...</code></pre><br>
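<div class="info">
Because this pattern relies on OCaml's rules for labeled and optional
 arguments, here is a self-contained sketch using a hypothetical
 <code class="code">create_scanner</code> that has the same parameter layout as
 <code class="code">create_mime_scanner</code> but only records its arguments (it does not scan).
<p>

 <pre><code class="code">(* Hypothetical stand-in with the same parameter layout as
   create_mime_scanner: two labeled arguments, three optional ones,
   then the string. It only records its arguments. *)
type scanner_args = {
  specials : char list;
  scan_options : string list;
  pos : int;
  line : int;
  column : int;
  text : string;
}

let create_scanner ~specials ~scan_options ?(pos = 0) ?(line = 1)
    ?(column = 0) text =
  { specials; scan_options; pos; line; column; text }

(* Partial application: it is not total, so the labels are required *)
let my_scanner_spec = create_scanner ~specials:[ '@'; '.' ] ~scan_options:[]

(* Passing the string without the optional arguments makes OCaml
   use their defaults *)
let s1 = my_scanner_spec "user@domain.com"

(* The optional arguments may still be supplied before the string *)
let s2 = my_scanner_spec ~pos:5 ~line:1 ~column:5 "user@domain.com"
</code></pre>
 Here <code class="code">s1</code> starts at position 0, line 1, column 0, while <code class="code">s2</code> starts at
 position 5.<br>
</div>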
<pre><span class="keyword">val</span> <a name="VALget_pos_of_scanner"></a>get_pos_of_scanner : <code class="type"><a href="Mimestring.html#TYPEmime_scanner">mime_scanner</a> -> int</code></pre><pre><span class="keyword">val</span> <a name="VALget_line_of_scanner"></a>get_line_of_scanner : <code class="type"><a href="Mimestring.html#TYPEmime_scanner">mime_scanner</a> -> int</code></pre><pre><span class="keyword">val</span> <a name="VALget_column_of_scanner"></a>get_column_of_scanner : <code class="type"><a href="Mimestring.html#TYPEmime_scanner">mime_scanner</a> -> int</code></pre><div class="info">
Return the current position, line, and column of a <code class="code">mime_scanner</code>.
 The primary purpose of these functions is to simplify switching
 from one <code class="code">mime_scanner</code> to another within a string:
<p>

 <pre><code class="code"> let scanner1 = create_mime_scanner ... s in
 ... now scanning some tokens from s using scanner1 ...
 let scanner2 = create_mime_scanner ... 
                  ~pos:(get_pos_of_scanner scanner1)
                  ~line:(get_line_of_scanner scanner1)
                  ~column:(get_column_of_scanner scanner1)
                  s in
 ... scanning more tokens from s using scanner2 ... </code></pre>
<p>

 <b>Restriction:</b> These functions are not available if the option
 <code class="code">Recognize_encoded_words</code> is on. The reason is that this option
 enables look-ahead scanning; please use the location of the last
 scanned token instead.
<p>

 Note: To improve the performance of switching, it is recommended to
 create scanner specs in advance (see the example <code class="code">my_scanner_spec</code>
 above).<br>
</div>
<pre><span class="keyword">val</span> <a name="VALscan_token"></a>scan_token : <code class="type"><a href="Mimestring.html#TYPEmime_scanner">mime_scanner</a> -> <a href="Mimestring.html#TYPEs_extended_token">s_extended_token</a> * <a href="Mimestring.html#TYPEs_token">s_token</a></code></pre><div class="info">
Returns the next token, or <code class="code">End</code> if there are no more tokens. The 
 token is returned both as an extended and as a normal token.<br>
</div>
<pre><span class="keyword">val</span> <a name="VALscan_token_list"></a>scan_token_list : <code class="type"><a href="Mimestring.html#TYPEmime_scanner">mime_scanner</a> -><br>       (<a href="Mimestring.html#TYPEs_extended_token">s_extended_token</a> * <a href="Mimestring.html#TYPEs_token">s_token</a>) list</code></pre><div class="info">
Returns all following tokens as a list (excluding <code class="code">End</code>)<br>
</div>
<pre><span class="keyword">val</span> <a name="VALscan_structured_value"></a>scan_structured_value : <code class="type">string -> char list -> <a href="Mimestring.html#TYPEs_option">s_option</a> list -> <a href="Mimestring.html#TYPEs_token">s_token</a> list</code></pre><div class="info">
This function is included for backwards compatibility, and for all
 cases not requiring extended tokens.
<p>

 It scans the passed string according to the list of special characters
 and the list of options, and returns the list of all tokens.<br>
</div>
<pre><span class="keyword">val</span> <a name="VALspecials_rfc822"></a>specials_rfc822 : <code class="type">char list</code></pre><pre><span class="keyword">val</span> <a name="VALspecials_rfc2045"></a>specials_rfc2045 : <code class="type">char list</code></pre><div class="info">
The sets of special characters defined by RFC 822 and RFC 2045, respectively.<br>
</div>
<br>
<a name="parsers_for_structured_values"></a>
<h1>Parsing Certain Forms of Structured Values</h1><br>
<pre><span class="keyword">val</span> <a name="VALscan_encoded_text_value"></a>scan_encoded_text_value : <code class="type">string -> <a href="Mimestring.html#TYPEs_extended_token">s_extended_token</a> list</code></pre><div class="info">
Scans a "text" value. The returned token list contains only
 <code class="code">Special</code>, <code class="code">Atom</code> and <code class="code">EncodedWord</code> tokens. 
 Spaces, TABs, CRs, LFs are returned (as <code class="code">Special</code>) unless
 they occur between adjacent encoded words in which case
 they are suppressed. The characters '(', '[', and '"' are also
 returned as <code class="code">Special</code> tokens, and are not interpreted as delimiters.
<p>

 For instance, this function can be used to scan the "Subject"
 field of mail messages.<br>
</div>
<pre><span class="keyword">val</span> <a name="VALscan_value_with_parameters"></a>scan_value_with_parameters : <code class="type">string -> <a href="Mimestring.html#TYPEs_option">s_option</a> list -> string * (string * string) list</code></pre><div class="info">
<code class="code">let name, params = scan_value_with_parameters s options</code>:
 Scans values with annotations like
    <code class="code">name ; p1=v1 ; p2=v2 ; ...</code>
 For example, MIME types like "text/plain;charset=ISO-8859-1" can
 be parsed.
<p>

 The values may or may not be quoted. The characters ";", "=", and
 even "," are only accepted as part of values when they are quoted.
 On syntax errors, the function fails.
<p>

 RFC 2231: This function supports some features of this RFC:
 Continued parameter values are concatenated. For example:
<p>

 <pre><code class="code"> Content-Type: message/external-body; access-type=URL;
    URL*0="ftp://";
    URL*1="cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar" </code></pre>
<p>

 This is returned as:
 <pre><code class="code">( "message/external-body", 
   [ ("access-type", "URL");
     ("URL", "ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar") ]
 ) </code></pre>
<p>

 However, encoded parameter values are not handled specially. The
 parameter
   <code class="code">title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A</code>
 would be returned as
   <code class="code">("title*", "us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A")</code>.
 Use <code class="code">scan_value_with_parameters_ep</code> instead (see below).
<p>

 Raises <code class="code">Failure</code> on syntax errors.<br>
</div>
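<div class="info">
The concatenation of continued values can be sketched as follows. This is a
 simplified stdlib-only illustration that works on already-split
 <code class="code">(name, value)</code> pairs with unquoted values; the function names are
 hypothetical, and encoded sections such as <code class="code">URL*0*=...</code> are ignored.
<p>

 <pre><code class="code">(* Sketch of RFC 2231 continuations (hypothetical helpers, not the
   Mimestring implementation): NAME*0, NAME*1, ... are joined in index
   order into a single parameter NAME. *)

(* "URL*1" gives Some ("URL", 1); names without a numeric section give None *)
let split_section (name : string) : (string * int) option =
  match String.rindex_opt name '*' with
  | None -> None
  | Some i -> (
      let base = String.sub name 0 i in
      let suffix = String.sub name (i + 1) (String.length name - i - 1) in
      match int_of_string_opt suffix with
      | Some k when k >= 0 -> Some (base, k)
      | _ -> None)

let merge_continuations (params : (string * string) list) :
    (string * string) list =
  let sections = Hashtbl.create 8 in
  (* Base names in order of first appearance, most recent first *)
  let order = ref [] in
  List.iter
    (fun (name, value) ->
      let base, index =
        match split_section name with
        | Some (base, k) -> (base, k)
        | None -> (name, 0)
      in
      if not (Hashtbl.mem sections base) then order := base :: !order;
      Hashtbl.add sections base (index, value))
    params;
  List.rev_map
    (fun base ->
      (* Sort the sections of this parameter by their index *)
      let parts = List.sort compare (Hashtbl.find_all sections base) in
      (base, String.concat "" (List.map snd parts)))
    !order
</code></pre>
 Given the three pairs of the <code class="code">message/external-body</code> example, this sketch
 should yield the two parameters shown in the result above.<br>
</div>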
<pre><span class="keyword">type</span> <a name="TYPEs_param"></a><code class="type"></code>s_param </pre>
<div class="info">
The type of encoded parameters (RFC 2231)<br>
</div>

<pre><span class="keyword">val</span> <a name="VALparam_value"></a>param_value : <code class="type"><a href="Mimestring.html#TYPEs_param">s_param</a> -> string</code></pre><pre><span class="keyword">val</span> <a name="VALparam_charset"></a>param_charset : <code class="type"><a href="Mimestring.html#TYPEs_param">s_param</a> -> string</code></pre><pre><span class="keyword">val</span> <a name="VALparam_language"></a>param_language : <code class="type"><a href="Mimestring.html#TYPEs_param">s_param</a> -> string</code></pre><div class="info">
Return the decoded value of the parameter, the charset (uppercase),
 and the language.
 If the charset is not available, <code class="code">""</code> will be returned. 
 If the language is not available, <code class="code">""</code> will be returned.<br>
</div>
<pre><span class="keyword">val</span> <a name="VALmk_param"></a>mk_param : <code class="type">?charset:string -> ?language:string -> string -> <a href="Mimestring.html#TYPEs_param">s_param</a></code></pre><div class="info">
Creates a parameter from a value (in decoded form). The parameter
 may have a charset and a language.<br>
</div>
<pre><span class="keyword">val</span> <a name="VALprint_s_param"></a>print_s_param : <code class="type">Format.formatter -> <a href="Mimestring.html#TYPEs_param">s_param</a> -> unit</code></pre><div class="info">
Prints a parameter to the formatter (suitable for use as a toplevel printer)<br>
</div>
<pre><span class="keyword">val</span> <a name="VALscan_value_with_parameters_ep"></a>scan_value_with_parameters_ep : <code class="type">string -><br>       <a href="Mimestring.html#TYPEs_option">s_option</a> list -> string * (string * <a href="Mimestring.html#TYPEs_param">s_param</a>) list</code></pre><div class="info">
<code class="code">let name, params = scan_value_with_parameters_ep s options</code>:
 This version of the scanner copes with encoded parameters according
 to RFC 2231.
 Note: "ep" means "encoded parameters".
<p>

 Example:
   <code class="code">doc.html;title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A</code>
<p>

 The parameter <code class="code">title</code> would be returned as:<ul>
<li>name is <code class="code">"title"</code></li>
<li>value is <code class="code">"This is ***fun***"</code></li>
<li>charset is <code class="code">"US-ASCII"</code></li>
<li>language is <code class="code">"en-us"</code></li>
</ul>

 Raises <code class="code">Failure</code> on syntax errors.<br>
</div>
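<div class="info">
The decoding of such an extended value can be sketched with the standard
 library alone. The sketch assumes the <code class="code">charset'language'text</code> form of
 RFC 2231, upper-cases the charset as documented for <code class="code">param_charset</code>, and
 does not convert between character sets; all names are hypothetical.
<p>

 <pre><code class="code">(* Sketch of decoding an RFC 2231 extended value of the form
   charset, quote, language, quote, percent-encoded text
   (hypothetical helpers, not Mimestring) *)

let hex_value (c : char) : int =
  match c with
  | '0' .. '9' -> Char.code c - Char.code '0'
  | 'A' .. 'F' -> Char.code c - Char.code 'A' + 10
  | 'a' .. 'f' -> Char.code c - Char.code 'a' + 10
  | _ -> failwith "hex_value: not a hexadecimal digit"

(* Replace every %XX by the byte 0xXX *)
let percent_decode (s : string) : string =
  let n = String.length s in
  let b = Buffer.create n in
  let rec loop i =
    if i >= n then ()
    else if s.[i] = '%' then begin
      if i + 2 >= n then failwith "percent_decode: truncated %XX escape";
      Buffer.add_char b
        (Char.chr ((16 * hex_value s.[i + 1]) + hex_value s.[i + 2]));
      loop (i + 3)
    end
    else begin
      Buffer.add_char b s.[i];
      loop (i + 1)
    end
  in
  loop 0;
  Buffer.contents b

(* Returns (charset, language, decoded value) *)
let decode_extended_value (v : string) : string * string * string =
  match String.split_on_char '\'' v with
  | charset :: language :: (_ :: _ as rest) ->
      (* Rejoin in case the text part contains further quotes *)
      ( String.uppercase_ascii charset,
        language,
        percent_decode (String.concat "'" rest) )
  | _ -> failwith "decode_extended_value: expected two quote delimiters"
</code></pre>
 For the example value above it should return
 <code class="code">("US-ASCII", "en-us", "This is ***fun***")</code>.<br>
</div>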
<pre><span class="keyword">val</span> <a name="VALscan_mime_type"></a>scan_mime_type : <code class="type">string -> <a href="Mimestring.html#TYPEs_option">s_option</a> list -> string * (string * string) list</code></pre><div class="info">
<code class="code">let name, params = scan_mime_type s options</code>:
 Scans MIME types like
    <code class="code">text/plain; charset=iso-8859-1</code>
 The name of the type and the names of the parameters are converted
 to lower case.
<p>

 Raises <code class="code">Failure</code> on syntax errors.<br>
</div>
<pre><span class="keyword">val</span> <a name="VALscan_mime_type_ep"></a>scan_mime_type_ep : <code class="type">string -><br>       <a href="Mimestring.html#TYPEs_option">s_option</a> list -> string * (string * <a href="Mimestring.html#TYPEs_param">s_param</a>) list</code></pre><div class="info">
<code class="code">let name, params = scan_mime_type_ep s options</code>:
 This version copes with RFC-2231-encoded parameters.
<p>

 Raises <code class="code">Failure</code> on syntax errors.<br>
</div>
<pre><span class="keyword">val</span> <a name="VALsplit_mime_type"></a>split_mime_type : <code class="type">string -> string * string</code></pre><div class="info">
<code class="code">let (main_type, sub_type) = split_mime_type content_type</code>:
 Splits the MIME type into main and sub type, for example
 <code class="code"> split_mime_type "text/plain" = ("text", "plain") </code>.
 The returned strings are always lowercase.
<p>

 Raises <code class="code">Failure</code> on syntax errors.<br>
</div>
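<div class="info">
A stdlib-only sketch of the same splitting is shown below. The name
 <code class="code">split_mime_type_sketch</code> is hypothetical, surrounding white space is assumed to be
 trimmable, and parameters such as <code class="code">;charset=...</code> are not handled.
<p>

 <pre><code class="code">(* Sketch of splitting "main/sub" into lower-case halves (hypothetical
   helper, not Mimestring) *)
let split_mime_type_sketch (s : string) : string * string =
  match String.index_opt s '/' with
  | None -> failwith "split_mime_type_sketch: missing slash"
  | Some i ->
      let main_type = String.trim (String.sub s 0 i) in
      let sub_type =
        String.trim (String.sub s (i + 1) (String.length s - i - 1))
      in
      if main_type = "" || sub_type = "" then
        failwith "split_mime_type_sketch: empty main or sub type";
      (String.lowercase_ascii main_type, String.lowercase_ascii sub_type)
</code></pre>
 Like the documented function, the sketch raises <code class="code">Failure</code> on input such as
 <code class="code">"text"</code> that has no sub type.<br>
</div>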
<br>
<a name="printers_for_structured_values"></a>
<h1>Printing Structured Values</h1><br>
<pre><span class="keyword">exception</span> <a name="EXCEPTIONLine_too_long"></a>Line_too_long</pre>
<div class="info">
Raised when the hard limit of the line length is exceeded<br>
</div>
<pre><span class="keyword">val</span> <a name="VALwrite_value"></a>write_value : <code class="type">?maxlen1:int -><br>       ?maxlen:int -><br>       ?hardmaxlen1:int -><br>       ?hardmaxlen:int -><br>       ?fold_qstring:bool -><br>       ?fold_literal:bool -><br>       ?unused:int Pervasives.ref -><br>       ?hardunused:int Pervasives.ref -><br>       <a href="Netchannels.out_obj_channel.html">Netchannels.out_obj_channel</a> -> <a href="Mimestring.html#TYPEs_token">s_token</a> list -> unit</code></pre><div class="info">
Writes the list of <code class="code">s_token</code> to the <code class="code">out_obj_channel</code>. The value
 is optionally folded into several lines while writing, but this
 is off by default. To enable folding, pass <b>both</b> <code class="code">maxlen1</code> and
 <code class="code">maxlen</code>.
 The <code class="code">maxlen1</code> parameter specifies the length of the first line
 to write, the <code class="code">maxlen</code> parameter specifies the length of the
 other lines.
<p>

 If enabled, folding tries to ensure that the value is written
 in several lines that are not longer than specified by 
 <code class="code">maxlen1</code> and <code class="code">maxlen</code>. The value is split into lines by inserting
 "folding space" at certain locations (which is usually a linefeed
 followed by a space character, see below). The following
 table specifies between which tokens folding may happen:
<p>

 <pre><code class="code">               +=========================================================+
 1st   \   2nd | Atom | QString | DLiteral | EncWord | Special | Spec ' '|
 ==============+======+=========+==========+=========+=========+=========+
          Atom | FS   |  FS     |   FS     |   FS    |    -    |    F    |
       QString | FS   |  FS     |   FS     |   FS    |    -    |    F    |
 DomainLiteral | FS   |  FS     |   FS     |   FS    |    -    |    F    |
   EncodedWord | FS   |  FS     |   FS     |   FS    |    -    |    F    |
       Special | -    |  -      |   -      |   -     |    -    |    F    |
   Special ' ' | -    |  -      |   -      |   -     |    -    |    -    |
 ==============+======+=========+==========+=========+=========+=========+
</code></pre>
<p>

 The table shows between which two types of tokens a space or a folding
 space is inserted:<ul>
<li><code class="code">FS</code>: folding space</li>
<li><code class="code">F</code>:  linefeed without extra space</li>
<li><code class="code">-</code>:  nothing can be inserted here</li>
</ul>

 Folding space is <code class="code">"\n "</code>, i.e. only LF, not CRLF is used as end-of-line
 character. The function <code class="code">write_header</code> will convert these LF to CRLF
 if needed.
<p>

 <code class="code">Special '\t'</code> is handled like <code class="code">Special ' '</code>. Control characters are just
 printed, without folding. Comments, however, are substituted by 
 either space or folding space. The token <code class="code">End</code> is ignored.
<p>

 Furthermore, folding may also happen within tokens:<ul>
<li><code class="code">Atom</code>, <code class="code">Control</code>, and <code class="code">Special</code> are never split up into parts.
   They are simply printed.</li>
<li><code class="code">EncodedWord</code>s, however, are reformatted. This especially means:
   adjacent encoded words are first concatenated if possible
   (same character set, same encoding, same language), and then
   split up into several pieces with optimally chosen lengths.
   <b>Note:</b> Because this function gets <code class="code">s_token</code> as input and not
   <code class="code">s_extended_token</code>, it is not known whether <code class="code">Special ' '</code> tokens
   (or other whitespace) between adjacent EncodedWords must be
   ignored. Because of this, <code class="code">write_value</code> only reformats adjacent encoded 
   words when there is no whitespace between them.</li>
<li><code class="code">QString</code> may be split up in a special way unless <code class="code">fold_qstring</code>
   is set to <code class="code">false</code>. For example, <code class="code">"One Two  Three"</code> may be split up into
   three lines <code class="code">"One\n Two\n \ Three"</code>. Because some header fields
   explicitly forbid folding of quoted strings, it is possible to
   set <code class="code">~fold_qstring:false</code> (it is <code class="code">true</code> by default).
   <b>Note:</b> Software should not rely on the different types of
   whitespace (especially space and TAB) remaining intact at the
   beginning of a line. Furthermore, it may also happen that 
   additional whitespace is added at the end of a line by the
   transport layer.</li>
<li><code class="code">DomainLiteral</code>: These are handled like <code class="code">QString</code>. The parameter
   <code class="code">~fold_literal:false</code> turns folding off if it must be prevented,
   it is <code class="code">true</code> by default.</li>
<li><code class="code">Comment</code>: Comments are effectively omitted! Instead of <code class="code">Comment</code>,
   a space or folding space is printed. However, you can output comments
   by passing sequences like <code class="code"> Special '('; ...; Special ')' </code>.</li>
</ul>

 You can get back the number of characters that can still be printed
 into the last line without making the line too long: pass an
 <code class="code">int ref</code> as <code class="code">unused</code> to receive this value (it may
 be negative!). Pass an
 <code class="code">int ref</code> as <code class="code">hardunused</code> to get the number of characters that may
 be printed before the hard limit is exceeded.
<p>

 The function normally does not fail when a line becomes too long,
 i.e. it exceeds <code class="code">maxlen1</code> or <code class="code">maxlen</code>.
 However, it is possible to specify a hard maximum length
 (<code class="code">hardmaxlen1</code> and <code class="code">hardmaxlen</code>). If these are exceeded, the function
 will raise <code class="code">Line_too_long</code>.
<p>

 For electronic mail, a <code class="code">maxlen</code> of 78 and a <code class="code">hardmaxlen</code> of 998 are
 recommended.
<p>

 <b>Known Problems:</b> <ul>
<li>The reformatter for EncodedWords takes into
   account that multi-byte characters must not be split up. However,
   this works only when the multi-byte character set is known
   to <code class="code">Netconversion</code>. You can assume that UTF-8 and UTF-16 always
   work. If the character set is not known the reformatter may
   split the string at wrong positions.</li>
<li>The reformatter for EncodedWords may parse the token, and if
   this fails, you will get the exception <code class="code">Malformed_code</code>.
   This is only done in some special cases, however.</li>
<li>The function prints spaces between adjacent atoms. Although
   this is allowed in principle, other MIME implementations might fail when
   there are spaces at unexpected locations. Workaround: If
   no spaces are desired, concatenate adjacent atoms before
   passing them to this function.</li>
</ul>

 <b>Further Tips:</b><ul>
<li>Pass <code class="code">~maxlen1:0</code> and <code class="code">~maxlen:0</code> to get the shortest possible lines</li>
<li>Use the reformatter for encoded words! It works well. For
   example, to output a long sentence, just wrap it into
   <b>one</b> <code class="code">EncodedWord</code>. The reformatter takes care to
   fold the word into several lines.</li>
</ul>
<br>
</div>
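<div class="info">
The basic folding rule between atoms (the <code class="code">FS</code> entries of the table) can be
 sketched as a greedy line filler. This is a simplification, not the
 <code class="code">write_value</code> algorithm: it works on a plain list of words rather than
 <code class="code">s_token</code>, never splits a word, and has no hard limit; <code class="code">fold_words</code> is a
 hypothetical name.
<p>

 <pre><code class="code">(* Greedy folding sketch: words are separated by one space; when the next
   word would make the current line longer than the limit, the folding
   space "\n " is written instead. The first line is limited by maxlen1,
   all later lines by maxlen. *)
let fold_words ~maxlen1 ~maxlen (words : string list) : string =
  let b = Buffer.create 80 in
  (* [first] tells whether we are still on the first line, [col] is the
     length of the current line so far *)
  let rec loop first col = function
    | [] -> ()
    | w :: rest ->
        let limit = if first then maxlen1 else maxlen in
        let wl = String.length w in
        if col + 1 + wl > limit then begin
          (* The space after the LF counts towards the new line *)
          Buffer.add_string b "\n ";
          Buffer.add_string b w;
          loop false (1 + wl) rest
        end
        else begin
          Buffer.add_char b ' ';
          Buffer.add_string b w;
          loop first (col + 1 + wl) rest
        end
  in
  (match words with
   | [] -> ()
   | w :: rest ->
       Buffer.add_string b w;
       loop true (String.length w) rest);
  Buffer.contents b
</code></pre>
 With <code class="code">~maxlen1:0 ~maxlen:0</code> every word ends up on its own line, which matches
 the tip about getting the shortest lines.<br>
</div>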
<pre><span class="keyword">val</span> <a name="VALparam_tokens"></a>param_tokens : <code class="type">?maxlen:int -> (string * <a href="Mimestring.html#TYPEs_param">s_param</a>) list -> <a href="Mimestring.html#TYPEs_token">s_token</a> list</code></pre><div class="info">
Formats a parameter list. For example, 
 <code class="code">[ "a", "b"; "c", "d" ]</code> is transformed to the token sequence
 corresponding to <code class="code">; a=b; c=d</code>.
 If <code class="code">maxlen</code> is specified, it is ensured that the individual
 parameter (e.g. <code class="code">"a=b;"</code>) is not longer than <code class="code">maxlen-1</code>, such that
 it will fit into a line with maximum length <code class="code">maxlen</code>.
 By default, no maximum length is guaranteed.
 If <code class="code">maxlen</code> is passed, or if a parameter specifies a character
 set or language, the encoding of RFC 2231 will be applied. If these
 conditions are not met, the parameters will be encoded traditionally.<br>
</div>
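<div class="info">
For the traditional (non-RFC 2231) path, the output format can be sketched
 as follows. The sketch returns a string rather than an <code class="code">s_token</code> list,
 takes plain string values instead of <code class="code">s_param</code>, and quotes a value whenever
 it contains a character outside a deliberately conservative set (letters,
 digits, <code class="code">- . _ +</code>); these are assumptions, not the rules used by
 <code class="code">param_tokens</code>.
<p>

 <pre><code class="code">(* Conservative set of characters that never need quoting *)
let is_safe_char c =
  match c with
  | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '-' | '.' | '_' | '+' -> true
  | _ -> false

(* An empty value, or one with a character outside the safe set, is quoted *)
let needs_quoting (v : string) : bool =
  if String.length v = 0 then true
  else begin
    let bad = ref false in
    String.iter (fun c -> if not (is_safe_char c) then bad := true) v;
    !bad
  end

let quote_string (v : string) : string =
  let b = Buffer.create (String.length v + 2) in
  Buffer.add_char b '"';
  String.iter
    (fun c ->
      (* Backslash-escape the two characters special inside quotes *)
      if c = '"' || c = '\\' then Buffer.add_char b '\\';
      Buffer.add_char b c)
    v;
  Buffer.add_char b '"';
  Buffer.contents b

(* Hypothetical formatter producing "; a=b; c=d" style output *)
let format_params (params : (string * string) list) : string =
  String.concat ""
    (List.map
       (fun (name, value) ->
         let v = if needs_quoting value then quote_string value else value in
         "; " ^ name ^ "=" ^ v)
       params)
</code></pre>
 For the list <code class="code">[ "a", "b"; "c", "d" ]</code> the sketch produces <code class="code">"; a=b; c=d"</code>.<br>
</div>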
<pre><span class="keyword">val</span> <a name="VALsplit_uri"></a>split_uri : <code class="type">string -> <a href="Mimestring.html#TYPEs_token">s_token</a> list</code></pre><div class="info">
Splits a long URI according to the algorithm of RFC 2017.
 The input string must contain only 7-bit characters, and
 must already be URL-encoded where necessary.<br>
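<p>
 A minimal sketch of preparing input for <code class="code">split_uri</code>: percent-encode every byte outside printable 7-bit ASCII, including the space. <code class="code">encode_non_ascii</code> is a hypothetical helper, not part of Mimestring. It deliberately leaves "%" and all other printable characters untouched, so existing escapes survive, and it does not check whether the URI is otherwise well-formed.

```ocaml
(* Hypothetical helper, not part of Mimestring: percent-encode every byte
   that is a control character, a space, or outside 7-bit ASCII, so that
   the result meets the 7-bit precondition of split_uri.
   Printable ASCII characters, including '%' of existing escapes, are kept. *)
let encode_non_ascii uri =
  let b = Buffer.create (String.length uri) in
  String.iter
    (fun c ->
      let code = Char.code c in
      if code <= 32 || code >= 127 then
        Buffer.add_string b (Printf.sprintf "%%%02X" code)
      else Buffer.add_char b c)
    uri;
  Buffer.contents b
```

 For example, <code class="code">encode_non_ascii "http://example.org/a b\xe9"</code> returns <code class="code">"http://example.org/a%20b%E9"</code>.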
</div>
<br>
<a name="scanning_mime"></a>
<h1>Scanning MIME Messages</h1><br>
<pre><span class="keyword">val</span> <a name="VALscan_multipart_body"></a>scan_multipart_body : <code class="type">string -><br>       start_pos:int -><br>       end_pos:int -> boundary:string -> ((string * string) list * string) list</code></pre><div class="info">
<code class="code">let [params1, value1; params2, value2; ...]
   = scan_multipart_body s start_pos end_pos boundary</code>:
<p>

 Scans the string <code class="code">s</code> that is the body of a multipart message.
 The multipart message begins at position <code class="code">start_pos</code> in <code class="code">s</code>, and 
 <code class="code">end_pos</code> is the position
 of the character following the message. In <code class="code">boundary</code> the boundary string
 must be passed (this is the "boundary" parameter of the multipart
 MIME type, e.g. <code class="code">multipart/mixed;boundary="some string"</code> ).
<p>

     The return value is the list of the parts, where each part
 is returned as a pair <code class="code">(params, value)</code>. The left component <code class="code">params</code>
 is the list of name/value pairs of the part's header. The
 right component is the raw content of the part: if the part
 is encoded ("content-transfer-encoding"), the content is returned
 in its encoded representation, and the caller is responsible for
 decoding it.
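<p>
 To make the <code class="code">(params, value)</code> shape concrete, the following toy sketch (stdlib only, not the Mimestring parser) splits the raw text of one part at its first empty line into a header list and a body. <code class="code">parse_part</code> is a hypothetical name. The sketch assumes LF line endings and does not unfold continued header lines. Like <code class="code">scan_multipart_body</code>, it returns the body still in its transfer encoding.

```ocaml
(* Hypothetical helper; a toy model, not the Mimestring parser.
   Splits the raw text of one part into (header, body) at the first empty
   line. Assumes LF line endings and no folded (continued) header lines.
   The body is returned as-is, i.e. still transfer-encoded. *)
let parse_part part =
  let rec header acc = function
    | [] -> (List.rev acc, "")  (* no empty line: header only *)
    | "" :: body -> (List.rev acc, String.concat "\n" body)
    | line :: rest ->
        let field =
          match String.index_opt line ':' with
          | Some i ->
              ( String.sub line 0 i,
                String.trim (String.sub line (i + 1) (String.length line - i - 1)) )
          | None -> (line, "")  (* malformed line: keep it, empty value *)
        in
        header (field :: acc) rest
  in
  header [] (String.split_on_char '\n' part)
```

 Applied to <code class="code">"Content-Type: text/plain\nContent-Transfer-Encoding: base64\n\naGVsbG8="</code>, it returns the two header fields and the body <code class="code">"aGVsbG8="</code>, which the caller would still have to base64-decode.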
<p>

     The material before the first boundary and after the last
 boundary is not returned.
<p>

 <b>Multipart Messages</b>
<p>

 The MIME standard defines a way to group several message parts into
 a larger message (for e-mail, this technique is known as "attaching"
 files to messages); these are the so-called multipart messages.
 Such messages are recognized by the major type "multipart",
 e.g. <code class="code">multipart/mixed</code> or <code class="code">multipart/form-data</code>. Multipart types MUST
 have a <code class="code">boundary</code> parameter because the boundaries are essential for the
 representation.
<p>

    Multipart messages have a format like the following (where "_" denotes an empty line):
 <pre><code class="code"> ...Header...
 Content-type: multipart/xyz; boundary="abc"
 ...Header...
 _
 Body begins here ("prologue")
 --abc
 ...Header part 1...
 _
 ...Body part 1...
 --abc
 ...Header part 2...
 _
 ...Body part 2...
 --abc
 ...
 --abc--
 Epilogue </code></pre>
<p>

 The parts are separated by boundary lines, which begin with "--" followed by
 the string passed as the boundary parameter. (Note that arbitrary text
 may follow "--abc" on a boundary line.) The boundary is
 chosen such that it does not occur as a prefix of any line of the
 inner parts of the message.
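<p>
 The rules above can be modelled by a toy, stdlib-only sketch (not the Mimestring scanner) that works on a body held in a string. Lines before the first boundary line are dropped. Each line starting with "--" plus the boundary starts a new part. A line starting with "--" plus the boundary plus "--" ends the scan, so the epilogue is dropped too. The names <code class="code">starts_with</code> and <code class="code">split_parts</code> are hypothetical, and LF line endings are assumed. Following RFC 2046, the line break in front of a boundary line is treated as belonging to the boundary rather than to the part.

```ocaml
(* Hypothetical helpers; a toy model, not the Mimestring scanner.
   Assumes LF line endings. *)
let starts_with ~prefix s =
  let n = String.length prefix in
  String.length s >= n && String.sub s 0 n = prefix

(* Returns the raw parts between the first boundary line and the closing
   boundary line; prologue and epilogue are dropped. *)
let split_parts ~boundary body =
  let delim = "--" ^ boundary in
  let close = delim ^ "--" in
  (* Add the part collected so far (lines in reverse order), if any. *)
  let finish cur acc =
    match cur with
    | None -> acc
    | Some rev_lines -> String.concat "\n" (List.rev rev_lines) :: acc
  in
  (* [cur] is None while still in the prologue. *)
  let rec go cur acc = function
    | [] -> List.rev (finish cur acc)  (* no closing boundary found *)
    | l :: _ when starts_with ~prefix:close l -> List.rev (finish cur acc)
    | l :: rest when starts_with ~prefix:delim l ->
        go (Some []) (finish cur acc) rest
    | l :: rest ->
        (match cur with
         | None -> go None acc rest  (* prologue line: ignored *)
         | Some rev_lines -> go (Some (l :: rev_lines)) acc rest)
  in
  go None [] (String.split_on_char '\n' body)
```

 With <code class="code">~boundary:"abc"</code> and the body <code class="code">"prologue\n--abc\nA: 1\n\nbody1\n--abc\n\nbody2\n--abc--\nepilogue"</code>, the result is <code class="code">["A: 1\n\nbody1"; "\nbody2"]</code>. The closing delimiter is tested before the plain one because every closing line also starts with the plain delimiter.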
<p>

     The parts are again MIME messages, with header and body. Note
 that the parts are explicitly allowed to be multipart messages
 themselves.
<p>

     The text before the first boundary and after the last boundary
 is ignored.
<p>

     Note that multipart messages as a whole MUST NOT be encoded.
 Only the PARTS of the messages may be encoded (if they are not
 multipart messages themselves).
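<p>
 RFC 2045 (section 6.4) states the same rule: a multipart entity may only carry one of the identity transfer encodings "7bit", "8bit" or "binary". A minimal check over a header given as name/value pairs might look like the following sketch. <code class="code">multipart_encoding_ok</code> is a hypothetical helper, not part of Mimestring. Header names are compared case-insensitively, and a missing Content-Transfer-Encoding field counts as "7bit", as RFC 2045 specifies.

```ocaml
(* Hypothetical check, not part of Mimestring: does the header of a
   multipart entity use only an identity transfer encoding? *)
let multipart_encoding_ok header =
  let lowered = List.map (fun (n, v) -> (String.lowercase_ascii n, v)) header in
  match List.assoc_opt "content-transfer-encoding" lowered with
  | None -> true  (* absent field: 7bit is assumed *)
  | Some v ->
      List.mem (String.lowercase_ascii (String.trim v)) [ "7bit"; "8bit"; "binary" ]
```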
<p>

 Please read RFC 2046 if you want to know the gory details of this
 brain-dead format.<br>
</div>
<pre><span class="keyword">val</span> <a name="VALscan_multipart_body_and_decode"></a>scan_multipart_body_and_decode : <code class="type">string -><br>       start_pos:int -><br>       end_pos:int -> boundary:string -> ((string * string) list * string) list</code></pre><div class="info">
Same as <code class="code">scan_multipart_body</code>, but decodes the bodies of the parts
 if they are encoded using the methods "base64" or "quoted-printable".
 Fails if an unknown encoding is used.<br>
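<p>
 For reference, decoding quoted-printable (RFC 2045, section 6.7) amounts to turning each "=XX" hex escape back into a byte and removing soft line breaks (an "=" at the end of a line). The following stdlib-only sketch is not the Mimestring decoder. <code class="code">hex_value</code> and <code class="code">qp_decode</code> are hypothetical names, malformed escapes are passed through unchanged, and, unlike what RFC 2045 requires of decoders, trailing whitespace on lines is not stripped.

```ocaml
(* Hypothetical helper, not the Mimestring decoder. *)
let hex_value c =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | _ -> None

(* Decode "=XX" escapes and drop soft line breaks ("=" before LF or CRLF).
   Malformed escapes are copied through unchanged; trailing whitespace on
   lines is NOT stripped. *)
let qp_decode s =
  let n = String.length s in
  let b = Buffer.create n in
  let rec go i =
    if i < n then
      if s.[i] <> '=' then (Buffer.add_char b s.[i]; go (i + 1))
      else if i + 1 < n && s.[i + 1] = '\n' then go (i + 2)
      else if i + 2 < n && s.[i + 1] = '\r' && s.[i + 2] = '\n' then go (i + 3)
      else
        match
          if i + 2 < n then (hex_value s.[i + 1], hex_value s.[i + 2])
          else (None, None)
        with
        | Some hi, Some lo ->
            Buffer.add_char b (Char.chr ((hi * 16) + lo));
            go (i + 3)
        | _ ->
            Buffer.add_char b '=';
            go (i + 1)
  in
  go 0;
  Buffer.contents b
```

 For example, <code class="code">qp_decode "caf=C3=A9 =\nok"</code> returns the UTF-8 bytes of "café ok": the two escapes become the bytes 0xC3 0xA9 and the soft line break disappears.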
</div>
<pre><span class="keyword">val</span> <a name="VALscan_multipart_body_from_netstream"></a>scan_multipart_body_from_netstream : <code class="type"><a href="Netstream.in_obj_stream.html">Netstream.in_obj_stream</a> -><br>       boundary:string -><br>       create:((string * string) list -> 'a) -><br>       add:('a -> <a href="Netstream.in_obj_stream.html">Netstream.in_obj_stream</a> -> int -> int -> unit) -><br>       stop:('a -> unit) -> unit</code></pre><div class="info">
<code class="code">scan_multipart_body_from_netstream s boundary create add stop</code>:
<p>

 Reads the MIME message from the netstream <code class="code">s</code> block by block. The
 parts are delimited by the <code class="code">boundary</code>.
<p>

 When a new part begins, the function <code class="code">create</code> is
 called with the part's MIME header as argument. The result <code class="code">p</code> of this function
 may be of any type.
<p>

 For every chunk of the part that is being read, the function <code class="code">add</code>
 is invoked: <code class="code">add p s k n</code>.
<p>

 Here, <code class="code">p</code> is the value returned by the <code class="code">create</code> invocation for the
 current part. <code class="code">s</code> is the netstream. The current window of <code class="code">s</code> contains
 the read chunk completely; the chunk begins at position <code class="code">k</code> of the
 window (relative to the beginning of the window) and has a length
 of <code class="code">n</code> bytes.
<p>

 When the part has been fully read, the function <code class="code">stop</code> is
 called with <code class="code">p</code> as argument.
<p>

 In other words, the following is executed for every part:<ul>
<li><code class="code">let p = create h</code></li>
<li><code class="code">add p s k1 n1</code></li>
<li><code class="code">add p s k2 n2</code></li>
<li>...</li>
<li><code class="code">add p s kN nN</code></li>
<li><code class="code">stop p</code></li>
</ul>

 <b>Important Precondition:</b><ul>
<li>The block size of the netstream <code class="code">s</code> must be at least
   <code class="code">String.length boundary + 4</code></li>
</ul>

 <b>Exceptions:</b><ul>
<li>Exceptions can happen because of ill-formed input, and within
   the callbacks of the functions <code class="code">create</code>, <code class="code">add</code>, <code class="code">stop</code>.</li>
<li>If the exception happens while part <code class="code">p</code> is being read, and the
   <code class="code">create</code> function has already returned successfully, the
   <code class="code">stop</code> function is called as well (giving you the chance to close files).
   The exception is re-raised after <code class="code">stop</code> returns.</li>
</ul>
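<p>
 The <code class="code">create</code>/<code class="code">add</code>/<code class="code">stop</code> protocol, including the exception rule, can be sketched with a toy driver that uses only the stdlib. This is not the Mimestring code: <code class="code">drive_parts</code> and <code class="code">collect</code> are hypothetical names, the parts are assumed to be already split into <code class="code">(header, chunks)</code> pairs, and <code class="code">add</code> receives the chunk string instead of a netstream window, still with an offset and a length as in <code class="code">add p s k n</code>.

```ocaml
(* Hypothetical toy driver, not Mimestring code. Each part is given
   pre-split as (header, chunks); [add] receives the chunk string together
   with an offset and a length, mimicking "add p s k n". *)
let drive_parts ~create ~add ~stop parts =
  List.iter
    (fun (header, chunks) ->
      let p = create header in
      (* If [add] raises, call [stop p] before re-raising, as documented. *)
      (match List.iter (fun chunk -> add p chunk 0 (String.length chunk)) chunks with
       | () -> ()
       | exception e -> stop p; raise e);
      stop p)
    parts

(* Usage: collect every part into a string. *)
let collect parts =
  let out = ref [] in
  drive_parts
    ~create:(fun _header -> Buffer.create 64)
    ~add:(fun buf s k n -> Buffer.add_string buf (String.sub s k n))
    ~stop:(fun buf -> out := Buffer.contents buf :: !out)
    parts;
  List.rev !out
```

 Here <code class="code">p</code> is a <code class="code">Buffer.t</code>; in real code it might just as well be an output channel that <code class="code">stop</code> closes, which is why <code class="code">stop</code> is guaranteed to run once <code class="code">create</code> has succeeded.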
<br>
</div>
<pre><span class="keyword">val</span> <a name="VALread_multipart_body"></a>read_multipart_body : <code class="type">(<a href="Netstream.in_obj_stream.html">Netstream.in_obj_stream</a> -> 'a) -><br>       string -> <a href="Netstream.in_obj_stream.html">Netstream.in_obj_stream</a> -> 'a list</code></pre><div class="info">
This is the "next generation" multipart message parser. It is 
 called as follows:
<p>

   <code class="code">let parts = read_multipart_body f boundary s</code>
<p>

 As a precondition, the current position of the stream <code class="code">s</code> must be at
 the beginning of the message body. The string <code class="code">boundary</code> must
 be the message boundary (without "--"). The function <code class="code">f</code> is called
 for every message part, and the resulting list <code class="code">parts</code> collects
 the values returned by <code class="code">f</code>, one per part.
<p>

 The stream passed to <code class="code">f</code> is a substream of <code class="code">s</code> that begins at the
 first byte of the header of the message part. The function <code class="code">f</code>
 can read data from the substream as necessary. The substream
 terminates at the end of the message part, so <code class="code">f</code> can simply
 read the substream from beginning to end. However, <code class="code">f</code> need
 not read the substream until EOF.
<p>

 After all parts have been read, the trailing material of stream <code class="code">s</code> 
 is skipped until EOF of <code class="code">s</code> is reached.<br>
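<p>
 The substream contract can be modelled with a toy bounded reader (stdlib only). <code class="code">substream</code>, <code class="code">read_line_opt</code>, <code class="code">read_parts</code> and <code class="code">header_lines</code> are hypothetical names, and the parts are assumed to be already separated into strings, which the real function does for you. The example <code class="code">f</code> reads only the header lines of each part and stops at the first empty line, leaving the body unread, which the description above explicitly allows.

```ocaml
(* Hypothetical bounded reader standing in for a Netstream substream. *)
type substream = { data : string; mutable pos : int }

(* Read one line (without the '\n'); None at the end of the part. *)
let read_line_opt s =
  let len = String.length s.data in
  if s.pos >= len then None
  else begin
    let stop =
      match String.index_from_opt s.data s.pos '\n' with
      | Some i -> i
      | None -> len
    in
    let line = String.sub s.data s.pos (stop - s.pos) in
    s.pos <- min len (stop + 1);
    Some line
  end

(* Call [f] once per (pre-split) part, each with a fresh bounded reader. *)
let read_parts f parts = List.map (fun part -> f { data = part; pos = 0 }) parts

(* Example f: read only the header lines, stopping at the first empty line. *)
let header_lines s =
  let rec loop acc =
    match read_line_opt s with
    | None | Some "" -> List.rev acc
    | Some l -> loop (l :: acc)
  in
  loop []
```

 For instance, <code class="code">read_parts header_lines ["A: 1\nB: 2\n\nbody"; "\nbody2"]</code> returns <code class="code">[["A: 1"; "B: 2"]; []]</code>, with one result per part.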
</div>
<br>
<a name="helpers_mime"></a>
<h1>Helpers for MIME Messages</h1><br>
<pre><span class="keyword">val</span> <a name="VALcreate_boundary"></a>create_boundary : <code class="type">?random:string list -> ?nr:int -> unit -> string</code></pre><div class="info">
Creates a boundary string that can be used to separate the parts of
 multipart messages.
 The string is 63 characters long and has the following "features":<ul>
<li>Most of the string consists of minus characters, yielding
   a clear visual effect</li>
<li>The string contains "=__". This sequence cannot be produced
   by the quoted-printable encoding, so you need not care whether
   strings encoded as quoted-printable contain the boundary.</li>
<li>The string contains "&lt;&amp;&gt;;" which is illegal in HTML, XML, and
   SGML.</li>
<li>The string does not contain double quotes or backslashes,
   so you can safely put double quotes around it in the MIME header.</li>
<li>The string contains <code class="code">nr</code>, so you can safely distinguish between
   several boundaries occurring in the same MIME body if you 
   assign different <code class="code">nr</code>.</li>
<li>The string contains a hash value computed from the first
   256 bytes of all strings passed as <code class="code">random</code>, and influenced
   by the current GC state.</li>
</ul>
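<p>
 The reasoning behind the "=__" feature can be checked mechanically. In quoted-printable text, "=" only ever starts a hex escape such as "=3D" or a soft line break, so any string containing "=" followed by a character that is neither a hex digit nor a line break cannot occur in quoted-printable output. The hypothetical predicate <code class="code">qp_safe</code> below (stdlib only, not part of Mimestring) tests for that property. It says nothing about the other features listed above.

```ocaml
(* Hypothetical helpers, not part of Mimestring. *)
let is_hex c =
  match c with '0' .. '9' | 'A' .. 'F' | 'a' .. 'f' -> true | _ -> false

(* True if [s] contains '=' followed by a character that is neither a hex
   digit nor CR/LF; such a pair never appears in quoted-printable output. *)
let qp_safe s =
  let n = String.length s in
  let rec go i =
    i + 1 < n
    && ((s.[i] = '=' && (not (is_hex s.[i + 1])) && s.[i + 1] <> '\r' && s.[i + 1] <> '\n')
       || go (i + 1))
  in
  go 0
```

 For example, <code class="code">qp_safe "----=__abc"</code> is <code class="code">true</code>, while <code class="code">qp_safe "a=3Db"</code> is <code class="code">false</code> because "=3D" is an ordinary quoted-printable escape.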
<br>
</div>
</body></html>