Sophie

ocaml-ulex-devel-1.1-10.mga3.x86_64.rpm

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<link rel="stylesheet" href="style.css" type="text/css">
<meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type">
<link rel="Start" href="index.html">
<link rel="Up" href="index.html">
<link title="Index of types" rel=Appendix href="index_types.html">
<link title="Index of exceptions" rel=Appendix href="index_exceptions.html">
<link title="Index of values" rel=Appendix href="index_values.html">
<link title="Index of modules" rel=Appendix href="index_modules.html">
<link title="Ulexing" rel="Chapter" href="Ulexing.html"><link title="Clients interface" rel="Section" href="#6_Clientsinterface">
<link title="Interface for lexers semantic actions" rel="Section" href="#6_Interfaceforlexerssemanticactions">
<link title="Internal interface" rel="Section" href="#6_Internalinterface">
<title>Ulexing</title>
</head>
<body>
<div class="navbar">&nbsp;<a class="up" href="index.html" title="Index">Up</a>
&nbsp;</div>
<h1>Module <a href="type_Ulexing.html">Ulexing</a></h1>
<pre><span class="keyword">module</span> Ulexing: <code class="code">sig</code> <a href="Ulexing.html">..</a> <code class="code">end</code></pre><div class="info">
Runtime support for lexers generated by <code class="code">ulex</code>.
  This module is roughly equivalent to the module Lexing from 
  the OCaml standard library, except that its lexer buffers handle
  Unicode code points (OCaml type: <code class="code">int</code> in the range
  <code class="code">0..0x10ffff</code>) instead of bytes (OCaml type: <code class="code">char</code>).
<p>

  ulex-generated lexers can also work on a custom implementation of
  lexer buffers. To do this, define a module <code class="code">L</code> that
  implements the <code class="code">start</code>, <code class="code">next</code>, <code class="code">mark</code> and <code class="code">backtrack</code> functions
  (see the Internal interface section below for their specification)
  and the <code class="code">Error</code> exception.
  These functions need not operate on a type named <code class="code">lexbuf</code>: any type
  name will do. Then add the following line to your ulex-processed source, before
  the first lexer specification:
<p>

  <code class="code">module Ulexing = L</code>
<p>

  You will probably also want to define functions such as <code class="code">lexeme</code>
  for use in the lexers' semantic actions.<br>
</div>
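<div class="info">
As a rough illustration of the custom-buffer approach described above, the following sketch defines a hypothetical module <code class="code">L</code> over an in-memory <code class="code">int array</code>. The record type <code class="code">t</code>, its field names and the <code class="code">of_array</code> helper are assumptions made for this example; only the names <code class="code">start</code>, <code class="code">next</code>, <code class="code">mark</code>, <code class="code">backtrack</code>, <code class="code">lexeme</code> and <code class="code">Error</code> come from this documentation.<br>
</div>

```ocaml
(* Hypothetical custom lexer buffer over an in-memory array of code points.
   The type name [t] and its fields are illustrative, not part of ulex. *)
module L = struct
  exception Error

  type t = {
    data : int array;          (* the whole input, as code points *)
    mutable start_pos : int;   (* start of the current lexeme *)
    mutable pos : int;         (* current position *)
    mutable marked_pos : int;  (* backtrack position *)
    mutable marked_val : int;  (* the internal integer slot *)
  }

  let of_array data =
    { data; start_pos = 0; pos = 0; marked_pos = 0; marked_val = -1 }

  (* Begin a new lexeme at the current position and reset the slot. *)
  let start b =
    b.start_pos <- b.pos;
    b.marked_pos <- b.pos;
    b.marked_val <- -1

  (* Return the next code point, or -1 once the input is exhausted. *)
  let next b =
    if b.pos >= Array.length b.data then -1
    else begin
      let c = b.data.(b.pos) in
      b.pos <- b.pos + 1;
      c
    end

  (* Remember [i] and the current position for a later [backtrack]. *)
  let mark b i =
    b.marked_pos <- b.pos;
    b.marked_val <- i

  (* Return to the marked position and report the stored integer. *)
  let backtrack b =
    b.pos <- b.marked_pos;
    b.marked_val

  (* Helper for semantic actions: the code points matched so far. *)
  let lexeme b = Array.sub b.data b.start_pos (b.pos - b.start_pos)
end
```

<div class="info">
With such a module in scope, the line <code class="code">module Ulexing = L</code> placed before the first lexer specification would direct the generated code to it.<br>
</div>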
<hr width="100%">
<pre><span id="TYPElexbuf"><span class="keyword">type</span> <code class="type"></code>lexbuf</span> </pre>
<div class="info">
The type of lexer buffers. A lexer buffer is the argument passed
    to the scanning functions defined by the generated lexers.
    The lexer buffer holds the internal information for the
    scanners, including the code points of the token currently scanned,
    its position from the beginning of the input stream,
    and the current position of the lexer.<br>
</div>

<pre><span id="EXCEPTIONError"><span class="keyword">exception</span> Error</span></pre>
<div class="info">
Raised by a lexer when it cannot parse a token from the lexbuf. 
    The functions <code class="code">Ulexing.lexeme_start</code> (resp. <code class="code">Ulexing.lexeme_end</code>) can be 
    used to find the position of the first code point of the current
    matched substring (resp. the first code point that caused the error).<br>
</div>
<pre><span id="EXCEPTIONInvalidCodepoint"><span class="keyword">exception</span> InvalidCodepoint</span> <span class="keyword">of</span> <code class="type">int</code></pre>
<div class="info">
Raised by some functions to signal that some code point is not
    compatible with a specified encoding.<br>
</div>
<br>
<h6 id="6_Clientsinterface">Clients interface</h6><br>
<pre><span id="VALcreate"><span class="keyword">val</span> create</span> : <code class="type">(int array -> int -> int -> int) -> <a href="Ulexing.html#TYPElexbuf">lexbuf</a></code></pre><div class="info">
Create a generic lexer buffer.  When the lexer needs more
    code points, it calls the given function, passing it an array of
    integers <code class="code">a</code>, a position <code class="code">pos</code> and a code point count <code class="code">n</code>.  The
    function should put <code class="code">n</code> code points or fewer in <code class="code">a</code>, starting at
    position <code class="code">pos</code>, and return the number of code points provided. A
    return value of 0 means end of input.<br>
</div>
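<div class="info">
The refill protocol described for <code class="code">create</code> can be sketched as follows. The helper name <code class="code">refill_of_latin1</code> is an assumption made for this example; it builds a function of type <code class="code">int array -> int -> int -> int</code> that serves the bytes of a string as code points. The call to <code class="code">Ulexing.create</code> is shown only in a comment so that the sketch compiles without the ulex library.<br>
</div>

```ocaml
(* Build a refill function of the shape expected by [Ulexing.create]:
   it copies at most [n] code points into [a] starting at [pos] and
   returns how many were written (0 signals end of input).
   [refill_of_latin1] is a hypothetical helper name. *)
let refill_of_latin1 (s : string) : int array -> int -> int -> int =
  let consumed = ref 0 in
  fun a pos n ->
    let count = min n (String.length s - !consumed) in
    for i = 0 to count - 1 do
      (* Each Latin-1 byte is its own Unicode code point. *)
      a.(pos + i) <- Char.code s.[!consumed + i]
    done;
    consumed := !consumed + count;
    count

(* With ulex installed, the buffer would then be created with:
     let lexbuf = Ulexing.create (refill_of_latin1 "some input") *)
```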
<pre><span id="VALfrom_stream"><span class="keyword">val</span> from_stream</span> : <code class="type">int Stream.t -> <a href="Ulexing.html#TYPElexbuf">lexbuf</a></code></pre><div class="info">
Create a lexbuf from a stream of Unicode code points.<br>
</div>
<pre><span id="VALfrom_int_array"><span class="keyword">val</span> from_int_array</span> : <code class="type">int array -> <a href="Ulexing.html#TYPElexbuf">lexbuf</a></code></pre><div class="info">
Create a lexbuf from an array of Unicode code points.<br>
</div>
<pre><span id="VALfrom_latin1_stream"><span class="keyword">val</span> from_latin1_stream</span> : <code class="type">char Stream.t -> <a href="Ulexing.html#TYPElexbuf">lexbuf</a></code></pre><div class="info">
Create a lexbuf from a Latin1 encoded stream (i.e. a stream
    of Unicode code points in the range <code class="code">0..255</code>).<br>
</div>
<pre><span id="VALfrom_latin1_channel"><span class="keyword">val</span> from_latin1_channel</span> : <code class="type">Pervasives.in_channel -> <a href="Ulexing.html#TYPElexbuf">lexbuf</a></code></pre><div class="info">
Create a lexbuf from a Latin1 encoded input channel.
    The client is responsible for closing the channel.<br>
</div>
<pre><span id="VALfrom_latin1_string"><span class="keyword">val</span> from_latin1_string</span> : <code class="type">string -> <a href="Ulexing.html#TYPElexbuf">lexbuf</a></code></pre><div class="info">
Create a lexbuf from a Latin1 encoded string.<br>
</div>
<pre><span id="VALfrom_utf8_stream"><span class="keyword">val</span> from_utf8_stream</span> : <code class="type">char Stream.t -> <a href="Ulexing.html#TYPElexbuf">lexbuf</a></code></pre><div class="info">
Create a lexbuf from a UTF-8 encoded stream.<br>
</div>
<pre><span id="VALfrom_utf8_channel"><span class="keyword">val</span> from_utf8_channel</span> : <code class="type">Pervasives.in_channel -> <a href="Ulexing.html#TYPElexbuf">lexbuf</a></code></pre><div class="info">
Create a lexbuf from a UTF-8 encoded input channel.<br>
</div>
<pre><span id="VALfrom_utf8_string"><span class="keyword">val</span> from_utf8_string</span> : <code class="type">string -> <a href="Ulexing.html#TYPElexbuf">lexbuf</a></code></pre><div class="info">
Create a lexbuf from a UTF-8 encoded string.<br>
</div>
<pre><code><span id="TYPEenc"><span class="keyword">type</span> <code class="type"></code>enc</span> = </code></pre><table class="typetable">
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span id="TYPEELTenc.Ascii"><span class="constructor">Ascii</span></span></code></td>

</tr>
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span id="TYPEELTenc.Latin1"><span class="constructor">Latin1</span></span></code></td>

</tr>
<tr>
<td align="left" valign="top" >
<code><span class="keyword">|</span></code></td>
<td align="left" valign="top" >
<code><span id="TYPEELTenc.Utf8"><span class="constructor">Utf8</span></span></code></td>

</tr></table>


<pre><span id="VALfrom_var_enc_stream"><span class="keyword">val</span> from_var_enc_stream</span> : <code class="type"><a href="Ulexing.html#TYPEenc">enc</a> Pervasives.ref -> char Stream.t -> <a href="Ulexing.html#TYPElexbuf">lexbuf</a></code></pre><div class="info">
Create a lexbuf from a stream whose encoding is subject
    to change during lexing. The reference can be changed at any point.
    Note that bytes that have been consumed by the lexer buffer
    are not re-interpreted with the new encoding.
<p>

    In <code class="code">Ascii</code> mode, non-ASCII bytes (i.e. <code class="code">&gt;127</code>) in the stream
    raise an <code class="code">InvalidCodepoint</code> exception.<br>
</div>
<pre><span id="VALfrom_var_enc_string"><span class="keyword">val</span> from_var_enc_string</span> : <code class="type"><a href="Ulexing.html#TYPEenc">enc</a> Pervasives.ref -> string -> <a href="Ulexing.html#TYPElexbuf">lexbuf</a></code></pre><div class="info">
Same as <code class="code">Ulexing.from_var_enc_stream</code> with a string as input.<br>
</div>
<pre><span id="VALfrom_var_enc_channel"><span class="keyword">val</span> from_var_enc_channel</span> : <code class="type"><a href="Ulexing.html#TYPEenc">enc</a> Pervasives.ref -> Pervasives.in_channel -> <a href="Ulexing.html#TYPElexbuf">lexbuf</a></code></pre><div class="info">
Same as <code class="code">Ulexing.from_var_enc_stream</code> with a channel as input.<br>
</div>
<br>
<h6 id="6_Interfaceforlexerssemanticactions">Interface for lexers semantic actions</h6><br>
<br>
The following functions can be called from the semantic actions of
  lexer definitions.  They give access to the character string matched
  by the regular expression associated with the semantic action. These
  functions must be applied to the argument <code class="code">lexbuf</code>, which, in the
  code generated by <code class="code">ulex</code>, is bound to the lexer buffer passed to the
  parsing function.
<p>

  These functions can also be called when catching a <code class="code">Ulexing.Error</code> 
  exception to retrieve the problematic string.<br>
<pre><span id="VALlexeme_start"><span class="keyword">val</span> lexeme_start</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int</code></pre><div class="info">
<code class="code">Ulexing.lexeme_start lexbuf</code> returns the offset in the
    input stream of the first code point of the matched string.
    The first code point of the stream has offset 0.<br>
</div>
<pre><span id="VALlexeme_end"><span class="keyword">val</span> lexeme_end</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int</code></pre><div class="info">
<code class="code">Ulexing.lexeme_end lexbuf</code> returns the offset in the input stream
   of the code point following the last code point of the matched
   string. The first code point of the stream has offset 0.<br>
</div>
<pre><span id="VALloc"><span class="keyword">val</span> loc</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int * int</code></pre><div class="info">
<code class="code">Ulexing.loc lexbuf</code> returns the pair 
  <code class="code">(Ulexing.lexeme_start lexbuf,Ulexing.lexeme_end lexbuf)</code>.<br>
</div>
<pre><span id="VALlexeme_length"><span class="keyword">val</span> lexeme_length</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int</code></pre><div class="info">
<code class="code">Ulexing.lexeme_length lexbuf</code> returns the difference 
  <code class="code">(Ulexing.lexeme_end lexbuf) - (Ulexing.lexeme_start lexbuf)</code>,
  that is, the length (in code points) of the matched string.<br>
</div>
<pre><span id="VALlexeme"><span class="keyword">val</span> lexeme</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int array</code></pre><div class="info">
<code class="code">Ulexing.lexeme lexbuf</code> returns the string matched by
  the regular expression as an array of Unicode code points.<br>
</div>
<pre><span id="VALget_buf"><span class="keyword">val</span> get_buf</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int array</code></pre><div class="info">
Direct access to the internal buffer.<br>
</div>
<pre><span id="VALget_start"><span class="keyword">val</span> get_start</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int</code></pre><div class="info">
Direct access to the starting position of the lexeme in the
      internal buffer.<br>
</div>
<pre><span id="VALget_pos"><span class="keyword">val</span> get_pos</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int</code></pre><div class="info">
Direct access to the current position (end of lexeme) in the
      internal buffer.<br>
</div>
<pre><span id="VALlexeme_char"><span class="keyword">val</span> lexeme_char</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int -> int</code></pre><div class="info">
<code class="code">Ulexing.lexeme_char lexbuf pos</code> returns code point number <code class="code">pos</code> in
      the matched string.<br>
</div>
<pre><span id="VALsub_lexeme"><span class="keyword">val</span> sub_lexeme</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int -> int -> int array</code></pre><div class="info">
<code class="code">Ulexing.sub_lexeme lexbuf pos len</code> returns a substring of the string
  matched by the regular expression as an array of Unicode code points.<br>
</div>
<pre><span id="VALlatin1_lexeme"><span class="keyword">val</span> latin1_lexeme</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> string</code></pre><div class="info">
As <code class="code">Ulexing.lexeme</code> with a result encoded in Latin1.
  This function raises <code class="code">InvalidCodepoint</code> if it is not possible
  to encode the result in Latin1.<br>
</div>
<pre><span id="VALlatin1_sub_lexeme"><span class="keyword">val</span> latin1_sub_lexeme</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int -> int -> string</code></pre><div class="info">
As <code class="code">Ulexing.sub_lexeme</code> with a result encoded in Latin1.
  This function raises <code class="code">InvalidCodepoint</code> if it is not possible
  to encode the result in Latin1.<br>
</div>
<pre><span id="VALlatin1_lexeme_char"><span class="keyword">val</span> latin1_lexeme_char</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int -> char</code></pre><div class="info">
As <code class="code">Ulexing.lexeme_char</code> with a result encoded in Latin1.
  This function raises <code class="code">InvalidCodepoint</code> if it is not possible
  to encode the result in Latin1.<br>
</div>
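<div class="info">
The failure mode of the Latin1 functions above can be illustrated with a small model. This is a sketch under stated assumptions, not the library's implementation: the exception is redeclared locally so that the code compiles without ulex, and <code class="code">latin1_of_code_points</code> is a hypothetical name.<br>
</div>

```ocaml
(* Local stand-in for [Ulexing.InvalidCodepoint], so that the sketch
   compiles without the ulex library. *)
exception InvalidCodepoint of int

(* Encode an array of code points as a Latin-1 string, raising
   [InvalidCodepoint] for any value outside 0..255.
   [latin1_of_code_points] is a hypothetical name. *)
let latin1_of_code_points (a : int array) : string =
  String.init (Array.length a) (fun i ->
    let c = a.(i) in
    if c < 0 || c > 255 then raise (InvalidCodepoint c)
    else Char.chr c)
```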
<pre><span id="VALutf8_lexeme"><span class="keyword">val</span> utf8_lexeme</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> string</code></pre><div class="info">
As <code class="code">Ulexing.lexeme</code> with a result encoded in UTF-8.<br>
</div>
<pre><span id="VALutf8_sub_lexeme"><span class="keyword">val</span> utf8_sub_lexeme</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int -> int -> string</code></pre><div class="info">
As <code class="code">Ulexing.sub_lexeme</code> with a result encoded in UTF-8.<br>
</div>
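<div class="info">
For comparison, an array of code points such as the one returned by <code class="code">Ulexing.lexeme</code> can be converted to UTF-8 with the standard library alone. The sketch below assumes OCaml 4.06 or later (for <code class="code">Buffer.add_utf_8_uchar</code>); the name <code class="code">utf8_of_code_points</code> is hypothetical, and nothing here is claimed to match how <code class="code">Ulexing.utf8_lexeme</code> is implemented.<br>
</div>

```ocaml
(* Encode an array of Unicode code points as a UTF-8 string using the
   standard library (OCaml 4.06 or later). [utf8_of_code_points] is a
   hypothetical name. [Uchar.of_int] raises [Invalid_argument] for values
   that are not Unicode scalar values (e.g. surrogates). *)
let utf8_of_code_points (a : int array) : string =
  let b = Buffer.create (Array.length a) in
  Array.iter (fun c -> Buffer.add_utf_8_uchar b (Uchar.of_int c)) a;
  Buffer.contents b
```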
<pre><span id="VALrollback"><span class="keyword">val</span> rollback</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> unit</code></pre><div class="info">
<code class="code">Ulexing.rollback lexbuf</code> puts <code class="code">lexbuf</code> back in its configuration before
  the last lexeme was matched. It is then possible to use another
  lexer to parse the same characters again. The other functions
  above in this section should not be used in the semantic action
  after a call to <code class="code">Ulexing.rollback</code>.<br>
</div>
<br>
<h6 id="6_Internalinterface">Internal interface</h6><br>
<br>
These functions are used internally by the lexers. They could be used
  to write lexers by hand, or with a lexer generator different from
  <code class="code">ulex</code>. The lexer buffers have a single internal slot that can store
  an integer. They also store a "backtrack" position.<br>
<pre><span id="VALstart"><span class="keyword">val</span> start</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> unit</code></pre><div class="info">
<code class="code">Ulexing.start lexbuf</code> informs the lexer buffer that any
  code points up to the current position can be discarded.
  The current position becomes the "start" position as returned
  by <code class="code">Ulexing.lexeme_start</code>. Moreover, the internal slot is set to
  <code class="code">-1</code> and the backtrack position is set to the current position.<br>
</div>
<pre><span id="VALnext"><span class="keyword">val</span> next</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int</code></pre><div class="info">
<code class="code">Ulexing.next lexbuf</code> extracts the next code point from the
  lexer buffer and increments the current position. If the input stream
  is exhausted, the function returns <code class="code">-1</code>.<br>
</div>
<pre><span id="VALmark"><span class="keyword">val</span> mark</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int -> unit</code></pre><div class="info">
<code class="code">Ulexing.mark lexbuf i</code> stores the integer <code class="code">i</code> in the internal
  slot. The backtrack position is set to the current position.<br>
</div>
<pre><span id="VALbacktrack"><span class="keyword">val</span> backtrack</span> : <code class="type"><a href="Ulexing.html#TYPElexbuf">lexbuf</a> -> int</code></pre><div class="info">
<code class="code">Ulexing.backtrack lexbuf</code> returns the value stored in the
  internal slot of the buffer, and performs backtracking
  (the current position is set to the value of the backtrack position).<br>
</div>
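<div class="info">
To show how the four functions above cooperate in a hand-written lexer, here is a minimal sketch of a longest-match scanner. Everything in it is an assumption made for illustration: the signature <code class="code">LEXBUF</code>, the functor <code class="code">Scanner</code>, the rule numbers 1 and 2, and the tiny <code class="code">ArrayBuf</code> module that stands in for a real lexer buffer so that the example runs without ulex.<br>
</div>

```ocaml
(* The four operations a hand-written scanner needs, as specified above. *)
module type LEXBUF = sig
  type t
  val start : t -> unit
  val next : t -> int
  val mark : t -> int -> unit
  val backtrack : t -> int
end

(* Hypothetical scanner: longest match of  digit+  (result 1) or
   digit+ '.' digit+  (result 2); -1 means no rule matched. *)
module Scanner (B : LEXBUF) = struct
  let is_digit c = c >= 48 && c <= 57   (* '0'..'9' *)

  let number b =
    B.start b;                          (* slot := -1, new lexeme *)
    let rec int_part seen =
      let c = B.next b in
      if is_digit c then begin
        B.mark b 1;                     (* an integer has been accepted *)
        int_part true
      end
      else if c = 46 && seen then frac_part ()   (* '.' *)
      else B.backtrack b
    and frac_part () =
      let c = B.next b in
      if is_digit c then begin
        B.mark b 2;                     (* a decimal has been accepted *)
        frac_part ()
      end
      else B.backtrack b
    in
    int_part false
end

(* Minimal in-memory buffer used only to exercise the scanner. *)
module ArrayBuf = struct
  type t = { data : int array; mutable pos : int;
             mutable bt_pos : int; mutable slot : int }
  let of_string s =
    { data = Array.init (String.length s) (fun i -> Char.code s.[i]);
      pos = 0; bt_pos = 0; slot = -1 }
  let start b = b.bt_pos <- b.pos; b.slot <- -1
  let next b =
    if b.pos >= Array.length b.data then -1
    else (let c = b.data.(b.pos) in b.pos <- b.pos + 1; c)
  let mark b i = b.bt_pos <- b.pos; b.slot <- i
  let backtrack b = b.pos <- b.bt_pos; b.slot
end

module S = Scanner (ArrayBuf)
```

<div class="info">
On the input <code class="code">12.x</code>, the scanner reads past the dot, fails to find a digit, and <code class="code">backtrack</code> brings the position back to just after <code class="code">12</code> while reporting rule 1. This is the role the internal slot and the backtrack position play in longest-match lexing.<br>
</div>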
</body></html>