<?xml version="1.0" encoding="utf-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" > <head><meta content="text/html;charset=&quot;utf-8&quot;" http-equiv="Content-type"/><link href="faldoc.css" rel="stylesheet" type="text/css"/><title> - Class Tokenizer</title></head><body class="faldoc"><ul class="navi_top"><li class="top"><a href="index.html">Top: Table of contents</a></li> <li class="up"><a href="core.html">Up: The core module</a></li> <li class="prev"><a href="core_TimeZone.html">Previous: Class TimeZone</a></li> <li class="next"><a href="core_URI.html">Next: Class URI</a></li> <li class="clear"></li> </ul><div id="page_body"><h1><span class="toc_number">1.38</span>Class Tokenizer</h1><p class="brief">Simple stream-oriented parser for efficient basic recognition of incoming data. </p> <pre class="prototype">Class Tokenizer( [seps],[options],[tokLen],[source] )</pre> <table class="prototype"> <tbody><tr class="optparam"><td class="name">seps</td><td class="content"> A string containing the separator characters. </td></tr> <tr class="optparam"><td class="name">options</td><td class="content"> Tokenization options. </td></tr> <tr class="optparam"><td class="name">tokLen</td><td class="content"> Maximum length of returned tokens. </td></tr> <tr class="optparam"><td class="name">source</td><td class="content"> The string to be tokenized, or a stream to be read for tokens. </td></tr> </tbody> </table> <p>The Tokenizer class provides simple and efficient logic for parsing incoming data, mainly from strings. </p> <p>The source can also be set later with the <a href="core_Tokenizer.html#parse">Tokenizer.parse</a> method. <b>seps</b> defaults to " " if not given. 
</p> <p>The <b>options</b> parameter can be a bitwise combination of the following values: </p> <ul><li><b>Tokenizer.groupsep</b>: Groups consecutive separators into one. If not given, an empty token is returned when a separator immediately follows another. </li><li><b>Tokenizer.bindsep</b>: Returns separators together with their token. </li><li><b>Tokenizer.trim</b>: Trims whitespace from returned tokens. </li><li><b>Tokenizer.wsAsToken</b>: Treats a sequence of whitespace characters as a single token. </li><li><b>Tokenizer.retsep</b>: Returns separators as separate tokens. </li></ul> <table class="members"> <tbody><tr class="member_type"><td class="member_type" colspan="2">Methods</td></tr> <tr><td><a href="#hasCurrent">hasCurrent</a></td><td>Returns true if the tokenizer has a current token. </td></tr> <tr><td><a href="#next">next</a></td><td>Advances the tokenizer to the next token. </td></tr> <tr><td><a href="#nextToken">nextToken</a></td><td>Returns the next token from the tokenizer. </td></tr> <tr><td><a href="#parse">parse</a></td><td>Changes or sets the source data for this tokenizer. </td></tr> <tr><td><a href="#rewind">rewind</a></td><td>Resets the status of the tokenizer. </td></tr> <tr><td><a href="#token">token</a></td><td>Gets the current token. </td></tr> </tbody> </table> <h2>Methods</h2><h3><a name="hasCurrent">hasCurrent</a></h3><p class="brief">Returns true if the tokenizer has a current token. </p> <pre class="prototype">Tokenizer.hasCurrent()</pre> <table class="prototype"> <tbody><tr class="return"><td class="name">Return</td><td class="content">True if a token is now available, false otherwise. </td></tr> </tbody> </table> <p>Unlike with iterators, <a href="core_Tokenizer.html#next">Tokenizer.next</a> must be called at least once before calling this method. 
</p> <p class="see_also">See also: <a href="core_Tokenizer.html">Tokenizer</a>.</p> <h3><a name="next">next</a></h3><p class="brief">Advances the tokenizer to the next token. </p> <pre class="prototype">Tokenizer.next()</pre> <table class="prototype"> <tbody><tr class="return"><td class="name">Return</td><td class="content">True if a new token is now available, false otherwise. </td></tr> <tr class="raise"><td class="name">Raise</td><td class="content"><table> <tbody><tr><td class="name"><a href="core_IoError.html">IoError</a></td><td class="content"> on errors on the underlying stream. </td></tr> <tr><td class="name"><a href="core_CodeError.html">CodeError</a></td><td class="content"> if called on an unprepared Tokenizer. </td></tr> </tbody> </table> </td></tr> </tbody> </table> <p>For example: </p> <pre>
   t = Tokenizer( source|"A string to be tokenized" )
   // Print the current token, then advance to the next one.
   while t.hasCurrent()
      > "Token: ", t.token()
      t.next()
   end
</pre><p class="see_also">See also: <a href="core_Tokenizer.html">Tokenizer</a>.</p> <h3><a name="nextToken">nextToken</a></h3><p class="brief">Returns the next token from the tokenizer. </p> <pre class="prototype">Tokenizer.nextToken()</pre> <table class="prototype"> <tbody><tr class="return"><td class="name">Return</td><td class="content">A string, or nil at the end of the tokenization. </td></tr> <tr class="raise"><td class="name">Raise</td><td class="content"><table> <tbody><tr><td class="name"><a href="core_IoError.html">IoError</a></td><td class="content"> on errors on the underlying stream. </td></tr> <tr><td class="name"><a href="core_CodeError.html">CodeError</a></td><td class="content"> if called on an unprepared Tokenizer. </td></tr> </tbody> </table> </td></tr> </tbody> </table> <p>This method is equivalent to calling <a href="core_Tokenizer.html#next">Tokenizer.next</a> followed by <a href="core_Tokenizer.html#token">Tokenizer.token</a>. 
</p> <p>Sample usage: </p> <pre>
   t = Tokenizer( source|"A string to be tokenized" )
   // Compare against nil explicitly: an empty token is a valid result.
   while (token = t.nextToken()) != nil
      > "Token: ", token
   end
</pre><p class='note'><b>Note:</b> When looping, remember to compare the returned token against nil: empty strings can legitimately be returned several times, and they evaluate as false in logic checks. </p> <h3><a name="parse">parse</a></h3><p class="brief">Changes or sets the source data for this tokenizer. </p> <pre class="prototype">Tokenizer.parse( source )</pre> <table class="prototype"> <tbody><tr class="param"><td class="name">source</td><td class="content"> A string or a stream to be used as a source for the tokenizer. </td></tr> <tr class="raise"><td class="name">Raise</td><td class="content"><table> <tbody><tr><td class="name"><a href="core_IoError.html">IoError</a></td><td class="content"> on errors on the underlying stream. </td></tr> </tbody> </table> </td></tr> </tbody> </table> <p>The first token is immediately read and set as the current token. If at least one token can be read, <a href="core_Tokenizer.html#hasCurrent">Tokenizer.hasCurrent</a> returns true, and <a href="core_Tokenizer.html#token">Tokenizer.token</a> returns its value. </p> <h3><a name="rewind">rewind</a></h3><p class="brief">Resets the status of the tokenizer. </p> <pre class="prototype">Tokenizer.rewind()</pre> <table class="prototype"> <tbody><tr class="raise"><td class="name">Raise</td><td class="content"><table> <tbody><tr><td class="name"><a href="core_IoError.html">IoError</a></td><td class="content"> if the tokenizer is tokenizing a non-rewindable stream. </td></tr> </tbody> </table> </td></tr> </tbody> </table> <h3><a name="token">token</a></h3><p class="brief">Gets the current token. </p> <pre class="prototype">Tokenizer.token()</pre> <table class="prototype"> <tbody><tr class="return"><td class="name">Return</td><td class="content">The current token. 
</td></tr> <tr class="raise"><td class="name">Raise</td><td class="content"><table> <tbody><tr><td class="name"><a href="core_IoError.html">IoError</a></td><td class="content"> on errors on the underlying stream. </td></tr> <tr><td class="name"><a href="core_CodeError.html">CodeError</a></td><td class="content"> if called on an unprepared Tokenizer, or before next(). </td></tr> </tbody> </table> </td></tr> </tbody> </table> <p>This method returns the current token. </p> <p class="see_also">See also: <a href="core_Tokenizer.html">Tokenizer</a>.</p> </div><ul class="navi_bottom"><li class="top"><a href="index.html">Top: Table of contents</a></li> <li class="up"><a href="core.html">Up: The core module</a></li> <li class="prev"><a href="core_TimeZone.html">Previous: Class TimeZone</a></li> <li class="next"><a href="core_URI.html">Next: Class URI</a></li> <li class="clear"></li> </ul><div class="signature">Made with <a href="http://www.falconpl.org">faldoc 3.0</a></div></body></html>