Sophie

Sophie

distrib > Mandriva > 9.1 > ppc > by-pkgid > b9ba69a436161613d8fb030c8c726a8e > files > 577

spirit-1.5.1-2mdk.noarch.rpm

This is a simple implementation of a lexer that I did while studying the
lexer chapter of the dragon book.  I never intended to release it, so the
code is a little disorganized.  However, it does have the following features:

- It's a rather complicated example of spirit usage, to parse the regular
    expressions.
- Supports full regular expression syntax (lex style).  Note: beginning 
    or tailing contexts (denoted by /) are not supported.
- Supports wide chars.  Even wide char character classes (e.g. [\x1234-\x5678])
- Can do case insensitivity for ansi chars.  (This doesn't work if your
    regular expression contains any wide chars with a value > 255.  But it
    will work with wide char input if the regular expressions are all narrow
    chars.)
- Fully dynamic.  You can load up a bunch of regular expressions, compile the
    DFA and then start lexing, without having to generate a C file in the
    middle of the process :-)
- Supports callbacks.  The default when a match is made is to return a
    pre-registered value.  Also, a callback (function pointer, functor,
    basically anything that supports operator()) can be registered.  The
    callback will be passed a lexer control object that it can use to modify
    the lexer's behavior.
- Test suite.  run runtest.sh in the testfiles sub-directory.

Here are the limitations (could also be a TODO list :-)
- Tables compression isn't done, so the tables can be quite large.
- No pre-computation of the tables is done, so DFA computation can take
    a lot of startup time (it takes about 0.5s for compute a C lexer on my
    1.4Ghz Athlon.)
- No code to output the table.  This needs to be done if the lexer is to
    ever have any useful value.  A couple of scenarios are possible:
    1. The table is written into a binary file.  Then the lexing engine
        can dynamically load the file at runtime.
    2. Write the table out to C/C++ arrays suitable for compilation in code.
        The the lexing engine can use the compiled DFA.
    3. Write out code that would do the same thing as the lexing engine,
        similar to how re2c works.  This would be the fastest implementation.
- Can't use case insensitivity with wide char regular expressions.
- Beginning and trailing contexts aren't implemented.
- Needs to be updated to use the latest spirit features, such as grammars and
    attr_rule.

lexer.hpp contains all of the lexer code.
lexer.cpp is an example of how to use it.
lextest.cpp is the test suite driver (also a good example of how to use slex)
callback.cpp is an example of how to use callbacks with the lexer.

--Dan Nuffer