This is a simple implementation of a lexer that I did while studying the lexer chapter of the dragon book. I never intended to release it, so the code is a little disorganized. However, it does have the following features: - It's a rather complicated example of spirit usage, to parse the regular expressions. - Supports full regular expression syntax (lex style). Note: beginning or tailing contexts (denoted by /) are not supported. - Supports wide chars. Even wide char character classes (e.g. [\x1234-\x5678]) - Can do case insensitivity for ansi chars. (This doesn't work if your regular expression contains any wide chars with a value > 255. But it will work with wide char input if the regular expressions are all narrow chars.) - Fully dynamic. You can load up a bunch of regular expressions, compile the DFA and then start lexing, without having to generate a C file in the middle of the process :-) - Supports callbacks. The default when a match is made is to return a pre-registered value. Also, a callback (function pointer, functor, basically anything that supports operator()) can be registered. The callback will be passed a lexer control object that it can use to modify the lexer's behavior. - Test suite. run runtest.sh in the testfiles sub-directory. Here are the limitations (could also be a TODO list :-) - Tables compression isn't done, so the tables can be quite large. - No pre-computation of the tables is done, so DFA computation can take a lot of startup time (it takes about 0.5s for compute a C lexer on my 1.4Ghz Athlon.) - No code to output the table. This needs to be done if the lexer is to ever have any useful value. A couple of scenarios are possible: 1. The table is written into a binary file. Then the lexing engine can dynamically load the file at runtime. 2. Write the table out to C/C++ arrays suitable for compilation in code. The the lexing engine can use the compiled DFA. 3. Write out code that would do the same thing as the lexing engine, similar to how re2c works. This would be the fastest implementation. - Can't use case insensitivity with wide char regular expressions. - Beginning and trailing contexts aren't implemented. - Needs to be updated to use the latest spirit features, such as grammars and attr_rule. lexer.hpp contains all of the lexer code. lexer.cpp is an example of how to use it. lextest.cpp is the test suite driver (also a good example of how to use slex) callback.cpp is an example of how to use callbacks with the lexer. --Dan Nuffer