Re2c is a free and open-source lexer generator for C and C++.
The main goal of the project is to generate very fast lexers that match or
exceed the speed of carefully optimized hand-written code. Instead of using
traditional table-driven approach, re2c encodes the underlying finite state
automata directly in the form of conditional jumps and applies numerous
optimizations to the generated code. The resulting programs are faster and
often smaller than their table-driven counterparts, and they are much easier
to debug and understand.
Re2c has an unusual flexible user interface: instead of assuming a fixed
program template, it leaves the definition of the interface code to the user
and allows to configure almost every aspect of the generated code. This gives
the users a lot of freedom in the way they bind the lexer to their particular
environment and allows them to decide on the optimal input model.
Re2c supports fast and lightweight submatch extraction which does not requre
the overhead on full parsing — a feature that is rarely found in the wild.
Re2c is used by many other projects (such as php, ninja, yasm, spamassassin,
BRL-CAD and wake) and aims at being fully backward compatible.
On the other hand, it is a research project and a playground for the
development of new algorithms in the field of formal grammars and automata. |