re2c 1.2, a free lexical analyzer generator for C and C++, has been released. re2c was written in 1993 by Peter Bumbulis as an experimental generator of very fast lexical analyzers. It differs from other generators in two ways: the speed of the generated code, and an unusually flexible user interface that makes it easy and efficient to embed lexers into an existing codebase. Since then, the community has developed the project, and it remains a platform for experiments and research on formal grammars and finite state machines.
Preparing the release took almost a whole year. As usual, most of that time went into developing the theoretical framework and writing the paper "Efficient POSIX Submatch Extraction on NFA".
The algorithms described in the paper are implemented in the experimental libre2c library. Building the library and its benchmarks is disabled by default and can be enabled with the configure option "--enable-libs". The library is not meant to compete with existing projects such as RE2. It is a research platform for developing new algorithms, which can then be used in re2c or in other projects. It is also convenient for testing, performance measurement and creating bindings to other languages.
Main new features in re2c 1.2:
A new, simpler way to check for the end of input (the "EOF rule") has been added. It consists of two parts: the "re2c:eof" configuration, which selects the terminating (sentinel) character, and a special rule "$", which is triggered when the lexer has successfully reached the end of the input. Historically, re2c has offered several ways to check for the end of input, and they differ in generality, efficiency and ease of use. The new method is designed to make code easier to write while remaining efficient and widely applicable. The old methods still work and may be preferable in some cases.
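The idea behind the sentinel-based check can be illustrated with a hand-written lexer. This is a sketch under stated assumptions, not code generated by re2c: the input is followed by a 0 byte, and the lexer compares the cursor with the limit only when it actually reads that byte. Reaching the limit corresponds to the "$" rule. The function name count_words is hypothetical, and treating a 0 byte inside the input as an error is a simplification made only for this sketch.

```c
#include <stddef.h>

/* Hypothetical hand-written lexer (NOT re2c output) illustrating a
 * sentinel-based end-of-input check. buf[0..len) is the input and
 * buf[len] must be the sentinel 0. Counts space-separated words.
 * Returns the word count, or -1 if a 0 byte occurs inside the input
 * (a simplification chosen for this sketch). */
int count_words(const char *buf, size_t len)
{
    const char *cursor = buf;
    const char *limit = buf + len; /* position of the sentinel */
    int count = 0;

    for (;;) {
        char c = *cursor;
        if (c == '\0') {
            /* Sentinel read: only now is the bounds check performed. */
            if (cursor == limit)
                return count; /* analogue of the "$" rule */
            return -1;        /* 0 byte inside the input */
        }
        if (c == ' ') {
            ++cursor; /* skip separators */
            continue;
        }
        /* Consume one word: everything up to a space or a 0 byte. */
        while (*cursor != ' ' && *cursor != '\0')
            ++cursor;
        ++count;
    }
}
```

The main loop performs no comparison against the limit for ordinary characters. That is what makes the sentinel approach cheap: the bounds check runs only when the sentinel byte is actually read.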
External files can now be included with the directive "/*!include:re2c "file.re" */", where "file.re" is the name of the file to include. re2c looks for the file in the directory of the including file and in the list of paths given with the "-I" option. Included files can themselves include other files. The project ships "standard" files in its "include/" directory. Useful regular expression definitions are expected to accumulate there over time, in the spirit of a standard library. So far, at users' request, one file with definitions of Unicode categories has been added.
re2c can now generate header files with arbitrary content using the "-t --type-header" options (or the corresponding configurations) and the new directives "/*!header:re2c:on*/" and "/*!header:re2c:off*/". This is useful when re2c needs to generate definitions of variables, structures and macros that are used in other translation units.
re2c now understands UTF-8 literals and character classes in regular expressions. By default, re2c parses an expression like "∀x ∃y" as the sequence of single-byte characters "e2 88 80 78 20 e2 88 83 79" (hex codes), so users have to escape Unicode characters manually: "\u2200x \u2203y". Many users find this very inconvenient and unexpected, as the steady stream of bug reports shows. re2c therefore now provides the option "--input-encoding {ascii | utf8}". With "utf8", re2c parses "∀x ∃y" as the code points "2200 78 20 2203 79".
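The byte sequence above is simply the UTF-8 encoding of the code points U+2200, U+0078, U+0020, U+2203, U+0079. A minimal UTF-8 encoder shows the relationship. The function names are illustrative and not part of re2c, and for brevity the sketch does not reject surrogate code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one code point (must be <= 0x10FFFF) as UTF-8 into out.
 * Returns the number of bytes written (1 to 4).
 * Surrogates (U+D800..U+DFFF) are not rejected in this sketch. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encodes n code points one after another; out must hold 4 * n bytes.
 * Returns the total number of bytes written. */
size_t utf8_encode_all(const uint32_t *cps, size_t n, unsigned char *out)
{
    size_t len = 0;
    for (size_t i = 0; i < n; ++i)
        len += utf8_encode(cps[i], out + len);
    return len;
}
```

Encoding {0x2200, 0x78, 0x20, 0x2203, 0x79} produces the nine bytes e2 88 80 78 20 e2 88 83 79. In other words, "--input-encoding utf8" lets re2c recover the code points that a UTF-8 source file spells out as bytes.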
re2c now allows ordinary re2c blocks to be used in "-r --reuse" mode. This is convenient when the input file contains many blocks and only some of them need to be reused.
The format of warnings and error messages can now be set with the new option "--location-format {gnu | msvc}". The GNU format is "filename:line:column:" and the MSVC format is "filename(line,column)". This feature may come in handy for IDE users.
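The two location prefixes can be sketched as a small formatting function. The enum and function names here are hypothetical and not taken from re2c's sources. The output shapes follow the formats described above.

```c
#include <stdio.h>

/* Hypothetical illustration of the two message-location styles
 * selectable with --location-format; names are made up for this sketch. */
enum loc_format { LOC_GNU, LOC_MSVC };

/* Writes the location prefix into out (at most size bytes, NUL-terminated).
 * Returns the length snprintf would have produced. */
int format_location(char *out, size_t size, enum loc_format fmt,
                    const char *file, unsigned line, unsigned column)
{
    if (fmt == LOC_GNU)
        return snprintf(out, size, "%s:%u:%u:", file, line, column); /* file:line:column: */
    return snprintf(out, size, "%s(%u,%u)", file, line, column);     /* file(line,column) */
}
```

For file "lexer.re", line 10, column 5, this yields "lexer.re:10:5:" in GNU style and "lexer.re(10,5)" in MSVC style. IDEs typically match such prefixes with a pattern to make messages clickable.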
A "--verbose" option has also been added, which prints a short victory message on success.
The flex compatibility mode has been improved: some parsing errors have been fixed, along with incorrect operator precedence in rare cases. Historically, the "-F --flex-support" option has allowed code that mixes flex style and re2c style, which makes parsing somewhat difficult. Flex compatibility mode is rarely used in new code, but re2c continues to support it for backward compatibility.
The character class subtraction operator "\" is now applied before the encoding is expanded. This allows it to be used in more cases when a variable-length encoding such as UTF-8 is in effect.
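Before encoding expansion, a character class is just a set of code-point ranges, so subtraction reduces to simple range arithmetic. After expansion into variable-length UTF-8 byte sequences, the class no longer has that simple shape. The sketch below shows range subtraction on code points. The range_t type and the class_diff function are hypothetical and do not reflect re2c's internal representation.

```c
#include <stddef.h>
#include <stdint.h>

/* A character class as a sorted list of disjoint, inclusive code-point
 * ranges (hypothetical representation for this sketch). */
typedef struct { uint32_t lo, hi; } range_t;

/* Computes a minus b for sorted, disjoint range lists and writes the
 * result to out, which must have room for at least na + nb ranges.
 * Returns the number of ranges written. */
size_t class_diff(const range_t *a, size_t na,
                  const range_t *b, size_t nb, range_t *out)
{
    size_t n = 0, j = 0;
    for (size_t i = 0; i < na; ++i) {
        uint32_t lo = a[i].lo, hi = a[i].hi;
        int consumed = 0;
        /* Skip ranges of b that lie entirely below the current range of a. */
        while (j < nb && b[j].hi < lo)
            ++j;
        /* Cut out every range of b that overlaps [lo, hi]. */
        for (size_t k = j; k < nb && b[k].lo <= hi; ++k) {
            if (b[k].lo > lo)
                out[n++] = (range_t){lo, b[k].lo - 1}; /* piece before the hole */
            if (b[k].hi >= hi) {
                consumed = 1; /* the rest of [lo, hi] is removed */
                break;
            }
            lo = b[k].hi + 1; /* continue after the hole */
        }
        if (!consumed)
            out[n++] = (range_t){lo, hi};
    }
    return n;
}
```

For example, subtracting the newline U+000A from the full range U+0000..U+10FFFF gives the two ranges U+0000..U+0009 and U+000B..U+10FFFF. Each of these can then be expanded into UTF-8 byte sequences.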
The output file is now created atomically: re2c first creates a temporary file and writes the result to it, and then renames the temporary file to the output file in a single operation.
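The write-then-rename pattern can be sketched in portable C. This is not re2c's actual code. The temporary name "<path>.tmp" is an assumption of the sketch, chosen so that the temporary file sits in the same directory as the target.

```c
#include <stdio.h>

/* Writes data to path by first writing a temporary file "<path>.tmp"
 * (naming scheme assumed for this sketch) and then renaming it over
 * the target. Returns 0 on success, -1 on failure. */
int write_atomically(const char *path, const char *data)
{
    char tmp[4096];
    int r = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (r < 0 || r >= (int)sizeof tmp)
        return -1; /* path too long for the buffer */

    FILE *f = fopen(tmp, "w");
    if (!f)
        return -1;
    int ok = fputs(data, f) >= 0;
    ok = (fclose(f) == 0) && ok; /* fclose flushes; check it too */

    /* Readers see either the old file or the complete new one. */
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp); /* clean up the partial temporary file */
        return -1;
    }
    return 0;
}
```

On POSIX systems, rename() atomically replaces an existing target when both names are on the same file system. ISO C leaves the behavior for an existing target implementation-defined, and the Microsoft C runtime's rename() fails in that case. A portable tool therefore needs a platform-specific replacement step there.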
For developers, re2c now has a more complete debugging subsystem. Debug code is disabled in release builds and can be enabled with the configure option "--enable-debug".