re2c 4.0, a lexical analyzer generator (that is, a compiler of regular expressions into code in a target programming language), has been released. re2c specializes in generating fast, easily embeddable lexers. It differs from its better-known counterpart Flex in its flexible interface, its generation of optimized non-table (direct-coded) lexers, and its support for submatch extraction based on tagged deterministic finite automata (TDFA). re2c is used in projects where lexer speed matters, such as Ninja and PHP.
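To illustrate the difference between table-driven and non-table lexers, here is a minimal hand-written sketch in Python for the regular expression `[0-9]+`. It is not re2c output; the function names (`match_table`, `match_direct`) and the table layout are assumptions made for this example only.

```python
# Illustrative sketch only: hand-written, NOT generated by re2c.
# Both functions return the length of the longest prefix matching [0-9]+.

def char_class(c):
    """Map a character to a coarse class used as a table key."""
    return "digit" if "0" <= c <= "9" else "other"

# Table-driven style: transitions are data, interpreted by one generic loop.
TABLE = {
    (0, "digit"): 1,  # start state: first digit
    (1, "digit"): 1,  # accepting state: further digits
}
ACCEPTING = {1}

def match_table(s):
    state, last_accept = 0, 0
    for i, c in enumerate(s):
        state = TABLE.get((state, char_class(c)))
        if state is None:          # no transition: stop scanning
            break
        if state in ACCEPTING:     # remember the longest match so far
            last_accept = i + 1
    return last_accept

# Direct-coded (non-table) style: each state becomes control flow.
def match_direct(s):
    i, n = 0, len(s)
    # State 0: at least one digit is required.
    if i >= n or not ("0" <= s[i] <= "9"):
        return 0
    i += 1
    # State 1 (accepting): consume any further digits.
    while i < n and "0" <= s[i] <= "9":
        i += 1
    return i
```

In the direct-coded version the automaton is encoded in the branches and loops themselves, so there is no table lookup per character; this is the general idea behind non-table lexers.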
In release 4.0, the code generation subsystem has been fundamentally redesigned. This made it possible to add support for eight new languages (D, Haskell, Java, JavaScript, OCaml, Python, V, Zig) in addition to those already supported (C/C++, Go, Rust), and to implement a general mechanism for adding new languages through configuration files.
The code generator is responsible for translating the already built and optimized finite state machine into code: it selects the control structures, data types, overall program model, and so on that suit the target language. Previously, all this logic was part of the re2c source code, so changing it or adding a new language required patching the source and rebuilding re2c. Such patches were not accepted into the main repository unless they came with a standard set of examples and tests, which further complicated the process.
Now all this logic has been moved to syntax files: text configuration files that can be provided by the user (by default, re2c uses the standard ones). The re2c source code is completely free of language-specific details and relies only on the syntax file. The user can partially override an existing syntax file or write a new one from scratch. Full documentation with examples is available for all officially supported languages.
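The idea of keeping the generator language-agnostic and moving language details into configuration can be sketched as follows. This is a toy model in Python, not re2c's actual syntax file format: the `DFA` structure, the template keys (`prologue`, `first_edge`, and so on) and the `generate` function are all hypothetical names invented for the illustration.

```python
from string import Template

# Hypothetical automaton for [0-9]+ : state -> list of (lo, hi, next_state).
DFA = {"start": 0, "accepting": {1},
       "edges": {0: [("0", "9", 1)], 1: [("0", "9", 1)]}}

# Hypothetical per-language descriptions (NOT the real re2c syntax format).
SYNTAX_PYTHON = {
    "prologue": ("def lex(s):\n    state, i, last = $start, 0, 0\n"
                 "    while i < len(s):\n        c = s[i]\n"),
    "first_edge": ("        if state == $src and '$lo' <= c <= '$hi':\n"
                   "            state = $dst\n"),
    "next_edge": ("        elif state == $src and '$lo' <= c <= '$hi':\n"
                  "            state = $dst\n"),
    "no_edge": "        else:\n            break\n",
    "advance": "        i += 1\n",
    "accept": "        if state == $state:\n            last = i\n",
    "epilogue": "    return last\n",
}

SYNTAX_C = {
    "prologue": ("#include <stddef.h>\n\nsize_t lex(const char *s) {\n"
                 "    int state = $start;\n    size_t i = 0, last = 0;\n"
                 "    for (;;) {\n        char c = s[i];\n"),
    "first_edge": ("        if (state == $src && c >= '$lo' && c <= '$hi')"
                   " state = $dst;\n"),
    "next_edge": ("        else if (state == $src && c >= '$lo' && c <= '$hi')"
                  " state = $dst;\n"),
    "no_edge": "        else break;\n",
    "advance": "        i++;\n",
    "accept": "        if (state == $state) last = i;\n",
    "epilogue": "    }\n    return last;\n}\n",
}

def generate(dfa, syntax):
    """Render the automaton using only the templates in `syntax`.

    Assumes the automaton has at least one edge.
    """
    def emit(key, **values):
        return Template(syntax[key]).substitute(**values)

    out = [emit("prologue", start=dfa["start"])]
    first = True
    for src, edges in sorted(dfa["edges"].items()):
        for lo, hi, dst in edges:
            key = "first_edge" if first else "next_edge"
            out.append(emit(key, src=src, lo=lo, hi=hi, dst=dst))
            first = False
    out.append(emit("no_edge"))
    out.append(emit("advance"))
    for state in sorted(dfa["accepting"]):
        out.append(emit("accept", state=state))
    out.append(emit("epilogue"))
    return "".join(out)

python_source = generate(DFA, SYNTAX_PYTHON)  # same generator,
c_source = generate(DFA, SYNTAX_C)            # different target language
```

Here `generate` contains no knowledge of Python or C; adding another target language in this toy model would mean supplying another dictionary of templates, which is loosely analogous to providing a new syntax file.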
The release also includes many other changes that simplify the user interface and improve the handling of capturing groups. An online environment for editing and compiling examples has been added.
Source: opennet.ru
