Mozilla has switched to a regular expression engine shared with Chromium

Firefox's SpiderMonkey JavaScript engine has been switched to an updated regular expression implementation based on the current Irregexp code from the V8 JavaScript engine, which is used in browsers built on the Chromium project. The new RegExp implementation will ship in Firefox 78, scheduled for release on June 30, and will allow the browser to implement all of the missing ECMAScript features related to regular expressions.

The RegExp engine in SpiderMonkey is designed as a separate component, which makes it relatively independent and suitable for replacement without significant changes to the code base. This modularity made it possible in 2014 to replace the YARR RegExp engine originally used in Firefox with a fork of the Irregexp engine from V8. Irregexp is tightly coupled to V8: it depends on the V8 API and garbage collector and uses V8-specific string representations and the V8 object model. While it was being adapted to SpiderMonkey's internal API in 2014, the Irregexp engine was partially rewritten, and later upstream changes, such as support for the 'u' flag, were ported to the Mozilla-maintained fork where possible.

Unfortunately, keeping a fork synchronized with upstream is difficult and resource-intensive. With the introduction of new regular expression features in the ECMAScript 2018 standard, Mozilla developers began to look for an easier way to port changes from Irregexp. The solution was a binding layer that lets SpiderMonkey use the Irregexp engine almost unchanged (the only modifications are automated replacements of “#include” directives).

The binding layer provides Irregexp with the V8-specific facilities it needs, including memory management and code generation functions as well as basic data structures, all of which are implemented on top of SpiderMonkey's own memory management mechanisms, code generator, and data structures.

The updated RegExp engine will allow Firefox to support named capture groups, Unicode property escapes, the dotAll flag, and lookbehind assertions:

  • Named groups let you refer to the parts of a string matched by a regular expression by name rather than by position (for example, instead of "/(\d{4})-(\d{2})-(\d{2})/" you can write "/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/" and access the year through result.groups.year rather than result[1]).
  • Unicode property escapes add the \p{…} and \P{…} constructs. For example, \p{Number} matches any numeric character (including characters such as ①), \p{Alphabetic} matches letters (including ideographs), and \p{Math} matches mathematical symbols.
  • The dotAll flag makes the "." pattern match newline characters as well.
  • Lookbehind assertions let a regular expression require that one pattern be preceded by another (for example, matching a dollar amount without capturing the dollar sign).
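
The four features listed above can be tried out with a short JavaScript sketch. It assumes an engine that supports the ECMAScript 2018 regular expression additions; the sample strings and variable names are illustrative and do not come from the original announcement. Note that \p{…} escapes only work when the regular expression has the "u" flag, and that dotAll is enabled with the "s" flag.

```javascript
// Illustrative sketch only: sample inputs and variable names are assumptions.

// Named groups: read the match by name instead of by position.
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const result = dateRe.exec("2020-06-30");
const year = result.groups.year;   // same value as result[1]
const month = result.groups.month; // same value as result[2]

// Unicode property escapes: require the "u" flag.
const circledOneIsNumber = /^\p{Number}$/u.test("①");
const ideographIsAlphabetic = /^\p{Alphabetic}$/u.test("漢");
const sigmaIsMath = /^\p{Math}$/u.test("∑");
const letterIsNotNumber = /^\P{Number}$/u.test("a");

// dotAll: without the "s" flag, "." does not match a newline.
const dotWithoutFlag = /a.b/.test("a\nb");
const dotWithFlag = /a.b/s.test("a\nb");

// Lookbehind: match the amount only when "$" precedes it,
// without including the "$" in the match itself.
const amount = "Total: $42.50".match(/(?<=\$)\d+(?:\.\d+)?/)[0];

console.log({
  year,
  month,
  circledOneIsNumber,
  ideographIsAlphabetic,
  sigmaIsMath,
  letterIsNotNumber,
  dotWithoutFlag,
  dotWithFlag,
  amount,
});
```

On an engine without these features, the newer syntax makes the whole script fail to parse with a SyntaxError, so feature detection has to construct the patterns with new RegExp(...) inside a try/catch rather than using regular expression literals.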

The project was carried out with the participation of the V8 developers, who worked to reduce Irregexp's dependence on V8 and moved some features that cannot be implemented on top of SpiderMonkey into "#ifdef" blocks that can be disabled. The collaboration proved mutually beneficial: the Mozilla developers contributed changes to Irregexp that fix some deviations from the JavaScript standard and improve code quality. In addition, fuzz testing of Firefox uncovered previously unnoticed bugs in the Irregexp code that led to crashes, and these have been fixed.

Source: opennet.ru
