This is a joint project between members of CNRS and EPFL, which takes a new look at modern regexes:

  • We work on new linear-time algorithms to match modern regex features.
  • We design and mechanize the semantics of real-world regex languages.
  • We write mechanized proofs of regex properties, and of the correctness of matching algorithms.
A long-term goal is to develop Linden: a formally verified, linear-time, efficient engine for JavaScript regexes.

Feel free to reach out to Aurèle Barrière and Clément Pit-Claudel if you're a student looking for internships, master thesis, or PhD opportunities! One internship proposal is currently available, but we have many more topics to choose from!

Main Developments

  • RegElk: a linear engine for JavaScript regexes, in OCaml.
    We developed new linear-time algorithms for modern regex features. This includes the first linear-time algorithms for unbounded lookarounds with capture groups. We also designed new regex-size-linear algorithms for JavaScript specificities, like capture group reset and the JavaScript quantifiers.
    Our algorithms are described in our PLDI24 paper. We have also merged some of these algorithms in the linear engine of V8: lookarounds, nullable quantifiers, capture group reset.
  • RE2-lookbehinds and Rust-regex-lookbehinds: forks of the RE2 and rust-lang/regex linear engines to add support for captureless lookbehinds in linear time.
    These forks have been described in the following blog posts: RE2 blog post and Rust blog post.
  • Warblre: a faithful Rocq mechanization of the regex chapter of ECMAScript 2023.
    With a line-by-line translation, we provide an executable and faithful Rocq semantics for JavaScript regexes. We used this semantics to prove properties of the language (termination and absence of failure), and an optimization. This mechanization is described in our ICFP24 paper.
  • Linden: a formally verified linear engine for JavaScript regexes.
    This includes a new backtracking tree semantics, designed with formal verification in mind, and proved to be equivalent to Warblre. Using this new semantics, we present a new definition of JavaScript regex equivalence, and prove some equivalences. We also prove the correctness of the PikeVM matching algorithm, relating it to the ECMAScript semantics. This work has been described in our POPL26 paper.

Publications

Conference Publications

Theses

  • Master Thesis, 2025 Optimizing Regex Compilers.
    Zacharie Tevaearai. Thesis.
  • Master Thesis, 2024 Formal Verification of Memoized Backtracking for JavaScript Regexes.
    Sophie Ammann. Thesis.
  • Master Thesis, 2024 Mechanized Semantics for JavaScript Regular Expressions.
    Noé De Santo. Thesis.

Informal Publications

  • Adding lookbehinds to rust-lang/regex. Robin Hänni & Marcin Wojnarowski. Blog post.
  • SpecMerger, a tool designed for mechanized specification audits. Martin Crettol. Blog post.
  • An optimizing compiler for JavaScript regular expressions. Zacharie Tevaearai. Project report.
  • Adding linear-time lookbehinds to RE2. Erik Giorgis. Blog post.

Contributors

  • Aurèle Barrière. Lead, since 2023.
  • Clément Pit-Claudel. Lead, since 2023.
  • Victor Deng. PhD student, 2025. Complexity of modern regex matching problems.
  • Marcin Wojnarowski. Masters thesis, 2025. Formally verified regex matching optimizations and extensions.
  • Théo Hollender. Masters project, 2025. Algorithmic extensions for PikeVM regex matching.
  • Zacharie Tevaearai. Masters thesis, 2025. Optimizing regex compilers.
  • Victor Deng. Masters internship, 2025. Inductive semantics for JavaScript regexes.
  • Marcin Wojnarowski and Robin Hänni. Masters project, 2025. Matching algorithms for Rust regexes.
  • Eugène Flesselle. Masters project, 2024. Formally verified regex optimizations.
  • Martin Crettol. Bachelors project, 2024. Conformance checking for mechanized specifications.
  • Sophie Ammann. Masters thesis, 2024. Formal verification of memoized regex backtracking.
  • Zacharie Tevaearai. Masters project, 2024. Regex to WASM compilation.
  • Erik Giorgis. Masters project, 2024. RE2 implementation of NFA simulation extensions.
  • Alexandre Pinazza. PhD student, 2023. Coq verification of a regex compiler for NFA simulation.
  • Noé De Santo. Masters thesis, 2023. Coq mechanized specification of ECMAScript regexes.
  • Ludovic Mermod. Master student assistant, 2023. Implementation of regex matching algorithms in the V8 NFA simulation engine.
  • Thomas Houhou. Bachelors project, 2023. Development of a semantic-aware JavaScript regex fuzzer.

Funding

  • SNSF Project Grant. ReSET: Regex matching made Safe, Effective, and Trustworthy.
  • Contribute Grant from the Open Research Data Program of the ETH Board.