Ian Erik Varatalu

ian erik varatalu

i'm a phd candidate at taltech in estonia, working at the high-assurance software laboratory. my research is mostly about automata theory and string algorithms, with a focus on high-performance regex engines, but also anything else to do with text and streams. if you need to search through your email, process tons of documentation, or find that one file from your computer, i'm your guy (!). i've been recently looking for more industrial applications to my research, so feel free to reach out if you need consulting or collaboration on anything related to my work.

research and projects

i work with my advisors Juhan Ernits (taltech) and Margus Veanes (microsoft research) on Derivative-Based Symbolic Automata, an incremental approach to hard problems by pushing heavy computation into lazy step-by-step unwinding of automata - i.e., solving only as much as needed, when needed. we've applied this to regex matching, model checking, and string constraint solving, with promising results.

the main project i've been working on for the past few years is RE#, a high-performance regex engine originally in F# and also in rust for string solving. RE# supports extended regex features (lookarounds, intersection, complement) while being fast and memory efficient thanks to derivative-based algorithms and optimizations. some discoveries we made with RE# helped improve the performance of the .NET NonBacktracking regex engine, originally developed by my advisor Margus Veanes, as well as inspired some other projects, like the LLM tokenizer for the AICI project at microsoft (the thing that helps constrain LLM outputs).

i'm a big of F# and functional programming in practice, and i'm constantly working on various smaller projects and libraries in my spare time, which you can find on my github. some highlights include fflat, a tiny native compiler for F#, built on top of bflat, a plugin for obsidian, a note taking app i use daily and r-toml which is just a little demo of how automata based techniques can process .toml files vastly faster than traditional parsers. i enjoy microoptimizing things and do a lot of benchmarking both in research and other projects, so if you like reading assembly or want 'the fastest possible' implementations of something, you might find something interesting through my repos.

papers

interests

  • automata theory and formal languages
  • functional programming (F#)
  • compilers and systems programming (Rust)
  • minimalism and cybersecurity
  • linux, embedded systems
  • dogs, coffee, lifting?

presentations

  • .. tbd