What
 · Technology
 · · Quamina Diary

QRS: Dot-matching Redux · Recently I posted Matching “.” in UTF-8, in which I claimed that you could match the regular-expression “.” in a UTF-8 stream with either four or five states in a byte-driven finite automaton, depending how you define the problem. That statement was arguably wrong, and you might need three more states, for a total of eight. But you can make a case that really, only four should be needed, and another case calling for quite a few more. Because that phrase “depending how you define the problem” is doing a lot of work ...
 
QRS: Matching “.” in UTF-8 · Back on December 13th, I posted a challenge on Mastodon: In a simple UTF-8 byte-driven finite automaton, how many states does it take to match the regular-expression construct “.”, i.e. “any character”? Commenter Anthony Williams responded, getting it almost right I think, but I found his description a little hard to understand. In this piece I’m going to dig into what . actually means, and then how many states you need to match it ...
[10 comments]  
QRS: Parsing Regexps · Parsing regular expression syntax is hard. I’ve written a lot of parsers and,for this one, adopted a couple of new techniques that I haven’t used before. I learned things that might be of general interest ...
[2 comments]  
QRS: Quamina Regexp Series · Implementing regular expressions is hard. Hard in interesting ways that make me want to share the lessons. Thus this series, QRS for short ...
[4 comments]  
Unbackslash · Old software joke: “After the apocalypse, all that’ll be left will be cockroaches, Keith Richards, and markup characters that have been escaped (or unescaped) one too many (or few) times.” I’m working on a programming problem where escaping is a major pain in the ass, specifically “”. So, for reasons that seem good to me, I want to replace it. What with? ...
[10 comments]  
Q Numbers Redux · Back in July I wrote about Q numbers, which make it possible to compare numeric values using a finite automaton. It represented a subset of numbers as 14-hex-digit strings. In a remarkable instance of BDD (Blog-Driven Development, obviously) Arne Hormann and Axel Wagner figured out a way to represent all 64-bit floats in at most ten bytes of UTF-8 and often fewer. This feels nearly miraculous to me; read on for heroic bit-twiddling ...
[1 comment]  
Q Numbers Redux Explained · [The first guest post in some years. Welcome, Arne Hormann!] ...
 
Union of Finite Automata · In building Quamina, I needed to compute the union of two finite automata (FAs). I remembered from some university course 100 years ago that this was possible in theory, so I went looking for the algorithm, but was left unhappy. The descriptions I found tended to be hyper-academic, loaded with mathematical notation that I found unhelpful, and didn’t describe an approach that I thought a reasonable programmer would reasonably take. The purpose of this ongoing entry is to present a programmer-friendly description of the problem and of the algorithm I adopted, with the hope that some future developer, facing the same problem, will have a more satisfying search experience ...
 
Q Numbers · This ongoing fragment describes how to match and compare numbers using a finite automaton, which involves transforming them into strings with the right lexical properties. My hope is that there are at least twelve people in the world who are interested in the intersection of numeric representation and finite automata.
[Note: This whole piece, except for the description of the problem, has been obsoleted by Q Numbers Redux.]
 ...
[4 comments]  
Epsilon Love · Quamina was for a time my favorite among all my software contributions. But then it stalled after I shipped 1.0 in January of 2023. First of all, I got busy with the expert witness for Uncle Sam gig and second, there was a horrible problem in there that I couldn’t fix. Except for now I have! And I haven’t done much codeblogging recently. So, here are notes on nondeterministic finite automata, epsilon transitions, Ken Thompson, Golang generics, and prettyprinting. If some subset of those things interests you, you’ll probably like this ...
[2 comments]  
Quamina v1.0.0 · Today I hit the “release” button on v1.0.0 of Quamina (introduction here), a fast open-source pattern-matching library in Go. Gotta keep doing some coding to keep me honest! The purpose of today’s piece is to provide a snapshot status report and record a few technology gripes just to get them off my chest; sharing is caring ...
 
Autumn Golang Diary · I’ve posted here about my experiences with Go since 2013 and I guess it’s too late to stop now. This one is truly miscellaneous, just a bunch of things that built up in “should write about this” notes to myself while working on Quamina ...
[2 comments]  
Hello, Ruler · Hey, look what’s been open-sourced: AWS Event Ruler! Check out the announcement blog. I built v1.0 of this Java library while I was at AWS, and wrote about it in Filtering Lessons. Tl;dr: It offers APIs for declaring pattern-matching Rules, as many as you like, then for presenting data records called Events and finding out which Rules each Event matches, very quickly. It’s in production in multiple Amazon (not just AWS) services, notably EventBridge. Also see: Content-based Filtering ...
 
Small Tables · Computer programs organize bits and bytes into “data structures”. In software of any import, the data structures are usually more interesting than the code around them. This part of the Quamina Diary takes a close look at a very simple data structure that I have greatly enjoyed using to build finite automata, and which I think has lessons to teach; it’s called smallTable ...
[3 comments]  
Making Code Faster · I’ve enjoyed writing software for 40+ years now. Lots of activities fall into that “writing software” basket, and here’s my favorite: When you have a body of code with a decent unit-test suite and you need to make it go faster. This part of the Quamina diary is a case study of making a piece of the library faster ...
[3 comments]  
Field-Value Automata · When I introduced Quamina, I described the core trick: You prepare an arbitrary JSON blob for automaton-based matching by “Flattening” it into a list of name/value pairs, then sorting them in name order. Today, a closer look at how you work with the flattened data ...
[3 comments]  
Golang Diaries: Generics · My coding time this year has been invested in Quamina, an event-filtering library implemented in Go. Just in the last couple of weeks I spotted an opportunity to bring Go’s shiny new generics to bear, and not just anywhere, but for a central data structure. I got good results, but the process was sort of painful; I kept trying things that looked like they ought to work but didn’t. I’m sharing this tour through that experience because I’m a reasonably competent programmer with mainstream tastes and if I hit these bumps in the road others probably will too ...
[4 comments]  
Meet Quamina · No parent will name a favorite among their children. But I do have one among my brainchildren, my software contributions over the decades: The event-streaming code I helped build at AWS. After rage-quitting I missed it so much that over the last few months, I wrote a library (in Go) called Quamina (GitHub) that does some of the same things. This is about that ...
[4 comments]  
author · Dad
colophon · rights
Random image, linked to its containing fragment

By .

The opinions expressed here
are my own, and no other party
necessarily agrees with them.

A full disclosure of my
professional interests is
on the author page.

I’m on Mastodon!