In building Quamina, I needed to compute the union of two finite automata (FAs). I remembered from some university course 100 years ago that this was possible in theory, so I went looking for the algorithm, but was left unhappy. The descriptions I found tended to be hyper-academic, loaded with mathematical notation that I found unhelpful, and didn’t describe an approach that I thought a reasonable programmer would reasonably take. The purpose of this ongoing entry is to present a programmer-friendly description of the problem and of the algorithm I adopted, with the hope that some future developer, facing the same problem, will have a more satisfying search experience.

There is very little math in this discussion (a few subscripts), and no circles-and-arrows pictures. But it does have working Go code.

Finite automata? · I’m not going to rehash the theory of FAs (often called state machines). In practice the purpose of an FA is to match (or fail to match) some input against some pattern. What the software does when the input matches the pattern (or doesn’t) isn’t relevant to our discussion today. Usually the inputs are strings and the patterns are regular expressions or equivalent. In practice, you compile a pattern into an FA, and then you go through the input, character by character, trying to traverse the FA to find out whether it matches the input.

An FA has a bunch of states, and for each state there can be a list of input symbols that lead to transitions to other states. What exactly I mean by “input symbol” turns out to be interesting and affects your choice of algorithm, but let’s ignore that for now.

The following statements apply:

  1. One state is designated as the “start state” because, well, that’s where you start.

  2. Some states are called “final”, and reaching them means you’ve matched one or more patterns. In Quamina’s FAs, each state has an extra field (usually empty) saying “if you got here you matched P*, yay!”, where P* is a list of labels for the (possibly more than one) patterns you matched.

  3. It is possible that you’re in a state and for some particular input, you transition to more than one other state. If this is true, your FA is nondeterministic, abbreviated NFA.

  4. It is possible that a state can have one or more “epsilon transitions”, ones that you can just take any time, not requiring any particular input. (I wrote about this in Epsilon Love.) Once again, if this is true, you’ve got an NFA. If neither this statement nor the previous one is true, it’s a deterministic finite automaton, a DFA.

The discussion here works for NFAs, but lots of interesting problems can be solved with DFAs, which are simpler and faster, and this algorithm works there too.
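
To put names on those pieces, here’s a minimal sketch of what an NFA state might look like in Go. The types and field names are hypothetical, just for illustration; Quamina’s real faState and faNext types show up in the code further down.

type symbol = byte // or a rune, or an enum value; see the “input symbol” discussion below

// A hypothetical NFA state, for illustration only.
type nfaState struct {
    // For each input symbol, the states you can move to. More than one
    // target for the same symbol makes the automaton nondeterministic.
    transitions map[symbol][]*nfaState

    // Epsilon transitions: taken freely, without consuming any input.
    // Having any of these also makes the automaton nondeterministic.
    epsilons []*nfaState

    // Non-empty only for "final" states: labels of the pattern(s)
    // matched by reaching this state.
    matched []string
}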

Union? · If I have FA1 that matches “foo” and FA2 that matches “bar”, then their union, FA1 ∪ FA2, matches both “foo” and “bar”. In practice Quamina often computes the union of a large number of FAs, but it does so a pair at a time, so we’re only going to worry about the union of two FAs.

The academic approach · There are plenty of Web pages and YouTubes covering this. Most of them are full of Greek characters and math symbols. They go like this:

  1. You have two FAs, call them A and B. A has states A1, … AmaxA; B has B1, … BmaxB.

  2. The union contains all the states in A, all the states in B, and the “product” of A and B, which is to say states you could call A1B1, A1B2, A2B1, A2B2, … AmaxABmaxB.

  3. For each state AXBY, you work out its transitions by looking at the transitions of the two states being combined. For some input symbol, if AX has a transition to AXX but BY has no transition, then the combined state just has the A transition. The reverse for an input where BY has a transition but AX doesn’t. And if AX transitions to AXX and BY transitions to BYY, then the transition is to AXXBYY.

  4. Now you’ll have a lot of states, and it usually turns out that many of them aren’t reachable. But there are plenty of algorithms to filter those out. You’re done, you’ve computed the union and A1B1 is its start state!
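
If you want to see the shape of that in code, here’s a rough sketch of step 2, reusing the hypothetical nfaState type from the sketch above. It’s an illustration of the product construction, not something you’d ship.

// Textbook product construction: materialize every (a, b) pair of states
// up front, before knowing which pairs are reachable.
func productStates(statesA, statesB []*nfaState) map[[2]*nfaState]*nfaState {
    pairs := make(map[[2]*nfaState]*nfaState, len(statesA)*len(statesB))
    for _, a := range statesA {
        for _, b := range statesB {
            pairs[[2]*nfaState{a, b}] = &nfaState{} // transitions filled in per step 3
        }
    }
    // ... then work out each pair's transitions, then prune the pairs
    // that can't be reached from the A1B1 start state (step 4).
    return pairs
}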

Programmer-think · If you’re like me, the idea of computing all the states, then throwing out the unreachable ones, feels wrong. So here’s what I suggest, which has worked well in practice for Quamina:

  1. First, merge A1 and B1 to make your new start state A1B1. Here’s how:

  2. If an input symbol causes no transitions in either A1 or B1, it also doesn’t cause any in A1B1.

  3. If an input symbol causes a transition in A1 to AX but no transition in B1, then you adopt AX into the union, and any other A states it points to, and any they point to, and so on.

  4. And of course if B1 has a transition to BY but A1 doesn’t transition, you flip it the other way, adopting BY and its descendants.

  5. And if A1 transitions to AX and B1 transitions to BY, then you adopt a new state AXBY, which you compute recursively the way you just did for A1B1. So you’ll never compute anything that’s not reachable.

I could stop there. I think that’s enough for a competent developer to get the idea? But it turns out there are a few details, some of them interesting. So, let’s dig in.

“Input symbol”? · The academic discussion of FAs is very abstract on this subject, which is fair enough, because when you’re talking about how to build, or traverse, or compute the union of FAs, the algorithm doesn’t depend very much on what the symbols actually are. But when you’re writing code, it turns out to matter a lot.

In practice, I’ve done a lot of work with FAs over the years, and I’ve only ever seen four things used as input symbols to drive them. They are:

  • Unicode “characters” represented by code points, integers in the range 0…1,114,111 inclusive.

  • UTF-8 bytes, which have values in the range 0…244 inclusive.

  • UTF-16 values, unsigned 16-bit integers. I’ve only ever seen this used in Java programs because that’s what its native char type is. You probably don’t want to do this.

  • Enum values, small integers with names, which tend to come in small collections.

As I said, this is all I’ve seen, but 100% of the FAs that I’ve seen automatically generated and subject to set-arithmetic operations like Union are based on UTF-8. And that’s what Quamina uses, so that’s what I’m going to use in the rest of this discussion.
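
To make “UTF-8 bytes as input symbols” concrete, here’s what an automaton actually consumes for a short string; this is plain Go, nothing Quamina-specific.

// The byte sequence an FA sees for the string "né": 'n' is 0x6e, and
// é (U+00E9) encodes as the two bytes 0xc3 0xa9.
for _, b := range []byte("né") {
    fmt.Printf("0x%02x ", b)
}
// prints: 0x6e 0xc3 0xa9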

Code starts here · This comes from Quamina’s nfa.go. We’re going to look at the function mergeFAStates, which implements the merge-two-states logic described above.

Lesson: This process can lead to a lot of wasteful work, particularly if either or both of the states transition on ranges of values like 0…9 or a…z. So we only want to do the work of merging any pair of states once, and we want there to be only one merged value. Thus we start with a straightforward memo-ization.

func mergeFAStates(state1, state2 *faState, keyMemo map[faStepKey]*faState) *faState {
    // try to memo-ize
    mKey := faStepKey{state1, state2}
    combined, ok := keyMemo[mKey]
    if ok {
        return combined
    }

Now some housekeeping. Remember, I noted above that any state might contain a signal saying that arriving here means you’ve matched pattern(s). This is called fieldTransitions, and the merged state obviously has to match all the things that either of the merged states match. Of course, in the vast majority of cases neither merged state matched anything and so this is a no-op.

    fieldTransitions := append(state1.fieldTransitions, state2.fieldTransitions...)

Since our memo-ization attempt came up empty, we have to allocate an empty structure for the new merged state, and add it to the memo-izer.

    combined = &faState{table: newSmallTable(), fieldTransitions: fieldTransitions}
    keyMemo[mKey] = combined

Here’s where it gets interesting. The algorithm talks about looking at the inputs that cause transitions in the states we’re merging. How do you find them? Well, in the case where you’re transitioning on UTF-8 bytes, since there are only 244 values, why not do the simplest thing that could possibly work and just check each byte value?

Every Quamina state contains a table that encodes the byte transitions, which operates like the Go construct map[byte]state. Those tables are implemented in a compact data structure optimized for fast traversal. But for doing this kind of work, it’s easy to “unpack” them into a fixed-sized table; in Go, [244]state. Let’s do that for the states we’re merging and for the new table we’re building.

    u1 := unpackTable(state1.table)
    u2 := unpackTable(state2.table)
    var uComb unpackedTable

uComb is where we’ll fill in the merged transitions.

Now we’ll run through all the possible input values; i is the byte value, next1 and next2 are the transitions on that value. In practice, next1 and next2 are going to be nil most of the time.

    for i, next1 := range u1 {
        next2 := u2[i]

Here’s where we start building up the new transitions in the unpacked array uComb.

For many values of i, you can avoid actually merging the states to create a new one: if the transition is the same in both input FAs, if either of them is nil, or if the transitions for this value of i are the same as for the previous value. This is all about avoiding unnecessary work, and the switch/case structure is the result of a bunch of profiling and optimization.

        switch {
        case next1 == next2: // no need to merge
            uComb[i] = next1
        case next2 == nil: // u1 must be non-nil
            uComb[i] = next1
        case next1 == nil: // u2 must be non-nil
            uComb[i] = next2
        case i > 0 && next1 == u1[i-1] && next2 == u2[i-1]: // dupe of previous step - happens a lot
            uComb[i] = uComb[i-1]

If none of these work, we haven’t been able to avoid merging the two states. We do that by a recursive call to invoke all the logic we just discussed.

There is a complication. The automaton might be nondeterministic, which means that there might be more than one transition for some byte value. So the data structure actually behaves like map[byte]*faNext, where faNext is a wrapper for a list of states you can transition to.
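
The faNext type itself doesn’t appear in this walkthrough; based on that description it amounts to something like the following, which is an approximation rather than necessarily Quamina’s exact declaration.

// The set of states the NFA can be in after consuming one particular byte.
type faNext struct {
    states []*faState
}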

So here we’ve got a nested loop to recurse for each possible combination of transitioned-to states that can occur on this byte value. In a high proportion of cases the FA is deterministic, so there’s only one state from each FA being merged and this nested loop collapses to a single recursive call.

        default: // have to recurse & merge
            var comboNext []*faState
            for _, nextStep1 := range next1.states {
                for _, nextStep2 := range next2.states {
                    comboNext = append(comboNext, mergeFAStates(nextStep1, nextStep2, keyMemo))
                }
            }
            uComb[i] = &faNext{states: comboNext}
        }
    }

We’ve filled up the unpacked state-transition table, so we’re almost done. First, we have to compress it into its optimized-for-traversal form.

    combined.table.pack(&uComb)

Remember, if the FA is nondeterministic, each state can have “epsilon” transitions which you can follow any time without requiring any particular input. The merged state needs to contain all the epsilon transitions from each input state.

    combined.table.epsilon = append(state1.table.epsilon, state2.table.epsilon...)

    return combined
}

And, we’re done. I mean, we are once all those recursive calls have finished crawling through the states being merged.
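
For completeness, here’s roughly how a union gets kicked off. Per the recipe above, the union’s start state is just the merge of the two start states, and the memo map lives for the duration of one union operation. This little wrapper is a sketch, not Quamina’s actual calling code.

// Compute the union of two FAs, given their start states; the returned
// state is the start state of the union. (Hypothetical wrapper.)
func unionStart(start1, start2 *faState) *faState {
    keyMemo := make(map[faStepKey]*faState)
    return mergeFAStates(start1, start2, keyMemo)
}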

Is that efficient? · As I said above, this is an example of a “simplest thing that could possibly work” design. Both the recursion and the unpack/pack sequence are kind of code smells, suggesting that this could be a pool of performance quicksand.

But apparently not. I ran a benchmark where I added 4,000 patterns synthesized from the Wordle word-list; each of them looked like this:

{"allis": { "biggy": [ "ceils", "daisy", "elpee", "fumet", "junta", … (195 more).

This produced a huge deterministic FA with about 4.4 million states, with the addition of these hideous worst-case patterns running at 500 patterns/second. Good enough for rock ’n’ roll.

How about nondeterministic FAs? I went back to that Wordle source and, for each of its 12,959 words, added a pattern with a random wildcard; here are three of them:

{"x": [ {"shellstyle": "f*ouls" } ] }
{"x": [ {"shellstyle": "pa*sta" } ] }
{"x": [ {"shellstyle": "utter*" } ] }

This produced an NFA with 46K states, and the addition process ran at 70K patterns/second.

Sometimes the simplest thing that could possibly work, works.

