Regular Expression Compiler

Tokens

The regular expression is broken down into tokens that represent the basic building blocks. Each token has a specific meaning and will be used to build the abstract syntax tree.

1
LiteralPosition 0
Value: "a"

A literal character matches exactly itself in the input string.

2
Kleene Star (*)Position 1

The Kleene star (*) matches zero or more occurrences of the preceding element.

3
Union (|)Position 2

The union operator (|) matches either the expression on its left or right.

4
LiteralPosition 3
Value: "b"

A literal character matches exactly itself in the input string.

5
LiteralPosition 4
Value: "c"

A literal character matches exactly itself in the input string.

Abstract Syntax Tree

The tokens are parsed into an Abstract Syntax Tree (AST) that represents the hierarchical structure of the regular expression. Click on any node to see detailed information about it.

For all the automata below, the start state is marked with a blue circle and accepting states marked with orange double circles. You can click on any states to highlight them and their transitions.

NFA-Lambda

First the AST is converted to an NFA with lambda transitions using Thompson's construction algorithm. Lambda transitions allow the automaton to move between states without consuming input characters. This makes construction easier but requires optimization in the next step.

Loading...Rendering automaton...

NFA

The NFA with lambda transitions must be converted to an NFA without lambda transitions. All the states are preserved. Lambda closures are used to figure out the new transition function. The resulting NFA does not have any lambda transitions.

Loading...Rendering automaton...

DFA

Finally, by using subset construction method, the NFA is converted to a DFA. Each DFA state represents a set of NFA states. The DFA is more efficient for pattern matching as it has exactly one transition for each input symbol from every state.

Loading...Rendering automaton...

State Mapping

When converting from an NFA to a DFA, each DFA state represents a set of NFA states.

You can click on a row to highlight the corresponding states in the automaton.

DFA StateNFA States
00
16
22, 3, 4, 8
37, 8

State-Minimized DFA

While following the above steps, the DFA above might have some redundant states that can be merged. The state minimization algorithm merges equivalent states to produce the most efficient automaton for pattern matching.

Loading...Rendering automaton...

Distinguishability Matrix

0123
0xxxx
1xxxx
2xxx
3xx

Each cell compares a pair of DFA states. Marked cells indicate pairs that can be distinguished during minimization.

State Minimization Groups

To minimize the states, we group equivalent states together.

You can click on a row to highlight the corresponding states in the automaton.

Final StateMerged States
01
10, 2, 3