Regular Expression Compiler

Pattern to Analyze

The regular expression is broken down into tokens that represent the basic building blocks. Each token has a specific meaning and will be used to build the abstract syntax tree.

LiteralPosition 0

Value: "a"

A literal character matches exactly itself in the input string.

Kleene Star (*)Position 1

The Kleene star (*) matches zero or more occurrences of the preceding element.

Union (|)Position 2

The union operator (|) matches either the expression on its left or right.

LiteralPosition 3

Value: "b"

A literal character matches exactly itself in the input string.

LiteralPosition 4

Value: "c"

A literal character matches exactly itself in the input string.

The tokens are parsed into an Abstract Syntax Tree (AST) that represents the hierarchical structure of the regular expression. Click on any node to see detailed information about it.

For all the automata below, the start state is marked with a blue circle and accepting states marked with orange double circles. You can click on any states to highlight them and their transitions.

NFA-Lambda

First the AST is converted to an NFA with lambda transitions using Thompson's construction algorithm. Lambda transitions allow the automaton to move between states without consuming input characters. This makes construction easier but requires optimization in the next step.

NFA

The NFA with lambda transitions must be converted to an NFA without lambda transitions. All the states are preserved. Lambda closures are used to figure out the new transition function. The resulting NFA does not have any lambda transitions.

DFA

Finally, by using subset construction method, the NFA is converted to a DFA. Each DFA state represents a set of NFA states. The DFA is more efficient for pattern matching as it has exactly one transition for each input symbol from every state.

State Mapping

When converting from an NFA to a DFA, each DFA state represents a set of NFA states.

You can click on a row to highlight the corresponding states in the automaton.

DFA State	NFA States
0	0
1	6
2	2, 3, 4, 8
3	7, 8

State-Minimized DFA

While following the above steps, the DFA above might have some redundant states that can be merged. The state minimization algorithm merges equivalent states to produce the most efficient automaton for pattern matching.

Distinguishability Matrix

	0	1	2
0
1	x
2	x	x
3	x	x	x

Each cell compares a pair of DFA states. Marked cells indicate pairs that can be distinguished during minimization.

State Minimization Groups

To minimize the states, we group equivalent states together.

You can click on a row to highlight the corresponding states in the automaton.

Final State	Merged States
0	0
1	1
2	2
3	3

Regular Expression Compiler

Tokens

Abstract Syntax Tree

NFA-Lambda

NFA

DFA

State Mapping

State-Minimized DFA

Distinguishability Matrix

State Minimization Groups