The regular expression is broken down into tokens that represent the basic building blocks. Each token has a specific meaning and will be used to build the abstract syntax tree.
A literal character matches exactly itself in the input string.
The Kleene star (*) matches zero or more occurrences of the preceding element.
The union operator (|) matches either the expression on its left or right.
A literal character matches exactly itself in the input string.
A literal character matches exactly itself in the input string.
The tokens are parsed into an Abstract Syntax Tree (AST) that represents the hierarchical structure of the regular expression. Click on any node to see detailed information about it.
For all the automata below, the start state is marked with a blue circle and accepting states marked with orange double circles. You can click on any states to highlight them and their transitions.
First the AST is converted to an NFA with lambda transitions using Thompson's construction algorithm. Lambda transitions allow the automaton to move between states without consuming input characters. This makes construction easier but requires optimization in the next step.
The NFA with lambda transitions must be converted to an NFA without lambda transitions. All the states are preserved. Lambda closures are used to figure out the new transition function. The resulting NFA does not have any lambda transitions.
Finally, by using subset construction method, the NFA is converted to a DFA. Each DFA state represents a set of NFA states. The DFA is more efficient for pattern matching as it has exactly one transition for each input symbol from every state.
When converting from an NFA to a DFA, each DFA state represents a set of NFA states.
You can click on a row to highlight the corresponding states in the automaton.
| DFA State | NFA States |
|---|---|
| 0 | 0 |
| 1 | 6 |
| 2 | 2, 3, 4, 8 |
| 3 | 7, 8 |
While following the above steps, the DFA above might have some redundant states that can be merged. The state minimization algorithm merges equivalent states to produce the most efficient automaton for pattern matching.
| 0 | 1 | 2 | 3 | |
|---|---|---|---|---|
| 0 | x | x | x | x |
| 1 | x | x | x | x |
| 2 | x | x | x | |
| 3 | x | x |
Each cell compares a pair of DFA states. Marked cells indicate pairs that can be distinguished during minimization.
To minimize the states, we group equivalent states together.
You can click on a row to highlight the corresponding states in the automaton.
| Final State | Merged States |
|---|---|
| 0 | 1 |
| 1 | 0, 2, 3 |