From regular expressions to deterministic finite automata: $2^{\frac{n}{2}+\sqrt{n}(\log n)^{Θ(1)}}$ states are necessary and sufficient
Abstract
It is proved that every regular expression of alphabetic width $n$, that is, with $n$ occurrences of symbols of the alphabet, can be transformed into a deterministic finite automaton (DFA) with $2^{\frac{n}{2}+(\frac{\log_2 e}{2\sqrt{2}}+o(1))\sqrt{n\ln n}}$ states recognizing the same language (the best upper bound up to date is $2^n$). At the same time, it is also shown that this bound is close to optimal, namely, that there exist regular expressions of alphabetic width $n$ over a two-symbol alphabet, such that every DFA for the same language has at least $2^{\frac{n}{2}+(\sqrt{2} + o(1))\sqrt{\frac{n}{\ln n}}}$ states (the previously known lower bound is $\frac{5}{4}2^{\frac{n}{2}}$). The same bounds are obtained for an intermediate problem of determinizing nondetermistic finite automata (NFA) with each state having all incoming transitions by the same symbol.