Technical Report: DCC-2010-04

On the Average Size of PD automata: an Analytic Combinatorics Approach

Sabine Broda

DCC-FC & LIACC, Universidade do Porto
E-mail: sbb@ncc.up.pt

António Machiavelo

CMUP, Universidade do Porto
E-mail: ajmachia@fc.up.pt

Nelma Moreira

Rogério Reis

DCC-FC & LIACC, Universidade do Porto
E-mail: \{nam,rvr\}@ncc.up.pt}

Rua do Campo Alegre, 4169-007 Porto, Portugal

July 2010


Abstract


The partial derivative automaton (NFA_PD) is usually smaller than other non-deterministic finite automata constructed from a regular expression, and it can be seen as a quotient of the Glushkov automaton (NFA_POS). By estimating the number of regular expressions that have epsilon as a partial derivative, we compute a lower bound of the average number of mergings of states in NFA_POS. and describe its asymptotic behaviour. This depends on the alphabet size, k, and its limit, as k goes to infinity, is 1\2. The lower bound corresponds exactly to consider the NFA_PD automaton for the marked version of the RE, i.e. where all its letters are made different. Experimental results suggest that the average number of states of this automaton, and of the NFA_PD automaton for the unmarked RE, are very close to each other.