Technical Report: DCC-2011-07

Expected Compression Ratio for \DFCA: experimental average case analysis

Cezar Câmpeanu

Dept. of Computer Science and Information Technology,
University of Prince Edward Island, Charlottetown, P.E.I, Canada C1A 4P3
E-mail: ccampeanu@upei.ca

Nelma Morera

CMUP & DCC-FCUP
E-mail: nam@dcc.fc.up.pt

Rogério Reis

CMUP & DCC-FCUP
E-mail: rvr@dcc.fc.up.pt


July 2011

Abstract

In this paper we investigate from a statistical point of view the expected compression ratio between the size of a minimal Deterministic Finite Cover Automaton (DFCA) and the size of the minimal corresponding Determinitic Finite Automaton (DFA). Using sound statistical methods, we extend the experimental study done in Santean, thus obtaining a much better picture of the compression power of DFCAs. We compute the expected ratio for the family of all finite languages, but also for various subfamilies of finite languages, such as prefix, suffix-free languages prefix and suffix closed languages, or (un)balanced languages. We also give an example of a family for which the expected compression ratio is very high.