- Report on The Second Year of Activity
**Activity**

The research team continued working towards the achievement of the main project goals. Several BIC and BI grants were assigned: one BIC grant was continued by a junior researcher, whose main task is to develop an interactive automata visualization tool, GUitar, within task 6 (high-performance symbolic manipulation tools); one BI grant was continued by a research assistant for the implementation of a Web framework to manage descriptional complexity results related to Tasks 2 and 3; a 3-month BI grant was assigned to a research assistant for the implementation of a software library for the manipulation of linear and quasi-linear transducers, within task 3 and 6; recently (April 2012) a BI and a BIC were assigned to develop work within task 5. and tasks 2. and 3.

In May 2011 and February 2012, Vladimir Komendantsky (from the School of Computer Science, St. Andrews Univ., Scotland, UK) visited Porto.

In 2012, almost all members of the project team were engaged on the scientific and local organization of two conferences in Portugal: CIAA (17th Conference on International Conference on Implementation and Application of Automata), to be held in Porto 17-20/07/2012 (http://www.dcc.fc.up.pt/CIAA12), and also DCFS (14th Workshop on Descriptional Complexity of Formal Systems) to be held in Braga, 23-25/07/2012 (http://www.dcc.fc.up.pt/DCFS12). CIAA 2012 will include 5 invited talks, 28 contributed talks, and two Demo sessions of software tools related with automata. CIAA 2012 is one of the events of the 2012 Alan Turing Year (http://www.mathcomp.leeds.ac.uk/turing2012/) and has the scientific sponsorship of the European Association for Theoretical Computer Science (EATCS). DCFS 2012 will include 4 invited talks and 20 contributed talks, and is sponsored by the IFIP Working Group 1.2 in Descriptional Complexity. The proceedings of both conferences are to be published in the Springer LNCS series (numbers 7381 and 7386). Benefiting from these events, the visits of several project consultants were scheduled for July-August 2012. Unfortunately, earlier in January 2012, we were shocked with the unexpectedly death of Sheng Yu (University of Western Ontario, Canada), a project consultant and one of the most respected and influent researcher in automata theory, and also chair of the steering committee of CIAA.

Several talks of the project seminar and a mini-course occurred during this year. Members of the research team presented several communications in international and national workshops and conferences. Information about the activities and scientific results of the project are publicly available in the project Web page: http://www.dcc.fc.up.pt/cante.

The project is organized in the following tasks

1. Enumeration and random generation

2. Succinctness, conversions and language operations

3. Practical performance of algorithms

4. Complexity cores and average-case analysis

5. Kleene algebras and program verification

6. High-performance symbolic manipulation tools

Progresses in each task:

Task 1: Continued the work on determining the density of minimal DFAs parametrized by the alphabet size. Although experimentally it is known that almost all DFAs are minimal, we tried to obtain a theoretical characterization using Kolmogorov complexity and the incompressibility method. Some preliminary results were obtained.

Task 2: Continued the study of the average size of several representations for regular languages using analytic combinatorics methods. We computed and compared the average size of the Glushkov and the partial derivative automata. Using the same methods we obtained the average size of several epsilon-NFA constructions. The results obtained were reported in [1,2,5,8,16,23]. We studied the incomplete transition complexity of several basic operations on regular languages. This study has been initiated by Sheng Yu and collaborators in 2011 for Boolean operations, and we extended it for Concatenation, Star and Reversal. The results obtained were reported in [11,21]. We continued the development of the DesCo Web system, including the introduction of family languages and hierarchies of languages classes. See [7,17].

Task 3: In this task we focus on the development of efficient conversions from regular expressions to NFAs, and in particular in the partial derivative automata. Work is still in progress.

Task 4: Due to the retirement of one of the project team member, no progress was made in this task.

Task 5: In the context of the formalization of Kleene algebra within the Coq proof assistant, we terminated the encoding of a procedure for automatically deciding regular expression equivalence, based in the notion of partial derivatives [10,18]. We encoded the proofs of its soundness and completeness, and also the construction of a tactic that automatically decides regular expression equivalence. Then we extended that work to decide the equivalence of KAT terms [22]. Both decision procedures were also implemented in OCAML and we developed a new procedure for deciding Propositional Hoare Logic based on derivatives [12]. We developed a novel algorithm to learn generalized semi-Markov processes from sample executions that can be used for quantitative analysis and verification in the context of model checking [3,13,20].

Task 6: The developments in the FAdo system included: methods for determining the syntactic monoid associated with a DFA; several algorithms for basic operations on incomplete DFAs; the new algorithms for obtaining the Glushkov and the partial derivative automata; FAdo version 0.9.4 was released. A new library for linear transducers was developed that includes the inversion and random generation of linear transducers [15,18]. It was designed and partially implemented a version of FAdo in OCAML. GUitar has now a new implementation based on PyQt [4].**Deviations to the Intitial Proposal**

It has been very difficult to execute the project budget due to the fact that the Hosting Institution has severe financial restrictions and FCT only reimburse a percentage of the budget after expenses have been made. This includes the assignment of new grants, missions and acquisition of equipment.

Other difficulty concerned the fact that some project members leaved the team and it was not possible to substitute by new members, that are working in the project subjects but cannot have the financial support for their research (mainly missions). Thus I will again ask for the integration on the project team of Phd. students that are currently working within the project research goals, namely Eva Maia and Ivone Amorim.**Publications**1 - Publicado - Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. On the Average State Complexity of Partial Derivative Automata. International Journal of Foundations of Computer Science, 22(7):1593-1606, 2011. http://dx.doi.org/10.1142/S0129054111008908

2 - Publicado - Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. The Average Transition Complexity of Glushkov and Partial Derivative Automata. In G. Mauri and A. Leporati, editors, Developments in Language Theory, 15th International Conference, DLT 2011, Milano, Italy, July 2011. Proceedings, volume 6795 of LNCS, pages 93-104, Milano, Italy, July 2011. Springer-Verlag. http://dx.doi.org/10.1007/978-3-642-22321-1

3 - Publicado - André de Matos Pedro, Maria João Frade, Ana Paula Martins and Simão Melo de Sousa. Aprendizagem de Processos semi-Markovianos Generalizados: dos Sistemas de Eventos Discretos Estocásticos aos Testes e à Verificação. Atas do Inforum Simpósio de Informática, 8-9 de Setembro de 2011, Coimbra.Ed.: R. Barbosa, L. Caires. ISBN: 978-989-96001-5-7.

4 - Publicado - Rizó Isrof, Nelma Moreira, Rogério Reis. GUITAR: Graphical User Interface Tool for Automata Representation. Actas do IJUP (resumos), 2012, 22-24 Fevereiro, 2012, Porto. Universidade do Porto. http://www.dcc.fc.up.pt/~nam/publica/IJUP12GUITAR.pdf.

5 - Aceite - Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. On the Average Size of Glushkov and Partial Derivative Automata. International Journal of Foundations of Computer Science, 2012. Accepted for publication.

6 - Aceite - Marco Almeida, Nelma Moreira, and Rogério Reis. Finite Automata Minimization Algorithms. Chapter in "Handbook on Finite State Based Models and Applications", CRC Press, July 2012.

7 - Aceite - Davide Nabais, Nelma Moreira, and Rogério Reis. DesCo: a Web Based Information System for Descriptional Complexity Results, para apresentação no First CiE-IFCoLog Student Session 18-23 June 2012, University of Cambridge.

8 - Aceite - Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. Average Case Complexity of epsilon-NFA´s, para apresentaçõa informal na conferência TURING CENTENARY CONFERENCE CiE 2012, University of Cambridge, 18-23/06/2012.

9 - Submetido - Marco Almeida, Nelma Moreira, and Rogério Reis. Quadratic Incremental DFA Minimisation, submetido em Dezembro 2011 à revista Fundamenta Informaticae.

10 - Submetido - Nelma Moreira, David Pereira, Simão Melo de Sousa. Deciding Regular Expressions (In-)Equivalence in Coq. Submetido em Abril 2012 à conferencia 13th International Conference on Relational and Algebraic Methods in Computer Science (RAMiCS 13) (17-21 Setembro de 2012, Cambridge UK)

11 - Submetido - Eva Maia, Nelma Moreira, and Rogério Reis. On the Incomplete Transition Complexity of some Basic Operations on Regular Languages, submetido em Abril de 2012 à conferencia 37th International Symposium on Mathematical Foundations of Computer Science (MFCS 2012), Agosto 27 - 31, 2012, Bratislava, Slovakia

12 - Submetido - Ricardo Almeida, Sabine Broda, and Nelma Moreira. Deciding KAT and Hoare Logic with Derivatives, submetido em Abril de 2012 à conferência Third International Symposium on Games, Automata, Logics and Formal Verification (Gandalf 2012) Naples, Italy - Setembro 6-8, 2012

13 - Submetido - André de Matos Pedro, Paul Andrew Crocker and Simão Melo de Sousa. Learning stochastic timed automata from sample executions, submetido em Abril de 2012 ao ISOLA 2012,15-18/11/2012 - Amirandes, Heraclion, Crete

14 - Submetido - Sandra Alves and Sabine Broda. Type inhabitation through FTM and Game Semantics, submetido em Abril de 2012 à conferência Third International Symposium on Games, Automata, Logics and Formal Verification (Gandalf) Naples, Italy - Setembro 6-8, 2012

15 - Submetido - Ivone Amorim, António Machiavelo, and Rogério Reis. Formal Power Series and Invertability of Finite Linear Transducers. Submetido em Abril de 2012 à conferência Non-Classical Models of Automata and Applications (NCMA 2012) 23-24 August 2012, Fribourg, Switzerland.

Relatórios: disponíveis em http://www.dcc.fc.up.pt/dcc/Pubs/TReports/

16 - Relatório - Sabine Broda, António Machiavelo, Nelma Moreira, Rogério Reis. Study of the Average Size of Glushkov and Partial Derivative Automata, Technical Report DCC-2011-03, DCC - FC, Universidade do Porto, August, 2011.

17 - Relatório - Nelma Moreira, Davide Nabais, Rogério Reis. DesCo: a Web Based Information System for Descriptional Complexity Results , Technical Report DCC-2011-10, DCC - FC, Universidade do Porto, August, 2011.

18 - Relatório - Nelma Moreira, David Pereira, Simão Melo de Sousa. Deciding Regular Expressions (In-)Equivalence in Coq , Technical Report DCC-2011-06, DCC - FC, Universidade do Porto, November, 2011.

19 - Relatório - Ivone Amorim, António Machiavelo, Rogério Reis. On Linear Finite Automata and Cryptography , Technical Report DCC-2011-11, DCC - FC, Universidade do Porto, August, 2011.

20 - Relatório - André de Matos Pedro, Maria João Frade, Simão Melo de Sousa. Learning Generalized semi-Markov Processes: From Stochastic Discrete Event Systems to Testing and Verification , Technical Report DCC-2012-01, DCC-LIACC, Universidade do Porto, April, 2012.

21 - Relatório - Eva Maia, Nelma Moreira, and Rogério Reis. On the Incomplete Transition Complexity of Some Basic Operations on Regular Languages, Technical Report DCC-2012-02, DCC - FC, Universidade do Porto, April, 2012.

22 - Relatório - Nelma Moreira, David Pereira, Simão Melo de Sousa. Deciding KAT (In-)Equivalence in Coq. Technical Report DCC-2012-04, DCC - FC, Universidade do Porto, April, 2012.

23 - Relatório - Sabine Broda, António Machiavelo, Nelma Moreira, Rogério Reis. A Hitchhiker´s Guide to Descriptional Complexity through Analytic Combinatorics. Technical Report DCC-2012-05, DCC - FC, Universidade do Porto, April, 2012. - Report on The First Year of Activity
**Activity**

The research team worked towards the achievement of the main project goals. A BIC grant was assigned to a junior researcher to work in task 6 (high-performance symbolic manipulation tools) and a BI grant was assigned to a research assistant for the implementation of a Web framework to manage descriptional complexity results related to Tasks 2 and 3. R. Reis and N. Moreira visited Sheng Yu (project consultant) at University of Western Ontario (London, Canada) during their one semester sabbatical leave between August and December 2010. In February 2011, Cezar Campeanu (from University of Prince Edward Island,PEI,Canada) and, also a project consultant, visited Porto during two weeks. In April 2011, Sheng Yu and Giovanni Pighizzini (University of Milano, Italy) were part of the jury of Marco Almeida Ph.D. thesis defense that took place in University of Porto.

In 2012 members of the project team will organize and be the program chairs of two scientific conferences in Portugal: CIAA (Conference on International Conference on Implementation and Application of Automata), to be held in Porto, and also DCFS (Workshop on Descriptional Complexity of Formal Systems) that will be held in Braga. Several talks of the project seminar occured during this year. Members of the research team presented several commutications in international workshops and conferences. Information about the activities and scientific results of the project are public available in the project Web page: http://www.dcc.fc.up.pt/cante.

A complete list of the activities concering seminars, communications, software, and internationalization can be found in the appendix file ``canteactivities1011.pdf''.

The project is organized in the following tasks

1. Enumeration and random generation

2. Succinctness, conversions and language operations

3. Practical performance of algorithms

4. Complexity cores and average-case analysis

5. Kleene algebras and program verification

6. High-performance symbolic manipulation tools

In the following we describe the progress of each task.

Task 1: A canonical form for NFAs was defined for general regular languages. With C. Campeanu we studied the enumeration of NFAs for finite languages. The first problem is PSPACE-complete and the second is in NP. Thus although no efficient algorithms may exist we continue to study the feasibility of our methods for the uniform random generation of NFAs. Also with C. Campeanu we studied the expected compression rate of minimal cover automata and the size of minimal DFAs. We computed the expected ratio for the family of all finite languages, but also for various subfamilies of finite languages. Preliminary results were reported in [12].

Using Kolmogorov complexity and the incompressibility method we analysed the problem of the density of minimal DFAs parametrized by the alphabet size.

Using uniform random generators from previous work, we have enhanced the three relational databases that store datasets of random REs, DFAs and NFAs. The databases currently contain 4 600 000 DFAs, 9 660 000 NFAs, and 5980 000 regular expressions, amounting to 20 240 000 precomputed random samples, with sizes varying from 5 to 100 states (10 to 300, in the case of REs) over alphabets of 2--50 symbols.

Task 2: An exhaustive study relating the average sizes of several representations for regular languages was carried out using analytic combinatorics methods. We computed and compared the average size of two distinct constructions of NFAs from REs: the Glushkov and the partial derivative automata (PD). For the Glushkov automaton, approximate results were already known (2009). We improved these results by the introduction of a new algorithm for computing the Follow-set. This allowed the exact counting of the number of transitions and a new quadratic algorithm for the conversion. Given a RE of size n, we obtained the following results: in average and for large alphabets, the number of states in PD automata is half the number of states in Glushkov automata; on average and asymptotically the PD automaton has at most half the number of transitions of the Glushkov's; asymptotically the average number of transitions in the Glushkov automaton is n. The results obtained were reported in [1,3,6,9,10].

We also studied the problem of obtaining a short regular expression equivalent to a given finite automaton. We analysed some of the heuristics presented in the literature and propose new ones. We also presented some experimental comparative results [5].

Task 3: We have developed a new incremental algorithm for minimising deterministic finite automata. It runs in quadratic time for any practical application and may be halted at any point, returning a partially minimised automaton [2,4,7,8].

Task 4: No progress has been yet made in this task.

Task 5: In the context of the formalization of Kleene algebra within the Coq proof assistant, and following previous work, we focused on the encoding of a procedure for automatically deciding regular expression equivalence, based in the notion of partial derivatives. The main activities were: the redefinition of all constructions that involved finite sets, in order to use the new finite set library Containers; the definition and correctness proofs of an accessibility relation that ensures termination; the encoding of the decision procedure; a Coq library that contains a formalization of partial derivative automata in a constructive way. Preliminary results were reported in [11]

Task 6: The developments in the FAdo system included: implementation of several algorithms for combined operations on DFAs; implementation of a new class of FAs to deal with finite languages; development of an interface of Python classes to C language; integration with Grail++. Fado version 0.9.3 was released.

A BIC grant was assigned to a junior researcher which main task is to port the interactive automata visualization tool, GUitar, to the PyQT framework.**Deviations to the Intitial Proposal**

Considering that the funding of the project is smaller than the requested funding, some adjustments had to be made to the planned work. These will mainly reflect on the human resources regarding a post-doctoral grant that was attached to task 1 (enumeration and random generation), and thus the main goals of that task for all the duration of the project may be compromised.

Concerning the development of a framework for the visualiation and interactive manipulation of automata (GUItar) a student that had been working on that framework for two years (as part of his master thesis work) and that was going to begin his Ph.D. on that topic in 2011 has withdrawn the project (although a Ph.D grant was assigned to him by the FCT grants national funding). Thus, this task had to be reassigned to a new young researcher that joined the research team with a BIC grant which we hope to maintain for the next two years (assigned to task 6, and instead of 2 BI and one BIC grants for 12 months each).

In task 3 it is planned to study the descriptional complexity of combinations of operations. This topic is related with recent results in the state complexity of individual operations that began to be extended to combined operations in the last five years, mainly by Sheng Yu and his collaborators. During the visit of R. Reis and N. Moreira to the University of Western Ontario, a review paper on these topics was initiated co-authored by Sheng Yu and Yuan Gao (and partial presented in [13]). Given the extend and nature of those results it was evident that the development of a database and a Web-based search system connected with an automata symbolic manipulation system would be of great importance for the scientific community. Towards this goal, and using the FAdo system as symbolic manipulation system, a BI grant was assigned to a computer science graduate.**Publications**1 - Aceite - Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. On the average state complexity of partial derivative automata. International Journal of Foundations of Computer Science, 2011.

2 - Aceite - Marco Almeida, Nelma Moreira, and Rogério Reis. Finite automata minimization

algorithms. Chapter in "Handbook on Finite State Based Models and Applications", CRC Press, February 2012.

3 - Publicado - Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. On the average number of states of partial derivative automata. In Yuan Gao, Hanlin Lu, Shinnosuke Seki, and Sheng Yu, editors, Developments in Language Theory, 14th International Conference, DLT 2010, London, ON, Canada, August 17-20, 2010. Proceedings, volume 6224 of Lecture Notes on Computer Science, pages 112-123, London, ON, Canada, August 2010. Springer-Verlag. DOI:10.1007/978-3-642-14455-4_12.

4 - Publicado - Marco Almeida, Nelma Moreira, and Rogério Reis. Incremental dfa minimisation. In Proceedings of the 15th International Conference on Implementation and Application of Automata (CIAA 2010), number 6482 in Lecture Notes on Computer Science, pages 39-48, Winnipeg, MA, Canada, August 2011. Springer-Verlag.

5 - Publicado - Nelma Moreira, Davide Nabais, and Rogério Reis. State elimination ordering strategies: Some experimental results. In I. MacQuillan, G. Pighizzini, and B. Trost, editors, Proc. of 11th Workshop on Descriptional Complexity of Formal Systems (DCFS10), pages 169-180, Saskatoon, Saskatchewan,Canada, August 2010. Also at http://www.eptcs.org. DOI: 10.4204/EPTCS.31.16.

6 - Publicado - Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. The average transition complexity of Glushkov and partial derivative automata. In G. Mauri and A. Leporati (Eds.): Proc. of DLT 2011, Milano, LNCS 6795, pp. 93--104. Springer, Heidelberg (2011)

7 - Submetido - Marco Almeida, Nelma Moreira, and Rogério Reis. Quadratic incremental dfa minimisation.

8 - Tese de Doutoramento - Marco Almeida, "Equivalence of regular languages: an algorithmic approach and complexity analysis", Ph.D. thesis, Programa Doutoral em Ciência de Computadores, FCUP. 2010

9 - Relatório - Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. On the average size of pd automata: an analytic combinatorics approach. Technical Report DCC-2010-04, DCC-FC, Universidade do Porto, 2010.http://www.dcc.fc.up.pt/dcc/Pubs/TReports/

10 - Relatório - Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. On the average size of Glushkov and Partial Derivative Automata. Technical Report DCC-2011-03, DCC-FC, Universidade do Porto, 2011.http://www.dcc.fc.up.pt/dcc/Pubs/TReports/

11 - Relatório - Nelma Moreira, David Pereira, and Simão Melo de Sousa, A certified decision procedure for equivalence of regular expressions, Technical Report DCC-2011-06, DCC-FC, Universidade do Porto, 2011.http://www.dcc.fc.up.pt/dcc/Pubs/TReports/

12 - Relatório - Cezar Câmpeanu, Nelma Moreira, and Rogério Reis, Expected Compression Ratio for DFCA: experimental average case. Technical Report DCC-2011-07, DCC-FC, Universidade do Porto, 2011.http://www.dcc.fc.up.pt/dcc/Pubs/TReports/

13 - Relatório - Nelma Moreira, Rogério Reis, Sheng Yu and Yuan Gao. A review on State Complexity for Individual Operations. Technical Report DCC-2011-08, DCC-FC, Universidade do Porto, 2011.http://www.dcc.fc.up.pt/dcc/Pubs/TReports/