FAdo.reex¶
Regular expressions manipulation
Regular expression classes and manipulation
- class BuildRegexp(context=None)[source]¶
Semantics of the FAdo grammars’ regexps Priorities of operators: disj > conj > shuffle > concat > not > star >= option
- class CAtom(val, sigma=None)[source]¶
Simple Atom (symbol)
- Variables:
Sigma – alphabet set of strings
val – the actual symbol
Constructor of a regular expression symbol.
- Parameters:
val – the actual symbol
- PD()[source]¶
Closure of partial derivatives of the regular expression in relation to all words.
- Returns:
set of regular expressions
- Return type:
See also
Antimirov, 95
- static alphabeticLength()[source]¶
Number of occurrences of alphabet symbols in the regular expression.
- Return type:
Attention
Doesn’t include the empty word.
- derivative(sigma)[source]¶
Derivative of the regular expression in relation to the given symbol.
Note
whether the symbols belong to the expression’s alphabet goes unchecked. The given symbol will be matched against the string representation of the regular expression’s symbol.
See also
Brzozowski, Derivatives of Regular Expressions. J. ACM 11(4): 481-494 (1964)
- static epsilonLength()[source]¶
Number of occurrences of the empty word in the regular expression.
- Return type:
- first(parent_first=None)[source]¶
List of possible symbols matching the first symbol of a string in the language of the regular expression.
- followLists(lists=None)[source]¶
Map of each symbol’s follow list in the regular expression.
- Parameters:
lists (dict) –
- Returns:
map of symbols’ follow lists {symbol: list of symbols}
- Return type:
Attention
For first() and last() return lists, the follow list for certain symbols might have repetitions in the case of follow maps calculated from Star operators. The union of last(), first() and follow() sets are always disjoint when the regular expression is in Star normal form ( Brüggemann-Klein, 92), therefore FAdo implements them as lists. You should order exclusively, or take a set from a list in order to resolve repetitions.
- followListsD(lists=None)[source]¶
Map of each symbol’s follow list in the regular expression.
- Parameters:
lists (dict) –
- Returns:
map of symbols’ follow list {symbol: list of symbols}
- Return type:
Attention
For first() and last() return lists, the follow list for certain symbols might have repetitions in the case of follow maps calculated from star operators. The union of last(), first() and follow() sets are always disjoint
See also
Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. On the average size of glushkov and partial derivative automata. International Journal of Foundations of Computer Science, 23(5):969-984, 2012.
- followListsStar(lists=None)[source]¶
Map of each symbol’s follow list in the regular expression under a star.
- Parameters:
lists (dict) –
- Returns:
map of symbols’ follow lists {symbol: list of symbols}
- Return type:
dict
See also
Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. On the average size of glushkov and partial derivative automata. International Journal of Foundations of Computer Science, 23(5):969-984, 2012.
- last(parent_last=None)[source]¶
List of possible symbols matching the last symbol of a string in the language of the regular expression.
- linearForm()[source]¶
Linear form of the regular expression , as a mapping from heads to sets of tails, so that each pair (head, tail) is a monomial in the set of linear forms.
- Returns:
dict: dictionary mapping heads to sets of tails
See also
Antimirov, 95
- linearP()[source]¶
Whether the regular expression is linear; i.e., the occurrence of a symbol in the expression is unique.
- Return type:
- static measure(from_parent=None)[source]¶
A list with four measures for regular expressions.
[alphabeticLength, treeLength, epsilonLength, starHeight]
alphabeticLength: number of occurences of symbols of the alphabet;
treeLength: number of functors in the regular expression, including constants.
epsilonLength: number of occurrences of the empty word.
starHeight: highest level of nested Kleene stars, starting at one for one star occurrence.
disjLength: number of occurrences of the disj operator
concatLength: number of occurrences of the concat operator
starLength: number of occurrences of the star operator
conjLength: number of occurrences of the conj operator
starLength: number of occurrences of the shuffle operator
Attention
Methods for each of the measures are implemented independently. This is the most effective for obtaining more than one measure.
- nfaThompson()[source]¶
Epsilon-NFA constructed with Thompson’s method that accepts the regular expression’s language.
- Returns:
NFA Thompson
- Return type:
See also
Thompson. Regular Expression Search Algorithm. CACM 11(6), 419-422 (1968)
- partialDerivatives(sigma)[source]¶
Set of partial derivatives of the regular expression in relation to given symbol.
- Parameters:
sigma (str) – symbol in relation to which the derivative will be calculated.
- Returns:
set of regular expressions
- Return type:
See also
Antimirov, 95
- reduced(has_epsilon=False)[source]¶
Equivalent regular expression with the following cases simplified:
Epsilon.RE = RE.Epsilon = RE
EmptySet.RE = RE.EmptySet = EmptySet
EmptySet + RE = RE + EmptySet = RE
Epsilon + RE = RE + Epsilon = RE, where Epsilon is in L(RE)
RE** = RE*
EmptySet* = Epsilon* = Epsilon
7.Epsilon:RE = RE:Epsilon= RE
- Parameters:
has_epsilon (bool) – used internally to indicate that the language of which this term is a subterm has the empty
word. –
- Returns:
regular expression
- Return type:
Attention
Returned structure isn’t strictly a duplicate. Use __copy__() for that purpose.
- setOfSymbols()[source]¶
Set of symbols that occur in a regular expression..
- Returns:
set of symbols
- Return type:
- static starHeight()[source]¶
- Maximum level of nested regular expressions with a star operation applied.
For instance, starHeight(((a*b)*+b*)*) is 3.
- Return type:
- stringLength()[source]¶
Length of the string representation of the regular expression.
- Returns:
string length
- Return type:
- support(side=True)[source]¶
Support of a regular expression.
- Parameters:
side (bool) – if True concatenation of a set on the left if False on the right (prefix support)
- Returns:
set of regular expressions
- Return type:
See also
Champarnaud, J.M., Ziadi, D.: From Mirkin’s prebases to Antimirov’s word partial derivative. Fundam. Inform. 45(3), 195-205 (2001)
See also
Maia et al, Prefix and Right-partial derivative automata, 11th CIE 2015, 258-267 LNCS 9136, 2015
- static syntacticLength()[source]¶
Number of nodes of the regular expression’s syntactical tree (sets).
- Return type:
- class CConcat(arg1, arg2, sigma=None)[source]¶
Class for catenation operation on regular expressions.
- ewp()[source]¶
Whether the empty word property holds for this regular expression’s language.
- Return type:
- followLists(lists=None)[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- followListsD(lists=None)[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- class CConj(arg1, arg2, sigma=None)[source]¶
Intersection operation of regexps
- class CDisj(arg1, arg2, sigma=None)[source]¶
Class for disjunction/union operation on regular expressions.
- ewp()[source]¶
Whether the empty word property holds for this regular expression’s language.
- Return type:
- followLists(lists=None)[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- followListsD(lists=None)[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- class CEmptySet(sigma=None)[source]¶
Class that represents the empty set.
Constructor of a regular expression symbol.
- Parameters:
val – the actual symbol
- class CEpsilon(sigma=None)[source]¶
Class that represents the empty word.
Constructor of a regular expression symbol.
- Parameters:
val – the actual symbol
- class COption(arg, sigma=None)[source]¶
Class for option operation (reg + @epsilon) on regular expressions.
- epsilonLength()[source]¶
Number of occurrences of the empty word in the regular expression.
- Returns:
number of epsilons
- Return type:
- ewp()[source]¶
Whether the empty word property holds for this regular expression’s language.
- Return type:
- followLists(lists=None)[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- followListsD(lists=None)[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- class CShuffle(arg1, arg2, sigma=None)[source]¶
Shuffle operation of regexps
- class CShuffleU(arg, sigma=None)[source]¶
Unary Shuffle operation of regexps
- epsilonLength()[source]¶
Number of occurrences of the empty word in the regular expression.
- Returns:
number of epsilons
- Return type:
- ewp()[source]¶
Whether the empty word property holds for this regular expression’s language.
- Return type:
- class CSigmaP(sigma=None)[source]¶
- Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;
associativity of concatenation; identities sigma^* and sigma^+.
CSigmaP: Class that represents the complement of the EmptySet word (sigma^+)
Constructor of a regular expression symbol.
- Parameters:
val – the actual symbol
- class CSigmaS(sigma=None)[source]¶
- Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;
associativity of concatenation; identities sigma^* and sigma^+.
CSigmaS: Class that represents the complement of the EmptySet set (sigma^*)
Constructor of a regular expression symbol.
- Parameters:
val – the actual symbol
- class CStar(arg, sigma=None)[source]¶
Class for iteration operation (aka Kleene star, or Kleene closure) on regular expressions.
- epsilonLength()[source]¶
Number of occurrences of the empty word in the regular expression.
- Returns:
number of epsilons
- Return type:
- ewp()[source]¶
Whether the empty word property holds for this regular expression’s language.
- Return type:
- followLists(lists=None)[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- followListsD(lists=None)[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- class Compl(arg, sigma=None)[source]¶
Class for not operation on regular expressions.
- epsilonLength()[source]¶
Number of occurrences of the empty word in the regular expression.
- Returns:
number of epsilons
- Return type:
- ewp()[source]¶
Whether the empty word property holds for this regular expression’s language.
- Return type:
- followLists()[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- followListsD()[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- starHeight()[source]¶
Maximum level of nested regular expressions with a star operation applied.
For instance, starHeight(((a*b)*+b*)*) is 3.
- Returns:
number of nested star
- Return type:
- class Connective(arg1, arg2, sigma=None)[source]¶
Base class for (binary) operations: concatenation, disjunction, etc
- alphabeticLength()[source]¶
Number of occurrences of alphabet symbols in the regular expression. :returns: alphapetic length :rtype: int
Attention
Doesn’t include the empty word.
- epsilonLength()[source]¶
Number of occurrences of the empty word in the regular expression.
- Returns:
number of epsilons
- Return type:
- followLists(lists=None)[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- followListsD(lists=None)[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- class DAG(reg)[source]¶
Class to support dags representing regexps
- …seealso: P. Flajolet, P. Sipala, J.-M. Steyaert, Analytic variations on the common subexpression problem,
in: Automata, Languages and Programmin, LNCS, vol. 443, Springer, New York, 1990, pp. 220–234.
Args: reg (RegExp): regular expression
- catLF(idl, idr, delay=False)[source]¶
Linear form for concatenation :param idl: node :type idl: int :param idr: node :type idr: int :param delay: if true partial derivatives are delayed :type delay: bool
- Returns:
partial derivatives
- Return type:
..note:: both arguments are assumed to be already present in the DAG
- getIdx(reg)[source]¶
Builds dag nodes :param reg: regular expression :type reg: regexp
- Returns:
node id
- Return type:
- class DAG_I(reg)[source]¶
Class to support dags representing regexps that inherit from DAG Partial derivatives are buid incrementally
Args: reg (RegExp): regular expression
- cat_one(idl, idr, c)[source]¶
Partial derivative by one symbol for concatenation :param idl: node :type idl: int :param idr: node :type idr: int :param c: symbol :type c: char
- Returns:
partial derivatives
- Return type:
- class MAtom(val, mark, sigma=None)[source]¶
Base class for pointed (marked) regular expressions
Used directly to represent atoms (characters). This class is used to obtain Yamada or Asperti automata. There is no evident use for it, outside this module.
- Parameters:
val – symbol
sigma – alphabet
- class Position(val, sigma=None)[source]¶
Class for marked regular expression symbols.
Constructor of a regular expression symbol.
- Parameters:
val – the actual symbol
- class Power(arg, n=1, sigma=None)[source]¶
Class for Power operation on regular expressions.
- alphabeticLength()[source]¶
Number of occurrences of alphabet symbols in the regular expression. :returns: alphapetic length :rtype: int
Attention
Doesn’t include the empty word.
- epsilonLength()[source]¶
Number of occurrences of the empty word in the regular expression.
- Returns:
number of epsilons
- Return type:
- followLists()[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- followListsD()[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- class RegExp(sigma=None)[source]¶
Base class for regular expressions.
- Variables:
Sigma – alphabet set of strings
- abstract static alphabeticLength()[source]¶
Number of occurrences of alphabet symbols in the regular expression. :returns: alphapetic length :rtype: int
Attention
Doesn’t include the empty word.
- compare(r, cmp_method='compareMinimalDFA', nfa_method='nfaPD')[source]¶
Compare with another regular expression for equivalence.
- compareMinimalDFA(r, nfa_method='nfaPosition')[source]¶
Compare with another regular expression for equivalence through minimal DFAs.
- dfaAuPoint()[source]¶
DFA “au-point” acconding to Nipkow
- Returns:
“au-point” DFA
- Return type:
See also
Andrea Asperti, Claudio Sacerdoti Coen and Enrico Tassi, Regular Expressions, au point. arXiv 2010
See also
Tobias Nipkow and Dmitriy Traytel, Unified Decision Procedures for Regular Expression Equivalence
- dfaBrzozowski(memo=None)[source]¶
Word derivatives automaton of the regular expression
- Parameters:
memo – if True memorizes the states already computed
- Returns:
word derivatives automaton
- Return type:
See also
Brzozowski, Derivatives of Regular Expressions. J. ACM 11(4): 481-494 (1964)
- dfaNaiveFollow()[source]¶
DFA that accepts the regular expression’s language, and is obtained from the follow automaton.
- Returns:
DFA follow
- Return type:
Note
Included for testing purposes.
See also
Ilie & Yu (Follow Automata, 2003)
- dfaYMG()[source]¶
DFA Yamada-McNaugthon-Gluskov acconding to Nipkow
- Returns:
Y-M-G DFA
- Return type:
See also
Tobias Nipkow and Dmitriy Traytel, Unified Decision Procedures for Regular Expression Equivalence
- abstract static epsilonLength()[source]¶
Number of occurrences of the empty word in the regular expression.
- Returns:
number of epsilons
- Return type:
- evalWordP(word)[source]¶
Verifies if a word is a member of the language represented by the regular expression.
- static ewp()[source]¶
Whether the empty word property holds for this regular expression’s language.
- Return type:
- abstract followLists()[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- abstract followListsD()[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- marked()[source]¶
Regular expression in which every alphabetic symbol is marked with its Position.
The kind of regular expression returned is known, depending on the literary source, as marked, linear or restricted regular expression.
- Returns:
linear regular expression
- Return type:
See also
r. McNaughton and H. Yamada, Regular Expressions and State Graphs for Automata, IEEE Transactions on Electronic Computers, V.9 pp:39-47, 1960
..attention: mark and unmark do not preserve the alphabet, neither set the new alphabet
- nfaFollow()[source]¶
NFA that accepts the regular expression’s language, whose structure, equiand construction.
- Returns:
NFA follow
- Return type:
See also
Ilie & Yu (Follow Automata, 03)
- nfaFollowEpsilon(trim=True)[source]¶
Epsilon-NFA constructed with Ilie and Yu’s method () that accepts the regular expression’s language.
- Parameters:
trim (bool) – if True automaton is trimmed at the end
- Returns:
possibly with epsilon transitions
- Return type:
NFAe
Note
The regular expression must be reduced
See also
Ilie & Yu, Follow automta, Inf. Comp. ,v. 186 (1),140-162,2003
- nfaGlushkov()[source]¶
Position or Glushkov automaton of the regular expression. Recursive method.
- Returns:
NFA position
- Return type:
- nfaNaiveFollow()[source]¶
NFA that accepts the regular expression’s language, and is equal in structure to the follow automaton.
- Returns:
NFA follow
- Return type:
Note
Included for testing purposes.
See also
Ilie & Yu (Follow Automata, 2003)
- nfaPD(pdmethod='nfaPDDAG')[source]¶
Computes the partial derivative automaton
- Parameters:
pdmethod (str) – an implementation of the PD automaton. Default value : nfaPDDAG
- Returns:
a PD nfa
- Return type:
Attention
for sre classes, CConj and CShuffle use nfaPDNaive directly
- nfaPDDAG()[source]¶
Partial derivative automaton using a DAG for the re and partial derivatives
- Returns:
a PD nfa build using a DAG
- Return type:
- ..seealso:: s.Konstantinidis, A. Machiavelo, N. Moreira, and r. Reis.
Partial derivative automaton by compressing regular expressions. DCFS 2021, volume 13037 of LNCS, pages 100–112. Springer, 2022
- nfaPDNaive()[source]¶
- NFA that accepts the regular expression’s language,
and which is constructed from the expression’s partial derivatives.
- Returns:
partial derivatives [or equation] automaton
- Return type:
See also
V. M. Antimirov, Partial Derivatives of Regular Expressions and Finite Automaton Constructions .Theor. Comput. Sci.155(2): 291-319 (1996)
- nfaPDO()[source]¶
- NFA that accepts the regular expression’s language, and which is constructed from the expression’s partial
derivatives.
- Returns:
partial derivatives [or equation] automaton
- Return type:
Note
optimized version
- nfaPSNF()[source]¶
Position or Glushkov automaton of the regular expression constructed from the expression’s star normal form.
- Returns:
Position automaton
- Return type:
- nfaPre()[source]¶
Prefix NFA of a regular expression
- Returns:
prefix automaton
- Return type:
See also
Maia et al, Prefix and Right-partial derivative automata, 11th CIE 2015, 258-267 LNCS 9136, 2015
- setSigma(symbolset=None, strict=False)[source]¶
Set the alphabet for a regular expression and all its nodes
- Parameters:
..attention: Normally this attribute is not defined in a RegExp()
- abstract static starHeight()[source]¶
Maximum level of nested regular expressions with a star operation applied.
For instance, starHeight(((a*b)*+b*)*) is 3.
- Returns:
number of nested star
- Return type:
- toNFA(nfa_method='nfaPD')[source]¶
NFA that accepts the regular expression’s language. :param nfa_method:
- abstract static treeLength()[source]¶
Number of nodes of the regular expression’s syntactical tree.
- Returns:
tree lenght
- Return type:
- wordDerivative(word)[source]¶
- Derivative of the regular expression in relation to the given word,
which is represented by a list of symbols.
- Parameters:
word (list) – list of arbitrary symbols.
- Returns:
regular expression
- Return type:
See also
Brzozowski, Derivatives of Regular Expressions. J. ACM 11(4): 481-494 (1964)
- class SConnective(arg, sigma=None)[source]¶
- Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;
- associativity of concatenation; identities sigma^* and sigma^+. Connectives are:
SDisj: disjunction SConj: intersection SConcat: concatenation
For parsing use str2sre
- followLists()[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- followListsD()[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- class SDisj(arg, sigma=None)[source]¶
Class that represents the disjunction operation for special regular expressions.
- class SNot(arg, sigma=None)[source]¶
- Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;
associativity of concatenation; identities sigma^* and sigma^+. SNot: negation
- followLists()[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- followListsD()[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- class SStar(arg, sigma=None)[source]¶
- Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;
associativity of concatenation; identities sigma^* and sigma^+.
SStar: Class that represents Kleene star
- class SpecialConstant(sigma=None)[source]¶
Base class for Epsilon and EmptySet
- Parameters:
sigma – alphabet
- distDerivative(sigma)[source]¶
- Parameters:
sigma – an arbitrary symbol.
- Return type:
regular expression
- static epsilonLength()[source]¶
Number of occurrences of the empty word in the regular expression.
- Returns:
number of epsilons
- Return type:
- static starHeight()[source]¶
Maximum level of nested regular expressions with a star operation applied.
For instance, starHeight(((a*b)*+b*)*) is 3.
- Returns:
number of nested star
- Return type:
- static treeLength()[source]¶
Number of nodes of the regular expression’s syntactical tree.
- Returns:
tree lenght
- Return type:
- class Unary(arg, sigma=None)[source]¶
Base class for unary operations: star, option, not, unary shuffle, etc
- alphabeticLength()[source]¶
Number of occurrences of alphabet symbols in the regular expression. :returns: alphapetic length :rtype: int
Attention
Doesn’t include the empty word.
- abstract epsilonLength()[source]¶
Number of occurrences of the empty word in the regular expression.
- Returns:
number of epsilons
- Return type:
- abstract ewp()[source]¶
Whether the empty word property holds for this regular expression’s language.
- Return type:
- followLists(lists=None)[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- followListsD(lists=None)[source]¶
Follow set
- Returns:
for each key position a set of follow positions
- Return type:
- abstract starHeight()[source]¶
Maximum level of nested regular expressions with a star operation applied.
For instance, starHeight(((a*b)*+b*)*) is 3.
- Returns:
number of nested star
- Return type:
- equivalentP(first, second)[source]¶
Verifies if the two languages given by some representative (DFA, NFA or re) are equivalent
- Parameters:
first – language
second – language
- Return type:
New in version 0.9.6.
- powerset(iterable)[source]¶
Powerset of a set. :param iterable: the set :type iterable: list
- Returns:
the powerset
- Return type:
itertools.chain
- rpn2regexp(s, sigma=None, strict=False)[source]¶
Reads a (simple) RegExp from a RPN representation
r ::= .RR | +RR | *r | L | @ L ::= [a-z] | [A-Z]
- Parameters:
- Return type:
Note
This method uses python stack… thus depth limitations apply
- str2regexp(s, parser=Lark(open('/Users/rvr/Work/FAdo/FAdo/regexp_grammar.lark'), parser='lalr', lexer='contextual', ...), sigma=None, strict=False)[source]¶
Reads a RegExp from string.
- Parameters:
- Return type: