FAdo.reex¶

Regular expressions manipulation

Regular expression classes and manipulation

class BuildRPNRegexp(context=None)[source]¶

class BuildRPNSRE(context=None)[source]¶

class BuildRegexp(context=None)[source]¶: Semantics of the FAdo grammars’ regexps Priorities of operators: disj > conj > shuffle > concat > not > star >= option

class BuildSRE(context=None)[source]¶: Parser for sre

class CAtom(val, sigma=None)[source]¶

Simple Atom (symbol)

Variables:

Sigma – alphabet set of strings
val – the actual symbol

Constructor of a regular expression symbol.

Parameters:: val – the actual symbol

PD()[source]¶

Closure of partial derivatives of the regular expression in relation to all words.

Returns:: set of regular expressions
Return type:: set

See also

Antimirov, 95

static alphabeticLength()[source]¶

Number of occurrences of alphabet symbols in the regular expression.

Return type:: int

Attention

Doesn’t include the empty word.

derivative(sigma)[source]¶

Derivative of the regular expression in relation to the given symbol.

Parameters:: sigma (str) – an arbitrary symbol.
Returns:: regular expression
Return type:: RegExp

Note

whether the symbols belong to the expression’s alphabet goes unchecked. The given symbol will be matched against the string representation of the regular expression’s symbol.

See also

1. Brzozowski, Derivatives of Regular Expressions. J. ACM 11(4): 481-494 (1964)

static epsilonLength()[source]¶

Number of occurrences of the empty word in the regular expression.

Return type:: int

first(parent_first=None)[source]¶

List of possible symbols matching the first symbol of a string in the language of the regular expression.

Parameters:: parent_first (list) –
Returns:: list of symbols
Return type:: list

first_l()[source]¶: First set for locations

followLists(lists=None)[source]¶

Map of each symbol’s follow list in the regular expression.

Parameters:: lists (dict) –
Returns:: map of symbols’ follow lists {symbol: list of symbols}
Return type:: dict

Attention

For first() and last() return lists, the follow list for certain symbols might have repetitions in the case of follow maps calculated from Star operators. The union of last(), first() and follow() sets are always disjoint when the regular expression is in Star normal form ( Brüggemann-Klein, 92), therefore FAdo implements them as lists. You should order exclusively, or take a set from a list in order to resolve repetitions.

followListsD(lists=None)[source]¶

Map of each symbol’s follow list in the regular expression.

Parameters:: lists (dict) –
Returns:: map of symbols’ follow list {symbol: list of symbols}
Return type:: dict

Attention

For first() and last() return lists, the follow list for certain symbols might have repetitions in the case of follow maps calculated from star operators. The union of last(), first() and follow() sets are always disjoint

See also

Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. On the average size of glushkov and partial derivative automata. International Journal of Foundations of Computer Science, 23(5):969-984, 2012.

followListsStar(lists=None)[source]¶

Map of each symbol’s follow list in the regular expression under a star.

Parameters:

lists (dict) –

Returns:

map of symbols’ follow lists {symbol: list of symbols}

Return type:

dict

See also

Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. On the average size of glushkov and partial derivative automata. International Journal of Foundations of Computer Science, 23(5):969-984, 2012.

follow_l()[source]¶: Follow set for locations

last(parent_last=None)[source]¶

List of possible symbols matching the last symbol of a string in the language of the regular expression.

Parameters:: parent_last (list) –
Returns:: list of symbols
Return type:: list

last_l()[source]¶: Last set for locations

linearForm()[source]¶

Linear form of the regular expression , as a mapping from heads to sets of tails, so that each pair (head, tail) is a monomial in the set of linear forms.

Returns:
dict: dictionary mapping heads to sets of tails

See also

Antimirov, 95

linearFormC()[source]¶

Returns:: linear form
Return type:: dict

linearP()[source]¶

Whether the regular expression is linear; i.e., the occurrence of a symbol in the expression is unique.

Return type:: bool

mark()[source]¶

Return type:: MAtom

static measure(from_parent=None)[source]¶

A list with four measures for regular expressions.

Parameters:: from_parent –
Returns:: the measures
Return type:: [int,int,int,int]

[alphabeticLength, treeLength, epsilonLength, starHeight]

alphabeticLength: number of occurences of symbols of the alphabet;
treeLength: number of functors in the regular expression, including constants.
epsilonLength: number of occurrences of the empty word.
starHeight: highest level of nested Kleene stars, starting at one for one star occurrence.
disjLength: number of occurrences of the disj operator
concatLength: number of occurrences of the concat operator
starLength: number of occurrences of the star operator
conjLength: number of occurrences of the conj operator
starLength: number of occurrences of the shuffle operator

Attention

Methods for each of the measures are implemented independently. This is the most effective for obtaining more than one measure.

nfaThompson()[source]¶

Epsilon-NFA constructed with Thompson’s method that accepts the regular expression’s language.

Returns:: NFA Thompson
Return type:: NFA

See also

Thompson. Regular Expression Search Algorithm. CACM 11(6), 419-422 (1968)

partialDerivatives(sigma)[source]¶

Set of partial derivatives of the regular expression in relation to given symbol.

Parameters:: sigma (str) – symbol in relation to which the derivative will be calculated.
Returns:: set of regular expressions
Return type:: set

See also

Antimirov, 95

partialDerivativesC(sigma)[source]¶

Parameters:: sigma (str) – symbol
Returns:: set of partial derivatives
Return type:: set

reduced(has_epsilon=False)[source]¶

Equivalent regular expression with the following cases simplified:

Epsilon.RE = RE.Epsilon = RE
EmptySet.RE = RE.EmptySet = EmptySet
EmptySet + RE = RE + EmptySet = RE
Epsilon + RE = RE + Epsilon = RE, where Epsilon is in L(RE)
RE** = RE*
EmptySet* = Epsilon* = Epsilon

7.Epsilon:RE = RE:Epsilon= RE

Parameters:

has_epsilon (bool) – used internally to indicate that the language of which this term is a subterm has the empty
word. –

Returns:

regular expression

Return type:

RegExp

Attention

Returned structure isn’t strictly a duplicate. Use __copy__() for that purpose.

reversal()[source]¶

Reversal of RegExp

Return type:: Regexp

rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

setOfSymbols()[source]¶

Set of symbols that occur in a regular expression..

Returns:: set of symbols
Return type:: set

snf(hollowdot=False)[source]¶

Star Normal Form (SNF) of the regular expression.

Parameters:: hollowdot (bool) – if True computes hollow dot function else black dot
Returns:: regular expression in star normal form
Return type:: RegExp

static starHeight()[source]¶

Maximum level of nested regular expressions with a star operation applied.: For instance, starHeight(((a*b)*+b*)*) is 3.

Return type:: int

stringLength()[source]¶

Length of the string representation of the regular expression.

Returns:: string length
Return type:: int

support(side=True)[source]¶

Support of a regular expression.

Parameters:: side (bool) – if True concatenation of a set on the left if False on the right (prefix support)
Returns:: set of regular expressions
Return type:: set

See also

Champarnaud, J.M., Ziadi, D.: From Mirkin’s prebases to Antimirov’s word partial derivative. Fundam. Inform. 45(3), 195-205 (2001)

See also

Maia et al, Prefix and Right-partial derivative automata, 11th CIE 2015, 258-267 LNCS 9136, 2015

supportlast(side=True)[source]¶

Subset of support such that elements have ewp

Parameters:: side (bool) – if True left-derivatives else right-derivatives
Returns:: set of partial derivatives
Return type:: set

static syntacticLength()[source]¶

Number of nodes of the regular expression’s syntactical tree (sets).

Return type:: int

tailForm()[source]¶

Returns:: tail form
Return type:: dict

static treeLength()[source]¶

Number of nodes of the regular expression’s syntactical tree.

Return type:: int

unmarked()[source]¶

The unmarked form of the regular expression. Each leaf in its syntactical tree becomes a RegExp(), the CEpsilon() or the CEmptySet().

Returns:: (general) regular expression
Return type:: RegExp

class CConcat(arg1, arg2, sigma=None)[source]¶

Class for catenation operation on regular expressions.

ewp()[source]¶

Whether the empty word property holds for this regular expression’s language.

Return type:: bool

first(parent_first=None)[source]¶

First set

Returns:: first position set
Return type:: set

first_l()[source]¶: First sets for locations

followLists(lists=None)[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

followListsD(lists=None)[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

follow_l()[source]¶: Follow sets for locations

last(parent_last=None)[source]¶

Last set

Returns:: last position set
Return type:: set

last_l()[source]¶: Last sets for locations

linearForm()[source]¶

Returns:: linear form
Return type:: dict

mark()[source]¶

Make all atoms maked (tag False)

Return type:: RegExp

reversal()[source]¶

Reversal of RegExp

Return type:: reex.RegExp

rpn()[source]¶

Return type:: str

snf(_hollowdot=False)[source]¶: Star Normal Form

support(side=True)[source]¶: Set of partial derivatives

tailForm()[source]¶

Returns:: tail form
Return type:: dict

unmark()[source]¶: Conversion back to unmarked atoms :rtype: CConcat

class CConj(arg1, arg2, sigma=None)[source]¶

Intersection operation of regexps

ewp()[source]¶

Whether the empty word property holds for this regular expression’s language.

Return type:: bool

first_l()[source]¶: First sets for locations

follow_l()[source]¶: Follow sets for locations

last_l()[source]¶: Last sets for locations

linearForm()[source]¶

Returns:: linear form
Return type:: dict

mark()[source]¶

Make all atoms maked (tag False)

Return type:: RegExp

rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

snf()[source]¶: Star Normal Form

tailForm()[source]¶

Returns:: tail form
Return type:: dict

class CDisj(arg1, arg2, sigma=None)[source]¶

Class for disjunction/union operation on regular expressions.

ewp()[source]¶

Whether the empty word property holds for this regular expression’s language.

Return type:: bool

first(parent_first=None)[source]¶

First set

Returns:: first position set
Return type:: set

first_l()[source]¶: First sets for locations

followLists(lists=None)[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

followListsD(lists=None)[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

follow_l()[source]¶: Follow sets for locations

last(parent_last=None)[source]¶

Last set

Returns:: last position set
Return type:: set

last_l()[source]¶: Last sets for locations

linearForm()[source]¶

Returns:: linear form
Return type:: dict

mark()[source]¶: Convertion to marked atoms :rtype: CDisj

nfaThompson()[source]¶

Returns an NFA (Thompson) that accepts the RE.

Return type:: NFA

digraph dij {
"0" -> "si1" [label=e];
"si1" -> "sf1" [label="arg1"];
"sf1" -> "1" [label=e];
"0" -> "si2" [label=e];
"si2" -> "sf2" [label="arg2"];
"sf2" -> "1" [label=e];
}

reversal()[source]¶

Reversal of RegExp

Return type:: reex.RegExp

rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

snf(hollowdot=False)[source]¶: Star Normal Form

support(side=True)[source]¶: Set of partial derivatives

tailForm()[source]¶

Returns:: tail form
Return type:: dict

unmark()[source]¶: Conversion back to unmarked atoms :rtype: CDisj

class CEmptySet(sigma=None)[source]¶

Class that represents the empty set.

Constructor of a regular expression symbol.

Parameters:: val – the actual symbol

static emptysetP()[source]¶

Returns:

static epsilonLength()[source]¶

Returns:

static epsilonP()[source]¶

Returns:

ewp()[source]¶

Returns:

static measure(from_parent=None)[source]¶

Parameters:: from_parent –
Returns:

nfaPD(pdmethod='nfaPDNaive')[source]¶: Computes the partial derivative automaton

partialDerivatives(_)[source]¶: Partial derivatives

partialDerivativesC(_)[source]¶

Returns:

rpn()[source]¶

Returns:

snf(_hollowdot=False)[source]¶: Star Normal Form

class CEpsilon(sigma=None)[source]¶

Class that represents the empty word.

Constructor of a regular expression symbol.

Parameters:: val – the actual symbol

static epsilonLength()[source]¶

Number of occurrences of the empty word in the regular expression.

Returns:: number of epsilons
Return type:: int

static epsilonP()[source]¶

Return type:: bool

ewp()[source]¶

Return type:: bool

static measure(from_parent=None)[source]¶

Parameters:: from_parent –
Returns:: measures

nfaThompson()[source]¶

Return type:: NFA

partialDerivatives(_)[source]¶

Returns:

partialDerivativesC(_)[source]¶

Returns:

rpn()[source]¶

Returns:: str

snf(_hollowdot=False)[source]¶

Parameters:: _hollowdot –
Returns:

class COption(arg, sigma=None)[source]¶

Class for option operation (reg + @epsilon) on regular expressions.

epsilonLength()[source]¶

Number of occurrences of the empty word in the regular expression.

Returns:: number of epsilons
Return type:: int

ewp()[source]¶

Whether the empty word property holds for this regular expression’s language.

Return type:: bool

first(parent_first=None)[source]¶

First set

Returns:: first position set
Return type:: set

first_l()[source]¶: First sets for locations

followLists(lists=None)[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

followListsD(lists=None)[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

followListsStar(lists=None)[source]¶: to be fixed

follow_l()[source]¶: Follow sets for locations

last(parent_first=None)[source]¶

Last set

Returns:: last position set
Return type:: set

last_l()[source]¶: Last sets for locations

linearForm()[source]¶

Returns:: linear form
Return type:: dict

nfaThompson()[source]¶

Returns a NFA that accepts the RE.

Return type:: NFA

digraph foo {
"0" -> "1" [label=e];
"0" -> "a" [label=e];
"a" -> "b" [label=A];
"b" -> "1" [label=e];
}

rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

setOfSymbols()[source]¶

Returns:: set of symbols
Return type:: set

snf(_hollowdot=False)[source]¶: Star Normal Form

starHeight()[source]¶

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Returns:: number of nested star
Return type:: int

support(side=True)[source]¶: Set of partial derivatives

tailForm()[source]¶

Returns:: tail form
Return type:: dict

class CShuffle(arg1, arg2, sigma=None)[source]¶

Shuffle operation of regexps

ewp()[source]¶

Whether the empty word property holds for this regular expression’s language.

Return type:: bool

first(parent_list=None)[source]¶

Parameters:: parent_list –
Returns:

first_l()[source]¶: First sets for locations

followListsD(lists=None)[source]¶: in progress

follow_l()[source]¶: Follow sets for locations

last_l()[source]¶: Last sets for locations

linearForm()[source]¶

Returns:: linear form
Return type:: dict

mark()[source]¶

Make all atoms maked (tag False)

Return type:: RegExp

rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

snf()[source]¶: Star Normal Form

tailForm()[source]¶

Returns:: tail form
Return type:: dict

class CShuffleU(arg, sigma=None)[source]¶

Unary Shuffle operation of regexps

epsilonLength()[source]¶

Number of occurrences of the empty word in the regular expression.

Returns:: number of epsilons
Return type:: int

ewp()[source]¶

Whether the empty word property holds for this regular expression’s language.

Return type:: bool

first(parent_list=None)[source]¶

Parameters:: parent_list –
Returns:

followListsD(lists=None)[source]¶: in progress

last(parent_last=None)[source]¶

Last set

Returns:: last position set
Return type:: set

linearForm()[source]¶

Returns:: linear form
Return type:: dict

mark()[source]¶

Make all atoms maked (tag False)

Return type:: RegExp

rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

snf()[source]¶: Star Normal Form

starHeight()[source]¶

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Returns:: number of nested star
Return type:: int

tailForm()[source]¶

Returns:: tail form
Return type:: dict

unmark()[source]¶

Conversion back to RegExp

Return type:: reex.__class__

class CSigmaP(sigma=None)[source]¶

Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;: associativity of concatenation; identities sigma^* and sigma^+.

CSigmaP: Class that represents the complement of the EmptySet word (sigma^+)

Constructor of a regular expression symbol.

Parameters:: val – the actual symbol

derivative(sigma)[source]¶

Parameters:: sigma –
Returns:

ewp()[source]¶

Returns:

linearForm()[source]¶

Returns:

linearFormC()[source]¶

Returns:

nfaPD(pdmethod='nfaPDNaive')[source]¶: Computes the partial derivative automaton

partialDerivatives(sigma)[source]¶

Parameters:: sigma –
Returns:

partialDerivativesC(_)[source]¶

Parameters:: _ –
Returns:

rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

support(side=True)[source]¶

Returns:

class CSigmaS(sigma=None)[source]¶

Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;: associativity of concatenation; identities sigma^* and sigma^+.

CSigmaS: Class that represents the complement of the EmptySet set (sigma^*)

Constructor of a regular expression symbol.

Parameters:: val – the actual symbol

derivative(sigma)[source]¶

Parameters:: sigma –
Returns:

ewp()[source]¶

Returns:

linearForm()[source]¶

Returns:

linearFormC()[source]¶

Returns:

nfaPD(pdmethod='nfaPDNaive')[source]¶: Computes the partial derivative automaton

partialDerivatives(sigma)[source]¶

Parameters:: sigma –
Returns:

partialDerivativesC(sigma)[source]¶

Parameters:: sigma –
Returns:

rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

support(side=True)[source]¶

Returns:

class CStar(arg, sigma=None)[source]¶

Class for iteration operation (aka Kleene star, or Kleene closure) on regular expressions.

epsilonLength()[source]¶

Number of occurrences of the empty word in the regular expression.

Returns:: number of epsilons
Return type:: int

ewp()[source]¶

Whether the empty word property holds for this regular expression’s language.

Return type:: bool

first(parent_first=None)[source]¶

First set

Returns:: first position set
Return type:: set

first_l()[source]¶: First sets for locations

followLists(lists=None)[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

followListsD(lists=None)[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

follow_l()[source]¶: Follow sets for locations

last(parent_first=None)[source]¶

Last set

Returns:: last position set
Return type:: set

last_l()[source]¶: Last sets for locations

linearForm()[source]¶

Returns:: linear form
Return type:: dict

nfaThompson()[source]¶

Returns a NFA that accepts the RE.

Return type:: NFA

digraph foo {
"0" -> "1" [label=e];
"0" -> "a" [label=e];
"a" -> "b" [label=A];
"b" -> "1" [label=e];
"1" -> "0" [label=e];
}

rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

snf(_hollowdot=False)[source]¶: Star Normal Form

starHeight()[source]¶

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Returns:: number of nested star
Return type:: int

support(side=True)[source]¶: Set of partial derivatives

tailForm()[source]¶

Returns:: tail form
Return type:: dict

class Compl(arg, sigma=None)[source]¶

Class for not operation on regular expressions.

epsilonLength()[source]¶

Number of occurrences of the empty word in the regular expression.

Returns:: number of epsilons
Return type:: int

ewp()[source]¶

Whether the empty word property holds for this regular expression’s language.

Return type:: bool

first(_)[source]¶

First set

Returns:: first position set
Return type:: set

followLists()[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

followListsD()[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

last()[source]¶

Last set

Returns:: last position set
Return type:: set

linearForm()[source]¶

Returns:: linear form
Return type:: dict

mark()[source]¶

Make all atoms maked (tag False)

Return type:: RegExp

rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

snf()[source]¶: Star Normal Form

starHeight()[source]¶

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Returns:: number of nested star
Return type:: int

support(side=True)[source]¶: Set of partial derivatives

tailForm()[source]¶

Returns:: tail form
Return type:: dict

treeLength()[source]¶

Number of nodes of the regular expression’s syntactical tree.

Returns:: tree lenght
Return type:: int

unmark()[source]¶

Conversion back to RegExp

Return type:: reex.__class__

class Connective(arg1, arg2, sigma=None)[source]¶

Base class for (binary) operations: concatenation, disjunction, etc

alphabeticLength()[source]¶: Number of occurrences of alphabet symbols in the regular expression. :returns: alphapetic length :rtype: int

Attention

Doesn’t include the empty word.

epsilonLength()[source]¶

Number of occurrences of the empty word in the regular expression.

Returns:: number of epsilons
Return type:: int

first(parent_first=None)[source]¶

First set

Returns:: first position set
Return type:: set

followLists(lists=None)[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

followListsD(lists=None)[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

last(parent_last=None)[source]¶

Last set

Returns:: last position set
Return type:: set

abstract linearForm()[source]¶

Returns:: linear form
Return type:: dict

abstract mark()[source]¶

Make all atoms maked (tag False)

Return type:: RegExp

abstract rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

setOfSymbols()[source]¶

Returns:: set of symbols
Return type:: set

abstract snf()[source]¶: Star Normal Form

starHeight()[source]¶

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Returns:: number of nested star
Return type:: int

treeLength()[source]¶

Number of nodes of the regular expression’s syntactical tree.

Returns:: tree lenght
Return type:: int

class DAG(reg)[source]¶

Class to support dags representing regexps

…seealso: P. Flajolet, P. Sipala, J.-M. Steyaert, Analytic variations on the common subexpression problem,: in: Automata, Languages and Programmin, LNCS, vol. 443, Springer, New York, 1990, pp. 220–234.

Args: reg (RegExp): regular expression

NFA()[source]¶

Returns:: the partial derivative automaton
Return type:: NFA

catLF(idl, idr, delay=False)[source]¶

Linear form for concatenation :param idl: node :type idl: int :param idr: node :type idr: int :param delay: if true partial derivatives are delayed :type delay: bool

Returns:: partial derivatives
Return type:: dict

..note:: both arguments are assumed to be already present in the DAG

getAtomIdx(val)[source]¶

Node atom :param val: letter :type val: str

Returns:: node id
Return type:: int

getIdx(reg)[source]¶

Builds dag nodes :param reg: regular expression :type reg: regexp

Returns:: node id
Return type:: int

interLF(diff1, diff2)[source]¶

Intersection of partial derivatives

Parameters:

diff1 (dict) – partial diff of the first argument
diff2 (dict) – partial diff of the second argument

Return type:

dict

static plusLF(diff1, diff2)[source]¶

Union of partial derivatives

Parameters:

diff1 (dict) – partial diff of the first argument
diff2 (dict) – partial diff of the second argument

Return type:

dict

shuffleLF(id1, id2)[source]¶

Shuffle of partial derivatives :param id1: node :type id1: int :param id2: node :type id2: int

Returns:: partial derivatives
Return type:: dict

class DAG_I(reg)[source]¶

Class to support dags representing regexps that inherit from DAG Partial derivatives are buid incrementally

Args: reg (RegExp): regular expression

cat_one(idl, idr, c)[source]¶

Partial derivative by one symbol for concatenation :param idl: node :type idl: int :param idr: node :type idr: int :param c: symbol :type c: char

Returns:: partial derivatives
Return type:: set

evalWordP(w)[source]¶

Parameters:: w (str) – a word
Returns:: True if w in L(reg)
Return type:: bool

one_derivative(id, c)[source]¶

Parameters:

c (str) – a symbol
id (int) – a node (representing a regexp)

shuffle_one(id1, id2, c)[source]¶

Shuffle of partial derivatives :param id1: node :type id1: int :param id2: node :type id2: int :param c: symbol :type c: char

Returns:: of partial derivatives
Return type:: set

class MAtom(val, mark, sigma=None)[source]¶

Base class for pointed (marked) regular expressions

Used directly to represent atoms (characters). This class is used to obtain Yamada or Asperti automata. There is no evident use for it, outside this module.

Parameters:

val – symbol
sigma – alphabet

unmark()[source]¶

Conversion back to RegExp

Return type:: reex.RegExp

class Position(val, sigma=None)[source]¶

Class for marked regular expression symbols.

Constructor of a regular expression symbol.

Parameters:: val – the actual symbol

setOfSymbols()[source]¶

Set of symbols that occur in a regular expression..

Returns:: set of symbols
Return type:: set

unmarked()[source]¶

The unmarked form of the regular expression. Each leaf in its syntactical tree becomes a RegExp(), the CEpsilon() or the CEmptySet().

Returns:: (general) regular expression
Return type:: RegExp

class Power(arg, n=1, sigma=None)[source]¶

Class for Power operation on regular expressions.

alphabeticLength()[source]¶: Number of occurrences of alphabet symbols in the regular expression. :returns: alphapetic length :rtype: int

Attention

Doesn’t include the empty word.

epsilonLength()[source]¶

Number of occurrences of the empty word in the regular expression.

Returns:: number of epsilons
Return type:: int

first()[source]¶

First set

Returns:: first position set
Return type:: set

followLists()[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

followListsD()[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

last()[source]¶

Last set

Returns:: last position set
Return type:: set

linearForm()[source]¶

Returns:: linear form
Return type:: dict

mark()[source]¶

Make all atoms maked (tag False)

Return type:: RegExp

reversal()[source]¶

Reversal of RegExp

Return type:: reex.RegExp

rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

setOfSymbols()[source]¶

Returns:: set of symbols
Return type:: set

snf()[source]¶: Star Normal Form

starHeight()[source]¶

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Returns:: number of nested star
Return type:: int

support(side=True)[source]¶: Set of partial derivatives

tailForm()[source]¶

Returns:: tail form
Return type:: dict

treeLength()[source]¶

Number of nodes of the regular expression’s syntactical tree.

Returns:: tree lenght
Return type:: int

class RegExp(sigma=None)[source]¶

Base class for regular expressions.

Variables:: Sigma – alphabet set of strings

abstract static alphabeticLength()[source]¶: Number of occurrences of alphabet symbols in the regular expression. :returns: alphapetic length :rtype: int

Attention

Doesn’t include the empty word.

compare(r, cmp_method='compareMinimalDFA', nfa_method='nfaPD')[source]¶

Compare with another regular expression for equivalence.

Parameters:

r (RegExp) –
cmp_method (str) –
nfa_method (str) – NFA construction

Returns:

True if the expressions are equivalent

Return type:

bool

compareMinimalDFA(r, nfa_method='nfaPosition')[source]¶

Compare with another regular expression for equivalence through minimal DFAs.

Parameters:

r (RegExp) –
nfa_method (str) – NTFA construction

Returns:

True if equivalent

Return type:

bool

dfaAuPoint()[source]¶

DFA “au-point” acconding to Nipkow

Returns:: “au-point” DFA
Return type:: DFA

See also

Andrea Asperti, Claudio Sacerdoti Coen and Enrico Tassi, Regular Expressions, au point. arXiv 2010

See also

Tobias Nipkow and Dmitriy Traytel, Unified Decision Procedures for Regular Expression Equivalence

dfaBrzozowski(memo=None)[source]¶

Word derivatives automaton of the regular expression

Parameters:: memo – if True memorizes the states already computed
Returns:: word derivatives automaton
Return type:: DFA

See also

1. Brzozowski, Derivatives of Regular Expressions. J. ACM 11(4): 481-494 (1964)

dfaNaiveFollow()[source]¶

DFA that accepts the regular expression’s language, and is obtained from the follow automaton.

Returns:: DFA follow
Return type:: NFA

Note

Included for testing purposes.

See also

Ilie & Yu (Follow Automata, 2003)

dfaYMG()[source]¶

DFA Yamada-McNaugthon-Gluskov acconding to Nipkow

Returns:: Y-M-G DFA
Return type:: DFA

See also

Tobias Nipkow and Dmitriy Traytel, Unified Decision Procedures for Regular Expression Equivalence

static emptysetP()[source]¶

Whether the regular expression is the empty set.

Return type:: bool

abstract static epsilonLength()[source]¶

Number of occurrences of the empty word in the regular expression.

Returns:: number of epsilons
Return type:: int

static epsilonP()[source]¶

Whether the regular expression is the empty word.

Return type:: bool

equivP(other, strict=True)[source]¶

Test RE equivalence with extended Hopcroft-Karp method

Parameters:

other (RegExp) – RE
strict (bool) – if True checks for same alphabets

Return type:

bool

equivalentP(other)[source]¶

Tests equivalence

Parameters:: other (RegExp) – other regexp
Returns:: True if regexps are equivalent
Return type:: bool

evalWordP(word)[source]¶

Verifies if a word is a member of the language represented by the regular expression.

Parameters:: word (str) – the word
Returns:: True if word belongs to the language
Return type:: bool

static ewp()[source]¶

Whether the empty word property holds for this regular expression’s language.

Return type:: bool

abstract first()[source]¶

First set

Returns:: first position set
Return type:: set

abstract followLists()[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

abstract followListsD()[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

abstract last()[source]¶

Last set

Returns:: last position set
Return type:: set

abstract linearForm()[source]¶

Returns:: linear form
Return type:: dict

abstract mark()[source]¶

Make all atoms maked (tag False)

Return type:: RegExp

marked()[source]¶

Regular expression in which every alphabetic symbol is marked with its Position.

The kind of regular expression returned is known, depending on the literary source, as marked, linear or restricted regular expression.

Returns:: linear regular expression
Return type:: RegExp

See also

r. McNaughton and H. Yamada, Regular Expressions and State Graphs for Automata, IEEE Transactions on Electronic Computers, V.9 pp:39-47, 1960

..attention: mark and unmark do not preserve the alphabet, neither set the new alphabet

nfaFollow()[source]¶

NFA that accepts the regular expression’s language, whose structure, equiand construction.

Returns:: NFA follow
Return type:: NFA

See also

Ilie & Yu (Follow Automata, 03)

nfaFollowEpsilon(trim=True)[source]¶

Epsilon-NFA constructed with Ilie and Yu’s method () that accepts the regular expression’s language.

Parameters:: trim (bool) – if True automaton is trimmed at the end
Returns:: possibly with epsilon transitions
Return type:: NFAe

Note

The regular expression must be reduced

See also

Ilie & Yu, Follow automta, Inf. Comp. ,v. 186 (1),140-162,2003

nfaGlushkov()[source]¶

Position or Glushkov automaton of the regular expression. Recursive method.

Returns:: NFA position
Return type:: NFA

nfaLoc()[source]¶

Location automaton of the regular expression.

Returns:: location nfa
Return type:: NFA

nfaNaiveFollow()[source]¶

NFA that accepts the regular expression’s language, and is equal in structure to the follow automaton.

Returns:: NFA follow
Return type:: NFA

Note

Included for testing purposes.

See also

Ilie & Yu (Follow Automata, 2003)

nfaPD(pdmethod='nfaPDDAG')[source]¶

Computes the partial derivative automaton

Parameters:: pdmethod (str) – an implementation of the PD automaton. Default value : nfaPDDAG
Returns:: a PD nfa
Return type:: NFA

Attention

for sre classes, CConj and CShuffle use nfaPDNaive directly

nfaPDDAG()[source]¶

Partial derivative automaton using a DAG for the re and partial derivatives

Returns:: a PD nfa build using a DAG
Return type:: NFA

..seealso:: s.Konstantinidis, A. Machiavelo, N. Moreira, and r. Reis.: Partial derivative automaton by compressing regular expressions. DCFS 2021, volume 13037 of LNCS, pages 100–112. Springer, 2022

nfaPDNaive()[source]¶

NFA that accepts the regular expression’s language,: and which is constructed from the expression’s partial derivatives.

Returns:: partial derivatives [or equation] automaton
Return type:: NFA

See also

V. M. Antimirov, Partial Derivatives of Regular Expressions and Finite Automaton Constructions .Theor. Comput. Sci.155(2): 291-319 (1996)

nfaPDO()[source]¶

NFA that accepts the regular expression’s language, and which is constructed from the expression’s partial: derivatives.

Returns:: partial derivatives [or equation] automaton
Return type:: NFA

Note

optimized version

nfaPSNF()[source]¶

Position or Glushkov automaton of the regular expression constructed from the expression’s star normal form.

Returns:: Position automaton
Return type:: NFA

nfaPosition(lstar=True)[source]¶

Position automaton of the regular expression.

Parameters:: lstar (bool) – if not None followlists are computed as disjunct
Returns:: Position NFA
Return type:: NFA

nfaPre()[source]¶

Prefix NFA of a regular expression

Returns:: prefix automaton
Return type:: NFA

See also

Maia et al, Prefix and Right-partial derivative automata, 11th CIE 2015, 258-267 LNCS 9136, 2015

notEmptyW()[source]¶

Witness of non emptyness

Return type:: word or None

abstract rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

abstract static setOfSymbols()[source]¶

Returns:: set of symbols
Return type:: set

setSigma(symbolset=None, strict=False)[source]¶

Set the alphabet for a regular expression and all its nodes

Parameters:

symbolset (set or list) – accepted symbols. If None, alphabet is unset.
strict (bool) – if True checks if setOfSymbols is included in symbolSet

..attention: Normally this attribute is not defined in a RegExp()

abstract snf()[source]¶: Star Normal Form

abstract static starHeight()[source]¶

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Returns:: number of nested star
Return type:: int

abstract support(side=True)[source]¶: Set of partial derivatives

abstract tailForm()[source]¶

Returns:: tail form
Return type:: dict

toDFA()[source]¶: DFA that accepts the regular expression’s language

toNFA(nfa_method='nfaPD')[source]¶: NFA that accepts the regular expression’s language. :param nfa_method:

abstract static treeLength()[source]¶

Number of nodes of the regular expression’s syntactical tree.

Returns:: tree lenght
Return type:: int

unionSigma(other)[source]¶

Returns the union of two alphabets

Return type:: set

wordDerivative(word)[source]¶

Derivative of the regular expression in relation to the given word,: which is represented by a list of symbols.

Parameters:: word (list) – list of arbitrary symbols.
Returns:: regular expression
Return type:: RegExp

See also

1. Brzozowski, Derivatives of Regular Expressions. J. ACM 11(4): 481-494 (1964)

class RegularExpression[source]¶: Abstract base class for all regular expression objects

class SConcat(arg, sigma=None)[source]¶

Class that represents the concatenation operation.

derivative(sigma)[source]¶

Parameters:: sigma –
Returns:

ewp()[source]¶

Returns:

head()[source]¶

Returns:

head_rev()[source]¶

Returns:

linearForm()[source]¶

Returns:

linearFormC()[source]¶

Returns:

partialDerivatives(sigma)[source]¶

Parameters:: sigma –
Returns:

partialDerivativesC(sigma)[source]¶

Parameters:: sigma –
Returns:

support(side=True)[source]¶

Returns:

tail()[source]¶

Returns:

tailForm()[source]¶

Returns:: tail form
Return type:: dict

tail_rev()[source]¶

Returns:

class SConj(arg, sigma=None)[source]¶

Class that represents the conjunction operation.

derivative(sigma)[source]¶

Parameters:: sigma –
Returns:

ewp()[source]¶

Returns:

linearForm()[source]¶

Returns:

partialDerivatives(sigma)[source]¶

Parameters:: sigma –
Returns:

partialDerivativesC(sigma)[source]¶

Parameters:: sigma –
Returns:

support(side=True)[source]¶

Returns:

tailForm()[source]¶

Returns:: tail form
Return type:: dict

class SConnective(arg, sigma=None)[source]¶

Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;

associativity of concatenation; identities sigma^* and sigma^+. Connectives are:

SDisj: disjunction SConj: intersection SConcat: concatenation

For parsing use str2sre

alphabeticLength()[source]¶

Returns:

epsilonLength()[source]¶

Returns:

first()[source]¶

First set

Returns:: first position set
Return type:: set

followLists()[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

followListsD()[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

last()[source]¶

Last set

Returns:: last position set
Return type:: set

linearForm()[source]¶

Returns:: linear form
Return type:: dict

mark()[source]¶

Make all atoms maked (tag False)

Return type:: RegExp

nfaPD(pdmethod='nfaPDNaive')[source]¶: Computes the partial derivative automaton

rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

setOfSymbols()[source]¶

Returns:

snf()[source]¶: Star Normal Form

starHeight()[source]¶

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Returns:: number of nested star
Return type:: int

abstract support(side=True)[source]¶: Set of partial derivatives

syntacticLength()[source]¶

Returns:

abstract tailForm()[source]¶

Returns:: tail form
Return type:: dict

treeLength()[source]¶

Returns:

class SDisj(arg, sigma=None)[source]¶

Class that represents the disjunction operation for special regular expressions.

static cross(ri, s, lists)[source]¶

Return type:: list

derivative(sigma)[source]¶

Parameters:: sigma –
Returns:

ewp()[source]¶

Returns:

first()[source]¶

Returns:

followLists(lists=None)[source]¶

Parameters:: lists –
Returns:

followListsStar(lists=None)[source]¶

Parameters:: lists –
Returns:

last()[source]¶

Returns:

linearForm()[source]¶

Returns:

linearFormC()[source]¶

Returns:

partialDerivatives(sigma)[source]¶

Parameters:: sigma –
Returns:

partialDerivativesC(sigma)[source]¶

Parameters:: sigma –
Returns:

support(side=True)[source]¶

Returns:

tailForm()[source]¶

Returns:: tail form
Return type:: dict

class SNot(arg, sigma=None)[source]¶

Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;: associativity of concatenation; identities sigma^* and sigma^+. SNot: negation

alphabeticLength()[source]¶

Returns:

derivative(sigma)[source]¶: :param sigma :return:

epsilonLength()[source]¶

Returns:

ewp()[source]¶

Returns:

first()[source]¶

First set

Returns:: first position set
Return type:: set

followLists()[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

followListsD()[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

last()[source]¶

Last set

Returns:: last position set
Return type:: set

linearForm()[source]¶

Returns:

linearFormC()[source]¶

Returns:

mark()[source]¶

Make all atoms maked (tag False)

Return type:: RegExp

nfaPD(pdmethod='nfaPDNaive')[source]¶: Computes the partial derivative automaton

partialDerivatives(sigma)[source]¶

Parameters:: sigma –
Returns:

partialDerivativesC(sigma)[source]¶

Parameters:: sigma –
Returns:

rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

setOfSymbols()[source]¶

Returns:

snf()[source]¶: Star Normal Form

starHeight()[source]¶

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Returns:: number of nested star
Return type:: int

support(side=True)[source]¶

Returns:

syntacticLength()[source]¶

Returns:

tailForm()[source]¶

Returns:: tail form
Return type:: dict

treeLength()[source]¶

Returns:

class SStar(arg, sigma=None)[source]¶

Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;: associativity of concatenation; identities sigma^* and sigma^+.

SStar: Class that represents Kleene star

derivative(sigma)[source]¶

Parameters:: sigma –
Returns:

linearForm()[source]¶

Returns:

nfaPD(pdmethod='nfaPDNaive')[source]¶: Computes the partial derivative automaton

partialDerivatives(sigma)[source]¶

Parameters:: sigma –
Returns:

partialDerivativesC(sigma)[source]¶

Parameters:: sigma –
Returns:

support(side=True)[source]¶

Returns:

class SpecialConstant(sigma=None)[source]¶

Base class for Epsilon and EmptySet

Parameters:: sigma – alphabet

static alphabeticLength()[source]¶

Returns:

derivative(sigma)[source]¶

Parameters:: sigma –
Returns:

distDerivative(sigma)[source]¶

Parameters:: sigma – an arbitrary symbol.
Return type:: regular expression

static epsilonLength()[source]¶

Number of occurrences of the empty word in the regular expression.

Returns:: number of epsilons
Return type:: int

static first(parent_first=None)[source]¶

Parameters:: parent_first –
Returns:

followLists(lists=None)[source]¶

Parameters:: lists –
Returns:

followListsD(lists=None)[source]¶

Parameters:: lists –
Returns:

static followListsStar(lists=None)[source]¶

Parameters:: lists –
Returns:

last(parent_last=None)[source]¶

Parameters:: parent_last –
Returns:

linearForm()[source]¶

Returns:

mark()[source]¶

Make all atoms maked (tag False)

Return type:: RegExp

partialDerivativesC(sigma)[source]¶

Parameters:: sigma –
Returns:

reversal()[source]¶

Reversal of RegExp

Return type:: reex.RegExp

abstract rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

static setOfSymbols()[source]¶

Returns:

snf()[source]¶: Star Normal Form

static starHeight()[source]¶

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Returns:: number of nested star
Return type:: int

support(side=True)[source]¶

Returns:

supportlast(side=True)[source]¶

Returns:

tailForm()[source]¶

Returns:

static treeLength()[source]¶

Number of nodes of the regular expression’s syntactical tree.

Returns:: tree lenght
Return type:: int

unmark()[source]¶: Conversion back to unmarked atoms :rtype: SpecialConstant

unmarked()[source]¶

The unmarked form of the regular expression. Each leaf in its syntactical tree becomes a RegExp(), the CEpsilon() or the CEmptySet().

Return type:: (general) regular expression

wordDerivative(word)[source]¶

Parameters:: word –
Returns:

class Unary(arg, sigma=None)[source]¶

Base class for unary operations: star, option, not, unary shuffle, etc

alphabeticLength()[source]¶: Number of occurrences of alphabet symbols in the regular expression. :returns: alphapetic length :rtype: int

Attention

Doesn’t include the empty word.

abstract epsilonLength()[source]¶

Number of occurrences of the empty word in the regular expression.

Returns:: number of epsilons
Return type:: int

abstract ewp()[source]¶

Whether the empty word property holds for this regular expression’s language.

Return type:: bool

abstract first(parent_first=None)[source]¶

First set

Returns:: first position set
Return type:: set

followLists(lists=None)[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

followListsD(lists=None)[source]¶

Follow set

Returns:: for each key position a set of follow positions
Return type:: dict

abstract last(parent_last=None)[source]¶

Last set

Returns:: last position set
Return type:: set

abstract linearForm()[source]¶

Returns:: linear form
Return type:: dict

mark()[source]¶

Make all atoms maked (tag False)

Return type:: RegExp

reversal()[source]¶

Reversal of RegEx

Return type:: reex.RegExp

abstract rpn()[source]¶

RPN representation

Returns:: printable RPN representation
Return type:: str

setOfSymbols()[source]¶

Returns:: set of symbols
Return type:: set

snf()[source]¶: Star Normal Form

abstract starHeight()[source]¶

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Returns:: number of nested star
Return type:: int

treeLength()[source]¶

Number of nodes of the regular expression’s syntactical tree.

Returns:: tree lenght
Return type:: int

unmark()[source]¶

Conversion back to RegExp

Return type:: reex.__class__

equivalentP(first, second)[source]¶

Verifies if the two languages given by some representative (DFA, NFA or re) are equivalent

Parameters:

first – language
second – language

Return type:

bool

New in version 0.9.6.

powerset(iterable)[source]¶

Powerset of a set. :param iterable: the set :type iterable: list

Returns:: the powerset
Return type:: itertools.chain

rpn2regexp(s, sigma=None, strict=False)[source]¶

Reads a (simple) RegExp from a RPN representation

r ::=  .RR | +RR | *r | L | @
L ::=  [a-z] | [A-Z]

Parameters:

s (str) – RPN representation
strict (bool) – Boolean
sigma (set) – alphabet

Return type:

reex.RegExp

Note

This method uses python stack… thus depth limitations apply

str2regexp(s, parser=Lark(open('/Users/rvr/Work/FAdo/FAdo/regexp_grammar.lark'), parser='lalr', lexer='contextual', ...), sigma=None, strict=False)[source]¶

Reads a RegExp from string.

Parameters:

s (string) – the string representation of the regular expression
parser – a parser generator for regexps
sigma (list or set of symbols) – alphabet of the regular expression
strict (boolean) – if True tests if the symbols of the regular expression are included in sigma

Return type:

reex.RegExp

str2sre(s, parser=Lark(open('/Users/rvr/Work/FAdo/FAdo/regexp_grammar.lark'), parser='lalr', lexer='contextual', ...), sigma=None, strict=False)[source]¶

Reads a sre from string. Arguments as str2regexp.

Return type:: reex.sre

to_s(r)[source]¶

Returns a sre from FAdo regexp.

Parameters:: r (RegExp) – the FAdo representation regexp for a regular expression.
Return type:: RegExp

FAdo.reex¶

Table of Contents

Previous topic

Next topic

This Page