Interface to a regular expression interpreter
This library includes routines to determine whether a regular expression matches part or all of a string.
The routines can also return which parts of the string matched the expression or subexpressions of it. This library relies on Henry Spencer's C package and is only available on operating systems that support dynamic loading. The C code was obtained from the sources of FreeBSD-4.0 and is protected by copyright from Henry Spencer and from the Regents of the University of California (see the file library/regex/COPYRIGHT for further details).
Much of the description of regular expressions below is copied verbatim from Henry Spencer's manual page.
A regular expression is zero or more branches, separated by `|`. It matches anything that matches one of the branches.
A branch is zero or more pieces, concatenated. It matches a match for the first, followed by a match for the second, etc.
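The branch and concatenation rules above can be tried out with a small sketch. It assumes the POSIX `<regex.h>` interface (`regcomp`, `regexec`, `regfree`) with `REG_EXTENDED`, which is the standard C entry point to this family of regular expressions; whether the library described here behaves identically in every detail is an assumption. The helper name `re_matches` is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of the library described here):
 * returns 1 if `pattern` matches somewhere in `text`, 0 if it does not,
 * and -1 if the pattern fails to compile. */
static int re_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, `re_matches("cat|dog", "hotdog")` returns 1 because the second branch matches, `re_matches("cat|dog", "bird")` returns 0, and `re_matches("ab", "cab")` returns 1 because the pieces `a` and `b` match in sequence.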
A piece is an atom possibly followed by `*`, `+`, or `?`. An atom followed by `*` matches a sequence of 0 or more matches of the atom. An atom followed by `+` matches a sequence of 1 or more matches of the atom. An atom followed by `?` matches a match of the atom, or the null string.
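To see how the three repetition operators differ, the following sketch reports the length of the leftmost match. It assumes the POSIX `<regex.h>` interface with `REG_EXTENDED`; the helper name `match_len` is hypothetical and not part of the library described here.

```c
#include <regex.h>

/* Hypothetical helper: length of the leftmost match of `pattern` in `text`,
 * or -1 if there is no match or the pattern does not compile. */
static int match_len(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* Ask for one regmatch_t: the offsets of the whole match. */
    if (regexec(&re, text, 1, &m, 0) == 0)
        len = (int)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```

For example, `match_len("ab*", "abbbc")` is 4 and `match_len("ab*", "ac")` is 1 (zero repetitions of `b`), while `match_len("ab+", "ac")` is -1 because `+` requires at least one `b`. `match_len("ab?", "abbb")` is 2, and `match_len("b?", "ac")` is 0 because `?` also accepts the null string.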
An atom is a regular expression in parentheses (matching a match for the regular expression), a range (see below), `.` (matching any single character), `^` (matching the null string at the beginning of the input string), `$` (matching the null string at the end of the input string), a `\` followed by a single character (matching that character), or a single character with no other significance (matching that character).
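The atoms listed above can be illustrated by reporting where a match starts. This is a minimal sketch assuming the POSIX `<regex.h>` interface with `REG_EXTENDED`; the helper name `match_offset` is hypothetical. Note that in C source a literal backslash in a pattern must be written as `\\`.

```c
#include <regex.h>

/* Hypothetical helper: byte offset at which the leftmost match of
 * `pattern` begins in `text`, or -1 if there is no match or the
 * pattern does not compile. */
static int match_offset(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int off = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0)
        off = (int)m.rm_so;
    regfree(&re);
    return off;
}
```

Here `match_offset("^ab", "cab")` is -1 because `^` only matches at the beginning of the input, whereas `match_offset("ab$", "cab")` is 1. The pattern `"a.c"` matches `"xabc"` at offset 1, but the escaped form `"a\\.c"` matches only a literal dot, so it fails on `"abc"` and succeeds on `"a.c"` at offset 0. A parenthesized expression behaves as one atom: `match_offset("(ab)+c", "xababc")` is 1.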
A range is a sequence of characters enclosed in `[]`. It normally matches any single character from the sequence. If the sequence begins with `^`, it matches any single character not from the rest of the sequence. If two characters in the sequence are separated by `-`, this is shorthand for the full list of ASCII characters between them (e.g. `[0-9]` matches any decimal digit). To include a literal `]` in the sequence, make it the first character (following a possible `^`). To include a literal `-`, make it the first or last character.
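The range rules above, including the placement of a literal `]` or `-`, can be checked by counting how many characters of a string a range matches. This sketch assumes the POSIX `<regex.h>` interface with `REG_EXTENDED`; the helper name `count_matches` is hypothetical and is meant for patterns that consume at least one character per match.

```c
#include <regex.h>

/* Hypothetical helper: number of successive non-empty matches of
 * `pattern` in `text`, or -1 if the pattern does not compile. */
static int count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* Resume the search just past each match; stop on an empty match
     * so the loop always makes progress. */
    while (regexec(&re, text, 1, &m, 0) == 0 && m.rm_eo > m.rm_so) {
        count++;
        text += m.rm_eo;
    }
    regfree(&re);
    return count;
}
```

For the input `"a1b22"`, `count_matches("[0-9]", ...)` is 3 and `count_matches("[^0-9]", ...)` is 2. A leading `]` is literal, so `count_matches("[]x]", "a]x")` is 2, and a leading or trailing `-` is literal, so both `count_matches("[a-]", "a-b")` and `count_matches("[-a]", "a-b")` are 2.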
Functions:
1. static int matcher(struct re_guts *g, char *string, size_t nmatch, regmatch_t pmatch[], int eflags)
2. static char *dissect(struct match *m, char *start, char *stop, sopno startst, sopno stopst)
3. static char *backref(struct match *m, char *start, char *stop, sopno startst, sopno stopst, sopno lev)
4. static char *fast(struct match *m, char *start, char *stop, sopno startst, sopno stopst)
5. static char *slow(struct match *m, char *start, char *stop, sopno startst, sopno stopst)
6. static states step(struct re_guts *g, sopno start, sopno stop, states bef, int ch, states aft)
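The functions listed here are `static`, so they are internal to the package and cannot be called directly. The `nmatch`, `pmatch` and `eflags` parameters of `matcher` mirror those of the public POSIX function `regexec`, which is presumably the route by which callers reach it. The sketch below shows how those parameters return the parts of the string matched by subexpressions. It assumes the POSIX `<regex.h>` interface; the helper name `submatch` and the limit `MAX_GROUPS` are hypothetical.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the text matched by subexpression `n`
 * (0 = the whole match) into `out`. Returns 0 on success, -1 if there
 * is no match, the group did not participate, or `out` is too small. */
static int submatch(const char *pattern, const char *text, size_t n,
                    char *out, size_t outsz)
{
    enum { MAX_GROUPS = 10 };      /* illustrative limit on groups */
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    int rc = -1;

    if (n >= MAX_GROUPS || outsz == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* nmatch = MAX_GROUPS, pmatch = pm, eflags = 0 */
    if (regexec(&re, text, MAX_GROUPS, pm, 0) == 0 && pm[n].rm_so != -1) {
        size_t len = (size_t)(pm[n].rm_eo - pm[n].rm_so);
        if (len < outsz) {
            memcpy(out, text + pm[n].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

Given the pattern `"([a-z]+)=([0-9]+)"` and the text `"key=42;"`, group 0 yields `key=42`, group 1 yields `key`, and group 2 yields `42`. Asking for group 3 returns -1 because the pattern has only two parenthesized subexpressions.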