Interface to a regular expression interpreter

This library includes routines to determine whether a regular expression matches part or all of a string.

The routines can also return which parts of the string matched the expression or its subexpressions. This library relies on Henry Spencer's C package and is only available on operating systems that support dynamic loading. The C code was obtained from the sources of FreeBSD-4.0 and is protected by copyright from Henry Spencer and from the Regents of the University of California (see the file library/regex/COPYRIGHT for further details).
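The idea of reporting which parts of a string matched the expression and its subexpressions can be sketched in C. This is only an illustration under assumptions: it uses the standard POSIX `<regex.h>` calls (`regcomp`, `regexec`, `regfree`) with `REG_EXTENDED` rather than this library's own entry points, and the helper name `match_group` and the limit `MAX_GROUPS` are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 10 /* hypothetical limit chosen for this sketch */

/* Copies the text matched by subexpression `group` of `pattern` in `text`
 * into `out` (at most outsz - 1 bytes plus a terminator).
 * Group 0 is the whole match. Returns 1 on a match, 0 otherwise. */
int match_group(const char *pattern, const char *text, size_t group,
                char *out, size_t outsz)
{
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    int found = 0;

    assert(group < MAX_GROUPS && outsz > 0);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0; /* the pattern did not compile */

    /* Groups that did not take part in the match have rm_so == -1. */
    if (regexec(&re, text, MAX_GROUPS, pm, 0) == 0 && pm[group].rm_so != -1) {
        size_t len = (size_t)(pm[group].rm_eo - pm[group].rm_so);
        if (len >= outsz)
            len = outsz - 1; /* truncate to fit the caller's buffer */
        memcpy(out, text + pm[group].rm_so, len);
        out[len] = '\0';
        found = 1;
    }
    regfree(&re);
    return found;
}
```

Calling `match_group("([a-z]+)@([a-z]+)", "mail bob@example now", 1, buf, sizeof buf)` would fill `buf` with the text of the first parenthesized subexpression, while group 0 gives the whole matched part.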

Much of the description of regular expressions below is copied verbatim from Henry Spencer's manual page.

A regular expression is zero or more branches, separated by |. It matches anything that matches one of the branches.

A branch is zero or more pieces, concatenated. It matches a match for the first, followed by a match for the second, etc.

A piece is an atom possibly followed by \*, +, or ?. An atom followed by \* matches a sequence of 0 or more matches of the atom. An atom followed by + matches a sequence of 1 or more matches of the atom. An atom followed by ? matches a match of the atom, or the null string.
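The branch and piece rules above can be tried out with a small test helper. The sketch below assumes the POSIX `<regex.h>` interface compiled with `REG_EXTENDED`, whose operators correspond to the ones described here; the helper name `re_matches` is hypothetical and is not part of this library.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `pattern` matches somewhere in `text`, 0 otherwise
 * (including when the pattern fails to compile). */
int re_matches(const char *pattern, const char *text)
{
    regex_t re;
    int ok;

    assert(pattern != NULL && text != NULL);
    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With the pattern anchored at both ends, `re_matches("^ab*c$", "ac")` succeeds because `*` allows zero occurrences, whereas `re_matches("^ab+c$", "ac")` fails because `+` requires at least one. Likewise `^ab?c$` accepts `ac` and `abc` but not `abbc`, and `^(cat|dog)$` accepts either branch but nothing else.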

An atom is a regular expression in parentheses (matching a match for the regular expression), a range (see below), . (matching any single character), ^ (matching the null string at the beginning of the input string), $ (matching the null string at the end of the input string), a \ followed by a single character (matching that character), or a single character with no other significance (matching that character).
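To see how the individual atoms behave, it helps to look at where a match starts. The following is a minimal sketch, again assuming POSIX `<regex.h>` with `REG_EXTENDED` rather than this library's own routines; `match_offset` is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns the byte offset at which the first match of `pattern`
 * starts in `text`, or -1 if there is no match or the pattern
 * does not compile. */
long match_offset(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t pm[1];
    long offset = -1;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, pm, 0) == 0)
        offset = (long)pm[0].rm_so; /* pm[0] describes the whole match */
    regfree(&re);
    return offset;
}
```

For the input `abcd`, the pattern `b.d` matches at offset 1 since `.` stands for any single character, `^b` does not match because `b` is not at the beginning, and `d$` matches at offset 3. For the input `abc a.c`, the escaped pattern `a\.c` (written `"a\\.c"` in C source) matches only the literal dot at offset 4, while `a.c` already matches at offset 0.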

A range is a sequence of characters enclosed in []. It normally matches any single character from the sequence. If the sequence begins with ^, it matches any single character not from the rest of the sequence. If two characters in the sequence are separated by -, this is shorthand for the full list of ASCII characters between them (e.g. [0-9] matches any decimal digit). To include a literal ] in the sequence, make it the first character (following a possible ^). To include a literal -, make it the first or last character.
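The range rules, including the placement of a literal ] or -, can be checked one character at a time. This sketch assumes POSIX `<regex.h>` with `REG_EXTENDED` in the default "C" locale; the helper `char_in_range` and its 64-byte pattern buffer are hypothetical choices made for the illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if the single character `c` is matched by the bracketed
 * range `range` (for example "[0-9]"), 0 otherwise. */
int char_in_range(const char *range, char c)
{
    char pattern[64];
    char text[2];
    regex_t re;
    int n, ok;

    assert(range != NULL && c != '\0');
    text[0] = c;
    text[1] = '\0';

    /* Anchor the range so that it must match the whole one-character string. */
    n = snprintf(pattern, sizeof pattern, "^%s$", range);
    if (n < 0 || (size_t)n >= sizeof pattern)
        return 0; /* range too long for this sketch's buffer */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

For instance, `char_in_range("[0-9]", '7')` succeeds and `char_in_range("[^0-9]", '7')` fails. Placing ] first, as in `[]a]`, makes it a member of the range, and placing - first or last, as in `[a-]`, does the same for the hyphen.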

Define:

1. BOL:

1. EOL:

1. BOLEOL:

1. NOTHING:

1. BOW:

1. EOW:

1. CODEMAX:

1. NONCHAR:

1. NNONCHAR:

1. SP:

1. AT:

1. NOTE:

Functions:

1. static int matcher(struct re_guts *g, char *string, size_t nmatch, regmatch_t pmatch[], int eflags):

1. static char * dissect(struct match *m, char *start, char *stop, sopno startst, sopno stopst):

1. static char * backref(struct match *m, char *start, char *stop, sopno startst, sopno stopst, sopno lev):

1. static char * fast(struct match *m, char *start, char *stop, sopno startst, sopno stopst):

1. static char * slow(struct match *m, char *start, char *stop, sopno startst, sopno stopst):

1. static states step(struct re_guts *g, sopno start, sopno stop, states bef, int ch, states aft):

1. static int matcher(struct re_guts *g, char *string, size_t nmatch, regmatch_t pmatch[], int eflags):