From The Free On-line Dictionary of Computing (8 July 2008)
regular expression
RE
1. <text, operating system> (regexp, RE) One of the {wild
card} patterns used by {Perl} and other languages, following
{Unix} utilities such as {grep}, {sed}, and {awk} and editors
such as {vi} and {Emacs}. Regular expressions use conventions
similar to but more elaborate than those described under
{glob}. A regular expression is a sequence of characters with
the following meanings:
An ordinary character (not one of the special characters
discussed below) matches that character.
A backslash (\) followed by any special character matches the
special character itself. The special characters are:
"." matches any character except NEWLINE; "RE*" (where
the "*" is called the "{Kleene star}") matches zero
or more occurrences of RE. If there is any choice, the
longest leftmost matching string is chosen, in most
regexp {flavours}.
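A short illustration of ".", backslash escaping and the Kleene star, sketched with Python's standard `re` module. Python is used here only as one example of a Perl-style flavour; other tools differ in detail, and the sample strings are made up.

```python
import re

# "." matches any single character except a newline (Python's default).
dots = re.findall(r"c.t", "cat cot c\nt")
print(dots)                                   # ['cat', 'cot']

# A backslash makes a special character literal: r"a\.b" matches only "a.b".
literal = re.findall(r"a\.b", "a.b axb")
print(literal)                                # ['a.b']

# "*" matches zero or more of the preceding item. Python's star is greedy,
# so from the leftmost "a" it takes as many b's as it can.
m = re.search(r"ab*", "xxabbbyab")
print(m.group(), m.start())                   # abbb 2

# Zero occurrences also count as a match.
print(re.fullmatch(r"ab*", "a") is not None)  # True
```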
"^" at the beginning of an RE matches the start of a line and
"$" at the end of an RE matches the end of a line.
[string] matches any one character in that string. If the
first character of the string is "^", the expression instead
matches any character except the remaining characters in the
string (usually also excluding NEWLINE). "-" may be used to
indicate a range of consecutive ASCII characters.
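Anchors and bracket expressions can be tried out with Python's `re` module, again only as one example flavour. By default Python anchors "^" and "$" to the whole string; the `re.MULTILINE` flag makes them anchor at each line, which is closer to how line-oriented tools such as grep apply a pattern. The sample text is made up.

```python
import re

text = "first line\nsecond line"

# Without flags, "^" and "$" anchor to the start and end of the whole string.
whole_string = re.findall(r"^\w+", text)               # ['first']
# With re.MULTILINE they anchor at the start and end of every line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)  # ['first', 'second']
line_ends = re.findall(r"\w+$", text, re.MULTILINE)    # ['line', 'line']

# [aeiou] matches any one listed character; a leading "^" negates the set.
vowels = re.findall(r"[aeiou]", "regular")             # ['e', 'u', 'a']
others = re.findall(r"[^aeiou]", "regular")            # ['r', 'g', 'l', 'r']
# "-" gives a range of consecutive characters, here '0' through '9'.
numbers = re.findall(r"[0-9]+", "rooms 12 and 305")    # ['12', '305']

print(whole_string, line_starts, line_ends, vowels, others, numbers)
```

Unlike the usual behaviour described above, Python's negated set does match a newline: `re.findall(r"[^a]", "\n")` returns `['\n']`.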
\( RE \) matches whatever RE matches, and \n, where n is a
digit, matches whatever was matched by the RE between the nth
\( and its corresponding \) earlier in the same RE. Many
flavours use ( RE ) instead of \( RE \).
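Python's `re` module is one of the flavours that writes groups as ( RE ). A minimal sketch of a backreference, finding an accidentally doubled word, plus a group reference in a replacement string (similar in spirit to sed's s command); the sample strings are made up.

```python
import re

# \1 matches whatever group 1 actually matched earlier in the same RE.
m = re.search(r"\b(\w+) \1\b", "this is is a typo")
print(m.group(0), "|", m.group(1))    # is is | is

# Group references can also appear in the replacement string of re.sub.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Kleene, Stephen")
print(swapped)                        # Stephen Kleene
```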
The concatenation of REs is an RE that matches the
concatenation of the strings matched by each RE. RE1 | RE2
matches whatever RE1 or RE2 matches.
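Concatenation and alternation in Python's `re` module, used here only as an example flavour. The second part shows a real flavour difference: Python tries alternatives left to right and keeps the first that succeeds, so it does not always return the longest match, whereas a POSIX leftmost-longest matcher would return "ab" for the same input.

```python
import re

# "grep" is itself a concatenation of four one-character REs;
# "|" offers alternatives.
tools = re.findall(r"grep|sed|awk", "use grep, sed or awk")
print(tools)    # ['grep', 'sed', 'awk']

# Alternatives are tried left to right; "a" succeeds first, so "ab" is not chosen.
first = re.search(r"a|ab", "ab").group()
print(first)    # a
```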
\< matches the beginning of a word and \> matches the end of a
word. In many flavours of regexp, \> and \< are replaced by
"\b", the special character for "word boundary".
RE\{m\} matches exactly m occurrences of RE. RE\{m,\} matches
m or more occurrences of RE. RE\{m,n\} matches between m and n
occurrences.
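Word boundaries and intervals as Python's `re` module spells them, again as just one example flavour. Python's `re` does not treat \< and \> as word anchors; it uses "\b" for either edge of a word, and writes intervals as {m}, {m,} and {m,n} without backslashes. The sample strings are made up.

```python
import re

text = "cat concatenate cat."
# Use raw strings: in an ordinary Python string "\b" is a backspace character.
whole_words = re.findall(r"\bcat\b", text)   # only the two stand-alone "cat"s
substrings = re.findall(r"cat", text)        # also the "cat" inside "concatenate"

# Intervals: exactly m, m or more, between m and n occurrences.
year = re.fullmatch(r"[0-9]{4}", "2008")     # exactly 4 digits
many = re.fullmatch(r"a{2,}", "aaaa")        # 2 or more
between = [s for s in ["a", "aa", "aaa", "aaaa"] if re.fullmatch(r"a{2,3}", s)]

print(whole_words, substrings, year is not None, many is not None, between)
# ['cat', 'cat'] ['cat', 'cat', 'cat'] True True ['aa', 'aaa']
```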
The exact details of how regexps work in a given
application vary greatly from flavour to flavour. A
comprehensive survey of regexp flavours is found in Friedl
1997 (see below).
[Jeffrey E.F. Friedl, "Mastering Regular Expressions"
(http://enterprise.ic.gc.ca/~jfriedl/regex/index.html),
O'Reilly, 1997].
2. Any description of a {pattern} composed from combinations
of {symbols} and the three {operators}:
Concatenation - pattern A concatenated with B matches a match
for A followed by a match for B.
Or - pattern A-or-B matches either a match for A or a match
for B.
Closure - zero or more matches for a pattern.
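Sense 2 can be made concrete with a tiny matcher that understands only symbols and the three operators. This is a hypothetical sketch in Python: the class names Sym, Cat, Or and Star and the functions ends and matches are made up for illustration. Rather than backtracking, it tracks the set of every position a match could have reached, so Closure simply repeats the inner pattern until no new positions appear.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

# Hypothetical AST: a symbol plus the three operators.
@dataclass(frozen=True)
class Sym:
    ch: str

@dataclass(frozen=True)
class Cat:          # Concatenation
    a: "Regex"
    b: "Regex"

@dataclass(frozen=True)
class Or:           # Or
    a: "Regex"
    b: "Regex"

@dataclass(frozen=True)
class Star:         # Closure
    a: "Regex"

Regex = Union[Sym, Cat, Or, Star]

def ends(r: Regex, s: str, starts: FrozenSet[int]) -> FrozenSet[int]:
    """Return every position where a match of r can end, given start positions."""
    if isinstance(r, Sym):
        return frozenset(i + 1 for i in starts if i < len(s) and s[i] == r.ch)
    if isinstance(r, Cat):
        # A match for a followed by a match for b.
        return ends(r.b, s, ends(r.a, s, starts))
    if isinstance(r, Or):
        # Either a match for a or a match for b.
        return ends(r.a, s, starts) | ends(r.b, s, starts)
    if isinstance(r, Star):
        # Zero or more matches: keep extending until nothing new is reached.
        reached = frozenset(starts)
        frontier = reached
        while frontier:
            frontier = ends(r.a, s, frontier) - reached
            reached |= frontier
        return reached
    raise TypeError(f"not a regex node: {r!r}")

def matches(r: Regex, s: str) -> bool:
    """True if r matches the whole of s."""
    return len(s) in ends(r, s, frozenset({0}))

# (a|b)*abb, a common textbook example
pattern = Cat(Cat(Cat(Star(Or(Sym("a"), Sym("b"))), Sym("a")), Sym("b")), Sym("b"))
print(matches(pattern, "babb"), matches(pattern, "abab"))  # True False
```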
The earliest form of regular expressions (and the term itself)
was invented by mathematician {Stephen Cole Kleene} in the
mid-1950s as a notation in {regular algebra} for easily
manipulating "regular sets", formal descriptions of the
behaviour of {finite state machines}.
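To connect regular expressions with {finite state machines}, here is a hypothetical hand-built deterministic finite automaton for the regular expression (a|b)*abb, written as a Python transition table. The states and table were worked out by hand for this illustration and are not taken from the sources cited here.

```python
# Hypothetical DFA for (a|b)*abb over the alphabet {a, b}.
# State n: the longest suffix of the input so far that is also
# a prefix of "abb" has length n. State 3 means the input ends in "abb".
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
ACCEPTING = {3}

def dfa_accepts(s: str) -> bool:
    state = 0
    for ch in s:
        if ch not in DFA[state]:
            return False  # symbol outside the alphabet {a, b}
        state = DFA[state][ch]
    return state in ACCEPTING

print([w for w in ["abb", "babb", "abab", "aabb", ""] if dfa_accepts(w)])
# ['abb', 'babb', 'aabb']
```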
[S.C. Kleene, "Representation of events in nerve nets and
finite automata", Automata Studies, Princeton, 1956].
[J.H. Conway, "Regular algebra and finite machines",
Chapman & Hall, 1971].
[Sedgewick, "Algorithms in C", page 294].
(2004-02-01)