regular expression

from The Free On-line Dictionary of Computing (8 July 2008)
regular expression
RE

   1. <text, operating system> (regexp, RE) One of the {wild
   card} patterns used by {Perl} and other languages, following
   {Unix} utilities such as {grep}, {sed}, and {awk} and editors
   such as {vi} and {Emacs}.  Regular expressions use conventions
   similar to but more elaborate than those described under
   {glob}.  A regular expression is a sequence of characters with
   the following meanings:

   An ordinary character (not one of the special characters
   discussed below) matches that character.

   A backslash (\) followed by any special character matches the
   special character itself.  The special characters are:

   "." matches any character except NEWLINE; "RE*" (where
   the "*" is called the "{Kleene star}") matches zero
   or more occurrences of RE.  If there is any choice, the
   longest leftmost matching string is chosen, in most
   regexp {flavours}.
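
   As a rough illustration, the sketch below tries these rules in
   Python's "re" module, which is only one flavour among many.
   The sample strings and variable names are made up for the
   example.

```python
import re

# "." stands for any one character except a newline (Python's default).
dot_hit = re.fullmatch(r"a.c", "abc")
dot_miss = re.fullmatch(r"a.c", "a\nc")

# "*" repeats the preceding RE zero or more times.
star_zero = re.fullmatch(r"ab*c", "ac")
star_many = re.fullmatch(r"ab*c", "abbbc")

# The leftmost possible start wins, and "*" then takes as much as it can.
greedy = re.search(r"a.*b", "xxaabab--").group(0)

# A backslash removes the special meaning of ".".
literal_dot = re.search(r"\.", "file.txt").start()

print(greedy, literal_dot)
```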

   "^" at the beginning of an RE matches the start of a line and
   "$" at the end of an RE matches the end of a line.

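   A small sketch of the anchors in Python's "re" module.  It
   assumes the re.MULTILINE flag, which makes "^" and "$" apply
   to each line as they do in line-oriented tools such as grep;
   without it Python anchors "^" to the start of the whole
   string.  The sample text is invented.

```python
import re

text = "alpha\nbeta\nalphabet"

# With re.MULTILINE, "^" matches at the start of every line.
line_starts = re.findall(r"^alpha", text, flags=re.MULTILINE)

# With re.MULTILINE, "$" matches at the end of every line.
line_ends = re.findall(r"a$", text, flags=re.MULTILINE)

# Without the flag, "^" only matches at the very start of the string.
string_start_only = re.findall(r"^alpha", text)

print(line_starts, line_ends, string_start_only)
```
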
   [string] matches any one character in that string.  If the
   first character of the string is a "^" it matches any
   character except the remaining characters in the string (and
   also usually excluding NEWLINE).  "-" may be used to indicate
   a range of consecutive ASCII characters.
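
   The bracket forms can be tried as follows in Python's "re"
   module (a sketch with invented sample strings).  Note that
   Python is one of the flavours in which a negated set does
   match NEWLINE.

```python
import re

# [string] matches any one character from the set.
vowels = re.findall(r"[aeiou]", "regexp")

# A leading "^" negates the set.
non_digits = re.findall(r"[^0-9]", "a1b2")

# "-" gives a range; two ranges can share one set.
hex_prefix = re.search(r"[0-9a-f]*", "7fzz").group(0)

# Flavour difference: in Python a negated set also matches "\n".
negated_matches_newline = re.search(r"[^a]", "\n") is not None

print(vowels, non_digits, hex_prefix, negated_matches_newline)
```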

   \( RE \) matches whatever RE matches and \n, where n is a
   digit, matches whatever was matched by the RE between the nth
   \( and its corresponding \) earlier in the same RE.  Many
   flavours use ( RE ) instead of \( RE \).
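
   Python's "re" module is one of the flavours that writes
   groups as ( RE ); there, \( and \) match literal parentheses.
   The sketch below uses a back-reference to find a doubled
   word.  The sample sentence is invented.

```python
import re

# (\w+) captures a word; \1 must then repeat exactly what was captured.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")
whole_match = doubled.group(0)
captured = doubled.group(1)

# In this flavour, backslashed parentheses are ordinary characters.
literal_parens = re.fullmatch(r"\(a\)", "(a)") is not None

print(whole_match, captured, literal_parens)
```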

   The concatenation of REs is an RE that matches the
   concatenation of the strings matched by each RE.  RE1 | RE2
   matches whatever RE1 or RE2 matches.
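
   A brief sketch of concatenation and alternation in Python's
   "re" module.  The last line shows a flavour difference:
   Python tries alternatives from left to right and keeps the
   first that succeeds, rather than the longest.

```python
import re

# Concatenation: the RE "ab" followed by the RE "cd" matches "abcd".
concatenated = re.fullmatch(r"ab" + r"cd", "abcd") is not None

# Alternation: either side may match.
either = [bool(re.fullmatch(r"cat|dog", w)) for w in ("cat", "dog", "cow")]

# Grouping limits how far the "|" reaches.
grouped = re.fullmatch(r"gr(a|e)y", "grey") is not None

# Leftmost-first, not longest: "a" is tried before "ab" and succeeds.
first_alternative = re.search(r"a|ab", "ab").group(0)

print(concatenated, either, grouped, first_alternative)
```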

   \< matches the beginning of a word and \> matches the end of a
   word.  In many flavours of regexp, \> and \< are replaced by
   "\b", the special character for "word boundary".

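   Python's "re" module is one of the flavours that offers only
   "\b".  A sketch with an invented sample string:

```python
import re

text = "cat concat cats cat."

# "\b" on both sides: "cat" as a whole word only.
whole_words = re.findall(r"\bcat\b", text)

# "\b" on the left only: words that begin with "cat".
word_starts = re.findall(r"\bcat\w*", text)

print(whole_words, word_starts)
```
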
   RE\{m\} matches m occurrences of RE.  RE\{m,\} matches m or
   more occurrences of RE.  RE\{m,n\} matches between m and n
   occurrences, inclusive.
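
   In Python's "re" module the braces are written without
   backslashes, and backslashed braces are literal characters.
   A sketch of the three forms under that assumption:

```python
import re

# {m}: exactly m occurrences.
exactly_three = re.fullmatch(r"a{3}", "aaa") is not None

# {m,}: m or more occurrences.
too_few = re.fullmatch(r"a{2,}", "a") is not None
enough = re.fullmatch(r"a{2,}", "aaaa") is not None

# {m,n}: between m and n occurrences; the match takes as many as allowed.
bounded = re.search(r"a{2,3}", "aaaaa").group(0)

# Flavour difference: here \{ and \} match literal braces.
literal_braces = re.fullmatch(r"a\{3\}", "a{3}") is not None

print(exactly_three, too_few, enough, bounded, literal_braces)
```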

   The exact details of how regexp will work in a given
   application vary greatly from flavour to flavour.  A
   comprehensive survey of regexp flavours is found in Friedl
   1997 (see below).

   [Jeffrey E.F. Friedl, "Mastering Regular Expressions"
   (http://enterprise.ic.gc.ca/~jfriedl/regex/index.html),
   O'Reilly, 1997].

   2. Any description of a {pattern} composed from combinations
   of {symbols} and the three {operators}:

   Concatenation - pattern A concatenated with B matches a match
   for A followed by a match for B.

   Or - pattern A-or-B matches either a match for A or a match
   for B.

   Closure - zero or more matches for a pattern.
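
   The three operators are enough to build a small matcher.  The
   sketch below is one possible way to do it, not a standard
   algorithm from the references: the class names Sym, Cat, Or
   and Star and the functions "ends" and "matches" are invented
   for this example.  It tracks the set of positions at which a
   match could end.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Sym:            # a single symbol
    char: str


@dataclass(frozen=True)
class Cat:            # concatenation: left followed by right
    left: "Pattern"
    right: "Pattern"


@dataclass(frozen=True)
class Or:             # either left or right
    left: "Pattern"
    right: "Pattern"


@dataclass(frozen=True)
class Star:           # closure: zero or more matches for inner
    inner: "Pattern"


Pattern = Union[Sym, Cat, Or, Star]


def ends(p: Pattern, text: str, start: int) -> Set[int]:
    """Return every index where a match for p beginning at start can end."""
    if isinstance(p, Sym):
        return {start + 1} if text[start:start + 1] == p.char else set()
    if isinstance(p, Cat):
        return {e for mid in ends(p.left, text, start)
                for e in ends(p.right, text, mid)}
    if isinstance(p, Or):
        return ends(p.left, text, start) | ends(p.right, text, start)
    if isinstance(p, Star):
        # Zero matches end at start; extend until no new end position appears.
        seen = {start}
        frontier = {start}
        while frontier:
            reached = set()
            for s in frontier:
                reached |= ends(p.inner, text, s)
            frontier = reached - seen
            seen |= frontier
        return seen
    raise TypeError(f"not a pattern: {p!r}")


def matches(p: Pattern, text: str) -> bool:
    """True if p matches the whole of text."""
    return len(text) in ends(p, text, 0)


# (a|b)*c written with the three operators.
ab_star_c = Cat(Star(Or(Sym("a"), Sym("b"))), Sym("c"))
print(matches(ab_star_c, "abbac"), matches(ab_star_c, "abca"))
```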

   The earliest form of regular expressions (and the term itself)
   was invented by mathematician {Stephen Cole Kleene} in the
   mid-1950s, as a notation to easily manipulate "regular sets",
   formal descriptions of the behaviour of {finite state
   machines}, in {regular algebra}.

   [S.C. Kleene, "Representation of events in nerve nets and
   finite automata", 1956, Automata Studies. Princeton].

   [J.H. Conway, "Regular algebra and finite machines", 1971, Eds
   Chapman & Hall].

   [Sedgewick, "Algorithms in C", page 294].

   (2004-02-01)