Unix Power Tools

32.11. Regular Expressions: Matching a Specific Number of Sets with \{ and \}

You cannot specify a maximum number of sets with the * modifier. However, some programs (Section 32.20) recognize a special pattern you can use to specify the minimum and maximum number of repeats. This is done by putting those two numbers between \{ and \}.

Having convinced you that \{ isn't a plot to confuse you, an example is in order. The regular expression to match four, five, six, seven, or eight lowercase letters is:

[a-z]\{4,8\}

Any number from 0 to 255 can be used. The second number may be omitted (keep the comma), which removes the upper limit. If the comma and the second number are both omitted, the pattern must be repeated exactly the number of times specified by the first number.
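If you want to see the three forms side by side, grep is a handy test bench, since it accepts \{ and \} in its default (basic) regular expressions. What follows is only a sketch: the sample words and the helper name matching are invented for illustration. The patterns are anchored with ^ and $ so the count applies to the whole line; without the anchors, a line of nine letters would still match because it contains a run of four.

```shell
#!/bin/sh
# Hypothetical helper: print, on one line, the sample words that
# match the basic regular expression given in $1.
matching() {
    printf '%s\n' abc abcd abcdefgh abcdefghi |
        grep -e "$1" |
        paste -s -d ' ' -
}

# Minimum of 4 and maximum of 8 lowercase letters on the line.
range=$(matching '^[a-z]\{4,8\}$')

# Minimum of 4, no upper limit (second number omitted, comma kept).
open=$(matching '^[a-z]\{4,\}$')

# Exactly 4 (comma and second number both omitted).
exact=$(matching '^[a-z]\{4\}$')

printf '4 to 8:     %s\n' "$range"
printf '4 or more:  %s\n' "$open"
printf 'exactly 4:  %s\n' "$exact"
```

The three-letter word fails all three tests, and the nine-letter word passes only the open-ended one.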

WARNING: The backslashes deserve a special discussion. Normally a backslash turns off the special meaning for a character. For example, a literal period is matched by \. and a literal asterisk is matched by \*. However, if a backslash is placed before a <, >, {, }, (, or ) or before a digit, the backslash turns on a special meaning. This was done because these special functions were added late in the life of regular expressions. Changing the meaning of {, }, (, ), <, and > would have broken old expressions. (This is a horrible crime punishable by a year of hard labor writing COBOL programs.) Instead, adding a backslash added functionality without breaking old programs. Rather than complain about the change, view it as evolution.
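The two jobs of the backslash are easier to believe once you watch them happen. The sketch below uses grep's basic regular expressions; the helper names matches and report and the sample strings are made up for the demonstration. The last line uses \( \) and a backslashed digit, which remember a piece of text and then recall it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the line in $2 matches the
# basic regular expression in $1.
matches() {
    printf '%s\n' "$2" | grep -q -e "$1"
}

# Hypothetical helper: report the result for one pattern/line pair.
report() {
    if matches "$1" "$2"; then
        printf '%-12s matches         %s\n' "$1" "$2"
    else
        printf '%-12s does not match  %s\n' "$1" "$2"
    fi
}

# Backslash turns a special meaning OFF.
report 'a.b'   'axb'          # . means "any character"
report 'a\.b'  'axb'          # \. is a literal period
report 'a\.b'  'a.b'

# Backslash turns a special meaning ON.
report '^x\{3\}$'   'xxx'     # \{3\} is a repeat count
report '^x{3}$'     'xxx'     # bare braces are ordinary characters
report '^x{3}$'     'x{3}'
report '^\(ab\)\1$' 'abab'    # \( \) remember text, \1 recalls it
```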

You must remember that modifiers like * and \{1,5\} act as modifiers only if they follow a character set. If they were at the beginning of a pattern, they would not be modifiers. Table 32-3 is a list of examples and the exceptions.

Table 32-3. Regular expression pattern repetition examples

Regular expression    Matches
------------------    -------
*                     Any line with a *
\*                    Any line with a *
\\                    Any line with a \
^*                    Any line starting with a *
^A*                   Any line
^A\*                  Any line starting with an A*
^AA*                  Any line starting with one A
^AA*B                 Any line starting with one or more A's followed by a B
^A\{4,8\}B            Any line starting with four, five, six, seven, or eight A's followed by a B
^A\{4,\}B             Any line starting with four or more A's followed by a B
^A\{4\}B              Any line starting with an AAAAB
\{4,8\}               Any line with a {4,8}
A{4,8}                Any line with an A{4,8}
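You can check most of these rows yourself by counting how many lines of a small sample each pattern selects. This is a sketch, not a test suite: the six sample lines and the helper names sample and count are invented here. The \{4,8\} row is left out on purpose, because what a program does with a repeat count that follows nothing can vary from one program to the next.

```shell
#!/bin/sh
# Hypothetical sample: six lines chosen to exercise the table rows.
sample() {
    printf '%s\n' 'AB' 'AAAAB' 'AAAAAAAAAB' 'BAAAA' '*AB' 'xA{4,8}'
}

# Hypothetical helper: how many sample lines match the pattern in $1?
count() {
    sample | grep -c -e "$1"
}

# Print each pattern next to its count of matching lines.
for pattern in '*' '\*' '^*' '^A*' '^AA*B' \
               '^A\{4,8\}B' '^A\{4,\}B' '^A\{4\}B' 'A{4,8}'
do
    printf '%-14s %s\n' "$pattern" "$(count "$pattern")"
done
```

The line with nine A's is the interesting one: ^A\{4,\}B accepts it, but ^A\{4,8\}B does not, because the anchor forces the count to start at the first A and the ninth character is not a B.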

-- BB




Copyright © 2003 O'Reilly & Associates. All rights reserved.