Unix Power Tools

32.10. Regular Expressions: Repeating Character Sets with *

The third part of a regular expression is the modifier. It is used to specify how many times you expect to see the previous character set. The special character * (asterisk) matches zero or more copies. That is, the regular expression 0* matches zero or more zeros, while the expression [0-9]* matches zero or more digits.
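The two patterns above can be tried out in a short sketch. It uses Python's `re` module as a stand-in for the Unix tools this chapter discusses; that choice, and the helper name `matches_entirely`, are assumptions made for illustration only. For these simple patterns, the `*` modifier behaves the same way in both.

```python
import re

def matches_entirely(pattern, text):
    """Return True if the whole of text fits the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# 0* accepts zero or more zeros, so even the empty string fits.
print(matches_entirely(r"0*", ""))          # zero copies
print(matches_entirely(r"0*", "000"))       # three copies

# [0-9]* accepts zero or more digits, but nothing else.
print(matches_entirely(r"[0-9]*", "2003"))  # all digits
print(matches_entirely(r"[0-9]*", "20a3"))  # the letter breaks the run
```

The first three calls report a match and the last one does not, because `a` is not in the character set `[0-9]`.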

This explains why the pattern ^#* is useless (Section 32.4): it matches any number of #s at the beginning of the line, including zero. It therefore matches every line, because every line starts with zero or more #s.
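A small sketch makes the point concrete. It again assumes Python's `re` module in place of grep, and the sample lines are invented for the illustration. The pattern ^#* selects every line, while ^# (without the asterisk) selects only lines that really begin with a #.

```python
import re

# Made-up sample input: two comment lines, one ordinary line, one empty line.
lines = ["# a comment", "## another", "no comment here", ""]

# ^#* is satisfied by zero #s, so it finds a (possibly empty) match on every line.
matched_star = [line for line in lines if re.search(r"^#*", line)]

# ^# demands at least one # at the start of the line.
matched_hash = [line for line in lines if re.search(r"^#", line)]

print(len(matched_star) == len(lines))  # every line was selected
print(matched_hash)                     # only the lines starting with #
```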

At first glance, it might seem that starting the count at zero is stupid. Not so. Looking for an unknown number of characters is very important. Suppose you wanted to look for a digit at the beginning of a line, and there may or may not be spaces before the digit. Just use ^ * (a caret, a space, and an asterisk) to match zero or more spaces at the beginning of the line, and follow it with the character set for a digit: ^ *[0-9]. If you need to match one or more, just repeat the character set. That is, [0-9]* matches zero or more digits and [0-9][0-9]* matches one or more digits.
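Both idioms can be checked with a brief sketch, assuming Python's `re` module rather than grep or sed. The first pattern is written as a caret, a single space, an asterisk, and then [0-9], so it allows any number of leading spaces before the digit. The sample lines and variable names are invented for the example.

```python
import re

# Caret, space, asterisk, then a digit: optional leading spaces before a digit.
leading_digit = re.compile(r"^ *[0-9]")

# Repeating the character set turns "zero or more" into "one or more".
one_or_more_digits = re.compile(r"[0-9][0-9]*")

lines = ["42 apples", "   7 pears", "no digits first", "  x9"]
hits = [line for line in lines if leading_digit.search(line)]
print(hits)  # lines whose first non-space character is a digit

print(one_or_more_digits.fullmatch("") is None)             # empty string is rejected
print(one_or_more_digits.search("abc 2003 def").group())    # the run of digits found
```

Here `"  x9"` is not selected: after the leading spaces the next character is `x`, not a digit, so the pattern cannot match at the start of the line.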

-- BB




Copyright © 2003 O'Reilly & Associates. All rights reserved.