Unix Power Tools

32.12. Regular Expressions: Matching Words with \< and \>

Searching for a word isn't quite as simple as it first appears. The string the will match the word other. You can put spaces before and after the letters and use this regular expression: " the " (with a space on each side). However, this does not match the word at the beginning or the end of a line, and it does not match when a punctuation mark follows the word.
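The two naive patterns can be compared with a short sketch. The sample lines and the name sample_lines are made up for illustration; they are not from any real file.

```shell
# Hypothetical sample text: one line per case discussed above.
sample_lines() {
  printf '%s\n' \
    'no other way' \
    'I saw the cat' \
    'the cat sat' \
    'look at the' \
    'is it the, or not'
}

# Bare string: also matches the letters "the" inside "other".
bare_count=$(sample_lines | grep -c 'the')

# Space-padded: only finds the word with a space on both sides, so it
# misses it at the start of a line, at the end, and before a comma.
padded_count=$(sample_lines | grep -c ' the ')

echo "bare: $bare_count  padded: $padded_count"
```

The bare pattern selects all five lines, including the one that contains only other. The padded pattern selects just one line, even though four of the five contain the word.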

There is an easy solution -- at least in many versions of ed, ex, vi, and grep. The characters \< and \> are similar to the ^ and $ anchors in that they don't occupy a character position. They anchor the expression between them so that it matches only on a word boundary. The pattern to search for the words the and The would be: \<[tT]he\>.
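Here is a minimal sketch of that pattern in use, assuming the installed grep is one of the versions that understands \< and \> (GNU grep does). The sample lines and the name word_sample are hypothetical.

```shell
# Hypothetical sample text mixing real hits with near misses.
word_sample() {
  printf '%s\n' \
    'no other way' \
    'I saw the cat' \
    'the cat sat' \
    'look at the' \
    'is it the, or not' \
    'The end' \
    'Then we left' \
    'bathe daily'
}

# Lines that contain "the" or "The" as a whole word.
matches=$(word_sample | grep '\<[tT]he\>')
word_count=$(word_sample | grep -c '\<[tT]he\>')

printf '%s\n' "$matches"
echo "matching lines: $word_count"
```

The lines containing only other, Then, or bathe are skipped, while the word is found at the start of a line, at the end of a line, and before a comma.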

Let's define a "word boundary." The t or T must be either at the start of the line or preceded by any character other than a letter, digit, or underscore (_). Likewise, the e must be either at the end of the line or followed by a character other than a letter, digit, or underscore.
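The definition can be checked by spelling the boundary out by hand and comparing the result with the anchors. This is only a sketch: the explicit pattern uses grep -E and consumes the neighboring character, so it is an approximation that happens to select the same lines, not a drop-in replacement for \< and \>. The sample lines and the name boundary_sample are hypothetical, and support for \< and \> in the installed grep is assumed.

```shell
# Hypothetical sample: punctuation, underscore, and digit neighbors.
boundary_sample() {
  printf '%s\n' \
    'the' \
    '(The)' \
    'x=the;' \
    'the_end' \
    'the2' \
    'breathe'
}

# Word anchors, as described in the text.
anchor_count=$(boundary_sample | grep -c '\<[tT]he\>')

# The same rule written out: start of line or a non-word character
# before, end of line or a non-word character after.
explicit_count=$(boundary_sample |
  grep -E -c '(^|[^[:alnum:]_])[tT]he([^[:alnum:]_]|$)')

echo "anchors: $anchor_count  explicit: $explicit_count"
```

Both patterns select the first three lines only. The underscore in the_end and the digit in the2 count as word characters, so neither line contains the as a separate word.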

-- BB




Copyright © 2003 O'Reilly & Associates. All rights reserved.