Unix Power Tools

16.7. Find a a Doubled Word

One type of error that's hard to catch when proofreading is a doubled word. It's hard to miss the double "a" in the title of this article, but you might find yourself from time to time with a "the" on the end of one line and the beginning of another.

We've seen awk scripts to catch this, but nothing so simple as this shell function. Here are two versions; the second is for the System V version of tr (Section 21.11). Both rely on uniq (Section 21.20):

ww( ) { cat "$@" | tr -cs "a-z'" "\012" | uniq -d; }

ww( ) { cat "$@" | tr -cs "[a-z]'" "[\012*]" | uniq -d; }

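In the script ww.sh, cat sends the contents of the named files to tr, which turns every run of characters other than lowercase letters and apostrophes into a single newline, leaving one word per line. The result is passed to uniq -d, which prints one copy of any word that appears on two or more consecutive lines. Because line breaks are treated like any other separator, a word repeated across the end of one line and the start of the next is caught too.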
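To see the pipeline at work, here is a minimal sketch that defines the first version of ww (quoting its arguments as "$@") and runs it on a small sample. The sample text, the temporary file, and the wwi variant are assumptions made up for illustration; they are not part of the original script. The wwi variant folds uppercase to lowercase first, since the plain version treats capital letters as separators and so would not flag "The the".

```shell
# First version of ww; "$@" keeps file names with spaces intact
ww() { cat "$@" | tr -cs "a-z'" "\012" | uniq -d; }

# Hypothetical variant (not in the original): fold case before splitting,
# so that "The the" is reported as a doubled word
wwi() { cat "$@" | tr 'A-Z' 'a-z' | tr -cs "a-z'" "\012" | uniq -d; }

# Made-up sample: "the" ends one line and begins the next, "and" is doubled
sample=$(mktemp)
printf 'we walked to the\nthe store and and left\n' > "$sample"

ww "$sample"      # prints "the" and then "and", one per line
rm -f "$sample"

printf 'The the cat\n' | ww     # prints nothing: "T" is treated as a separator
printf 'The the cat\n' | wwi    # prints "the"
```

Note that uniq -d reports only which words are doubled, not where they occur; you would still search the file for each reported word to find the spot.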

--TOR and JP


Copyright © 2003 O'Reilly & Associates. All rights reserved.