Unix Power Tools

13.6. Approximate grep: agrep

agrep is one of the nicer additions to the grep family. It is not only one of the faster greps around; it also has the unique ability to look for approximate matches, and it is record oriented rather than line oriented. The three most significant features of agrep that the other members of the grep family do not support are as follows:

  1. The ability to search for approximate patterns, with a user-definable level of accuracy. For example:

    % agrep -2 homogenos foo

    will find "homogeneous," as well as any other word that can be obtained from "homogenos" with at most two substitutions, insertions, or deletions.

    With the -B option, agrep instead looks for the best match, that is, the matches with the fewest errors. For example:

    % agrep -B homogenos foo

    will generate a message of the form:

    best match has 2 errors, there are 5 matches, output them? (y/n)

  2. agrep is record oriented rather than just line oriented. A record is a line by default, but the -d option lets you specify a pattern to use as the record delimiter. For example:

    % agrep -d '^From ' 'pizza' mbox

    outputs all mail messages (Section 1.21) in the file mbox that contain the keyword pizza. Each message is delimited by a line beginning with From and a space. Another example:

    % agrep -d '$$' pattern foo

    will output all paragraphs (separated by an empty line) that contain pattern.

  3. agrep accepts multiple patterns combined with AND or OR logic. For example:

    % agrep -d '^From ' 'burger,pizza' mbox

    outputs all mail messages containing at least one of the two keywords (the comma stands for OR).

    % agrep -d '^From ' 'good;pizza' mbox

    outputs all mail messages containing both keywords (the semicolon stands for AND).
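
To make "at most two substitutions, insertions, or deletions" from item 1 concrete, here is a minimal Python sketch of the classic edit-distance calculation. It is only an illustration of the idea, not agrep's own algorithm, and the function names edit_distance and approx_words are made up for this example.

```python
def edit_distance(a, b):
    """Minimum number of single-character substitutions, insertions,
    or deletions needed to turn string a into string b."""
    # prev[j] = distance between the part of a seen so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approx_words(pattern, text, max_errors):
    """Return the whitespace-separated words of text that lie within
    max_errors edits of pattern (hypothetical helper for illustration)."""
    return [w for w in text.split() if edit_distance(pattern, w) <= max_errors]


print(edit_distance("homogenos", "homogeneous"))  # 2 (insert "e" and "u")
print(approx_words("homogenos",
                   "a homogeneous mixture of homogenous parts", 2))
```

Here "homogeneous" is two insertions away from the pattern and "homogenous" is one, so both fall within the limit of two errors.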

Putting these options together, one can write queries such as the following:

% agrep -d '$$' -2 '<CACM>;TheAuthor;Curriculum;<198[5-9]>' bib

which outputs all paragraphs referencing articles in CACM between 1985 and 1989 by TheAuthor dealing with Curriculum. Two errors are allowed, but they cannot be in either CACM or the year. (The < > brackets forbid errors in the pattern between them.)
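
The record splitting and AND/OR logic used in these queries can be sketched in a few lines of Python. This is a rough illustration under simplifying assumptions, not a description of how agrep works internally: it does exact substring matching only (no error allowance, no < > brackets), it handles only one operator per query, and the names split_records, record_matches, and search are invented for the example.

```python
import re

PARAGRAPHS = r"\n[ \t]*\n"  # records separated by blank lines


def split_records(text, delimiter=PARAGRAPHS):
    """Split text into records on a regular-expression delimiter."""
    return [rec for rec in re.split(delimiter, text) if rec.strip()]


def record_matches(record, query):
    """';' between keywords means AND, ',' means OR.
    This sketch assumes the two are never mixed in one query."""
    if ";" in query:
        return all(term in record for term in query.split(";"))
    return any(term in record for term in query.split(","))


def search(text, query, delimiter=PARAGRAPHS):
    """Return every record that satisfies the query."""
    return [rec for rec in split_records(text, delimiter)
            if record_matches(rec, query)]


bib = ("TheAuthor\nCACM 1987\nCurriculum design\n\n"
       "Someone Else\nJACM 1990\nCurriculum notes\n\n"
       "TheAuthor\nCACM 1986\nCompilers\n")

for paragraph in search(bib, "CACM;TheAuthor;Curriculum"):
    print(paragraph)
```

To treat mail messages as records while keeping the From line in each one, a zero-width delimiter such as r"(?m)^(?=From )" can be passed instead. Splitting on an empty match requires Python 3.7 or later.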

Other agrep features include searching for regular expressions (with or without errors); using unlimited wildcards; limiting the errors to only insertions, only substitutions, or any combination; weighting the error types so that each deletion, for example, counts as two substitutions or three insertions; restricting parts of the query to be exact and parts to be approximate; and many more.
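
The idea of counting one kind of error more heavily than another can be illustrated with a weighted variant of the edit-distance calculation. The Python sketch below is an assumption-laden illustration rather than agrep's implementation; the function weighted_distance and its parameters ins, dele, and sub are hypothetical names chosen for this example.

```python
def weighted_distance(pattern, word, ins=1, dele=1, sub=1):
    """Cheapest total cost of turning pattern into word, where
    ins  = cost of an extra character appearing in word,
    dele = cost of a pattern character missing from word,
    sub  = cost of one character replaced by another."""
    # prev[j] = cost of turning the pattern prefix seen so far into word[:j]
    prev = [j * ins for j in range(len(word) + 1)]
    for i, cp in enumerate(pattern, start=1):
        curr = [i * dele]  # every pattern character so far was deleted
        for j, cw in enumerate(word, start=1):
            curr.append(min(prev[j] + dele,       # drop cp
                            curr[j - 1] + ins,    # add cw
                            prev[j - 1] + (0 if cp == cw else sub)))
        prev = curr
    return prev[-1]


# With unit costs this is the ordinary edit distance.
print(weighted_distance("homogenos", "homogeneous"))         # 2
# Making insertions three times as expensive triples the cost here,
# because the longer word can only be reached by two insertions.
print(weighted_distance("homogenos", "homogeneous", ins=3))  # 6
```

With weights like these, a fixed error budget admits fewer matches of the expensive kind, which is the effect the cost options are meant to give.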

--JP, SW, and UM




Copyright © 2003 O'Reilly & Associates. All rights reserved.