Unix Power Tools

11.8. Comparing Two Files with comm

The comm command can tell you what information is common to two lists and what information appears uniquely in one list or the other. For example, let's say you're compiling information on the favorite movies of critics Ebert and Roeper. The movies are listed in two separate files, each of which must be sorted (Section 22.1). For the sake of illustration, assume each list is short:

% cat roeper
Citizen Kane
Halloween VI
Ninja III
Rambo II
Star Trek V
Zelig
% cat ebert
Cat People
Citizen Kane
My Life as a Dog
Q
Z
Zelig

To compare the favorite movies of your favorite critics, type:

% comm roeper ebert
                  Cat People
                                         Citizen Kane
Halloween VI
                  My Life as a Dog
Ninja III
                  Q
Rambo II
Star Trek V
                  Z
                                         Zelig

Column 1 shows the movies that only Roeper likes; column 2 shows those that only Ebert likes; and column 3 shows the movies that they both like. (comm indents column 2 with one tab and column 3 with two tabs.) You can suppress one or more columns of output by giving the numbers of those columns as a command-line option. For example, to suppress columns 1 and 2 (displaying only the movies both critics like), you would type:

% comm -12 roeper ebert
Citizen Kane
Zelig
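
The other columns can be selected the same way. The following is a minimal sketch, assuming the two lists above are recreated in a scratch directory; the variable names demo, only_roeper, and only_ebert are illustrative only. LC_ALL=C is set so that comm compares lines in plain byte order, which is the order the lists are already in.

```shell
# Recreate the two sorted lists from the text in a scratch directory.
# "demo" is an illustrative variable name, not something comm requires.
demo=$(mktemp -d)
printf '%s\n' 'Citizen Kane' 'Halloween VI' 'Ninja III' \
    'Rambo II' 'Star Trek V' 'Zelig' > "$demo/roeper"
printf '%s\n' 'Cat People' 'Citizen Kane' 'My Life as a Dog' \
    'Q' 'Z' 'Zelig' > "$demo/ebert"

# Suppress columns 2 and 3: movies that only Roeper likes.
only_roeper=$(LC_ALL=C comm -23 "$demo/roeper" "$demo/ebert")

# Suppress columns 1 and 3: movies that only Ebert likes.
only_ebert=$(LC_ALL=C comm -13 "$demo/roeper" "$demo/ebert")

printf 'Only Roeper:\n%s\n\nOnly Ebert:\n%s\n' "$only_roeper" "$only_ebert"
```

Because two of the three columns are suppressed in each command, the remaining column is printed without any leading tabs.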

As another example, say you've just received a new software release (Release 4), and it's your job to figure out which library functions have been added so that they can be documented along with the old ones. Let's assume you already have a list of the Release 3 functions (r3_list) and a list of the Release 4 functions (r4_list). (If you didn't, you could create them by changing to the directory that has the function manual pages, listing the files with ls, and saving each list to a file.) In the following lists, we've used letters of the alphabet to represent the functions:

% cat r3_list
b
c
d
f
g
h

% cat r4_list
a
b
c
d
e
f

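You can now use the comm command to answer several questions you might have: which functions are new in Release 4 (comm -13), which Release 3 functions have been dropped (comm -23), and which functions appear in both releases (comm -12).

You can create partial lists by saving the output of each of these three commands to a separate file.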
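
As a sketch of how those questions might be answered and the partial lists saved, assuming the r3_list and r4_list contents shown above: the scratch directory variable rel and the output file names r4_only, r3_only, and common are illustrative choices, not names from the original text.

```shell
# Recreate the two function lists from the text in a scratch directory.
# The directory variable and output file names are illustrative.
rel=$(mktemp -d)
printf '%s\n' b c d f g h > "$rel/r3_list"
printf '%s\n' a b c d e f > "$rel/r4_list"

# Which functions are new in Release 4? (column 2: only in r4_list)
LC_ALL=C comm -13 "$rel/r3_list" "$rel/r4_list" > "$rel/r4_only"

# Which Release 3 functions were dropped? (column 1: only in r3_list)
LC_ALL=C comm -23 "$rel/r3_list" "$rel/r4_list" > "$rel/r3_only"

# Which functions were retained? (column 3: present in both lists)
LC_ALL=C comm -12 "$rel/r3_list" "$rel/r4_list" > "$rel/common"

# Show each partial list on a single line.
for f in r4_only r3_only common; do
    printf '%s: %s\n' "$f" "$(tr '\n' ' ' < "$rel/$f")"
done
```

With the lists above, the new functions are a and e, the dropped functions are g and h, and b, c, d, and f are common to both releases.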

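comm can only compare sorted files, and the files must be sorted in the same order that comm itself uses when it compares lines. GNU comm compares lines according to the LC_COLLATE collating sequence of the current locale, so sort the input files under the same locale setting you use to run comm. If you have non-ASCII characters to sort, check your manual page for details.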
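
One way to keep the sort order and the comparison order consistent is to run both sort and comm under the same locale. The sketch below assumes two unsorted lists whose contents (and the names tmp, list_a, list_b, and shared) are made up for illustration, and forces the C locale for every command.

```shell
# Two unsorted lists with illustrative contents (not from the text).
tmp=$(mktemp -d)
printf '%s\n' pear apple Zebra fig > "$tmp/list_a"
printf '%s\n' fig Zebra banana > "$tmp/list_b"

# sort -c only checks the order; it exits non-zero if the file is unsorted.
if ! LC_ALL=C sort -c "$tmp/list_a" 2>/dev/null; then
    echo "list_a is not sorted yet"
fi

# Sort both files under the same locale that comm will run in.
LC_ALL=C sort "$tmp/list_a" > "$tmp/list_a.sorted"
LC_ALL=C sort "$tmp/list_b" > "$tmp/list_b.sorted"

# Lines common to both lists, compared in the same (byte) order.
shared=$(LC_ALL=C comm -12 "$tmp/list_a.sorted" "$tmp/list_b.sorted")
printf '%s\n' "$shared"
```

In the C locale uppercase letters sort before lowercase ones, so Zebra comes first in both sorted files; other locales may order the lines differently, which is why the same setting is applied to every command.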

-- DG



Copyright © 2003 O'Reilly & Associates. All rights reserved.