sed & awk

10.8. Limitations

There are fixed limits within any awk implementation. The only trouble is that the documentation seldom reports them. Table 10.1 lists the limitations as described in The AWK Programming Language. These limitations are implementation-specific, but they are good ballpark figures for most systems.

Table 10.1. Limitations

Item                              Limit
Number of fields per record       100
Characters per input record       3000
Characters per output record      3000
Characters per field              1024
Characters per printf string      3000
Characters in literal string      400
Characters in character class     400
Files open                        15
Pipes open                        1

NOTE: Despite the number in Table 10.1, experience has shown that most awks allow you to have more than one open pipe.
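One common way to stay under a limit on open files is to close each output file as soon as you are done writing to it. The following is a minimal sketch of that idea, not something taken from the table's source: the function name split_by_key and the ".txt" naming scheme are made up for illustration. It appends each input line to a file named after the line's first field and calls close() right away, so no more than one output file is open at any time.

```shell
# split_by_key DIR: hypothetical helper (the name and the ".txt" suffix
# are assumptions for this sketch). Reads lines on standard input and
# appends each one to DIR/<first field>.txt.
split_by_key() {
    outdir=$1
    awk -v dir="$outdir" '{
        out = dir "/" $1 ".txt"   # build the output filename from field 1
        print >> out              # append the whole record
        close(out)                # release the file before the next record
    }'
}
```

For example, `split_by_key /tmp/out < data.txt` would write one file per distinct first field into /tmp/out, assuming that directory already exists. Reopening a file for every record is slow, so this trades speed for staying well within the limit.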

In terms of numeric values, awk uses double-precision, floating-point numbers that are limited in size by the machine's architecture.
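To see what this means in practice, the sketch below checks whether awk can tell 2^53 apart from 2^53 + 1. It assumes the machine uses IEEE 754 double precision, where integers are exact only up to 2^53; on such a machine the two values should compare equal. The function name precision_check is hypothetical.

```shell
# precision_check: hypothetical helper name. Reports whether awk can
# distinguish 2^53 from 2^53 + 1 (assumes IEEE 754 doubles).
precision_check() {
    awk 'BEGIN {
        x = 2 ^ 53
        y = x + 1                 # the sum is rounded back to a double
        if (x == y)
            print "indistinguishable"
        else
            print "distinct"
    }'
}
```

Calling `precision_check` on a typical system is expected to report that the values are indistinguishable, which is a reminder that very large counters or identifiers can silently lose precision.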

Running into these limits can cause unanticipated problems with scripts. In developing examples for the first edition of this book, Dale thought he'd write a search program that could look for a word or sequence of words in a single paragraph. The idea was to read a document as a series of multiline records and, if any of the fields contained the search term, print the record, which was a paragraph. It could be used to search through mail files where blank lines delimit paragraphs. The resulting program worked for small test files. However, when tried on larger files, the program dumped core because it encountered a paragraph that was longer than the maximum input record size, which is 3000 characters. (Actually, the file contained an included mail message where blank lines within the message were prefixed by ">".) Thus, when reading multiple lines as a single record, you had better be sure that no record will be longer than 3000 characters. By the way, there is no particular error message that alerts you to the fact that the problem is the size of the current record.
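The following is a rough sketch of that kind of paragraph search, not Dale's original program. It sets RS to the empty string so that blank lines delimit records, and, as a simplification, it tests the whole record with index() rather than looping over the fields. The second function flags paragraphs that exceed a given length, which is one way to find the record that trips the limit; it only helps when run with an awk that does not itself have the limit. Both function names are hypothetical.

```shell
# para_search TERM [FILE...]: hypothetical helper. Prints every
# blank-line-delimited paragraph that contains the literal string TERM.
# Note that -v processes backslash escapes, so TERM should not contain
# backslashes in this sketch.
para_search() {
    term=$1; shift
    awk -v RS= -v ORS='\n\n' -v term="$term" 'index($0, term) > 0' "$@"
}

# long_records LIMIT [FILE...]: hypothetical helper. Reports the record
# number and length of each paragraph longer than LIMIT characters.
long_records() {
    limit=$1; shift
    awk -v RS= -v limit="$limit" '
        length($0) > limit { print NR ": " length($0) " characters" }
    ' "$@"
}
```

For example, `para_search "open pipe" mailfile` would print the matching paragraphs separated by blank lines, and `long_records 3000 mailfile` would list the paragraphs that are too long for an awk with the limit in Table 10.1.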

Fortunately, gawk and mawk (see Chapter 11, "A Flock of awks") don't have such small limits; for example, the number of fields in a record is limited in gawk to the maximum value that can be held in a C long, and certainly records can be longer than 3000 characters. These versions also allow you to have more open files and pipes.
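A quick way to find out whether the awk on your system is bound by the 100-field figure is to feed it a record with more fields than that and print NF. This is only a sketch; the function name count_fields is hypothetical, and what an awk with the old limit does with such a record is not specified here.

```shell
# count_fields N: hypothetical helper. Builds one line containing N
# space-separated fields, then asks awk how many fields it sees.
count_fields() {
    n=$1
    awk -v n="$n" 'BEGIN {
        for (i = 1; i <= n; i++)
            printf "%d ", i       # emit the fields on a single line
        print ""                  # terminate the record
    }' | awk '{ print NF }'
}
```

With gawk or mawk, `count_fields 500` should report all 500 fields.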

Recent versions of the Bell Labs awk have two options, -mf N and -mr N, that allow you to set the maximum number of fields and the maximum record size on the command line, as an emergency way to get around the default limits.

(Sed implementations also have their own limits, which aren't documented. Experience has shown that most UNIX versions of sed have a limit of 99 or 100 substitute (s) commands.)
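Because such limits are undocumented, the practical way to learn about one is to test for it. The sketch below generates a script of N substitute commands that match nothing and checks whether sed still passes a line through unchanged. The function name many_subs and the patterns it generates are made up for illustration, and how an older sed fails when it hits its limit is not something this sketch can predict.

```shell
# many_subs N: hypothetical helper. Generates N harmless substitute
# commands (s/x1x/y/, s/x2x/y/, ...) and runs them over the line "ok".
many_subs() {
    n=$1
    script=$(awk -v n="$n" 'BEGIN {
        for (i = 1; i <= n; i++)
            print "s/x" i "x/y/"  # none of these patterns matches "ok"
    }')
    echo ok | sed "$script"
}
```

If `many_subs 200` prints the input line unchanged, the sed in question accepts at least 200 substitute commands in one script.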




Copyright © 2003 O'Reilly & Associates. All rights reserved.