
8.4. trap

We've been discussing how signals affect the casual user; now let's talk a bit about how shell programmers can use them. We won't go into too much depth about this, because it's really the domain of systems programmers.

We mentioned earlier that programs in general can be set up to "trap" specific signals and process them in their own way. The trap built-in command lets you do this from within a shell script. trap is most important for "bullet-proofing" large shell programs so that they react appropriately to abnormal events -- just as programs in any language should guard against invalid input. It's also important for certain systems programming tasks, as we'll see in the next chapter.

The syntax of trap is:

trap cmd sig1 sig2 ...

That is, when any of sig1, sig2, etc., is received, the shell runs cmd. After cmd finishes, the script resumes execution just after the command that was interrupted.[121]

[121] This is what usually happens. Sometimes the command currently running aborts (sleep acts like this, as we'll see soon); other times it finishes running. Further details are beyond the scope of this book.

Of course, cmd can be a script or function. The sigs can be specified by name or by number. You can also invoke trap without arguments, in which case the shell prints a list of any traps that have been set, using symbolic names for the signals. If you use trap -p, the shell prints the trap settings in a way that can be saved and reread later by a different invocation of the shell.

The shell scans the text of cmd twice. The first time is while it is preparing to run the trap command; all the substitutions as outlined in Chapter 7 are performed before executing the trap command. The second time is when the shell actually executes the trap. For this reason, it is best to use single quotes around the cmd in the text of the shell program. When the shell executes the trap's command, $? is always the exit status of the last command run before the trap started. This is important for diagnostics.
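One way to watch the first scan at work, without sending any signals, is to set two traps and then list them with trap (no arguments). This is only a sketch under assumptions: the variable values, the choice of USR1 and USR2, and the scratch file made with mktemp(1) are arbitrary, and printf is used in place of print so that the same lines also run in other POSIX-style shells. The format of the listing differs between shells, so no output is shown.

```shell
msgfile=/tmp/early

# Double quotes: $msgfile is expanded while the trap command itself is being
# prepared, so the stored command text contains /tmp/early.
trap "printf 'saving %s\n' $msgfile" USR1

# Single quotes: the text $msgfile is stored as is and is expanded only when
# the trap actually runs.
trap 'printf "saving %s\n" $msgfile' USR2

msgfile=/tmp/late

# List the traps now set. Going through a scratch file (hypothetical name)
# keeps the listing in a variable for inspection.
listing=$(mktemp "${TMPDIR:-/tmp}/traplist.XXXXXX") || exit 1
trap > "$listing"
stored=$(cat "$listing")
rm -f "$listing"
printf '%s\n' "$stored"

# Put both signals back to their default actions.
trap - USR1 USR2
```

The USR1 entry should show /tmp/early already filled in, while the USR2 entry still shows $msgfile, which would expand to /tmp/late if that trap ever ran.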

Here's a simple example that shows how trap works. Suppose we have a shell script called loop with this code:

while true; do
    sleep 60
done

This just pauses for 60 seconds (the sleep(1) command) and repeats indefinitely. true is a "do-nothing" command whose exit status is always 0. For efficiency, it is built into the shell. (The false command is a similar "do-nothing" command whose exit status is always 1. It is also built into the shell.) As it happens, sleep is also built into the shell. Try typing in this script. Invoke it, let it run for a little while, then type CTRL-C (assuming that is your interrupt key). It should stop, and you should get your shell prompt back.

Now insert the following line at the beginning of the script:

trap 'print "You hit control-C!"' INT

Invoke the script again. Now hit CTRL-C. The odds are overwhelming that you are interrupting the sleep command (as opposed to true). You should see the message "You hit control-C!", and the script will not stop running; instead, the sleep command will abort, and it will loop around and start another sleep. Hit CTRL-\ to get it to stop. Type rm core to get rid of the resulting core dump file.

Next, run the script in the background by typing loop &. Type kill %loop (i.e., send it the TERM signal); the script will terminate. Add TERM to the trap command, so that it looks like this:

trap 'print "You hit control-C!"' INT TERM

Now repeat the process: run it in the background and type kill %loop. As before, you will see the message and the process will keep running. Type kill -KILL %loop to stop it.

Notice that the message isn't really appropriate when you use kill. We'll change the script so it prints a better message in the kill case:

trap 'print "You hit control-C!"' INT
trap 'print "You tried to kill me!"' TERM

while true; do
    sleep 60
done

Now try it both ways: in the foreground with CTRL-C and in the background with kill. You'll see different messages.
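The same experiment can be scripted rather than typed. The following is a rough sketch, not the book's script: it runs the loop as a background subshell, uses USR1 in place of INT (a non-interactive shell starts background jobs with INT ignored, which makes INT awkward to demonstrate this way), shortens the sleep to one second, and collects the messages in a scratch file whose name is made up for the occasion. printf stands in for print so the lines also run in other POSIX-style shells.

```shell
log=$(mktemp "${TMPDIR:-/tmp}/looplog.XXXXXX") || exit 1   # hypothetical scratch file

# The loop from the text, with one trap per signal and a shorter sleep.
(
    trap 'printf "You sent USR1!\n"' USR1
    trap 'printf "You tried to kill me!\n"' TERM
    while true; do
        sleep 1
    done
) > "$log" &
looppid=$!

# Send each signal in turn, leaving time for the handler to run.
sleep 1; kill -USR1 "$looppid"
sleep 2; kill -TERM "$looppid"
sleep 2

# Both signals were trapped, so the loop should still be running.
if kill -0 "$looppid" 2>/dev/null; then
    still_running=yes
else
    still_running=no
fi

# KILL cannot be trapped, so this really stops the loop.
kill -KILL "$looppid"
wait "$looppid" 2>/dev/null || true

messages=$(cat "$log")
rm -f "$log"
printf '%s\nstill running: %s\n' "$messages" "$still_running"
```

Each signal produces its own message, and the loop survives both until KILL arrives.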

8.4.1. Traps and Functions

The relationship between traps and shell functions is straightforward, but it has certain nuances that are worth discussing. The most important thing to understand is that Korn shell functions (those created using the function keyword; see Chapter 4) have their own local traps; these aren't known outside of the function. Old-style POSIX functions (those created using the name() syntax) share traps with the parent script.

Let's start with function-style functions, where traps are local. In particular, the surrounding script doesn't know about them. Consider this code:

function settrap {
    trap 'print "You hit control-C!"' INT
}

settrap
while true; do
    sleep 60
done

If you invoke this script and hit your interrupt key, it just exits. The trap on INT in the function is known only inside that function. On the other hand:

function loop {
    trap 'print "How dare you!"' INT
    while true; do
        sleep 60
    done
}

trap 'print "You hit control-C!"' INT
loop

When you run this script and hit your interrupt key, it prints "How dare you!" But how about this:

function loop {
    while true; do
        sleep 60
    done
}

trap 'print "You hit control-C!"' INT
loop
print 'exiting ...'

This time the looping code is within a function, and the trap is set in the surrounding script. If you hit your interrupt key, it prints the message and then prints "exiting ...". It does not repeat the loop as above.

Why? Remember that when the signal comes in, the shell aborts the current command, which in this case is a call to a function. The entire function aborts, and execution resumes at the next statement after the function call.

The advantage of traps that are local to functions is that they allow you to control a function's behavior separately from the surrounding code.

Yet you may want to define global traps inside functions. There is a rather kludgy way to do this; it depends on a feature that we introduce in Chapter 9, which we call a "fake signal." Here is a way to set trapcode as a global trap for signal SIG inside a function:

trap "trap trapcode SIG" EXIT

This sets up the command trap trapcode SIG to run right after the function exits, at which time the surrounding shell script is in scope (i.e., is "in charge"). When that command runs, trapcode is set up to handle the SIG signal.

For example, you may want to reset the trap on the signal you just received, like this:

function trap_handler {
    trap "trap second_handler INT" EXIT
    print 'Interrupt: one more to abort.'
}

function second_handler {
    print 'Aborted.'
    exit
}

trap trap_handler INT

This code acts like the Unix mail utility: when you are typing in a message, you must press your interrupt key twice to abort the process.

There is a less kludgy way to do this, taking advantage of the fact that POSIX-style functions share traps with the parent script:

# POSIX style function, trap is global
trap_handler () {
    trap second_handler INT
    print 'Interrupt: one more to abort.'
}

function second_handler {
    print 'Aborted.'
    exit
}

trap trap_handler INT

while true ; do
    sleep 60
done

If you type this in and run it, you get the same results as in the previous example, without the extra trickery of using the fake EXIT signal.

Speaking of mail, in Task 8-2 we'll show a more practical example of traps.

Task 8-2

As part of an electronic mail system, write the shell code that lets a user compose a message.

The basic idea is to use cat to create the message in a temporary file and then hand the file's name off to a program that actually sends the message to its destination. The code to create the file is very simple:

msgfile=/tmp/msg$$
cat > $msgfile

Since cat without an argument reads from the standard input, this just waits for the user to type a message and end it with the end-of-file character CTRL-D.

8.4.2. Process ID Variables and Temporary Files

The only thing new about this is $$ in the filename expression. This is a special shell variable whose value is the process ID of the current shell.

To see how $$ works, type ps and note the process ID of your shell process (ksh). Then type print "$$"; the shell responds with that same number. Now type ksh to start a shell subprocess, and when you get a prompt, repeat the process. You should see a different number, probably slightly higher than the last one.

You can examine the parent-child relationship in more detail by using the PPID (parent process ID) variable. ksh sets this to the process ID of the parent process. Each time you start a new instance of ksh, if you type print $PPID you should see a number that is the same as the $$ of the earlier shell.

A related built-in shell variable is ! (i.e., its value is $!), which contains the process ID of the most recently invoked background job. To see how this works, invoke any job in the background and note the process ID printed by the shell next to [1]. Then type print "$!"; you should see the same number.
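In a script, the usual move is to capture $! right after starting the job, before another background command overwrites it. Here is a small sketch of that habit; the variable names are invented for illustration, and printf is used in place of print so the lines also run in other POSIX-style shells. The numbers printed will differ on every run.

```shell
# Start a short background job and record its process ID from $!.
sleep 1 &
bgpid=$!

printf 'this shell: %s  background job: %s\n' "$$" "$bgpid"

# kill -0 sends no signal; it only reports whether the process exists.
if kill -0 "$bgpid" 2>/dev/null; then
    job_exists=yes
else
    job_exists=no
fi

wait "$bgpid"    # let the job finish before moving on
```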

To return to our mail example: since all processes on the system must have unique process IDs, $$ is excellent for constructing names of temporary files. We saw an example of this in Chapter 7, when discussing command-line evaluation steps, and there are also examples in Chapter 9.[122]

[122] In practice, temporary filenames based just on $$ can lead to insecure systems. If you have the mktemp(1) program on your system, you should use it in your applications to generate unique names for your temporary files.
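The footnote's suggestion might look something like the following sketch. It assumes a mktemp that accepts a template ending in XXXXXX, which the common implementations do; the msg. prefix, the use of $TMPDIR, and the placeholder message text are choices made here, not part of the original task.

```shell
# mktemp creates an empty file with a unique name and prints that name; the
# trailing XXXXXX is replaced with random characters.
msgfile=$(mktemp "${TMPDIR:-/tmp}/msg.XXXXXX") || exit 1

# Stand-in for the text a user would type into cat.
printf 'Message text would go here.\n' > "$msgfile"
contents=$(cat "$msgfile")

# ... send the contents of $msgfile to the specified mail address ...
rm -f "$msgfile"
```

Because mktemp creates the file itself, another process cannot guess the name and create it first, which is the weakness of a name built from $$ alone.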

The directory /tmp is conventionally used for temporary files. Files in this directory are usually erased whenever the computer is rebooted.

Nevertheless, a program should clean up such files before it exits, to avoid taking up unnecessary disk space. We could do this in our code very easily by adding the line rm $msgfile after the code that actually sends the message. But what if the program receives a signal during execution? For example, what if a user changes his or her mind about sending the message and hits CTRL-C to stop the process? We would need to clean up before exiting. We'll emulate the actual Unix mail system by saving the message being written in a file called dead.letter in the current directory. We can do this by using trap with a command string that includes an exit command:

trap 'mv $msgfile dead.letter; exit' INT TERM
msgfile=/tmp/msg$$
cat > $msgfile
# send the contents of $msgfile to the specified mail address ...
rm $msgfile

When the script receives an INT or TERM signal, it saves the temp file and then exits. Note that the command string isn't evaluated until it needs to be run, so $msgfile will contain the correct value; that's why we surround the string in single quotes.

But what if the script receives a signal before msgfile is created -- unlikely though that may be? Then mv will try to rename a file that doesn't exist. To fix this, we need to test for the existence of the file $msgfile before trying to save it. The code for this is a bit unwieldy to put in a single command string, so we'll use a function instead:

function cleanup {
    if [[ -e $msgfile ]]; then
        mv $msgfile dead.letter
    fi
    exit
}

trap cleanup INT TERM

msgfile=/tmp/msg$$
cat > $msgfile
# send the contents of $msgfile to the specified mail address ...
rm $msgfile
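To check that the cleanup logic does what it should without sitting at a terminal, something like the following harness could be used. It is a sketch under several assumptions: the mail code runs in a background subshell inside a scratch directory (hypothetical name), a loop stands in for cat waiting on the terminal, only TERM is sent, cleanup is written POSIX-style with an explicit exit 0, and printf replaces print so the lines also run in other POSIX-style shells.

```shell
workdir=$(mktemp -d "${TMPDIR:-/tmp}/mailtest.XXXXXX") || exit 1   # hypothetical scratch directory

(
    cd "$workdir" || exit 1

    # Save the partial message, if there is one, then leave.
    cleanup() {
        if [ -e "$msgfile" ]; then
            mv "$msgfile" dead.letter
        fi
        exit 0
    }
    trap cleanup TERM

    msgfile=$workdir/msg
    printf 'half-written message\n' > "$msgfile"

    # Stands in for cat waiting on the terminal.
    while true; do
        sleep 1
    done
) &
mailpid=$!

sleep 1                  # give the subshell time to set its trap
kill -TERM "$mailpid"
wait "$mailpid"          # returns once cleanup has called exit

saved=$(cat "$workdir/dead.letter")
rm -rf "$workdir"
printf 'dead.letter contains: %s\n' "$saved"
```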

8.4.3. Ignoring Signals

Sometimes a signal comes in that you don't want to do anything about. If you give the null string ("" or '') as the command argument to trap, the shell effectively ignores that signal. The classic example of a signal you may want to ignore is HUP (hangup), the signal all of your background processes receive when you log out. (If your line actually drops, Unix sends the HUP signal to the shell. The shell forwards the signal to all your background processes, or sends it on its own initiative if you log out normally.)

HUP has the usual default behavior: it kills the process that receives it. But there are bound to be times when you don't want a background job to terminate when you log out. For example, you may start a long compile or text formatting job; you want to log out and come back later when you expect the job to be finished. Under normal circumstances, your background job terminates when you log out. But if you run it in a shell environment where the HUP signal is ignored, the job runs to completion.

To do this, you could write a simple function that looks like this:

function ignorehup {
    trap "" HUP
    eval "$@"
}

We write this as a function instead of a script for reasons that will become clearer when we look in detail at subshells at the end of this chapter.

Actually, there is a Unix command called nohup that does precisely this. The start function from the last chapter could include nohup:

function start {
    eval nohup "$@" > logfile 2>&1 &
}

This prevents HUP from terminating your command and saves its standard output and standard error in a file. Actually, the following is just as good:

function start {
    nohup "$@" > logfile 2>&1 &
}

If you understand why eval is essentially redundant when you use nohup in this case, then you have a firm grasp on the material in Chapter 7.
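You don't have to log out to watch an ignored HUP at work. The sketch below starts two short background jobs, one of them from a subshell that ignores HUP first, and then sends HUP to both with kill. The variable names are invented for illustration, and printf replaces print so the lines also run in other POSIX-style shells. One caveat: if the sketch itself is started with HUP already ignored (under nohup, for instance), both jobs will finish.

```shell
# An ordinary background job: HUP keeps its default action.
sleep 2 &
plain=$!

# The same job started from a subshell that ignores HUP first. An ignored
# signal stays ignored in the commands that the subshell runs.
( trap '' HUP; sleep 2 ) &
immune=$!

sleep 1
kill -HUP "$plain" "$immune"

# wait returns the job's exit status: zero if it ran to completion,
# nonzero if a signal ended it early.
if wait "$plain"; then plain_result=finished; else plain_result=killed; fi
if wait "$immune"; then immune_result=finished; else immune_result=killed; fi

printf 'plain job: %s\nHUP-ignoring job: %s\n' "$plain_result" "$immune_result"
```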

8.4.4. Resetting Traps

Another "special case" of the trap command occurs when you give a dash (-) as the command argument. This resets the action taken when the signal is received to the default, which usually is termination of the process.

As an example of this, let's return to Task 8-2, our mail program. After the user has finished sending the message, the temporary file is erased. At that point, since there is no longer any need to "clean up," we can reset the signal trap to its default state. The code for this, apart from function definitions, is:

trap cleanup INT TERM

msgfile=/tmp/msg$$
cat > $msgfile
# send the contents of $msgfile to the specified mail address ...
rm $msgfile

trap - INT TERM

The last line of this code resets the handlers for the INT and TERM signals.
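The effect of the reset can be checked with a small experiment, sketched here under assumptions: a background subshell sets a stand-in handler for TERM, resets it with a dash, and is then sent TERM. The handler text and variable names are invented for illustration, and printf replaces print so the lines also run in other POSIX-style shells.

```shell
(
    trap 'printf "cleaning up\n"' TERM   # stand-in for the cleanup trap
    trap - TERM                          # back to the default action
    sleep 2
) &
pid=$!

sleep 1
kill -TERM "$pid"

# With the trap reset, TERM ends the job instead of running the handler.
if wait "$pid"; then
    result=finished
else
    result=terminated
fi
printf 'after trap - TERM, the job was %s\n' "$result"
```

The handler's message never appears, and wait reports a nonzero status because the job was ended by the signal.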

At this point you may be thinking that one could get seriously carried away with signal handling in a shell script. It is true that industrial strength programs devote considerable amounts of code to dealing with signals. But these programs are almost always large enough that the signal-handling code is a tiny fraction of the whole thing. For example, you can bet that the real Unix mail system is pretty darn bullet-proof.

However, you will probably never write a shell script that is complex enough, and that needs to be robust enough, to merit lots of signal handling. You may write a prototype for a program as large as mail in shell code, but prototypes by definition do not need to be bullet-proofed.

Therefore, you shouldn't worry about putting signal-handling code in every 20-line shell script you write. Our advice is to determine if there are any situations in which a signal could cause your program to do something seriously bad and add code to deal with those contingencies. What is "seriously bad"? Well, with respect to the above examples, we'd say that the case where HUP causes your job to terminate on logout is seriously bad, while the temporary file situation in our mail program is not.

The Korn shell has several new options to trap (with respect to the same command in most Bourne shells) that make it useful as an aid for debugging shell scripts. We cover them in Chapter 9.




Copyright © 2003 O'Reilly & Associates. All rights reserved.