Unix Power Tools

38.9. On-Demand Incremental Backups of a Project

As I was working on this book, I was constantly editing lots of random files all through a directory tree. I archived some of the files in a revision control system (Section 39.4), but those archives, as well as the nonarchived files, still would be vulnerable if my disk crashed. (And naturally, close to a deadline, one hard disk started making whining noises...)

The answer I came up with was easy to use and simple to set up. It's a script named ptbk, and this article explains it. To run the script, I just type its name. It searches my directory tree for files that have been modified since the last time I ran ptbk. Those files are packed into a dated, compressed tar archive, which is then copied to a remote system using scp. The process looks like this:

$ ptbk
upt/upt3_changes.html
upt/BOOKFILES
upt/art/0548.sgm
upt/art/1420.sgm
upt/art/1430.sgm
upt/art/0524.sgm
upt/BOOKIDS
upt/ulpt3_table
Now copying this file to bserver:
-rw-rw-r--    1 jpeek    323740 Jan  3 23:08 /tmp/upt-200101032308.tgz
upt-200101032308.tgz     |     316 KB |  63.2 kB/s | ETA: 00:00:00 | 100%

The script actually doesn't copy all of the files in my directory tree. I've set up a tar exclude file that makes the script skip some files that don't need backing up. For instance, it skips any filename that starts with a comma (,). Here's the file, named ptbk.exclude:

upt/ptbk.exclude
upt/tarfiles
upt/gmatlogs
upt/drv-jpeek-jpeek.ps
upt/drv-jpeek.3l
upt/BOOKFILES~
upt/ch*.ps.gz
upt/ch*.ps
upt/,*
upt/art/,*
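
To see the effect of an exclude file in isolation, here is a minimal sketch. The scratch directory and the filenames in it are made up for illustration, and it assumes a system with mktemp -d and a tar that accepts -X (GNU tar and bsdtar both do):

```shell
#!/bin/sh
# Sketch: show how a tar exclude file skips matching pathnames.
# The directory layout and filenames below are hypothetical.
work=`mktemp -d` || exit 1
cd "$work" || exit 1
mkdir -p upt/art
echo keep > upt/notes.txt
echo keep > upt/art/0548.sgm
echo skip > upt/,scratch
echo skip > upt/art/,draft

# One pattern per line, written the way tar will see the pathnames:
cat > exclude.list <<'EOF'
upt/,*
upt/art/,*
EOF

# Archive the tree, skipping anything that matches a pattern
tar cf demo.tar -X exclude.list upt

# List the archive; the comma files should be absent
listing=`tar tf demo.tar`
echo "$listing"
```

The patterns are matched against the pathnames as tar sees them, which is why each one starts with upt/ just as the entries in ptbk.exclude do.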

After the script makes the tar file, it touches a timestamp file named ptbk.last. The next time the script runs, it uses find -newer (Section 9.8) to get only the files that have been modified since the timestamp file was touched.
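
The timestamp technique can be tried on its own. This is a small sketch with hypothetical filenames (proj, stamp); it backdates one file and the timestamp file with touch -t so that only the freshly written file counts as newer:

```shell
#!/bin/sh
# Sketch: find only the files modified since a timestamp file was touched.
# All names here are hypothetical.
work=`mktemp -d` || exit 1
cd "$work" || exit 1
mkdir proj

echo old > proj/old.txt
touch -t 200101010000 proj/old.txt   # pretend this was last edited long ago
touch -t 200101020000 stamp          # pretend the last backup ran a day later
echo new > proj/new.txt              # modified just now, so newer than stamp

# -newer compares each file's modification time with that of stamp
changed=`find proj -type f -newer stamp -print`
echo "$changed"                      # prints proj/new.txt
```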

The script uses scp and ssh-agent to copy the archive without asking for a password. You could hack it to use another method. For instance, it could copy using rcp (Section 1.21) or simply copy the file to another system with cp via an NFS-mounted filesystem (Section 1.21).
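
One way to make the copy method swappable is to wrap it in a shell function. The function below is a sketch, not part of ptbk; the name copy_archive and its argument order are assumptions made for this example:

```shell
#!/bin/sh
# copy_archive METHOD FILE DEST   (hypothetical helper, not part of ptbk)
#   scp: DEST is host:directory, as in the ptbk script
#   cp:  DEST is a local directory, such as an NFS mount point
copy_archive() {
    case "$1" in
    scp) scp "$2" "$3" ;;
    cp)  cp "$2" "$3" ;;
    *)   echo "copy_archive: unknown method: $1" >&2
         return 2 ;;
    esac
}
```

In ptbk, the final scp line could then become something like copy_archive scp $outfile ${remhost}:${remdir}, or copy_archive cp $outfile followed by the pathname of a mounted backup directory.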

This doesn't take the place of regular backups, if only because re-creating days' worth of work from the little individual archives would be tedious. But this system makes it painless to take snapshots, as often as I want, by typing a four-letter command. Here's the ptbk script:

(For ||, see Section 35.14; for `...`, see Section 28.14.)

#!/bin/sh
# ptbk - back up latest UPT changes, scp to $remhost

dirbase=upt
dir=$HOME/$dirbase
timestamp=$dir/ptbk.last     # the last time this script was run
exclude=$dir/ptbk.exclude    # file with (wildcard) pathnames to skip
remhost=bserver              # hostname to copy the files to
remdir=tmp/upt_bak/.         # remote directory (relative to $HOME)
cd $dir/.. || exit           # Go to parent directory of $dir
datestr=`date '+%Y%m%d%H%M'`
outfile=/tmp/upt-$datestr.tgz

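# Don't send vim recovery files (.*.swp).
# (The l option told older versions of GNU tar to stay on one filesystem;
# newer versions use l for --check-links and spell the old meaning
# --one-file-system.)
tar czvlf $outfile -X $exclude \
     `find $dirbase -type f -newer $timestamp ! -name '.*.swp' -print`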
mv -f $timestamp $dir/,ptbk.last
echo "Timestamp file for $0.  Don't modify." > $timestamp
echo "Now copying this file to $remhost:"
ls -l $outfile
scp $outfile ${remhost}:${remdir}

If the copy fails (because the remote machine is down, for instance), I have to either copy the archive somewhere else or wait and remember to copy the archive later. If you have an unreliable connection, you might want to modify the script to touch the timestamp file only if the copy succeeds -- at the possible cost of losing a data file that was modified while the previous archive was (not?) being transferred to the remote host.
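
Here is a minimal sketch of that modification. The function name copy_then_stamp is hypothetical; it runs whatever copy command it is given and rewrites the timestamp file only when that command exits with status zero:

```shell
#!/bin/sh
# copy_then_stamp STAMPFILE COMMAND [ARGS...]   (hypothetical helper)
# Run the copy command; rewrite the timestamp file only if it succeeds.
copy_then_stamp() {
    stamp=$1
    shift
    if "$@"; then
        echo "Timestamp file for ptbk.  Don't modify." > "$stamp"
    else
        status=$?     # exit status of the failed copy command
        echo "Copy failed; leaving $stamp unchanged." >&2
        return $status
    fi
}
```

In ptbk this would replace the mv and echo lines that update $timestamp as well as the final scp line, with a call along the lines of copy_then_stamp $timestamp scp $outfile ${remhost}:${remdir} placed after the ls -l. Because the timestamp is then written after the transfer rather than right after tar runs, a file edited during the transfer ends up older than the timestamp, which is the cost described above.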

-- JP




Copyright © 2003 O'Reilly & Associates. All rights reserved.