Perl Boot Camp, Part 1: Typical Script Anatomy (Unix Power Tools, 3rd Edition)

41.4. Perl Boot Camp, Part 1: Typical Script Anatomy

It is impossible to present a complete guide to programming Perl in this one small section, but you can glean enough information here to be able to modify existing Perl scripts and evaluate whether you'd like to learn more about this incredibly handy language.

Perl scripts bear a passing resemblance to Bourne shell scripts. Example 41-1 is a script called writewav.pl that comes with the Perl module Audio::SoundFile. It converts a given sound file into WAV format. The details of what it's doing aren't important, but it does demonstrate some common Perl structures that you should understand at a high level.

Example 41-1. A sample Perl script

#!/usr/bin/perl -w

=head1 NAME

 writewav - Converts any sound file into .wav format

=cut

use Audio::SoundFile;
use Audio::SoundFile::Header;

my ($buffer, $length, $header, $reader, $writer);
my $BUFFSIZE = 16384;
my $ifile = shift @ARGV || usage( );
my $ofile = shift @ARGV || usage( );

$reader = Audio::SoundFile::Reader->new($ifile, \$header);
$header->{format} = SF_FORMAT_WAV | SF_FORMAT_PCM;
$writer = Audio::SoundFile::Writer->new($ofile,  $header);

while ($length = $reader->bread_pdl(\$buffer, $BUFFSIZE)) {
    $writer->bwrite_pdl($buffer);
}

$reader->close;
$writer->close;

sub usage {
  print "usage: $0 <infile> <outfile>\n";
  exit(1);
}

The first line of Example 41-1 should be familiar to shell hackers; it's the shebang line. When the first two bytes of a file are the characters #!, the operating system uses the rest of that file's first line to determine which program should interpret the rest of the file. In this case, the path to the Perl interpreter is given. Command-line arguments can also be passed to the interpreter on this line. Here, -w instructs Perl to print warning messages when it finds code that is likely to be incorrect. This includes such common gaffes as trying to write to a read-only filehandle, writing subroutines that recurse more than 100 times, and trying to get the value of a scalar variable that hasn't been assigned a value yet. This flag is a new Perl programmer's best friend and should be used in all programs.
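To get a feel for what -w reports, here is a minimal sketch that is not part of writewav.pl. The variable names are made up for illustration, and the warning handler is there only so the message can be counted and shown; normally warnings simply appear on standard error.

```perl
#!/usr/bin/perl -w

# $^W is the variable behind the -w switch. Setting it here keeps the demo
# working even if the shebang line above is not honored.
$^W = 1;

# Collect warnings in an array instead of letting them go straight to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $name;                        # declared but never assigned a value
my $greeting = "Hello, $name";   # -w complains: use of an uninitialized value

print "Perl issued ", scalar(@warnings), " warning(s):\n", @warnings;
```

Without -w, the same script runs silently and $greeting quietly ends up as "Hello, ".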

All lines that start with = in the left margin are part of Perl's Plain Old Documentation (POD) system. Everything between the directives =head1 and =cut is documentation and does not affect how the script runs. Perl tools such as pod2text and pod2man format the POD found in a script into the output format given in the command's name. The Perl installation procedure itself uses pod2man to create all the Perl manpages on the target system.
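The following sketch (a made-up script, not taken from the module) shows that Perl really does skip everything between a POD directive and =cut, even text that looks like code.

```perl
#!/usr/bin/perl -w

my $count = 1;

=head1 DESCRIPTION

This paragraph is documentation. Perl skips it, including this line:
$count = 100;

=cut

$count++;
print "count is $count\n";   # prints "count is 2"; the POD "assignment" never ran
```

Running pod2text on the same file should print the DESCRIPTION section and ignore the code.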

The next two lines begin the actual Perl code. To use Perl library files called modules (Section 41.10), scripts invoke the use statement followed by the module's name. Perl searches the paths listed in the global variable @INC (Section 41.2) for these modules, which typically have the extension .pm.
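As a rough illustration of how use and @INC fit together, the sketch below loads File::Basename, a module that ships with Perl, in place of Audio::SoundFile, which may not be installed on your system. The sample path is invented.

```perl
#!/usr/bin/perl -w

use File::Basename;   # Perl looks for File/Basename.pm in each @INC directory

# Show where Perl looks for modules on this system.
print "Perl searches these directories for modules:\n";
print "  $_\n" for @INC;

# The hash %INC records the file each loaded module actually came from.
print "File::Basename was loaded from $INC{'File/Basename.pm'}\n";

# basename() is one of the functions File::Basename exports by default.
my $name = basename("/usr/local/bin/writewav.pl");
print "basename: $name\n";   # basename: writewav.pl
```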

In Perl, variables don't need to be declared before being used. Although this behavior is convenient for small scripts, larger scripts can benefit from the disciplined approach of declaring variables. Perl 5 (that is, Perl revision 5) introduced the my operator as a way of declaring a variable. Declaring variables helps the -w flag (and, when the script enables it, the use strict pragma) catch misspelled variable names, which are a common source of bugs in Perl scripts.
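Here is a small sketch of declared variables at work. It assumes the use strict pragma, which writewav.pl does not use but which turns an undeclared (often misspelled) variable into a compile-time error. The variable names are invented, and the string eval is only there so the error can be displayed without stopping the script.

```perl
#!/usr/bin/perl -w
use strict;   # assumption: not in Example 41-1; rejects undeclared variables

my $total = 0;
for my $n (1 .. 4) {
    $total += $n;
}
print "total: $total\n";   # total: 10

# Compile a snippet containing a misspelled name ($totl) inside a string
# eval, so the compile error ends up in $@ instead of aborting this script.
my $ok = eval q{ use strict; $totl = 5; 1 };
print "misspelling caught: $@" unless $ok;
```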

A variable that holds a single value is called a scalar and is always prefixed with a $ (even in assignments), unlike variables in the Bourne shell. The = is the assignment operator (when it's not appearing as a POD directive). Another kind of variable, called an array, can be used to hold many scalar values. Array variables begin with @. One example of a global array variable is @ARGV, which holds the list of command-line arguments passed into the Perl script.
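The sketch below shows scalars and arrays side by side. The filenames are invented for illustration.

```perl
#!/usr/bin/perl -w

my $file  = "song.aiff";                    # scalar: a single value
my @files = ("a.aiff", "b.au", "c.snd");    # array: a list of scalars

print "first file: $files[0]\n";            # one element is a scalar, hence the $
print "how many:   ", scalar(@files), "\n"; # an array in scalar context gives its size

push @files, $file;                         # append a scalar to the array
print "all files:  @files\n";               # elements are joined with spaces

print "arguments:  @ARGV\n";                # whatever was passed on the command line
```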

Continuing with Example 41-1, the two variables $ifile and $ofile get values from the command line. The shift operator removes a value from the beginning of the @ARGV array and returns it. If there aren't enough values on the command line, shift returns an undefined (false) value, so the || operator goes on to call the user-defined subroutine usage( ).
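The shift-or-bail idiom can be tried out safely with an ordinary array. In this sketch, parse_args is a hypothetical helper that returns the string "usage" where writewav.pl would call usage( ) and exit.

```perl
#!/usr/bin/perl -w

# Hypothetical helper: pull two filenames off a list, or give up.
sub parse_args {
    my @args = @_;
    my $in  = shift @args || return "usage";   # shift gives undef on an empty array
    my $out = shift @args || return "usage";
    return "$in -> $out";
}

print parse_args("in.au", "out.wav"), "\n";   # in.au -> out.wav
print parse_args("in.au"), "\n";              # usage
print parse_args(), "\n";                     # usage
```

One caveat of the idiom: because || tests for truth, an argument of 0 or the empty string is treated the same as a missing argument.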

Perl supports object-oriented programming (OOP). The hallmark of OOP is that both the data and the subroutines (called methods in OOP jargon) for processing that data are accessed through an object. In traditional procedural programming, data structures are stored separately from the functions that manipulate them. Fortunately, using object-oriented Perl modules is often straightforward.

In Example 41-1, the scalar $reader is a new Audio::SoundFile::Reader object. Unlike objects in many other OOP languages, Perl objects are not opaque: the user can get or set values internal to the object. This is what happens on the line that follows the creation of $reader. The -> dereferencing operator is used both to get at values that are pointed to by references (Section 41.5.4) and to make method calls. Here, the key format in $header is set to a value created by the bitwise OR of the values returned by the subroutines SF_FORMAT_WAV and SF_FORMAT_PCM. Another object, $writer, is created on the following line.
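To make the roles of -> concrete, here is a sketch built around a tiny, hypothetical Counter class. The class and the two flag constants are made up for illustration; they are not part of Audio::SoundFile, and the flag values are not the module's real ones.

```perl
#!/usr/bin/perl -w

package Counter;   # hypothetical class, for illustration only

sub new {
    my ($class, $start) = @_;
    my $self = { count => $start };   # the object's data lives in a hash
    return bless $self, $class;       # bless associates the hash with the class
}

sub increment {
    my $self = shift;
    return ++$self->{count};
}

package main;

my $c = Counter->new(5);   # -> calls a class method
$c->increment;             # -> calls a method on the object
$c->{count} = 40;          # -> reaches inside the object directly
$c->increment;
print "count is $c->{count}\n";   # count is 41

# Bitwise OR combines flag values. The empty prototype () marks these
# subroutines as taking no arguments, so they can be used like constants.
sub FLAG_A () { 0x01 }
sub FLAG_B () { 0x10 }
printf "flags: 0x%02x\n", FLAG_A | FLAG_B;   # flags: 0x11
```

Being able to assign to $c->{count} from outside the class is exactly the lack of opacity described above.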

The heart of the program is the while loop which, in English, reads, "While reading more chunks of the source file, translate that chunk into WAV data and write it to the outfile." When the loop finishes, those objects are no longer needed, so the close( ) method is called on each of them to release any resources used by those objects. This is the end of the program's execution, but there's a bit more to this script.
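The same read-a-chunk, write-a-chunk loop can be sketched with Perl's built-in read function. This is only an analogy: it assumes nothing about how bread_pdl behaves beyond what the loop in Example 41-1 implies, and it uses strings as stand-ins for the sound files and uppercasing as a stand-in for the conversion.

```perl
#!/usr/bin/perl -w

my $source  = "abcdefghij";   # stand-in for the input sound file
my $copy    = "";             # stand-in for the output file
my $BUFSIZE = 4;              # tiny on purpose, to force several passes
my ($buffer, $length);
my $passes = 0;

# Open the string as if it were a file (an in-memory filehandle).
open(my $in, '<', \$source) or die "cannot open in-memory file: $!";

# read() returns the number of bytes read, and 0 at end of file.
# Zero is false, so the loop ends once the input is used up.
while ($length = read($in, $buffer, $BUFSIZE)) {
    $copy .= uc $buffer;      # "convert" the chunk, then "write" it
    $passes++;
}
close($in);

print "$passes passes, result: $copy\n";   # 3 passes, result: ABCDEFGHIJ
```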

Perl allows for user-defined subroutines. Although they can be anywhere in the file, subroutine definitions typically come after the main block of code. Here, a subroutine called usage( ) is defined that simply prints some help to the user and quits. Inside double-quoted strings, Perl interpolates scalar and array values. This is a fancy way of saying that Perl replaces variables with their values. Because Perl tries to do the right thing with interpolation, there may be occasions when Perl's rules surprise you. Take a look at the perldata manpage for the definitive rules governing variable interpolation, and take a peek at the perltrap manpage for common interpolation mistakes. You can prevent interpolation by putting a backslash in front of the variable name (e.g., "\$foo" yields the literal text $foo) or by using single quotes, which never interpolate variables. Finally, the exit(1) function halts the script before the subroutine can return to the caller and returns the value 1 to the operating system.
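The three quoting behaviors described above can be compared directly. The variable names and values in this sketch are invented.

```perl
#!/usr/bin/perl -w

my $foo   = "WAV";
my @parts = ("in.au", "out.wav");

my $double  = "format: $foo, files: @parts";     # both variables interpolated
my $escaped = "the variable \$foo holds $foo";   # backslash keeps the first one literal
my $single  = 'format: $foo, files: @parts';     # single quotes: no interpolation

print "$double\n";    # format: WAV, files: in.au out.wav
print "$escaped\n";   # the variable $foo holds WAV
print "$single\n";    # format: $foo, files: @parts
```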

That's the 50,000-foot view of a Perl script. To confidently modify existing Perl scripts, it is necessary to understand some of the basic components of Perl better.

-- JJ

