Chapter 5: Introducing Perl

Perl Overview




Perl has most of the standard C operators and control constructs, with the exception of the address-of operator (&var) and pointer dereferencing (*var). Its approach to variables resembles that of the UNIX shells (Bourne, Korn, or Bash, to be precise): a simple variable name is indicated by a preceding "$" sign, and references to $variable are replaced by the value of $variable. The dollar symbol can be escaped (that is, its special meaning removed) by preceding it with a backslash.

Variables can contain strings (sequences of characters) or numbers. The two are interconverted freely, depending on the context in which they are encountered. Strings are identified in a program by surrounding them with double or single quote marks. For example:

$myvar = 6;

print $myvar;

assigns the numerical value 6 to the variable $myvar, then prints: 6

You can do the same with strings of text as well as numbers:

$myvar = "fred";
print $myvar;

assigns the string fred to the variable $myvar, then prints:

fred
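
As a brief illustration of the free conversion between strings and numbers described earlier, the sketch below uses the same value in a numeric and a string context. The variable names are made up for the example, and the "eq" string comparison and "." concatenation operators are standard Perl operators that this chapter has not introduced; "use strict", "use warnings" and "my" are modern habits rather than anything the text requires.

```perl
use strict;
use warnings;

my $text   = "6";          # a string that happens to look like a number
my $sum    = $text + 4;    # "+" forces numeric context: 10
my $joined = $sum . "4";   # "." forces string context: "104"

# The same two values compare differently as numbers and as strings.
my $numeric = ("6.0" == 6)   ? "equal" : "different";   # numeric comparison
my $stringy = ("6.0" eq "6") ? "equal" : "different";   # string comparison

print "$sum $joined $numeric $stringy\n";   # prints: 10 104 equal different
```

The operator, not the variable, decides whether a value is treated as a number or as a string.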

Strings can contain references to variables. If the string is enclosed in double quotes, any variable names within the string are replaced by the contents of the variable. Single quotes prevent variable substitution. For example:

$myvar = "fred";
print "Hello, $myvar";
print 'and hello, $myvar';

This produces: Hello, fredand hello, $myvar

(print adds no space or newline of its own, so the two strings run together.)
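
The three quoting behaviours described so far (double quotes, single quotes, and a backslash-escaped dollar sign) can be compared side by side. This is only a sketch; the variable names are invented for the illustration.

```perl
use strict;
use warnings;

my $name = "fred";

my $double  = "Hello, $name";    # double quotes: $name is substituted
my $single  = 'Hello, $name';    # single quotes: text is taken literally
my $escaped = "Hello, \$name";   # backslash removes the special meaning of "$"

# An explicit "\n" keeps each result on its own line.
print "$double\n";    # Hello, fred
print "$single\n";    # Hello, $name
print "$escaped\n";   # Hello, $name
```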

The C arithmetic and relational operators apply:

$variable = 4; # set $variable to 4
$variable++; # increment the value of $variable
if ($variable >= 5) { 
    print "Variable is \"$variable\"\n";
}
$variable -= 7; # subtract 7 from the value of $variable
print "but now it's $variable\n";

This prints:

Variable is "5"
but now it's -2

Note the use of a preceding backslash to escape the special meaning of the double quote, so that it prints as a literal character. The "\n" symbol has the same meaning as in C; that is, a newline.

A hash sign "#" begins a comment: everything from the hash to the end of the line is ignored, unless the hash appears inside a quoted string or a pattern.

The flow of control operators should look very familiar to a C or awk programmer:

if (EXPR) BLOCK
if (EXPR) BLOCK else BLOCK 
if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
LABEL while (EXPR) BLOCK
LABEL while (EXPR) BLOCK continue BLOCK
LABEL for (EXPR; EXPR; EXPR) BLOCK
LABEL foreach VAR (ARRAY) BLOCK
LABEL BLOCK continue BLOCK

Here BLOCK is a set of commands enclosed in curly braces, LABEL is an optional identifier followed by a colon that names a loop or block, and EXPR is any valid expression that can be evaluated as true (non-zero) or false (zero or an empty string). Perl is more traditional than C in its willingness to use labels for blocks of code; "goto" is also available, although its use is discouraged.
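
A small sketch may make the truth rule and the LABEL syntax more concrete. The label OUTER and the variable names are hypothetical; "push" (append to an array) and "next" (skip to the next iteration of the named loop) are standard Perl built-ins that this chapter has not introduced yet.

```perl
use strict;
use warnings;

# Which values count as false? Zero, the empty string and the string "0".
my @verdicts;
foreach my $value (0, 1, "", "0", "0.0", "hello") {
    if ($value) {
        push(@verdicts, "true");
    } else {
        push(@verdicts, "false");
    }
}
print "@verdicts\n";    # false true false false true true

# A labelled loop: "next OUTER" abandons the inner loop and moves
# straight on to the next iteration of the outer one.
my @pairs;
OUTER: foreach my $row (1 .. 3) {
    foreach my $col (1 .. 3) {
        next OUTER if $col > $row;
        push(@pairs, "$row$col");
    }
}
print "@pairs\n";       # 11 21 22 31 32 33
```

Note that the string "0.0" is true: only the exact string "0" is treated as false.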

Perl provides several different ways to accomplish any task. For example:

open(INPUT,"<$myfile")|| die "Could not open $myfile!\n";

takes the file named by the variable $myfile, associates it with the filehandle INPUT, and attempts to open it for reading. (Note the input redirection operator, "<", in front of the filename; this specifies that data is going to be read from the file. If it was a ">", the file would be opened for writing; if it was "+>$myfile", it would be opened for reading and writing, although "+>" empties the file first.) If this statement fails, the subsequent command (die "Could not open $myfile!\n") is executed: the logical-OR "||" operator runs the command on its right only if the command on its left returned a false value. (Logical-AND, &&, is also available; the following command is executed only if the preceding command succeeded.)

This rather terse line is equivalent to:

if (!open(INPUT, "<$myfile")) {
    die "Could not open $myfile\n"; 
}

(a style which might be more familiar to C programmers) or:

$result = open(INPUT, "<$myfile"); 
if (!$result) {
    die "Could not open $myfile\n"; 
}

(a style more familiar to Pascal programmers).
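
The open modes can be tried out safely on a scratch file. The sketch below is an illustration under stated assumptions: it uses the core File::Temp module to create a temporary file, and it also shows the ">>" (append) mode, which the text above does not mention. The filehandle and variable names are chosen for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the example does not touch any real data.
my ($tmp, $myfile) = tempfile(UNLINK => 1);
close($tmp);

# ">" opens for writing, replacing whatever the file held before.
open(OUTPUT, ">$myfile") || die "Could not open $myfile for writing!\n";
print OUTPUT "first line\n";
close(OUTPUT);

# ">>" opens for appending, so existing contents are kept.
open(OUTPUT, ">>$myfile") || die "Could not open $myfile for appending!\n";
print OUTPUT "second line\n";
close(OUTPUT);

# "<" opens for reading.
open(INPUT, "<$myfile") || die "Could not open $myfile for reading!\n";
my $first  = <INPUT>;
my $second = <INPUT>;
close(INPUT);

print $first, $second;    # first line, then second line
```

The parentheses around the arguments to open() matter here: without them, "||" would bind to the filename string rather than to the result of open().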

Perl provides three built-in filehandles: STDIN, STDOUT, and STDERR, the standard input, output, and error streams. STDIN is read from by default if no files are specified on the command line, and STDOUT is written to by default if no filehandle is selected. You can read from a filehandle by assigning to a variable from <FILEHANDLE>, for example:

$line = <FILE>;

This reads a line of text from FILE into the string $line, and advances the filehandle FILE to the next line. If no handle is specified, Perl reads from the files named on the command line, or from the standard input if none are named; so it is common to see notation like:

$_ = (<>);

which means "read a line from standard input into the variable $_".

The built-in variable "$_" is the default variable for input and pattern matching: many operations use it when no other variable is named. When you read from a filehandle in the condition of a while loop, $_ is set in sequence to the value of each line in the file. Consequently the explicit reference to $_ can often be omitted. Perl exhibits a lot of default behaviour; for example:

while (<>) { 
    print ; 
}

Because no filehandle or input variable is specified, perl reads from the standard input and, for as long as there is something left to read, places each line in $_. By default, the print command prints $_, so this filter simply copies the standard input to the standard output.
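
The same default behaviour applies to a named filehandle. To keep the following sketch self-contained it reads from an in-memory string instead of a real file; the three-argument form of open() with a reference to a string, and the "chomp" built-in (which removes the trailing newline), are standard Perl features that this chapter does not cover, and the variable names are invented.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma\n";    # stands in for the contents of a file
open(my $in, '<', \$text) || die "Could not open in-memory file\n";

my $count = 0;
my @seen;
while (<$in>) {           # each line is placed in $_ in turn
    chomp;                # with no argument, chomp works on $_
    $count++;
    push(@seen, $_);
    print "$count: $_\n"; # 1: alpha, 2: beta, 3: gamma
}
close($in);
```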

Perl can handle three data types: scalars, arrays, and associative arrays. A scalar is a simple variable, of the kind we have already seen; it holds a string or a number. An array consists of a list of scalars, which are grouped under a common name. (We'll deal with associative arrays later.)

Individual scalars in an array can be identified by their position in the array. For example:

$fred[4]

refers to the element with subscript 4 in the array @fred.

Note that $fred[4] and @fred[4] both refer to the same element of @fred, but do so in different ways. The "$" prefix means that $fred[4] is a scalar: the element with subscript 4, which is the fifth item because numbering starts at 0. The "@" prefix means that @fred[4] is an array "slice", a chunk of @fred containing a single element. In general, Perl lets you freely transfer data between arrays and scalars, but you have to keep track of the context in which you refer to the variables. You can't directly interchange arrays, scalars, and associative arrays. For example, although it is perfectly legitimate to assign one scalar to another, or one array to another, if you assign an array to a scalar the scalar ends up holding not the contents of the array, but the size of the array:

$fred = "hello";
$joe = $fred;
print $joe;
@fred = ("Fred","Joe", "Margaret", "Hilda");
$joe = @fred;
print $joe;

will result in:

hello4
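
The effect of context on an array can be seen by assigning the same array in three different ways. This is a minimal sketch with invented variable names; the parenthesised assignment and the interpolation of an array inside double quotes are standard Perl behaviour not otherwise described in this chapter.

```perl
use strict;
use warnings;

my @fred = ("Fred", "Joe", "Margaret", "Hilda");

my $count   = @fred;     # scalar context: the number of elements, 4
my ($first) = @fred;     # parentheses give list context: the first element
my $as_text = "@fred";   # in double quotes, elements are joined by spaces

print "$count $first\n";   # 4 Fred
print "$as_text\n";        # Fred Joe Margaret Hilda
```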

Note that by default, Perl numbers array elements from 0, so although there are four items in @fred, the items are numbered 0 .. 3. (Older versions of Perl let you change the array base; current versions do not.) You can refer to a slice, or range of elements in an array, using the range operator "..":

foreach (@fred[1..3]) { 
    print "$_ ";
}

prints Joe Margaret Hilda

To convert an array into a scalar, use the join() function:

$joe = join(" ",@fred);
print $joe;

prints Fred Joe Margaret Hilda

join() takes two arguments: a spacer string to insert between array elements, and an array. It converts the array to a string, with the spacer string between each item. (If the spacer is the empty string, the elements are simply concatenated.)

The converse function is called split(); this takes a pattern and a target string, and splits the string into array elements wherever the pattern occurs:

@fred = split(":","Fred:Joe:Margaret");
foreach (@fred) { 
    print "$_ ";
}

prints Fred Joe Margaret
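
Because split() and join() are converses, a string can be taken apart and put back together unchanged. The sketch below assumes a made-up colon-separated record; it writes the pattern as /:/, the more usual way of giving split() a pattern.

```perl
use strict;
use warnings;

my $record = "fred:x:1001:100";        # hypothetical colon-separated record

my @fields  = split(/:/, $record);     # ("fred", "x", "1001", "100")
my $rebuilt = join(":", @fields);      # same separator: the original string
my $packed  = join("", @fields);       # empty spacer: plain concatenation
my $howmany = @fields;                 # scalar context: number of fields

print "$howmany fields, user $fields[0]\n";   # 4 fields, user fred
print "$packed\n";                            # fredx1001100
```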

You can grow arrays dynamically: assigning to an element beyond the current end of an array extends it. The subscript of the highest element in an array @array is represented by the variable $#array. The index of the base element was historically set by the global variable $[ (normally 0, sometimes set to 1); modern versions of Perl no longer allow it to be changed, so the base is always 0. $array[n] is the element of @array with subscript n. You can refer to a range of elements in an array; for example:

@array2 = @array1[1 .. 4];

assigns elements 1 through 4 of @array1 to @array2.

And:

@array2 = (@array1[($#array1-4) .. $#array1]);

assigns the last five elements of @array1 to @array2.
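
The slices above, and the way an array grows, can be checked with a short sketch. The array contents are invented for the illustration, and the negative subscripts shown as an alternative are a standard Perl feature that the text does not mention.

```perl
use strict;
use warnings;

my @array1 = (10, 20, 30, 40, 50, 60, 70);

my @middle = @array1[1 .. 4];                       # 20 30 40 50
my @tail   = @array1[($#array1 - 4) .. $#array1];   # 30 40 50 60 70
my @same   = @array1[-5 .. -1];    # negative subscripts count from the end

print "@middle\n";
print "@tail\n";

# Assigning beyond the end grows the array; the gap is left undefined.
$array1[9] = 100;
my $highest = $#array1;   # now 9
my $size    = @array1;    # now 10
print "$highest $size\n";
```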

For a really useful shorthand:

open (INPUT,"<$file") || die "couldn't digest $file\n";
@file = (<INPUT>);

reads everything from the file named $file into the array @file, one line per array entry.
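
Whether <FILEHANDLE> returns one line or all of them depends on the context of the assignment. The following sketch assumes an in-memory string in place of a real file (using the three-argument open() with a reference to a string, which this chapter does not cover); the variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";    # stands in for the contents of a file
open(my $in, '<', \$text) || die "Could not open in-memory file\n";

my $first = <$in>;    # scalar context: reads a single line
my @rest  = <$in>;    # list context: reads every remaining line
close($in);

chomp($first);        # remove the trailing newlines
chomp(@rest);

my $remaining = @rest;            # number of lines read into the array
print "$first, then $remaining more: @rest\n";
# prints: one, then 2 more: two three
```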

Much of the beauty of Perl lies in its brevity and flexibility. Both solutions are valid -- but there is a short, elegant one, and a long one (that may be more appropriate under some circumstances). Perl is not a

