The Perl Programming Language



In the preceding two chapters we've examined the basic language structure of Perl, its components, and the powerful pattern matching subsystem that makes it so useful for text processing.

However, we've still only scratched the surface in our exploration of the language. In this chapter we will examine Perl's flow-of-control structures, look more closely at variables and context, and see some of the different sets of built-in functions that Perl provides.


Variables, context, and program structure

Most programming languages differentiate between variables -- containers for information -- on the basis of the type of information that can be stored in them. Variables can only store a given type of data: for example, floating point numbers, integers representable by two bytes, characters, or strings of characters. It is usually difficult to interchange data of different types; a function which operates on an integer is unlikely to take kindly to being fed a floating point number, and so on.

Perl is different. Perl has a single catch-all container for all unitary types of data: the scalar. A scalar can be of any necessary length and can contain arbitrary binary data. You can sometimes force Perl to treat a scalar as if it contains a specific type of data, but usually this isn't necessary, and operations which wouldn't make sense in a conventional strongly-typed language can be applied to scalars with some hope of 'the right thing' happening. (For example, adding 3 to ``apples'', which Perl treats as zero in a numeric context, yields 3.)

Perl is more interested in the context a variable is used in than in what it contains. While most Perl operators, functions, and subroutines will try to munch away on some data regardless of whether it is numeric, integer, string, or blank verse, they tend to get very fussy if they expect a single scalar and receive an array or list, or expect an array and receive a scalar.
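As a minimal sketch of what this means in practice (the array @fruit and the other variable names are invented for illustration), the same array yields different results depending on the context it is used in, while a scalar drifts between string and number as the operator demands:

```perl
use strict;
use warnings;

# Hypothetical data, chosen only for illustration.
my @fruit = ("apple", "banana", "cherry");

my $count   = @fruit;            # scalar context: the number of elements
my ($first) = @fruit;            # list context: the first element
my $sum     = "3" + 4;           # a string used as a number
my $label   = $sum . " apples";  # a number used as a string

print "$count $first $sum $label\n";   # prints "3 apple 7 7 apples"
```

The brackets around $first are what make the second assignment a list assignment; without them, $first would receive the element count instead.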

Program structure in Perl is somewhat free form by conventional standards. At the minimal level, a Perl program consists of one or more statements in the Perl language.

A simple statement is defined as 'an expression evaluated for its side-effects'. (For example, a simple variable assignment or a function call is an expression that, when executed, has some effect on the state of the running program.) You can use curly braces ({}) to group statements into compound statements -- blocks of code which are executed under the control of the flow-of-control keywords if, while, until, and unless (and the dynamic variable scoping system, which we'll meet later). The same keywords can also be tacked onto the end of a statement as modifiers; to attach one to a whole block, the block must be prefixed with do. For example:

  do {
      print "I am alone!\n";
      do_something();
      :
  } unless ($I_am_with_someone);

which is functionally equivalent to:

  if (! $I_am_with_someone) {
      print "I am alone!\n";
      do_something();
  }

(Note the exclamation-mark in the test expression; this is the logical-NOT operator. If $I_am_with_someone holds a true value -- anything other than undef, 0, ``0'', or the empty string -- it evaluates to TRUE: the ! negates this.)

The simple flow-of-control modifiers if, while, until, and unless are critical; they let you make the execution of a block of code contingent on the result of evaluating an expression.

We can loop using the while and until modifiers:

  $i = 1;
  while ($i < 10) {
     do_something();
     $i++;
  }

or:

  $i = 1;
  do {
      do_something();
      $i++;
  } until ($i >= 10);

Note that do {} is basically a command to execute the following block of code; because the until() test comes after the block, the block always runs at least once. (In early releases of Perl -- prior to 4.0 -- this was the main way of structuring code; it is now deprecated except when used in the context of a trailing condition such as until().)

Blocks of code can be executed under the control of some kind of condition. First, the if condition:

  if ($my_value == 7) {
      print "I am 7!";
  } elsif ($my_value > 7) {
      print "I am over 7!";
  } else {
      print "I am under 7!";
  }

if() takes an expression in brackets as its condition; if this evaluates as true, the following block is executed. elsif() introduces a subsidiary test, to be tried only if the first was false; else introduces a block to be executed if none of the preceding conditions was true.

The condition can be any Perl expression. For example:

  if (! open(MYFILE, "<myfile")) {
      print "Failed to open file myfile\n";
  }

The expression open(MYFILE, "<myfile") is executed. The result is negated, and used by if(); because open() only returns FALSE if it fails, this if() statement only triggers the associated block if the open() failed.

Note that this isn't the usual Perl way of testing an expression for success (even though it strongly smells of C): instead, the logical-OR operator || is frequently used as a kind of pseudo-if construct. For example:

  statement1 || statement2

Means 'execute statement1, then, if it evaluated to false, execute statement2'. To show how this is used:

  open(MYFILE, "<myfile") || die "Failed to open file myfile\n";

This relies on short-circuit evaluation: Perl evaluates the operands of a logical-OR from left to right and stops as soon as the result is known, so the right-hand side only runs if the left-hand side was false. The logical-OR (or another boolean operator) implicitly splits the line into two statements, one of which is dependent on the other. Keep an eye on the precedence of the boolean operators you use, though. Perl 5's English-language ``or'' and ``and'' operators have a much lower precedence than || and &&; if you replace an ``or'' with a || after a function that takes a list argument and was called without brackets, you'd better add some brackets -- or the || will bind to the last argument rather than to the function call, and you may be in for trouble.
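The precedence trap is easiest to see side by side. The following is a sketch rather than production code: the path in $missing is hypothetical and is assumed not to exist, and it uses the three-argument form of open() with a lexical filehandle instead of the bareword MYFILE used above.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on the machine running this.
my $missing = "/no/such/directory/myfile";

# Bracketed call: || applies to the result of open(), so die runs on failure.
my $bracketed_died =
    eval { open(my $fh, "<", $missing) || die "failed\n"; 1 } ? 0 : 1;

# No brackets: || binds to $missing (a true string), so die never runs
# and the failed open goes unnoticed.
my $unbracketed_died =
    eval { open my $fh, "<", $missing || die "failed\n"; 1 } ? 0 : 1;

# No brackets, but with 'or': its low precedence makes it apply to the
# whole open call.
my $or_died =
    eval { open my $fh, "<", $missing or die "failed\n"; 1 } ? 0 : 1;

print "$bracketed_died $unbracketed_died $or_died\n";   # prints "1 0 1"
```

Each eval block returns 1 only if nothing inside it died, so a 1 in the output means die was reached.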

The semi-colon ``;'' is used in Perl to end a statement; the comma ``,'' simply indicates that one item is followed by another. It's used as a separator in lists of items, and sometimes as a separator between statements (although the latter use is probably not a good idea under most circumstances because the comma has very low precedence; it may not do what you expect it to do, unlike the semi-colon). For example, we can use a comma in a list:

  @colors = ("red", "blue", "green", "octarine");

or between statements:

  $spam = "tinned meat", $color = "pink";

Blocks can also be executed conditionally under the control of complex flow-of-control constructs; for, foreach, while, and until. (Not to mention goto -- although this latter is considered unmentionable by most programmers.)

Blocks provide a context within which various Perlish magic takes place -- you can have variables which are localized within a block, and various flow of control operators that operate relative to the current block. We'll see how blocks affect variable scope later (when we deal with context).

The flow-of-control primitives aren't the only ways of structuring Perl programs, luckily (otherwise large programs would be unreadable and gnarly).

First, the foreach loop. This takes a scalar and a list as parameters. For each item in the list, the scalar is set to the item's value and the block of code is then executed. For example:

  foreach $color ("red", "blue", "green") {
      print "My color is $color\n";
  }

or:

  $i = 0;
  foreach $hashkey (sort ( keys(%database))) {
     print "$hashkey -> ", $database{$hashkey}, "\n";
     $i++;
  }
  print "Table contains $i elements\n";

This latter example isn't as complicated as it looks. Instead of a straight list or array, the 'list' here consists of two nested functions, each of which returns a list. keys() returns a list of all the keys to a hash; sort() in its simplest form sorts a list, returning another list. Thus, the perl statements in brackets evaluate to a list, which is what foreach() expects to see.

Note that when executing a foreach() loop, there's no way to tell where in the target list you are. That's what $i is for in the example above; it's a counter.

But foreach is a little bit magical. You can also say:

  foreach (@mylist) {
   :
  # Perl statements here
   :
  }

If you omit the scalar, foreach() assigns each element of its parameter list (in this case, the array @mylist) in turn to the special variable $_. $_ is a scalar, like any other; you can do just about anything with it that you'd do with any other scalar variable. However it's unique in that it's used as the repository for operations that return a scalar, when no variable is explicitly named. (If you've seen languages like AppleScript, it's equivalent to ``it'' -- the nameless current variable. In fact, if you get into the habit of pronouncing ``$_'' as ``it'', you'll be on the right track.)

There's more context-sensitive magic associated with foreach(). Suppose you do something like this:

  @numbers = qw(1 2 3 4 5 6 7 8 9);

  # qw() is short for "quote word" -- the text in the brackets is split
  # on whitespace, and each word is treated as a single-quoted string

  foreach $val (@numbers) {
     $val += 1;
  }

We defined an array containing the numbers 1 .. 9. We then iterated through the array, setting $val to each element in turn, and adding one to it.

At the end of this process, we've actually changed the values in the array, by incrementing them all. This is because when we say the scalar variable in the foreach() loop is set to the value of each array element in turn, we're actually using it as an alias for the array element -- changes to $val ``write through'' and modify the element of @numbers.
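A short sketch makes the write-through visible. The arrays and the copy-first variant are illustrative additions, not part of the example above:

```perl
use strict;
use warnings;

my @numbers = qw(1 2 3 4 5 6 7 8 9);

# $val is an alias for each element, so the array itself changes.
foreach my $val (@numbers) {
    $val += 1;
}
print "@numbers\n";    # prints "2 3 4 5 6 7 8 9 10"

# To leave the original untouched, work on a copy instead.
my @original = (1, 2, 3);
my @bumped   = @original;
$_ += 1 foreach @bumped;
print "@original / @bumped\n";    # prints "1 2 3 / 2 3 4"
```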

NOTE: A parenthetical diversion
Apropos this last example, some readers might wonder why I bothered using qw() to define the array. After all, doesn't this tell Perl implicitly that the elements of the array are all strings? Well, yes: and we then act on the strings in a numerical context. This is an important point for programmers used to C or similar languages -- scalars are not integers, char, float, or any other data type you're familiar with; they're scalars. If you're really into this sort of thing, they're based on a C data structure called an SV, or scalar value. But what Perl does with an SV has nothing whatsoever to do with what C would do with a string or integer or whatever.

To continue the parenthetical discourse a bit further, some other readers might be wondering why I didn't say:

  @numbers = 1..9;

The answer is: I haven't introduced the ``..'' operator yet -- or indeed, most of Perl's rich bestiary of operators. ``..'' is full of deep magic; what it does is wholly context dependent, as you'll see if you try to figure out what ``joe'' .. ``camel'' means to Perl. (Clue: don't try this on a machine with less than six terabytes of free memory.)

Final observation: if you meditate upon (``joe'' .. ``camel'') and what it evaluates to for long enough, you will understand the true meaning of magic, within the Perlish context. At that point either you will achieve enlightenment, or you will close this book and walk away in disgust, to return to the linguistically pure fields of Pascal and Prolog. Clue: like the dromedary, Perl ain't pretty -- but it works. (Unless you try to make it smoke.)

The for() loop is a kind of mutant foreach(); more precisely, the foreach loop is a special case of for(). Like the C language for() loop, it expects three parameters: a start condition, a stop condition, and something to do every time it executes. For example:

  for ($bananas = 1; $bananas < 8; $bananas++) {
      print "I have $bananas bananas!\n";
  }

$bananas = 1 is the initial condition, executed once before the loop starts. Before each pass, the second (stop) condition is tested; while it is true, the block associated with the loop is executed, and then the third expression ($bananas++) is executed. Thus, when $bananas reaches 8, the loop is no longer executed.
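One way to keep the order of events straight is to write the same loop out as a while loop. This is only a sketch; the arrays @for_seen and @while_seen are invented here to record what each loop does:

```perl
use strict;
use warnings;

# The C-style loop from the text, recording each value it sees.
my @for_seen;
for (my $bananas = 1; $bananas < 8; $bananas++) {
    push @for_seen, $bananas;
}

# The same thing spelled out: initialize once, test before each pass,
# run the body, then run the third expression.
my @while_seen;
my $bananas = 1;
while ($bananas < 8) {
    push @while_seen, $bananas;
    $bananas++;
}

print "@for_seen\n";      # prints "1 2 3 4 5 6 7"
print "@while_seen\n";    # prints "1 2 3 4 5 6 7"
```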

In Perl, all of these conditions are optional. It's quite valid to say something like:

  for (; ; ) {

  }

In which case the block of code is executed permanently (for as all Zen adepts and fans of Zeno know, nothing is ever true).

You can also do weird and whacky things with a for() loop:

  for (seek(DATAFILE, $offset, 0); 
       read(DATAFILE, $cursor, $rec_len);
       seek(DATAFILE, $offset, 1)) { 
       : 
       # do something abstruse with contents of $cursor
       :
  }

Here we scan through a file, skipping forward $offset bytes from the beginning, then reading $rec_len bytes into $cursor, executing the loop body, then skipping forward another $offset bytes (and so on). Weird, but possibly useful if you want to extract one specific field from a file containing fixed-length records -- although there are other, less obscure, ways of doing this sort of thing. (At this point it's worth emphasizing again that in Perl, just because something is possible, it does not necessarily follow that it is desirable, sensible, or even sane. Readability is usually more important than raw performance: computers get upgraded every year, but human brains remain the same, and if you have to maintain software there's nothing quite as frustrating as having to grapple with someone else's left-over entry from last year's Obfuscated Perl contest. for() loops with eldritch conditions attached probably don't contribute to your coding clarity.)


Sharp operators

Perl, like most computer languages, is swarming with operators. Operators are primitives which are used to build statements; we've already seen quite a few of them, like =~ (the pattern binding operator) or += (the add-and-assign operator) or > (the greater-than operator).

Operators have two important properties: associativity and precedence. Get these wrong and they won't do what you expect them to. They also have a third property, without which they won't work at all: number of required parameters. (In general, operators have one, two, or three parameters and they're referred to accordingly: as unary operators, which work on a single parameter; binary operators, which need two parameters; and ternary operators, which expect three parameters.)

Let's examine associativity first. Associativity is one of those algebraic concepts that have been looted by computer scientists and ruthlessly modified. In mathematics, it's a way of expressing the idea that brackets in some operations can be moved:

  A * B * C  = (A * B) * C = A * (B * C)

In Perl, associativity takes on a subtly different meaning: it says how operators of equal precedence group when they appear side by side. Subtraction is left-associative, so $a - $b - $c means ($a - $b) - $c. Assignment operators such as = and += are right-associative, so $a = $b = 1 means $a = ($b = 1): the right-most assignment happens first. Precedence matters just as much, and it's easier to get wrong. For example:

  $my_policy = $guns or &butter();

Doesn't necessarily mean what you think it means. The assignment operator ``='' has a higher precedence than the low-precedence logical-OR operator ``or''. (Precedence is a weighting the language associates with operators, to help figure out which is evaluated first when parsing a complex expression.) So first $guns is assigned to $my_policy; then, only if the assignment evaluated to false, &butter() is called -- and its result is thrown away. Did you want $my_policy to fall back on &butter() whenever $guns was false? If so, you need to bracket it:

  $my_policy = ( $guns or &butter() );

A trivial secret: when in doubt, always use brackets. It makes it easier to see what's going on -- both for you, and for Perl. Even if you think you can get away with saying $bread + $butter * $marmalade, it's probably best to disambiguate it by saying $bread + ($butter * $marmalade). Which is what you thought Perl thought you meant to begin with, isn't it?
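If in doubt about how Perl will group an expression, a throwaway script settles it quickly. The sketch below uses made-up values and a hypothetical butter() subroutine; the comments state the grouping Perl applies:

```perl
use strict;
use warnings;

sub butter { return "butter" }   # hypothetical stand-in for &butter()

my $guns = 0;

my $with_bars = $guns || butter();   # || binds tighter than =
my $with_or;
$with_or = $guns or butter();        # = binds tighter than or

my $left  = 10 - 4 - 3;    # left-associative:  (10 - 4) - 3
my $right = 2 ** 3 ** 2;   # right-associative: 2 ** (3 ** 2)
my $mixed = 2 + 3 * 4;     # * binds tighter than +

print "$with_bars $with_or $left $right $mixed\n";   # prints "butter 0 3 512 14"
```

In the ``or'' line, butter() is still called, because the assignment yielded a false value, but nothing is done with what it returns.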

A full list of the Perl operators, along with their precedence and associativity, is found on the perlop manual page. In general, these are a superset of the operators you get with C; the main exceptions are the * and & operators from C (to do with dereferencing pointers and getting the address of a variable), which simply aren't relevant to Perl. Be warned, however, that Perl is magical. When you call a function, you use brackets. When you indicate that a collection of scalars or strings are part of a list, you use brackets. When you try to clarify expressions involving operators, you use brackets. Perl is smart about brackets; in general, if something looks like a function it treats it as such, and if it doesn't look like a function it tries to figure out what it is. Perl's interpretation of brackets is thus complex, and although it ``feels'' right (especially if you're used to C) trying to define it in words of one syllable is not a pleasant exercise.

Read the perlop manual page. Otherwise you will be confused. (Even if you do read the manual page you'll be confused, but at least you'll understand why it's confusing.)

Another hint: there is a Perl 5 desktop reference available. All the operators and functions are grouped by category in an easy-to-find format. It's invaluable.

Perl operators

  Operator  Meaning                                 Example
  --------  --------------------------------------  ------------------------------
  **        exponentiation                          $i ** 2
  +         addition                                $a + $b
  -         subtraction                             $a - $b
  *         multiplication                          $a * $b
  /         division                                $a / $b
  %         modulo                                  $a % $b
  &         bitwise-AND                             $a & $b
  |         bitwise-OR                              $a | $b
  ^         bitwise-XOR                             $a ^ $b
  >>        bitwise-shift right                     $a >> 4
  <<        bitwise-shift left                      $a << 4
  ||        logical-OR                              make_love() || die;
  &&        logical-AND                             eat() && sleep();
  .         string concatenation                    $adjective . $noun
  x         string multiplier (repetition)          " " x 10
  ->        dereference                             $record->{fieldname}
  \         reference                               $hashref = \%hash
  !         logical-NOT                             !($true)
  ~         bitwise-complement                      ~$foo
  ++        autoincrement (post, pre)               $a++, ++$a
  --        autodecrement (post, pre)               $a--, --$a
  ==        numeric equality test                   $a == $b
  !=        numeric inequality test                 $a != $b
  eq        string equality                         $a eq "aardvark"
  ne        string inequality                       $bactrian ne "camel"
  gt        string greater-than                     $book gt "War and Peace"
  lt        string less-than                        $short_string lt "hi"
  le        string less-than-or-equals              $a le "ab"
  ge        string greater-than-or-equals           $b ge "frobnicate"
  cmp       string compare (returns -1, 0, or 1)    $a cmp $b
  <         numeric less-than                       $a < $b
  >         numeric greater-than                    $a > $b
  >=        numeric greater-than-or-equals          $a >= $b
  <=        numeric less-than-or-equals             $a <= $b
  <=>       numeric compare (returns -1, 0, or 1)   $a <=> $b
  =~        pattern binding                         $a =~ /camel.*/
  !~        negated pattern binding                 $a !~ /dromedary/
  ..        range operator                          1 .. 9
  ?:        conditional (ternary)                   $a == 1 ? print "$a = 1\n" : print "$a != 1\n";
  ,         comma/list item separator               $a, $b ...
  not       low-precedence logical-NOT              not $a
  and       low-precedence logical-AND              $a and $b
  or        low-precedence logical-OR               $a or $b
  xor       low-precedence logical-XOR              $a xor $b
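The reason Perl needs two parallel sets of comparison operators is that the operator, not the data, decides whether a scalar is treated as a number or a string. A brief sketch with invented values:

```perl
use strict;
use warnings;

my @mixed = (10, 9, 100, 1);

# <=> compares as numbers, cmp compares as strings.
my @as_numbers = sort { $a <=> $b } @mixed;
my @as_strings = sort { $a cmp $b } @mixed;

# The same pair of scalars can be equal one way and unequal the other.
my $numeric_equal = ("1.0" == 1)   ? "yes" : "no";
my $string_equal  = ("1.0" eq "1") ? "yes" : "no";

# x repeats a string.
my $repeated = "ab" x 3;

print "@as_numbers\n";                                  # prints "1 9 10 100"
print "@as_strings\n";                                  # prints "1 10 100 9"
print "$numeric_equal $string_equal $repeated\n";       # prints "yes no ababab"
```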


Subroutines

So far, we've been behaving as if it's possible to cram the entire functionality of a Perl program into a single series of expressions and blocks. However, this isn't a very sensible way of structuring a program. It doesn't encourage re-use of common chunks of code, and it can make a large program hard to understand.

Perl (like all modern languages) allows you to encapsulate frequently used chunks of code in subroutines. A subroutine is a mini-programmette, containing a block of code, that can be referred to by name. You declare a subroutine like this:

  sub mysub {
     :

  } # end of mysub

And you invoke it like this:

  &mysub($arg1, $arg2);

(The leading ampersand ``&'' is optional, but helps to clarify the issue; Perl has some odd ideas about what to do with bare, unqualified words occurring in a program, and the ampersand tells it that the following name is a subroutine.)

It's pretty important not to give a subroutine the same name as some existing Perl function or built-in variable -- otherwise you may discover some new, interesting, and obscure features of the language that are not quite what you wanted.

You can pass arguments to a subroutine, just like a built-in function. You specify them (as above) in brackets; from inside the subroutine, they look like a simple list, which is stored in the special variable @_.

So ...

  $foo = "hello";
  $bar = "world";
  @quux = qw(have a nice day);

  sub print_args {
      foreach (@_) {
          print "$_ ... ";
      } 
  }

  &print_args($foo, $bar, @quux);

will print:

  hello ... world ... have ... a ... nice ... day ... 

It's important to note that the array and the scalars you passed to this subroutine were all munged in together into the single array @_. You can pass multiple arrays to a subroutine, but they'll be jammed into @_ together. The same goes for hashes; as a hash looks like an unordered list of alternating key/value items in a list context, a hash you pass to a subroutine will end up being jammed into @_. One size fits all, on the receiving end. There're two ways round this, using typeglobs and references, that we'll see later. For now, just remember: parameters to a subroutine end up in @_ in the order you pass them.
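A quick way to convince yourself of the flattening is to count what arrives. The subroutine count_args() and the sample data are hypothetical, invented for this sketch:

```perl
use strict;
use warnings;

# Returns the number of items that ended up in @_.
sub count_args {
    return scalar(@_);
}

my @first  = (1, 2, 3);
my @second = (4, 5);
my %pairs  = (color => "red", beast => "camel");

my $two_arrays = count_args(@first, @second);   # 3 + 2 elements
my $one_hash   = count_args(%pairs);            # 2 keys + 2 values
my $mixed      = count_args("x", @first);       # 1 scalar + 3 elements

print "$two_arrays $one_hash $mixed\n";         # prints "5 4 4"
```

From inside count_args() there is no way to tell where @first ended and @second began; that boundary is lost at the call.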

Subroutines don't have 'local' variables in quite the same sense as a traditional structured language like C or Pascal. Originally (in the dim and distant pre-history of Perl) Perl had only global variables -- a variable would be visible anywhere in a program. This was unsatisfactory, so a facility was added to tag a variable as being 'local' to a subroutine or package (collection of subroutines glommed together under one name). You declare a variable as being local to a function like this:

  sub my_func {
      local ($my_func_var);
      :
      :
  }

If you have a variable of the same name in use outside of &my_func(), its value is maintained entirely independently of $my_func_var within &my_func().

For example:

  sub change_num {
      local ($number);
      $number = 6;
      print "$number\n";
  }

  $number = 4;
  print "$number\n";
  &change_num;
  print "$number\n";

Prints:

  4
  6
  4

Perl does this by saving the value of a local variable outside the current symbol table, and restoring it when the program exits the block to which the variable is localized.

A symbol table is essentially where Perl keeps track of variables. You can think of a symbol table as being a special kind of hash; for a given scalar, the key is its name, while the value stored in the symbol table is whatever the scalar is supposed to contain. (The same thing goes for arrays and other hashes.)

[Diagram: two symbol tables coexisting in one perl interpreter]

When we initialize $number (with the value 4), we are creating a slot in the symbol table; the key is 'number', and it contains '4'.

As the program runs, it hits the call to &change_num(). The first instruction in &change_num() is 'local($number)'. Perl saves the current (global) value of $number in a safe location until it exits the enclosing block (by leaving &change_num()). On exit, it throws away whatever was stored in $number inside the block and restores the old value.

Thus, while you're in a block within which a variable is declared to be local, you can't get at its global value (as visible to the rest of the program).

[Diagram: save/restore of local variables]

This mechanism for keeping variables local to subroutines is called dynamic scoping; Perl has to dynamically save the value of the variable. It's not very fast, and it has drawbacks (which we'll see later), but it was the standard mechanism for variable scope control in Perl 4 and it's still used for some purposes. The important point to note is that the contents of a local variable are lost when Perl leaves the block in which the variable is defined.

(The 'Perl 5 way' of dealing with scope is to use my(), which is described later.)
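As a foretaste of the difference, here is a small sketch with invented subroutine names. It assumes nothing beyond local() and my(): a localized value is visible to any subroutine called from inside the block, whereas a my() variable is visible only to the code textually inside the block.

```perl
use strict;
use warnings;

our $number = 4;                     # a global (package) variable

sub show_number { return $number }   # always reads the global

sub with_local {
    local $number = 6;               # temporarily replaces the global's value
    return show_number();            # the callee sees 6
}

sub with_my {
    my $number = 6;                  # a separate, private variable
    return show_number();            # the callee still sees the global, 4
}

print with_local(), " ", with_my(), " ", $number, "\n";   # prints "6 4 4"
```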

Now we can begin to see how Perl handles functions. For example, here's a simple function to compute the factorial of a number:

  $n = 3;
  sub fact {
  
     local ($f) = shift @_;
     local ($fact) = $f;
     $fact--;
  
     print "fact(): running $f * fact($fact)\n";
     ($fact > 0) ? return ($f * fact($fact)) : return $f;
  }

  print "The factorial of $n is ", &fact($n), "\n";

Which gives us:

  fact(): running 3 * fact(2)
  fact(): running 2 * fact(1)
  fact(): running 1 * fact(0)
  The factorial of 3 is 6

Some explanations are required.

shift() takes the first element from a list and returns it, shortening the list by one. It's commonly used for getting at the first item in a parameter list passed to a subroutine.

The print statement is included for clarity's sake; it demonstrates that we're calling fact() recursively. Every time we invoke fact() we take the current value (say, 'n'), and multiply it by fact(n - 1), while n - 1 > 0. If n - 1 is no longer greater than zero, we return the number fact() was invoked with.

The variables $f and $fact are declared as local each time fact() is reinvoked. If they weren't local, this subroutine couldn't work; it would end up returning 1 (because it is perpetually decrementing the variable $fact then calling itself again).

A Perl subroutine returns either the result of the last expression in the subroutine, or a value specified by the command 'return'.

Note that there are several things wrong with fact(), as defined here. For one thing, it would be better to use my(), as we'll see later. For another thing, it makes no checks to ensure that the value you pass it is sensible -- meaning, an integer number as opposed to an imaginary number, a file handle, or a section of blank verse. For a third thing, it uses recursion freely. Perl can often warn you if you're in danger of running into a runaway recursive loop, but it's generally better practice to unroll recursive functions where possible by using a stack; we'll discuss recursion at greater length in the next chapter. Suffice to say, all this subroutine demonstrates is that Perl can handle recursion, and local() works as advertised.
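For comparison, here is one way those objections might be addressed. This is a sketch under stated assumptions rather than a definitive rewrite: the name fact_iter() is invented, it uses my() ahead of its proper introduction, it accepts only non-negative integers written as plain digits, and it replaces the recursion with a simple loop (no explicit stack is needed for a factorial).

```perl
use strict;
use warnings;

# Iterative factorial with a basic sanity check on its argument.
sub fact_iter {
    my ($n) = @_;
    die "fact_iter() needs a non-negative integer\n"
        unless defined $n && $n =~ /^\d+$/;

    my $result = 1;
    $result *= $_ for 2 .. $n;   # the range is empty when $n is 0 or 1
    return $result;
}

print "The factorial of 3 is ", fact_iter(3), "\n";   # prints "The factorial of 3 is 6"
```

Unlike fact() above, this version returns 1 for an argument of 0, which is the conventional value of 0!.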


More about context

The biggest stumbling block for experienced programmers meeting Perl for the first time is variable context.

Most programming languages are concerned with the contents of variables; Perl is concerned with their plurality.

Most Perl built-in functions and operators behave differently depending on whether you're attempting to apply them to a scalar or a list (including an array). (Hashes are treated essentially as a special case of arrays.) Just as the common programming errors in C include trying to assign to the wrong variable type, in Perl the commonest error is calling a function in the wrong context.

For example, take grep() (as described in the last chapter). You use grep to iterate through an array, applying an expression to each element, and storing the results (for which the expression was true) in an output array. But you can also use grep in a scalar context -- in which case, it returns the number of matches the expression found. To illustrate:

  @days = qw(monday tuesday wednesday thursday friday saturday sunday);
  @found = grep(!/s+/, @days); # look for days with no 's' in them
  print join(" ", @found), "\n";
  $found = grep(!/s+/, @days); # same, in scalar context
  print $found, "\n";

The first print statement will print:

  monday friday

The second print statement will print:

  2

The problem, of course, is that context errors tend to propagate in Perl programs. Calling grep() in scalar context rather than list context doesn't cause a run-time exception -- it just delivers the wrong answer. Consider the following assignments:

  @new_array = ( $found );

  @new_array = ( @found );

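Assigning to an array supplies a list context, so both assignments are legal and in both cases @new_array is really an array. But the first is a one-element array holding the count, while the second holds the matching strings -- and Perl won't complain about either.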

Put this together with Perl's operator-overloading (based on context rather than data type) and you have a potent recipe for obscure programmatic bugs.
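When the context matters, it helps to make it explicit. The sketch below reuses the days-of-the-week data; the variable names are invented for illustration. Brackets around the left-hand side request a list, and scalar() requests a single value.

```perl
use strict;
use warnings;

my @days = qw(monday tuesday wednesday thursday friday saturday sunday);

my @found   = grep { !/s/ } @days;   # list context: the matches
my $count   = grep { !/s/ } @days;   # scalar context: how many matched
my ($first) = grep { !/s/ } @days;   # list context: first match only
my $size    = scalar(@found);        # explicit scalar context

print "@found | $count | $first | $size\n";   # prints "monday friday | 2 | monday | 2"
```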


Getting context right using Prototypes

There are a couple of tools you can use to help ensure you've got the correct context, of course. Firstly, there's the built-in function wantarray. wantarray is a bit of magic; it returns true if the currently executing subroutine is expected to return a list, and false if it is expected to return a scalar. So you could write a 'safe' grep, which always returns a list of the matches it finds -- either as a real list, or as a scalar containing strings separated by spaces:

  sub safe_grep {
     my ($expr) = shift @_;
     my (@array) = @_;
     my (@result) = grep(/$expr/, @array);
     if (wantarray) {
         return @result;
     } else {
         return join(" ", @result); 
     }
  }

Note the use of my() to indicate the local variables. We'll see the reason for using my() later. Note also that this could be improved considerably by using some advanced techniques.

We've covered the issue of a function returning the right data in context, but what about controlling the other end of things -- the input it expects? It would help if we could ensure that our subroutine complains if it isn't supplied with parameters matching some pre-specified template (or prototype, to use the C language terminology which keeps rearing its ugly head). In most sane high-level languages you can declare a subroutine like this:

  Function SafeGrep (Expr : string;
                     Array: array [80] of string) : string;
  BEGIN
      { Pascal version  -- or rather, Pascal with strings }
      :
  END;

or:

  char *safe_grep(char *expr, char **array) {
      /* ANSI C version */
      :
  }

Prototypes (the data definition in each function declaration) provide a standard interface to the functions, enabling the compiler to throw out badly-formed function calls at compile time.

Perl 5.003 and later have a prototyping facility. It's not mandatory -- in fact, it's not even commonly used -- but it's good style and it helps trap bugs, so we'll try and use it from now on. You use it like this:

  sub safe_grep($@) {
      :
  }

The brackets () after the subroutine name contain the prototype string. The prototype string is a set of symbols which indicate the required parameters of the subroutine. A '$' symbol means ``expect a scalar at this point''; an '@' sign means ``expect a list or array here'', and a '%' sign means ``expect a hash''. There are some other symbols you can use; '*' for a typeglob, '&' to pass an anonymous subroutine, and '\' prefixing any of the other arguments to indicate a reference to the data type. (Don't worry if these don't mean anything to you yet.)

You can include optional arguments in a subroutine prototype; they're separated from the mandatory arguments by a semi-colon. So a subroutine declaration like:

  sub my_complex_sub($;@)

means that my_complex_sub expects a scalar argument, and an optional list.
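One consequence worth seeing in action: a '$' in a prototype imposes scalar context on the corresponding argument. In this sketch (describe_args() is a hypothetical subroutine), passing an array where the scalar is expected hands over the array's length rather than its first element. Prototypes only take effect when the subroutine is declared before the call and is called without a leading ampersand.

```perl
use strict;
use warnings;

# Hypothetical subroutine: one mandatory scalar, then an optional list.
sub describe_args($;@) {
    my ($first, @rest) = @_;
    return "$first|" . scalar(@rest);
}

my @list = qw(a b c);

my $explicit = describe_args("x", @list);    # scalar "x", then three items
my $implicit = describe_args(@list);         # @list in scalar context is 3
my $proto    = prototype(\&describe_args);   # the prototype string itself

print "$explicit $implicit $proto\n";        # prints "x|3 3|0 $;@"
```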


Passing complex arguments

Perl isn't obsessive about distinguishing between arcane parameter-passing modes such as call-by-value and call-by-reference, and it doesn't worry about formal and actual parameters. (If these terms don't mean anything to you, don't worry.) It's designed to make it easy to pass variable-sized lists of data into a subroutine, and to return a variable-length list of values.

This has its drawbacks. A Perl subroutine isn't insulated from the outside world. Unless it declares all its variables to be local, other subroutines can (in principle) reach in and twiddle with its guts. And seeing the outside world only in terms of a single list (@_) is a bit limiting.

How can we ensure that variables used in a subroutine really are unique to that subroutine, and how can we pass complex parameters (such as multiple arrays or hashes) in and out of a subroutine?

As Perl has evolved organically, there are two ways to accomplish each task. First, the Perl 4 way; use local() to dynamically save the global state of a variable while in a local subroutine, and use typeglobs to pass parameters. Next, the Perl 5 way; use my() to create lexically scoped variables, and use references to pass parameters. There's something to be said for each technique, so I'll explain them both.

As noted earlier, Perl stores its variables as slots in a symbol table -- a kind of global hash used by the system to keep track of named entities. You can actually have more than one symbol table in use in a Perl program; they're associated with things called 'packages' and 'modules' which will be described in Chapter 7 (data encapsulation).

Anyway, as noted earlier, local() temporarily copies the value associated with a variable name onto a hidden stack, and restores it when Perl leaves the enclosing construct (such as a subroutine). Thus, the value associated with the variable within that subroutine is insulated from its global value in the symbol table.
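
A minimal sketch of the difference between the two approaches, assuming a global called $mood and three throwaway subroutines (all of these names are invented for the example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $mood = "calm";              # a global, living in the symbol table

sub report { return $mood }      # always reads the global $mood

sub with_local {
    local $mood = "frantic";     # global value saved, then replaced
    return report();             # report() sees the temporary value
}                                # saved value restored on the way out

sub with_my {
    my $mood = "frantic";        # a private variable; the global is untouched
    return report();             # report() still sees the global
}

print with_local(), "\n";        # frantic
print with_my(), "\n";           # calm
print $mood, "\n";               # calm
```

The point to notice is that local() changes what every other subroutine sees for the duration of the call, whereas my() creates a variable that nothing outside the enclosing block can see at all.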

A typeglob is another operation you can perform on the symbol table; in effect, it bundles all the variables that have a common name together, in a form which can be passed through to a subroutine as if it were a scalar. For example, suppose we have a hash called %my_hash. We can take a typeglob to it by referring to *my_hash. *my_hash is simply a handle for everything in the symbol table called my_hash -- for $my_hash, @my_hash, and %my_hash (if they exist). We can then pass *my_hash into a subroutine as if it were a scalar, and get at %my_hash through it, like this:

  #!/usr/bin/perl

  %my_hash = (
      "color" => "red",
      "music" => "ambient",
      "taste" => "chocolate"
     ) ;  # define %my_hash;

  %your_hash = (
      "program" => "test5.pl",
      "language" => "perl",
      "beast" => "camel"
     ); # define a second hash

  sub print_hashes {
     local ($hashes_to_print) = shift @_;
     local (*tmp_hash);    # localize the glob we are about to alias
     foreach $num (1 .. $hashes_to_print) {
          *tmp_hash = shift @_;
          print "Printing hash $num\n";
          foreach $key (keys %tmp_hash) {
               print "$key => ", $tmp_hash{$key}, "\n";
          }
      }
  }

  print_hashes(2, *my_hash, *your_hash);

In this example, we can see how Perl 4's typeglob mechanism lets us pass variable numbers of hashes (or arrays) to a subroutine.

First, we define a couple of hashes (containing sample data). (Note that the => notation is a Perl 5 thing -- syntactic sugar that could equally well be replaced by a comma.)

The meat of this example is the subroutine print_hashes(). This is intended to show how we can pass two or more hashes to a subroutine. The first parameter is a scalar, which tells print_hashes how many associative arrays to expect. (This is important because all the actual data is mashed into @_ when it's passed into the subroutine's context; this normally conceals the number of arrays or hashes being passed to a subroutine.) The second and subsequent arguments are typeglobs with the same name as the hashes we want to pass. Thus, for a hash called %myvars, you would pass print_hashes the typeglob *myvars.

Inside print_hashes(), first we read the number of hashes we're going to receive, then we enter a loop. We use shift() to pull typeglobs off the @_ array, making sure to use a local variable -- the typeglob thus aliases the external hash (say, %my_hash) to the local hash %tmp_hash.

In effect, the typeglob mechanism is used to create an alias for a variable -- a second name that can be used to get to its contents from within some alien context.
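
Because the alias is a second name for the same data, changes made through it show up in the original. Here is a small sketch of that, assuming an array called @scores and a subroutine called double_all (both invented for this example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

our @scores = (1, 2, 3);

sub double_all {
    our @list;                   # the package variable we will alias
    local *list = shift @_;      # *list now stands in for the caller's glob
    foreach my $item (@list) {
        $item *= 2;              # this changes the caller's array
    }
    return scalar(@list);        # number of elements processed
}

double_all(*scores);
print "@scores\n";               # 2 4 6
```

When double_all() returns, local() puts *list back the way it was, so the alias only exists for the duration of the call.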


The Perl 5 Way

Typeglobs aren't commonly used in Perl 5 (at least, not for passing multiple arrays as parameters to subroutines), because there's a better way: references.

We've seen that variables are names that are related in some way to a chunk of data that Perl knows the location of. A reference is simply the actual location of the data itself. You can store the location of a variable in a scalar, and use it to get at the variable's contents. So if you have a hash or an array, you can take a reference to it (that is, ask Perl where it is held in memory), and store the location in a scalar -- and, by de-referencing the scalar, relocate the original data.

Here's the general idea:

[Figure: A hash and an anonymous hash]

You can take a reference to a variable by prefixing it with a backslash. For example:

  $my_ref = \%some_hash;

Stores a reference to %some_hash in the scalar $my_ref. While:

  $my_array = \@list;

Stores a reference to the array @list in $my_array.

You can also store references to scalars:

  $my_scalar_ref = \$scalar;

$my_ref (or $my_array, or $my_scalar_ref) are just ordinary scalars. But if we print them out we see something like this:

  HASH(0x80be8d4)

or

  ARRAY(0x80be8d4)

This tells us that we're looking, not at a scalar's value, but at a reference (to the named kind of item). The hexadecimal number in brackets identifies where Perl is keeping the item -- but it's no use to us, as it's just a stringified representation of an underlying ``hidden'' piece of data. Unlike in C or C++, we can't do arithmetic on a reference to reach some other address in memory; references are not raw memory locations, but identifiers that Perl uses to find the allocated storage occupied by an item.
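
If you want to know what kind of thing a reference points to without picking the printed string apart, the built-in ref() function will tell you. A brief sketch (the variable contents here are made up; the actual hexadecimal number will differ from run to run):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %some_hash = ( "color" => "red" );
my @list      = (1, 2, 3);

my $my_ref   = \%some_hash;
my $my_array = \@list;

print "$my_ref\n";              # something like HASH(0x80be8d4)
print ref($my_ref), "\n";       # HASH
print ref($my_array), "\n";     # ARRAY
print "not a reference\n" if ref("plain") eq "";
```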

Perhaps the best way of thinking about this is to think of a variable as being two distinct things: a name (held in Perl's symbol table), and an underlying ``thingy'' which stores some data. (``Thingy'' is the approved technical Perl terminology for talking about underlying stored data.) Normally, we get at a thingy by referring to its name; but a thingy can exist without having an explicit name, in which case the only way we can get to it is by knowing where it is stored. A reference is simply the location where it is stashed away by Perl.

How do we get back at the data stored in an array or hash that we have a reference for?

Suppose we write:

  my (%hash) = ( "color"  => "red",
                 "number" => 3,
                 "quark"  => "charm");

  my ($hashref) = \%hash;

The approved way to get back at our original hash is to take our reference and explicitly refer to it in a hash context:

  print $$hashref{quark}, "\n";

which outputs:

  charm

$hashref is a reference to a hash. We can talk about the underlying hash by referring to %$hashref: the hash pointed to by $hashref. The ``%'' prefix indicates that we're talking about it in a hash context. $$hashref{quark} is equivalent to $hash{quark}.

This is an important rule for dealing with references: prefix a variable containing a reference with an extra variable type specifier, and Perl will try to treat the thingy behind the reference as if it's of that type.
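
To make the rule concrete, here is a short sketch that applies it to a scalar, an array, and a hash in turn. (The variable names and their contents are invented for the example.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $scalar = "apples";
my @array  = qw(red blue green);
my %hash   = ( "quark" => "charm" );

my $sref = \$scalar;
my $aref = \@array;
my $href = \%hash;

print $$sref, "\n";                   # the scalar itself:   apples
print "@$aref\n";                     # the whole array:     red blue green
print $$aref[1], "\n";                # one array element:   blue
print join(",", keys %$href), "\n";   # the whole hash:      quark
print $$href{quark}, "\n";            # one hash element:    charm
```

In each case the extra prefix is the one you would have used on the original variable: ``$'' for a single value, ``@'' for a whole array, ``%'' for a whole hash.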

As with so many other things in Perl, there is More Than One Way To Do It. In particular, it can get kind of hairy to talk about variables like $$$fruitbat{whatever}. More seriously, suppose we want to use a multi-dimensional array, or a multi-dimensional hash. In C, we can refer to elements in a two-dimensional array, like array[1][4], the fifth item in the second row of the array (C subscripts, like Perl's, start at zero). In Perl, we can have an array of references, each of which points to another array. But how do we talk about the underlying item?

We use curly brackets {} to make references less ambiguous. For example, %{$hashref} is equivalent to %$hashref, but is a lot easier to read. For this to work we are using {$hashref} (a tiny block of code which returns the contents of $hashref) as a variable's name. This relies on the fact that a block of code returns either the last variable specified in it, or the result of the last statement executed in it. If we enclose a scalar containing a reference in a block, it returns the reference.
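
A quick sketch of the two spellings side by side, reusing the %hash and $hashref from the earlier example (the array @refs is an extra, invented here to show a case where the brackets are doing real work):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash = ( "color"  => "red",
             "number" => 3,
             "quark"  => "charm" );
my $hashref = \%hash;

my @plain  = sort keys %$hashref;      # without curly brackets
my @braced = sort keys %{$hashref};    # with curly brackets
print "@plain\n";                      # color number quark
print "both forms agree\n" if "@plain" eq "@braced";

print ${$hashref}{quark}, "\n";        # charm

# When the reference comes from an expression rather than a simple
# scalar, the brackets say where that expression begins and ends.
my @refs = ($hashref);
print ${ $refs[0] }{color}, "\n";      # red
```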

Thus, suppose we have an array of references, each of which points to another array. Let's define it:

  @array1 = qw( red blue green );
  @array2 = qw( yellow purple violet );
  @array3 = qw( mauve pink gold );

  @two_d_array = ( \@array1, \@array2, \@array3);

@two_d_array is an array of references, each of which refers to an array defined previously. (Yes, there are less cumbersome ways of defining it; we'll meet them later.)

How do we examine the second element of @array2, via @two_d_array?

First, we know that the second element of @two_d_array is a reference to the target array (@array2). We can get the reference we need like this:

  $two_d_array[1]

(remember, array subscripts are numbered from zero in Perl, by default; and a single element of an array is written with a ``$'' prefix).

So we enclose it in curly brackets, then add a subscript to the desired element of @array2:

  {$two_d_array[1]}[1]

But this isn't going to work, because we haven't told Perl whether we're treating this in a list or scalar context -- as an array slice, or as an individual element of an array. So we need to prefix it with an ``@'' sign or a ``$'' symbol, respectively:

  print ${ $two_d_array[1] }[1], "\n";

which prints:

  purple
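
The same notation extends naturally to visiting every row and column. As a sketch (the helper cell() is invented for this example; the arrays are the ones defined above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array1 = qw( red blue green );
my @array2 = qw( yellow purple violet );
my @array3 = qw( mauve pink gold );

my @two_d_array = ( \@array1, \@array2, \@array3 );

# Hypothetical helper: fetch one element by row and column.
sub cell {
    my ($row, $col) = @_;
    return ${ $two_d_array[$row] }[$col];
}

foreach my $row (0 .. $#two_d_array) {
    my @cells = @{ $two_d_array[$row] };   # the ``@'' prefix: a whole row
    print "row $row: @cells\n";
}

print cell(1, 1), "\n";                    # purple
```

Here the ``@'' prefix pulls out a whole row at once, while the ``$'' prefix inside cell() pulls out a single element.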

To wrap up the job of getting multiple hashes or arrays into a subroutine: the Perl 5 way is to pass references. For a trivial example:

  my (@insects) = qw(ant bee moth wasp);
  my (@stings) = qw(mild painful non-existent vicious);

  describe(\@insects, \@stings);

  sub describe {
      my ($creepie) = shift @_;
      my ($bite)    = shift @_;

      for (my $i = 0; $i < @$creepie; $i++) {
         print "A ", $$creepie[$i], " has a ", $$bite[$i], " sting\n";
      }
  }

Which results in:

  A ant has a mild sting
  A bee has a painful sting
  A moth has a non-existent sting
  A wasp has a vicious sting  

The important thing to note here is that the arrays @insects and @stings have been kept separate while being passed through to describe(). If we had passed the arrays through, rather than references to them, they'd have been munged together in @_.
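
You can see the munging for yourself with a subroutine that does nothing but count what arrives in @_. A small sketch (count_args is an invented name):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my (@insects) = qw(ant bee moth wasp);
my (@stings)  = qw(mild painful non-existent vicious);

# Hypothetical helper: report how many items turned up in @_.
sub count_args {
    return scalar(@_);
}

print count_args(@insects, @stings), "\n";     # 8: one flat list
print count_args(\@insects, \@stings), "\n";   # 2: two references
```

In the first call there is no way for the subroutine to tell where @insects stops and @stings starts; in the second, each array arrives as a single, separate scalar.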

We're going to see a lot more uses for references, and other ways of deploying them, in the next chapter -- on data structures in Perl.


The Swiss-Army Chainsaw: built-in functions

All these ways of handling variables may have made you dizzy, but we've still barely scratched the surface. In addition to a very powerful language syntax, Perl has a plethora of built-in functions for doing various things. Perl was designed to be complete, not simple: as a result, most of the system interaction tasks that a C programmer would use external libraries for are actually built in. Perl is like a Swiss army knife: you can never be quite sure how many blades it has for dealing with exotic, specialized problems, until you've used them all.

We're not going to examine Perl's toolkit in any great detail at this stage. If we did, this chapter would be as long as a book! If you need to examine them in detail, see the perlfunc manual page, or Chapter 3 of ``Programming Perl'' (both of which run to about a hundred pages).

Meanwhile, here is a list of built-in functions (in Perl 5), organized by category. (This list is reproduced from the manual page.)

built-in functions
       Functions for SCALARs or strings
            chomp, chop, chr, crypt, hex, index, lc, lcfirst,
            length, oct, ord, pack, q/STRING/, qq/STRING/,
            reverse, rindex, sprintf, substr, tr///, uc, ucfirst,
            y///

       Regular expressions and pattern matching
            m//, pos, quotemeta, s///, split, study

       Numeric functions
            abs, atan2, cos, exp, hex, int, log, oct, rand, sin,
            sqrt, srand

       Functions for real @ARRAYs
            pop, push, shift, splice, unshift

       Functions for list data
            grep, join, map, qw/STRING/, reverse, sort, unpack

       Functions for real %HASHes
            delete, each, exists, keys, values

       Input and output functions
            binmode, close, closedir, dbmclose, dbmopen, die,
            eof, fileno, flock, format, getc, print, printf,
            read, readdir, rewinddir, seek, seekdir, select,
            syscall, sysread, syswrite, tell, telldir, truncate,
            warn, write
 
       Functions for fixed length data or records
            pack, read, syscall, sysread, syswrite, unpack, vec

       Functions for filehandles, files, or directories
            -X, chdir, chmod, chown, chroot, fcntl, glob, ioctl,
            link, lstat, mkdir, open, opendir, readlink, rename,
            rmdir, stat, symlink, umask, unlink, utime

       Keywords related to the control flow of your perl program
            caller, continue, die, do, dump, eval, exit, goto,
            last, next, redo, return, sub, wantarray
 
       Keywords related to scoping
            caller, import, local, my, package, use

       Miscellaneous functions
            defined, dump, eval, formline, local, my, reset,
            scalar, undef, wantarray

       Functions for processes and process groups
            alarm, exec, fork, getpgrp, getppid, getpriority,
            kill, pipe, qx/STRING/, setpgrp, setpriority, sleep,
            system, times, wait, waitpid

       Keywords related to perl modules
            do, import, no, package, require, use

       Keywords related to classes and object-orientedness
            bless, dbmclose, dbmopen, package, ref, tie, tied,
            untie, use

       Low-level socket functions
            accept, bind, connect, getpeername, getsockname,
            getsockopt, listen, recv, send, setsockopt, shutdown,
            socket, socketpair

       System V interprocess communication functions
            msgctl, msgget, msgrcv, msgsnd, semctl, semget,
            semop, shmctl, shmget, shmread, shmwrite

       Fetching user and group info
            endgrent, endhostent, endnetent, endpwent, getgrent,
            getgrgid, getgrnam, getlogin, getpwent, getpwnam,
            getpwuid, setgrent, setpwent

       Fetching network info
            endprotoent, endservent, gethostbyaddr,
            gethostbyname, gethostent, getnetbyaddr,
            getnetbyname, getnetent, getprotobyname,
            getprotobynumber, getprotoent, getservbyname,
            getservbyport, getservent, sethostent, setnetent,
            setprotoent, setservent
 
       Time-related functions
            gmtime, localtime, time, times
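
To give a flavour of how these fit together, here is a small sampler that uses a handful of functions from the string, list, and numeric categories. (The data is invented; the functions are all taken from the list above.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = qw(camel llama alpaca);

my $line      = join(", ", sort @words);       # list functions
my $shout     = uc($line);                     # string function
my @lengths   = map { length($_) } @words;     # map plus length
my $backwards = scalar reverse("camel");       # reverse in scalar context
my $root      = sprintf("%.2f", sqrt(2));      # numeric plus formatting

print "$line\n";        # alpaca, camel, llama
print "$shout\n";       # ALPACA, CAMEL, LLAMA
print "@lengths\n";     # 5 5 6
print "$backwards\n";   # lemac
print "$root\n";        # 1.41
```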


Practical tools for practical tasks: UNIX pipelines

Up until now, we've been focussing on fragments of code that demonstrate specific features of Perl. This is all well and good, but by now you probably know enough that you're becoming impatient and want to write your own programs. As it happens, you still don't know enough to do anything constructive in Perl/Tk ... but straight Perl is another matter.

Perl programs that run on the UNIX (or DOS) command line are structurally simple and fairly easy to write. While they aren't interactive, don't have menus, buttons, bells, or whistles, and don't seem a lot like the sort of applications we're aiming at, simple filter commands are quite useful.

A filter in this context is a program that reads data in at one end and spits it out at the other, having transformed it en route. UNIX filter commands typically read from their standard input (the file handle STDIN) and print output on the standard output (the file handle STDOUT); warnings or other information goes to the standard error (STDERR), which is usually merged with STDOUT if you're running them on a terminal.

There are some variations on this, of course. Filters may also be given the name of a file or files to read, and flags can be passed to them that change their behaviour. But in general, the same principle applies; we use filter-type programs to run some useful transformation on a batch of data.

The general structure of a filter is fairly simple: stuff goes in, is churned about, and different stuff comes out.

[Figure: Flow of control in a filter]

When the program is executed, the first thing it does is check for command-line arguments. These are passed to it in the array @ARGV. If there are arguments, it modifies its behaviour appropriately and (if files are specified) starts reading data from them. If there aren't any, it starts reading data from the standard input instead. (There's a packaged library of Perl code for reading command-line options called getopt; we'll see how to use it later.)

While there is some input data to read, the program loops and processes it, doing something or other. The usual result of processing some data is to transform it, and print the transformed result on the standard output.

This kind of non-interactive program is relatively simple: filters don't back up, don't have complex behaviour that may be context-dependent, and they don't usually produce complex graphics.
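
Stripped to its bones, the read-transform-print loop might look something like the sketch below. To keep it self-contained, it reads from and writes to strings held in memory rather than real files; the helper run_filter() and the sample text are invented for the example, and the transformation (upper-casing each line) is just a stand-in for real work.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, upper-casing each one.
sub run_filter {
    my ($in, $out) = @_;
    my $count = 0;
    while (defined(my $line = <$in>)) {
        print {$out} uc($line);        # transform, then emit
        $count++;
    }
    return $count;
}

my $input  = "one fish\ntwo fish\n";
my $output = "";

open(my $in,  "<", \$input)  || die "cannot read from string: $!";
open(my $out, ">", \$output) || die "cannot write to string: $!";
my $lines = run_filter($in, $out);
close($in);
close($out);

print $output;                               # ONE FISH, then TWO FISH
print STDERR "processed $lines lines\n";     # chatter goes to STDERR
```

Passing \*STDIN and \*STDOUT to run_filter() instead of the in-memory handles would turn this into a filter of the usual command-line kind.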

Here's an example of a simple pipeline that counts the words in its input stream (after the example fragment in the previous chapter):

  #!/usr/bin/perl

  if (scalar (@ARGV) > 0) {   # if we have files to read ...
     foreach (@ARGV) {
        unless (open(INPUT, "< $_")) {
           print STDERR "Failed to open $_: $!\n";
           next;                # skip files we can't read
        }
        do_count($_, <INPUT>);
        close(INPUT);
     }
  } else {
      do_count("stdin", <STDIN>);
  }
  exit;

  sub do_count {
      my ($filename) = shift @_;
      my (@file) = @_;
      my ($words) = 0;
      foreach (@file) {
         # splitting on ' ' ignores leading whitespace, so indented
         # lines don't contribute a spurious empty "word"
         my (@fields) = split(' ', $_);
         $words += scalar(@fields);
      }
      print "file $filename contains $words words\n";
  }

The job of counting words in a file is delegated to do_count(), which takes a filename and an array as parameters -- the name of the file to count, and an array containing the entire file. (Because <STDIN> or <INPUT> is evaluated in a list context, the entire file is read in, one line per element, and placed in @_ at this point.)
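
The difference that context makes when reading from a file handle is easy to demonstrate. In this sketch the "file" is a string held in memory, and the variable names are invented for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "first line\nsecond line\nthird line\n";

open(my $fh, "<", \$text) || die "cannot read from string: $!";
my $first = <$fh>;       # scalar context: just the next line
my @rest  = <$fh>;       # list context: everything that is left
close($fh);

chomp($first);
print "first: $first\n";                          # first: first line
print "remaining lines: ", scalar(@rest), "\n";   # remaining lines: 2
```

Reading a whole file in one gulp is convenient for small inputs, but bear in mind that it holds every line in memory at once.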

The main program body simply checks for files in @ARGV and, if any are specified, opens them and runs do_count() on them; otherwise it runs do_count() on the standard input.

This is a pretty simple-minded script, but it illustrates a point: you don't need a lot of support routines for a Perl program to be effective. What you see is all you need to accomplish the task. One file of commands is all it takes. There are circumstances where we need a much more complex set-up (with other files containing supporting code and data), but we'll leave that until later. For now, the point to remember is that Perl lets you start work with programs which are structurally simple. Elaboration comes later.


Exercises

  1. John Conway's Game of Life is a classic example of a cellular automaton simulation. A two dimensional grid of cells is used; each cell can be ``alive'' or ``dead''. The grid is updated in accordance with some simple rules; patterns arise and propagate across it in a fascinating display of emergent behaviour.

    The rules are as follows. At each turn, lonely cells (live cells with fewer than two live neighbours) die; crowded cells (live cells with more than three live neighbours) also die; live cells with two or three live neighbours remain alive; and a dead cell with exactly three live neighbours comes to life. All other dead cells stay dead.

    To write a program that simulates the game of life, you will need to use a pair of two-dimensional arrays. For each cell in the current array, you will need to count the adjoining cells and then update the state of the corresponding cell in the next array -- then print the next array, and repeat.

  2. For bonus points: write a simple function that reads a grid of cells from a text file and uses it to initialize the Game of Life.

    (We're going to visit the Game of Life in the next chapter, on data structures.)

