Chapter 5: Introducing Perl

Putting it together


[ Comments ] [ Copyright ] [ Chapter contents ] [ Book contents ]


Now we've covered the basics, it's possible to examine some working CGI scripts. Here are two:


A reflector script

Here's a simple reflector script. All it does is reflect back to the client everything that the client sent to the server. It does so by encapsulating the query strings in a definition list. You can use it to get a feel for the way forms work; create forms, make sure that their ACTION invokes this script, and it will tell you exactly what the form sent to the server.
#
# first get the query
#
if ($ENV{'REQUEST_METHOD'} eq "GET") {		# get input for GET requests
	$request = $ENV{'QUERY_STRING'};
} elsif ($ENV{'REQUEST_METHOD'} eq "POST") {	# get input for POST requests
	read(STDIN, $request,$ENV{'CONTENT_LENGTH'}) 
		|| die "Content-type: text/html\n\n<HTML>\n<H1>Fatal Error</H1>  \
			Could not get query\n</HTML>\n";
} 

#
# print a generic header to standard output, to send back to the client
#
print "Content-type: text/html \n\n";
#
# from now on, our output is in HTML, because that's what we've said we're
# sending
#
print "<HTML\n";
print "<H1>Query reflector</H1>\n";

#
# sanity check; return an error if the query is empty
#
if (length($request) = 0) {
	print "<B>No</B> query was detected. Did you send a request?\n";
	print "</HTML>\n";
	exit 0;
}  

#
# now parse the query
#

@parameter_list = split(/&/,$request);          # split name=value pairs
foreach (@parameter_list) {		        # for each parameter ...
	($name, $value) = split(/=/);           # get the name and value
	$name =~ s/\+/ /g;                      # convert characters
	$name =~ s/%(..)/pack("c",hex($1))/ge;
	$value =~ s/\+/ /g;
	$value =~ s/%(..)/pack("c",hex($1))/ge;
	$passed{$name} = $value;                # put them in %passed
}

#
# now return the query embedded in an HTML document
#
print "I recieved the following variables and values:\n";
print "<HR>\n<DL>\n";	     # we're sending back a definition list of variables
foreach $key (keys %passed) {		             # loop through %passed
    print "<DT>$key\n";          # printing key, value pairs
    print "<DD>$passed{$key}\n";
}
print "</DL><HR>\n";       # be nice and close the list
print "</HTML>\n";
exit 0;                                          # exit in good order

Although this script is trivial, it is quite useful. As it is, it returns the entered values to the user. However, with minor modifications it can be used to take more complex actions -- for example, to email the variables to a specified mail address.


A simple mail feedback form

To make this send email to a given address, using the UNIX sendmail system, add the following lines at the top of the file, where variables are being declared:
$recipient = 'user@host';
$mailer_binary = "/usr/lib/sendmail";
$mailer_flags = " -odi -t";

(Making sure to set user@host to your email address.)

Then replace the last fourteen lines of the file with this:

$message =  "\nContent Feedback Report\n";
$message .= "=======================\n";
foreach $key (keys %passed) {            # loop through %passed
    $message .= "$key: $passed{$key}\n";        # printing key, value pairs
}                                            # the following stuff is
                                            # all put in the environment
                                            # by the web server
$message .= "Protocol:       $ENV{'SERVER_PROTOCOL'}\n";
$message .= "Remote Host:    $ENV{'REMOTE_HOST'}\n";
$message .= "Remote IP addr: $ENV{'REMOTE_ADDR'}\n";
#
# Now send mail to $recipient
#
open (MAIL, "|$mailer_binary $mailer_flags $recipient") 
    || die "Can't open $mailprog!\n";        # open a pipe to sendmail

print MAIL "Subject: Content Feedback\n";        # print $message into sendmail
print MAIL "From: daemon\n";                # remember the From line 
print MAIL "$message";
close (MAIL);                                # close the pipe and quit
return 0;

This script does exactly the same -- except that instead of printing its output to the standard output, where it is captured by the HTTP server and sent back to the user, it sends it to the sendmail mail daemon via a pipe. sendmail is a mail transport agent; it handles the process of taking mail from a mail application and sending it to an SMTP server, and the process of receiving mail and storing it in user's mail folders.

(Note that invoking a mail tool such as pine or elm from within a CGI script is potentially hazardous, as these programs have the ability to execute shell commands and it is not impossible for a hacker to sneak some instructions in via an odd set of form arguments, which are then executed on the server and open up a security hole. Security issues are discussed in more detail in Chapter 9.)

A simple search script

This script is a bit more complex. It searches files in a directory hierarchy for a regular expression, and returns a bulleted list of URLs, the clickable text of which is the TITLE string for the matching file.

To use it, create a form like this:
<HR>

<FORM ACTION="/cgi-bin/scan.pl"
METHOD="GET">

<PRE>Enter search text:   <INPUT NAME="search">

Press here to start:  <INPUT TYPE="Submit" VALUE="Start search">
</PRE>
</FORM>
This form permits you to enter a perl regular expression. It searches all the files in directories under public_html, and returns a bullet-list of files that contain the expression. <P;> <HR>

The form's ACTION should point to the CGI script below:
#!/bin/perl



# read a GET request; quit if method is not GET
# take first argument
# search for it in all .html files below $basedir
# return URLs for all files containing matches,formatted as bullet-list,
# with <H1>..</H1> string contents as clickable link text

$basedir = '/apache/docs/public_html';    # root directory for HTML files
$serverroot = '/apache/docs';        # server root directory
$exclude = '/graphics';                # pathname component to ignore
$dirsep = '/';                            # directory separator character
$searchform = "search.html";            # name of search form
# $DEBUG = 1;                            # debugging output toggle
                                        # if $DEBUG = 1 is set, print
                                        # lots of debugging messages

print "Content-type: text/html\n\n";
print "<head><title>Search Results</title></head>\n";
print "<body>\n";

if ($ENV{'REQUEST_METHOD'} ne "GET") {    # we only process get requests
    print "<H1><i>Error</i></H1>\n";
    print "Malformed or incorrect request detected\n</body>";
    exit 0;
}
$request = $ENV{'QUERY_STRING'};
@parameter_list = split(/&/,$request);    # split name=value pairs
foreach (@parameter_list) {            # for each parameter ...
    ($name, $value) = split(/=/);        # get the name and value
    $name =~ s/\+/ /g;                # convert characters
    $name =~ s/%(..)/pack("c",hex($1))/ge;
    $value =~ s/\+/ /g;
    $value =~ s/%(..)/pack("c",hex($1))/ge;
    $args{$name} = $value;            # put them in %args
}

if ($args{'search'} eq '')  {        # check for a search argument
    print "\n<H1><i>Error</i></H1>\n";
    print "No search string was entered\n</body>";
    exit 0;
}
#
#-----------------------------------------------------------------------------
#
# Now we have acquired the query, we can start looking for files.
# To do this, we will open the directory $basedir using opendir(), get a
# list of the files in it, and save them. Then, for every entry in $basedir
# that is a directory, we will push it onto a stack.Then we pop the next
# directory off the stack and repeat the process, until the stack contains
# no more directories. 
#
# Note that this script explicitly avoids the "." and ".." directories, and
# refuses to follow symbolic links. This ensures that the search goes down,
# rather than up or sideways, in the directory structure.
#
# ASSUMTPIONS: 
#
# That files contain lines like <TITLE>file's title</TITLE>, with no line breaks
#
# That it's a bad idea to invoke Perl's glob() built-in, because it relies on
# invoking the C shell
#
#-----------------------------------------------------------------------------
#

$pattrn = '^[^.]{1,2}.+';             # pattern for files to avoid
                                        # -- one or two dots at 
                                        # beginning of name
push(@dirstack, $basedir);            # initialize stack of dirs to
                                        # visit


# now we are ready to recursively walk the directories below our starting point


while ($#dirstack >= $[ ) {            # while @dirstack is not empty


    $path = pop(@dirstack);            # take the first directory on it
    next if ($path =~ /$exclude/);        # ignore dirs matching $exclude
    @result = &glob($path,$pattrn);        $ call glob() to get file list


    foreach $dir  (@result) {


        $path =~ s/(^.*)$\/$/$1/;         # strip trailing "/" from file
        if ( -d "$path\/$dir") {        # process directories ...
            $location = "$path/$dir"; 
            $DEBUG && print "$location [DIRECTORY]\n";
            push (@dirstack, $location);    # put newly discovered dirs on
                                        # the stack 
            $visited++;                # keep count of dirs visited
        }
    }
    #
    # now we've processed the directories in $path, let's have a look
    # at the HTML files in $path
    #
    @htmlfiles = &glob($path,'(^.*\.[Hh][Tt][Mm][Ll]$)|(^.*\.HTM$)');
    foreach $target (@htmlfiles) {        # foreach HTML file
        if (! -l $target) {             # ignore symlinks
            $DEBUG && print ".";
            push(@targetfiles,"$path/$target");
        }                                # @targetfiles holds the list
    }                                    # of HTML files we've found
}

$DEBUG && print "\n\nFound $#targetfiles matching files in ",
    "$visited directories\n<P>\n";

# now search (using grep()) through @targetfiles for $args{'search'}

print "<H1>Search Results</H1>\nThe following files were found to ",
      "contain [<CODE>$args{search}</CODE>]:\n<P><HR><P>";

print "<UL>\n";                        # start listing results
$catches = 0;


foreach $filename (@targetfiles) {        # loop through the HTML files    

    #
    # read in the file. Search it using grep(). If the target is found,
    # get the file's TITLE and use that, plus the file's pathname, to
    # create a URL for it.
    #
    undef @hits;                        # clear these variables
    undef $hits;
    undef @fil;
    open (FIL,"<$filename") || die "Could not open $filename, read-only\n";
    @fil = <FIL>;                        # read in the file into @fil
    close FIL;

    $hits = grep (/$args{'search'}/i,@fil);    look for $args{'search"} in @fil

    $DEBUG && print "Found $hitz matches for $args{search} in $filename\n";

    if ($hitz > 0) {                    # if we found any results ...
        @hits = grep(/<TITLE>/i,@fil);    # get all lines containing
                                        # <TITLE> tags and save in @hits
        $tag = @hits[$[];                # we use the first <TITLE> line
        chop $tag;                        # get rid of trailing carriage
                                        # returns
        $tag =~ s/(<TITLE>)(.*)(<\/TITLE>)/$2/i;
                                        # get rid of <TITLE>..</TITLE>
        $tag =~ s/^(\s+)(.*)/$2/i;        # get rid of leading whitespace
        $tag =~ s/(.*)([\s]+)$/$1/i;        # get rid of trailing whitespace
        $target = $filename;            # begin building pathname
        $target =~ s/$serverroot//i;        # remove leading server root
                                        # directory from pathname
        next if ($target =~ /$searchform/i);# we don't want to include the
                                        # searchform in the search
                                        # output, do we?
        $output = "<LI><A HREF=\"$target\">" . $tag . "</A>\n";
        print $output;                    # print the URL
        $catches++;                    # total of the number of hits
    }
}

if ($catches == 0) {                    # what to do if we didn't find
    print "<LI><I>No matches found</I>\n";    # any matching files
}

print "</UL>\n";

print "<br><hr></body>";                # tidy up and quit
exit 0;
#
#-----------------------------------------------------------------------------
#

sub glob {

    # usage: @files = &glob(directory,pattern);
    # returns all filenames in directory that match pattern
    # without making use of Perl's ability to call on the C shell's
    # globbing -- we might be running in a chroot'd environment where
    # C shell is not available(!)
    #
    local ($dir) = @_[$[];             # get the directory
    $DEBUG && print "glob(): dir is $dir\n";
    local ($patt) = @_[$[+1] ;        # get the pattern to match
    $DEBUG && print "glob(): patt is $patt\n";
    if (( -e $dir ) && ( -d $dir)) {        # if dir exists, is directory
        opendir (DIR, $dir) || warn "could not open $dir\n";
        @files = readdir (DIR);        # get all the files in $dir
        $DEBUG && print "directory contains \n@files\n\n";
        closedir (DIR);                # close $dir
        eval (@globhits = grep(/$patt/,@files));
                                        # search for $patt in the 
                                        # filenames we found in $dir
                                        # do this inside an eval()
        $DEBUG && print "returning \n@globhits\n\n";
        return @globhits;                # return the array of filenames
                                        # that matched
    } else {                            # if $dir isn't usable
                                        # directory is bogus!
        die "$dir nonexistent\n";        # program exits abruptly. This
        return;                            # should never happen.
    }
}

A few points are worth noting about this program.

Firstly, Perl has a built-in "glob" mechanism, but this program doesn't use it. "Glob" derives from the name of an obsolete UNIX utility that was used to match regular expressions for filenames. The UNIX shells perform globbing, taking patterns and returning all the files that match the patterns (which are simple regular expressions). Perl can quite easily glob files, but it does so sneakily, by using the UNIX C shell.

Normally this would not be a problem, but very often CGI scripts have to execute in environments where other programs (such as the C shell) are inaccessible. This is a common security procedure, and if this script relied on C shell globbing it would fail to work.

Secondly, the program is designed to avoid certain files. It uses a variable, $exclude, to flag directories to avoid. If you follow the convention of keeping all your graphics files in directories called graphics, set $exclude to graphics and the search script will not waste time exploring directories that contain nothing of interest.

In addition to avoiding directories with $exclude in their pathname, the program avoids directories and files that begin with a period ".". This ensures that it doesn't start climbing up to parent directories, fanning out into the filesystem at large. It also avoids symbolic links -- pointers to other directories and files outside of the current directory -- for the same reason. We want the search to be restricted to a hierarchy of directories, not the entire filesystem on the server.


[ Comments ] [ Copyright ] [ Chapter contents ] [ Book contents ]