December 1999 Column




Perl

The great majority of Linux systems installed in corporate environments today are configured as servers. This may change as the desktop environments mature, but for now servers are where the action is. I'm going to devote the first half of this column to talking about a programming language close to my heart -- Perl. Perl has been described as "the duct tape of the internet" and "the UNIX swiss-army chainsaw". It bears roughly the same relationship to UNIX (and hence Linux) that Visual Basic bears to Windows, although, unlike Visual Basic, Perl isn't tied to one operating system. (In fact, it runs on more platforms than Java!)

Where Visual Basic is used to create pretty-looking front ends to desktop applications, Perl is used as the glue that holds server applications together. It has been said that you can't really be a UNIX sysadmin without at least a working knowledge of Perl. Consequently, this month I'm going to tell you why -- if you have ambitions in the direction of running your own system -- Perl is what you should be using.

First, let's get one thing straight. Perl is not something to do with the World Wide Web. If you've ever tried to do anything more complex than write a page of HTML, you'll have run into the ubiquitous CGI (common gateway interface) which allows you to write programs that plug into a web server and generate dynamic pages. Something like 80% of CGI programs are written in Perl. But Perl is not a web scripting language; it was actually developed largely as a text processing language (although it's long since out-grown that role, the same way that C outgrew its role as a portable assembly language for writing the UNIX kernel in). Because web applications require the ability to mangle lots of text, Perl (which stands for Practical Extraction and Report Language, or Pathologically Eclectic Rubbish Lister, depending on the phase of the moon and what mood creator Larry Wall is in) is the ideal candidate for the job. But there's more to Perl than web sites, and in fact I'm going to ignore the web for the most part.

Perl is a full-blown programming language; if you include all the various libraries and modules on CPAN (the Comprehensive Perl Archive Network), it's probably as big and complex to learn as Java. Like Java, Perl is descended from C and is object-oriented. Unlike Java, the object orientation isn't compulsory -- indeed, it's a relatively recent addition to the language, which started out as a scripting system (a bit like shell scripting on steroids). Also unlike Java, Perl is weakly typed -- or rather, it has a different idea of variable typing from most older languages. Where C or Pascal might distinguish between data types (integer, floating point, string, long, double, and so on), Perl automatically interconverts low-level data depending on the context the programmer seems to be using it in; Perl's notion of typing is concerned with whether data is singular (scalar, or ordinary variables), plural (arrays, which are enumerated instances of lists), or dictionaries (Perl's built-in hash tables). Perl is to some extent string-oriented and list-oriented, and certainly has powerful operators for manipulating textual data; among other things it has a built-in regular expression parser that puts most text editors to shame.

But Perl isn't simply a text manipulation tool; it provides a full range of UNIX system calls as built-in functions, allowing a UNIX programmer to write applications that use semaphores, sockets, pipes, child processes, and database files without the usual worries about memory allocation or pointers. For more specialised applications, the CPAN archive provides modules (wrappers around object classes) that handle special tasks such as talking to mail, web, or ftp servers, manipulating Oracle or other relational databases, interfacing to devices and files, or providing graphical user interfaces. You can find a whole load of pointers to Perl-related material on two websites, www.perl.com (maintained by O'Reilly and Associates) and www.perl.org (maintained by the Perl Mongers, an informal organisation of Perl users).

Perl is a semi-compiled language, one of a class that includes Tcl, Python, and Guile. When you feed a Perl program to the perl interpreter, it scans it and compiles it to an in-memory data structure -- then traverses it rapidly. The scanning process allows the interpreter to take advantage of the sort of code optimizations that previously only compilers could provide; meanwhile the fact that it's interpreted means Perl is a dynamic language, permitting run-time definition of new constructs and even self-modifying code.

Just how powerful is Perl? One measure of it is Tom Christiansen's Perl Power Tools project. This is basically an on-going exercise in re-writing the entire range of UNIX command-line utilities in Perl. If you can find it on a standard BSD UNIX system, you'll probably find it in PPT. One advantage of re-writing the universe in pure Perl is that most of the standard UNIX tools are quite small as Perl programs; Perl is a very dense language, and without bloated binaries the whole suite will fit in a couple of megabytes. Another advantage is that any architecture that runs Perl should then run the PPT tools, providing a UNIX-like command line environment. And Perl doesn't suffer the nasty memory allocation problems of the many C applications that use fixed-length buffers; the PPT tools are immune to buffer-overrun attacks, for example. As Perl runs on virtually every operating system that a C compiler can target, this project therefore promises to provide a single free source of UNIX tools on everything from Palm Pilots to Series 390 mainframes.

In performance terms, Perl is fast. Not, of course, as fast as hand-carved assembly language or optimized C, but fast enough: the Perl interpreter itself is very efficiently optimised. One set of analyses put together by Tom Christiansen suggests a 2.5:1 edge for C over Perl ... but a 10:1 edge for Perl over Java. It's certainly no slouch as business programming environments go, although it's not designed for tasks such as writing device drivers or operating system code.

Perl is, of course, open source. Like many open source projects, it has found a degree of commercial support; in this case, from publishing house O'Reilly and Associates. O'Reilly, publishers of the nutshell series of books -- which are absolutely required reading for UNIX or Linux system administrators or professionals -- employ language designer Larry Wall, with a remit to develop Perl further. They also publish a range of Perl books.

Perl books start with the camel book (all the nutshell guides have some sort of beast or bird on their cover), Programming Perl (by Wall, Schwartz and Christiansen). Programming Perl is the definitive book of the language, much as "The C Programming Language" by Kernighan and Ritchie is the definitive book on C. It's not a good book to learn the language from (unless you're already an experienced programmer who is familiar with C, UNIX shell scripting, awk and sed), but it will nail down any of the linguistic ambiguities that make Perl such a fascinating and sometimes frustrating tool.

For a long time, the only credible Perl tutorial was "Learning Perl" by Randal Schwartz (also published by O'Reilly and Associates). The llama book -- for some reason camelids seem to go with Perl -- serves as a gentle introduction to the language, for those with less of a programming background. The second edition has additional introductions to CGI programming for the web, and object oriented programming, although these should be treated as introductions rather than definitive material.

There's a whole shelf of Nutshell books covering various other aspects of Perl. Advanced Perl Programming by Sriram Srinivasan covers language internals, then useful topics such as interprocess communication, networking, object persistence, and developing graphical user interfaces with Perl/Tk. "Learning Perl/Tk" by Nancy Walsh covers the Tk graphics system, originally part of John Ousterhout's Tcl language, now accessible from Perl and very useful for writing graphical applications. "The Perl Cookbook" by Tom Christiansen provides a wealth of utilities and examples, with detailed explanations, showing how Perl makes a powerful real-world tool for managing systems. The Perl Resource Centre is basically the whole CPAN module archive, on CD, with accompanying typeset documentation. And a whole bunch of the books are gathered together in the Perl CD Bookcase, along with "Perl in a Nutshell" (a useful quick reference for anyone familiar with the language) -- five of the most important books on one CDROM with searching and indexing. Forthcoming titles include one on the DBI database interface, a marvelous tool for getting at relational databases which essentially allows the programmer to embed SQL queries in the Perl code and to link to multiple types of database on different machines simultaneously and exchange data between them. There's also an upcoming title on algorithms in Perl which promises to be a useful tome.

An important point to take away from this is that Linux is currently the main development platform for Perl (although it runs on lots of other operating systems). It's also interesting to note that on one major recruitment database in California, notified vacancies for Perl programmers overtook COBOL late in 1998 and overtook Visual Basic in summer '99. Perl is definitely a happening thing, and if you're looking for a programming language to develop that in-house multi-user bookkeeping system in, Perl might well be exactly the tool you need.

Getting in the way: stupid e-commerce website tricks

As you might have guessed from the way I just banged on about Perl, I know a bit about that language. There's a reason: when I'm not writing this column I develop back-office applications in it. On Linux, natch.

I first met Perl in 1991, though I didn't begin using it until I met the web, which was in 1993. The web was one of those things that starts small then explodes in all directions, the way Linux is now going. I did a lot of CGI programming back then. I made a lot of mistakes and learned a lot of things about how not to bolt together a website. Then I sort of stopped writing CGI programs because the money moved in one direction and the interest moved in another, and the web had turned from a gosh-wow what-the-hell-is-that-thing? experience into something guys in suits do for a living. And while I've got nothing against guys in suits doing web development for a living -- I suppose they've got to find something to do to pay the tailor -- it's not really my cup of tea.

So it came as a rude awakening two weeks ago when I tried to book a foreign holiday at short notice -- via the web -- and was taught an important lesson: that the corporate web development business has forgotten all the lessons learned by the mainframe guys back during the 1970s and badly needs to re-learn them.

Let's hit the rewind button and scroll back the tape of history to 1993. Back then, the web was new: the standard browser was NCSA Mosaic, Bill Gates was banging on about how the Microsoft Network would displace TCP/IP, and it was kind of cool to be able to write a scriptlet that would insert a hit counter in your web pages. There were no <BLINK> tags because the <BLINK> tag hadn't been invented and Jamie Zawinski wasn't living under a fatwa.

By 1994 this had changed. A stampeding horde of businesspeople had heard one scare story too many about how the internet was going to change the future of the universe; at the end of the year I joined my first web start-up, moved into a desk in the boss's attic (next to a sauna hut on one side and a ski-rack on the other) and began to bend code. Most of the customers wanted a first-generation website: a brochure, basically, explaining who they were, what they did, and who to phone if you wanted some of it.

In 1996 this was already changing. People realised that the rudimentary "brochure" type website was just a placeholder for something more. Start-ups with names like Amazon.com were beginning to mutter about hooking inventory databases up to the net and selling things through them. The term "e-commerce" was well established, although if you wanted to take credit card payments via the web you still needed to sacrifice a black goat at midnight with a silver sickle over an EPOS terminal.

Now we reach the brave new world of 1999. It is possible to take credit cards over the net (the UK's main acquiring banks only turn down about 95% of applicants, so we can downgrade the goat to a chicken). People know how to hook databases up to web servers, thanks to tools like Perl and its DBI module (with the DBD database drivers that plug into it). It is therefore possible, in principle, to let a customer tell you what they want and to obtain money from them in return for providing it. In principle?

"Edinburgh to Amsterdam. Fly out the day after tomorrow, fly back on Sunday as late as possible. How difficult can that be?"

All the regional airlines in the UK have websites that purport to let you book tickets online. They have presumably paid developers pots of money for these toys, which is why their sheer incompetence in the user interface department is breath-taking.

First stop on my grand tour of websites was KLM UK. (Yes, I'm going to name culprits here.) They're firm believers in Windows NT, as witness the succession of brain-dead ASP pages that follow. Hit the "book now" button and it'll take you through a series of pages in which you enter when and where you want to go, and it tries to find a flight that suits you. Too bad that if you try to book a flight using Netscape Navigator 4.06 on a Macintosh (the horrors!) you will get halfway into this dialogue, try to give your credit card details, and receive a snide message informing you that your session has been left unattended for more than twenty minutes and for security reasons you'll have to start all over again. All because their designers assumed that the entire world runs Microsoft Internet Explorer and speaks its dialect of JavaScript.

(They lost upwards of five hundred quid on that one; then when I emailed the webmaster contact address given for reporting problems, the mail bounced. Says it all, really.)

Next I tried easyJet. easyJet use Luton as a hub; you can't get to Schiphol from Turnhouse without changing there. The easyJet web-based booking system actually works, but can't handle through-bookings; you've got to work out an itinerary for Edinburgh-Luton and Luton-Amsterdam yourself, then book the tickets. I suppose it's a bit unfair to criticise them for not handling through bookings, as they don't let you check your luggage for a through flight, either, but it's not hard to whip up code that can work out a path through a network of airports.
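To back up the claim that route-finding isn't hard, here is a minimal sketch of a breadth-first search over a network of airports, which returns the itinerary with the fewest legs. It is written in Python purely for illustration; the route map, the airport codes in it, and the name find_itinerary are all made up for the example and have nothing to do with any airline's real schedule or booking code.

```python
from collections import deque

# Hypothetical route map: each airport maps to the airports it has
# direct flights to. Illustrative only, not a real timetable.
ROUTES = {
    "EDI": ["LTN"],
    "LTN": ["EDI", "AMS", "GVA"],
    "AMS": ["LTN"],
    "GVA": ["LTN"],
}

def find_itinerary(routes, origin, destination):
    """Return the list of airports on a fewest-legs path, or None if unreachable."""
    queue = deque([[origin]])   # paths still to be extended, shortest first
    visited = {origin}          # airports already reached by some path
    while queue:
        path = queue.popleft()
        here = path[-1]
        if here == destination:
            return path
        for onward in routes.get(here, []):
            if onward not in visited:
                visited.add(onward)
                queue.append(path + [onward])
    return None

print(find_itinerary(ROUTES, "EDI", "AMS"))  # ['EDI', 'LTN', 'AMS']
```

A real through-booking system would also have to match departure and arrival times and check seat availability on each leg, but the path-finding itself is the easy part.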

Then there was British Midland. I like BM, but somehow nothing gives me a sinking feeling quite like jumping to a home page and seeing it start by downloading six billion huge graphics files that don't actually do anything useful except suck up bandwidth. Want to view the timetable for your chosen flight? Click the button and it brings up a table of links to airports you're departing from ... then proceeds to download a separate image for each airport (there are forty of 'em). And the image contents? The name of the airport, in white letters on a blue background. Traversing their website was a bit like wading through treacle: evidently they think that wasting their customers' bandwidth is a selling point.

The booking system was a miracle, too; when you tell a booking system that you want a flight on date (x), and today is (x-2), to be offered flights on date (x-5) suggests that maybe, just maybe, they've cracked time travel.
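The sanity check that would have prevented this is tiny. The sketch below, again in Python for illustration, assumes a hypothetical list of flight records with a "departs" date field; the function name offer_flights and the record layout are inventions for the example, not anything taken from the airline's system.

```python
from datetime import date

def offer_flights(flights, requested, today):
    """Return flights departing on the requested date; refuse dates in the past."""
    if requested < today:
        raise ValueError("requested date is in the past")
    # Only offer flights that match what the customer actually asked for.
    return [f for f in flights if f["departs"] == requested]

# Hypothetical timetable entries.
timetable = [
    {"flight": "XX101", "departs": date(1999, 11, 10)},
    {"flight": "XX102", "departs": date(1999, 11, 15)},
]
today = date(1999, 11, 13)
print(offer_flights(timetable, date(1999, 11, 15), today))
```

Asked for the 15th when today is the 13th, this can only ever return the flight on the 15th; the one that left on the 10th is never offered.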

In the end I picked up the phone and rang one of the carriers. Which got that little obstacle sorted out -- leaving me with the issue of hotel booking and travel insurance.

I'm going to draw a polite veil over the hotel booking systems on the net, because they make the airlines look like paradigms of gleaming efficiency.

I had fun with one of the major travel insurance companies, too. I hit their web site the day before departure, in need of cover at short notice. The online quote and purchase system looked okay -- until I discovered, four forms into a lengthy dialogue, that I'd mistyped my surname, there was no facility to back up and correct it, and using the browser's back button got a warning about expired data. The first law of designing a big, interactive form is that you pay attention to tracking the state of a customer's interaction, and give them opportunities to correct mistakes at any stage. Instead, the designers of this site seemed more interested in writing cute JavaScripts to take my date of birth and calculate my age.
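What tracking the state of an interaction means in practice can be sketched in a few lines. The example below is in Python for illustration only and assumes a hypothetical three-form sequence; the class name QuoteSession, the step names, and the field names are all invented. The idea is that the server keeps every submitted form under a session token, so the customer can go back to any earlier form, see it pre-filled, and correct it without losing what they entered later.

```python
import uuid

# Hypothetical sequence of forms in the purchase dialogue.
STEPS = ["traveller", "trip", "payment"]

class QuoteSession:
    """Server-side record of one customer's progress through the forms."""

    def __init__(self):
        self.token = uuid.uuid4().hex  # would travel in a cookie or hidden field
        self.answers = {}              # step name -> fields submitted for it

    def submit(self, step, fields):
        """Store (or overwrite) the fields for a step, in order."""
        if step not in STEPS:
            raise KeyError(step)
        # A step may be filled in only once all earlier steps are complete,
        # but an already completed step can be resubmitted at any time.
        for earlier in STEPS[:STEPS.index(step)]:
            if earlier not in self.answers:
                raise ValueError(f"complete '{earlier}' first")
        self.answers[step] = dict(fields)

    def prefill(self, step):
        """Return what the customer last entered for a step, for redisplay."""
        return dict(self.answers.get(step, {}))

    def next_step(self):
        """Return the first step not yet completed, or None when done."""
        for step in STEPS:
            if step not in self.answers:
                return step
        return None

session = QuoteSession()
session.submit("traveller", {"surname": "Smiht"})  # typo
session.submit("trip", {"destination": "AMS"})
session.submit("traveller", {"surname": "Smith"})  # go back and fix it
print(session.prefill("traveller"), session.next_step())
```

Correcting the surname leaves the trip details intact, and the dialogue resumes at the payment form rather than starting over.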

However, the real sting came in the tail -- after I'd entered my credit card details and hit the "bill me" button. To my horror, a web page arrived: "thank you for placing your order, the policy documents will be posted to you later today." Great, just the thing I need for a 7am flight tomorrow morning!

If they'd bothered thinking about how this system was to be deployed in real life, they might have realised that some users would want immediate service -- basically, a policy number and emergency phone contact -- and told such customers what to do, up front, before forcing them to fill out seven or eight lengthy forms and pay by credit card. But it looks like the skills developed by generations of mainframe systems analysts, from the 1960s onwards, have been abandoned completely by today's web designers.

Web applications are very like mainframe applications. Mainframes have terminals that present forms to the user, who fills them in. You then hit a "send" key and your terminal sends the completed form to the mainframe, where it is run as a job (or as input to a job). One or more screens of results are generated, and these get blatted back to your terminal. Any user has a history of interaction with the mainframe, but this isn't necessarily visible to the job being executed on it. So the programmer developing such a program needs to keep track of just which form the user's filled out, and allow them to switch from one to another.

You can substitute "browser" for "terminal" and "web server" for "mainframe" in the above paragraph and it won't change the sense at all. Because web applications are basically stateless form-based applications, just like the old IBM mainframe systems of yore, and no amount of dressing them up with glitzy animated graphics changes this. A lot of hard-won knowledge came out of the analysts, whose job was to figure out what the users needed to put into the machine, what they needed to see coming out of it, and what happened in the middle. And what I saw the other week suggested that instead of systems analysts, the people commissioning those big budget online booking systems were paying graphic designers who thought they were building a flashy brochure rather than a mainframe application.

