June 1999 Column

Linux Expo

This year's LinuxWorld Expo in San Jose marked a major breakthrough in the commercial visibility of Linux. I wasn't there, so I can't tell you what Linus told me over a beer: eight time zones did their best to get in the way. On the other hand, the really important items hit the headlines so fast that you'd have to be hiding in a coal mine wearing ear-plugs to miss them. So I'm going to ignore the news items completely and get heavily into Nostradamus mode, in order to predict a bumpy ride ahead for the suits.

Oracle Corporation are serious about Linux as a platform for their databases. This isn't surprising, given Larry Ellison's famous aversion to Bill Gates; in fact, they're so serious about Linux that they bought the keynote speaker slot for one of their VPs. Oracle are just one of a pack of companies that are getting serious about selling products on Linux. HP, IBM, and tons of other corporate names were there too, with a presence that would have been unthinkable even a year ago. Expos like this are a classic example of two nations rubbing shoulders without understanding one another: marketing people bang the drum and blow the trumpet for their (commercial) offerings in one corner, while Richard Stallman holds forth in another, expounding the view that free software will bury them. This sort of conflict seems to have been a defining issue at the expo; more and more, the core Linux hacker-geek culture is coming into contact with the commercial software industry, and the two may not mix very well.

(Personally, I'm rooting for the geeks; they're more entertaining than the guys with the Armani suits and the slick visuals, to say nothing of making me look like I've got good dress sense. But back to the story ...)

The meteoric rise of Linux to prominence seems to extend far beyond the historically fragmented UNIX world. Some of the companies now announcing ports of software to this environment aren't existing UNIX people; they're coming from the Windows world. This trend is going to accelerate as Linux emerges as a platform on a similar scale to MacOS and Windows NT.

One major problem is that many of the new arrivals don't understand either the UNIX tradition or the Open Source philosophy. Some of them don't even know what the Linux market is -- and as it gets bigger, sucking more of them in, the cluefulness quotient is going to sink through the floor. It's going to be like 1995 all over again, when anybody who could type into a copy of Notepad could hang out their shingle as an "HTML Programmer". In point of fact, the vast majority of Linux users at present are hobbyist/home users (mostly of the DIY persuasion), computing professionals (who use it as a cheap database/network server platform), and a relatively small proportion of business users (who have just discovered that you can buy a copy of SuSE 6.0 or Caldera 1.3 for much less than a copy of Windows 98 and Office 97). These users generally understand the UNIX toolset pretty well -- and this creates a yawning pit for people who think they can make money by porting Windows utilities to Linux. Which, unfortunately, seems to be the entire programmer population of Microsoft-land.

Linux, like any UNIX environment, resembles a Lego set designed by obsessively reductionist computer scientists. Underneath the barely existent graphical user interface there is an extraordinarily powerful set of command-line tools. The difference in power between the command-line and graphical environments is so great that most of the time all a skilled user does with the X windowing system is open a bunch of terminal windows to type commands into.

The UNIX toolset relies on certain standard principles: all data consists of streams of bytes; all programs are configured by text files (often containing instructions in some kind of pocket programming language); all peripheral hardware can be accessed via virtual files. There are no arbitrary restrictions on the kind of data that can be fed to a program: according to legend, Ken Thompson insisted that it should be possible to feed the output of a C compiler back to it as input. In practice this won't produce anything except error messages, but at least it is possible -- something that Simply Can't Happen on quite a few other operating systems. The principle that data is generic has repeatedly proven useful, and it is wholly alien to the application-oriented products now being ported to the UNIX/Linux world.
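To make the "everything is a stream of bytes" point concrete, here is a small sketch in Python, one of the scripting languages discussed later in this column. The function name `hexdump` and its output layout are my own choices for illustration. They are loosely modelled on the traditional `od`/`hexdump` utilities rather than copied from either. The same code reads an ordinary text file, a compiled binary, or a device such as `/dev/zero`, because to the program they are all just files.

```python
def hexdump(path, count=64, width=16):
    """Return up to `count` bytes from any file as hex-dump lines.

    Hypothetical helper for illustration: it does not care whether `path`
    names a text file, an executable, or a device node like /dev/zero.
    """
    # Binary mode: no character decoding and no newline translation,
    # so the program sees exactly the bytes the file (or device) supplies.
    with open(path, "rb") as f:
        data = f.read(count)

    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        lines.append(f"{offset:08x}  {hex_part}")
    return lines
```

On a Linux box, `hexdump("/dev/zero", 20)` returns two lines: one for the first sixteen zero bytes and one for the remaining four. Pointing it at `/bin/ls` or a text file works without any change to the code.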

What does this mean? And what does it mean for Linux?

Well, one symptom of the tool-poor environment that DOS and MacOS users are used to is that they have to buy third-party applications to do things that a UNIX-like system already does out of the box. Now that the big software companies are trying to move in on Linux, it's inevitable that some of them -- not understanding the nature of the beast -- will try to port utility programs to Linux that, quite simply, aren't needed. Who needs an office email package when a typical Linux distribution comes with a choice of sendmail, exim, or whatever (to say nothing of mail clients like Netscape, Balsa, mutt, pine, or elm)? Who needs a virus scanner when their operating system is such an intrinsically virus-hostile environment that none have ever spread in the wild? Who needs a disk defragmenter when they have a self-defragmenting filesystem? Who needs a remote-control utility for operating their system remotely when it comes with telnet and X11 and VNC?

True, there are specialist niches, and there are also people out there who need to interact with the rest of the world when it's running cc:Mail or Lotus Notes ... but to a large extent, many traditional third party applications are simply irrelevant. And to this extent, quite a lot of the big software companies of the Windows and Mac world are going to find themselves unable to penetrate the Linux market.

There's more, though, to chill the blood of a marketing person. Linux really got started with GNU -- GNU's Not UNIX -- founded by Richard Stallman in 1984. This isn't the place to recap the Free Software Foundation's history; suffice it to say, they're responsible for more actual lines of actual code in Linux than Linus Torvalds. The fundamentalist Stallmanite ethos is that restricting free access to software is a Bad Thing: software should be free. People should contribute to the commonweal and take from it what they need: from each according to his abilities, to each according to his requirements. A revisionist creed, championed by Eric Raymond and the likes of Tim O'Reilly, terms itself Open Source and focuses on making the bitter message a little more palatable to the overtly commercial. The drum being banged by the proponents of Open Source is that you can still make money while following the letter of Stallman's law: you simply charge for support, customization, and the aftermath of distribution. From this perspective, Open Source software is like glorified shareware: there's no compulsion to pay, no nag screens, but if you choose to pay you get some hand-holding. And then there's the Bazaar.

Let's look at the "software should be free" message a bit more closely. It doesn't mean "thou shalt not earn a living", but it does imply a major revision to the industry's concept of intellectual property. One of the points to note is that copyright law doesn't get on well with free software, and patent law gets along even worse. The open source world is like a Precambrian ocean full of utterly bizarre organisms, all competing and cross-fertilizing each other with mad enthusiasm. If one desktop environment grows an ORB, all the others grow an ORB or end up fossilized, lost in the Burgess Shale of history. Applications swap features, and nobody minds much because nobody is losing anything.

But people who're not used to the free software world can be burned by this. Many free software projects get set up with the specific goal of cloning some other application -- typically one that is expensive and proprietary. In some cases, this happens for purely ideological reasons; the need to live in a free software world motivates a surprising number of developers. People who bring commercial applications to the Linux world shouldn't be surprised if natives look at them, scratch their heads, and begin to build a clone.

So far, this hasn't really spawned any spectacular lawsuits. But it probably will, within the next year or two. Are Lotus going to take a free clone of Notes lying down? Well yes, maybe: Lotus is a big name company and they're mostly smart enough to know that no pointy-haired boss is going to bet the company on a clone of their product put out by a bunch of hippy communist hackers. But there will be smaller companies moving into the Linux field who see their work being cloned -- and, lacking the name recognition factor, feel that litigation is the only way to defend their market share.

To sum it up: the corporate world is discovering Linux. But those who think of Linux as being just another operating system are going to get themselves burned -- partly because Linux has a radically different idea of what goes into an operating system from Windows, and partly because good ideas get recycled. One thing's sure, though: Linux will out-live the corporate backlash.

Scripting? What's that?

Whenever you use a computer for anything more sophisticated than word processing, odds are that you're not only entering data -- you're telling your software how to manipulate that data. Most of the time, the operations are non-repetitive; but sometimes you need to make the computer shoulder some of the leg-work of the more complex operations. And this is the starting point for scripting languages.

Scripting languages are programming languages that are task-oriented by design, unlike more traditional languages like C or C++ (which are great for writing operating system or application code but not so great for writing a quick hack to configure a modem or batch-convert a bunch of image files). Scripting languages are generally easy to pick up, and directly useful for jobs someone using a computer might want to accomplish -- such as automating the process of dialing into an ISP account, hunting down and deleting old backup files, retrieving or storing records in a database, and so on.
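As an illustration of the "hunting down and deleting old backup files" job, here is a minimal Python sketch. Purely for the example, it assumes that a backup is any file whose name ends in `~` or `.bak`, and that "old" means not modified for 30 days. Both are hypothetical defaults you would adjust. It only lists candidates unless you pass `dry_run=False`.

```python
import time
from pathlib import Path

# Assumption for illustration: backups are files ending in "~" (Emacs style)
# or ".bak". Adjust to match your own habits.
BACKUP_SUFFIXES = ("~", ".bak")
SECONDS_PER_DAY = 24 * 60 * 60


def find_old_backups(root, max_age_days=30):
    """Yield backup files under `root` not modified for `max_age_days` days."""
    cutoff = time.time() - max_age_days * SECONDS_PER_DAY
    for path in Path(root).rglob("*"):
        if (path.is_file()
                and path.name.endswith(BACKUP_SUFFIXES)
                and path.stat().st_mtime < cutoff):
            yield path


def delete_old_backups(root, max_age_days=30, dry_run=True):
    """Delete old backups (or, by default, just report them); return the list."""
    # Collect everything first so we never delete while still walking the tree.
    victims = sorted(find_old_backups(root, max_age_days))
    for path in victims:
        if not dry_run:
            path.unlink()
    return victims
```

Running it once with the default `dry_run=True` and reading the list before deleting anything is the cautious way to use a script like this.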

If you're used to a Mac or Windows environment you've met scripting languages before. Macs use AppleScript to automate some tasks; PCs use old DOS batch files or Visual Basic for Applications. The picture isn't so clear on Linux. One of the UNIX tradition's strengths is that it is a great environment for programmers: instead of one or two scripting systems, there are billions -- all aimed at specific jobs.

The granddaddy of UNIX scripting languages is the Shell. If you've ever typed a command on a box running MS-DOS, you've met a shell; it's called COMMAND.COM, and you can write batch files that feed commands to it for sequential execution. The original UNIX Bourne shell (named after the guy who wrote it) is like COMMAND.COM on steroids. While most of the time you interact with it by just typing commands, /bin/sh is a fairly powerful programming language interpreter specifically designed for automatically controlling the execution of other programs. Shell programming (as the art is known) looks a bit weird if you're used to a dialect of Basic; the syntax is related to -- but distinct from -- C, and adds features based on pattern-matching, variable substitution, and the ability to redirect streams of data feeding into and coming out of programs so that the output of one program can be fed through a complex pipeline of operations.
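Under the hood, a pipeline such as `sort | uniq` is just the shell starting two programs and connecting the first one's standard output to the second one's standard input. The Python sketch below does the same job by hand with the standard `subprocess` module. The function name `run_pipeline` is hypothetical, and unlike a real shell it does no parsing, wildcard expansion, or variable substitution.

```python
import subprocess
import threading


def run_pipeline(commands, data=b""):
    """Run argv lists connected like `cmd1 | cmd2 | ...`; return the final output.

    `data` is fed to the first command's standard input.
    """
    procs = []
    for argv in commands:
        # The first command reads from us; each later one reads the previous stdout.
        stdin = procs[-1].stdout if procs else subprocess.PIPE
        proc = subprocess.Popen(argv, stdin=stdin, stdout=subprocess.PIPE)
        if procs:
            # Drop our copy of the pipe so only the two children hold it; then
            # the upstream command sees a broken pipe if the downstream one exits.
            procs[-1].stdout.close()
        procs.append(proc)

    def feed():
        # Write from a separate thread so a large `data` cannot deadlock
        # against output we have not read yet.
        try:
            procs[0].stdin.write(data)
        except BrokenPipeError:
            pass  # the first command exited without reading everything
        try:
            procs[0].stdin.close()
        except BrokenPipeError:
            pass

    writer = threading.Thread(target=feed)
    writer.start()
    output = procs[-1].stdout.read()
    procs[-1].stdout.close()
    writer.join()
    for proc in procs:
        proc.wait()
    return output
```

For example, `run_pipeline([["sort"], ["uniq"]], b"b\na\nb\n")` returns `b"a\nb\n"`, just as `printf 'b\na\nb\n' | sort | uniq` would print it.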

If you're using a common Linux distribution, you're probably typing your commands into Bash, the Bourne-Again Shell. Bash is an extended Bourne shell clone; it adds some extra features (many of them introduced in the Korn shell, developed at AT&T's Bell Labs by David Korn in the eighties) such as job control, aliases, command-line editing and recall, and more functions than I can remember. There's a whole family tree of UNIX shells, including the C shell (so called because its syntax resembles that of C; it also lacks some of the useful I/O redirection capabilities of the Bourne shell), the TC shell (C shell done right), the Z shell (a collision between the C shell and the Korn shell, with added bells and whistles), and -- no, I can't go on. The important point to remember is that, almost by definition, all Linux systems have a program called /bin/sh which is capable of interpreting Bourne shell scripts. Learn the sub-dialect of shell scripting that /bin/sh talks, and you can write scripts that should run on almost any UNIX system.

The Bourne shell is all very well for automating simple tasks, but it has some deficiencies. It has no sensible built-in way of editing files, it can't create graphical user interfaces, and it can't engage in a dialogue with some other interactive program or a device like a modem. To handle each of these tasks, we need some kind of auxiliary language. Linux gives us a whole bundle of special-purpose languages, designed mostly to be embedded within shell scripts rather than run as standalone programming environments. For example, there's sed, the stream editor: sed isn't a Notepad-style application, but a tool for applying some sort of complex edit operation to each line of text as it streams through, typically on its way from one program to another. Then there's awk. Awk is a weird beast: a pattern-matching language that scans its input stream for patterns and executes the appropriate action when it spots one it knows about. Between them, sed and awk are power tools for manipulating unstructured textual information; while they don't know about complex document formats such as HTML or XML, they're great at slicing and dicing ordinary text. (If you need to do this sort of thing, O'Reilly and Associates publish a couple of books for you: "Sed and Awk", and "Learning the Bash Shell".)
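To give a flavour of the two models, here is a rough Python imitation of each. `sed_substitute` behaves roughly like `sed 's/pattern/replacement/g'`, editing each line as it streams past. `awk_like` mimics awk's `pattern { action }` rules by splitting every line into whitespace-separated fields and running the action of each rule whose pattern matches. Both function names are hypothetical. Python's regular-expression dialect also differs from sed's and awk's in places (Python writes the whole match as `\g<0>` where sed uses `&`), so treat this as an illustration of the idea rather than a drop-in replacement.

```python
import re


def sed_substitute(lines, pattern, replacement):
    """Roughly `sed 's/pattern/replacement/g'`: rewrite each line as it streams by."""
    regex = re.compile(pattern)
    for line in lines:
        yield regex.sub(replacement, line)


def awk_like(lines, rules):
    """Roughly awk's `pattern { action }` model.

    `rules` is a list of (regex, action) pairs. Each action receives the
    line's whitespace-separated fields (awk's $1, $2, ...) and returns text.
    """
    compiled = [(re.compile(pattern), action) for pattern, action in rules]
    for line in lines:
        fields = line.split()
        for regex, action in compiled:
            if regex.search(line):
                yield action(fields)
```

Because both are generators, chaining them, as in `awk_like(sed_substitute(sys.stdin, "colour", "color"), [(r"^[a-z]", lambda f: f[0])])`, handles standard input one line at a time. That is the same streaming behaviour a `sed ... | awk ...` pipeline gives you.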

(As an aside, the pattern-matching I mentioned is based on regular expressions. Regular expressions are like the wildcards you use for specifying filenames under DOS, but far more powerful. They constitute a mini-language (with multiple sub-dialects) specifically for searching for patterns, and you'll find they crop up everywhere; the main Linux text editors (vi and GNU Emacs) use them extensively.)
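Here is a small Python illustration of the difference. It uses the standard `fnmatch` module for shell/DOS-style wildcards and `re` for regular expressions, and the filenames are made up for the example. The wildcard `report*.txt` cannot say "only digits, and only two to four of them", but a regular expression can.

```python
import fnmatch
import re

names = ["report1999.txt", "report99.txt", "report_draft.txt", "notes.txt"]

# Wildcard: "*" matches any run of characters, "?" any single character.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "report*.txt")]

# Regular expression: "report", then 2 to 4 digits, then a literal ".txt".
pattern = re.compile(r"report[0-9]{2,4}\.txt")
regex_hits = [n for n in names if pattern.fullmatch(n)]

# fnmatch itself works by translating the wildcard into a regular expression,
# which is one way of seeing that wildcards are a small subset of regexes.
glob_as_regex = fnmatch.translate("report*.txt")

print(glob_hits)   # ['report1999.txt', 'report99.txt', 'report_draft.txt']
print(regex_hits)  # ['report1999.txt', 'report99.txt']
```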

Shell programming has some drawbacks. For starters, the shells aren't so hot on security: in the multi-user, internet-connected world of Linux, it's devastatingly easy to write a neat-o utility that renders your system horribly vulnerable to hacking. For seconds, the shells aren't very hot on performance; they're slow interpreted languages.
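Here is a minimal Python sketch of the classic trap. It assumes a hypothetical utility that echoes a user-supplied name through /bin/sh. Gluing untrusted text into a shell command line lets characters such as `;` start a second command. Quoting the text with `shlex.quote` keeps it inert, and skipping the shell entirely by passing an argument list is better still.

```python
import shlex
import subprocess


def greet_unsafe(name):
    # DON'T: `name` is pasted straight into a /bin/sh command line, so any
    # shell metacharacters it contains (; | & $( ) and so on) get interpreted.
    cmd = "echo Hello, " + name
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout


def greet_quoted(name):
    # Better: shlex.quote() wraps `name` so the shell sees one literal word.
    cmd = "echo Hello, " + shlex.quote(name)
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout


def greet_no_shell(name):
    # Best: no shell at all; the argument list goes straight to the program.
    return subprocess.run(["echo", "Hello,", name],
                          capture_output=True, text=True).stdout


hostile = "world; echo INJECTED"
```

With `hostile` as input, `greet_unsafe` runs two commands and returns `"Hello, world\nINJECTED\n"`. The other two both return `"Hello, world; echo INJECTED\n"`.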

For serious task automation, there are (currently) three languages to be aware of: Perl, Tcl, and Python.

Perl has been described as the duct tape of the internet, for good reason; it's one of the most commonly used tools for constructing web sites, to say nothing of larger-scale projects. Its name is an acronym for Practical Extraction and Report Language -- the management-friendly euphemism for what is really the UNIX Swiss Army chainsaw. Perl is a big, complex semi-compiled language with hooks for interfacing to relational database engines and GUI creation toolkits. If you've worked with C or C++ or the shells it should look passingly familiar; this isn't an accident. Perl supports object-oriented programming, and some very large applications get written in it, as well as short one-liners for automating simple tasks. It'd take this entire column just to summarize Perl's features, so I'll leave it at this for now. Just go to www.perl.com if you need more details.

(As an aside, Perl seems currently to be attracting a bit of attention from the AI community. This isn't surprising to anyone who knows Perl -- it sometimes exhibits an almost eerie degree of intelligence.)

Tcl is a different type of scripting language. Unlike Perl, which tends to run as a stand-alone application, Tcl was originally designed for embedding in other programs -- a drop-in macro language. Syntactically it's simpler and more regular than Perl, relying for its power on extensions supplied by the program it's embedded in. One point of note is that Tcl is the original language for Tk, the graphical interface toolkit. Tk is accessible from inside Perl and Python as well; it's about the easiest way of creating GUI applications for the X11 windowing system. Such applications also work on MacOS and Windows: Tcl/Tk can therefore be used as a high-level tool for writing graphical wrappers for other programs, as well as a drop-in scripting language. There's a lot of Tcl/Tk information at Scriptics; in particular, a list of applications written in Tcl/Tk (such as the Exmh mail reader).
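As a taste of Tk from Python, here is a rough sketch of a graphical wrapper around a command-line program, using the standard `tkinter` module (Python's binding to Tcl/Tk). Wrapping `ls -l` is an arbitrary choice for the example, and the function names are hypothetical. The command-running helper is kept separate from the GUI, and `tkinter` is imported only inside `main()`, so the helper can be tried without an X display.

```python
import subprocess


def run_command(argv):
    """Run a command and return its standard output followed by any errors."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout + result.stderr


def main():
    # Imported here so run_command() is usable on a machine with no display.
    import tkinter as tk
    from tkinter.scrolledtext import ScrolledText

    root = tk.Tk()
    root.title("ls wrapper")

    entry = tk.Entry(root, width=40)   # directory to list
    entry.insert(0, ".")
    entry.pack(fill="x")

    output = ScrolledText(root, width=80, height=20)
    output.pack(fill="both", expand=True)

    def on_click():
        # Replace the text area's contents with the command's output.
        output.delete("1.0", tk.END)
        output.insert(tk.END, run_command(["ls", "-l", entry.get()]))

    tk.Button(root, text="List", command=on_click).pack()
    root.mainloop()
```

Calling `main()` under X11 opens a window with a text field, a scrolling output pane, and a "List" button that runs `ls -l` on whatever directory you type.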

One particularly noteworthy extension to Tcl is expect; expect is a tool that adds commands for holding interactive dialogues with other commands. This is more important than it may seem at first; if you want to write a program to configure a device like a modem you'll need to send commands to it and read its reply, and doing it via expect is far easier than doing it via a more conventional tool. Remember, hardware devices look like files to a Linux application -- but unlike a file, the stuff you read from them varies depending on context.
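The send-and-wait-for-a-reply pattern at the heart of expect can be sketched in a few lines of Python with the standard `subprocess` module. This is not expect itself, and it omits expect's timeouts and pseudo-terminal handling. The "modem" here is a hypothetical stand-in: a tiny child Python process that answers `AT` commands with `OK` and anything else with `ERROR`. A real modem would instead be opened as a device file such as a serial port.

```python
import os
import re
import subprocess
import sys

# Hypothetical stand-in for a modem: replies "OK" to AT commands, "ERROR" otherwise.
FAKE_MODEM = r"""
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    sys.stdout.write("OK\r\n" if line.strip().startswith("AT") else "ERROR\r\n")
    sys.stdout.flush()
"""


def expect(fd, pattern, limit=4096):
    """Read from `fd` one byte at a time until `pattern` matches; return the text."""
    regex = re.compile(pattern.encode())
    buf = b""
    while not regex.search(buf):
        chunk = os.read(fd, 1)  # one byte, so we never consume past the reply
        if not chunk:
            raise EOFError(f"stream closed before {pattern!r} appeared: {buf!r}")
        buf += chunk
        if len(buf) > limit:
            raise ValueError(f"no match for {pattern!r} in {limit} bytes")
    return buf.decode()


def chat(script):
    """Send each command in `script` and wait for its expected reply pattern."""
    proc = subprocess.Popen([sys.executable, "-c", FAKE_MODEM],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    replies = []
    try:
        for command, reply_pattern in script:
            proc.stdin.write((command + "\r\n").encode())
            proc.stdin.flush()
            replies.append(expect(proc.stdout.fileno(), reply_pattern))
    finally:
        proc.stdin.close()
        proc.stdout.close()
        proc.wait()
    return replies
```

A real expect script would also give up after a timeout. It usually talks to the other program through a pseudo-terminal, because many interactive programs behave differently when they are not attached to one.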

Python (named after Monty Python) falls somewhere in between. It's a big, complex object-oriented language like Perl, albeit with a somewhat more regular syntax. It's also embeddable; if you want to write macros for KOffice, Python is the main office automation language. The control panel for Red Hat Linux is written in Python, as are various other programs: web browsers, development tools, mail clients, and so on. Python is a bit newer than Tcl or Perl: expect to see it cropping up elsewhere. (Oh, and did I mention python.org?)

Choosing a Distribution, Part Three

I've discussed packaging issues and the main RPM-based distributions in earlier columns. In a future article I'll talk about non-RPM-based distributions, in particular Debian Linux. In the meantime, there's another aspect of Linux distributions you may want to know about.

Distributions that use a package manager are prone to a particular kind of controversy: what belongs in a distribution, and what doesn't? Official Red Hat Linux, for example, includes a couple of commercial packages (such as the Bru backup system) which you don't get in the free Red Hat distribution. (The rationale is that your commercial Red Hat license includes discounted commercial licenses for Bru.) On the other hand, US-based distributions tend to lack many cryptographic tools -- such as ssh and pgp -- which it is illegal to export from the USA, but which are pretty useful in a networked environment.

One side-effect of this is the creation of metadistributions. A metadistribution is a kit that extends one of the existing distributions (typically Red Hat) by adding extra facilities which, in the view of the people who created the metadistribution, should have been there from the start. In general, metadistributions are marginal: created by enthusiasts or small businesses looking to put some "value added" on their CDROMs (and coincidentally boost their price from two or three pounds to ten or fifteen pounds).

Possibly the oldest of the metadistributions is Mandrake Linux. This is a souped-up Free Red Hat distribution. It layers KDE, WINE (the WINdows Emulator), and a selection of small-office type software (such as the KLyX and Maxwell word processors, Ksiag spreadsheet, and so on) on top of stock Red Hat Linux. The main advantage of Mandrake is convenience -- KDE is slick, works well, and makes a Linux system look easy to get to grips with. Having it in the default installation kit is a win.

Another metadistro to look for is Definite Linux. This one originates in the UK. It's another Red Hat based metadistribution, and looks quite promising: in addition to KDE 1.1 and Gnome, it has now moved on to the 2.2 kernel series (with appropriate patches to let the rest of the system utilities cope with it), plus those useful cryptographic tools the US government doesn't want us to get our hands on. A snip at ten quid.
