Programming Languages Considered Harmful


[ Site Index] [ Linux Index] [ Feedback ]


If you've ever had anything to do with computers, at a level deeper than playing Freecell, you've probably tried programming in some form or another. It might just have been dinking with a Word Basic macro, or you might be a Linux kernel hacker -- but all such activities have one thing in common: they involve trying to express your instructions in a formal language.

Most programming languages are designed to make the programmer's job easier. They feature block structures that let you aggregate commonly grouped commands together. They have a rich and expressive range of operators, and they provide a consistent and hopefully easy to understand (for both humans and computers) syntax. But that's not the _only_ way to do it.

Over the years, heroic dissident language theorists -- or just hackers with too much time on their hands -- have periodically attempted to think outside the box. In a few cases they succeeded beyond their wildest dreams, but in some others they have developed ideas which aren't so much outside the box as barking at the moon. For some reason, most of the culprits have of late focussed their efforts on Linux (and other UNIX family operating systems), so I suppose you could call this a Linux feature. In it, we'll take a look at two languages that should probably never have been invented.

INTERCAL -- we name the guilty parties!

INTERCAL is the Plan Nine from Outer Space of programming languages; it's so bad that it has acquired a cult following and outlived many far more useful tools in the programmer's repertoire. In fact, it's old enough that it was first reviewed in Shopper in September 1992, and it was twenty years old back then.

Invented at Princeton University by Don Woods and James Lyon, on the morning of May 26th, 1972, INTERCAL was designed to fit one goal: it is a programming language unlike any other. It's name -- a cunning acronym for "Compiler Language with No Pronounceable Acronym" -- sets out boldly the direction the language is designed in. "It is a well-known and oft-demonstrated fact that a person whose work is incomprehensible is held in high esteem", as the original INTERCAL manual says; and INTERCAL is as nearly incomprehensible as it's possible for a language to be.

The basic data types are the 16-bit integer, and the 32-bit integer. Variable names and types are, well, simple: a spot (.) prefixes a 16-bit variable, and a two-spot (:) prefixes a 32-bit variable. All variables have a numeric name, which is itself a non-negative integer. So .1 and .0001 are both the same 16-bit variable - but .321 and :321 are different first-class entities. In addition, INTERCAL provides arrays of 16-bit or 32-bit values, two binary, and three unary operators -- immensely powerful, of course, because they have to work overtime to make up for the missing extras other effete programming languages like C or Java provide.

To give you a flavour of the language, the binary operators are interleave (also called mingle) and select, represented by a -- oh, that character doesn't appear on my keyboard. But not to worry: in ASCII on UNIX-type systems, we represent interleave with a ¢ symbol and select with a 'squiggle' symbol (~). (As you might have gathered, INTERCAL's approach to character names is unique, too.) The interleave operator takes two 16-bit numbers and returns a 32-bit number by interleaving alternate bits from each integer. The select operator takes from the first operand whichever bits correspond to 1's in the second, and packs these bits to the right in the result. Both operands are padded on the left with zeros if necessary, thus facilitating INTERCAL's extremely exciting implicit typecasting. I'm not even going to dip my toe in the issue of unary operators -- the whole thing is just too mind-numbingly weird to discuss, even in private among consenting adults. There is, however, a vaguely sane-looking assignment operator, <- (which does more or less what you'd expect an equals sign to do in any other language, most of the time).

INTERCAL's flow of control logic is equally innovative. As with languages such as FORTRAN or BASIC, statements may begin with a logical line label enclosed in wax-wane pairs (()). Labels are integers from 1 to 65535, but those from 1000 to 1999 are reserved for the INTERCAL System Library, although you should feel free to use them -- just don't expect a sane result. Most flow of control expressions consist of the NEXT statement;

  DO (4653) NEXT

is basically a subroutine in disguise: the current location is stored in a stack, against a possible subsequent RESUME. Of course, it might not work; the INTERCAL compilers are quite fussy, and it's a good idea to be polite some of the time --

  PLEASE DO (4653) NEXT

(One mustn't forget the PLEASE statement: INTERCAL compilers are notoriously fussy and tend to sulk if you aren't polite enough. A curious omission, in view of this, is the lack of a THANK YOU gerund.)

The other core control-flow statements are FORGET, RESUME, and COME FROM. FORGET (expression) causes INTERCAL to forget about the designated number of entries on the NEXTing stack. So you can turn a DO ... NEXT into an unconditional branch by subsequently calling FORGET. RESUME has the same effect as FORGET, except that execution is returned to the location immediately following the last entry removed from the stack.

INTERCAL doesn't provide explicit access to a stack or heap for dynamic memory allocation; it has a stash (and smokes it regularly). Using the STASH and RETRIEVE statements you can explicitly move data to and from the stash -- sort of essential, because this is the only form of scoping mechanism INTERCAL even winks at.

There are some more esoteric control-flow operators in the INTERCAL-72 language specification. IGNORE causes INTERCAL to ignore a specified variable or array. For example:

  IGNORE .1

Causes all subsequent assignments to the variable .1 to fails silently. (You can annul this with a REMEMBER statement.) Then there's ABSTAIN and REINSTATE. These cause a line label, or a command, to be ignored (as opposed to a variable) or to stop being ignored. For example, after executing:

  ABSTAIN FROM STASHING

The stash no longer works, and:

  PLEASE ABSTAIN FROM (1000)

Causes the INTERCAL runtime to skip label (1000) when executing. REINSTATE reverses the abstention; you are advised never to ABSTAIN FROM REINSTATING.

The final control-flow statement is COME FROM. INTERCAL programmers have been aware for a long time that GOTOs are considered harmful. To deal with the problem, INTERCAL provides the COME FROM. For example, (777) COME FROM (66) causes execution to jump to label (777) as soon as it reaches (66), whatever might be lurking there.

Finally, like any other language, INTERCAL provides comments. In fact, any line that doesn't parse as valid INTERCAL code, or anything on a line after some valid code, is treated as a comment.

This concludes our whistle-stop tour of the core language. So what's new in the exciting world of INTERCAL since Shopper last covered it (in 1992)?

Back in 1992, the state of the art in INTERCAL was Eric Raymond's C-INTERCAL compiler. C-INTERCAL, implemented in C, generates C source code from INTERCAL; you can find it at ESR's INTERCAL page, along with ESR's revised version of the original INTERCAL programmer's manual. Shopper daringly interviewed Mr Open Source himself at great length, concerning his involvement with INTERCAL:

Shopper
What originally sparked your interest in INTERCAL?
ESR
Burnout. In retrospect, I was probably suffering from stress-induced dementia or pre-traumatic shock or something.
Shopper
How bright is INTERCAL's future? Do you see it as being a driving force in the world of language design and compiler architecture?
ESR
Yes indeed. If mainly in driving people away from those fields, screaming.
Shopper
Is it possible to program in INTERCAL and remain sane?
ESR
Since I have carefully avoided writing anything more than relatively trivial compiler test loads, I don't know. It could be that this has done me no harm, or the damage could be masked by the fact that I am already essentially a gibbering maniac (there are certainly enough people who would argue for that proposition).
Shopper
Do you have any advice for novice INTERCAL programmers?
ESR
Yes: flee now, before it's too late.

One of the most useful developments of the 1990's was internationalization support. Normally INTERCAL performs i/o only on numerical values. Classical INTERCAL writes in data formatted as words (such as "ONE THREE FOUR ZERO TWO"), and reads out data formatted as Latin numerals (such as "XIIIMIVCII" -- I think). But the new internationalization subset recognizes other input languages -- Nahuatl, Tagalog, Sanskrit, Basque, Georgian, Volapuk, and Kwakiutl. (Albanian and Klingon were just too hard.) Output, of course, remains obstinately Latin.

Another big development is TriINTERCAL, which is incorporated into the C-INTERCAL system. TriINTERCAL was a necessary, even essential, step in improving the utility of intercal, because too many programmers were finding the boring boolean and bitwise operators easy to get to grips with. TriINTERCAL doesn't bother with bits; it uses variables consisting of trits (ternary digits -- based on base three arithmetic), and provides a rich suite (about five) ternary logical operators. Variables in TriINTERCAL are 10-trit and 20-trit words. While MINGLE and SELECT are pretty much the same as in ordinary INTERCAL, TriINTERCAL's unary operators are based on a three-valued logic where the fundamental logical combinations are AND, OR, and BUT.

Once we have such a powerful abstraction -- from binary to trinary -- why stop there? Indeed, TriINTERCAL doesn't; and C-INTERCAL provides support for bases from 2 to 7. ("We cut off before 8 because octal notation is the smallest base used to facilitate human-to-machine communication, and this seems quite contrary to the basic principle behind INTERCAL", says Eric Raymond.)

If this was all there was to INTERCAL, you could be forgiven for scratching your head and saying, "I think I'll stick to Java, thanks." But there's more -- and it's more useful than you might think ...

In 1997, largely as a result of the coverage INTERCAL received in Shopper (he's a slow reader), an individual who refuses to disclose his name but who will hereafter be referred to as The Lunatic (because that's what he is), implemented an incompatible INTERCAL compiler in Perl. CLC-INTERCAL is a rare beast -- the Lunatic is a recluse and doesn't approve of the internet, or indeed of software releases in general -- but is perhaps the most radical development in the INTERCAL world to date. (And, according to rumour, it has escaped.)

INTERCAL -- the next generation

CLC-INTERCAL originally consisted of two chunks: a compiler and a bytecode interpreter, a bit like the Java or .NET runtime environment. As of the time of writing, it still consists of two halves. However, in Escape 1.-94.-80 (all CLC-INTERCAL version numbers are imaginary, and it isn't released -- it escapes) the compiler has been re-written in native INTERCAL and can generate output in compiled object code rather than another high-level langauge like C. Currently the INTERCAL runtime engine is the only object code target supported, but Pentium machine code is due real soon now.

The distinguishing features of CLC-INTERCAL are many, but first among them is the computed COME FROM statement. A word of explanation: most programmers are aware of the GOTO statement, and the evils of unrestricted jumps within the flow of control of a program. GOTOs have mostly been abolished, so few people these days remember the computed GOTO. A computed GOTO -- which was a vital feature of FORTRAN in the 1950's -- is an expression which, when evaluated, provides a parameter to a GOTO indicating where execution should branch to. Thus, a computed GOTO doesn't simply jump to another label; it jumps to one of a number of possible destinations, depending on the phase of the moon.

A computed COME FROM does the same; it checks to see if a specific state has been reached, whenever the program executes a target label, and if so, it COMES FROM that line. Depending on the relative phases of the moons of Jupiter, with the asteroid belt thrown in for good measure.

As if computed COME FROM isn't a powerful enough perversion, the Lunatic went on to pioneer the CREATE statement. CREATE wasn't in the original INTERCAL manual, probably because it is an abomination that no other programmer would contemplate. CREATE lets you create new syntax on the fly -- the CLC-INTERCAL compiler is implemented in this way, and it is apparently frightening easy for an INTERCAL wizard to whip up a quick compiler for another language, such as COBOL. However, CREATE may be called at runtime; so the compiler has to be able to be called by a running INTERCAL program to recompile itself if the language definition is changed by the program.

"Whenever a statement is compiled, the resulting code is saved. If the statement is retried, and the compiler has not changed, the compilation step can be skipped. Although this speeds up execution, it does not change the semantics of the program, which behaves as if it is interpreted by a self-modifying interpreter. It is also possible to suspend the program's execution by creating a new object file containing all the currently compiled code. This procedure is sometimes used by the built-in optimiser to speed up the next execution of a program," writes the Lunatic, in an uncanny moment of lucidity. Bolted on top of this preposterous 'just-too-late' compiler are some additional language features -- CONVERT, SWAP, and DESTROY (which destroys language features). By turning CLC-INTERCAL into a mutation engine, the Lunatic has turned it into a laboratory for some of the most sick and twisted concepts in the field of programming languages.

For example, there's CONVERT. CONVERT statement_1 statement_2 causes at runtime the meaning of statement_1 to be changed into the meaning of statement_2. So you can say things like:

 DO CONVERT ABSTAIN FROM LABEL TO COME FROM LABEL

Thus providing a handy new flow of control construct.

One of the less weird aspects of CLC-INTERCAL is its support for object-oriented and functional programming. While the pattern-matching engine (based on a new abstraction called Regular Grimaces, in place of the more common regular expressions used by languages like Perl) is not yet ready, CLC-INTERCAL supports classes, members of which are referred to as students -- you enroll a student in a class, force it to study, and when you're done with it you allow the entity to graduate.

Some modern languages provide polymorphism and operator overloading, as aids in object-oriented programming; you can overload the operator '+', for example, so that in addition to doing the right thing when you try to add two integers or floating point numbers together (that's polymorphism) it does the right thing when you try to add two invisible pink elephants together (that's operator overloading). CLC-INTERCAL introduces the innovative principle of operand overloading. You can, for example, specify that the value of pi is 3. (That'll come as a relief to biblical literalists and fans of Reimann geometry alike.)

A much weirder aspect of CLC-INTERCAL is Quantum INTERCAL. INTERCAL, by virtue of being unlike any other language, has accidentally stumbled across a set of object semantics that are a perfect match for the (as yet unconstructed) world of quantum computing. Contemplate, for a moment, the following code example:

        DO .5 <- #5
        DO .1 <- #1
        DO .2 <- #2
        PLEASE IGNORE .1 WHILE REMEMBERING IT
        DO NOT REMEMBER .1 WHILE IGNORING IT
(1)     DO .1 <- #0
        DO COME FROM (4)
(5)     PLEASE COME FROM .5
        DO .5 <- #5
        DO WRITE IN .3
(8)     DO .4 <- #2 A¢ "'.3 ~ .3' ~ #1"
(4)     DO .2 <- #0

        PLEASE COME FROM (3)
        DO .2 <- #2
        PLEASE COME FROM .1
        DO .5 <- #0
(2)     DO COME FROM .2

        DON'T WORRY THAT I'M REPEATING A LABEL - YOU CAN NOW QUITE
        GET AWAY WITH IT AS LONG AS YOU ARE VERY CAREFUL ABOUT WHAT
        EVIDENCE YOU LEAVE AROUND. SO, WITHOUT FURTHER DELAY, HERE
        FOLLOWS THE REPEATED LABEL:
(8)     PLEASE DO NOT PRODUCE AN ERROR

(3)     DO READ OUT .3

If this chunk of INTERCAL code was a cat confined in a box with a geiger counter, a radioactive source, and a vial of cyanide contrived to shatter if an atom in the source decays, it would at least be able to die. The INTERCAL code can't do that; so instead it exists in a superposition of quantum states. In fact, merely reading this listing and trying to understand it is likely to leave the programmer in some doubt as to whether or not they really exist.

(Note that if a real computer comes along that can store data in qubits, or quantum bits, and if CLC-INTERCAL is ported to it, then it should be possible to do a number of very unlikely things very easily. For example, finding the prime factors of a long number -- an essential step in cracking RSA encryption -- can be coded in about ten lines of Quantum-INTERCAL and will execute in linear time.)

The CLC-INTERCAL system has, of course, played fast and loose with the core language; therefore the compiler bends over backwards in the interests of compatibility. When you run sick (the compiler) you get a choice of back-end compilers, and the notorious CLC-INTERCAL optimizer. There is a graphical front-end (on systems that support the X11 windowing system) and a source-level debugger; there are options to read source code in ASCII, EBCDIC, Baudot code, or Hollerith punched card format, and options to specify the compiler module to use (because CLC-INTERCAL has modules that emulate both C-INTERCAL, and the original Princeton compiler).

Befunge -- in search of a new direction

"The Befunge programming language was created in 1993 by Chris Pressey for the purpose of being original, entertaining, and hard-to-compile", starts the official description of Befunge.

Most programming languages have a program counter that keeps track of the current instruction being executed in the program; flow of control constructs allow the programmer to move the program counter up or down (jumping over or repeatedly executing blocks of code). Befunge is different: the program is stored in a two-dimensional grid, and execution can proceed in any direction -- up, down, left, or right.

The multidimensional nature of Befunge means that control flow can be easily visualized as travelling "through" the program. Many Befunge-language debuggers have been written which demonstrate this by animating the program as it is executed.

Befunge is of course fully self-modifying; no language-level distinction is made between code and data in storage space.

The original Befunge-93 language implementation was developed by a small but dedicated user base; the result was Befunge-96 (with concurrency), and then Befunge-97, which added a batch of new features. By 1998, a new language was born: Funge-98, which took Befunge out of two dimensions and into Hilbert space. Development of Befunge-100 (presumably the victim of an un-patched Y2K bug) allegedly continues.

There exist several different Funge-98 implementations: you can find links to them here -- possibly the most frightening is Paul Mahon's JavaScript implementation. There are also several derived languages in the Befunge family, including Wierd and Blank; these are just too strange to discuss in a wholesome family publication like Shopper.

So, what does a Befunge program look like? Here's one that prints each verse of '99 bottles of beer', counting down until there's no more beer left:

92+9*                           :. v  <
>v"bottles of beer on the wall"+910<
,:
^_ $                             :.v
            >v"bottles of beer"+910<
            ,:
            ^_ $                     v
>v"Take one down, pass it around"+910<
,:
^_ $                           1-v
                                 :
        >v"bottles of beer"+910.:_          v
        ,:
        ^_ $                          ^
                    >v" no more beer..."+910<
                    ,:
                    ^_ $$ @

I said Befunge was weird, didn't I? Just in case you thought that wasn't very weird, here's a Mandelbrot set generator:

 #v 0#   0# 9# p# 0# 1# 9# p# <              <
< >09g" "-"  "**:884**%:79p39p884**/:69p29p v^
  v9p94p98:/**488p95p99:%**488:**"  "-" "g91<
 v>99**>69g884***79g+::*89g884***99g+:*-"@@"4**/29g884***+39g+:884**%92+9p884*
 < vp97 :g9+92p98/**488p99%**488:++g95***488g94*2/**4"@@"*+g99***488g98p9+19/*
  v>19+ 9g:69p884***+:*"@@"4**/89g884***99g+:*"@@"4**/+"   "2***`#v_v    >
  >    ^                                                     v," "<#> 1-:|
                               >55+,@         v_v#:%*88+1g90$<      ,"#" <
^ <             ,+55    p91   _^#:%*88+1g91p90>#< 09p

You are now clearly aware of the first salient point about Befunge: it is even less readable than INTERCAL, albeit slightly simpler. (Nobody has yet implemented a quantum-Befunge or fractal-Befunge, and even the multi-dimensional Funge derivatives have never been deployed in more than three dimensions.)

The core concept of Befunge is that a program is treated as a torus -- execution can proceed left-right, right-left, bottom-top, or top-bottom. When the program counter hits one edge of the page it appears at the opposite side. All commands are single characters; certain commands change the direction of travel of the program counter. (Old Befunge-93 programs have a maximum of 80x25 total command and data bytes.) Like Forth and Postscript, Befunge is stack-based; there are no run-time variables.

The stack is a standard last-in-first-out one; the digits 0 .. 9 are special Befunge commands that, when executed, push their value onto the stack. A double quote (") enters or leaves string mode -- the ASCII value of any character encountered in string mode is also pushed onto the stack. The basic calculation commands are available ('+' -- addition, '-' -- subtraction, '/' -- integer division, '*' -- multiplication, '%' -- modulo, and '!' -- logical negation). To get a number greater than 9 into a cell on the stack, you've got to do arithmetic; for example, to get 43 on the stack, you start by pushing 8, then 5, then multiply, then 3, then add. While executing left-right, this would look like:

  85*3+

But it might equally well be:

  +3*58

You can pop the stack with a period; this outputs the value as an integer followed by a space. ',' pops the stack as ASCII and doesn't follow it with a space. As should be obvious, popping stuff off the stack is a quick and easy (if not entirely sane) way of writing self-modifying Befunge programs, because you can easily emit a string and then turn a corner or two and execute it.

If you're executing right to left. There are four commands that make the program change direction: '>' (go right), '<' (go left), 'v' (go down), and '^' (go up). The '?' command selects a random direction. Whitespace characters are treated as null commands that do nothing.

There are two different types of 'if' statement, depending on which direction you want to turn in. Each 'if' test pops a value off the stack and checks to see if it is true (non-zero) or false. The result is to change the direction of execution. '_' acts like '<' if the stack value is true, and '>' if it is false. '|' acts like '^' if it is true, and 'v' if it is false.

These are of course the basics of Befunge-93, the simplest usable version of the language. There is now a final specification for the much more complex Funge-98 language. Funge-98 not only generalizes to multiple dimensions; it adds I/O facilities and the terrifying prospect of a system call interface.

Apropos the mention of Hilbert space, this isn't entirely true, technically speaking: Funge-98 operates in Lahey-space. The requirements for a line in Lahey-space are the following: Starting from the origin, no matter what direction you head, you eventually reach the origin. If you go the other way you reach the origin from the other direction.

If you head out on a Lahey-line, you would eventually return along the same Lahey-line, heading in the same direction, but coming from the other direction. This would mean you would eventually repeat your path, and (in Funge terms) have consistant wrapping through unlimited space.

(At this point, most budding Befunge programmers who are not topologists break down and start gibbering. It could be worse; proposals for basing a Funge derivative on the surface of a Klein bottle or hex net have been mooted, but never implemented.)

The final horror is of course, Visual Befunge for Windows. But I've been told that if I write about it, I'll be sectioned. So I leave you with this comforting thought: somewhere out there, even as you read this, a programmer is probably giving birth to a creation more monstrous than either of these languages.

Happy hacking!


[ Site Index] [ Linux Index] [ Feedback ]