Linux Format #34

[[ Typographical notes:
indented text is program listing
text surrounded in _underscores_ is italicised/emphasised ]]

Perl tutorial

///TITLE: Parrot Soup

///STRAP: Charlie Stross investigates the plumage of Parrot, the bytecode interpreter behind the forthcoming Perl 6

///SUBTITLE: Pining for the Fjords

Perl 5 came out in 1994, or thereabouts. Perl 6 isn't out yet, but work on the language began in 2000 -- it's having a long gestation. Some concrete material has, however, emerged. First, a lengthy public consultation period took place; then Larry Wall and his helpers began to emit a series of Apocalypses -- design documents covering different aspects of the Perl 6 functional requirements.

A lot has changed since Perl 5's heyday. Newer, cleaner languages have come along: Python, in particular, has gained a considerable following for its elegant modularity, and nobody can be unaware of the existence of Java and C# (although the latter is primarily a Microsoft toy, and the former is aimed more at application development than Perl was, initially). Meanwhile, Perl 5's weaknesses have become apparent. It's not impossible to do large-scale application development in Perl 5, but it _is_ hard to maintain consistency between the work of novices and Perl 5 experts, and good development practice is hard to establish.

Perl 6 is therefore a work in progress that is intended to address the weaknesses of Perl 5 -- modularity, maintainability, extensibility, and suitability for large-scale application development. Perl 6 will be mostly backward-compatible: the target is for 90% of Perl 5 programs to run without changes, and for 90% of the remainder to work with only minor changes. But it's going to be a whole new language, written from scratch, and the first part of it to emerge in the form of code is already available: Parrot.

///SUBTITLE: Pretty Polly

Parrot takes its name from an April Fool's joke by Simon Cozens.
Simon announced a merger between Perl and Python, orchestrated by Larry Wall and Guido van Rossum, who were going to collaborate on a new language called Parrot.

As it happens, there's a core of truth in the joke. Both Perl and Python are hybrid languages that contain both a compiler and an interpreter. The compiler generates a parse tree which the interpreter then executes. Unlike a purely interpretative language, this permits the benefits of compilation (notably various optimizations) while keeping the benefits of interpretation (easy debugging).

Perl 5's structure is a bit of a mess internally, so one of the goals of the design exercise was to separate the compiler from the bytecode interpreter. At this point, Cozens and friends had a brain-wave: if they were going to build a bytecode interpreter (like the Java Virtual Machine) why not do it properly? Why not write one that could run languages other than Perl -- like Python? Or Ruby, Tcl, and anything else anyone felt like writing a compiler for?

And so, to Parrot: Parrot is a bytecode interpreter optimized for running high-level languages. It'll be the back-end of Perl -- but there will be other compilers for it as well. Indeed, given the Perl language's innate suitability to pattern-matching and compilation tasks, Perl 6 will be a toolbox for designing and building application domain-specific languages. And it all hinges on Parrot.

Parrot is implemented in C -- for maximum portability. (C is about the only language that is available everywhere and sufficiently fast.) It's capable of interpreting several million opcodes per second on a typical PC. However, it also has a JIT (Just In Time) subsystem which permits compiled bytecode to be translated to native machine code and executed directly (only on Intel x86 or Alpha CPUs running Linux or BSD, right now).
Unlike the Java virtual machine, Parrot is a register machine that recognizes four basic data types: INTVAL (an integer type guaranteed wide enough to hold a pointer), FLOATVAL (architecture-independent floating point), STRING, and PMC (a low-level abstract scalar data type: Parrot Magic Cookie). Each data type has 32 associated registers, for example I0 .. I31 (integer registers 0 to 31). The core Parrot assembly language is simplistic but has some interesting opcodes you don't normally see on real microprocessors -- such as string operators.

Despite being a register machine, Parrot has a stack and permits you to push and pop frames of registers on a stack -- segregated by register type (so that, for example, you can push all the PMC and FLOATVAL registers without touching the INTVAL and STRING registers). There's also a scratchpad mechanism for keeping track of the name, type, and attributes of variables (which might be needed by a higher level language for checking constraints), and exception handling mechanisms.

It's no surprise to find that Parrot provides support for handling objects, compiled bytecode libraries (modules), and shared libraries. More interestingly, there's low-level support for obtaining locks on objects and registers, for spawning new interpreters, and for resynchronising threads: Parrot is designed to support multi-threading. You can find all the gory details in the (mercifully short and painless) "parrot_assembly.pod" file that comes in the docs subdirectory of the Parrot distribution.

One point to note: Parrot's data types aren't polymorphic. Nor do they support the intricacies of variable type behaviour you'd expect in Perl or Python. Policy on type conversion needs to be specified by the compiler -- or established by adding new data types. Parrot's mechanism to do this is to define a new object class with appropriate methods, then pass around pointers to objects in this class using the PMC registers and opcodes.
Tools are provided that allow you to generate a template for your new class, which then needs filling in and compiling as an extension to Parrot. In this way, Parrot can support scalar types for Python and Perl that have incompatible behaviours.

///SUBHEADING: So what's it good for?

You can grab Parrot (current release: 0.0.7) from CPAN and compile it using the GNU utilities -- if you're unused to this, extract the contents of the tar.gz archive and read the file README inside it. Once you've built Parrot, you can run the regression tests to verify that it functions as advertised, then read the documentation in the docs subdirectory.

Most of the really interesting stuff is buried in a subdirectory called languages. This is where the compilers live -- programs written in Parrot assembly language that compile other languages down to Parrot code. For a squeaky-clean new system, Parrot has already spawned a lot of compilers. Parrot is an assembly language, but an abstract one -- optimized for the job of writing compilers and interpreting the code they spit out, rather than optimized for the job of running on physical silicon.

A whole bunch of mini-languages have been implemented in Parrot so far, for testing purposes. First in the list is probably the Parrot compilation system itself -- the Parrot compiler will compile any Parrot program written with fully qualified opcode names. There are also a couple of toy compilers, Cola and Jako; Cola is a smallish C-like or Java-like language that's under development, while Jako is another very simple language with just enough complexity to implement while loops. At a higher level, small compilers for Forth, Scheme, and BASIC (admittedly an old-fashioned line-numbered BASIC compiler) are provided -- if you want proof of concept of Parrot's suitability for compilation, the number of compilers that have landed on top of what is, after all, only release 0.0.7 is remarkable!
And finally, there are the beginnings of the prototype implementation of Perl 6 itself.

There are a couple of different components to Perl 6. One is the regex compiler. Regular expressions are a mini-language in and of themselves; the proposed Perl 6 extended regular expressions are actually rather more powerful than the basic pattern-matching expression language, and Perl 6 Apocalypse 5 adds a whole slew of new facilities, such as named rules and facilities for defining grammars. To implement regular expressions in Perl 6, a separate regex compiler and mini-language are provided, the core of which lives in parrot/languages/regex. There's also the separate Perl 6 compiler itself -- or the sandbox in which it is to be developed -- and a miniperl compiler (which builds a minimal Perl core without most of the language extensions we use, but sufficient to help bootstrap the Perl build process).

It needs to be stressed that as of this release, none of these languages are much use. But as Benjamin Franklin commented, "how much use is a baby?" Parrot is going to grow up, and when it does, expect these mini-languages to be joined by full-blown implementations of Perl 6, Python, and possibly Ruby, Java, and Tcl as well.

///SUBHEADING: Parrot says "hello, world!"

Here's a simple piece of Parrot code (lifted shamelessly from the manual):

///BEGIN CODE
  set I1, 10
  set N1, 3.1415
  set S1, "Hello, Parrot"
///END CODE

All we're doing here is inserting values into some registers -- the integer 10 into the I1 (integer 1) register, the float 3.1415 into the N1 (number 1) register, and the string "Hello, Parrot" into the S1 string register.
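
To make the register model concrete, here is a rough sketch in Python (not Parrot code, and not a description of how Parrot's C source works) of four banks of 32 registers and a set operation that picks the bank from the register's prefix letter. The names new_registers, set_register and BANK_TYPES are invented for this illustration, and refusing a value of the wrong type is an assumption of the sketch, not documented Parrot behaviour.

```python
# Toy model of Parrot-style typed register banks.
# Illustration only: all names here are hypothetical, not part of Parrot.

REGISTER_COUNT = 32  # the article states 32 registers per data type

# Python type accepted by each bank in this sketch (an assumption;
# the PMC bank "P" is allocated but not modelled any further).
BANK_TYPES = {"I": int, "N": float, "S": str}


def new_registers():
    """Return four banks of registers, keyed by prefix letter."""
    return {
        "I": [0] * REGISTER_COUNT,     # INTVAL registers I0 .. I31
        "N": [0.0] * REGISTER_COUNT,   # FLOATVAL registers N0 .. N31
        "S": [""] * REGISTER_COUNT,    # STRING registers S0 .. S31
        "P": [None] * REGISTER_COUNT,  # PMC registers P0 .. P31
    }


def set_register(registers, name, value):
    """Mimic `set <register>, <constant>` for the I, N and S banks."""
    bank, index = name[0], int(name[1:])
    if bank not in BANK_TYPES:
        raise ValueError("unsupported register bank: " + bank)
    if not 0 <= index < REGISTER_COUNT:
        raise IndexError("no such register: " + name)
    if not isinstance(value, BANK_TYPES[bank]):
        raise TypeError(name + " cannot hold " + type(value).__name__)
    registers[bank][index] = value


# The three `set` instructions from the listing above.
registers = new_registers()
set_register(registers, "I1", 10)
set_register(registers, "N1", 3.1415)
set_register(registers, "S1", "Hello, Parrot")
```

Each bank is independent, so storing into I1 leaves N1 and S1 untouched, which mirrors the way the article describes registers as segregated by type.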
We can print like this:

///BEGIN CODE
  print "Register S1 contains: "
  print S1
  print "\n"
///END CODE

And we can loop like this:

///BEGIN CODE
        set I1, 1
        set I2, 10
  REDO: print "Hello "
        print I1
        print "\n"
        inc I1
        le I1, I2, REDO
  DONE: end
///END CODE

Which should (your columnist's rusty assembly language skills notwithstanding) print "Hello n\n" repeatedly for values of n from 1 to 10. And it's a testimony to Parrot's easy learning curve that indeed it worked first time. It's functionally equivalent to the following Perl code:

///BEGIN CODE
  $a = 1;
  $b = 10;
  do {
      print "Hello $a\n";
      $a++;
  } while ($a <= $b);
///END CODE

Compare this to the fun you'd have with a real microprocessor's assembly language. As a general rule these don't come with explicit string data types, so you'd have to define "Hello $a\n" as an array of chars and loop through it for each print statement! Parrot is not only a great platform for Perl 6, it's an easy and rewarding tutorial environment for learning basic assembly language programming.

(ENDS)

BOXOUT (Parrot futures)

This tutorial is based on Parrot 0.0.7, which was the current release in August 2002. Parrot is moving fast, and 0.0.8 came out on September 2nd, followed rapidly by 0.0.8.1, after most of this article was written. There are a number of significant changes in 0.0.8.1.

A number of new language front-ends have found their way into the Parrot distribution, including the foundations of the planned Perl 6 regular expression compiler (which compiles down to Parrot bytecode). The regexp compiler isn't complete; in particular, several major Perl 5 regexp features such as character classes, word boundaries, back references, and look ahead/look behind assertions aren't implemented yet, and some Perl 6 features (mostly regexp operations on arrays and hashes, and weird stuff like two-dimensional regular expressions) aren't there either.
The Perl 6 compiler is coming along, and the beginnings of some new compilers have appeared -- Python, Ruby, and (rather surprisingly) Befunge-93. (What next, INTERCAL?) The documentation has been updated a little, and a load of new PMC classes have appeared, including the basics of the boolean and keyed data types.

Also a bit boggling are some new tools that have shown up in the languages subdirectory -- specifically, parsers that are designed to translate BNF pseudocode into grammatically correct Perl 6, and BNF grammars for a number of languages. The importance of this toolkit can't be overstated if Perl 6 is to reach its goal of becoming a sort of language for creating application domain-specific sublanguages.

END BOXOUT (Parrot futures)