Web Architecture Tools




First you surf the net, then you try to write your own home page. If you're successful, you hang out a sign saying 'Web Designer for hire' and go into business. What comes next?

Most people these days start writing their HTML (hypertext markup language) files using an editor such as Netscape Navigator Gold, Claris Home Page, or Adobe PageMill. (Real hard cases use a programmer's editor like Brief, BBEdit, or vi; lunatics do it using Windows Notepad.)

But writing HTML files and designing graphics are only half the story. There's more to a good web site than lots of pretty pictures and a page of text. Good web sites need content -- and the content needs to be organized. Chances are you'll be juggling five or ten pages at first ... then you add another dozen, decide that you hate the default navigation buttons you've added at the bottom of every page, and find that you need to change every single file in your web. At which point you realize there's more to building the web than cool design and interesting content: you need structure, and you need tools to take the grinding labour out of maintaining the structure.

Welcome to the world of web architecture tools: programs that don't help you write HTML, but enable you to maintain large quantities of it efficiently. In this article I'll discuss the requirements for a web architecture tool, explain what it does, and then cover three of the best. Surprisingly, they're all free and they're all available over the web. Even more surprisingly, they're better at what they do than the commercial products currently on offer!

Three types of tool

HTML, the document format used on the world wide web, is a lowest-common-denominator medium. It was originally designed to be totally cross-platform, unlike most proprietary file formats: today, people surfing your personal web site are as likely to be doing it from an IBM mainframe as a PC or Mac. Microsoft and Netscape are the most visible web software companies, and Windows (and, to a lesser extent, the Mac) is the commonest small system on the net, so it's easy to assume that all you need to do is use an authoring package written by one of the giants and you're talking to the world: but it just ain't true.

Getting it right

There are three important things you need to get right in designing a web site -- and I mean a site, with information and content on it that people will want to browse and tell their friends about, not a CV, or a corporate sales brochure for Acme Widgets that only people searching for a widget could possibly be interested in. The three ingredients of a successful web site are visual design and layout, content, and architecture.

Without some degree of design skill your site is going to be unreadable. HTML gives you lots of little tweaks you can use, and as with desktop publishing, it's easy to go over the top and produce something that is an offence to the eyeball. Worse, because HTML is cross-platform, you need to have half an eye on how your design will look on platforms other than the one you use. There are lots of sites out there that say 'Netscape 2.0 Required' on their front page. That's dumb: although about 70% of web users use Netscape, nearly half of them still haven't upgraded to Netscape 2.0 or 3.0, which leaves only about 35% of all web users running a recent version. So if you make a site that only Netscape 2.0 users can see, you've cut your potential audience by 65% before you even start.

Content is harder to describe. Without content, there's not going to be anything there that people will bother reading -- sites which are all design and no substance are ten a penny and bore people after the first dozen or so. But it's hard to put your finger on what makes good content. This article is content -- it tells you something, hopefully something you didn't already know. A photo gallery is content; a newspaper is content; a book or a price list or a piece of software to download is content. A list of links to other sites isn't content -- it's a good way of ensuring that surfers bounce off somewhere else in the web as fast as possible.

And finally, there's architecture: the overall structure of the site. Architecture is the most underrated aspect of web design, but it's vitally important to get it right. Get it wrong, and making a single change (like replacing the set of links at the bottom of every page) turns into a nightmare. Get it wrong, and people who find your site won't be able to use it, won't know where they are, and won't stay around to find out. Surfing the web isn't like watching TV; it's like browsing the shelves in an infinitely large library. You don't have a captive audience; if you want to keep them, you need to be nice to them, and a good way to do that is by making your site easy to use.

Two types of software are generally used to deal with the three different ingredients of a good website: design tools and architecture tools. (Nobody's come up with a content tool yet: barring the development of a working artificial intelligence, I doubt they will.)

Design tools include image manipulation programs like Photoshop or Corel Draw, and HTML layout tools like Claris Home Page, Adobe PageMill, and Netscape Navigator Gold. (At a pinch you could also rope programmers' editors capable of producing HTML into this category.) Using these tools, you produce web pages.

Web architecture is basically about the design of a web, a collection of hypertext documents, and the links between them. You can do all the layout you like in Netscape Navigator Gold's built-in HTML editor, or in Microsoft FrontPage, but that won't help you make a collection of twenty documents navigable. Nor will these editors help you make those global changes you urgently need -- say, changing the background colour in every file on your site.

The purpose of a web architecture tool is to let you edit a web site as a whole: to make global changes and restructurings, to test for broken links between documents, to define which documents should link to each other, and to ensure that the site as a whole makes sense.
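To make the idea of a global change concrete, here is a minimal sketch in Python (not one of the tools reviewed here) of the background-colour job: it walks a directory tree and rewrites the bgcolor attribute of the body tag in every .html file. The function names, the .html extension, and the latin-1 encoding are assumptions made for illustration, and pages whose body tag has no bgcolor attribute are simply left alone.

```python
import re
from pathlib import Path

# Matches the bgcolor attribute inside a <body ...> tag, in any letter case.
# Group 1 is everything up to the value; group 2 is the old value.
BODY_BGCOLOR = re.compile(
    r'(<body\b[^>]*?\bbgcolor\s*=\s*)("[^"]*"|\'[^\']*\'|[^\s>]+)',
    re.IGNORECASE,
)

def set_background(html: str, colour: str) -> str:
    """Return html with the body tag's bgcolor value replaced by colour."""
    return BODY_BGCOLOR.sub(lambda m: f'{m.group(1)}"{colour}"', html)

def recolour_site(root: str, colour: str) -> list[Path]:
    """Rewrite every .html file under root; return the files that changed."""
    changed = []
    for path in sorted(Path(root).rglob("*.html")):
        old = path.read_text(encoding="latin-1")
        new = set_background(old, colour)
        if new != old:
            path.write_text(new, encoding="latin-1")
            changed.append(path)
    return changed
```

Calling recolour_site("mysite", "#FFFFFF") would touch every page in one pass. It's a blunt instrument, and that is the point: the tools below do this sort of thing with rather more finesse.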

If you think it's bad changing the menu of links at the bottom of a dozen pages, spare a thought for the people who maintain Hantsweb, Hampshire County Council's web server. With more than a quarter of a million HTML files, sites like Hantsweb simply can't be maintained by hand. Long before you reach that size, specialist web architecture tools stop being an optional extra and become a necessity.

Tools for architects

Where do the best tools come from?

The internet community existed before the commercial software houses -- people like Adobe or Microsoft -- noticed the web. Lots of software is developed by academic institutions in the US; if they receive money from some government grants, the software has to be made available in the public domain. (Hypertext has been a rich research field for some years now.)

Then there are a number of commercial internet service providers who sell space on their web servers. Some of the biggest, such as GNN (now absorbed by AOL, America Online), set out to write software that would make life easier for their users; by giving it away, they gained publicity (and customers for their primary service).

Finally there are the hackers (in the sense of master programmers, not geeks with modems). Some of them, already experienced at writing programming environments, realized that with a bit of adaptation the tools they used for maintaining software could be converted into HTML architecture tools. A web is, after all, a tree of files -- like the source code of a program -- and a useful feature of any development environment (like Borland Delphi, for example), above and beyond the compiler built into it, is the ability to manipulate trees of files.

Each of these categories of user has given rise to software that can be used for managing and maintaining entire webs. They're not always as polished and cute looking as something from the Microsoft stable, but they're robust, rugged, and powerful -- they don't hide the complexities from you, but neither do they stop you doing things that the designers of the mass-market web creation tools never thought of.

AOLPress

At first glance AOLPress looks to be just another web editor, like Navigator Gold; but looks are deceptive. It's packed with additional features, including one hugely valuable one -- weblets -- which make it much easier to maintain a small to medium website.

AOL -- America Online -- bought AOLPress when they acquired GNN (a small ISP and web service provider). GNN were giving away a kit consisting of the GNNServer web server and GNNPress authoring tool, partly as an altruistic gesture and partly as a way of gaining publicity and funds. (You used to be able to register for support -- if you forked over some money.) GNNPress is still being pushed by AOL (subject to the name change) and is available for several platforms: Macintosh, Windows, Windows 95, and a variety of UNIXes (including SunOS and Solaris). You can get a free copy from www.tools.gnn.com.

In use, AOLPress looks like a web browser. This isn't a coincidence; AOLPress is actually a fully-functional browser that supports most HTML 3.2 features (not frames), although it feels subjectively slow when compared to rivals. The magic is invisible until you start typing and discover that when you're looking at a web document, you're able to edit and save it.

But the deep wizardry is hidden beneath an otherwise-unremarkable looking 'Tools' menu. AOLPress has options to 'check links' -- to ensure that the links in your document all point somewhere sensible, so that your readers won't get those annoying '404 - not found' errors. And AOLPress has the ability to 'webize' a directory or tree of directories, generating an iconic map of your website. Each file or directory is represented by an icon and links are shown as arrows; broken links show up visibly, and you can get a visual 'feel' for how your web is connected. Double-click on an icon and you're in the editor again, able to change the file.
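For a feel of what a link check involves, here is a rough Python sketch. It is not how AOLPress does it; it only illustrates the general technique, and every name in it is made up for the example. It takes a table mapping each page's site-relative path to its HTML, pulls out the href of every anchor, skips external URLs, and reports any local target that isn't in the table.

```python
from html.parser import HTMLParser
import posixpath
from urllib.parse import urlsplit

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to it."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def local_targets(page_path, html):
    """Return the site-relative paths that page_path links to."""
    collector = LinkCollector()
    collector.feed(html)
    targets = []
    for href in collector.links:
        parts = urlsplit(href)
        if parts.scheme or parts.netloc or not parts.path:
            continue  # external URL, mailto:, or a bare "#fragment"
        if parts.path.startswith("/"):
            target = parts.path.lstrip("/")  # treat as relative to site root
        else:
            target = posixpath.join(posixpath.dirname(page_path), parts.path)
        targets.append(posixpath.normpath(target))
    return targets

def broken_links(pages):
    """Map each page to the local targets it links to that don't exist."""
    report = {}
    for path, html in pages.items():
        missing = [t for t in local_targets(path, html) if t not in pages]
        if missing:
            report[path] = missing
    return report
```

The same table of pages and targets is all you would need to draw the sort of arrows-between-icons map described above: each entry returned by local_targets is one arrow.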

[Figure: AOLPress in HTML edit mode]

AOLPress has lots of other whizzy features to help you coordinate your web site. It supports styles: not the Cascading Style Sheet system some newer web browsers are using to determine the look and feel of HTML entities, but styles that determine HTML elements in your documents. And it has a formidable array of options for presenting a visual map of your web.

[Figure: AOLPress in weblet edit mode]

AOLPress isn't suitable for maintaining huge websites (those with over 250 files). Graphical representations of a web fall over when you feed them too many files; even if you reduce them to icons, the hundreds of arrows render any display unreadable. But for small to medium jobs it provides an efficient authoring environment with tools for imposing a standard presentation across the site, and with facilities for checking links and ensuring that everything is properly bound together.

Userland Frontier

Userland Frontier started life on the Macintosh, as a scripting language that could be used to automate tasks in other applications. Today, it's going cross-platform: a Windows 95 version is due out by the end of 1996, and there are rumours of a possible UNIX port.

Originally a commercial product, Frontier was made freely available for public downloading in late 1995 by its creator, Dave Winer, who has continued to develop it. (You can get it from www.scripting.com among other places.) Apple had begun distributing its AppleScript scripting system with new Macs, and the market for a Mac-only scripting language looked bleak. However, Frontier rapidly gained a following in the Macintosh web developer community, and is mutating into something totally new: an object-oriented programming-language-cum-database-cum-outline-processor that does web publishing and web architecture. Its own language, UserTalk, is somewhat C-like and object oriented; it's designed for automating tasks in other applications.

Trying to describe Frontier is like getting to grips with an octopus. One thing it isn't is a web browser -- but if you're running Netscape Navigator or Microsoft Internet Explorer, extra 'Frontier' menus automagically appear, and Frontier can drive the web browser. Frontier is based around a database system; items in the database are objects that can contain UserTalk programs, outlines, images, numbers or strings, and even word processor documents. (A future version may also be able to store Java applets and programs in other languages.) It's a kind of Swiss Army knife for the Macintosh or Windows 95 system.

[Figure: Userland Frontier]

Frontier's use as a web architecture tool springs from Dave Winer's desire to run his own web site from within Frontier. The main Mac web server, WebSTAR, was compatible with Frontier; he wrote a set of UserTalk scripts that enabled him to store files in the Frontier object database and serve them direct to the web server. This database-based publishing method has a lot of advantages. For example, each file in the object database has a unique name; you can store a glossary of document titles and object names in another table, and have requests for documents redirected to the appropriate database slot. Or you can 'export' a tree of HTML files, interpolating information from the database as you go along. Frontier has hooks for interoperating with Microsoft Office products and other applications like FileMaker Pro; using it, you can glue your desktop applications into a coherent web, or generate HTML using macros written in UserTalk.

Macros bear some explanation. Frontier's UserTalk programming language has full access to the Frontier object database. You can use programs written in UserTalk to do things like traverse a sub-tree of the database, exporting each stored document as an external HTML file, and automatically including links to its parent document or subtree. Fragments of UserTalk code associated with a document can be executed at that time, so that, for example, today's date or some other useful data can be inserted into the document. Or you can use it as a web-server bolt-on: Frontier is multi-threaded and fast, and makes a good environment for writing CGI scripts (if you have a Macintosh or Windows 95 based server).
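The export idea is easier to see in code. What follows is a Python sketch, not UserTalk, and it doesn't use any real Frontier call; the nested table, the {today} macro spelling, and the function names are all invented for illustration. It walks a tree of stored documents, turns each one into an HTML page with a link back to its parent index, and fills in today's date wherever the macro appears.

```python
import datetime

def render(title, body, parent_href, today):
    """Build one HTML page, expanding the {today} macro in the body."""
    text = body.replace("{today}", today.isoformat())
    up = f'<p><a href="{parent_href}">Up</a></p>' if parent_href else ""
    return f"<html><head><title>{title}</title></head><body>{text}{up}</body></html>"

def export_tree(table, today, path="", parent_href=None):
    """Walk a nested table depth-first; return {output path: html}.

    In table, a string value is a stored document and a dict value is a
    sub-table. Every table gets an index.html listing its contents.
    """
    pages = {}
    items = []
    for name, child in table.items():
        if isinstance(child, dict):
            # Sub-table: recurse; its index links back up to this index.
            pages.update(export_tree(child, today, f"{path}{name}/", "../index.html"))
            items.append(f'<li><a href="{name}/index.html">{name}</a></li>')
        else:
            # Document: link back to the index of the table that holds it.
            pages[f"{path}{name}.html"] = render(name, child, "index.html", today)
            items.append(f'<li><a href="{name}.html">{name}</a></li>')
    listing = "<ul>" + "".join(items) + "</ul>"
    pages[f"{path}index.html"] = render(path or "Home", listing, parent_href, today)
    return pages
```

Feed it a table such as {"about": "<p>Updated {today}</p>", "articles": {"tools": "<p>Web tools</p>"}} and you get back four pages, each already wired to its parent. Writing them out to disk is a loop over the result.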

Don't expect to use Frontier to build your website just by pushing a button; it doesn't work that way. It's a powerful programmable back-end, and if you want to maintain anything from a few dozen to several thousand documents it's probably a good solution; but you'll need to roll your sleeves up and do some script-writing or editing, unless you're happy to have a website that looks just like Dave's. On the other hand, writing Frontier scripts can be fun, and tends to be addictive: once you get started, it's hard to stop.

Meta-HTML

Meta-HTML is the most powerful web architecture tool covered in this review, and (predictably) the hardest to use.

Written by Brian Fox, author of the popular free Bash shell for UNIX, Meta-HTML is basically a package that extends HTML, by giving it many of the features of a full-blown programming language. Again, like the other tools in this review it's free -- you can find it at www.metahtml.com -- though there is a commercial, fully-supported version. MetaHTML is a bit less portable than AOLPress; it's primarily designed to run on UNIX servers, though a Macintosh version is available (and in principle it should be possible to compile it on Windows NT).

HTML, considered as a language, is basically inactive: it has lots of output control commands, but no flow-of-control, no variables or operators, and none of the other baggage that goes with a programming environment. The idea of MetaHTML is to expand HTML into a full-blown scripting language. MetaHTML scripts can then be executed in a variety of ways -- via a CGI script running on a web server, via a web server that 'understands' MetaHTML directly, or via a standalone interpreter. The end result of executing a MetaHTML program is to generate HTML that a user's browser can read.

The simplest use of MetaHTML is to include other HTML documents or variables in your HTML files. For example, suppose you have a site where you want to enforce a standard look-and-feel, by adding a navigation bar to the bottom of all your pages. MetaHTML makes this easy: all your files end with:


<include footer.mhtml>


Where footer.mhtml is the name of a file containing your navigation bar, images, imagemap, or whatever. When you run the files through the standalone MetaHTML interpreter, or the CGI script, it spits out HTML files where the tag is replaced by the file it points to.
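If you want to see what that pre-processing step amounts to, here is a toy version in Python. It is emphatically not the MetaHTML interpreter: it assumes an include tag of the form <include footer.mhtml>, knows nothing about the rest of the language, and the function and parameter names are mine. It replaces each include tag with the contents of the named file, expanding any includes inside that file too, and gives up if the nesting looks like a loop.

```python
import re

# Assumed tag shape for this sketch: <include some/file.mhtml>
INCLUDE_TAG = re.compile(r"<include\s+([^\s>]+)\s*>", re.IGNORECASE)

def expand_includes(text, read_file, depth=0, max_depth=10):
    """Replace each include tag in text with the file it names.

    read_file is any callable that takes a file name and returns that
    file's contents. Included files are expanded recursively.
    """
    if depth > max_depth:
        raise RecursionError("include nesting too deep; possible include loop")

    def replace(match):
        included = read_file(match.group(1))
        return expand_includes(included, read_file, depth + 1, max_depth)

    return INCLUDE_TAG.sub(replace, text)
```

Passing the file reader in as a parameter keeps the sketch easy to try out: hand it a dictionary lookup to experiment, or something like lambda name: pathlib.Path(name).read_text() to run it over real files before you upload them.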

You can go further, of course. MetaHTML lets you define variables with the <set-var> tag, and interpolate them into your files using tags that look like <get-var name>. Doesn't sound like much? MetaHTML also lets you do arithmetic or string operations on variables, provides a range of flow-of-control operations, and even lets you call external code libraries to do things like link to a SQL database or delete and rename files. MetaHTML basically turns your HTML documents into programs.
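Variable interpolation can be mimicked the same way. The Python sketch below assumes tags shaped like <set-var title=Tools> and <get-var title>; treat those shapes, and the choice to make an undefined variable expand to nothing, as assumptions of the sketch and not a statement of how the real interpreter behaves. It makes one left-to-right pass over the text, remembering each definition and substituting each use.

```python
import re

# Assumed tag shapes: <set-var name=value> and <get-var name>
VAR_TAG = re.compile(
    r"<(set-var|get-var)\s+([\w-]+)(?:=([^\s>]*))?\s*>",
    re.IGNORECASE,
)

def expand_variables(text):
    """Apply set-var definitions and substitute get-var uses, in order."""
    variables = {}

    def replace(match):
        kind = match.group(1).lower()
        name = match.group(2)
        if kind == "set-var":
            variables[name] = match.group(3) or ""
            return ""  # a definition produces no output of its own
        # Sketch's choice: an undefined variable expands to nothing.
        return variables.get(name, "")

    # re.sub calls replace for each match from left to right, so a
    # definition is always seen before any later use of it.
    return VAR_TAG.sub(replace, text)
```

Define the site's title once at the top of a page (or in an included header) and every later use picks it up; change the definition and the whole page follows.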

(There is a web server facility called server-side includes that lets you do many of these things if it's turned on, but MetaHTML is faster, more secure, and has lots of additional features. Plus, server-side includes are routinely disabled on most commercial web servers for security reasons, whereas even if your ISP doesn't provide MetaHTML, you can run the standalone pre-processor over your documents before you upload them.)

By using MetaHTML instead of plain old HTML, you can standardize the shared elements of all your documents -- things like headers, footers, toolbars, paragraph formatting, and so on. Whether you serve MetaHTML files via a CGI script, or a tree of HTML files you built earlier by running the processor over a tree of MetaHTML files, it lets you make changes to global document components by editing a single file. And you can implement CGI scripts, shopping-carts, and all manner of other common web applications in MetaHTML.

In summary, MetaHTML offers a whole new way of working. It's quite technical and hard to use -- it's basically a programming language -- but if used properly it lets you abstract web architecture issues from document content. It's suitable for maintaining very large webs indeed, although it may not be worth the effort for small (fewer than 20 files) sites.

Summary

I've covered three tools. It should be fairly obvious that they're useful for different sizes of site; AOLPress is king for small to medium webs (10-100 files); Frontier is harder to use, but excellent for organizing medium to large webs (20-2000 files); MetaHTML is complex and tricky, but extremely powerful and suitable for huge webs (1000 to 1,000,000 files). As a general rule of thumb, large sites are inevitably harder to maintain than small ones; the largest sites I've worked on had from 25,000 to 250,000 files, and took groups of programmers (not HTML writers!) to keep under control. In the large, web architecture is going to be a growth area, and the larger the web the harder it is to manage. If you want to run a site with more than ten files, you probably need one of these tools. And the time to start learning them is now -- not in six months, when your site has doubled in size.

