More HTML Maintenance Tools

[ Comments ] [ Copyright ] [ Main contents ]

Here are seven short Perl scripts I've written to make life easier for web dudes. They all require Perl 5.002 (or better) and LibWWW-Perl 5.01 (or better). (If you don't have a proper Perl installation, go and look on CPAN.)

The tools are:

  • HTML pretty printer
  • HTML to text
  • HTML PostScript printer
  • Burst out PRE segments
  • Build tree of MetaHTML documents
  • Put a directory tree via FTP (simple)
  • Looking Glass directory tree mirror tool (complex)

    hppp HTML Pretty Printer

    A souped-up, improved version of the tool described in "An HTML Prettyprinter"; this version is more robust and fixes a couple of minor bugs.

    Purpose: to tidy up ugly or eyeball-unfriendly HTML files.

    One limitation is known: hppp relies on the HTML::Parse module from LibWWW-Perl 5.03 for parsing, and that parser offers no easy way to ignore specific tags such as <PRE>. As a result, preformatted blocks may be reformatted along with the flow of body text. You can get round this by (a) using precut (below) to burst the preformatted blocks out of the file, (b) running hppp, and (c) running mhc (the MetaHTML macro processor) over the resulting file to pull the preformatted blocks back in.

    (A future prettyprinter may utilize the HTML::TreeParse object-oriented parser skeleton ... once I figure out how to override methods and wrap my head around the deeper OOP wizardry.)
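    The workaround above exists because hppp's parser cannot be told to leave <PRE> alone. For comparison, here is a minimal sketch in Python (not hppp's Perl code) of a prettyprinter built on the standard html.parser module, where that special case is easy to add. It indents tags by nesting depth and collapses whitespace in flowing text, but copies everything between <PRE> and </PRE> through untouched. The class and function names, the two-space indent and the list of void tags are all assumptions of the sketch.

```python
import html
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class PrettyPrinter(HTMLParser):
    """Re-indent HTML by nesting depth, copying <PRE> blocks through verbatim."""

    def __init__(self, indent="  "):
        super().__init__()          # convert_charrefs=True: text arrives unescaped
        self.indent = indent
        self.depth = 0
        self.in_pre = False
        self.out = []

    def _line(self, text):
        self.out.append(self.indent * self.depth + text + "\n")

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()
        if self.in_pre:
            self.out.append(raw)    # tags inside <PRE> are left exactly where they are
        elif tag == "pre":
            self.out.append(self.indent * self.depth + raw)
            self.in_pre = True      # stop reformatting until </PRE>
        else:
            self._line(raw)
            if tag not in VOID_TAGS:
                self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the depth.
        raw = self.get_starttag_text()
        if self.in_pre:
            self.out.append(raw)
        else:
            self._line(raw)

    def handle_endtag(self, tag):
        if self.in_pre:
            self.out.append(f"</{tag}>")
            if tag == "pre":
                self.in_pre = False
                self.out.append("\n")
        elif tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)
            self._line(f"</{tag}>")

    def handle_data(self, data):
        if self.in_pre:
            # Whitespace inside <PRE> is significant: keep it, just re-escape < and &.
            self.out.append(html.escape(data, quote=False))
        elif data.strip():
            # Collapse runs of whitespace in ordinary flowing text.
            self._line(html.escape(" ".join(data.split()), quote=False))

    def handle_comment(self, data):
        if self.in_pre:
            self.out.append(f"<!--{data}-->")
        else:
            self._line(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._line(f"<!{decl}>")    # e.g. <!DOCTYPE html>


def pretty_print(source, indent="  "):
    """Return SOURCE re-indented, with <PRE> contents unchanged."""
    printer = PrettyPrinter(indent)
    printer.feed(source)
    printer.close()
    return "".join(printer.out)
```

    Called as pretty_print(html_text), it returns the re-indented document as a string. Putting every tag on its own line is the simplest possible policy, not a typographically careful one, so inline tags such as <B> also end up on lines of their own.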

    hppt HTML to Text

    A simple script that uses LWP to invoke the FormatText formatter. It takes an HTML file and produces a plain Latin-1 formatted text stream. (Great for dumping files to dumb terminals or extracting the text from a document.)
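    hppt hands the real work to LWP's formatter. To show the basic idea without LWP, here is a much cruder Python sketch built on the standard html.parser module. It keeps only the visible text, starts a new line at block-level tags, and skips the contents of <SCRIPT> and <STYLE>. The tag list and the names are assumptions. Unlike a real formatter, it does no line wrapping, list numbering or table layout, and a word split by an inline tag (Hel<B>lo</B>) comes out as two words.

```python
from html.parser import HTMLParser

# Tags that should start a new line of text (a deliberately short, assumed list).
BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "h4", "h5", "h6",
              "li", "tr", "pre", "blockquote", "hr", "dt", "dd"}


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, one block per line."""

    def __init__(self):
        super().__init__()          # convert_charrefs=True decodes &amp; and friends
        self.lines = [[]]           # each inner list holds the words of one output line
        self.skip = 0               # > 0 while inside <script> or <style>

    def _break(self):
        if self.lines[-1]:          # never emit empty lines
            self.lines.append([])

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in BLOCK_TAGS:
            self._break()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag in BLOCK_TAGS:
            self._break()

    def handle_data(self, data):
        if not self.skip:
            self.lines[-1].extend(data.split())


def html_to_text(source):
    """Return the text of SOURCE with block elements on separate lines."""
    parser = TextExtractor()
    parser.feed(source)
    parser.close()
    return "\n".join(" ".join(words) for words in parser.lines if words)
```

    To get Latin-1 output as hppt does, write the result with something like open(name, "w", encoding="latin-1", errors="replace").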

    hpps HTML to PostScript

    A script that invokes the FormatPS formatter from LWP. It produces formatted PostScript output and has a large number of options for selecting attributes such as margins, page size, and typeface. Send the output from this script straight to a PostScript printer, or run it through Ghostscript.

    precut Burst out subdocuments

    precut scans an HTML (or MetaHTML) file and searches for <PRE>...</PRE> tags. Whenever it finds a preformatted block, it excises it and writes it to a separate file; the block itself is replaced by a MetaHTML <include> directive. (The original copy of the file is retained with the suffix .bak, for reference.)

    The result is that all the preformatted text blocks in an HTML file are exported to separate files, where they can be maintained more easily; meanwhile, the file is replaced by a MetaHTML file that can be converted back to the original HTML file by running it through metabuild (below) or the MetaHTML processor mhc.

    The purpose of this is to make it easier to maintain HTML documents that describe pieces of Perl source code; the Perl programs can be maintained as separate <include> documents and interpolated into the HTML document at build time.
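    Here is a minimal sketch of precut's core step, written in Python rather than the original Perl. It finds each <PRE>...</PRE> block with a case-insensitive regular expression, writes the block's body to a numbered side file, and leaves an <include> directive between the original tags. Several details are assumptions rather than precut's documented behaviour:

    • the side-file names (page.pre1, page.pre2, ...)
    • keeping the <PRE> tags themselves in the main file
    • the page.html.bak backup name
    • the bare <include NAME> form, which may need adjusting to whatever syntax your MetaHTML version expects

```python
import re
import shutil
from pathlib import Path

# <PRE ...> ... </PRE>, case-insensitive and spanning lines; group 2 is the body.
PRE_BLOCK = re.compile(r"(<pre\b[^>]*>)(.*?)(</pre\s*>)", re.IGNORECASE | re.DOTALL)


def precut(path):
    """Move each <PRE> body into its own file, leaving an <include> behind.

    Returns the list of side files written. The naming scheme and the
    '<include NAME>' syntax are assumptions made for this sketch.
    """
    path = Path(path)
    text = path.read_text()
    shutil.copy2(path, path.with_name(path.name + ".bak"))  # keep the original
    written = []

    def excise(match):
        part = path.with_name(f"{path.stem}.pre{len(written) + 1}")
        part.write_text(match.group(2))   # the preformatted text, verbatim
        written.append(part)
        return f"{match.group(1)}<include {part.name}>{match.group(3)}"

    path.write_text(PRE_BLOCK.sub(excise, text))
    return written
```

    A regular expression is adequate here because <PRE> blocks do not nest, so the lazy .*? always stops at the block's own closing tag.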

    metabuild Recursively build a tree of MetaHTML documents

    MetaHTML is a powerful macro-based HTML generator tool (described in a separate essay on this site). You can use it via a CGI script, or as a standalone web server, but for best performance and maximum portability you use it as a document authoring language and preprocess MetaHTML files into HTML before uploading them to your website.

    metabuild takes a directory as its target and recursively descends into all its subdirectories. Files that don't end in .mhtml (such as .gif images) and aren't named in a stop list are copied into a parallel directory tree, with new subdirectories created where necessary to store them; then all MetaHTML files are piped through the mhc macro processor and deposited in the parallel build tree. Thus, you run metabuild at the top of a directory tree full of MetaHTML, and it builds a parallel directory tree full of processed HTML documents.

    (This is an early cut at a set of development tools I'm working on in Perl for maintaining document trees. MetaHTML is optional to some extent, as I'm mostly using it for macro interpolation rather than for its programmatic facilities; it could equally well be replaced by the m4 macro processor, except that MetaHTML is a bit more "HTML-like" (and more versatile, in its CGI or server modes).)
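    The traversal metabuild performs can be sketched in Python as follows. This is an illustration under assumptions, not the original script. It assumes that mhc takes a file name and writes the expanded HTML to standard output, that foo.mhtml should become foo.html in the build tree, and that the stop list is a set of shell-style patterns (the defaults here are hypothetical). The processor is a parameter, so the walk can be tried without MetaHTML installed.

```python
import fnmatch
import os
import shutil
import subprocess
from pathlib import Path


def run_mhc(source_path):
    """Assumed invocation: 'mhc FILE' prints the processed HTML on stdout."""
    result = subprocess.run(["mhc", str(source_path)], capture_output=True,
                            text=True, check=True)
    return result.stdout


def metabuild(src, dest, stop_list=("*.bak", "*~"), process=run_mhc):
    """Rebuild the tree at SRC as a parallel tree at DEST.

    .mhtml files are passed through `process` and saved as .html; other
    files are copied unless their name matches a pattern in `stop_list`.
    Returns the list of destination paths written.
    """
    src, dest = Path(src), Path(dest)
    written = []
    for dirpath, dirnames, filenames in os.walk(src):
        dirnames.sort()                                  # deterministic traversal order
        out_dir = dest / Path(dirpath).relative_to(src)
        out_dir.mkdir(parents=True, exist_ok=True)       # create subdirectories as needed
        for name in sorted(filenames):
            source = Path(dirpath) / name
            if name.endswith(".mhtml"):
                target = out_dir / (name[: -len(".mhtml")] + ".html")
                target.write_text(process(source))
            elif any(fnmatch.fnmatch(name, pattern) for pattern in stop_list):
                continue                                 # named in the stop list: skip
            else:
                target = out_dir / name
                shutil.copy2(source, target)             # images etc. copied unchanged
            written.append(target)
    return written
```

    Keep the build tree outside the source tree. Otherwise, the walk would descend into its own output.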

    metaput Recursively send a tree to an ftp server

    If you maintain webs on remote servers, you may well need to export an entire tree of documents in one swoop. Exporting via ftp is a pain: the normal ftp(1) client can't do recursive uploads (creating subdirectories along the way), and some more complex clients (like ncftp) run into problems with some servers.

    metaput scans a target directory and uploads everything in it to a destination directory on a specified ftp server, using a username and password you supply. It doesn't traverse symbolic links, but it can send subdirectory trees, and it does everything in binary mode (to ensure that your GIFs arrive intact).

    Note that before it conducts an upload session, metaput deletes everything in the destination directory. This ensures that no inconsistencies exist between your source tree and the destination. It's a brute-force approach, but as part of a suite of tools designed for maintaining a remote website it's reasonably effective.

    metaput is a tiny functional mirror-image of, which fetches a directory tree from a remote server.
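    A rough Python equivalent of metaput, using the standard ftplib module, is sketched below. Planning the upload is kept separate from the FTP session, so the plan can be checked offline. The recursive delete assumes that the server's NLST returns names (bare or as full paths), and it treats anything that cannot be deleted as a file as a directory. That is a common heuristic, not something every server guarantees. All function names are hypothetical. ftplib's storbinary switches the connection to binary mode (TYPE I) before each transfer, which is what keeps GIFs intact.

```python
import ftplib
import os
import posixpath
from pathlib import Path


def plan_upload(local_root, remote_root):
    """List the FTP operations needed to recreate LOCAL_ROOT under REMOTE_ROOT.

    Returns ("mkdir", remote_dir) and ("put", local_path, remote_path) tuples
    in an order that creates each directory before its files. Symbolic links
    are skipped.
    """
    local_root = Path(local_root)
    ops = []
    for dirpath, dirnames, filenames in os.walk(local_root):
        # Don't follow symlinked directories, and visit subdirectories in sorted order.
        dirnames[:] = sorted(d for d in dirnames
                             if not os.path.islink(os.path.join(dirpath, d)))
        rel = Path(dirpath).relative_to(local_root).as_posix()
        remote_dir = remote_root if rel == "." else posixpath.join(remote_root, rel)
        if rel != ".":
            ops.append(("mkdir", remote_dir))
        for name in sorted(filenames):
            local = Path(dirpath) / name
            if not local.is_symlink():
                ops.append(("put", local, posixpath.join(remote_dir, name)))
    return ops


def clear_remote(ftp, path):
    """Recursively delete everything below PATH on the server (brute force)."""
    try:
        names = ftp.nlst(path)
    except ftplib.error_perm:          # some servers answer 550 for an empty directory
        return
    for name in names:
        base = posixpath.basename(name.rstrip("/"))
        if base in ("", ".", ".."):
            continue
        target = posixpath.join(path, base)
        try:
            ftp.delete(target)         # succeeds for plain files
        except ftplib.error_perm:
            clear_remote(ftp, target)  # assume a directory: empty it, then remove it
            ftp.rmd(target)


def metaput(host, user, password, local_root, remote_root):
    """Wipe REMOTE_ROOT on HOST, then upload the LOCAL_ROOT tree into it."""
    ops = plan_upload(local_root, remote_root)
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        clear_remote(ftp, remote_root)
        for op in ops:
            if op[0] == "mkdir":
                ftp.mkd(op[1])
            else:
                _, local, remote = op
                with open(local, "rb") as fh:
                    ftp.storbinary(f"STOR {remote}", fh)  # binary (TYPE I) transfer
```

    A call such as metaput("ftp.example.com", "webuser", "secret", "build", "/public_html") (all values hypothetical) would replace the remote /public_html with the local build tree.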

    Looking Glass FTP mirror utility

    This is metaput's big sister: a full-blown active file mirroring utility. It has too many features to describe here; let's just say that it doesn't suffer from metaput's lack of consistency checking.

    Looking Glass is currently slightly behind in features, but makes up for it by doing what it does far better than can. Its only major limitations are that it can only run on and talk to UNIX or Linux systems, it doesn't handle symbolic links, its configuration file doesn't yet support multiple packages, and (unlike mirror) it doesn't switch to using dbm files automagically if it finds itself processing a whole CD-ROM's worth of files.
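    What separates a real mirror from metaput is that it compares the two trees and transfers only the differences. How Looking Glass does this is not described here, so the following is only a generic Python sketch of that comparison. It assumes each tree has already been summarised as a dictionary mapping relative paths to (size, modification time) pairs. The names Listing and plan_sync are hypothetical.

```python
from typing import Dict, List, Tuple

# Relative path -> (size in bytes, modification time in seconds).
Listing = Dict[str, Tuple[int, int]]


def plan_sync(source: Listing, replica: Listing) -> Tuple[List[str], List[str]]:
    """Return (paths to transfer, paths to delete) so REPLICA matches SOURCE.

    A file is transferred if the replica lacks it, its size differs, or the
    source copy is newer; files present only in the replica are deleted.
    """
    transfer = sorted(
        path for path, (size, mtime) in source.items()
        if path not in replica
        or replica[path][0] != size
        or replica[path][1] < mtime
    )
    delete = sorted(path for path in replica if path not in source)
    return transfer, delete
```

    Comparing modification times across two machines assumes their clocks roughly agree. Where that is doubtful, comparing sizes alone, or checksums, is the safer choice.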

    None of these scripts will save the world, protect the ozone layer, or defend the purity of our precious bodily fluids -- but they're all utilities that I've found make life a little easier. Stick them in /usr/local/bin and enjoy.
