Chapter 2: Understanding HTML

Non-standard HTML




So far, this chapter has given a whistle-stop tour of the main elements of HTML 2.0. HTML 2 is the general standard for HTML, released in 1994; it draws together and formalizes the earlier HTML 1.0 and HTML+ standards, and does away with a number of obsolete or inconsistent tags.

At the time of writing (1995) few browsers support a full set of HTML 2.0 tags. The IETF specification lists three levels of conformance: level 0 (mandatory), and levels 1 and 2 (different discretionary levels). Most level 0 tags -- the ones in this chapter -- are supported universally, but the discretionary features are not yet universally available.

To make matters worse, some software companies have added their own non-standard HTML tags.

The most obvious culprit is Netscape, the browser produced by Netscape Communications Corporation. Netscape currently has 80% of the user base, and thus plays the tune for the rest to dance to. (Already TradeWave, inheritors of the EINet web browser, and NCSA, owners of Mosaic, have announced integration of some Netscape HTML features in their next-generation browsers.)

A second culprit is Microsoft, with Microsoft Internet Explorer for Windows 95. However, whereas Netscape have at least submitted their extensions to the World Wide Web Consortium for input to the HTML 3 standards process, the additional tags in the Microsoft product are not part of any public standards process.


Netscape HTML extensions

Be warned that there is no guarantee that the tags described in this section are compatible with other web browsers! Although Netscape is popular and widely used, documents which use this dialect of HTML may cause unpredictable results when loaded into any other browser. (NCC say the tags are designed not to break other browsers; this is quite correct, as long as the browser in question handles HTML tags correctly, that is, as an application of SGML. Unfortunately not all browsers work this way: some of them are alarmingly ad hoc in the way they parse input.)

The general intent of the Netscape extensions is to provide enhanced formatting control in documents. That is, rather than sticking to the abstract semantic markup model, they tend to roll their sleeves up and muck around with issues like font size and the position of items on the screen. This is really dubious in view of the distinction between information content and rendered document image that is implicit in the whole HTML model, but it makes sense from a pragmatic viewpoint; documents are there to be read, not to conform to some theoretical ideal of well-formed documentdom.

As a rule of thumb, Netscape extensions should be used if you know that your web users will all be using Netscape. If not, you really should consider avoiding them entirely -- or at least, providing an alternative web, stripped of tags that might break some browsers.

The extensions fall into the following categories:

Firstly, Netscape supports inline JPEG images. JPEGs tend to be smaller than GIF format files, but as JPEG is a "lossy" compression medium you probably shouldn't use it if exact graphical reproduction is essential. Furthermore, the lossy compression renders JPEG unsuitable for steganographic fingerprinting purposes, and to cap it all the DCT compression algorithm in JPEG tends to be more computationally heavy than the LZW compression used in GIF87a and GIF89a (meaning it takes longer to display).

Secondly, Netscape can handle interlaced GIF files. Rather than unpacking one bitmap line after another from the top down, as ordinary files do, interlaced files unpack bitmap lines from alternating segments of the image, filling in the gaps. They take no less time to download, but a progressively unpacked interlaced image becomes recognizable long before a normal sequentially-unpacked image would. As Netscape can abort a connection and connect to a new URL while downloading is still in progress, this means that users constrained by a modem's bandwidth can see what is happening much faster; it provides a significant illusion of speed. In fact, interlaced GIF files are usable with all web browsers (although many make no special use of the format), and so can be unreservedly recommended.

Thirdly, the <IMG> tag has been hacked, almost out of recognition.

Under HTML 2.0, the IMG tag recognizes very few ALIGN options; just top, bottom, and middle, where the positions apply to the subsequent line of HTML text.

Netscape retains the top, middle, and bottom options. It adds new options: left, right, texttop, absmiddle, baseline, and absbottom:

"left", "right"
produce floating images. The image floats to the corresponding side of the window, and text wraps around the opposite side of the image (right or left, respectively).
"texttop", "absmiddle", "baseline", "absbottom"
behave much like "top", "middle", and "bottom", but are more precise:
texttop
aligns the graphic with the top of the tallest text in the line (usually the same as that which top would align with)
absmiddle
aligns the middle of the graphic with the middle of the current line (while middle aligns the middle of the graphic with the baseline of the text in the current line)
baseline
aligns the bottom of the graphic with the baseline of the current line of text (as with bottom)
absbottom
aligns the bottom of the graphic with the bottom of the current line of text (that is, with the descenders in the text that dip below the baseline)
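
As a minimal sketch of the alignment options listed above (the file names and text here are placeholders, not taken from any real site), a floating image and an inline image aligned with absmiddle might be written as:

```html
<!-- Hypothetical file names; these ALIGN values are Netscape extensions -->
<P>
<IMG SRC="logo.gif" ALT="Logo" ALIGN="left">
The image floats to the left margin, and this text wraps down its
right-hand side.
</P>
<P>
<IMG SRC="bullet.gif" ALT="*" ALIGN="absmiddle">
The middle of the small graphic lines up with the middle of this line.
</P>
```

A browser that parses tags correctly but does not know these ALIGN values should simply ignore them, so the images would still appear, just without the floating or fine-tuned alignment.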

"Width" and "Height"

In addition to positioning commands, Netscape can speed up rendering of inline images. The normal process of displaying an image begins when the document is downloaded. The browser scans for included images, then opens sockets to the server and slurps them in. As it does so, it begins to render the page. It cannot make allowances for the size of the images until they are available; at that point, the browser can calculate how much display space to leave for the bitmap.

The WIDTH and HEIGHT options were added to IMG to short-circuit this problem. They take parameters measured in pixels; for example:

<IMG SRC="../gifs/fred.gif" WIDTH="512" HEIGHT="384">

This specifies that fred.gif is 512 pixels wide and 384 pixels high. Netscape therefore knows how much space to leave for fred.gif before it begins to download the inline image -- and can therefore format the surrounding text faster.

<IMG VSPACE=value HSPACE=value>

The VSPACE and HSPACE options control the vertical and horizontal space around the image, to prevent the image from pressing up against the surrounding text. For example,

<IMG SRC="../gifs/fred.gif" WIDTH="512" HEIGHT="384" VSPACE="12">

indicates a gap of twelve pixels should be reserved above and below fred.gif.

Borders

Finally, images are usually surrounded by a border. The BORDER option permits the author to specify the thickness of the border around the image, in pixels. Note that images contained in cross-reference anchors are usually surrounded by a border, so setting BORDER="0" in such an image might confuse your users. For example:

<A HREF="http://localhost/foo.html"><IMG SRC="foo.gif" BORDER="0"></A>

will display a graphic (foo.gif) without a border, so that it does not appear to be a clickable anchor.


Design elements

Netscape provides a number of extensions and modifications that affect the actual layout and design of a web page. These enhance the visual appearance of horizontal rules, add positioning control, and add rudimentary font management commands to HTML. (The latter is a particularly suspicious issue insofar as it breaks the ability of the user to determine how the web documents are displayed; designs based on extensive font changes are probably inappropriate for general use.) In addition to control over font size, Netscape permits font attributes to be applied cumulatively. For example, <I><B>italic bold</B>text</I> can be rendered properly. Arguably, this is a good idea. (The only problem is that it is non-standard, so text formatted for display with Netscape may appear different on other browsers.)

The <HR> tag specifies a horizontal rule across the page. Netscape adds four optional parameters:

<HR SIZE="value">
Specifies how thick the horizontal rule should be. (value is an integer that indicates the rule thickness in pixels.)
<HR WIDTH="value">
Specifies how wide the rule should be. value can be a percentage (of the window width), or an absolute value in pixels. By default, rules are as wide as the window in which they are displayed.
<HR ALIGN="value">
Where value is one of left, right, or center, specifies whether a rule narrower than the window should be left aligned, right aligned, or centered relative to the window.
<HR NOSHADE>
Specifies that the rule has no dropshadow beneath it: a solid line is displayed.

Note that rules that are of a given width (in pixels) may not be displayed properly on systems with a different display resolution -- clipping is likely to occur. (For this reason, I would advise you to stick to specifying rule widths in percentage format.)
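
The four options can be combined in a single tag. The following is an illustrative sketch only (the particular sizes and widths are arbitrary choices, not recommendations from Netscape):

```html
<!-- A thin, solid rule, half the window wide, centered -->
<HR SIZE="2" WIDTH="50%" ALIGN="center" NOSHADE>

<!-- A thicker shaded rule, a quarter of the window wide, flush left -->
<HR SIZE="6" WIDTH="25%" ALIGN="left">
```

Both rules use percentage widths, so they scale with the window rather than relying on a fixed pixel count.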

Normally, HTML text blocks are wrapped at the discretion of the browser. Netscape provides the following commands for controlling line breaks:

<NOBR> ... </NOBR>
Specifies NO BReak. This prevents the browser from breaking lines between the start and end tags. (Particularly useful in source code listings where the language is whitespace sensitive but a display format is inappropriate.)
<WBR>
Specifies Word BReak. Permits the browser to insert an (optional) line break at this point in a <NOBR>...</NOBR> text block.
<BR>
Accepts the following additional options to help handle floating images:
<BR CLEAR=left>
breaks the line, moves down vertically until there is a clear left margin (with no floating images between the text and the margin)
<BR CLEAR=right>
breaks the line, and moves down vertically until there is a clear right margin (with no floating images between the text and the right margin)
<BR CLEAR=all>
moves down until both margins are clear of images

Netscape also provides the following commands for controlling font size and text position:

<FONT SIZE=value>
Permits rudimentary control over font size. value is an integer in the range 1-7; the default text size is 3. Font size directives can be absolute (e.g. <FONT SIZE="4">) or relative to the base font size (e.g. <FONT SIZE="+1"> or <FONT SIZE="-2">).
<BASEFONT SIZE=value>
Sets the base font size. All relative <FONT> size changes are tracked with reference to the base font size.
<BIG>...</BIG>
Uses a font larger than the standard font size.
<SMALL>...</SMALL>
Uses a font smaller than the standard font size.
<SUP>...</SUP> and <SUB>...</SUB>
Select superscript and subscript, respectively.
<CENTER>...</CENTER>
Centers the text lines between the left and right margins.


Document Structure

In HTML 2.0, items in an unordered list are marked with a progression of different bullets, depending on their level of indentation.

Netscape permits you to specify the TYPE of element used as a bullet:

<UL TYPE="element">

Legitimate values are "disc", "circle", or "square".

In HTML 2.0, ordered lists increment a count in arabic numerals (1,2,3 ...).

Netscape permits you to specify the TYPE of number system used for list items: "A" (capital letters), "a" (lowercase letters), "I" (capital roman numerals), "i" (lowercase roman numerals), and "1" (default, arabic numerals).

The START option can be used to start a numbered list at a value other than 1. It accepts an integer parameter. For example:

<OL TYPE="I" START="3">

This specifies that list items should be numbered in uppercase roman numerals, starting from 3 (III).
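
For completeness, here is a small sketch of whole lists using the TYPE and START options (the item text is placeholder copy):

```html
<!-- Unordered list with square bullets (Netscape extension) -->
<UL TYPE="square">
<LI>First bulleted item
<LI>Second bulleted item
</UL>

<!-- Ordered list in lowercase letters, starting at the fourth letter -->
<OL TYPE="a" START="4">
<LI>This item is labelled d
<LI>This item is labelled e
</OL>
```

Note that START always takes an integer, whatever the TYPE: the browser converts the count into the chosen number system.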

In addition to these changes, individual list items can have their TYPE reset. If you change the type of a list item, subsequent items in the list change to match; thus, you can create a bulleted list that switches to numbers halfway down. The VALUE option can be used to change the index count of numbered list items. For example:


<UL TYPE="disc">
<!-- bullet list, using discs as bullets -->
<LI>first item
<LI>second item
<LI TYPE="I">Switch to ordered list, uppercase roman numerals
<LI VALUE="2">... and count backwards!
</UL>

(A note of caution: the ability to change the type of a list from inside a component element flies in the face of the tree-like inheritance of attributes that is characteristic of SGML-based applications. It might appear superficially useful once in a while, but it's a real can of worms.)

Netscape 2.0 adds support for the <DIV> text division, and adds the ALIGN attribute for both paragraphs and divisions. This permits left, right, and center alignment; for example <P ALIGN="right"> specifies that the next paragraph is to be aligned with the right margin of the window.
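
A minimal sketch of ALIGN on divisions and paragraphs follows. The text is placeholder copy, and the assumption that a paragraph's own ALIGN takes precedence over that of the enclosing division reflects the later HTML 3.2 behavior rather than anything stated by Netscape here:

```html
<DIV ALIGN="center">
<P>This paragraph takes its centered alignment from the division.</P>
<P ALIGN="right">This paragraph sets its own alignment, and is expected
to sit against the right margin.</P>
</DIV>
```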

Other document structure changes include Framesets (in Netscape 2.0); these are described in "Framesets" in the chapter "Building a portable web".


Forms

The ISINDEX tag has been augmented with a PROMPT option. ISINDEX indicates that a document is a searchable index. PROMPT specifies the message that is printed in front of the text input field. Its default value is:

This is a searchable index. Enter search keywords:
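
As an illustration, a document that replaces the default message might look something like this (the title and prompt wording are made up for the example):

```html
<HEAD>
<TITLE>Staff directory</TITLE>
<!-- PROMPT is a Netscape extension; other browsers show their own default -->
<ISINDEX PROMPT="Enter a surname to search for: ">
</HEAD>
```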


Client-side imagemaps

See "Imagemaps".


Summary

So far we have focused on HTML at the file level. Although we have been introduced to the concept of URLs, we have not yet examined how to use them effectively. Nor have we met some of the more exotic tags sometimes found in the HEAD section of an HTML document (such as BASE, ISINDEX, LINK, NEXTID, and META), which are of some importance in defining the interrelation between HTML documents in a coherent web. We will see how these work once we have examined the HTTP protocol, in the next chapter. The information in this chapter is sufficient to write small to middling documents, but stops short of being enough for a full-blown professional development effort.

In summary: this chapter introduced the structure of individual HTML files, and the basic relationship between HTML files. The next chapter will explain the structure of a web (that is, a tree of HTML files, access to which is provided by an HTTP server), and some of the more subtle aspects of HTML and HTTP.

