Chapter 3: Understanding HTTP

Dynamically updated documents




Up until now, we've looked at HTML documents as being essentially passive, like paper. Admittedly, paper does not support pop-up menus and hypertext links, but the web documents we've seen are essentially static between downloads. They do not spontaneously change without human intervention.

We have also noted in passing those web pages that are connected to the external world by some kind of interface, be it a digital camera pointed at a lizard in a terrarium (http://iguana.images.com/dupecam.html), or a controller and a robot arm with attached camera. These operate via a CGI script; a request to the script causes the camera to package its most recent image as a GIF file and return it via HTTP. Such pages are interesting, but serve little purpose unless they can be updated automatically and regularly.

There are two ways to drive the regular update of a document: from the client end (client pull) or from the server end (server push).

In a client-pull system, the web browser receives a document that contains special HTML metainformation tags. These tags cause the browser to take some action automatically at a specified time; for example, re-loading a page every few seconds, with a new image each time. Thus, the client "pulls" in data repeatedly. Currently the only browser that supports this is Netscape, although support is likely to spread. The repeated GET requests may also include some control information that tells the server to modify the file being delivered to the client -- for example, by changing the time interval at which pages are pulled in. There is no need for the server to understand "client pull" in any way; all it does is respond to client requests.

In a server-push system, rather than the client pulling pages regularly, the server sends a document, then holds the HTTP connection open and sends successive MIME-encapsulated objects through it. This system is mediated by HTTP rather than HTML; both client and server need to know about "multipart/mixed" MIME documents, but there is no requirement for the client to understand the special HTML tags used in client-pull systems.

Server Push

MIME permits multiple objects to be included in a message by specifying that the Content-type is "multipart/mixed", and adding a boundary that can be used to delimit the objects. Server push uses a variant content type, "multipart/x-mixed-replace", to indicate that the second and subsequent objects should replace their predecessors. For example:

Content-type: multipart/x-mixed-replace;boundary=--====--

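This is then followed by a series of MIME-encapsulated objects, separated by delimiter lines built from the boundary string, "--====--". (Strictly, MIME forms each delimiter line by prefixing the boundary parameter with two hyphens, and marks the end of the series by also appending two hyphens; the listing here shows only the boundary string itself.) For example: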

Content-type: multipart/x-mixed-replace;boundary=--====--

--====--
Content-type: text/html

[[ an object goes here, containing text ]]
--====--
Content-type: text/html

[[ the second and final object goes here ]]
--====--

The first object is received by the browser and displayed, but the connection stays open until the second object has been received. This ties up a socket between the server and the client until the last in the series of objects has been transferred (although a user can end the session at any time by telling their browser to "stop" the connection).
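The message layout described above can be sketched in a few lines of Python. This is only an illustration, not a production server: the function name build_push_stream and the boundary "frame-boundary" are hypothetical, and the sketch follows the general MIME convention in which each delimiter line is the boundary prefixed with two hyphens and the final delimiter has two more hyphens appended.

```python
def build_push_stream(parts, boundary="frame-boundary"):
    """Assemble a complete server-push response as bytes.

    `parts` is a list of (content_type, payload_bytes) pairs.
    The boundary value is an arbitrary, hypothetical choice; it must
    not occur inside any payload.
    """
    marker = boundary.encode("ascii")
    delimiter = b"--" + marker  # MIME delimiter line: "--" + boundary
    lines = [
        b"Content-type: multipart/x-mixed-replace;boundary=" + marker,
        b"",  # blank line ends the response headers
    ]
    for content_type, payload in parts:
        lines.append(delimiter)
        lines.append(b"Content-type: " + content_type.encode("ascii"))
        lines.append(b"")  # blank line ends this part's headers
        lines.append(payload)
    lines.append(delimiter + b"--")  # closing delimiter ends the series
    return b"\r\n".join(lines) + b"\r\n"
```

A real server-push script would write each part as it becomes available rather than building the whole response up front; assembling it in one piece here just makes the structure easy to inspect.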

One use of this is to send a sequence of images that change over time. Because the subsequent objects replace their predecessors, you can use server push to regularly update an inlined image in a web page. Thus, you can effectively use the web as a delivery mechanism for (very) slow-scan video.

Note that when sending objects down a connection using server push, it is a good idea to ensure first that the CGI script driving the connection terminates as soon as the connection disappears. Otherwise, every time somebody connects to your server, a new process will be spawned that will continue running after the user moves on elsewhere.
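One way to get this behaviour, sketched here in Python, is to treat a failed write as the signal to stop. The helper name push_chunks is hypothetical, and the sketch assumes that the web server closes the script's output pipe when the browser disconnects, so that the next write raises BrokenPipeError; servers that behave differently would need another mechanism.

```python
import time


def push_chunks(chunks, out, delay=0.0):
    """Write pre-built byte chunks to `out`, pausing `delay` seconds between them.

    Returns the number of chunks written. Stops quietly as soon as a
    write fails because the other end of the connection has gone away.
    """
    sent = 0
    try:
        for chunk in chunks:
            out.write(chunk)
            out.flush()  # push the data out now, so a dead connection is noticed early
            sent += 1
            time.sleep(delay)
    except (BrokenPipeError, ConnectionResetError):
        pass  # client has moved on: fall through and let the script exit
    return sent
```

In a CGI script the output stream would typically be sys.stdout.buffer, and each chunk would be one delimiter line plus one encapsulated object.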

Client Pull

Client pull relies on a special HTML tag in the HEAD section of a document:

<META HTTP-EQUIV="Refresh" CONTENT="1;URL">

The "META" tag indicates that this is metainformation describing the document. HTTP-EQUIV names the HTTP response header that the tag stands in for, here "Refresh", and CONTENT supplies the value for that header. (HTTP-EQUIV allows an HTML document to specify information equivalent to an HTTP response header from the server.)

If CONTENT contains just a number, Netscape responds by reloading the document that many seconds later.

If CONTENT is something like: "1;URL=http://www.w3.org/pub/WWW/index.html", Netscape waits for one second, then loads the designated URL.
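The two forms of CONTENT described above can be told apart with a small parser. The following Python sketch is illustrative only; parse_refresh is a hypothetical helper, not something a browser exposes, and it handles just the two forms discussed here.

```python
def parse_refresh(content):
    """Split a Refresh CONTENT value into (delay_seconds, url_or_None)."""
    delay_part, _, rest = content.partition(";")
    delay = int(delay_part.strip())
    rest = rest.strip()
    if rest[:4].upper() == "URL=":
        # Second form: a delay followed by the URL to load
        return delay, rest[4:].strip()
    # First form: just a number, meaning "reload this document"
    return delay, None
```

For instance, parse_refresh("5") yields a delay of 5 with no URL, meaning the current document is reloaded.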

Note that if the newly loaded document does not also contain a refresh directive, it will not be reloaded again; the new document replaces the original, and the metainformation associated with the original is lost.

If you create a file with a refresh directive in its header, then load it, your browser will automatically reload it from here to eternity, unless you explicitly go to a different URL or close the window you are viewing it in.
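If pages are generated by a program, the refresh directive can be produced the same way. The Python sketch below builds the tag for either form of CONTENT; refresh_meta is a hypothetical helper name, and the result would still need to be placed in the HEAD section of the generated document.

```python
from html import escape


def refresh_meta(seconds, url=None):
    """Return a META refresh tag; with no URL the page reloads itself."""
    content = str(int(seconds))
    if url is not None:
        content += ";URL=" + url
    # Escape the value so that quotes or ampersands in the URL cannot break the attribute
    return '<META HTTP-EQUIV="Refresh" CONTENT="%s">' % escape(content, quote=True)
```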

Client pull is probably more useful with a CGI script. For example, the HTTP response

Location: URL

is used to redirect browsers to another location; it causes a web browser to load the specified URL. The HTTP response

Refresh: n

(recognized by Netscape -- it's the server version of the HTTP-EQUIV metatag described above) makes the browser reload the current URL. You can combine the two:

Location: somewhere.html
Refresh: 10

thus giving yourself ten seconds to look at somewhere.html before your browser reloads the current URL.
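As a rough preview of what a script emitting a Refresh header might look like, here is a minimal Python sketch. The function name cgi_response is hypothetical, and the sketch assumes the usual CGI convention that a script prints its headers, then a blank line, then the document body.

```python
def cgi_response(body, refresh_seconds=None, content_type="text/html"):
    """Build the text a CGI script would print: headers, blank line, body."""
    headers = ["Content-type: " + content_type]
    if refresh_seconds is not None:
        # Ask the browser to reload the current URL after this many seconds
        headers.append("Refresh: %d" % refresh_seconds)
    return "\r\n".join(headers) + "\r\n\r\n" + body
```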

We will see how to emit custom HTTP headers from inside CGI scripts in Chapter 5.

One point to note is that each refresh operation constitutes a new HTTP request. Setting up a connection from a browser to a web server carries some overhead; thus, client pull is less efficient at loading long sequences of files than server push, which reuses a single open connection.

