Chapter 3: Understanding HTTP

Putting a request together


Now that we've examined the standards, it's possible to explain what happens when you click on a hot-link in a document and your browser says, "host contacted: waiting for reply".

Firstly, your browser works out which URL you clicked on. It expands the URL (a process I'll describe in more detail later), working out which web server stores the object it refers to (assuming the object is not a local file). It then assembles an HTTP request.
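
What "assembling an HTTP request" amounts to can be sketched in a few lines of Python. This is an illustration, not the logic of any particular browser; the headers shown are a minimal plausible set, and the URL is just an example:

    # Sketch of request assembly: split a URL into host, port, and path,
    # then build the text of an HTTP 1.0 GET request.
    from urllib.parse import urlparse

    def build_get_request(url):
        parts = urlparse(url)
        host = parts.hostname
        port = parts.port or 80               # default HTTP port
        path = parts.path or "/"              # an empty path means the root document
        request = (
            f"GET {path} HTTP/1.0\r\n"        # method, object, protocol version
            f"Host: {host}\r\n"               # optional in HTTP 1.0, but harmless
            "Accept: text/html\r\n"           # formats the browser can display
            "\r\n"                            # a blank line ends the request
        )
        return host, port, request

    host, port, request = build_get_request("http://www.w3.org/hypertext/WWW/")
    print(request)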

If your browser supports a local cache (as Netscape does), it checks the cache to see whether a document matching the URL is already present. If it is, the browser notes the cached copy's last modification date, then sends an HTTP HEAD request to the server. If the last modification date in the returned web document header is the same as the copy in the cache, it doesn't bother grabbing the document body -- it displays the local copy, saving bandwidth and time. If the document on the server has been modified since the last request, it pulls in the new one with a GET request (and saves a copy in the cache behind the scenes while it displays it for you).
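
The cache check might look something like the following sketch. The cache dictionary here is a hypothetical stand-in for the browser's real on-disk cache, and the logic is illustrative rather than Netscape's actual code:

    # Cache-validating fetch: HEAD first, GET only if the document changed.
    # `cache` maps a path to (last_modified, body) for documents seen before.
    import http.client

    def head_last_modified(host, path):
        conn = http.client.HTTPConnection(host, 80)
        conn.request("HEAD", path)            # ask for the header only, no body
        modified = conn.getresponse().getheader("Last-Modified")
        conn.close()                          # HTTP 1.0 style: one request per connection
        return modified

    def fetch(host, path, cache):
        if path in cache:
            cached_date, cached_body = cache[path]
            if head_last_modified(host, path) == cached_date:
                return cached_body            # unchanged: display the local copy
        conn = http.client.HTTPConnection(host, 80)
        conn.request("GET", path)             # changed or never seen: pull the body
        resp = conn.getresponse()
        body = resp.read()
        cache[path] = (resp.getheader("Last-Modified"), body)
        conn.close()
        return body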

If your browser doesn't know about local caching, it just sends a GET request for the remote document. If nothing comes back, or an error message is returned (as defined in the IETF draft protocol for HTTP 1.0), then it tells you in no uncertain terms that it didn't find anything.
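
For a non-caching client, then, the whole transaction reduces to one GET and a look at the status code. A sketch, using a handful of the status codes from the HTTP 1.0 draft (the URL is an example):

    # One GET, then act on the status code the server returns.
    import http.client

    conn = http.client.HTTPConnection("www.w3.org", 80)
    conn.request("GET", "/hypertext/WWW/TheProject.html")
    resp = conn.getresponse()
    if resp.status == 200:                    # OK: display the body
        document = resp.read()
    elif resp.status in (301, 302):           # Moved: follow the Location header
        new_url = resp.getheader("Location")
    elif resp.status == 404:                  # Not Found: tell the user, bluntly
        print("404: the server has no such document")
    else:
        print("Error from server:", resp.status, resp.reason)
    conn.close()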

The process is more complex if you are using a proxy server. A proxy server is a local web server that acts as an external cache for web documents. You configure your web browser to send all its requests to the proxy server. The proxy then mediates your requests.

When you click on a hot-link in your proxy-aware client, it sends a GET request to the proxy server. The proxy server consults its cache, and basically does what the caching browser would do -- it sends a HEAD, compares modification dates, and either passes the cached copy back to you or does a GET according to whatever it finds. Of course, your caching client might be set up to use a proxy server (effectively making a two-level cache). In this case, describing the transaction is a matter best accomplished with diagrams covered in spaghetti-like arrows.
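
At the wire level, the one thing a proxy-aware client does differently is to put the complete URL on the request line, rather than just the local path, so the proxy knows which remote server is meant. A minimal sketch; the proxy host and port are placeholders for whatever your site runs:

    import http.client

    # Direct to the origin server: the request line names only the path.
    direct = http.client.HTTPConnection("www.w3.org", 80)
    direct.request("GET", "/hypertext/WWW/TheProject.html")
    print(direct.getresponse().status)

    # Via a proxy: connect to the proxy, but request the absolute URL,
    # which the proxy uses both to reach the remote server and as its cache key.
    proxied = http.client.HTTPConnection("proxy.example.com", 8080)
    proxied.request("GET", "http://www.w3.org/hypertext/WWW/TheProject.html")
    print(proxied.getresponse().status)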

As an aside: proxy servers perform the vital task of reducing long-distance traffic on the net. Because many local browsers can share a single proxy service, the proxy considerably reduces the number of GET requests reaching remote servers, replacing many of them with HEAD requests. Most documents have body parts that are much larger than their headers, so this cuts traffic to remote servers substantially. (Local bandwidth is still significantly cheaper than long-distance bandwidth.)

Of course, there are some points that need to be made about HTTP servers.

Firstly, although they listen for requests and reply with messages that look suspiciously similar to MIME e-mail, they are not mail servers. An SMTP mail server does not store documents locally (except in a temporary queue); it tries to forward them as fast as possible to its peers. An HTTP server, on the other hand, doesn't know how to talk to other HTTP servers (unless it is a proxy server). It does, however, know how to find files that are stored locally.

Secondly, you can do lots of things with a private HTTP server that are not immediately obvious. The CHECKIN and CHECKOUT methods, although obsolescent, point the way: PUT and GET are more explicit. The web was designed at CERN to facilitate collaborative authoring -- not simply browsing, but group work to prepare documents. These features are not currently used to any great extent by web browsers; in fact, most current browsers are pretty much read-only, to the point where separate HTML text editors are being sold on the basis that people need an additional piece of software to help write web documents.

This situation is not going to last forever. Software like Microsoft HTML Assistant for Word 6 shows the way towards combined browser/editors that use the web as a data repository, rather than a passive publication medium. The ability to share documents (and lock them against third-party changes while editing them, and to store multiple versions) turns a web server into an extremely powerful groupware environment.
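
To make the authoring side concrete, here is a sketch of a client storing a revised document back on a server with PUT. It assumes a server configured to accept PUT for the path in question -- most public servers will refuse it -- and the host name and document are placeholders:

    # Store a revised document on the server with PUT.
    import http.client

    document = b"<HTML><HEAD><TITLE>Draft 2</TITLE></HEAD><BODY>...</BODY></HTML>"

    conn = http.client.HTTPConnection("www.example.com", 80)
    conn.request("PUT", "/drafts/chapter3.html", body=document,
                 headers={"Content-Type": "text/html"})
    resp = conn.getresponse()
    print(resp.status, resp.reason)           # success, or 405 if PUT is refused
    conn.close()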

Thirdly, a caveat: the version of HTTP described above is HTTP 1.0. This is a preliminary standard. An improved version, HTTP 1.1, is in the works; behind it, early studies are being conducted into the feasibility of HTTP-NG, a next-generation non-text-oriented protocol for high-speed data transfer that will nevertheless be a backward-compatible superset of the earlier standards. HTTP 1.0 will probably remain the standard for quite a while (as the installed base of web browsers and servers adds a certain inertia), but new features will appear -- and the web is developing so fast that it is not possible to predict where they will spring up.

There are more HTTP request header fields than those described above. We'll see some of them in action later in this book (notably the fields used by HTACCESS authentication). For the time being, if you need an exhaustive list of HTTP request headers, you need to find a copy of the relevant RFC or IETF draft standard for HTTP 1.0; this can be found on www.w3.org, among other places.
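
As a foretaste, here is a sketch of a request carrying a few of the commoner header fields, including the Authorization field used by HTACCESS-style (Basic) authentication. The host, path, and credentials are all placeholders:

    # A GET request with several header fields set explicitly.
    import base64
    import http.client

    credentials = base64.b64encode(b"aladdin:opensesame").decode("ascii")

    conn = http.client.HTTPConnection("www.example.com", 80)
    conn.request("GET", "/protected/index.html", headers={
        "User-Agent": "ExampleBrowser/1.0",                    # who is asking
        "Accept": "text/html, image/gif",                      # formats we can display
        "If-Modified-Since": "Sat, 29 Oct 1994 19:43:31 GMT",  # cache validator
        "Authorization": "Basic " + credentials,               # HTACCESS-style login
    })
    resp = conn.getresponse()
    print(resp.status, resp.reason)
    conn.close()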

