Chapter 3: Understanding HTTP

A potted anatomy of TCP/IP


TCP/IP (TCP stands for Transmission Control Protocol, IP for Internet Protocol) is a "protocol stack": it services requests from software that understands the high-level interface it provides, passes the requests down through several layers to the underlying network hardware, exchanges data (at the lowest level) with other computers running TCP/IP, and passes incoming data back up to the software. TCP provides a high-level, reliable interface to applications, and exchanges data via IP. It is packet-based, meaning that it splits messages into packets before transmission and reassembles them in the correct order at the other end, and reliable, in that it ensures that all the packets in an exchange are received correctly and without errors. IP is the low-level protocol that addresses and routes the individual packets; it is "unreliable": it makes no guarantee that a packet will arrive, and leaves error correction to the higher levels. (Other protocols also run over IP, notably UDP and ICMP; unlike TCP, UDP is unreliable, delivering individual datagrams with no guarantee of arrival or ordering.)
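The splitting and reassembly described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not real TCP: each chunk is numbered by its byte offset (real TCP also numbers bytes, but starts from a randomly chosen initial sequence number), and the hypothetical `reassemble` function is told the total length up front, whereas real TCP learns that a stream has ended from a FIN flag.

```python
import random


def segment(message, size):
    """Split a byte string into (offset, chunk) pairs; the offset acts as a sequence number."""
    return [(offset, message[offset:offset + size])
            for offset in range(0, len(message), size)]


def reassemble(segments, total_length):
    """Rebuild the original bytes from segments that may arrive out of order or duplicated."""
    by_offset = dict(segments)  # a duplicate copy of a segment simply replaces the first
    data = bytearray()
    for offset in sorted(by_offset):
        if offset != len(data):
            raise ValueError(f"segment at offset {len(data)} is missing")
        data += by_offset[offset]
    if len(data) != total_length:
        raise ValueError("final segment is missing")
    return bytes(data)


if __name__ == "__main__":
    message = b"GET /index.html HTTP/1.0\r\n\r\n"
    packets = segment(message, 4)
    random.shuffle(packets)      # the network may deliver packets in any order
    packets.append(packets[0])   # ...and may deliver one of them twice
    print(reassemble(packets, len(message)) == message)  # True
```

A receiver that raises an error on a gap corresponds, very loosely, to TCP holding back data from the application until the missing packet has been retransmitted.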
TCP applications can rely on their data stream: they don't need their own error checks and don't have to worry about data being corrupted en route. If a packet arrives mangled, the receiving TCP stack detects the damage (each packet carries a checksum), discards the packet without passing it to the application, and does not acknowledge it; the sender, receiving no acknowledgement, transmits a fresh copy.
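How does a receiver notice that a packet "arrives mangled"? TCP carries a 16-bit checksum computed with the one's-complement algorithm described in RFC 1071. The Python sketch below implements that arithmetic; for simplicity it checksums only a bare payload, whereas real TCP also covers its own header and a "pseudo-header" of IP addresses. The helper `segment_is_intact` is hypothetical, and it assumes the payload has an even length so that the appended checksum lines up on a 16-bit boundary.

```python
def internet_checksum(data):
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # wrap any carry back into the low bits
    return ~total & 0xFFFF


def segment_is_intact(segment):
    """Hypothetical receiver check: even-length data followed by its checksum sums to
    0xFFFF, so checksumming the whole thing again gives 0 if nothing was damaged."""
    return internet_checksum(segment) == 0


if __name__ == "__main__":
    payload = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])  # RFC 1071's sample data
    checksum = internet_checksum(payload)
    print(hex(checksum))                          # 0x220d
    sent = payload + checksum.to_bytes(2, "big")
    print(segment_is_intact(sent))                # True
    damaged = bytes([sent[0] ^ 0x10]) + sent[1:]  # flip one bit in transit
    print(segment_is_intact(damaged))             # False
```

A 16-bit checksum catches most, though not all, corruption; it is the discard-and-retransmit behaviour described above that turns detection into reliable delivery.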
Because TCP/IP sits between the user's applications and the actual physical network, it provides a standard model for all networking software. The physical connection beneath a TCP/IP stack can be Ethernet, a fibre-optic FDDI network, a modem link running SLIP or PPP, or an aviary full of carrier pigeons. It doesn't matter: TCP/IP simply sends a stream of numbered and addressed packets down to the link level, listens for incoming packets coming back up from it, and reassembles them into a copy of the original data. This separation of the protocol from the hardware is important: it means that TCP/IP can provide a common interface on everything from an IBM PC or a Macintosh to a UNIX workstation, a VAX minicomputer, or an IBM mainframe.
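The separation from the hardware shows up directly in the programming interface. In this Python sketch, using the standard `socket` and `threading` modules, a client and a one-shot echo server talk over the loopback interface at 127.0.0.1. The code names only an address, a port and a stream of bytes; apart from the address, it would be unchanged if the two programs ran on different machines linked by Ethernet or PPP. The function names are hypothetical, and the server simply takes whatever free port the operating system assigns.

```python
import socket
import threading


def echo_once(server):
    """Accept one connection, read until the client stops sending, echo it all back."""
    conn, _ = server.accept()
    with conn:
        received = b""
        while chunk := conn.recv(4096):
            received += chunk
        conn.sendall(received)


def round_trip(message):
    """Send bytes to a local echo server over TCP and return what comes back."""
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        # Nothing below mentions Ethernet, PPP or any other link:
        # only an address, a port and a byte stream.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)  # signal that we have finished sending
            reply = b""
            while chunk := client.recv(4096):
                reply += chunk
        worker.join()
    return reply


if __name__ == "__main__":
    print(round_trip(b"Hello over TCP/IP"))  # b'Hello over TCP/IP'
```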
TCP/IP relies on connections between computers (or carrier pigeons) to route packets to their destination; as with the telephone network, your computer doesn't need to be physically wired to another computer to send data to it, as long as both are connected to the network. Each packet is tagged with a destination IP address, and intermediate machines (routers) forward it to whichever adjacent machine they believe is nearest to the ultimate destination. Every computer running TCP/IP on the internet must have a unique address. This is a 32-bit integer, usually written as four 8-bit numbers (bytes) in decimal, separated by periods: for example, 158.152.11.49. You won't often see addresses like this, because people normally use symbolic hostnames instead; the client or server software translates these names into numeric addresses by querying a subsystem called the Domain Name System. (DNS is a distributed database of hostnames which, you guessed it, is itself maintained over TCP/IP connections.)
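The dotted notation is just a way of writing a 32-bit number one byte at a time. Here is a minimal Python sketch of the conversion in both directions; the function names are hypothetical, and the result is cross-checked against the standard library's `ipaddress` module. (Looking a name up in DNS needs a live network, for example via `socket.gethostbyname`, so it is left out here.)

```python
import ipaddress


def dotted_to_int(address):
    """Convert dotted-quad notation such as '158.152.11.49' to a 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift in one byte at a time, most significant first
    return value


def int_to_dotted(value):
    """Convert a 32-bit integer back to dotted-quad notation."""
    if not 0 <= value < 2 ** 32:
        raise ValueError("IPv4 addresses are 32-bit integers")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


if __name__ == "__main__":
    n = dotted_to_int("158.152.11.49")
    print(n, hex(n))                                          # 2660764465 0x9e980b31
    print(n == int(ipaddress.IPv4Address("158.152.11.49")))  # True
    print(int_to_dotted(n))                                   # 158.152.11.49
```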

TCP/IP addresses are not allocated at random. Virtually all of them are allocated within a network: a block of addresses, usually granted to an organization. For example, all the IP addresses whose first number is between 1 and 126 belong to separate "Class A" networks (0 and 127 are reserved). A Class A network consists of all the systems whose IP addresses begin with one particular number; the network's administrator is then free to assign any permitted address starting with that number to a given machine.
Each of these mammoth Class A networks can have up to sixteen million or so hosts (2²⁴ - 2, or 16,777,214). Another block of addresses, those with a first number between 128 and 191, is allocated to Class B networks; all the hosts on a given Class B network share the same first two numbers, and there can be up to 65,534 of them. (There are also Class C networks, with a first number between 192 and 223; all the hosts on a Class C network share the first three numbers, so it can support at most 254 hosts.)
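The class of a network can be read off the first number of an address: 0 to 127 is Class A, 128 to 191 is Class B, and 192 to 223 is Class C. The Python sketch below, with hypothetical function names, applies those ranges and the corresponding host counts; the example address 158.152.11.49 turns out to be on a Class B network. It models only the classful rules described in this section, and rejects addresses whose first number is 224 or higher, which lie outside Classes A to C.

```python
# Host bits per class: the remaining bits of the 32 identify the network.
HOST_BITS = {"A": 24, "B": 16, "C": 8}


def address_class(address):
    """Return 'A', 'B' or 'C' for a dotted-quad address, based on its first number."""
    first = int(address.split(".")[0])
    if first < 128:
        return "A"
    if first < 192:
        return "B"
    if first < 224:
        return "C"
    raise ValueError(f"{address} is not in a Class A, B or C network")


def max_hosts(cls):
    """The all-zeros host part names the network itself and the all-ones part is
    the broadcast address, hence the -2."""
    return 2 ** HOST_BITS[cls] - 2


if __name__ == "__main__":
    for addr in ("10.1.2.3", "158.152.11.49", "192.168.0.1"):
        cls = address_class(addr)
        print(addr, cls, max_hosts(cls))
    # 10.1.2.3 A 16777214
    # 158.152.11.49 B 65534
    # 192.168.0.1 C 254
```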
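Because IP addresses are 32 bits long, TCP/IP is limited to an address space of 2³² (about 4.3 billion) hosts, so various compromises have been made to keep the address space from running out as new systems come online. For example, a local network can be connected to the internet through a gateway, and an organization can divide a single allocated network into subnets rather than requesting a new network number for each of its physical networks. A successor, IP version 6 (IPv6), has also been designed; it is intended to coexist with the current version (IPv4) during a transition period, and it uses 128-bit addresses, giving an address space of 2¹²⁸ hosts.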
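Subnetting works by treating some of the host bits as part of the network number, so that a router can tell which subnet an address is on by masking it. The sketch below assumes, purely for illustration, that the Class B network 158.152.0.0 (which contains the example address 158.152.11.49) has been divided into 256 subnets of 254 hosts each, using a 24-bit network prefix (the mask 255.255.255.0). The `subnet_of` helper is hypothetical; `ipaddress.ip_network` and its `subnets` method are part of Python's standard library.

```python
import ipaddress


def subnet_of(address, prefix_len):
    """Mask off the host bits to find the subnet an address belongs to."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    network = int(ipaddress.IPv4Address(address)) & mask
    return f"{ipaddress.IPv4Address(network)}/{prefix_len}"


if __name__ == "__main__":
    print(subnet_of("158.152.11.49", 24))                        # 158.152.11.0/24
    class_b = ipaddress.ip_network("158.152.0.0/16")
    subnets = list(class_b.subnets(new_prefix=24))
    print(len(subnets), subnets[11])                             # 256 158.152.11.0/24
    print(ipaddress.ip_address("158.152.11.49") in subnets[11])  # True
```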
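A computer running a TCP/IP stack needs to be able to share its network connection among several programs running at the same time. For this reason, extra information is carried with every packet: the IP header records a protocol number identifying the higher-level protocol (such as TCP or UDP), and the TCP or UDP header records the port number on the destination host and the port number on the originating host. Port numbers are 16 bits long, so each protocol has 2¹⁶ (65,536) ports on a host; the protocol number is 8 bits long, allowing 2⁸ (256) protocols. Because TCP and UDP each have their own set of ports, and a TCP connection is identified by the combination of both addresses and both port numbers, there is usually no trouble in telling one program's traffic from another's.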
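Here is a rough model of how an incoming packet finds the right program. Python's `struct` module unpacks the two 16-bit port fields that begin every TCP and UDP header; the lookup table and function names are hypothetical, though 6 and 17 really are the IP protocol numbers for TCP and UDP. Keying the table on (protocol, port) reflects the fact that TCP and UDP port numbers are separate.

```python
import struct

IPPROTO_TCP = 6   # value of the IP header's protocol field for TCP
IPPROTO_UDP = 17  # ...and for UDP

# Hypothetical table of listening programs, keyed by (protocol, destination port).
LISTENERS = {
    (IPPROTO_TCP, 80): "web server",
    (IPPROTO_TCP, 119): "news server",
    (IPPROTO_UDP, 53): "name server",
}


def parse_ports(transport_header):
    """TCP and UDP headers both start with 16-bit source and destination ports, big-endian."""
    source_port, dest_port = struct.unpack("!HH", transport_header[:4])
    return source_port, dest_port


def deliver(protocol, transport_header):
    """Pick the program that should receive this packet, roughly as a TCP/IP stack would."""
    _, dest_port = parse_ports(transport_header)
    return LISTENERS.get((protocol, dest_port), "no listener")


if __name__ == "__main__":
    header = struct.pack("!HH", 49152, 80)  # from port 49152 to port 80
    print(deliver(IPPROTO_TCP, header))     # web server
    print(deliver(IPPROTO_UDP, header))     # no listener
```

Because the port fields are 16 bits wide, `struct.pack("!H", 65536)` fails, which is the 2¹⁶ limit in concrete form.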
Well-known ports are port numbers that are conventionally used for a specific network protocol. If you have access to a system running UNIX, look in the file /etc/services; it lists recognized network services and the ports they run on. For example, almost all internet news traffic is carried by connections to port 119, which is reserved for NNTP, the Network News Transfer Protocol. The World Wide Web is less consistent: HTTP's well-known port is 80, but the URL syntax allows a different port to be specified, and some servers use one. A web server listening on port 80 of a host accepts the incoming connections addressed to that port and interprets the data arriving on them as HTTP commands.
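The following Python sketch ties the two ideas together: it parses lines in the /etc/services format (service name, port/protocol, optional aliases, optional comment) and uses them to pick a port for a URL, preferring an explicit port if the URL names one. The two service lines are a hand-written excerpt for illustration rather than a copy of any particular system's file, and the function names are hypothetical; `urllib.parse.urlsplit` and its `port` attribute are standard library.

```python
from urllib.parse import urlsplit

# Hand-written excerpt in /etc/services format; real files list hundreds of entries.
SAMPLE_SERVICES = """\
http        80/tcp      www             # World Wide Web HTTP
nntp        119/tcp     readnews untp   # USENET News Transfer Protocol
"""


def parse_services(text):
    """Map (service name or alias, protocol) to a port number."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, port_and_protocol, *aliases = line.split()
        port, protocol = port_and_protocol.split("/")
        for service in (name, *aliases):
            table[(service, protocol)] = int(port)
    return table


def port_for_url(url, services):
    """Use the URL's explicit port if it has one, else the scheme's well-known port."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return services[(parts.scheme, "tcp")]


if __name__ == "__main__":
    services = parse_services(SAMPLE_SERVICES)
    print(port_for_url("http://www.example.com/", services))                 # 80
    print(port_for_url("http://www.example.com:8080/index.html", services))  # 8080
```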
A protocol in this context is simply an agreed format for carrying out transactions between a client and a server. The client issues commands, and the server carries them out and sends back a response, to which the client may in turn respond, maintaining a dialogue. In many respects, a protocol can be viewed as a simple programming language. For example, NNTP and SMTP each have an accepted repertoire of commands with various parameters, and the server returns an error code if it receives an incorrect command.
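To make the "simple programming language" comparison concrete, here is a toy command interpreter in Python. Its three commands and its reply codes are loosely modelled on SMTP, which also answers each command with a three-digit code (250 for success, 221 when closing the connection, 500 for an unrecognized command, 501 for bad parameters), but it is a hypothetical sketch, not an implementation of SMTP or NNTP.

```python
def handle(line):
    """Interpret one command line and return the server's reply line."""
    verb, _, argument = line.strip().partition(" ")
    verb = verb.upper()  # treat commands as case-insensitive, as SMTP does
    if verb == "HELO":
        if not argument:
            return "501 HELO requires a host name"
        return f"250 Hello {argument}"
    if verb == "NOOP":
        return "250 OK"
    if verb == "QUIT":
        return "221 Closing connection"
    return "500 Command not recognized"


def run_session(commands):
    """Feed commands to the server one at a time until QUIT, collecting the dialogue."""
    transcript = []
    for command in commands:
        reply = handle(command)
        transcript.append((command, reply))
        if reply.startswith("221"):
            break
    return transcript


if __name__ == "__main__":
    for command, reply in run_session(["HELO client.example.com", "FETCH", "QUIT", "NOOP"]):
        print(f"C: {command}\nS: {reply}")
```

A real client would usually branch on the first digit of each reply code (2 for success, 5 for a permanent error) rather than on the human-readable text that follows it.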
