HTTP Headers

Every WWW connection is in a sense delayed by its accompanying HTTP headers. Some of these headers are necessary. Others aren't.

For example:

HTTP/1.1 200 OK
Date: Thu, 19 Sep 2002 17:30:05 GMT
Server: Apache/1.3.26 (Unix)
Last-Modified: Wed, 18 Sep 2002 01:48:31 GMT
Accept-Ranges: bytes
Content-Length: 13310
Connection: close
Content-Type: text/html

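The response starts with a status line such as HTTP/1.1 200 OK, followed by further lines of information in the format Name: value (name, colon, value). The order of differently named headers does not matter, and header names are case-insensitive, although some values are case-sensitive.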
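
The format just described can be read with very little code. The following is a minimal sketch in Python, not taken from any particular server or client; the function name parse_response_head is made up for illustration, and it assumes the whole head is already in memory as text.

```python
def parse_response_head(raw):
    """Split a raw response head into its status line and a dict of headers."""
    # Accept CRLF or bare LF line endings by normalizing to LF first.
    lines = raw.replace("\r\n", "\n").split("\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        if line == "":
            break  # a blank line ends the header block
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        # Names are compared without regard to case, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Thu, 19 Sep 2002 17:30:05 GMT\r\n"
    "content-length: 13310\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)
status_line, headers = parse_response_head(raw)
print(status_line)
print(headers["content-length"])
```

A plain dict keeps only the last value when a name repeats (as Set-Cookie may), so a fuller parser would collect a list of values per name.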

Server: Server software, e.g., Apache/1.3.26 (Unix). It is either bragging, or a security weakness--I'm afraid both. Do not use it.
Date: Server time, e.g., Thu, 19 Sep 2002 17:30:05 GMT. Necessary for every transaction involving dates, because the client must know this value in order for other dates to make sense.
Content-Length: Specifies byte-length of response, e.g., 1024. Very important. In all possible cases this should be specified.
Accept-Ranges: Units in which the server accepts partial (range) requests, e.g., bytes. Very stupid. No unit other than bytes makes sense.
Last-Modified: Date of last file modification. Very useful. Allows clients to intelligently cache documents. Far better than relying on server logic with "If-Modified-Since," etc.
Set-Cookie: Returns a semicolon-separated cookie for the domain, e.g., colors=blue; path=/. Can be useful. Ridiculous when cookies extend beyond an ordinary screen width.
Connection: Type of connection maintenance, e.g., close. Persistent (keep-alive) connections allow the connection to be reused, saving considerable time. Poor designs may be unable to implement this feature. HTTP/1.1 defaults to persistent connections, which is better. As the header's only present use under HTTP/1.1 is to indicate "close," it is a rather dumb-headed header (no pun intended).
Content-Type: Kind of media, e.g., text/html. Basic to the operation of web browsers, which must display varying media types appropriately (e.g., an image or a web page). Only a few kinds are well supported without plug-ins, and accuracy is not guaranteed. Often defaults to text/plain.
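
To illustrate why Date matters for the other dates, here is a small sketch in Python of how a client might use Date and Last-Modified together. The function names are hypothetical, and the sketch assumes both headers are present and well-formed; parsedate_to_datetime comes from the standard email.utils module.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def age_at_server(headers):
    """How old the document was, by the server's own clock, when it was sent."""
    sent = parsedate_to_datetime(headers["Date"])
    modified = parsedate_to_datetime(headers["Last-Modified"])
    return sent - modified


def cached_copy_is_current(cached_last_modified, headers):
    """True if the copy we cached is at least as new as the server's copy."""
    ours = parsedate_to_datetime(cached_last_modified)
    theirs = parsedate_to_datetime(headers["Last-Modified"])
    return theirs <= ours


# Values taken from the example response earlier on this page.
headers = {
    "Date": "Thu, 19 Sep 2002 17:30:05 GMT",
    "Last-Modified": "Wed, 18 Sep 2002 01:48:31 GMT",
}
age = age_at_server(headers)
current = cached_copy_is_current("Wed, 18 Sep 2002 01:48:31 GMT", headers)
print(age, current)
```

Because both times come from the server, the difference is meaningful even when the client's own clock is wrong.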

Headers are abused far too much, and far too many exist. If I had to supply a boilerplate set of headers, I would use only Content-Type, Content-Length, Date, and Last-Modified. The rest should always make sense; if implicitness creates confusion, a mistake is being made.
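
A sketch of that boilerplate set in Python follows. It is only an illustration under stated assumptions: the names boilerplate_headers and render_head are made up, the body is assumed to be in memory as bytes, and times are passed as seconds since the epoch. formatdate with usegmt=True (from the standard email.utils module) produces the date format shown in the example response.

```python
import calendar
from email.utils import formatdate


def boilerplate_headers(body, content_type, modified, now=None):
    """Return only the four headers argued for above, as (name, value) pairs."""
    return [
        ("Content-Type", content_type),
        ("Content-Length", str(len(body))),
        # formatdate(None, ...) uses the current time.
        ("Date", formatdate(now, usegmt=True)),
        ("Last-Modified", formatdate(modified, usegmt=True)),
    ]


def render_head(status_line, headers):
    """Join the status line and headers with CRLF, ending with a blank line."""
    lines = [status_line] + [f"{name}: {value}" for name, value in headers]
    return "\r\n".join(lines) + "\r\n\r\n"


body = b"<html><body>hello</body></html>"
now = calendar.timegm((2002, 9, 19, 17, 30, 5))
modified = calendar.timegm((2002, 9, 18, 1, 48, 31))
pairs = boilerplate_headers(body, "text/html", modified, now)
head = render_head("HTTP/1.1 200 OK", pairs)
print(head)
```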

Lines are separated by CRLF; however, it is my persuasion that this must be replaced by LF alone, the sooner the better. See my modified version of publicfile.

5-17-03: I recommend the adoption of a Redistribute header in either HTTP or UAR for distributed service. The concept is that once numerous copies of a resource exist, the same single server need not supply all further copies. If another server intends to provide access to its own downloaded copy of a file, it may say so to the distributing server by sending a Redistribute: [URL] header (or redistribute:url in UAR) with its request. Once a server supplying a resource knows of such a server, it may at its discretion answer some requests beyond its capacity with a Location: URL header. Before adding a URL to its list of possible redistributor addresses, a distribution server may require the server sending the Redistribute header to also supply the MD5 sum of the resource, which it compares with the sum of its own master copy to guarantee validity.
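
The bookkeeping on the distribution server's side might look roughly like the following Python sketch. Everything here is an assumption made for illustration: the class and method names, the idea of counting requests against a fixed capacity, the round-robin choice among redistributors, and the use of a 302 status to carry the Location header. The Redistribute header itself is only a proposal.

```python
import hashlib


class DistributionServer:
    """Hypothetical origin server that tracks redistributors of one resource."""

    def __init__(self, master_copy, capacity):
        self.master_md5 = hashlib.md5(master_copy).hexdigest()
        self.capacity = capacity    # requests this server answers itself
        self.served = 0             # requests answered with the full resource
        self.redirected = 0         # requests referred elsewhere
        self.redistributors = []    # URLs announced via Redistribute

    def register(self, url, md5_hex):
        """Handle 'Redistribute: url' accompanied by the sender's MD5 sum."""
        if md5_hex.lower() != self.master_md5:
            return False            # copy differs from the master; refuse it
        if url not in self.redistributors:
            self.redistributors.append(url)
        return True

    def respond(self):
        """Return (status, headers) for one incoming request."""
        if self.served >= self.capacity and self.redistributors:
            # Over capacity: refer the client to a redistributor in turn.
            index = self.redirected % len(self.redistributors)
            self.redirected += 1
            return 302, {"Location": self.redistributors[index]}
        self.served += 1
        return 200, {}


server = DistributionServer(b"resource bytes", capacity=1)
accepted = server.register(
    "http://mirror.example/file", hashlib.md5(b"resource bytes").hexdigest()
)
first = server.respond()
second = server.respond()
print(accepted, first, second)
```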

I suggest that in the short-to-mid-term it may be extremely helpful for the majority of small resources to also be redistributed in this manner to all internet-connected computers while they maintain the same server connection. That is, a server that has just provided a 21-KB GIF file to a certain client should expect the client to supply its IP address and a cache resource, e.g., Local: IP.addr/cache.id. The server would reply to future requests from that client for the resource with the negotiation Location: IP.addr/cache.id, upon which the client would use its own cache. This would aid the present cache-negotiation techniques. Obviously, if a resource changed, the server could release the new resource, based on the cache date it had stored for that particular client's earlier request. Also, network caching servers may supply a Redistribute header with their request in place of the Local header, and the original server may then refer future clients to them directly, thus balancing the load soon after a resource is first available.
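
The Local negotiation could be sketched in Python as follows. The Local header is a proposal only, and the class name, method names, and the choice to key the table by client address and path are all assumptions made for illustration.

```python
class LocalCacheNegotiator:
    """Hypothetical server-side table of what each client says it has cached."""

    def __init__(self):
        # (client_ip, path) -> (cache reference, Last-Modified value served)
        self.known = {}

    def record(self, client_ip, path, cache_ref, last_modified):
        """Remember a 'Local: IP.addr/cache.id' sent after a download."""
        self.known[(client_ip, path)] = (cache_ref, last_modified)

    def respond(self, client_ip, path, current_last_modified):
        """Return headers pointing at the client's cache, or None to resend."""
        entry = self.known.get((client_ip, path))
        if entry is not None and entry[1] == current_last_modified:
            # Resource unchanged: refer the client to its own cached copy.
            return {"Location": entry[0]}
        # Unknown client or changed resource: the full resource must be sent.
        return None


negotiator = LocalCacheNegotiator()
stamp = "Wed, 18 Sep 2002 01:48:31 GMT"
negotiator.record("192.0.2.7", "/logo.gif", "192.0.2.7/cache.42", stamp)
unchanged = negotiator.respond("192.0.2.7", "/logo.gif", stamp)
changed = negotiator.respond("192.0.2.7", "/logo.gif", "Thu, 19 Sep 2002 09:00:00 GMT")
print(unchanged, changed)
```

The address 192.0.2.7 is from a range reserved for documentation and stands in for a real client.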

Much of this negotiation work can be implemented in a small prolog to a present HTTP request-response implementation.