The following sections from the latest (version of September 5, 1995)
HTTP 1.0 spec seem to be relevant:
3.6.1 Canonicalization and Text Defaults
Media types are registered in a canonical form. In general, entity bodies
transferred via HTTP must be represented in the appropriate canonical form
prior to transmission. If the body has been encoded via a Content-Encoding,
the data must be in canonical form prior to that encoding. However, HTTP
modifies the canonical form requirements for media of primary type "text"
and for "application" types consisting of text-like records.
HTTP redefines the canonical form of text media to allow multiple octet
sequences to indicate a text line break. In addition to the preferred form
of CRLF, HTTP applications must accept a bare CR or LF alone as
representing a single line break in text media. Furthermore, if the text
media is represented in a character set which does not use octets 13 and
10 for CR and LF respectively, as is the case for some multi-byte
character sets, HTTP allows the use of whatever octet sequence(s) is
defined by that character set to represent the equivalent of CRLF, bare
CR, and bare LF. It is assumed that any recipient capable of using such a
character set will know the appropriate octet sequence for representing
line breaks within that character set.
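
To make the point about octet sequences concrete, here is a small Python
sketch (mine, not from the spec) that prints the octets each line-break
form becomes under a given codec. The codec names are Python's, and using
"utf_16_be" as a stand-in for a 16-bit character set such as UNICODE-1-1
is an assumption of the example.

```python
# Sketch only: the three line-break forms the spec allows in text media.
LINE_BREAKS = {"CRLF": "\r\n", "CR": "\r", "LF": "\n"}

def line_break_octets(codec: str) -> dict:
    """Return the octet sequence for each line-break form under `codec`.

    `codec` is a Python codec name; mapping an HTTP charset token to a
    Python codec is left to the caller (an assumption of this sketch).
    """
    return {name: chars.encode(codec) for name, chars in LINE_BREAKS.items()}

# Single-octet character set: CR and LF are octets 13 and 10.
print(line_break_octets("latin_1"))

# 16-bit character set: each of CR and LF occupies two octets.
print(line_break_octets("utf_16_be"))
```

In the 16-bit case the octets 13 and 10 are never adjacent, which is why
the spec defers to whatever sequence the character set itself defines.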
Note: This interpretation of line breaks applies only to the contents of
an Entity-Body and only after any Content-Encoding has been removed. All
other HTTP constructs use CRLF exclusively to indicate a line break.
Content codings define their own line break requirements.
A recipient of an HTTP text entity should translate the received entity
line breaks to the local line break conventions before saving the entity
external to the application and its cache; whether this translation takes
place immediately upon receipt of the entity, or only when prompted by the
user, is entirely up to the individual application.
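
As an illustration of that translation step, a minimal Python sketch
follows. It assumes the entity body has already been decoded to a string;
the function name to_local_line_breaks is made up for this example.

```python
import os
import re

# CRLF is listed first so that it is treated as one line break rather
# than as a bare CR followed by a bare LF.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")

def to_local_line_breaks(text: str, local: str = os.linesep) -> str:
    """Replace every CRLF, bare CR, or bare LF with the local line break."""
    # A function is used as the replacement so that `local` is inserted
    # literally, without any escape processing by the re module.
    return _LINE_BREAK.sub(lambda _match: local, text)

# A body mixing all three accepted forms, converted to LF-only.
print(repr(to_local_line_breaks("one\r\ntwo\rthree\nfour", "\n")))
```

Whether this runs on receipt or only when the user saves the entity is,
as the spec says, left to the application.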
HTTP also redefines the default character set for text media in an entity
body. If a textual media type defines a charset parameter with a
registered default value of "US-ASCII", HTTP changes the default to be
"ISO-8859-1". Since the ISO-8859-1 character set is a superset of
US-ASCII, this has no effect upon the interpretation of entity bodies
which only contain octets within the US-ASCII set (0 - 127). The presence
of a charset parameter value in a Content-Type header field overrides the
default.
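
A rough Python sketch of that defaulting rule, for illustration only. It
leans on the standard library's email.message.Message to parse the
Content-Type value, and it simplifies by treating every "text" subtype as
having a registered default of US-ASCII; the function name
effective_charset is hypothetical.

```python
from email.message import Message
from typing import Optional

HTTP_TEXT_DEFAULT = "ISO-8859-1"

def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset to assume for an entity body, or None.

    An explicit charset parameter always wins. Without one, "text" media
    default to ISO-8859-1 (simplification: all text subtypes are treated
    alike here). Other media get None, meaning no default applies.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_param("charset")
    if isinstance(explicit, str) and explicit:
        # Charset tokens are case-insensitive; upper-case for comparison.
        return explicit.upper()
    if msg.get_content_maintype() == "text":
        return HTTP_TEXT_DEFAULT
    return None

print(effective_charset("text/html"))
print(effective_charset('text/plain; charset="iso-8859-2"'))
print(effective_charset("image/gif"))
```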
It is recommended that the character set of an entity body be labelled as
the lowest common denominator of the character codes used within a
document, with the exception that no label is preferred over the labels
US-ASCII or ISO-8859-1.
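
One way a sender might apply that recommendation is sketched below in
Python. The candidate list, its ordering, and the pairing of HTTP labels
with Python codec names are all assumptions of this example, not anything
the spec defines.

```python
from typing import Optional

# (HTTP label, Python codec) pairs, tried in order. Both the selection
# and the label-to-codec pairing are assumptions of this sketch.
CANDIDATES = [
    ("ISO-8859-2", "iso8859_2"),
    ("ISO-8859-5", "iso8859_5"),
    ("ISO-8859-7", "iso8859_7"),
    ("UNICODE-1-1-UTF-8", "utf_8"),
]

def fits(text: str, codec: str) -> bool:
    """True if every character of `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

def charset_label(text: str) -> Optional[str]:
    """Pick a charset label for `text`, or None to send no label at all."""
    # ISO-8859-1 is a superset of US-ASCII, so one check covers both of
    # the cases where the spec prefers no label.
    if fits(text, "latin_1"):
        return None
    for label, codec in CANDIDATES:
        if fits(text, codec):
            return label
    raise ValueError("no candidate character set can represent the text")

print(charset_label("plain text"))
print(charset_label("Łódź"))
```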
and (from 3.4):
HTTP character sets are identified by case-insensitive tokens. The
complete set of tokens are defined by the IANA Character Set registry.
However, because that registry does not define a single, consistent token
for each character set, we define here the preferred names for those
character sets most likely to be used with HTTP entities. These character
sets include those registered by RFC 1521 -- the US-ASCII and ISO-8859
character sets -- and other names specifically recommended for use within
MIME charset parameters.
    charset = "US-ASCII"
            | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
            | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
            | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
            | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
            | "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
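
Since the tokens are case-insensitive, a recipient has to fold case
before comparing. A short Python sketch of that check against the names
listed above; the function name is_preferred_charset is hypothetical.

```python
# The preferred names from the charset rule, stored in upper case.
PREFERRED_CHARSETS = frozenset({
    "US-ASCII",
    "ISO-8859-1", "ISO-8859-2", "ISO-8859-3",
    "ISO-8859-4", "ISO-8859-5", "ISO-8859-6",
    "ISO-8859-7", "ISO-8859-8", "ISO-8859-9",
    "ISO-2022-JP", "ISO-2022-JP-2", "ISO-2022-KR",
    "UNICODE-1-1", "UNICODE-1-1-UTF-7", "UNICODE-1-1-UTF-8",
})

def is_preferred_charset(token: str) -> bool:
    """Case-insensitive membership test against the preferred names."""
    return token.upper() in PREFERRED_CHARSETS

print(is_preferred_charset("iso-8859-1"))
print(is_preferred_charset("Unicode-1-1"))
print(is_preferred_charset("KOI8-R"))
```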
In other words, HTTP specifically allows the use of multibyte character
sets that do not represent line breaks with the CRLF octet sequence
(octets 13 and 10), in particular 16-bit Unicode (UNICODE-1-1). It also
recognizes that this differs from the behavior specified by MIME.
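
For what it's worth, here is a Python sketch of why a recipient cannot
simply scan a 16-bit body for octets 13 and 10. It uses Python's
"utf_16_be" codec as a stand-in for 16-bit Unicode, which is an
assumption of the example; split_text_entity is a made-up name.

```python
import re

def split_text_entity(body: bytes, codec: str) -> list:
    """Decode the body first, then split on CRLF, bare CR, or bare LF."""
    text = body.decode(codec)
    return re.split(r"\r\n|\r|\n", text)

body = "one\r\ntwo\rthree\nfour".encode("utf_16_be")

# The octet pair 13, 10 never appears, because a zero octet sits between
# the CR and LF code units.
print(b"\r\n" in body)

# Conversely, the single code point U+0D0A encodes to exactly 13, 10,
# so an octet-level scan would see a line break that is not there.
print("\u0d0a".encode("utf_16_be") == b"\r\n")

# Decoding with the labelled character set first gives the right lines.
print(split_text_entity(body, "utf_16_be"))
```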