I have been uneasy with the definition of Quoted-Printable for a long
time, for several reasons. The overall structure does not state the
rules in a general-to-particular order and, probably as a consequence,
I find several contradictions in the text. Below I include a commented
version of the current text (only the main problem areas are shown),
followed by a 'proposal' for an alternative, restructured text. Please
note that the alternative text is intended to represent exactly the
same specification; I am *not* trying to change anything!
/AF
Comments on the current version:
In this encoding, octets with decimal values of 33 through
57 inclusive, and 59 through 126, inclusive, MAY be
-- The bounds 57 and 59 above should read 60 and 62 (LESS THAN is 60,
-- GREATER THAN is 62).
represented as the ASCII characters which correspond to
those octets (EXCLAMATION POINT through LESS THAN, and
GREATER THAN through TILDE, respectively). All other values,
including those corresponding to ASCII SPACE (32), EQUAL
SIGN (61), DEL (127), all octets less than 32, and all
octets greater than 127, are to be represented as determined
by the following rules:
Rule #1: (General 8-bit representation) Any octet may
be represented by an "=" followed by a two digit
hexadecimal representation of the octet's value. The
-- Not true for 13 and 10 representing a line break.
digits of the hexadecimal alphabet, for this purpose,
are "0123456789ABCDEF". Uppercase letters should
always be used when sending hexadecimal data, though an
implementation may choose to recognize lowercase
letters on receipt. Thus, for example, the value 12
can be represented by "=0C", and the value 61 (ASCII
EQUAL SIGN) can be represented by "=3D". Rule #1 is
optional for octets with values of 9 (e.g. ASCII TAB or
HT), 10 (ASCII LINEFEED), 13 (ASCII CARRAIGE RETURN),
and 32 through 126 (SPACE through TILDE), and is
REQUIRED for all other values.
-- The last sentence is not true:
-- - special rules apply to 9, 10, 13 and 32 (hex 20); rule #1 is *not*
-- simply optional for them;
-- - 61 lies between 32 and 126, yet requires special treatment.
---------------------------------------------------------------------
Alternative, restructured text proposal:
5.1 Quoted-Printable Content-Transfer-Encoding
The Quoted-Printable encoding is intended to represent data
that largely consists of octets that correspond to printable
characters in the ASCII character set. It encodes the data
in such a way that the resulting octets are unlikely to be
modified by mail transport. If the data being encoded are
mostly ASCII text, the encoded form of the data remains
largely recognisable by humans. A message which is entirely
ASCII may also be encoded in Quoted-Printable to ensure the
integrity of the data should the message pass through a
character-translating, and/or line-wrapping gateway.
In this encoding, octets are to be represented as determined
by the following rules:
Rule #1: (General 8-bit representation) Any octet,
except those indicating a line break according to the
local newline convention, may be represented by an "="
followed by a two digit hexadecimal representation of
the octet's value. The digits of the hexadecimal
alphabet, for this purpose, are "0123456789ABCDEF".
Uppercase letters should always be used when sending
hexadecimal data, though an implementation may choose
to recognize lowercase letters on receipt. Thus, for
example, the value 12 can be represented by "=0C", and
the value 61 (ASCII EQUAL SIGN) can be represented by
"=3D". Except when the following rules allow an
alternative encoding, this rule is mandatory.
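(Illustration only, not part of the proposed text.) A minimal Python
sketch of the "=XX" form described in Rule #1. The function names
hex_escape and hex_unescape are my own, hypothetical choices, and the
sketch deliberately ignores the line-break exception, which Rule #4
covers.

```python
HEX_DIGITS = "0123456789ABCDEF"  # the alphabet given in Rule #1


def hex_escape(octet):
    """Return the "=XX" form of one octet, always with uppercase digits."""
    if not 0 <= octet <= 255:
        raise ValueError("an octet must be in the range 0..255")
    return "=" + HEX_DIGITS[octet >> 4] + HEX_DIGITS[octet & 0x0F]


def hex_unescape(token):
    """Decode one "=XX" token; lowercase digits are tolerated on receipt."""
    if len(token) != 3 or token[0] != "=":
        raise ValueError("expected '=' followed by two hexadecimal digits")
    # str.index raises ValueError for anything outside the hex alphabet.
    high = HEX_DIGITS.index(token[1].upper())
    low = HEX_DIGITS.index(token[2].upper())
    return high * 16 + low
```

With these definitions, hex_escape(12) gives "=0C" and hex_escape(61)
gives "=3D", matching the two examples in the rule.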
Rule #2: (Literal representation) Octets with decimal
values of 33 through 60 inclusive, and 62 through 126,
inclusive, MAY be represented as the ASCII characters
which correspond to those octets (EXCLAMATION POINT
through LESS THAN, and GREATER THAN through TILDE,
respectively).
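(Illustration only, not part of the proposed text.) The two ranges of
Rule #2 can be checked mechanically. The sketch below, with the
hypothetical name may_be_literal, also shows why the bounds have to be
60 and 62: those are the codes of LESS THAN and GREATER THAN, the
neighbours of the EQUAL SIGN (61).

```python
def may_be_literal(octet):
    """True if Rule #2 allows the octet to stand for itself."""
    return 33 <= octet <= 60 or 62 <= octet <= 126


# The named endpoints of the two ranges, and the excluded EQUAL SIGN.
ENDPOINTS = {"!": 33, "<": 60, "=": 61, ">": 62, "~": 126}
```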
Rule #3: (White Space): Octets with values of 9 and 32
MAY be represented as ASCII TAB (HT) and SPACE
characters, respectively, but MUST NOT be so
represented at the end of an encoded line. Any TAB (HT)
or SPACE characters on an encoded line MUST thus be
followed on that line by a printable character. In
particular, an "=" at the end of an encoded line,
indicating a soft line break (see rule #5) may follow
one or more TAB (HT) or SPACE characters. It follows
that octets with values 9 and 32 appearing at the end
of an encoded line must be represented according to
Rule #1. This rule is necessary because some MTAs are
known to pad lines of text with SPACEs, and others are
known to remove "white space" characters from the end
of a line. Therefore, when decoding a Quoted-Printable
message, any trailing white space on a line must be
deleted, as it will necessarily have been added by
intermediate transport agents.
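(Illustration only, not part of the proposed text.) A sketch of both
sides of Rule #3, assuming each encoded line is handled as a Python
string without its CRLF. Both function names are hypothetical. Only the
last TAB or SPACE needs the "=XX" form, since any earlier ones are then
followed by a printable "=".

```python
def protect_trailing_whitespace(encoded_line):
    """Sender side: re-encode a final SPACE or TAB as =20 or =09."""
    if encoded_line.endswith(" "):
        return encoded_line[:-1] + "=20"
    if encoded_line.endswith("\t"):
        return encoded_line[:-1] + "=09"
    return encoded_line


def strip_transport_padding(received_line):
    """Receiver side: drop trailing white space added in transit."""
    return received_line.rstrip(" \t")
```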
Rule #4 (Line Breaks): A line break, however it is
represented under the local newline convention, must
be represented in the Quoted-Printable encoding by an
RFC 822 line break, which is a CRLF sequence. If
isolated CRs and LFs, or
LF CR and CR LF sequences are allowed to appear in
binary data according to local conventions, they must
be represented using the "=0D", "=0A", "=0A=0D" and
"=0D=0A" notations respectively.
Rule #5 (Soft Line Breaks): The Quoted-Printable
encoding REQUIRES that encoded lines be no more than 76
characters long. If longer lines are to be encoded with
the Quoted-Printable encoding, 'soft' line breaks must
be used. An equal sign as the last character on an
encoded line indicates such a non-significant ('soft')
line break in the encoded text. Thus if the "raw" form
of the line is a single line that says:
Now's the time for all folk to come to the aid of their
country.
This can be represented, in the Quoted-Printable
encoding, as
Now's the time =
for all folk to come=
 to the aid of their country.
This provides a mechanism with which long lines are
encoded in such a way as to be restored by the user
agent. The 76 character limit does not count the
trailing CRLF, but counts all other characters,
including any equal signs.
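(Illustration only, not part of the proposed text.) A sketch of how
hard and soft line breaks could be handled together. It assumes the
input has already been encoded according to Rules #1 to #3, so that an
"=" always starts an "=XX" sequence, and that the local newline is LF.
All names are hypothetical, and limit is assumed to be at least 4.

```python
MAX_LINE = 76  # Rule #5 limit; the trailing CRLF is not counted


def soft_wrap(encoded, limit=MAX_LINE):
    """Split one encoded logical line into lines of at most limit chars."""
    lines = []
    while len(encoded) > limit:
        cut = limit - 1  # leave room for the soft-break "="
        # Never cut through an "=XX" sequence.
        if encoded[cut - 1] == "=":
            cut -= 1
        elif encoded[cut - 2] == "=":
            cut -= 2
        lines.append(encoded[:cut] + "=")
        encoded = encoded[cut:]
    lines.append(encoded)
    return lines


def to_wire(encoded_text, local_newline="\n"):
    """Hard breaks become CRLF (Rule #4); long lines are soft-wrapped."""
    physical = []
    for logical_line in encoded_text.split(local_newline):
        physical.extend(soft_wrap(logical_line))
    return "\r\n".join(physical)


def join_soft_breaks(physical_lines):
    """Receiver side: rebuild logical lines, leaving "=XX" untouched."""
    logical, current = [], ""
    for line in physical_lines:
        line = line.rstrip(" \t")  # Rule #3: discard transport padding
        if line.endswith("="):
            current += line[:-1]  # soft break: the line continues
        else:
            logical.append(current + line)
            current = ""
    if current:
        logical.append(current)
    return logical
```

Applied to the three-line example from Rule #5 (the third line starting
with a SPACE), join_soft_breaks gives back the single raw line.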