On Monday, Oct 7, 2002, at 06:14 Asia/Tokyo, Nick Ing-Simmons wrote:
I have re-started work on Unicode aware perl/Tk - and I am playing
with it in "tkmail" (as a test app). Obviously Encode::MIME is just
the thing for a mail tool.
However, the encode ops are not ideal:
I know it is not, and one of the reasons is that it has to follow the
Encode API. MIME header encoding is in essence a double encoding, so it
lacks the fine-grained control you may want.
If you encode('MIME-Header', ...) it _seems_ to always use the 'B' form;
maybe my tests are not extensive enough.
That one is already documented:
perldoc Encode::MIME::Header
ABSTRACT
This module implements RFC 2047 Mime Header Encoding.
There are 3 variant encoding names; "MIME-Header",
"MIME-B" and "MIME-Q". The difference is described below
              decode()           encode()
----------------------------------------------
MIME-Header   Both B and Q       =?UTF-8?B?....?=
MIME-B        B only; Q croaks   =?UTF-8?B?....?=
MIME-Q        Q only; B croaks   =?UTF-8?Q?....?=
The problem is that finer control would require more arguments to
encode() and decode(), and that would make ordinary encoding and decoding
too cumbersome.
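To make the B/Q distinction in the table concrete, here is a minimal Python sketch (an illustration only, not Encode's Perl API) that builds a single 'B' encoded-word by hand and then decodes both a B word and a Q word with the standard library's email.header.decode_header. The helper name b_encoded_word is hypothetical, and it does no line folding or length checking.

```python
import base64
from email.header import decode_header


def b_encoded_word(text: str, charset: str = "UTF-8") -> str:
    """Wrap text in one RFC 2047 'B' encoded-word (hypothetical helper, no folding)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


word = b_encoded_word("café")
print(word)  # =?UTF-8?B?Y2Fmw6k=?=

# decode_header understands both forms; it returns (bytes, lowercased charset) pairs.
for encoded in (word, "=?UTF-8?Q?caf=C3=A9?="):
    for raw, charset in decode_header(encoded):
        print(raw.decode(charset))
```

Both decoded lines print the same text, which mirrors the "Both B and Q" column for decoding.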
If I encode('MIME-Q', ...) (as currently implemented), I seem to get all
the ' ' characters inside the =?UTF-8?Q?...?=, so they become =20. It also
seems to wrap all the ASCII parts. While this is not wrong, it makes
things less readable in mail clients that don't understand the encoding
and so leave the markup for the user to see.
That one I am not sure about. I got mails with the opposite opinion,
asking for strict RFC 2047 compliance (in Jcode), especially where line
folding was concerned. So I made Encode::MIME::Header RFC 2047 compliant.
But I agree that =20 instead of '_' may be too much. Nevertheless, =20 is
exactly what RFC 2047 recommends:
RFC 2047
As a consequence, unencoded white space
characters (such as SPACE and HTAB) are FORBIDDEN within an
'encoded-word'. For example, the character sequence
=?iso-8859-1?q?this is some text?=
would be parsed as four 'atom's, rather than as a single 'atom' (by
an RFC 822 parser) or 'encoded-word' (by a parser which understands
'encoded-words'). The correct way to encode the string "this is some
text" is to encode the SPACE characters as well, e.g.
=?iso-8859-1?q?this=20is=20some=20text?=
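A small Python sketch of the point above, assuming that splitting on whitespace is a fair stand-in for how an RFC 822 parser sees atoms: with literal spaces the would-be encoded-word falls apart into four tokens, none of which is a valid encoded-word, while the =20 form stays a single token. The regular expression is a simplified check of the encoded-word syntax (it does not validate charset tokens), and atoms() is a hypothetical helper.

```python
import re
from email.header import decode_header

# Simplified RFC 2047 syntax: =?charset?encoding?encoded-text?=
# where encoded-text may not contain "?" or SPACE.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]+\?=")


def atoms(header_text: str) -> list[str]:
    """Hypothetical helper: split on whitespace, roughly as an RFC 822 parser would."""
    return header_text.split()


bad = "=?iso-8859-1?q?this is some text?="
good = "=?iso-8859-1?q?this=20is=20some=20text?="

print(atoms(bad))  # four tokens, none a valid encoded-word
print([bool(ENCODED_WORD.fullmatch(a)) for a in atoms(bad)])
print(atoms(good), bool(ENCODED_WORD.fullmatch(good)))
print(decode_header(good))  # the =20 form decodes back to "this is some text"
```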
And more on the "Q" encoding:
4.2. The "Q" encoding
The "Q" encoding is similar to the "Quoted-Printable" content-
transfer-encoding defined in RFC 2045. It is designed to allow text
containing mostly ASCII characters to be decipherable on an ASCII
terminal without decoding.
(1) Any 8-bit value may be represented by a "=" followed by two
hexadecimal digits. For example, if the character set in use
were ISO-8859-1, the "=" character would thus be encoded as
"=3D", and a SPACE by "=20". (Upper case should be used for
hexadecimal digits "A" through "F".)
(2) The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
represented as "_" (underscore, ASCII 95.). (This character may
not pass through some internetwork mail gateways, but its use
will greatly enhance readability of "Q" encoded data with mail
readers that do not support this encoding.) Note that the "_"
always represents hexadecimal 20, even if the SPACE character
occupies a different code position in the character set in use.
(3) 8-bit values which correspond to printable ASCII characters other
than "=", "?", and "_" (underscore), MAY be represented as those
characters. (But see section 5 for restrictions.) In
particular, SPACE and TAB MUST NOT be represented as themselves
within encoded words.
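The three rules translate into a short encoder. This Python sketch is an illustration, not the Encode::MIME::Header implementation: q_encode and its space_as_underscore flag are hypothetical, it leaves only the conservative character set from RFC 2047 section 5(3) unencoded, and it assumes an ASCII-compatible charset so that SPACE is byte 0x20. Line folding and the encoded-word length limit are not handled.

```python
import string

# Conservative set left as-is: the characters RFC 2047 section 5(3) allows
# in phrases, minus "=" and "_", which are special in "Q" and get encoded.
SAFE = set(string.ascii_letters + string.digits + "!*+-/")


def q_encode(text: str, charset: str = "UTF-8",
             space_as_underscore: bool = False) -> str:
    """Build one "Q" encoded-word (hypothetical helper; assumes SPACE is 0x20)."""
    out = []
    for byte in text.encode(charset):
        if byte == 0x20 and space_as_underscore:
            out.append("_")             # rule (2): "_" always stands for hex 20
        elif chr(byte) in SAFE:
            out.append(chr(byte))       # rule (3): printable ASCII may stand for itself
        else:
            out.append(f"={byte:02X}")  # rule (1): "=" plus two upper-case hex digits
    return f"=?{charset}?Q?{''.join(out)}?="


print(q_encode("this is some text", "iso-8859-1"))
print(q_encode("this is some text", "iso-8859-1", space_as_underscore=True))
```

The first call reproduces the =20 form from the RFC example; the second shows the more readable "_" form that rule (2) also permits.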
With this understood, on to your suggestions:
Suggestions:
- leave ASCII or even iso-8859-1 sequences as such
Only printable ASCII is allowed as-is, so I have to decline this one.
'MIME-Q' is already implemented that way. The bottom line is that I do
not want to give up RFC 2047 conformance.
- wrap sequences of ch > 0xff in whichever of 'Q' or 'B' is shorter
(do both encodings and throw one away).
I'll consider this one instead. This one at least does not breach RFC
2047.
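Nick's "do both encodings and throw one away" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the caller is assumed to have already split out the run of characters to encode, and neither line folding nor the 75-character limit on an encoded-word is handled. Ties go to Q because it is easier to read undecoded.

```python
import base64
import string

# Characters left as-is in Q (conservative set from RFC 2047 section 5(3)).
SAFE = set(string.ascii_letters + string.digits + "!*+-/")


def b_word(text: str, charset: str = "UTF-8") -> str:
    """Hypothetical helper: one 'B' encoded-word."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def q_word(text: str, charset: str = "UTF-8") -> str:
    """Hypothetical helper: one 'Q' encoded-word, using "_" for SPACE."""
    body = "".join(
        "_" if byte == 0x20 else chr(byte) if chr(byte) in SAFE else f"={byte:02X}"
        for byte in text.encode(charset)
    )
    return f"=?{charset}?Q?{body}?="


def shorter_word(text: str, charset: str = "UTF-8") -> str:
    """Encode both ways and keep the shorter result; ties go to Q."""
    q, b = q_word(text, charset), b_word(text, charset)
    return q if len(q) <= len(b) else b


print(shorter_word("a naïve question about encodings"))  # mostly ASCII: Q wins
print(shorter_word("日本語"))                              # all multibyte: B wins
```

The choice follows from the per-byte cost: Q spends three characters on each byte it must escape, while B spends about 4/3 characters on every byte, so Q only wins when most bytes can be left as-is.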
Are patches in that direction likely to be accepted, or should I build
a MIME-Smart on top?
As I said, Encode::MIME::Header has these restrictions:
* the Encode API
* RFC 2047
This is very restrictive considering the nature of MIME header
encoding. Surprisingly, the Encode::MIME namespace itself remains
empty, and maybe we can make use of it....
Dan the Encode Maintainer