perl-unicode

Re: Encode::MIME::Header my 2¢

2002-10-07 01:30:04
Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> writes:

I am not sure about that one.  I have received mail expressing the opposite
opinion, asking for strict RFC 2047 compliance (in Jcode), especially where
line folding was concerned.  So I made Encode::MIME::Header RFC 2047
compliant.  But I agree that =20 instead of '_' may be too much.
Nevertheless, =20 is exactly what RFC 2047 recommends:

RFC 2047
   As a consequence, unencoded white space
   characters (such as SPACE and HTAB) are FORBIDDEN within an
   'encoded-word'.

I must re-read the RFC, but I think what I am saying is "don't encode
multiple ASCII words as one UTF-8 word."

For example, the character sequence

      =?iso-8859-1?q?this is some text?=

   would be parsed as four 'atom's, rather than as a single 'atom' (by
   an RFC 822 parser) or 'encoded-word' (by a parser which understands
   'encoded-words').  The correct way to encode the string "this is some
   text" is to encode the SPACE characters as well, e.g.

      =?iso-8859-1?q?this=20is=20some=20text?=
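The RFC's point about the two example strings can be checked mechanically. The following is a minimal sketch in Python. The thread itself contains no code, the names are hypothetical, and this is not how Encode::MIME::Header works. It tests whether a string is a single well-formed encoded-word under the grammar in RFC 2047 section 2, including the 75-character limit. It checks syntax only and does not verify that 'B' text is valid base64.

```python
import re

# RFC 2047, section 2:
#   encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
#   token        = 1*<any CHAR except SPACE, CTLs, and especials>
#   encoded-text = 1*<any printable ASCII character other than "?" or SPACE>
ESPECIALS = '()<>@,;:"/[]?.='
TOKEN_CHARS = "".join(chr(c) for c in range(0x21, 0x7F) if chr(c) not in ESPECIALS)

ENCODED_WORD = re.compile(
    r"=\?([" + re.escape(TOKEN_CHARS) + r"]+)"  # charset token
    + r"\?([QqBb])"                             # encoding: Q or B
    + r"\?([\x21-\x3E\x40-\x7E]+)"              # encoded-text: no SPACE, no "?"
    + r"\?="
)


def is_encoded_word(s):
    """True if s is one syntactically valid encoded-word of at most 75 chars."""
    return len(s) <= 75 and ENCODED_WORD.fullmatch(s) is not None


print(is_encoded_word("=?iso-8859-1?q?this=20is=20some=20text?="))  # True
print(is_encoded_word("=?iso-8859-1?q?this is some text?="))        # False
```

The raw-space version fails because SPACE is excluded from encoded-text, which is why an RFC 822 parser sees it as four atoms instead.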

But likewise, a traditional RFC 822 Subject line

Subject: This is some text    

_is_ 4 words

But 

Subject: =?iso-8859-1?q?this=20is=20some=20text?=

is one word.


   (3) 8-bit values which correspond to printable ASCII characters other
       than "=", "?", and "_" (underscore), MAY be represented as those
       characters.  (But see section 5 for restrictions.)  In
       particular, SPACE and TAB MUST NOT be represented as themselves
       within encoded words.
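Rules (2) and (3) of RFC 2047 section 4.2 amount to a small per-byte table. The following is a minimal Python sketch with a hypothetical function name. It is not Encode::MIME::Header's code. It builds one 'Q' encoded-word suitable for a Subject line. It does not split words longer than 75 characters, and it ignores the stricter section 5 limits for encoded-words inside phrases. The `space_as_underscore` flag shows that both '_' and '=20' are legal for SPACE, which is the choice being debated here.

```python
def q_encode_word(text, charset="iso-8859-1", space_as_underscore=False):
    """Return one RFC 2047 'Q' encoded-word for text (sketch, no line folding)."""
    out = []
    for byte in text.encode(charset):
        if byte == 0x20:
            # Rule (2): SPACE may be written as "_"; "=20" (rule 1) is also legal.
            out.append("_" if space_as_underscore else "=20")
        elif 0x21 <= byte <= 0x7E and chr(byte) not in "=?_":
            # Rule (3): printable ASCII other than "=", "?", "_" stands for itself.
            out.append(chr(byte))
        else:
            # Rule (1): any other byte (TAB, "=", "?", "_", 8-bit) becomes =XX.
            out.append("=%02X" % byte)
    return "=?%s?q?%s?=" % (charset, "".join(out))


print(q_encode_word("this is some text"))
# =?iso-8859-1?q?this=20is=20some=20text?=
print(q_encode_word("this is some text", space_as_underscore=True))
# =?iso-8859-1?q?this_is_some_text?=
```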

With this understood,

Suggestions:
 - leave ASCII or even iso-8859-1 sequences as such

Only printable ASCII is allowed, so I have to decline this one.

Printable ASCII would solve most of my issues.  My memory of the RFC was
that iso-8859-1 was the "default"; if it is only ASCII, then fine.

'MIME-Q' is already implemented that way.  Bottom line is that I do not 
want to give up RFC 2047 conformance.

Neither do I.


 - wrap sequences of ch > 0xff in whichever of 'Q' or 'B' is shorter
   (do both encodings and throw one away).
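The "do both and throw one away" suggestion is cheap to express. The following is a minimal Python sketch with hypothetical names. It is not a patch to Encode::MIME::Header. It encodes one run of text both ways and keeps the shorter encoded-word. Q output here writes SPACE as =20, matching the behaviour discussed above, and ties go to Q because it stays more readable.

```python
import base64


def q_encoded_text(data):
    """RFC 2047 'Q' encoded-text for raw bytes; SPACE is written as =20."""
    out = []
    for byte in data:
        if 0x21 <= byte <= 0x7E and chr(byte) not in "=?_":
            out.append(chr(byte))          # printable ASCII passes through
        else:
            out.append("=%02X" % byte)     # everything else as =XX
    return "".join(out)


def b_encoded_text(data):
    """RFC 2047 'B' encoded-text: plain base64 of the raw bytes."""
    return base64.b64encode(data).decode("ascii")


def shortest_encoded_word(text, charset="utf-8"):
    """Encode text with both Q and B and return the shorter encoded-word."""
    data = text.encode(charset)
    q_word = "=?%s?Q?%s?=" % (charset, q_encoded_text(data))
    b_word = "=?%s?B?%s?=" % (charset, b_encoded_text(data))
    return q_word if len(q_word) <= len(b_word) else b_word


print(shortest_encoded_word("this is some text"))  # =?utf-8?Q?this=20is=20some=20text?=
print(shortest_encoded_word("日本語"))              # =?utf-8?B?5pel5pys6Kqe?=
```

Mostly-ASCII text stays in Q, where each character costs one byte. CJK text, where every UTF-8 byte would cost three characters in Q, switches to B.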

I'll consider this one instead.  This one at least does not breach RFC 
2047.

Are patches in that direction likely to be accepted, or should I build
a MIME-Smart on top?

As I said, Encode::MIME::Header has these restrictions:

* the Encode API
* RFC 2047

This is very restrictive considering the nature of MIME header
encoding.  Surprisingly, the Encode::MIME namespace itself remains
empty, so maybe we can make use of it...

I probably will; there are a whole slew of Encode-oid issues with the
body parts of MIME.



Dan the Encode Maintainer
-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/