ietf-822

encoded-words, parameter continuation and visually ordered charsets

2002-04-28 13:00:47

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Neither RFC2047 nor RFC2231 seems to define how to handle sequences of
encoded-words and parameter continuations with visually ordered charsets like
iso-8859-8.

There are two alternatives, which differ in when the splitting into 
encoded-words occurs (assuming user input is processed in Unicode 
internally):

1. encoded-words are sorted logically, even when the charset they use is
   visually ordered:
   That's kind of natural, since each encoded-word can use a different
   charset, not all of which may be visually ordered.
   An implementation would thus split the to-be-encoded text at the Unicode
   level, encode each part into a fitting charset, then encode the resulting
   octet sequences as encoded-words and insert the encoded-words in the order
   in which they correspond to the source Unicode character sequence:

   stringlist = split( unicode_string );
   foreach string in stringlist {
     eightBitText = applyCharsetTransformation( string, charset );
     encodedWord = encodeRFC2047( eightBitText, charset, language );
     phrase.append( encodedWord );
   }
   header.insert( phrase );

2. encoded-words are sorted visually if the charset used is visually ordered.
   An implementation would thus encode the to-be-encoded text using a fitting
   charset, then split the resulting octet sequence (with knowledge of
   character boundaries), then encode each octet sequence into an encoded-word:

   eightBitString = applyCharsetTransformation( unicode_string, charset );
   eightBitStringList = split( eightBitString );
   foreach chunk in eightBitStringList {
     encodedWord = encodeRFC2047( chunk, charset, language );
     if ( charset.isRTL ) {
       phrase.prepend( encodedWord );
     } else {
       phrase.append( encodedWord );
     }
   }
   header.insert( phrase );
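
A minimal Python sketch of both alternatives, for illustration only. It
assumes the base64 'B' encoding, a single-byte charset (so every octet offset
is a character boundary), and a purely right-to-left string, for which visual
order is modelled as plain reversal; real code would need the Unicode
bidirectional algorithm. All function names are hypothetical and not taken
from any existing implementation.

```python
import base64


def to_visual(text: str) -> str:
    # ASSUMPTION: the text is one pure RTL run, so visual order is simply
    # the reverse of logical order. Mixed-direction text needs real bidi.
    return text[::-1]


def encode_word(octets: bytes, charset: str) -> str:
    # Wrap octets in an RFC 2047 'B' encoded-word (no language tag).
    payload = base64.b64encode(octets).decode("ascii")
    return "=?{}?B?{}?=".format(charset, payload)


def decode_word(word: str) -> str:
    # Inverse of encode_word; returns the text in stored (visual) order.
    _, charset, _, payload, _ = word.split("?")
    return base64.b64decode(payload).decode(charset)


def chunks(seq, size: int):
    # Fixed-size split; works for str and bytes alike.
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def alternative_1(text: str, charset: str, size: int):
    # (1) Split at the Unicode level, convert each part on its own,
    # and keep the encoded-words in logical order.
    return [encode_word(to_visual(part).encode(charset), charset)
            for part in chunks(text, size)]


def alternative_2(text: str, charset: str, size: int, rtl: bool = True):
    # (2) Convert the whole string first, then split the octets.
    # ASSUMPTION: single-byte charset, so any offset is a character boundary.
    octets = to_visual(text).encode(charset)
    phrase = []
    for part in chunks(octets, size):
        word = encode_word(part, charset)
        if rtl:
            phrase.insert(0, word)   # prepend
        else:
            phrase.append(word)
    return phrase
```

In this simplified model the two functions return the same list of
encoded-words for a pure right-to-left string whenever the chunk size divides
its length evenly; they diverge as soon as the chunk boundaries fall
differently, e.g. a four-character string split with size 3.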

RFC2231 parameter continuation, e.g.
filename*0*=iso-8859-8''foobarbaz; filename*1=Foo, has the same ambiguity.
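
For the parameter case, here is a hedged Python sketch of how a receiver
might reassemble the segments: it joins them in index order, percent-decodes
the extended ones, and applies the charset announced in the first segment.
reassemble is a hypothetical helper, and the percent-encoded value used to
exercise it below is an illustrative stand-in, not real data. The result
comes out in stored order; whether that order is visual or logical is exactly
the open question.

```python
from urllib.parse import unquote_to_bytes


def reassemble(params: dict, name: str) -> str:
    # Collect the continuation segments name*N and name*N* by index.
    segments = {}
    for key, value in params.items():
        base, _, rest = key.partition("*")
        if base != name or not rest:
            continue                      # not a continuation of `name`
        extended = rest.endswith("*")     # trailing '*': percent-encoded
        segments[int(rest.rstrip("*"))] = (extended, value)

    charset = "us-ascii"
    octets = b""
    for index in sorted(segments):
        extended, value = segments[index]
        if index == 0 and extended:
            # Only the first segment carries charset'language'value.
            declared, _language, value = value.split("'", 2)
            charset = declared or charset
        octets += unquote_to_bytes(value) if extended else value.encode("ascii")
    # Octets are decoded in stored order; no bidi reordering is applied.
    return octets.decode(charset)
```

For instance, reassemble({"filename*0*": "iso-8859-8''%F9%EC",
"filename*1": "Foo"}, "filename") yields the two Hebrew letters followed by
"Foo", in the order the octets were stored.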

We currently use (1) and have had no complaints from users so far. But what do
other implementations do?

Marc

- -- 
Marc Mutz <mutz(_at_)kde(_dot_)org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE8zFSg3oWD+L2/6DgRAkYlAJ0Te5TcochdA51PQ1lEj72Dx5vWZQCgsRr0
3r+qjy47LRJ4zy1w7o+yeTI=
=0gor
-----END PGP SIGNATURE-----

