perl-unicode

Re: How to convert base64 string to utf-8

2004-02-05 10:30:07

At 4:21 pm +0200 5/2/04, Alexander N. Treyner wrote:

Hi John,
Your code works perfectly.
But I found one strange thing.
For example, I have the following string:

        hello שלום hello world!!!!

that is converted by the mail client to

        hello =?windows-1255?Q?=F9=EC=E5=ED_hello_world!!!!?=

After converting it to UTF-8 with the code you wrote, the "_" is still present between the second "hello" and "world".
Is that the correct behaviour?

I'm not familiar with the way these headers are composed and was simply providing the code to do the x to utf-8 conversion. If the underscore character has to be transliterated to a space then that's very simple. I don't know how a literal underscore is written in that case -- I presume they must use =5F. I don't know which RFC deals with these headers, otherwise I could tell you better.

It's probably dealt with in <http://www.ietf.org/rfc/rfc2047.txt> ....

Ah yes:

4.2. The "Q" encoding

   The "Q" encoding is similar to the "Quoted-Printable" content-
   transfer-encoding defined in RFC 2045.  It is designed to allow text
   containing mostly ASCII characters to be decipherable on an ASCII
   terminal without decoding.

   (1) Any 8-bit value may be represented by a "=" followed by two
       hexadecimal digits.  For example, if the character set in use
       were ISO-8859-1, the "=" character would thus be encoded as
       "=3D", and a SPACE by "=20".  (Upper case should be used for
       hexadecimal digits "A" through "F".)

   (2) The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
       represented as "_" (underscore, ASCII 95.).  (This character may
       not pass through some internetwork mail gateways, but its use
       will greatly enhance readability of "Q" encoded data with mail
       readers that do not support this encoding.)  Note that the "_"
       always represents hexadecimal 20, even if the SPACE character
       occupies a different code position in the character set in use.

   (3) 8-bit values which correspond to printable ASCII characters other
       than "=", "?", and "_" (underscore), MAY be represented as those
       characters.  (But see section 5 for restrictions.)  In
       particular, SPACE and TAB MUST NOT be represented as themselves
       within encoded words.
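
So the "_" should indeed become a space, and a literal underscore would have to arrive as =5F. Here is a minimal sketch of rules (1) and (2) in Perl, assuming the core Encode module knows the charset named in the header. The subroutine names q_to_bytes and decode_q_words are made up for this illustration and are not the code from the earlier message; "B" (base64) encoded-words are deliberately left alone.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Turn the payload of one "Q" encoded-word into raw bytes in the
# original character set (RFC 2047, section 4.2).
sub q_to_bytes {
    my ($payload) = @_;
    # Rule (2): "_" always stands for byte 0x20. This must happen before
    # the =XX step, otherwise a literal underscore sent as =5F would be
    # turned into a space as well.
    $payload =~ tr/_/ /;
    # Rule (1): "=" followed by two hex digits is one byte.
    $payload =~ s/=([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $payload;
}

# Replace every =?charset?Q?payload?= in a header with decoded characters.
# decode() will die if Encode does not recognise the charset name.
sub decode_q_words {
    my ($header) = @_;
    $header =~ s{=\?([^?\s]+)\?[Qq]\?([^?\s]*)\?=}{
        my ($charset, $payload) = ($1, $2);   # copy before any other regex runs
        decode($charset, q_to_bytes($payload));
    }ge;
    return $header;
}

my $raw = 'hello =?windows-1255?Q?=F9=EC=E5=ED_hello_world!!!!?=';
print encode('UTF-8', decode_q_words($raw)), "\n";
```

With the header from the question, the underscores inside the encoded-word come out as spaces while the four =XX bytes are decoded as windows-1255. Encode also ships a "MIME-Header" encoding, so decode('MIME-Header', $raw) may be a simpler route, though I have not checked how it behaves on this particular header.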



JD