At 4:21 pm +0200 5/2/04, Alexander N. Treyner wrote:
Hi John,
Your code works perfectly.
But I found one strange thing.
For example, I have the following string:
hello שלום hello world!!!!
which the mail client converts to
hello =?windows-1255?Q?=F9=EC=E5=ED_hello_world!!!!?=
After the code you wrote converts it to
UTF-8, the "_" is still present between the second
"hello" and "world".
Is that the correct behavior?
I'm not familiar with the way these headers are
composed and was simply providing the code to do
the x to utf-8 conversion. If the underline
character has to be transliterated to a space
then that's very simple. In that case I don't know how a
literal underline is written; I presume they must
use =5F. I don't know which RFC deals with these
headers, otherwise I could tell you better.
It's probably dealt with in <http://www.ietf.org/rfc/rfc2047.txt> ....
Ah yes:
4.2. The "Q" encoding

   The "Q" encoding is similar to the "Quoted-Printable" content-
   transfer-encoding defined in RFC 2045. It is designed to allow text
   containing mostly ASCII characters to be decipherable on an ASCII
   terminal without decoding.

   (1) Any 8-bit value may be represented by a "=" followed by two
       hexadecimal digits. For example, if the character set in use
       were ISO-8859-1, the "=" character would thus be encoded as
       "=3D", and a SPACE by "=20". (Upper case should be used for
       hexadecimal digits "A" through "F".)

   (2) The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
       represented as "_" (underscore, ASCII 95.). (This character may
       not pass through some internetwork mail gateways, but its use
       will greatly enhance readability of "Q" encoded data with mail
       readers that do not support this encoding.) Note that the "_"
       always represents hexadecimal 20, even if the SPACE character
       occupies a different code position in the character set in use.

   (3) 8-bit values which correspond to printable ASCII characters other
       than "=", "?", and "_" (underscore), MAY be represented as those
       characters. (But see section 5 for restrictions.) In
       particular, SPACE and TAB MUST NOT be represented as themselves
       within encoded words.
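
So the "_" does have to become a space, and a literal underscore arrives as =5F. Here is a rough sketch of the three rules above in Python, not the code from earlier in this thread. The names q_decode and decode_q_words are made up for illustration. It only handles the "Q" encoding (not "B"), and it ignores the RFC's rule about dropping whitespace between adjacent encoded words.

```python
import re
import string

# One RFC 2047 encoded word in "Q" encoding: =?charset?Q?encoded-text?=
# The encoded text is limited to printable ASCII other than "?" and SPACE.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?[Qq]\?([!->@-~]*)\?=")


def q_decode(text):
    """Turn the encoded-text part of a "Q" encoded word into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        hex_pair = text[i + 1:i + 3]
        if ch == "=" and len(hex_pair) == 2 and all(
            c in string.hexdigits for c in hex_pair
        ):
            # Rule (1): "=" followed by two hex digits is one 8-bit value.
            out.append(int(hex_pair, 16))
            i += 3
        elif ch == "_":
            # Rule (2): "_" always means hexadecimal 20, whatever the charset.
            out.append(0x20)
            i += 1
        else:
            # Rule (3): any other printable ASCII character stands for itself.
            out.append(ord(ch))
            i += 1
    return bytes(out)


def decode_q_words(header):
    """Replace every "Q" encoded word in a header value with decoded text."""
    def replace(match):
        charset, encoded = match.group(1), match.group(2)
        # Assumes the charset label is one Python's codecs module knows.
        return q_decode(encoded).decode(charset)

    return ENCODED_WORD.sub(replace, header)


if __name__ == "__main__":
    sample = "hello =?windows-1255?Q?=F9=EC=E5=ED_hello_world!!!!?="
    print(decode_q_words(sample))
```

The point is that the "_" to space step happens at the byte level, before the charset conversion, so it works the same way for windows-1255 as for any other character set. Python's standard library also provides email.header.decode_header, which can serve as a cross-check.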
JD