
Re: utf-8 vs Base64 in LZJU90 and FS

1998-10-08 06:34:59
On Thu, 08 Oct 1998 06:04:59 EDT, "Al Costanzo" said:
> I have pondered the request to not use utf-8 and use base64.

Um.. Is this one request, or two?

utf-8 is a *charset*, just like iso8859-1 or us-ascii.

base64 is a content-transfer-encoding, used to keep brain-dead MTAs
from screwing up your utf-8, or iso8859-1, or other non-7-bit-clean
charset/binary data.
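
To make the two layers concrete, here is a minimal Python sketch (my own
illustration, not taken from LZJU90 or any particular MTA).  The charset
maps characters to octets; the content-transfer-encoding then maps those
octets to 7-bit-safe text.  The sample string and variable names are
made up for the example.

```python
import base64

# Hypothetical sample text containing two non-ASCII characters.
text = "na\u00efve caf\u00e9"

# Layer 1: the charset turns characters into octets.
utf8_octets = text.encode("utf-8")          # 2 octets per accented letter
latin1_octets = text.encode("iso-8859-1")   # 1 octet per accented letter

# Layer 2: the CTE turns octets into 7-bit-safe ASCII for transport.
utf8_b64 = base64.b64encode(utf8_octets).decode("ascii")
latin1_b64 = base64.b64encode(latin1_octets).decode("ascii")

# The receiver undoes the layers in reverse order: CTE first, then charset.
roundtrip = base64.b64decode(utf8_b64).decode("utf-8")
```

The same base64 step works unchanged on either set of octets, which is
the point: picking a charset and picking a CTE are separate decisions.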

> LZJU90 uses UTF-8 because brain dead mailers will only understand
> the 7 bit ascii subset of UTF-8 where mailers that understand utf-8
> will make better use of the encoder.

Can you please re-explain this?  There appears to be a confusion regarding
the difference between a charset and a CTE.  A compression algorithm
should be totally immune to charset issues (otherwise, how would you
compress a binary object?).
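
As a rough illustration of that last point, the sketch below uses
Python's zlib purely as a stand-in (it is not LZJU90, and the helper name
is hypothetical).  The compressor only ever sees octets, so text in some
charset and an arbitrary binary object are handled identically.

```python
import os
import zlib

def roundtrip(octets: bytes) -> bytes:
    """Compress then decompress a byte string; no charset is consulted."""
    return zlib.decompress(zlib.compress(octets))

# Text becomes octets *before* compression; the compressor never sees "characters".
text_octets = "plain us-ascii text".encode("us-ascii")

# An arbitrary binary object that is not text in any charset.
binary_octets = os.urandom(64)

text_ok = roundtrip(text_octets) == text_octets
binary_ok = roundtrip(binary_octets) == binary_octets
```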

> Could someone expand on the reason utf-8 should be not used?

For what it's worth, your mail showed up with a base64 encoding wrapped
around what was *flagged* as charset=utf-8.  However, RFC2045, section
4.1.2, states:

   In general, composition software should always use the "lowest common
   denominator" character set possible.  For example, if a body contains
   only US-ASCII characters, it SHOULD be marked as being in the US-
   ASCII character set, not ISO-8859-1, which, like all the ISO-8859

My MUA generated a warning message that utf-8 was an unknown charset,
but tried its hardest to display it as 8859-1 (which just *happened*
to succeed).  Some MUAs throw up their hands entirely.  That's why you
shouldn't flag it as utf-8 unless it is needed.  I didn't see any
non-us-ascii characters in the mail.. and after downgrading to us-ascii,
the base64 CTE wasn't needed either.
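
A minimal sketch of that "lowest common denominator" labelling, assuming
a composer that only considers three candidate charsets.  The function
name and the candidate list are hypothetical, and the sketch ignores the
other requirements for 7bit (line length limits, no NUL octets).  A real
composer might also prefer quoted-printable over base64 for mostly-ASCII
text.

```python
def pick_charset_and_cte(body: str) -> tuple:
    """Return (charset, cte) using the narrowest charset that can hold body."""
    # Candidates ordered from narrowest to widest repertoire.
    for charset in ("us-ascii", "iso-8859-1", "utf-8"):
        try:
            body.encode(charset)
        except UnicodeEncodeError:
            continue  # body has characters outside this charset; try a wider one
        # Pure US-ASCII needs no protective encoding; anything else is
        # 8-bit data and gets base64 in this sketch.
        cte = "7bit" if charset == "us-ascii" else "base64"
        return charset, cte
    raise ValueError("body cannot be encoded in any candidate charset")
```

For a body like the one I was replying to, this would come back as
("us-ascii", "7bit"), with neither the utf-8 label nor the base64
wrapper.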

/Valdis (who now gets to go figure out why his MUA was able to
display the base64/utf-8 correctly, but Lost Big Time on the reply...)

