ietf-822

Re: quoted-printable

1992-02-16 03:07:08
Ned Freed writes:
> Actually, it is very difficult to find an example of a content type
> that works differently. Let's see -- handy examples include PostScript,
> SGML, TeX, LaTeX, and of course Richtext.

OK, thanks, I can see what you're getting at.


> But as I understand it, Kermit is not trying to generate
> documents; it is trying to provide a full-featured conversion
> facility. A conversion facility is in no position to mandate what
> sequences are used.

Well, let's not talk about Kermit, let's talk about email. If we use
2022 in email, I think we should restrict ourselves to as small a
subset as we can. I mean, if people cannot implement the 2022 subset
Compound Text correctly, how can we expect people to implement full
2022 correctly?

Also, "be conservative in what you send"...

Even MIME says:

                 NOTE: Beyond US-ASCII, an enormous proliferation
                 of character sets is possible. It is the opinion
                 of the IETF working group that a large number of
                 character sets is NOT a good thing.

(Though the latter is of course slightly different from the issue of
multiple types of escape sequences.)
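
The restricted-subset point can be made concrete. A sketch using Python's
built-in iso2022_jp codec (the codec and the sample text are my additions,
not from the original discussion) shows the small set of escape sequences
that such a subset actually needs -- one to switch into JIS X 0208 and one
to switch back to ASCII:

```python
# Sketch: the escape sequences a restricted ISO-2022-style encoder emits.
# Assumes Python's built-in "iso2022_jp" codec; illustrative only.
text = "日本"  # two kanji
encoded = text.encode("iso2022_jp")

# ESC $ B switches to JIS X 0208; ESC ( B switches back to ASCII.
assert encoded.startswith(b"\x1b$B")
assert encoded.endswith(b"\x1b(B")
```

A conservative sender emits only this handful of designations; full 2022
permits many more, which is exactly what makes it hard to implement
correctly.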


> But there are other
> character sets on the horizon (I already have to cope with two for
> Japanese, and this before the arrival of 10646), and the problem of
> how to convert from one to the other is not that far away. RFC-CHAR is
> attempting to address the need to support existing practice while
> allowing for conversion to/from future practice.

I'm still not convinced that conversions based on 10646 will be that
useful. For example, it is not clear whether East Asian users will
accept the "Han unification" done in the current draft of 10646. So if
one converts from a 2022-like encoding to 10646, and then tries to
render that, the user may not see what he/she wants to see. I'm not
stating this as a fact. This is just a concern of mine, based on
numerous comments from Japanese users saying that Unicode and 10646
are destroying their culture, and so on.
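
To make the conversion concern concrete: converting between two encodings
by pivoting through a universal character set looks roughly like this (a
sketch using Python codecs; the specific encoding names are my assumptions,
not from the original discussion). The round trip succeeds mechanically --
the worry above is whether the unified characters *render* as the user
expects afterward:

```python
# Sketch: converting between two Japanese encodings by pivoting
# through a universal character set (here, Python's internal Unicode).
# Mechanically lossless for these characters; whether the result
# displays acceptably is the Han-unification question.
jis_bytes = "漢字".encode("iso2022_jp")  # a 2022-style encoding
pivot = jis_bytes.decode("iso2022_jp")   # pivot through the universal set
euc_bytes = pivot.encode("euc_jp")       # re-encode for another system

assert euc_bytes.decode("euc_jp") == "漢字"  # round trip preserved
```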


> But having two mnemonic formats is an entirely different kettle of
> fish. We don't need two, we need one that has the input of the entire
> community going into its design.

I feel the same way, but, unfortunately, we already have more than one
set of mnemonics. There are at least three sets known to "the character
encoding experts": those of Keld Simonsen, Alain LaBonté, and Johan
van Wingen.

And the Vietnamese-using community has yet another method, which uses
up to three characters to represent letters with two accents. I have
been trying to convince them that it would be a good idea to unify
these approaches, but I haven't made much progress.  They've been
using their method for a couple of years already.
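
For illustration only -- the post does not name the Vietnamese method, but
it is presumably something along the lines of the VIQR-style convention,
where a letter carries a quality mark and a tone mark, so up to three ASCII
characters stand for one letter. A minimal decoder sketch for two such
sequences (the table entries are my examples):

```python
# Illustrative sketch of a three-ASCII-characters-per-letter scheme:
# base letter + quality mark (^ for circumflex) + tone mark (' acute,
# ` grave). The mapping below is a tiny hypothetical sample, not the
# community's actual table.
SEQUENCES = {
    "e^'": "\u1ebf",  # ế : e + circumflex + acute  (three characters)
    "o^`": "\u1ed3",  # ồ : o + circumflex + grave  (three characters)
}

def decode(seq: str) -> str:
    """Map one mnemonic sequence to one precomposed letter."""
    return SEQUENCES[seq]

assert decode("e^'") == "\u1ebf"
```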

I'm beginning to think that perhaps an all-encompassing truly
multilingual encoding won't be used that much. Perhaps it would be a
good idea to simply document and name the formats currently used by
the various communities of the world. E.g. one for iso-2022-jp, one
for the Vietnamese method, one for Latin-1, and so on. And then wait
to see which multilingual encodings catch on.
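
Documenting and naming each community's format is essentially what MIME's
charset parameter makes possible: each named format becomes a label a mailer
can declare and a reader can dispatch on. An illustrative header fragment
(my example, not from the original discussion):

```
Content-Type: text/plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
```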


Regards,
Erik

