If we encode it in quoted-printable, it becomes gobbledygook in the
absence of decoding software. This is the primary rationale (there
are others) for the existence of mnemonic, and I think it's a darned
good one... :-)
Seriously, I doubt that quoted-printable will catch on for such things
as German text. Maybe for Makefiles, but not for German text.
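
To make the readability point concrete, here is a minimal sketch using
Python's standard quopri module. The sample sentence and the choice of
Latin-1 are assumptions made for this illustration; they do not come
from the discussion itself.

```python
import quopri

# Hypothetical sample text; any German sentence with umlauts would do.
text = "Grüße aus München"

# Assume the message is in Latin-1 (ISO 8859-1), one byte per character.
raw = text.encode("latin-1")

# Quoted-printable replaces bytes above 0x7F (and a few others, such as
# "=") with "=XX", where XX is the byte value in hexadecimal.
encoded = quopri.encodestring(raw)
print(encoded)  # b'Gr=FC=DFe aus M=FCnchen'

# Decoding software restores the original bytes exactly.
decoded = quopri.decodestring(encoded).decode("latin-1")
print(decoded == text)  # True
```

Without a decoder, the reader sees the "=FC" and "=DF" sequences in
place of the accented letters, which is the readability cost at issue.
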
This is all fine and dandy if you know what character set is being
used. You definitely should know this if you're talking about
text/plain messages. But there are lots of other sorts of things we
send via mail. Many of them don't break down cleanly into a single
character set. Others involve multiple character sets and cannot be
characterized by a single external character set.
Could you provide a few examples?
But mnemonic is only
appropriate when a message has a 1:1 mapping from its bytes into a
single character set definition.
I think Keld's system can support messages that mix e.g. Latin-1 and
Latin-2 quite well. (What do you mean by "a single character set
definition"? Something like 10646?)
You don't have to go very far past
plain text before this condition no longer holds.
I will admit that I had plain text in mind. Perhaps mnemonic is not
very useful for things other than plain text.
(I see escape sequences that involve the use of 8-bit
characters quite frequently, so the impact of this on encoding
methodologies is obvious.)
The escape sequences that I have in mind for a multilingual encoding
are based on ISO 2022, and do not use 8-bit characters.
If you want to experiment with a system
that tries to exclude escape sequences from conversion you might want
to look at Kermit.
As far as I know, Kermit uses many of the escape sequences specified
in ISO 2022. However, I think that it uses too many of 2022's
features. It is possible to do the same thing with far fewer different
types of escape sequences.
But mnemonic
in no way obviates the need for quoted-printable, which is the
encoding of choice for text objects that cannot be conveniently
categorized as being in a given character set.
Sorry, I didn't mean to say that quoted-printable is unneeded. I
simply meant to say that I don't think that quoted-printable will
catch on for certain types of messages, such as German plain text.
I'm not sure what the size of Keld's proposal has to do with anything.
What I meant by "humungous set" is that Keld's proposal includes
mnemonics for languages that are currently encoded very differently.
For example, RFC-CHAR also contains specifications for Japanese. Most
of the Japanese text in messages on the Internet and beyond is
encoded in a subset of ISO 2022, which is far more compact than Keld's
mnemonics for Japanese (i.e. 2 bytes vs 8 bytes per character).
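
The two-byte figure for the ISO 2022 subset can be checked with a short
sketch. It assumes that Python's iso2022_jp codec is a fair stand-in for
the subset referred to here, and the sample word is an arbitrary choice;
neither comes from the message itself. Keld's mnemonics are not
reproduced, since their exact form is defined in RFC-CHAR.

```python
# Hypothetical sample: three kanji (the word for "Japanese language").
text = "日本語"

encoded = text.encode("iso2022_jp")

# ESC $ B switches to JIS X 0208; ESC ( B switches back to ASCII.
to_kanji = b"\x1b$B"
to_ascii = b"\x1b(B"
payload = encoded[len(to_kanji):-len(to_ascii)]

print(len(payload) // len(text))       # 2 bytes per character
print(all(b < 0x80 for b in encoded))  # True: only 7-bit bytes are used
```

The escape sequences add a fixed overhead per run of Japanese text, so
the two-bytes-per-character figure applies to the characters themselves.
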
Apart from the obvious size problem, hardly any Japanese users have
software that understands Keld's mnemonics. Most of the software
understands the 2022 subset. So Japanese encoded in Keld's mnemonics
would be extremely unreadable.
It is quite clear that the Japanese will not use Keld's mnemonics for
their usual email. So the question is: What would Keld's Japanese
mnemonics be used for? For use in other countries? Wouldn't this be a
rather minor usage, in terms of volume in characters per day? Also,
wouldn't it be less confusing if Japanese were encoded in one way (i.e.
the Japanese way) instead of two ways?
If you have additional problems with RFC-CHAR I'd like to hear what
they are. But issues of scope are not a valid area of concern for the
Working Group, in my opinion.
You say (later) that you are reluctant to pursue two mnemonic
approaches at once. In much the same way, I am reluctant to pursue two
approaches for encoding Japanese at once. Since there is already an
established encoding for Japanese, the Japanese mnemonics should be
removed from RFC-CHAR.
If it is true that issues of scope are not a valid area of concern for
the Working Group, I would like to hear the Chair himself say so.
Stay tuned for the next version of the multilingual encoding draft,
which will take into account some of the realities that we have bumped
into lately.
I won't comment on this apart from saying that I'm reluctant to pursue
two mnemonic approaches at once.
I'm not necessarily advocating two different mnemonic approaches. We
may well end up including some of Keld's work in the new document,
either by a reference or by explicit inclusion if that is felt to be
desirable.
If you could work out your
differences with what Keld has proposed and come up with a unified
result I think we'd all be a lot happier. (I have found that Keld is
more than willing to listen to suggestions on how to modify RFC-CHAR
to make it a better specification.)
Well, I'm sorry to say that I have not found Keld at all willing to
make changes that I propose.
I also feel that this group has basically given Keld the go-ahead to
continue the development of RFC-CHAR, with the stated goal that it
will become a standard.
As far as I can tell, this group has not made any such decision. You
yourself were complaining about the lack of comment on RFC-CHAR a
little while ago. Silence does not mean agreement.
I quite frankly
don't like what I see happening here -- I see a possibility that
RFC-CHAR will be abandoned, and I think this is a huge mistake.
I also don't want RFC-CHAR to be abandoned. I think that it might be
possible to reach consensus on the Latin-1 part quite quickly.
Erik