ietf-822

Re: Updated MIME "fix" list

1993-02-08 06:38:24
  (i) Unless you have a copy of ISO IS 10646 (not DIS-1.2 plus or minus
SC2 notes) in hand, the assertion that RFC1345-MNEM can represent all of
the characters in it would seem to me to be a little strong.

The way it can represent *all* the IS 10646 characters is via
the _?Uxxxx_ notation. This way it should be possible to represent
them all. 31-bit values are also supported.
   I have to admit that I forgot about this notation.   But using it
really doesn't "represent" mnemonic encoding--it is an escape for
unrecognized characters.  If a message is drawn primarily from a row or
plane that is not understood by the encoder (and the decoder is either
minimal or doesn't understand the row or plane either), it is
pathological--no more readable than base64 and requiring a lot more
space.
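
A minimal sketch of the trade-off described above, not the RFC 1345 algorithm itself. The one-entry table, the function name `mnem_encode`, the handling of the intro character, and the use of UTF-16-BE as a stand-in for a two-octet 10646 form are all assumptions made for illustration; the table entry and the `&` intro are simply the ones quoted in this thread.

```python
import base64

# Hypothetical mnemonic table: only the single entry quoted in this thread.
# It has not been checked against the RFC 1345 tables.
MNEMONICS = {0x00F8: "O/"}
INTRO = "&"  # intro character assumed in the quoted example


def mnem_encode(text):
    """Encode text with mnemonics where known, positional escapes otherwise."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80 and ch != INTRO:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif cp in MNEMONICS:
            # Known character: intro followed by its mnemonic.
            out.append(INTRO + MNEMONICS[cp])
        else:
            # Unknown character (and, as an assumption of this sketch, the
            # intro character itself): fall back to the positional escape.
            out.append("%s_?u%04x_" % (INTRO, cp))
    return "".join(out)


# Text drawn entirely from a row the table does not cover.
sample = "\u4e00" * 4
escaped = mnem_encode(sample)
packed = base64.b64encode(sample.encode("utf-16-be")).decode("ascii")
print(len(escaped), len(packed))
```

For this four-character sample the escaped form is 36 characters against 12 for base64, and neither is readable without a table lookup.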

With respect to composed characters, RFC 1345 is clear: 00F8 is LATIN
CAPITAL O WITH STROKE; there is no special name for the combining STROKE.
(I do not have the draft standard at hand, so I just assume
that 0338 is one of the combining strokes.) Assuming intro=&

   00F8               &O/
   006F,0338          O&_?u0338_
...
   Assuming that the relevant materials from 10646.1 DIS-2 survive into
the Standard...
   -- I think the weasel-language of clauses 23.3 and 23.4, and the
lovely text about the "R-zone" (clause 10.2) really do suggest that all
three of these are the same character --&O/--and that O&_?u0338_ is as
much of a travesty as _?u004F__?u0338_ would be (see "no better than
base64" above).
   -- I would suggest that, even if SC2 was unwilling to specify a
canonical order for situations like this, if MNEM is really going to be
mnemonic to the degree possible, then it should specify one and, e.g., the
second and third sequences above should produce exactly the same MNEM sequence.
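
One way to get that property is a folding pass that runs before any mnemonic lookup. The sketch below is only an illustration: the `COMPOSE` table and the name `fold_combining` are hypothetical, and the single pairing in the table is the one quoted in this thread (006F,0338 treated as equivalent to 00F8), which assumes the reading of the DIS text argued above.

```python
# Hypothetical equivalence table, taken from the pairing quoted in this thread.
COMPOSE = {("\u006f", "\u0338"): "\u00f8"}


def fold_combining(text):
    """Replace known base+combining pairs with their single-entity form."""
    out = []
    i = 0
    while i < len(text):
        # Look at the current character together with the one after it.
        pair = (text[i], text[i + 1]) if i + 1 < len(text) else None
        if pair in COMPOSE:
            out.append(COMPOSE[pair])
            i += 2  # consumed both the base and the combining mark
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Both spellings fold to the same string, so a later mnemonic lookup
# would see identical input for each.
print(fold_combining("o\u0338") == fold_combining("\u00f8"))
```

With a pass like this in front of the encoder, the escape form would only appear for combining sequences that have no single-entity equivalent in the table.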

  [ Aside: 0000 0338 is "combining long solidus overlay" ]
Keld, I think this demonstrates two things, neither of them implying
that MNEM is fatally flawed:

(1) Like 10646 itself, we need to have more experience with the real
interactions between MNEM and the real (and unseen) 10646 before trying
to make it an integral part of any standard.  Those experiences are
likely to lead to some additional rules and conventions in MNEM (e.g.,
what to do about these combining sequences) and/or some profiling of the
use of 10646 and/or MNEM for email use.  In the near term, that
conclusion is consistent with, and reinforces, Dave Crocker's conclusion
that MNEM can't be added to MIME at this time without delaying MIME
fairly seriously.

(2) Your responding to my "representing" question with _?uNNNN_ suggests
another way to think about MNEM in a MIME context.  Again, I don't think
this is ready--I'm prepared to argue that *anything* based on, or that
has to reference, 10646 is not ready until the final text of the
standard is readily available to anyone who needs it--but...
   We've got pretty general agreement, I think, that quoted-printable is
useful for single-octet characters and pretty marginal for longer ones. 
While it can be used in other contexts, we were able to invent it and
treat it as an encoding, because there is pretty universal knowledge
(not heuristic or intuitive understanding) about what the column 2-7
characters "mean" and, for text that is "mostly" in those columns, the
algorithmic escapes for columns 10-15 are compact and readable once one
gets used to it.
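
As a small illustration of that point, here is quoted-printable applied to text that sits mostly in columns 2-7, using Python's standard `quopri` module. The sample string and its ISO 8859-1 encoding are assumptions chosen for the example.

```python
import quopri

# Mostly column 2-7 text with two characters from columns 10-15
# (0xEF and 0xE9 in ISO 8859-1).
raw = "na\u00efve caf\u00e9".encode("iso-8859-1")

encoded = quopri.encodestring(raw)
print(encoded)

# The encoding is reversible, and only the two high octets were escaped.
assert quopri.decodestring(encoded) == raw
```

The column 2-7 octets pass through untouched and each high octet becomes a three-character `=XX` escape, which is why the result stays legible when such octets are rare.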
   It seems to me that, with some careful rethinking and profiling, one
could begin to make a similar case for MNEM in MIME use (I'm repeating
that distinction here because I suspect there may be other applications
that might call for different rules).  It is still "character set",
rather than "transfer encoding" because -- good intuitive guesses and
the ability to learn it quickly aside -- one has to reference external
tables and learn new things to be sure what, e.g., 10646-characters are
involved.  As "character set", you should try to avoid inheriting some
10646 stupidity, e.g., having no formal preference between
single-entity and combining sequence representation of the same
"character".   But, like quoted-printable, there is going to be a "pretty
readable" base set -- columns 2-7 of 8859 for Q-P, some selected rows in
the BMP for 10646 -- and other material that will call for more-or-less
lousy escapes that utilize character position and not "name" or "shape"
or "meaning".  That also suggests that, like quoted-printable, we will
need to develop intuitions and/or guidelines about contexts in which
MNEM is appropriate (and when not) that are somewhat more focused than
the kind of "use it anywhere" language you have periodically suggested.
   Seems to me that this is worth pursuing a bit further, possibly once
we have some experience with how we and others profile 10646 in "native"
10646 applications.
   But, again, in the short run, this implies "not ready for formal
incorporation in a standard".  Would it be reasonable to suggest that
you create a separate distribution list to develop things further and
compare experiences in much the way that ISO-2022-JP was done?

    --john
