Re: Updated MIME "fix" list

John C Klensin writes:

The MNEM charset can convey all of the
ISO 10646 characters without any information loss, and it can
be displayed on equipment with very modest capabilities: only
ASCII is required.


Keld,
  I feel a little like we are going around in a circle here.

  (i) Unless you have a copy of ISO IS 10646 (not DIS-1.2 plus or minus
SC2 notes) in hand, the assertion that RFC1345-MNEM can represent all of
the characters in it would seem to me to be a little strong.


The way it can represent *all* the IS 10646 characters is via
the _?Uxxxx_ notation. This way it should be possible to represent
them all. The 31-bit form is also supported.


  (ii) My examination of RFC1345 does not make the MNEM representation
of 10646 composed characters clear.  To use a particularly handy
example, it is not clear to me whether the following 10646 sequences
produce the same RFC1345/MNEM representations and, if they do not, what
representations the second and third would produce (codes are hex
representations within the BMP; if one prefers the 32-bit form, each of
those given should be preceded by "0000 "):
   00F8
   006F,0338
   0338,006F


10646 is pretty clear about what it means by characters.
With respect to composed characters RFC 1345 is clear: 00F8 is LATIN
SMALL LETTER O WITH STROKE, and there is no special name for the
combining STROKE (I do not have the draft standard at hand, so I just
assume that 0338 is one of the combining strokes). Assuming intro=&:

    00F8               &o/
    006F,0338          o&_?u0338_
    0338,006F          &_?u0338_o
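
The mapping above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: the intro character is "&", the
table is a hypothetical one-entry excerpt ("o/" for 00F8, which is how I
recall RFC 1345 listing SMALL O WITH STROKE), and anything without a
mnemonic falls back to the _?uxxxx_ form used in this message. The name
to_mnem is made up for the example, and escaping of a literal "&" in the
input is not handled.

```python
INTRO = "&"  # intro character, as assumed in the message above

# Hypothetical one-entry excerpt; a real table would be built from RFC 1345.
MNEMONICS = {0x00F8: "o/"}


def to_mnem(code_points):
    """Encode a sequence of 10646 code points as an ASCII-only string."""
    out = []
    for cp in code_points:
        if cp in MNEMONICS:
            # Character with a known mnemonic: intro + mnemonic.
            out.append(INTRO + MNEMONICS[cp])
        elif cp < 0x80:
            # Plain ASCII passes through unchanged (no intro escaping here).
            out.append(chr(cp))
        else:
            # No mnemonic known: fall back to the hex notation.
            out.append(f"{INTRO}_?u{cp:04x}_")
    return "".join(out)


if __name__ == "__main__":
    for seq in ([0x00F8], [0x006F, 0x0338], [0x0338, 0x006F]):
        print(",".join(f"{cp:04X}" for cp in seq), "->", to_mnem(seq))
```

The point of the sketch is that the three sequences stay distinct: the
encoder works code point by code point and does no composition or
reordering.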

If, by support for "all of the ISO 10646 characters" you are assuming
implementation level 1 only, then we need to establish a profile for the
use of 10646 and tie MNEM to it, possibly by revision of RFC1345.  Even
if your intent is to accept the combining characters, you are not
supporting "all of the ISO 10646 characters" but only all of the BMP
characters, i.e., 10646.1.


RFC1345 covers up to level 3 on the character level, and the 31-bit
canonical form is covered.

Now, I don't personally have real serious problems with a rule that
says, basically, "implementations should recognize the form
'charset=mnem' and treat it exactly as they would treat
'charset=us-ascii'".


This would only be a minimum treatment, like the minimum treatment
of the ISO 8859 parts. Implementations should be allowed to display
MNEM in an enhanced character set if they like.

But that would, I think, do a disservice to "real" implementations of
mnemonic, in which one would hope that an implementation receiving it
and having good rendering capability would automatically translate to
10646 or equivalent codings and then make high-quality glyphs for each
"character".  But I think that, as soon as you even *permit* such
improved renderings, you need to have enough of a 10646 profile to make
the needed translating tables, etc., possible.


Yes, if you want to use more than just plain ascii here, you need tables.
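
To make the two levels of treatment concrete, here is a rough Python
sketch. It assumes intro "&", only two-character mnemonics plus the
_?uxxxx_ form, and a hypothetical one-entry table; render_minimal and
render_enhanced are names invented for this example, not anything from
RFC 1345 or MIME.

```python
import re

# Hypothetical excerpt of a mnemonic -> character table.
TABLE = {"o/": "\u00f8"}

# "&_?uXXXX_" (hex form) or "&" followed by a two-character mnemonic.
_TOKEN = re.compile(r"&(?:_\?u([0-9a-fA-F]{4,8})_|(..))")


def render_minimal(text):
    """Minimum treatment: show the MNEM text exactly as ASCII."""
    return text


def render_enhanced(text):
    """Enhanced treatment: translate the tokens the table knows about."""
    def repl(match):
        if match.group(1) is not None:
            value = int(match.group(1), 16)
            # Leave values Python cannot represent as a character untouched.
            return chr(value) if value <= 0x10FFFF else match.group(0)
        # Unknown mnemonics are left as they were received.
        return TABLE.get(match.group(2), match.group(0))
    return _TOKEN.sub(repl, text)


if __name__ == "__main__":
    sample = "sm&o/rrebr&o/d"
    print(render_minimal(sample))
    print(render_enhanced(sample))
```

A receiver with no tables at all still gets readable ASCII from
render_minimal; one with tables can do better, and anything it does not
recognize simply stays in its ASCII form.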

  If the statement above (which is close, if not identical, to what you
have asked for with "the support requested is that a MIME compliant UA
should recognize the MNEM charset and just display the characters as
ASCII") were included in RFC1341bis, then an implementation that
displayed the actual glyphs for the characters being symbolized by a
multi-glyph string would be non-conforming.  That isn't what you want,
is it?


It should read "the minimum support requested... " 

Keld