Re: recommended encoding for RFC-XXXX

  To prevent any uncertainty about this intent among potential
readers of RFC-XXXX, and to avoid needless linguistic bias against the
non-alphabetic languages, we should avoid formally "recommending" ANY
other format in the RFC text.  Specifically, I think that "mnemonic"
should be clearly permitted but should not be specified as the
officially "recommended" format for either headers or body in the
RFC.

        
% One problem with the current 10646 (UNICODE merged) is that it is
% inherently 8-bit which then needs an encoding. This can be base64,
% quoted-printable or mnemonic.
        
% Eunet and nordunet has specified that they want mnemonic as the
% recommended encoding in the new RFC standard.
        
  Base64 encoding of IS-10646 (whenever it happens) would be vastly
superior in its multilingual capabilities and would completely remove
all of the concerns about Asian language support that remain strong
with regard to any of the alternatives to an IS-10646 encoding.

  While I understand that EUnet and NORDUnet's concerns are
legitimately focused on their users, who are European and use European
languages, the IETF *MUST* be more global in outlook.  Adoption of any
encoding other than IS-10646 in BASE64 as "recommended" clearly
represents a bias against the users of CJK ideogrammatic languages.
Such a bias would be needlessly and shamelessly Euro-centric.  I am
in no way suggesting that mnemonic should be marked as "not
recommended."  In my mind there should be 3 different classes of
items, namely:

  1) the single recommended way of sending enhanced mail
     (e.g. IS-10646 in BASE64)
  2) alternative methods which work well in particular areas,
     but are less general solutions (e.g. Mnemonic) and are
     not marked either as "recommended" or as "not recommended".
  3) methods which are not recommended and whose implementation
     might or might not be specified as optional, but which are 
     specified because they couldn't be omitted from the draft for
     one reason or another (until a clearer, more detailed spec
     appears this would include JIS-ISO-2022)

  One other thing to recall is that the current RFC-ZZZZ does not
limit itself to just having 7-bit and 8-bit transport, but adds the
architectural hooks needed so that 16-bit and 32-bit transport are
easily added once IS-10646 becomes a reality.  Encoding and
decoding between native IS-10646 and BASE64 IS-10646 is much faster
and easier to implement correctly than the 32-bit wide lookup table
that the proposed Mnemonic encoding would require.  For example, mail
gateways handling wide<-->7-bit conversion could convert between
BASE64 and native IS-10646 much more easily and at significantly
lower cost.  There are a number of similar examples.
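
  The encode/decode step described above can be sketched in a few
lines.  This is only an illustration, assuming the IS-10646 text is
carried as big-endian 32-bit code units (Python's "utf-32-be" codec
stands in for that here).  The function names are made up for the
example and do not come from any RFC or draft.

```python
import base64


def ucs4_to_base64(text: str) -> str:
    """Encode text as 32-bit big-endian code units, then as BASE64.

    Assumption: "utf-32-be" is used as a stand-in for a native 32-bit
    IS-10646 representation (4 octets per character, no byte-order mark).
    """
    raw = text.encode("utf-32-be")
    return base64.b64encode(raw).decode("ascii")


def base64_to_ucs4(encoded: str) -> str:
    """Reverse the transformation: BASE64 -> 32-bit code units -> text."""
    raw = base64.b64decode(encoded)
    return raw.decode("utf-32-be")


# Round trip through a 7-bit-safe form, as a gateway might do.
sample = "Grüße 漢字"
wire = ucs4_to_base64(sample)
assert wire.isascii()                  # safe for 7-bit transport
assert base64_to_ucs4(wire) == sample  # lossless round trip
```

  Note that no per-character table appears anywhere in the sketch: the
conversion code stays the same size however many characters the
repertoire contains.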

  Keld's document on Asian languages wrt Mnemonic, which he mailed to
me, indicates that the Chinese language alone will have on the order
of 15K entries in Mnemonic.  (I responded to him with concerns and
questions.  For example, it turns out that only characters used in the
Beijing dialect of Chinese are going to be in Mnemonic, so characters
commonly appearing in Hong Kong daily newspapers simply won't be
available at all.)  The lookup tables are no longer reasonably sized
once one includes Chinese, the other Asian ideogrammatic languages,
and all of the other languages' characters defined in 2DIS-10646.  The
costs of preferring Mnemonic are not limited to the CJK users but will
instead affect everyone who has IS-10646 capable equipment (as X
Windows, MS Windows, and other platforms will be), though the impact
is disproportionately harsh on CJK users.

  I am (and have been) on the record that implementation of Mnemonic
is probably worthwhile for the alphabetic languages and am not
lobbying for its removal.  I remain unconvinced that it is anywhere
near as useful for ideogrammatic languages, and others have indicated
the same concerns on this list.  I'm trying very hard to be reasonable
and firmly on culture- and language-neutral territory here.

  There are clear technical reasons (one is cited above) why a BASE64
representation of IS-10646 will be preferable to a MNEMONIC
representation of the same text once the IS is a done deal.  Note
that until 10646 stabilises, Mnemonic cannot stabilise, since Keld
has said that Mnemonic will track the DIS's contents and so must
be unstable to precisely the same degree as the DIS.

  Ran
  atkinson(_at_)itd(_dot_)nrl(_dot_)navy(_dot_)mil