Re: Comments on and FWD: I-D ACTION:draft-klensin-emailaddr-i18n-02.txt


Hello John,

I have read your draft. Congratulations, I think it is
extremely well written and well argued.

I have the following comments:

- Editorial: occasionally, please make shorter paragraphs.
  It makes things a bit easier to read.

- Name of the extension: Something a little bit more specific
  than just "I18N" would probably be better.

- 3.4.1: Quite a bit of text is spent on the idea to upgrade
  Web browsers to automagically detect and convert special
  (e.g. ACE-like) encoding forms in the context of Webmail.
  This is a total non-starter, and you should just say so.
  What I have not seen discussed, but what is much more feasible
  (although it's still much better not to have to do it), is
  to upgrade the webmail software on the server side to
  convert things from ACE or whatever to actual characters
  (or numeric character references if the charset used in
  the Web page has limitations) and back.
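
  The server-side conversion suggested above could look roughly like
  the following. This is only a sketch: it assumes a webmail server
  written in Python, assumes the ACE in question is IDNA/punycode in
  the domain part, and uses hypothetical function names and a
  made-up example domain. Python's built-in "idna" codec only
  implements the 2003 IDNA rules.

```python
def ace_domain_to_unicode(domain: str) -> str:
    """Decode an ACE (xn--...) domain into actual characters."""
    return domain.encode("ascii").decode("idna")


def unicode_domain_to_ace(domain: str) -> str:
    """Convert user input back to ACE before it goes on the wire."""
    return domain.encode("idna").decode("ascii")


def for_page_charset(text: str, charset: str) -> str:
    """Prepare text for a Web page that is served in `charset`.

    Characters the charset cannot represent are replaced by
    numeric character references (&#NNN;).
    """
    return text.encode(charset, "xmlcharrefreplace").decode(charset)


if __name__ == "__main__":
    readable = ace_domain_to_unicode("xn--mnchen-3ya.de")
    print(for_page_charset(readable, "iso-8859-1"))  # charset can show it
    print(for_page_charset(readable, "us-ascii"))    # falls back to an NCR
    print(unicode_domain_to_ace(readable))           # and back to ACE
```

  The browser sees nothing but ordinary characters or NCRs, so no
  browser upgrade is involved.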

- Editorial: I don't think you have used the term ACE (ASCII
  compatible encoding), but it would be more precise in some cases
  (instead of writing something like 'IDN/punycode-like').

- Editorial: In point 2. of 4.1, it's unclear how many alternatives
  there are, or what goes logically together.

- Regarding point 3. in 4.1, and point 7.2, I think allowing
  anything else but UTF-8 should not only be strongly discouraged,
  it should be forbidden. There are some reasons for this:
  - Does the current LHS allow anything other than US-ASCII?
  - Not having a globally consistent way to interpret the
    bytes in there as characters creates all kinds of problems.
  - For security reasons (e.g. to avoid smuggling of syntax-
    relevant US-ASCII characters in overlong encoding), one
    may want to run various checks on UTF-8. This won't
    work with arbitrary byte sequences.
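
  To illustrate the kind of check meant here, a small sketch in
  Python (the function name is hypothetical). It relies on the fact
  that a strict UTF-8 decoder rejects overlong forms; the byte pair
  C1 80 used below is an overlong two-byte encoding of "@".

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True only if `raw` is well-formed UTF-8.

    Python's strict decoder refuses overlong encodings, so an
    "@" smuggled in as C1 80 is rejected rather than decoded.
    """
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_utf8("j\u00f6rg".encode("utf-8")))  # proper UTF-8
    print(is_valid_utf8(b"user\xc1\x80example"))       # overlong "@"
    print(is_valid_utf8(b"j\xf6rg"))                   # Latin-1 bytes
```

  If the LHS may be in any local encoding, a receiver has no such
  well-formedness test to apply, because it does not know what the
  bytes are supposed to be.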

- Editorial: 4.3, first paragraph, overstriking: This indeed has been
  used in the past. Is it still used? Where? I would tone this down
  a bit more, e.g. change "that there is a long history" ->
  "that historically, there has been some use".

- Editorial: 4.3 "it SHOULD first verify that the string is
  valid for a domain name according to IDNA rules": There should
  be a reference to the relevant section/operation/flags in IDNA.

- 6., point 4.: Yes, please bundle UTF-8-HEADERS and your proposal,
  and make 8BITMIME mandatory. This will strongly simplify
  implementation and deployment.

- 7.1: Allowing punycode in domain names with a UTF-8 LHS seems
  to do no great harm. Mail should still be delivered in such a case.
  But the question (more for Paul's draft than for yours) is
  whether MUAs should convert this to something readable.
  Probably saying MAY is the right thing here, to mark the
  use of punycode in this case as an occasional option, not
  something that should be used in general.

- 7.2: See above.

- 7.3: First a stupid question: What characters are excluded by
  prohibiting quoted strings or characters? If it's things
  such as spaces, quotes, and so on, excluding them is probably
  not a bad idea.
  Rather than using restrictions, it may be good to give some
  guidelines.
  Some guidelines or restrictions may be needed for right-to-
  left characters/bidirectional issues.
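
  As an illustration of what a bidirectional guideline might check,
  here is a rough Python sketch. It is not a rule taken from the
  draft or from IDNA; the function name and the choice to flag any
  mix of left-to-right and right-to-left letters are assumptions
  made only for the example.

```python
import unicodedata

# Bidi classes of strongly right-to-left characters
# (R: e.g. Hebrew letters, AL: Arabic letters).
RTL_CLASSES = {"R", "AL"}


def mixes_directions(local_part: str) -> bool:
    """Return True if the string has both strong LTR and strong RTL characters."""
    classes = {unicodedata.bidirectional(ch) for ch in local_part}
    return "L" in classes and bool(classes & RTL_CLASSES)


if __name__ == "__main__":
    print(mixes_directions("martin"))              # Latin only
    print(mixes_directions("\u05d0\u05d1\u05d2"))  # Hebrew only
    print(mixes_directions("abc\u05d0\u05d1"))     # mixed
```

  A guideline could simply advise against local parts for which
  such a test returns true, without forbidding them outright.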

- 7.4: See above; yes, please require 8BITMIME.

- 7.5: Yes, if this extension and 8BITMIME are in use,
  RFC 2047 encoding should be dropped.

- 8: Editorial: "accomodate local character sets": Does this mean sets
  in the strict mathematical sense (in which case I suggest
  using 'character repertoires'), or does it include encodings
  (then I would suggest 'character encodings' or maybe 'charset's)?
  "some character script other than ASCII": Latin, Cyrillic,...
  are scripts. ASCII isn't a script.

- 8: Ideas such as SEPARATOR="...": In my opinion, don't.
  Scripts around the world have imported/accepted punctuation from
  other scripts (in particular Latin) much better than letters.


Regards,   Martin.


At 12:41 04/01/30 -0500, John C Klensin wrote:

Hi. This draft is an update on the SMTP approach to mailbox
internationalization.


Three things:

        * I am copying the SMTP list on this because everything
        is getting connected to everything else, as my on-list
        discussion with Nathaniel about the "trace fields in the
        envelope" proposal illustrates.  Please, to preserve
        everyone's sanity, do not start parallel discussions or
        cross-post -- this one is clearly an IMAA issue, at
        least for historical reasons.

        * There is one more proposal I-D coming in this series.
        As a one-line summary, it outlines ("specifies" would be
        too strong at this stage, but that is the intent) the
        encapsulation model that this draft suggests for
        downgrading.

        * Very high-level summary of changes from the -01
        version: I have become convinced that, if we are going
        to have an internationalized structure for email --not
        merely a collection of kludges and workarounds-- we are
        going to need to make some rather basic changes.  They
        include SMTP extensions for i18n addresses and alternate
        addresses and UTF-8 headers as well as UTF-8 headers, in
        some form, themselves.  I've picked up on Paul's UTF-8
        header proposal because it seems sensible and nothing
        else is on the table, but the proposal announced below
        eliminates the need for alternate address headers and
        all of the mucking around in the message payload by MTAs
        they imply.   And I'm convinced that we should have
        _one_ header to cover this rather than a collection.
        I.e., the model is that one is internationalized or not,
        rather than one in which internationalization is done
        piecemeal and incrementally, with all of the profiling,
        multiplicative cases, and long-term cruft that would
        imply.

Please do not react to what you think is in this proposal without
reading it or, especially, based only on what you think it is about
from reading the above. That is a waste of your time and that of
everyone on the list, as previous rounds of such discussions amply
demonstrate.


best,
   john