Hello John,
I have read your draft. Congratulations, I think it is
extremely well written and well argued.
I have the following comments:
- Editorial: Please break some of the longer paragraphs into shorter
ones. That makes the text a bit easier to read.
- Name of the extension: Something a little bit more specific
than just "I18N" would probably be better.
- 3.4.1: Quite a bit of text is spent on the idea of upgrading
Web browsers to automagically detect and convert special
(e.g. ACE-like) encoding forms in the context of Webmail.
This is a total non-starter, and you should just say so.
What I have not seen discussed, but what is much more feasible
(although it's still much better not to have to do it), is
to upgrade the webmail software on the server side to
convert things from ACE or whatever to actual characters
(or numeric character references if the charset used in
the Web page has limitations) and back.
- Editorial: I don't think you have used the term ACE (ASCII
compatible encoding), but it would be more precise in some cases
(instead of writing something like 'IDN/punycode-like').
- Editorial: In point 2. of 4.1, it's unclear how many alternatives
there are, or what goes logically together.
- Regarding point 3. in 4.1, and point 7.2, I think allowing
anything other than UTF-8 should not only be strongly discouraged,
it should be forbidden. There are several reasons for this:
  - Does the current LHS allow anything other than US-ASCII?
  - Not having a globally consistent way to interpret the
    bytes in there as characters creates all kinds of problems.
  - For security reasons (e.g. to avoid smuggling of syntax-relevant
    US-ASCII characters in overlong encodings), one
    may want to run various checks on UTF-8. This won't
    work with arbitrary byte sequences.
- Editorial: 4.3, first paragraph, overstriking: This indeed has been
used in the past. Is it still used? Where? I would tone this down
a bit more, e.g. change "that there is a long history" ->
"that historically, there has been some use".
- Editorial: 4.3 "it SHOULD first verify that the string is
valid for a domain name according to IDNA rules": There should
be a reference to the relevant section/operation/flags in IDNA.
- 6., point 4.: Yes, please bundle UTF-8-HEADERS and your proposal,
and make 8BITMIME mandatory. This will greatly simplify
implementation and deployment.
- 7.1: Allowing punycode in domain names with a UTF-8 LHS seems
to do no great harm. Mail should still be delivered in such a case.
But the question (more for Paul's draft than for yours) is
whether MUAs should convert this to something readable.
Probably saying MAY is the right thing here, to mark the
use of punycode in this case as an occasional option, not
something that should be used in general.
- 7.2: See above.
- 7.3: First a stupid question: What characters are excluded by
prohibiting quoted strings or characters? If they are things
such as spaces, quotes, and so on, excluding them is probably
not a bad idea.
Rather than using restrictions, it may be good to give some
guidelines.
Some guidelines or restrictions may be needed for right-to-left
characters/bidirectional issues.
- 7.4: See above; yes, please require 8BITMIME.
- 7.5: Yes, if this extension and 8BITMIME are in use,
RFC 2047 encoding should be dropped.
- 8: Editorial: "accomodate local character sets": Does this mean sets
in the strict mathematical sense (in which case I suggest
using 'character repertoires'), or does it include encodings
(in which case I would suggest 'character encodings' or maybe 'charsets')?
"some character script other than ASCII": Latin, Cyrillic,...
are scripts. ASCII isn't a script.
- 8: Ideas such as SEPARATOR="...": In my opinion, don't.
Scripts around the world have imported/accepted punctuation from
other scripts (in particular Latin) much more readily than letters.
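To illustrate the security point under 4.1/7.2: a strict UTF-8 decoder already rejects overlong forms, so a check like the one below is only possible if the LHS is known to be UTF-8. This is a minimal Python sketch, assuming the local part arrives as raw bytes; the function names are hypothetical and not taken from the draft.

```python
# Sketch only: check that an envelope local part (LHS) is strict UTF-8
# before any syntax checks run on it. The function names are hypothetical.

def decode_local_part(raw: bytes) -> str:
    """Decode raw bytes as strict UTF-8 or raise ValueError.

    Python's UTF-8 codec rejects overlong forms, encoded surrogates and
    code points above U+10FFFF, so a syntax-relevant ASCII character such
    as "@" cannot be smuggled in as an overlong sequence.
    """
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"local part is not valid UTF-8: {exc}") from None


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8."""
    try:
        decode_local_part(raw)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Correctly encoded "müller": accepted.
    print(is_valid_utf8("müller".encode("utf-8")))  # True
    # 0xC1 0x80 is an overlong two-byte form of "@" (U+0040): rejected.
    print(is_valid_utf8(b"user\xc1\x80example"))    # False
    # Latin-1 "müller" (0xFC for "ü"): not UTF-8, so there is no
    # consistent way to interpret it, and it is rejected.
    print(is_valid_utf8(b"m\xfcller"))              # False
```

With arbitrary charsets allowed in the LHS, the same bytes (e.g. 0xFC) could be valid in one encoding and invalid in another, so no such uniform check is possible.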
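The server-side alternative mentioned under 3.4.1 could look roughly like the sketch below: the webmail software converts ACE domain labels to characters for display, falls back to numeric character references when the page's charset cannot represent them, and converts back to ACE when needed. It assumes IDNA 2003 ACE as implemented by Python's built-in "idna" codec, covers domain names only (not local parts), and the helper names are hypothetical.

```python
# Sketch: server-side webmail conversion of ACE domain names for display.
# Assumptions: domain names use IDNA 2003 ACE ("xn--" labels), handled by
# Python's built-in "idna" codec; the helper names are hypothetical.

def ace_to_unicode(domain: str) -> str:
    """Convert an ACE domain name (e.g. "xn--bcher-kva.example") to Unicode."""
    return domain.encode("ascii").decode("idna")


def unicode_to_ace(domain: str) -> str:
    """Convert a Unicode domain name back to ACE, e.g. for a reply."""
    return domain.encode("idna").decode("ascii")


def display_domain(domain: str, page_charset: str) -> str:
    """Render an ACE domain for a Web page served in page_charset.

    Characters the charset cannot represent become numeric character
    references (e.g. "&#252;"), so nothing is lost on limited charsets.
    """
    text = ace_to_unicode(domain)
    return text.encode(page_charset, errors="xmlcharrefreplace").decode(page_charset)


if __name__ == "__main__":
    ace = "xn--bcher-kva.example"
    print(display_domain(ace, "utf-8"))      # bücher.example
    print(display_domain(ace, "us-ascii"))   # b&#252;cher.example
    print(unicode_to_ace("bücher.example"))  # xn--bcher-kva.example
```

Doing this on the server keeps the browser unchanged; the conversion logic lives in one place that can be upgraded when the encoding rules change.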
Regards, Martin.
At 12:41 04/01/30 -0500, John C Klensin wrote:
Hi. This draft is an update on the SMTP approach to mailbox
internationalization.
Three things:
* I am copying the SMTP list on this because everything
is getting connected to everything else, as my on-list
discussion with Nathaniel about the "trace fields in the
envelope" proposal illustrates. Please, to preserve
everyone's sanity, do not start parallel discussions or
cross-post -- this one is clearly an IMAA issue, at
least for historical reasons.
* There is one more proposal I-D coming in this series.
As a one-line summary, it outlines ("specifies" would be
too strong at this stage, but that is the intent) the
encapsulation model that this draft suggests for
downgrading.
* Very high-level summary of changes from the -01
version: I have become convinced that, if we are going
to have an internationalized structure for email --not
merely a collection of kludges and workarounds-- we are
going to need to make some rather basic changes. They
include SMTP extensions for i18n addresses, for alternate
addresses, and for UTF-8 headers, as well as the UTF-8
headers themselves, in some form. I've picked up on Paul's UTF-8
header proposal because it seems sensible and nothing
else is on the table, but the proposal announced below
eliminates the need for alternate address headers and
all of the mucking around in the message payload by MTAs
that they imply. And I'm convinced that we should have
_one_ header to cover this rather than a collection.
I.e., the model is that one is internationalized or not,
rather than one in which internationalization is done
piecemeal and incrementally, with all of the profiling,
multiplicative cases, and long-term cruft that would
imply.
Please do not react to what you think is in this proposal without reading
it, and especially not based only on what you think it is about from reading
the above. That is a waste of your time and that of everyone on the
list, as previous rounds of such discussions amply demonstrate.
best,
john