On Jul 17, 2006, at 6:21 AM, Aaron Stone wrote:
On Sat, 2006-07-15 at 15:16 +0100, Alexey Melnikov wrote:
Kjetil Torgrim Homme wrote:
2). Non ASCII text in rejection string - should it cause creation of
DSN/MDN, runtime error or stripping of non-ASCII content?
Should we add a tagged argument to control this?
Or maybe we need another capability to enable UTF-8 clean rejection?
That capability needs to be added to SMTP, right?
Yes, that would be an SMTP extension for allowing of UTF-8 human
readable response text.
I'd rather like to see such a UTF-8 extension, instead of the
workarounds being proposed to shoehorn messages into US-ASCII.
Of course it would be nice to be able to issue other sorts of text
in SMTP responses. But such proposals have been going nowhere for
years. However, the IETF EAI (Email Address Internationalization)
Working Group work may finally kick this into happening. (After
all, currently SMTP responses often include a domain name such as on
the banner line, or an e-mail address such as in a response to a
RCPT TO: or EXPN: command. So internationalizing email addresses
leads right into at least some internationalization of SMTP sessions.)
Note that although the EAI focus is on
internationalization of email addresses, their charter is a more
general examination of internationalization of the email environment,
as internationalized email addresses rapidly bring up other issues.
Quoting from Section 1.2, Problem Statement, of
Overview and Framework for Internationalized Email
http://www.ietf.org/internet-drafts/draft-ietf-eai-framework-01.txt
Internationalization of email addresses is not merely a matter of
changing the SMTP envelope, or of modifying the From, To, and Cc
headers, or of permitting upgraded mail user agents (MUAs) to decode
a special coding and display local characters. To be perceived as
usable by end users, the addresses must be internationalized, and
handled consistently, in all of the contexts in which they occur.
That requirement has far-reaching implications: collections of
patches and workarounds are not adequate. Even if they were
adequate, that approach risks an assortment of implementations with
different sets of patches and workarounds having been applied with
consequent user confusion about what is actually be run and
supported. Instead, we need to build a fully internationalized email
environment, focusing on permitting efficient communication among
those who share a language or other community (see [I18Nemail-
constraints] for an extended discussion of this optimization).
And take a look at the mentioned I18Nemail-constraints:
Internationalization in Internet Applications: Issues, Tradeoffs, and
Email Addresses
http://www.ietf.org/internet-drafts/draft-klensin-ima-constraints-00.txt
especially
3.3. Communication across languages and cultures
All of this implies that those who communicate across language and
cultural groups will be required to learn, if they do not understand
already, to be quite self-aware about the use of internationalized
identifiers, as well as other examples of characters or languages,
across those boundaries. There will be a lower level of demands on
those who communicate only in a single language and within a single
culture. This is, of course, not an issue that originated with the
introduction of the Internet: it has been this way since languages
and scripts started to differentiate from each other and since
different cultures came into contact. As we internationalize the
network, a user of a given language that cannot be fully expressed in
ASCII will always be faced with a choice between insisting on the
purism of an email address local part and domain name in the script
associated with the local language and maximizing the number of
people who can communicate with her conveniently. In some cases, the
right answer will be "local language", in others, it will be "ASCII",
and in still others it will be "maintain two addresses". We are not
required, and should not try, to make that choice for users: the
users should make the best choices for their own needs, preferably
after understanding the consequences of the choices. As a community,
we will need to be very clever about user interfaces. As an example
much more general than email, if someone with no ability to read
Chinese characters sees a domain name written in those characters and
decides she wants to copy and paste it somewhere, the copy mechanism
is probably going to need to provide for both "copy the Chinese" and
"convert quietly to punycode and copy that". Either choice, by
itself, will be wrong sometimes. Users who both want to use Chinese-
script domain names and communicate outside that language or script
or culture are going to either learn to understand the difference and
relationship, or develop some good rituals that work, or the network
will keep slapping them in the head with failed lookups or bounced
mail until they do learn. Of course, substantially any language or
script could be substituted for "Chinese" in that example.
Substitute in the words "error response text" where the above discussion
talks about email addresses, and I think it still applies pretty well.
But in any case, take a look at the EAI planned timeline, which is at
this point merely for _experimental_ RFCs (to start testing out ideas
and implementations), and then keep in mind that even supposing an RFC
comes out with an SMTP extension for non-US-ASCII text in SMTP dialogue
(in particular, non-US-ASCII text in SMTP responses), there will be a
long (long!) time before one can forget about the old software that
doesn't support such. So as far as SMTP responses are concerned, either
sticking to US-ASCII, or being able to downgrade from non-US-ASCII to
US-ASCII when dealing with pre-extension SMTP software is going to be
necessary for a long, long time.
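
To make the downgrade point concrete, here is a rough sketch (in Python,
purely for illustration) of how a server might pick the response text.
No such SMTP extension exists at this point, so the peer_accepts_utf8
flag and the function name are hypothetical; the only assumption is that
the server somehow knows whether the peer negotiated support.

```python
def choose_reply_text(preferred: str, ascii_fallback: str,
                      peer_accepts_utf8: bool) -> str:
    """Pick the human-readable text for an SMTP response.

    peer_accepts_utf8 is a hypothetical flag: it stands for "the peer
    negotiated some future extension allowing non-US-ASCII reply text".
    """
    # The fallback is what pre-extension software will see, so it must
    # itself be US-ASCII.
    if not ascii_fallback.isascii():
        raise ValueError("fallback text must be US-ASCII")
    # US-ASCII text is always safe; other text only with the extension.
    if preferred.isascii() or peer_accepts_utf8:
        return preferred
    return ascii_fallback
```

The point of the sketch is only that a US-ASCII form of the text has to
exist somewhere, whether written by the user or produced by a downgrade
step, for as long as pre-extension software is around.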
So as far as what we can do _now_ for rejection text: sticking with
US-ASCII rejection text has an efficiency advantage (allows SMTP
protocol level rejection) for those who are willing to accept the
restriction of US-ASCII text, but some users and user communities will
prefer (demand!) to use their "own language", as they can with the
"old" (original) reject behavior, even though that means the cost of
generating a notification message (DSN or MDN) instead of being able to
do SMTP protocol level rejection. And I would consider that completely
appropriate for them to do in non-spam rejections, though I might
hope/encourage them to stick with US-ASCII and hence make SMTP protocol
level rejection possible for believed-to-be-spam message rejections.
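
As a minimal sketch of that tradeoff (Python, for illustration only; the
names Mechanism and rejection_mechanism are made up here and are not
from any draft), an implementation could choose the rejection mechanism
from the text the user supplied:

```python
from enum import Enum


class Mechanism(Enum):
    SMTP_REPLY = "smtp-reply"      # refuse during the SMTP dialogue
    NOTIFICATION = "notification"  # accept, then generate a DSN or MDN


def rejection_mechanism(reason: str) -> Mechanism:
    """Choose how to reject based on the user's rejection text.

    US-ASCII text can be carried in an SMTP response; anything else
    falls back to the more costly notification message.
    """
    if reason.isascii():
        return Mechanism.SMTP_REPLY
    return Mechanism.NOTIFICATION
```

With this, rejection_mechanism("Mailbox closed") selects the SMTP-level
rejection, while a reason written in, say, Cyrillic or Chinese selects
the notification message. The user still decides which text to write,
and so which cost to pay.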
I would take a quote from above regarding email addresses (local
language vs. US-ASCII vs. providing both forms of email address) and
apply it to rejection text:
We are not
required, and should not try, to make that choice for users: the
users should make the best choices for their own needs, preferably
after understanding the consequences of the choices.
That is, I believe that we need to allow users -- when they wish -- to
choose which behavior they want. And I do not believe that it is
possible to avoid at least some user education/training, at least in
the form of a "smart" user interface, especially for users who normally
operate in languages not written using US-ASCII. While Western European
language users may be able to remain happily oblivious of the difference
between SMTP protocol rejection and rejection in the form of a
notification message, oblivious of any (desired) difference in rejecting
spam vs. other rejections, users of languages that use/need a different
charset _will_ need to have at least some awareness (or the interface
that generates Sieve filters on their behalf will need some awareness)
that it is much preferable to reject believed-to-be-spam messages with
the "stick to US-ASCII-only" option, whereas more "personal" rejections
for other reasons can, if the user wishes, be rejected using more
personalized text in their own language.
Regards,
Kristin
Aaron