ietf-822

Re: i18n headers straw poll

2003-02-25 09:13:56

[ietf-822 added to copies due to the IDN and msg-id issues]
[the charset of the body content of this message may be
 incorrectly labeled]

J.B. Moreno wrote:
On 2/24/03 4:48 PM, Bruce Lilly at <blilly(_at_)erols(_dot_)com> wrote:

 IMO this is a killer for Punycode anywhere except the
Newsgroups header -- you post a message that looks like junk to everyone
(which is the current case for punycode) and it'll get flamed to death (well,
by those that actually see it; punycode is likely to be caught by a lot of
spam filters).



Are you claiming that IDNs cannot or will not work? Or for that matter
local-parts if/when they are standardized?


No, domain names have no pre-existing history that will be a problem.

If you mean non-conforming use as in the case of Usenet newsgroup
names, they do have a pre-existing history; the following header
excerpt constitutes an existence proof of that (and does the same
for local-part as well):

Received: from [61.72.6.29] (helo=³ë´ÂÄÄÇ»ÅÍ)
        by mx02.mrf.mail.rcn.net with esmtp (Exim 3.35 #4)
        id 18MVgZ-0000lw-00
        for blilly(_at_)erols(_dot_)com; Thu, 12 Dec 2002 10:55:39 -0500
Received: from mail pickup service by ³ë´ÂÄÄÇ»ÅÍ with Microsoft SMTPSVC;
         Fri, 14 Dec 2001 00:58:54 +0900
From: "=?euc-kr?B?yKu6uLTjtOfA2g==?=" <sales(_at_)kitech21(_dot_)com>
To: <blilly(_at_)erols(_dot_)com>
Subject: =?euc-kr?B?W8irurhdILzux8649CDHz7OqILi4teW8xbytILW3ILi5wMwgufa8vA==?=
        =?euc-kr?B?v+R+fn5+fn4=?=
Date: Fri, 14 Dec 2001 00:58:53 +0900
MIME-Version: 1.0
Content-Type: multipart/alternative;
        boundary="----=_NextPart_000_45A0_01C1843A.8043A150"
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Message-ID: <³ë´ÂÄÄÇ»ÅÍCcSkpLiSV00001169(_at_)³ë´ÂÄÄÇ»ÅÍ>

Note the msg-id. Also the Received fields.  I have no idea
what charset or language is supposed to be in use, since there's
no indication (I could guess that it might be euc-kr as in
the properly-tagged From and Subject fields, but that would
be no more than a guess, and might be wrong[*]).  Nevertheless,
there is clearly some non-ASCII stuff in both the domain
(to the right of the '@' in the msg-id; it is identical
to what was (illegally) supplied in the ESMTP EHLO command,
as documented in the top Received field, and to what was
claimed as a domain name in the lower Received field) and
the local-part (to the left of the '@' in the msg-id). Yes,
such use is non-conforming (and there are plenty of instances
of non-conformance in this excerpt), but nevertheless such
use is hereby documented.

Q.E.D.
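
For anyone who wants to check a msg-id like the one above mechanically,
here is a minimal sketch. It assumes the raw octets are available (written
out below as the Latin-1 values of the characters displayed in the excerpt,
which is an assumption about what was on the wire) and that the archive's
"(_at_)" stands for a literal '@'. The name non_ascii_parts is purely
illustrative.

```python
def non_ascii_parts(msg_id: bytes) -> dict:
    """Report whether each side of the '@' in a msg-id holds octets above 0x7F."""
    inner = msg_id.strip().lstrip(b"<").rstrip(b">")
    local, sep, domain = inner.rpartition(b"@")
    if not sep:
        raise ValueError("no '@' in msg-id")

    def has_high(part: bytes) -> bool:
        # Any octet with the high bit set is outside US-ASCII.
        return any(octet > 0x7F for octet in part)

    return {"local-part": has_high(local), "domain": has_high(domain)}


# Assumed raw octets of the alleged host name (Latin-1 values of the
# characters as displayed in the excerpt).
HOST = b"\xb3\xeb\xb4\xc2\xc4\xc4\xc7\xbb\xc5\xcd"
MSG_ID = b"<" + HOST + b"CcSkpLiSV00001169@" + HOST + b">"

print(non_ascii_parts(MSG_ID))
```

Both sides of the '@' are flagged, which is all the existence proof needs.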

Anyway, as far as non-ASCII newsgroup names are concerned,
Charles has repeatedly stated that there are *no* such
groups, so either there is "no pre-existing history that
will be a problem" for newsgroup names or Charles is wrong.
Either way punycode seems to be a viable approach to
newsgroup name i18n -- the existence of one or more
non-conforming newsgroup names or (spam) mail messages
isn't an obstacle.  And I haven't seen it proposed in
any other places but IDN, local-part, and newsgroup names
(though if it works for IDN, it'll probably end up in
Path and Xref, and I wouldn't be surprised if there were
consideration for Distribution as well).
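
To make concrete what a punycoded newsgroup name might look like, here is
a rough sketch. It assumes, hypothetically, that each dot-separated
component would be encoded on its own and given the "xn--" prefix used for
IDN labels; no such scheme has been specified for newsgroup names, and the
sketch skips the nameprep normalization that IDN requires. The function
names are made up for illustration.

```python
def to_ace(component: str) -> str:
    """Return an ASCII form of one component (hypothetical scheme)."""
    try:
        component.encode("ascii")
        return component  # already ASCII, leave untouched
    except UnicodeEncodeError:
        # Python's built-in "punycode" codec implements RFC 3492.
        return "xn--" + component.encode("punycode").decode("ascii")


def encode_newsgroup(name: str) -> str:
    """Encode each dot-separated component independently."""
    return ".".join(to_ace(part) for part in name.split("."))


print(encode_newsgroup("de.b\u00fccher"))
```

The encoded component is opaque to a human reader, which is the "looks
like junk" complaint, but it is pure ASCII and so passes through existing
software unchanged.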

Notes:
* As RFC 1557 (the reference for EUC-KR) notes, ESC $ ) C
  must appear before any SO characters, and as there's no
  such sequence, the charset used for the alleged domain
  name is certainly not valid EUC-KR. Moreover, EUC-KR
  uses two octets per Korean character, and the alleged
  local-part has an odd number of octets.  Whatever it
  is, in the words of Shakespeare, "it was Greek to me"
  (actually it's not Greek, since it clearly isn't
  ISO-8859-7 either; it is equally not Korean and not Greek).
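
The two checks in the note above can be sketched as follows. The octets
are again assumed to be the Latin-1 values of the characters shown in the
excerpt, and the helper names are illustrative only. The sketch merely
looks for the "ESC $ ) C" designator and counts octets; it makes no
attempt to decide what charset was actually intended.

```python
# Assumed raw octets (Latin-1 values of the characters as displayed).
HOST = b"\xb3\xeb\xb4\xc2\xc4\xc4\xc7\xbb\xc5\xcd"
LOCAL_PART = HOST + b"CcSkpLiSV00001169"

DESIGNATOR = b"\x1b$)C"  # ESC $ ) C, the sequence described in RFC 1557


def has_designator(data: bytes) -> bool:
    """True if the RFC 1557 designator sequence appears anywhere in data."""
    return DESIGNATOR in data


def octet_counts(data: bytes) -> tuple:
    """Return (total octets, octets with the high bit set)."""
    return len(data), sum(1 for octet in data if octet > 0x7F)


for label, data in (("domain", HOST), ("local-part", LOCAL_PART)):
    total, high = octet_counts(data)
    print(label, "designator:", has_designator(data),
          "octets:", total, "high-bit:", high, "odd:", total % 2 == 1)
```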

