ietf-mxcomp

Protocol - Support for internationalization?

2004-07-21 03:38:02

Referencing:  draft-ietf-marid-spf-3-protocol-00.txt

Section 5.2 exp: Explanation

The explanation string is intended to be displayed to legitimate senders in the
form of a short message or URL, via the SMTP receiver.

That is, it is intended to be read by end users, who are probably non-technical.

I can't see any support for international characters in the draft.

RFC1035 section 3.3.14 (which defines DNS TXT records) appears to delegate the
semantics of the text to the using domain/context, and seems to have nothing to
say about the encoding of a <character-string>.

If international characters are to be supported, there would have to be:

(1) Either a specification of the mandatory character set to be used (e.g.
Unicode) and the encoding to be used (e.g. UTF-8), or some means of indicating
that character set/encoding on a message-specific basis, and

(2) The details of how code points (characters) in this encoding are to be
represented in the DNS TXT record (such as %-escaped UTF-8).

The protocol macro language (section 7.1) states that uppercased macros are
URL-encoded and references RFC2396.

This implies that the URL %HH method of inserting non-ASCII and URL-illegal
octet values may (MUST?) be used, but the draft, like RFC2396, says nothing
about points (1) and (2) above.

There has been a lot of W3C work recently on Internationalised Resource
Identifiers (IRI):
 http://www.w3.org/International/iri-edit/draft-duerst-iri-09.txt

Using IRI syntax, if Mike Dürst wanted to put his name into the DNS TXT
explanation (the ü has an umlaut), he would encode it in UTF-8 and apply URL
%-escaping, giving
"Mike D%C3%BCrst".
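
The escaping above can be sketched in a few lines of Python. This is only an
illustration of UTF-8 plus %HH escaping, not anything the draft specifies; the
function names encode_explanation and decode_explanation are hypothetical, and
leaving the space character unescaped is an assumption made to match the
example string.

```python
from urllib.parse import quote, unquote


def encode_explanation(text: str) -> str:
    """Percent-escape the UTF-8 octets of every non-ASCII or reserved character.

    Spaces are left as-is (an assumption, to match the example in this post).
    """
    return quote(text, safe=" ", encoding="utf-8")


def decode_explanation(escaped: str) -> str:
    """Reverse the escaping, assuming the escaped octets are UTF-8."""
    return unquote(escaped, encoding="utf-8", errors="replace")


# "\u00fc" is u-umlaut; its UTF-8 encoding is the two octets C3 BC.
name = "Mike D\u00fcrst"
escaped = encode_explanation(name)
print(escaped)
print(decode_explanation(escaped) == name)
```

The escaped form is pure US-ASCII, so it can be stored in a TXT record
<character-string> without any assumption about how DNS treats octets above 127.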

Explanations are intended to be used in 'bounces'. The explanations are
supplied by the domain of the message originator, but the bounce message is
generated by the MTA of the receiving domain, which has no other indication of
the encoding used. The receiving MTA therefore MUST assume that all
explanations are UTF-8 encoded.

    Note:  US-ASCII is a subset of UTF-8,
    so all 'plain English' messages would be correctly represented.

The bounce message generator will have to tag _all_ messages containing
explanations as using UTF-8 (by using the appropriate message / MIME-part
header).
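
As a rough sketch of what that tagging might look like on the receiving side,
the following uses Python's standard email package. The function name
build_bounce, the subject line, and the body wording are all hypothetical; the
only point being illustrated is that the explanation is decoded as UTF-8 and
the MIME part is labelled with charset utf-8.

```python
from email.message import EmailMessage
from urllib.parse import unquote


def build_bounce(escaped_explanation: str) -> EmailMessage:
    """Build a bounce whose text part is explicitly tagged as UTF-8.

    The explanation is assumed to be %-escaped UTF-8 taken from the
    originator's TXT record; undecodable octets are replaced, not fatal.
    """
    explanation = unquote(escaped_explanation, encoding="utf-8", errors="replace")

    msg = EmailMessage()
    msg["Subject"] = "Undelivered mail returned to sender"  # hypothetical wording
    # charset="utf-8" sets Content-Type: text/plain; charset="utf-8"
    msg.set_content(
        "Your message was rejected.\n\n" + explanation + "\n",
        charset="utf-8",
    )
    return msg


bounce = build_bounce("Mike D%C3%BCrst")
print(bounce["Content-Type"])
print(bounce.get_content_charset())
```

Because US-ASCII is a subset of UTF-8, the same tagging is also correct for
explanations that contain no escaped characters at all.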

Should the protocol use / reference IRIs and require bounces incorporating
originator-supplied explanations to assume the use of UTF-8?

Or is there / should there be some other way of supporting internationalized
messages?

Or is only US-ASCII to be supported?

Chris Haynes


