Many thanks for your comments. I'm sorry my reply is late.
On 2010/01/22 0:58, John C Klensin wrote:
--On Monday, November 30, 2009 07:38 -0800 The IESG
The IESG has received a request from an individual submitter
to consider the following document:
- 'The 'mailto' URI Scheme '
<draft-duerst-mailto-bis-07.txt> as a Proposed Standard
The IESG plans to make a decision in the next few weeks, and
solicits final comments on this action. Please send
I thought I had sent notes on this some weeks ago, but can find
no record of having done so,
Neither could I, sorry.
so apologize for the late
The mailto specification exposes several of the problems with
the interaction between the URI model and the syntactic and
semantic conventions of assorted protocols, especially
protocols that were specified and deployed long before there
was such a thing as a URI. In this case, that situation is
complicated by the observations that mailto URIs are very
widely deployed, making backward compatibility important, and
that the existing specification in RFC 2368.
Because of the lateness of this review, I'm ignoring issues
that I don't consider especially significant. I believe that
the following issues _are_ significant:
(1) Special characters, particularly "+", and percent-encoding
The specification talks about the need to encode various
special characters, particularly characters that have reserved
meanings in the URI specification such as "%" and "/". One of
the failings in prior mailto specifications was that the state
of "+" was left ambiguous wrt whether it needed to be encoded
or not. "+" is heavily used in subaddress techniques and,
partially because of the interactions noted in Section 5 of
this document, has caused a problematic interaction with the
use of the same character as an encoding for a blank space in
web forms. The problem is noted and discussed in more detail
in RFC 3696.
Despite the discussion in the third paragraph of Section 5,
the document leaves ambiguous whether the correct
representation of an email address like john+ietf@example.com
in a mailto URI is
Both of these are correct. There is no real ambiguity: all characters
not specially mentioned (including 'a'-'z', ...) simply stand for
themselves, and '+' is one of them. Any such character can also be
percent-encoded, although that is not usually done (see below).
and whether either of those, if interpreted in a web form
context, is expected to be treated as
both of which are valid addresses under RFC 5321 (see the
production for "qtextSMTP" there; the
"john\ ietf"@example.com form is not required).
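To make the web-form interaction concrete, here is a small Python sketch using the standard library's urllib.parse. It assumes a consumer decodes the address part of a mailto URI either with plain RFC 3986 percent-decoding (unquote) or, mistakenly, with HTML form decoding (unquote_plus); the addresses are illustrative only.

```python
from urllib.parse import unquote, unquote_plus

# Two spellings of the same address inside a mailto URI (illustrative).
literal_plus = "john+ietf@example.com"
escaped_plus = "john%2Bietf@example.com"

# RFC 3986 percent-decoding: '+' is an ordinary character, so both agree.
uri_literal = unquote(literal_plus)
uri_escaped = unquote(escaped_plus)

# Form decoding (application/x-www-form-urlencoded) maps a literal '+' to a
# space, silently turning the subaddress into a different local part.
form_literal = unquote_plus(literal_plus)
form_escaped = unquote_plus(escaped_plus)

print(uri_literal)   # john+ietf@example.com
print(uri_escaped)   # john+ietf@example.com
print(form_literal)  # john ietf@example.com
print(form_escaped)  # john+ietf@example.com
```

Only the %2B spelling survives both decoders unchanged, which is one argument for allowing '+' to be escaped.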
I have added a sentence about subaddresses, i.e. I have changed:
When producing 'mailto' URIs, all spaces SHOULD be encoded as %20.
to:
When producing 'mailto' URIs, all spaces SHOULD be encoded as %20,
and '+' characters MAY be encoded as %2B.
Please note that '+' characters are frequently used as part of
an email address to indicate a subaddress, as for example in
I hope this helps.
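As a rough illustration of the revised rule, the following Python sketch builds a mailto URI with a subject hfield. The helper name mailto_for is hypothetical, and always escaping '+' is just the conservative reading of the MAY.

```python
from urllib.parse import quote, urlencode

def mailto_for(address: str, subject: str) -> str:
    """Hypothetical helper: build a 'mailto' URI with a subject hfield.

    quote() leaves only unreserved characters (letters, digits, "_.-~")
    and the characters listed in `safe` unencoded, so a space becomes %20
    and '+' becomes %2B. Escaping '+' is optional; this sketch always does.
    """
    return ("mailto:" + quote(address, safe="@")
            + "?subject=" + quote(subject, safe=""))

uri = mailto_for("john+ietf@example.com", "IETF last call")
print(uri)  # mailto:john%2Bietf@example.com?subject=IETF%20last%20call

# For contrast: form-style encoding writes spaces as '+', the convention
# that collides with '+' in subaddresses.
form_style = urlencode({"subject": "IETF last call"})
print(form_style)  # subject=IETF+last+call
```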
It is also worth noting that, while
are considered equivalent in this specification,
No, not exactly.
are considered equivalent. Not just by this specification, but by RFC
3986 to start with.
are formally different and may have quite different semantics
(only the final delivery SMTP server knows).
I assume this still applies to the mail addresses
That's all well and good. Both
while to reach
you have to use
The draft clearly says that '%' in an email address has to be escaped,
and that's exactly what happens here. This may not be entirely easy, but
it is clearly defined, and it's not rocket science. There's also an
example, gorby%kremvax@example.com, in the example section.
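For the gorby%kremvax@example.com case, a minimal Python sketch of escaping exactly once on the way in and unescaping exactly once on the way out might look like this (standard library only; the single-address, no-hfield URI is an assumption of the sketch):

```python
from urllib.parse import quote, unquote

PREFIX = "mailto:"
address = "gorby%kremvax@example.com"

# Escape once when putting the address into the URI: '%' becomes '%25'.
uri = PREFIX + quote(address, safe="@")

# Unescape once when taking it out again.
recovered = unquote(uri[len(PREFIX):])

print(uri)        # mailto:gorby%25kremvax@example.com
print(recovered)  # gorby%kremvax@example.com
```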
That ambiguity is not just an encoding issue and difficulty for
those who use subaddresses. It creates a vector for potential
attacks that is not noted in Security Considerations
Could you expand on the 'potential for attacks'? I understand that a lot
can go wrong with escaping if one isn't careful, but "going wrong"
doesn't necessarily translate into "attacks".
section concentrates more on social problems, such as address
harvesting and information exposure, than on actual attacks on
the mail protocols and system). More generally, while the
document makes the observation
"Care has to be taken both when encoding as well as when
decoding to make sure these operations are applied only
(from the end of Section 2(1)) it does not discuss how that is
to be done,
I'm not sure any explanation is necessary. When you put a mail address
into a mailto URI, you escape; when you take it out, you unescape.
nor does it note the risks of not doing it in
Security Considerations. That is important because there is
some anecdotal evidence that the rule is widely violated,
especially in web applications that move information back and
forth between mailto URI and email address formats.
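The risk of violating the exactly-once rule can be shown with a short Python sketch. The mailbox below, whose local part literally contains the text "%2B", is hypothetical; the point is that an extra unescape (or an extra escape) in a second module silently yields a different address.

```python
from urllib.parse import quote, unquote

PREFIX = "mailto:"
# Hypothetical mailbox whose local part literally contains the text "%2B".
address = "a%2Bb@example.com"

uri = PREFIX + quote(address, safe="@")  # escaped exactly once
once = unquote(uri[len(PREFIX):])        # unescaped exactly once: correct
twice = unquote(once)                    # a second module unescapes again

# Escaping twice is the mirror-image bug: one decode no longer recovers it.
double_escaped = PREFIX + quote(quote(address, safe="@"), safe="@")
short_decoded = unquote(double_escaped[len(PREFIX):])

print(uri)            # mailto:a%252Bb@example.com
print(once)           # a%2Bb@example.com   (the intended mailbox)
print(twice)          # a+b@example.com     (a different mailbox)
print(short_decoded)  # a%252Bb@example.com (also wrong)
```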
(2) I18n issues
While the authors have done a careful and thoughtful job of
trying to anticipate the needs of the long-term (i.e.,
post-experimental) EAI work, there are possible ambiguities
that are not considered in addition to the "alternate address"
issue mentioned in Paragraph 3 of Section 1. Some of the
important ones of these are the non-ASCII equivalent of the
discussion above: Because RFC 5321 fundamentals that are not
changed (or proposed to be changed) by the EAI work imply that
represent three different target mailboxes
Yes, they represent three different mailboxes. But the mailto URIs/IRIs:
represent only two different mailboxes, namely
duerst@example.com for the first mailto URI, and dürst@example.com for
the second and third ones. The mailbox d%C3%BCrst@example.com would be
denoted by mailto:d%25C3%25BCrst@example.com.
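The mapping from identifiers to mailboxes can be checked with a Python sketch. The three identifiers below are assumed shapes (an ASCII address, a percent-encoded UTF-8 URI, and a raw IRI), not necessarily the exact ones from the original exchange.

```python
from urllib.parse import quote, unquote

PREFIX = "mailto:"

# Assumed identifiers of the kind discussed above.
identifiers = [
    "mailto:duerst@example.com",
    "mailto:d%C3%BCrst@example.com",  # URI: UTF-8 bytes percent-encoded
    "mailto:dürst@example.com",       # IRI: raw non-ASCII character
]

# Percent-decoding as UTF-8 (unquote's default) yields two mailboxes.
mailboxes = {unquote(i[len(PREFIX):]) for i in identifiers}
print(sorted(mailboxes))  # ['duerst@example.com', 'dürst@example.com']

# Converting the IRI to a URI encodes the UTF-8 bytes of 'ü' as %C3%BC.
iri_as_uri = PREFIX + quote("dürst@example.com", safe="@")

# A mailbox whose local part literally is "d%C3%BCrst" needs '%' escaped.
literal = PREFIX + quote("d%C3%BCrst@example.com", safe="@")
print(iri_as_uri)  # mailto:d%C3%BCrst@example.com
print(literal)     # mailto:d%25C3%25BCrst@example.com
```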
(unless the final
delivery server makes some decision to the contrary). Again,
extreme care about the sequencing of decoding and other
interpretation can bypass the problem, but the document is not
nearly cautious enough about this and especially the security
and "user surprise" implications of trying to do that in
distributed modules so that operations are performed out of
order and/or other than exactly once.
I have added the following paragraph to the security section:
Programs manipulating 'mailto' URIs SHOULD take great care to not
inadvertently double-escape or double-unescape 'mailto' URIs, and
to make sure that escaping and unescaping conventions relating to URIs
and relating to mail addresses are applied in the right order.
I hope this addresses your concerns.
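What "the right order" means can be sketched in Python for a single address without hfields. The helpers to_mailto and from_mailto are hypothetical, and the dot-atom test is a simplified reading of RFC 5322 atext. When producing a URI, mail-level quoting (quoted-string, backslash escapes) is applied first and URI percent-encoding second; when consuming one, the steps run in reverse, each exactly once.

```python
import re
from urllib.parse import quote, unquote

# Simplified RFC 5322 dot-atom: dot-separated runs of atext characters.
DOT_ATOM = re.compile(
    r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*")

def to_mailto(local: str, domain: str) -> str:
    # Step 1 (mail syntax): quote the local part if it is not a dot-atom.
    if not DOT_ATOM.fullmatch(local):
        local = '"' + local.replace("\\", "\\\\").replace('"', '\\"') + '"'
    # Step 2 (URI syntax): percent-encode; an '@' in the local part -> %40.
    return "mailto:" + quote(local, safe="") + "@" + quote(domain, safe="")

def from_mailto(uri: str) -> tuple[str, str]:
    enc_local, enc_domain = uri[len("mailto:"):].rsplit("@", 1)
    # Step 1 (URI syntax): percent-decode exactly once.
    local, domain = unquote(enc_local), unquote(enc_domain)
    # Step 2 (mail syntax): strip quoted-string delimiters and escapes.
    if len(local) >= 2 and local.startswith('"') and local.endswith('"'):
        local = re.sub(r"\\(.)", r"\1", local[1:-1])
    return local, domain

print(to_mailto("not@me", "example.org"))  # mailto:%22not%40me%22@example.org
print(from_mailto("mailto:%22not%40me%22@example.org"))  # ('not@me', 'example.org')
```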
(3) Interactions between RFC 5321 and 5322.
The specification covers over the subtle differences between
envelope and header addresses, treating addr-spec and
?to=<hfvalue> as effectively equivalent. Differences between
the implications and semantics of the envelope/delivery address
and the header field "To:", which are quite clearly
distinguished in RFC 5321 and 5322, are ignored or prohibited.
Possibly that is a reasonable design choice, but it is not
discussed. In my opinion, if the functionality the difference
implies is going to be inaccessible via the mailto URI, that
decision should be discussed, if only to prevent confusion,
poor implementations, and misuse.
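The equivalence being criticized can be sketched as a small Python parser. The function recipients_and_headers is hypothetical and handles only the simple case (no fragment). It merges the addresses in the URI path and in any to= hfields into one list, so nothing in the URI can express an envelope recipient that differs from the header To: field. It deliberately avoids urllib.parse.parse_qsl, which applies form decoding and would turn '+' into a space.

```python
from urllib.parse import unquote

def recipients_and_headers(uri: str) -> tuple[list[str], dict[str, str]]:
    """Hypothetical parser: path addresses and to= values end up in one list."""
    path, _, query = uri[len("mailto:"):].partition("?")
    # Split on ',' before percent-decoding so an encoded %2C stays data.
    recipients = [unquote(a) for a in path.split(",") if a]
    headers: dict[str, str] = {}
    for field in query.split("&"):
        if not field:
            continue
        raw_name, _, raw_value = field.partition("=")
        name = unquote(raw_name).lower()
        if name == "to":
            recipients.extend(unquote(a) for a in raw_value.split(",") if a)
        else:
            # A repeated hfield simply overwrites the earlier one here.
            headers[name] = unquote(raw_value)
    return recipients, headers

print(recipients_and_headers(
    "mailto:alice@example.com?to=bob@example.com&subject=IETF%20review"))
# (['alice@example.com', 'bob@example.com'], {'subject': 'IETF review'})
```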
Are you aware of any such poor implementations or misuse? I'm only
aware of the misuse of making the envelope and header To: different
(often done by spammers), and of course the same for From, although that
doesn't apply here, because the spec says to ignore any from= in the
URI. If you are, could you supply actual text?
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
Ietf mailing list