
Re: Last Call: draft-duerst-mailto-bis (The 'mailto' URI Scheme) to Proposed Standard

2010-03-08 04:11:03
Hello John,

Many thanks for your comments. I'm sorry my reply is late.

On 2010/01/22 0:58, John C Klensin wrote:


--On Monday, November 30, 2009 07:38 -0800 The IESG
<iesg-secretary(_at_)ietf(_dot_)org>  wrote:

The IESG has received a request from an individual submitter
to consider  the following document:

- 'The 'mailto' URI Scheme '
    <draft-duerst-mailto-bis-07.txt>  as a Proposed Standard

The IESG plans to make a decision in the next few weeks, and
solicits final comments on this action.  Please send
...

Hi.

I thought I had sent notes on this some weeks ago, but can find
no record of having done so,

Neither could I, sorry.

so apologize for the late
submission.

The mailto specification exposes several of the problems with
the interaction between the URI model and the syntactic and
semantic conventions of assorted protocols, especially
protocols that were specified and deployed long before there
was such a thing as a URI.   In this case, that situation is
complicated by the observations that mailto URIs are very
widely deployed, making backward compatibility important, and
that the existing specification in RFC 2368.

Because of the lateness of this review, I'm ignoring issues
that I don't consider especially significant.   I believe that
the following issues _are_ significant:

(1) Special characters, particularly "+", and percent-encoding

The specification talks about the need to encode various
special characters, particularly characters that have reserved
meanings in the URI specification such as "%" and "/".  One of
the failings in prior mailto specifications was that the state
of "+" was left ambiguous wrt whether it needed to be encoded
or not.  "+" is heavily used in subaddress techniques and,
partially because of the interactions noted in Section 5 of
this document, has caused a problematic interaction with the
use of the same character as an encoding for a blank space in
web forms.  The problem is noted and discussed in more detail
in RFC 3696.

Despite the discussion in the third paragraph of Section 5,
the document leaves ambiguous whether the correct
representation of an email address like john+ietf(_at_)example(_dot_)com
in a mailto URI is

    mailto:john+ietf(_at_)example(_dot_)com      or
    mailto:john%2Bietf(_at_)example(_dot_)com

Both of these are correct. There is no real ambiguity: all characters not specially mentioned (including 'a'-'z', ...) just stand for themselves, and '+' is one of them. All such characters can also be escaped, although that's not usually done (see below).


and whether either of those, if interpreted in a web form
context, is expected to be treated as

   john+ietf(_at_)example(_dot_)com              or
   "john ietf"@example.com

The former.

both of which are valid addresses under RFC 5321 (see the
production for "qtextSMTP" there -- the
"john\ ietf"@example.com form is not required).

I have added a sentence about subaddresses, i.e. I have changed:

When producing 'mailto' URIs, all spaces SHOULD be encoded as %20.

to:
      When producing 'mailto' URIs, all spaces SHOULD be encoded as %20,
      and '+' characters MAY be encoded as %2B.
      Please note that '+' characters are frequently used as part of
      an email address to indicate a subaddress, as for example in
      <bill+ietf(_at_)example(_dot_)org>.

I hope this helps.
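
As a rough illustration of the point about '+' (a sketch only, not text from the draft), the following Python uses the standard urllib.parse functions. The helper address_from_mailto is a hypothetical name introduced here, and it only handles a simple mailto URI with a single address and no "?" header part. Addresses are written in plain form rather than in the archive's obfuscated form.

```python
from urllib.parse import quote, unquote, unquote_plus

address = "bill+ietf@example.org"

# Producing the URI: '+' may stand for itself or be written as %2B.
uri_plain = "mailto:" + quote(address, safe="@+")
uri_escaped = "mailto:" + quote(address, safe="@")


def address_from_mailto(uri: str) -> str:
    """Hypothetical helper: strip the scheme and percent-decode exactly once."""
    if not uri.startswith("mailto:"):
        raise ValueError("not a mailto URI")
    return unquote(uri[len("mailto:"):])


# Both spellings yield the same address, with the '+' intact.
print(uri_plain, "->", address_from_mailto(uri_plain))
print(uri_escaped, "->", address_from_mailto(uri_escaped))

# Form-style decoding treats '+' as a space, which is the wrong rule
# for a 'mailto' URI and silently changes the address.
print(unquote_plus("bill+ietf@example.org"))
```

The last line shows the web-form interaction discussed above: a decoder that applies the form convention turns the subaddress separator into a space.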

It is also worth noting that, while

    mailto:joe(_at_)example(_dot_)com           and
    mailto:joe%65(_at_)example(_dot_)com

are considered equivalent in this specification,

No, not exactly.

      mailto:joe(_at_)example(_dot_)com       and
      mailto:jo%65(_at_)example(_dot_)com

are considered equivalent. Not just by this specification, but by RFC 3986 to start with.

the email
addresses

    joe(_at_)example(_dot_)com               and
    joe%65(_at_)example(_dot_)com

are formally different and may have quite different semantics
(only the final delivery SMTP server knows).

I assume this still applies to the mail addresses

     joe(_at_)example(_dot_)com       and
     jo%65(_at_)example(_dot_)com

That's all well and good. Both

      mailto:joe(_at_)example(_dot_)com       and
      mailto:jo%65(_at_)example(_dot_)com

stand for

     joe(_at_)example(_dot_)com

while to reach

     jo%65(_at_)example(_dot_)com

you have to use

     mailto:jo%2565(_at_)example(_dot_)com

The draft clearly says that '%' in an email address has to be escaped, and that's just what we are doing here. This may not really be easy, but it's clearly defined, and it's not rocket science. There's also an example, gorby%kremvax(_at_)example(_dot_)com, in the example section.
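
The mapping above can be sketched in a few lines of Python, assuming the standard urllib.parse functions and two hypothetical helpers (to_mailto and from_mailto) that ignore the "?" header part of a mailto URI. This is an illustration of the escaping rule, not an implementation of the whole draft.

```python
from urllib.parse import quote, unquote

SCHEME = "mailto:"


def to_mailto(address: str) -> str:
    """Hypothetical helper: escape an address (including any '%') once."""
    return SCHEME + quote(address, safe="@")


def from_mailto(uri: str) -> str:
    """Hypothetical helper: unescape a simple mailto URI once."""
    if not uri.startswith(SCHEME):
        raise ValueError("not a mailto URI")
    return unquote(uri[len(SCHEME):])


# Both URIs stand for the same mailbox.
print(from_mailto("mailto:joe@example.com"))
print(from_mailto("mailto:jo%65@example.com"))

# A literal '%' in the address must itself be escaped as %25.
print(to_mailto("jo%65@example.com"))
print(to_mailto("gorby%kremvax@example.com"))
```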


That ambiguity is not just an encoding issue and difficulty for
those who use subaddresses.  It creates a vector for potential
attacks that is not noted in Security Considerations

Could you expand on the 'potential for attacks'? I understand that a lot can go wrong with escaping if one isn't careful, but "going wrong" doesn't necessarily translate into "attacks".

(that
section concentrates more on social problems, such as address
harvesting and information exposure, than on actual attacks on
the mail protocols and system).  More generally, while the
document makes the observation

        "Care has to be taken both when encoding as well as when
        decoding to make sure these operations are applied only
        once."

(from the end of Section 2(1)) it does not discuss how that is
to be done,

I'm not sure any explanation is necessary. If you put a mail address into a mailto URI, you escape, if you take it out, you unescape.


nor does it note the risks of not doing it in
Security Considerations.  That is important because there is
some anecdotal evidence that the rule is widely violated,
especially in web applications that move information back and
forth between mailto URI and email address formats.


(2) I18n issues

While the authors have done a careful and thoughtful job of
trying to anticipate the needs of the long-term (i.e.,
post-experimental) EAI work, there are possible ambiguities
that are not considered in addition to the "alternate address"
issue mentioned in Paragraph 3 of Section 1.  Some of the
important ones of these are the non-ASCII equivalent of the
discussion above: Because RFC 5321 fundamentals that are not
changed (or proposed to be changed) by the EAI work imply that

    duerst(_at_)example(_dot_)com
    dürst(_at_)example(_dot_)com                and
    d%C3%BCrst(_at_)example(_dot_)com

represent three different target mailboxes

Yes, they represent three different mailboxes. But the mailto URIs/IRIs:

    mailto:duerst(_at_)example(_dot_)com
    mailto:dürst(_at_)example(_dot_)com               and
    mailto:d%C3%BCrst(_at_)example(_dot_)com

represent only two different mailboxes, namely
duerst(_at_)example(_dot_)com for the first mailto URI, and dürst(_at_)example(_dot_)com for the second and third one. The mailbox d%C3%BCrst(_at_)example(_dot_)com would be denoted by mailto:d%25C3%25BCrst(_at_)example(_dot_)com

(unless the final
delivery server makes some decision to the contrary).  Again,
extreme care about the sequencing of decoding and other
interpretation can bypass the problem, but the document is not
nearly cautious enough about this and especially the security
and "user surprise" implications of trying to do that in
distributed modules so that operations are performed out of
order and/or other than exactly once.

I have added the following paragraph to the security section:

Programs manipulating 'mailto' URIs SHOULD take great care to not
inadvertently double-escape or double-unescape 'mailto' URIs, and
to make sure that escaping and unescaping conventions relating to URIs
and relating to mail addresses are applied in the right order.

I hope this addresses your concerns.
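
To show what goes wrong when these operations are not applied exactly once, here is a small Python sketch using the standard urllib.parse functions. The variable names are invented for the illustration, and the mailbox is the one from the example above whose local part contains literal '%' characters.

```python
from urllib.parse import quote, unquote

# A mailbox whose local part really contains '%C3%BC' as six characters.
mailbox = "d%C3%BCrst@example.com"

# Escaping once is correct; escaping twice produces a different URI.
once = quote(mailbox, safe="@")
twice = quote(once, safe="@")
print(once)
print(twice)

# Unescaping once recovers the original mailbox.
print(unquote(once) == mailbox)

# Unescaping twice silently lands on a different mailbox (with 'ü').
print(unquote(unquote(once)))

# Unescaping a double-escaped URI once still leaves escapes behind.
print(unquote(twice) == mailbox)
```

Neither failure raises an error, which is why the operations are easy to get wrong when they are spread across separate modules.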


(3) Interactions between RFC 5321 and 5322.

The specification covers over the subtle differences between
envelope and header addresses, treating addr-spec and
?to=<hfvalue>  as effectively equivalent.  Differences between
the implications and semantics of the envelope/delivery address
and the header field "To:", which are quite clearly
distinguished in RFC 5321 and 5322 are ignored or prohibited.
Possibly that is a reasonable design choice, but it is not
discussed.  In my opinion, if the functionality the difference
implies is going to be inaccessible via the mailto URI, that
decision should be discussed, if only to prevent confusion,
poor implementations, and misuse.

Are you aware of any such poor implementations, or misuse? I'm only aware of the misuse of making the envelope and header To: different (often used by spammers), and the same of course for From, although that doesn't apply here, because the spec says to ignore any from= in the URI. If yes, can you supply actual text?


Regards,    Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   
mailto:duerst(_at_)it(_dot_)aoyama(_dot_)ac(_dot_)jp
