Re: Please comment on draft-duerst-mailto-bis-04.txt


Martin Duerst wrote:

> I expect to submit it to the IESG soon.


The draft doesn't depend on the improvements in 2822upd
or 4234bis, so referencing RFC 2822 and 4234 is fine.

> [RFC 2368] contains some advice against using a bcc
> field in a mailto: URI, but this doesn't seem to be
> followed, and we were unable to find any reason, so
> we removed it.


MUAs might not support Bcc, might not display it in their
default configuration, or might reject attempts to preset
Bcc.  Better keep the advice: in an ordinary public mailto:
URL the Bcc address is visible anyway, so Cc: has more or
less the same effect.

> I'm not subscribed to ietf-822@imc.org, so please
> keep me (and my coauthors) in the cc.


Can't do that; I'll send a separate copy from the Reply-To
address, and the URI list is added to the Cc:.

==================================================
= Review of mailto-bis-04 (top down single pass) =

You use %2C to separate <addr-spec>s in a list derived
from the RFC 2368 syntax #mailbox, which is based on the
# rule in RFC 822 section 2.7.  Among other # oddities,
that forbids runs of commas.

How does %2C match the overall STD 66 syntax ?  If
xxx in a mailto:xxx?yyy pattern is a <hier-part>,
that's a <path-rootless>, which starts with a
<segment-nz>, and there an unencoded comma is a <pchar>
matching <sub-delims>.  So why do you use %2C here ?
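To make the <segment-nz> argument concrete, here is a small
Python sketch that transcribes the relevant STD 66 productions
(RFC 3986, sections 2.2, 2.3, and 3.3) into a regular
expression.  The helper name is_segment_nz is made up for
illustration; it only checks path characters, not a whole URI.

```python
import re

# STD 66 (RFC 3986) productions, transcribed:
#   unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
#              / "*" / "+" / "," / ";" / "="
#   pchar      = unreserved / pct-encoded / sub-delims / ":" / "@"
#   segment-nz = 1*pchar
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})"
SEGMENT_NZ = re.compile(rf"(?:{PCHAR})+")


def is_segment_nz(text: str) -> bool:
    """Return True if text matches <segment-nz>, i.e. one or more <pchar>."""
    return SEGMENT_NZ.fullmatch(text) is not None


print(is_segment_nz("a@example,b@example"))    # True: bare comma is a pchar
print(is_segment_nz("a@example%2Cb@example"))  # True: %2C is pct-encoded
print(is_segment_nz("a@example/b@example"))    # False: "/" ends a segment
```

Both comma spellings pass, consistent with the question above.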

My old browser had issues with comma in URLs, but
it was implemented years before STD 66 was written.

The difference between your <some-delims> and the 
<sub-delims> plus ":" and "@" in STD 66 appears to
be "&" and "=" (you need them for &hname=hvalue)
PLUS "/" and "?".  What's the reason to exclude 
"/" and "?" from <some-delims> ?

Whatever you end up with, please explain it in the
draft; comparing obscure ASCII subsets is a PITA.
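Set arithmetic makes such comparisons less painful.  In the
Python sketch below the STD 66 sets are taken from RFC 3986;
SOME_DELIMS is an ASSUMPTION (the <some-delims> set as it later
appeared in RFC 6068), and the set in -04 may differ.

```python
# Character sets from STD 66 (RFC 3986):
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR_DELIMS = SUB_DELIMS | {":", "@"}    # delimiters allowed in <pchar>
QUERY_DELIMS = PCHAR_DELIMS | {"/", "?"}  # <query> = *( pchar / "/" / "?" )

# ASSUMPTION: <some-delims> as in RFC 6068; -04 may differ.
SOME_DELIMS = set("!$'()*+,;:@")


def missing_from(subset: set, superset: set) -> list:
    """Characters in superset but not in subset, in ASCII order."""
    return sorted(superset - subset)


print(missing_from(SOME_DELIMS, PCHAR_DELIMS))  # ['&', '=']
print(missing_from(SOME_DELIMS, QUERY_DELIMS))  # ['&', '/', '=', '?']
```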

A note about <addr-spec> says that some characters
have to be percent-encoded because they are not 
allowed in an STD 66 URL.  It took me about a year
to understand that that's beside the point for the
purpose of news: URLs.  Some characters have to be
percent-encoded because they otherwise don't match
<pchar> in STD 66.

That's a subtle difference.  You write that of 'the
characters in sub-delims, at least the following
also have to be percent-encoded: "&", ";", and "="'

I don't see why: you need "&" and "=" only after the
"?", not before it in the address list, and you don't
do anything special with ";".  Percent-encoding ","
*within* an <addr-spec> would make sense if you use it
as the delimiter for an address list, but at the moment
the draft uses %2C.

Test case: mail to "co,ma"@example and "am,oc"@example
1: mailto:%22co,ma%22@example%2C%22am,oc%22@example

That would be strange IMO; cleaner versions could be
2: mailto:%22co,ma%22@example,%22am,oc%22@example
3: mailto:%22co%2Cma%22@example,%22am%2Coc%22@example

(2) keeps the comma as is, no matter what its purpose is;
(3) uses an unencoded comma only to separate addresses.
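The three variants can be reproduced with Python's
urllib.parse.quote, whose safe parameter selects which
characters stay unencoded.  This is only a sketch of the
encoding step; build is a hypothetical helper name.

```python
from urllib.parse import quote, unquote

ADDRS = ['"co,ma"@example', '"am,oc"@example']


def build(addrs, safe, sep):
    """Percent-encode each addr-spec, keeping the characters in `safe`
    unencoded, and join the results with `sep`."""
    return "mailto:" + sep.join(quote(a, safe=safe) for a in addrs)


v1 = build(ADDRS, safe="@,", sep="%2C")  # comma kept inside, %2C between
v2 = build(ADDRS, safe="@,", sep=",")    # comma kept as is everywhere
v3 = build(ADDRS, safe="@", sep=",")     # comma encoded inside, bare between

print(v1)  # mailto:%22co,ma%22@example%2C%22am,oc%22@example
print(v2)  # mailto:%22co,ma%22@example,%22am,oc%22@example
print(v3)  # mailto:%22co%2Cma%22@example,%22am%2Coc%22@example

# With (3), a consumer can split on bare commas first, then decode each part.
decoded = [unquote(p) for p in v3[len("mailto:"):].split(",")]
print(decoded == ADDRS)  # True
```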

You forbid NO-WS-CTL and <obs-local-part>; please add
<obs-domain>.  That's crap like user@example(oops).com
with comments, folding, and white space on both sides
of the dots separating domain labels.

IMO you need a MUST NOT for all obs-cenities, mirroring
the same MUST NOT in RFC 2822 and 2822upd.  If 2822upd
adds NO-WS-CTL to "obs" and you upgrade the normative
reference, remove the then-redundant MUST NOT NO-WS-CTL.

Note (3) about comments and whitespace in a local part
is ambiguous: you want to forbid CFWS, but likely not
<quoted-pair> horrors like "sp\ ce"@example.  I think
that works, so mailto:%22sp%5C%20ce%22@example is okay (?)
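Mechanically percent-encoding the quoted-pair example gives
exactly the form guessed above, as a quick Python check
shows.  This only confirms the encoding, not whether the
draft's syntax permits the address.

```python
from urllib.parse import quote, unquote

addr_spec = '"sp\\ ce"@example'  # quoted-string with the quoted-pair "\ "

uri = "mailto:" + quote(addr_spec, safe="@")
print(uri)  # mailto:%22sp%5C%20ce%22@example

# Decoding gives back the addr-spec, backslash included.
print(unquote(uri[len("mailto:"):]) == addr_spec)  # True
```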

  BTW, for reasons unknown to me "bare space" is not 
  allowed in <quoted-string>, recently discussed on the
  SMTP list.  This could be a bug in RFC 2822 not yet
  fixed in 2822upd, maybe a missing SP in <qtext> (?!?)

Note (4) is odd: there are no "non-ASCII" characters
in domains used for (non-EAI) e-mail, so why discuss
it, or limit it to domains ?  You could just say that
all non-ASCII characters are supposed to be
percent-encoded as UTF-8, for compatibility with RFC 3987
and ongoing I18N work (EAI, IDN, IRI).

While you're at it, please note that this won't work
as expected with many UAs and so SHOULD be avoided at
the moment.  All mailto: URI producers SHOULD use the
A-label form of domains; URI consumers might have no
idea what U-labels are, and percent-encoding doesn't
help with this issue.  It only helps in non-UTF-8
documents for native mailto: IRIs not in the document
charset.

Or maybe for *all* mailto: IRIs in non-UTF-8 documents:
the Firefox 2 bug found by your non-UTF-8 IRI tests
likely affects mailto: as well, not only http:

| When the internationalized domain name is used to
| compose a message, the name must be transformed to
| the IDNA encoding where appropriate [RFC3490].

The appropriate place is IMO the mailto URI producer.
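A sketch of the producer-side conversion, assuming Python's
built-in "idna" codec, which implements IDNA2003 (RFC 3490,
the reference cited in the draft) rather than the newer
IDNA2008 rules.  The helper mailto_for and the domain
bücher.example are illustrative only.

```python
from urllib.parse import quote


def mailto_for(local_part: str, domain: str) -> str:
    """Build a mailto: URI whose domain is in A-label (xn--) form.

    Uses Python's "idna" codec (IDNA2003 / RFC 3490); IDNA2008 is not covered.
    """
    a_label_domain = domain.encode("idna").decode("ascii")
    return "mailto:" + quote(local_part, safe="") + "@" + a_label_domain


print(mailto_for("info", "bücher.example"))
# mailto:info@xn--bcher-kva.example
```

The consumer then only ever sees ASCII LDH labels.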

| The considerations for reg-name in [STD66] apply.

I'm not aware of any registry permitting %-characters
in its <reg-name>s.  Let's focus on the LDH labels
required for SMTP in mailto: URLs.  We can do UTF8SMTP
etc. later; mailto is complex enough. :-(

Okay, you have "should A-label" at the end of (4); I
proposed a kind of "temporary" SHOULD above.  Notes
4 and 5 are far too long and confusing: chapter 2 is
for folks trying to understand the mailto: syntax,
incl. users not familiar with EAI / IDN / IRI / I18N.

Notes 4 and 5 could be subsections of a section on
"I18N considerations".  IMO any I18N in 2368bis is
irrelevant until it at least works for ASCII and
STD 66.

| Percent-encoding is needed for the same characters
| as listed above for "addr-spec".

Stupid question, why ?  Behind the "?" you don't need
to worry about "/" and "?" anymore (just an example).

| The "body" hname should contain the content for the
| first text/plain body part of the message.

s/hname/hvalue/  The body= concept is weird.  It's not
clear what the charset is: if you assume UTF-8 it might
not fly for old UAs, assuming the document charset is
worse, and assuming the local charset of the UA also
makes no sense.

Better deprecate it.  Who implemented body= anyway, and
how bad was it ?

| Non-ASCII characters can be encoded in hvalue as follows:

Indeed, it works like a charm, but not for body= hvalues.

| Non-ASCII characters can be encoded according to UTF-8
| [STD63], and then each octet of the corresponding UTF-8
| sequence is percent-encoded to be represented as URI
| characters.
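The quoted rule amounts to the following Python sketch (the
subject text is just an example).  The last line illustrates
the concern below: percent-encoded UTF-8 only round-trips if
the consumer also decodes it as UTF-8.

```python
from urllib.parse import quote, unquote

subject = "Grüße"

# Encode as UTF-8, then percent-encode each octet.
hvalue = quote(subject, safe="")  # quote() uses UTF-8 by default
print(hvalue)  # Gr%C3%BC%C3%9Fe

# A consumer that decodes as UTF-8 gets the subject back ...
print(unquote(hvalue) == subject)  # True
# ... but one that assumes a Latin-1 local charset gets mojibake.
print(unquote(hvalue, encoding="latin-1") == subject)  # False
```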

But that doesn't fly.  The MUA started by the browser (if
that's what happens) can assume that it's running in the
local charset of the operating system; it doesn't need to
support UTF-8 at all.  The only interoperable solution is
what you have as (1), RFC 2047 + 2231.  (2) won't work
until UTF-8 is the only charset worldwide outside of
museums.  You cannot decree this beyond what RFC 2277 did.
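A sketch of option (1) on the producer side, using Python's
email.header module to build an RFC 2047 encoded-word for
subject=.  It covers RFC 2047 only, not RFC 2231 parameter
encoding, and info@example is a placeholder address.

```python
from email.header import Header, decode_header, make_header
from urllib.parse import quote

subject = "Grüße"

# RFC 2047 encoded-word: the resulting header value is pure US-ASCII.
encoded_word = Header(subject, "utf-8").encode()
print(encoded_word.isascii())  # True

# The URI producer percent-encodes the encoded-word as an ordinary hvalue.
uri = "mailto:info@example?subject=" + quote(encoded_word, safe="")

# A MIME-aware MUA can decode the encoded-word back to the original text.
print(str(make_header(decode_header(encoded_word))) == subject)  # True
```

The MUA never has to interpret percent-encoded UTF-8 itself.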

It's the job of the URI producer to get it right: mailto:
URLs prepare a message/rfc822 with a US-ASCII header.

The URI consumer is the weaker party: you have to protect
it for interoperability, not force it to upgrade when
it's not ready.  Shift all "UTF-8 and beyond" issues into
"I18N considerations"; they muddy the water for the job
at hand, defining an STD 66 compatible mailto: URL that
prepares a 2822upd + MIME compatible message/rfc822.

| mailto:?to=addr1@an.example%2C%20addr2@an.example

If my <segment-nz> theory has merit, this is not "nz",
i.e. it is syntactically invalid.  I'm too lazy to check
this against the regular expression in STD 66.
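For the record, the Appendix B regular expression of STD 66
can be run directly, as in the Python sketch below.  It only
splits a URI into components and does not validate them;
for this example it yields an empty path.  STD 66 also lists
<path-empty> as one of the <hier-part> alternatives, which is
the case to weigh against the <segment-nz> reading above.

```python
import re

# Regular expression from STD 66 (RFC 3986), Appendix B. It splits a URI
# reference into its components; it does not validate them.
URI_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

uri = "mailto:?to=addr1@an.example%2C%20addr2@an.example"
m = URI_SPLIT.match(uri)
scheme, path, query = m.group(2), m.group(5), m.group(7)

print(scheme)      # mailto
print(repr(path))  # ''
print(query)       # to=addr1@an.example%2C%20addr2@an.example
```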

A hname "to" like "bcc", let alone "body", is a bad idea 
and best avoided.  The "to" function belongs to "mailto",
not into the query part - a very simple mailto: approach
could be to ignore query parts.
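Such a very simple approach could look like this hypothetical
Python consumer.  It assumes an unencoded comma separates
addresses, one of the options discussed above, not what the
draft currently specifies.

```python
from urllib.parse import urlsplit, unquote


def simple_recipients(uri: str) -> list:
    """Return the addresses between "mailto:" and "?", ignoring any query.

    Hypothetical minimal consumer: hfields such as to=, bcc=, or body=
    are never looked at.
    """
    parts = urlsplit(uri)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto: URI")
    # Assumption: bare commas separate addresses; empty items are skipped.
    return [unquote(a) for a in parts.path.split(",") if a]


print(simple_recipients("mailto:a@example,b@example?to=c@example&bcc=d@example"))
# ['a@example', 'b@example']
```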

| A mailto URI designates an "internet resource", which
| is the mailbox specified in the address.

I'm not sure about this: the "resource" appears to be a
"proto"-message/rfc822, which can be sent to one or more
mailboxes with SMTP (or whatever, but IIRC 2822upd only
mentions SMTP, not UUCP / LMTP / ...) when it's ready.

Other URI schemes and a MIME access-method deal with a
"mailbox" as a "resource".  Maybe I'm confused, or maybe
it would help if you say "one or more".

| The operation of how any URI scheme is resolved is not
| mandated by the URI specifications.

That depends: the nntp: URL scheme is designed for NNTP;
it would be slightly more complex if it were designed
for, e.g., article numbers on "non-NNTP" news servers.
The NNTP details are not mandated, but the design fits.
JFTR, I guess we agree.

You have a good In-Reply-To example in 7.1; please add
this hname to the "safe and useful" list in section 4.

Please remove body= from "safe and useful"; it's neither
safe nor useful, it's a bad idea to start with.  While
keywords= is safe, it is rarely used.  IMO cc= is a
more realistic candidate for "safe and useful".  Other
bad ideas (in addition to body=, bcc=, and to=) are
date= and message-id=, for obvious reasons.

Maybe enumerate all potentially "safe and useful" header
fields: subject=, cc=, in-reply-to=, keywords=; is that
really all ?
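A consumer could enforce such a list with an allow-list
filter.  The Python sketch below is hypothetical; its
SAFE_HNAMES simply mirrors the candidates named above.  It
splits the query by hand and uses unquote() rather than
urllib.parse.parse_qsl(), because parse_qsl() decodes "+"
as a space, which is wrong for mailto:.

```python
from urllib.parse import unquote

# Hypothetical allow-list, mirroring the candidates listed above.
SAFE_HNAMES = {"subject", "cc", "in-reply-to", "keywords"}


def safe_hfields(query: str) -> dict:
    """Split a mailto: query into hname/hvalue pairs and keep only
    allow-listed hnames (matched case-insensitively). A repeated hname
    keeps its last value."""
    result = {}
    for field in query.split("&"):
        if "=" not in field:
            continue  # ignore malformed fields
        hname, hvalue = field.split("=", 1)
        hname = unquote(hname).lower()
        if hname in SAFE_HNAMES:
            result[hname] = unquote(hvalue)
    return result


print(safe_hfields("subject=Hi%20there&body=x&bcc=a@example"
                   "&In-Reply-To=%3C123@example%3E"))
# {'subject': 'Hi there', 'in-reply-to': '<123@example>'}
```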

| When producing mailto: URIs, all spaces SHOULD be 
| encoded as %20.

The given reasons are compelling enough to justify a MUST.
The "+" or "_" hacks are for Google or Wikipedia; they're
not too bad in subject= or keywords=, but not in addresses etc.
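The difference is easy to demonstrate with Python's
urllib.parse, whose quote_plus()/unquote_plus() implement
the HTML-form convention of "+" for space.  The sample
subject and address are made up.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

subject = "Re: mailto bis"
print(quote(subject, safe=""))       # Re%3A%20mailto%20bis
print(quote_plus(subject, safe=""))  # Re%3A+mailto+bis

# A literal "+" is legal in a local part (e.g. subaddressing).
addr = "user+tag@example"
print(unquote(addr))       # user+tag@example
print(unquote_plus(addr))  # user tag@example  (address silently broken)
```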

| The mailto URI scheme is limited in that it does not
| provide for substitution of variables.

That's not a specific mailto: limitation, you could say
"URI schemes are"...

Section 6 is where you could put all UTF-8 notes; maybe
rename it to "Internationalization considerations" as
proposed in RFC 2277.  As is, section 6 is a lame excuse
for breaking existing software and annoying poor users.

7.1 + 7.2 are excellent.  

| Applicable protocol:
|  None. This registration is made to assure that
|  this header field name is not used at all, in order
|  to not create any problems for mailto: URIs.

It suffices to reserve it for "mail"; I don't see how it
could affect "http" or "news".  SIP apparently has its
own way to register header fields, so that's not your problem.

 Frank