Re: Last Call: draft-ietf-imapext-sort (INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS) to Proposed Standard

2008-02-27 21:22:07
The proposed changes in the comments below create significant 
incompatibilities with multiple interoperable client and server 
implementations that have been in production use and widely distributed 
worldwide for several years.

The result of making any of these changes would be instability and 
inconsistency between implementations, creating an environment in which 
nobody can use these extensions because there is no reliable behavior.

This document and its protocol are not new.  Both have been around for 
many years, and their publication was delayed unreasonably, due to (what 
is now generally recognized to have been) a false belief that 
internationalization had to be solved first.

Any of these changes would add further multi-year delay to this protocol 
and specification.  Some of these involve considerable complexity, which 
will require long discussion to hammer out.

I recommend that these issues be punted to future work in new standards.

Further comments interspersed below.

On Wed, 27 Feb 2008, Dan Karp wrote:

The IESG is considering the following document again now that
important dependencies are ready:

- 'INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS'
   <draft-ietf-imapext-sort-19.txt> as a Proposed Standard

Outlook (and some other clients) has an alternative solution to the
problem of localized followup/reply prefixes during the base subject
extraction process.  I believe it treats *any* string of 1 to 3 letter
characters followed by a colon as an ignorable prefix.  This may result
in too many false positives, but it's an alternative that you may want
to consider.

It does indeed result in false positives, and is arguably a kludge aimed 
more at Outlook's non-compliant alternatives to "re:" (which *is* defined 
as the One True Reply Prefix in RFC 2822) than the forward prefix problem.

I don't see an unambiguous benefit in instituting this incompatible change; 
there is a cost in new false positives and in instability/unreliability of 
behavior between implementations.
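
As a rough illustration of the false-positive problem, here is a minimal Python sketch of the heuristic as described above (any run of 1 to 3 letters followed by a colon is treated as an ignorable prefix). The regular expression and the function name strip_short_prefixes are assumptions made for this example; this is neither Outlook's actual code nor the base subject extraction algorithm in the draft.

```python
import re

# Assumed reading of the heuristic: 1 to 3 ASCII letters followed by a
# colon, at the very start of the subject, count as an ignorable prefix.
SHORT_PREFIX = re.compile(r"^\s*[A-Za-z]{1,3}:\s*")


def strip_short_prefixes(subject: str) -> str:
    """Strip leading short 'xx:' prefixes until none remain (hypothetical helper)."""
    while True:
        stripped = SHORT_PREFIX.sub("", subject, count=1)
        if stripped == subject:
            return subject
        subject = stripped


if __name__ == "__main__":
    for s in ["Re: AW: lunch", "FYI: server down", "PS: one more thing"]:
        print(repr(s), "->", repr(strip_short_prefixes(s)))
```

Under this rule "Re:" and "AW:" are stripped as intended, but so are "FYI:" and "PS:", which are not reply prefixes at all.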

I certainly do not want to redefine IETF standards as "follow whatever 
Microsoft [or other big vendor] does."  Outlook generates non-compliant 
"localized reply prefixes" based upon a false notion that "re:" is an 
abbreviation for the English word "reply".  This is a bug that should be 
fixed in Outlook.

Also, the proposed sorting rules for the address fields (FROM, TO, and
CC) are in general worse than the rules that most clients currently use
for sorting on these headers:

"worse" is a matter of opinion.  Reasonable people may disagree and 
may consider what "most clients" do to be "worse".

 - The sort criteria for FROM, TO, and CC are based only on the
local-part of the first address in the header.  So <larry@oracle.com>
and <larry@google.com> sort together, which seems really wrong.  Is
there a compelling reason not to combine the addr-mailbox with the
addr-host when generating the sort key?

Although the rule in SORT was created in the days of a much smaller 
Internet community, it remains a better choice intra-enterprise where 
userids map to a single person but host ids can be all over the place.

Now, one may argue that intra-enterprise email should have a single host 
id, but that is not the reality.

Again, this would be an incompatible change that may benefit some, will 
cost others, and will create instability and unreliability.
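
To make the difference concrete, here is a minimal Python sketch comparing a key built from the addr-mailbox alone with one that also includes the addr-host. The Address tuple, the two key functions, and the use of lower() for case folding are assumptions made for this example, not text from the draft.

```python
from typing import NamedTuple


class Address(NamedTuple):
    """Hypothetical stand-in for the mailbox and host parts of an address."""
    mailbox: str
    host: str


def key_mailbox_only(addr: Address) -> str:
    # The rule under discussion: only the local-part contributes to the key.
    return addr.mailbox.lower()


def key_mailbox_and_host(addr: Address) -> tuple:
    # The suggested alternative: the host breaks ties between equal mailboxes.
    return (addr.mailbox.lower(), addr.host.lower())


senders = [
    Address("larry", "oracle.com"),
    Address("larry", "google.com"),
    Address("alice", "oracle.com"),
]

by_mailbox = sorted(senders, key=key_mailbox_only)
by_mailbox_and_host = sorted(senders, key=key_mailbox_and_host)

if __name__ == "__main__":
    print([f"{a.mailbox}@{a.host}" for a in by_mailbox])
    print([f"{a.mailbox}@{a.host}" for a in by_mailbox_and_host])
```

With the mailbox-only key the two "larry" addresses compare equal and stay in their original relative order; with the combined key they are ordered by host. Two implementations that pick different keys return different orders for the same mailbox, which is the incompatibility at issue.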

 - The vast majority of mail clients display the decoded addr-name in
the "From" column, falling back to addr-mailbox@addr-host if addr-name
is NIL or blank.  It's these strings that need to be in sorted order.
If the server-side FROM sort results in the client displaying a "From"
column that's not sorted, the server-side sort isn't useful.

Sorting the addr-name opens a HUGE can of worms.

Given the names "Bandou Mitsugoro", "Mark R. Crispin", "Nishimura Aiko", 
"Pedro Castro Gomez", and "Mao Zedong", a stupid sort will collate these 
as:
        Bandou Mitsugoro
        Mao Zedong
        Mark R. Crispin
        Nishimura Aiko
        Pedro Castro Gomez
and a falsely-clever sort will collate these as:
        Nishimura Aiko
        Mark R. Crispin
        Pedro Castro Gomez
        Bandou Mitsugoro
        Mao Zedong
Both of these are totally wrong.

The actual correct collation, assuming(!) surname-first collation and 
Latin character ordering(!!), is:
        Bandou Mitsugoro
        Pedro Castro Gomez
        Mark R. Crispin
        Mao Zedong
        Nishimura Aiko
due to where the surname is located in various cultures.

And even that is making multiple unwarranted assumptions.  The addr-name 
may not even be a name that has a surname; it may be, e.g., a corporate 
name.  Latin character ordering may not be correct either; in Japanese 
"Bandou" will collate before "Nishimura", but "Tanaka" will collate before 
either of these.
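
The three orderings above can be reproduced with a short Python sketch. The SURNAME table is hand-annotated for these five names only and is an assumption of the example; no general algorithm for deriving it from an addr-name is implied, which is the point.

```python
names = [
    "Bandou Mitsugoro",
    "Mark R. Crispin",
    "Nishimura Aiko",
    "Pedro Castro Gomez",
    "Mao Zedong",
]

# "Stupid" sort: compare the display strings as they stand.
stupid = sorted(names)

# "Falsely-clever" sort: assume the last word is always the surname.
falsely_clever = sorted(names, key=lambda n: n.split()[-1])

# Surname-first collation in Latin order, using surnames supplied by hand.
# Where the surname sits depends on the culture, so it cannot be derived
# from the string alone.
SURNAME = {
    "Bandou Mitsugoro": "Bandou",
    "Mark R. Crispin": "Crispin",
    "Nishimura Aiko": "Nishimura",
    "Pedro Castro Gomez": "Castro Gomez",
    "Mao Zedong": "Mao",
}
by_surname = sorted(names, key=lambda n: SURNAME[n])

if __name__ == "__main__":
    for label, result in [
        ("stupid", stupid),
        ("falsely-clever", falsely_clever),
        ("by surname", by_surname),
    ]:
        print(label, result)
```

Only the third ordering matches the surname-first collation, and it depends entirely on information that is not present in the addr-name itself.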

This is why this was punted to be a sort of the addr-mailbox.  Once again, 
changing this now would be an incompatible change that may benefit some, 
will cost others, and will create instability and unreliability.

 - Multi-recipient messages are very common, and there is no semantic
meaning to the order of recipients in the To and cc headers.  Messages
sent to "<A>, <B>" and those sent to "<B>, <A>" should really sort
together since they have the same set of recipients.  Would it be
acceptable to parse the relevant header, determine which address sorts
"highest", and always use that address as the sort key for the message?

Doing something like this was punted for practicality's sake, and I 
don't see this changing.

In practice, To sorts are rare.  What little use they get is with mailing 
lists.

Once again, changing this now would be an incompatible change that may 
benefit some, will cost others, and will create instability and 
unreliability.
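
For comparison, here is a minimal Python sketch of the two behaviors: keying on the first address as it appears in the header, versus an order-independent key taken from the whole recipient list. The function names are hypothetical, the recipients are reduced to bare mailbox strings, and reading "sorts highest" as "sorts first" is an assumption of the example.

```python
from typing import List


def first_address_key(mailboxes: List[str]) -> str:
    # Current rule: only the first address in the header is considered.
    return mailboxes[0].lower() if mailboxes else ""


def order_independent_key(mailboxes: List[str]) -> str:
    # Suggested alternative (as interpreted here): always use the address
    # that sorts first, regardless of its position in the header.
    return min((m.lower() for m in mailboxes), default="")


msg1 = ["alice", "bob"]
msg2 = ["bob", "alice"]

if __name__ == "__main__":
    print(first_address_key(msg1), first_address_key(msg2))
    print(order_independent_key(msg1), order_independent_key(msg2))
```

The order-independent key has to parse and compare every address in the header instead of stopping at the first one, and it yields a different sort order from existing implementations for any message whose first recipient is not the one that sorts first.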

 - Likewise, we should give serious consideration to completely
ignoring groups when determining a message's sort key.  While a
fully-qualified address means the same thing from one message to
another, the membership of a group can easily change from one message to
the next.

This idea is definitely something that should be punted to future work. 
Among other things, it raises the question of "what is a group" and how an 
IMAP server is supposed to determine that.

 - I don't think I've ever seen a client that displays a "cc" column in
the UI.  In fact, I can't come up with a remotely plausible use case for
when a user would want to sort on CC.  Unless there's a real reason for
including CC sorting, it should be dropped from the draft.

I agree that cc sorts are extremely rare.  Nonetheless, the implementation 
burden of including the capability is negligible, and removing it may 
break something that uses it now.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.