fetchmail-friends

[fetchmail] Re: POP3 LAST vs UIDL [WAS: "Limit" doesn't cooperates with "flush"]

2003-10-11 04:29:19
Quoting from Eric S. Raymond's mail on Fri, Oct 10, 2003 at 04:50:34AM -0400:
- Currently, there is a serious bug in the uidl code where unseen
  messages are marked as seen under specific options. I have attempted
  to fix it in the thread "fetching uids of mails". If a user has
  specified 'flush' to delete oversized mails by default, such mails
  will also get deleted! If one has the following rc file,

I have never trusted the UIDL code.  I didn't write it, and it is 
sufficiently ugly and complicated that I don't think I understand it
completely.  This is a major reason I have resisted Matthias Andree's
plan to move to UIDL tracking.

For POP3, fetchmail uses only two methods to track new messages:
LAST and UIDL.

LAST assumes that there is a clear demarcation between old and new
mails. It assumes that all the mails which have been downloaded and
delivered lie at the start of the mailbox. This assumption easily gets
invalidated in the face of temporary problems. The LAST command is
completely unreliable for the following reasons:

- If some of the retrieved mails had delivery problems (say, due to a
  transient SMTP error), those mails never get downloaded again in
  later polls. Note that the seen flag is set here automatically,
  because the mail itself was downloaded successfully.

- If some of the mails get skipped (not retrieved at all) for some
  reason (say, due to 'limit'), the mails never get downloaded. Note
  that even though the seen flag is not set for such skipped mails,
  the seen flag has been set for the mails after the skipped mails
  which were downloaded. Since the LAST command returns the last seen
  mail, such intermediate skipped mails never get downloaded in
  further polls.
  
- If there is a socket error, the POP3 server may revert the state of
  the mailbox. The output of LAST will then not change in the next
  poll. So, all the mails that were downloaded & delivered
  successfully will get downloaded again in the next poll.

For example, consider this mailbox which has 10 mails. Assume that
some of the mails were either not downloaded at all or not delivered
for various reasons:

 #1 downloaded & delivered
 #2 skipped (due to 'limit')
 #3 downloaded & delivered
 #4 not delivered (DNS lookup failed)
 #5 not delivered (SMTP server gave transient error)
 #6 downloaded & delivered
 #7 skipped (due to 'limit')
 #8 downloaded & delivered
 #9 new mail
#10 new mail

Here, #8 is the last seen mail.

In this case, in the next poll, mails #2, #4, #5 and #7 should also
get downloaded along with the new mail. However, since LAST returns
the last seen mail (#8 here), #2, #4, #5 and #7 will never get
downloaded.

In short, the lack of user control over the SEEN flag combined with
the lack of interface other than the LAST command leads to skipping of
undelivered mails. The use of the LAST command does not lead to
reliable mail delivery.
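
To make the example concrete, here is a small Python model of the
10-mail mailbox above. It is only a sketch and not fetchmail code: the
tuple layout and the function names are made up for illustration, and
it assumes that LAST reports the highest message number whose seen
flag is set.

```python
# Hypothetical model of the example mailbox; not taken from fetchmail.
# Each entry: (message number, seen flag on server, delivered locally)
mailbox = [
    (1, True, True),
    (2, False, False),   # skipped due to 'limit'
    (3, True, True),
    (4, True, False),    # retrieved, but DNS lookup failed
    (5, True, False),    # retrieved, but transient SMTP error
    (6, True, True),
    (7, False, False),   # skipped due to 'limit'
    (8, True, True),
    (9, False, False),   # new mail
    (10, False, False),  # new mail
]


def last_value(mailbox):
    """Highest message number with the seen flag set (what LAST reports)."""
    return max((num for num, seen, _ in mailbox if seen), default=0)


def fetched_with_last(mailbox):
    """Messages a LAST-based poll would fetch: everything after LAST."""
    last = last_value(mailbox)
    return [num for num, _, _ in mailbox if num > last]


def still_undelivered(mailbox):
    """Messages that actually still need to be delivered."""
    return [num for num, _, delivered in mailbox if not delivered]


wanted = still_undelivered(mailbox)
got = fetched_with_last(mailbox)
lost = [num for num in wanted if num not in got]

print("LAST returns:", last_value(mailbox))
print("fetched next poll:", got)
print("never fetched:", lost)
```

The "never fetched" list is exactly the set of mails that LAST hides
from every later poll.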

IMO, support for LAST should be removed from fetchmail itself. Also,
RFC 1725 was right in removing the LAST command as it is unreliable in
tracking unseen mails.

So, the only sane option left is UIDL. Matthias Andree is right!
In spite of the complexity required in tracking new mails, UIDL is
in fact quite a safe and secure method for tracking new mails. One
problem was the extra traffic in downloading all UIDs. However, my
patch for using binary search to get new mails should at least
alleviate this problem.
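
The patch itself is not shown in this mail, so the following is only
one possible reading of "binary search", written in Python. It makes
a strong assumption: the already-seen UIDs form a contiguous block at
the start of the mailbox. When that holds, the first new mail can be
found with about log2(n) single-message "UIDL n" queries instead of
one full listing. The names first_new_message and uid_of are
hypothetical.

```python
def first_new_message(count, uid_of, saved_uids):
    """Return (number of first unseen message, number of UID queries).

    count      -- number of messages in the mailbox (from STAT)
    uid_of     -- callable modelling a single-message 'UIDL n' query
    saved_uids -- set of UIDs recorded in earlier polls

    Assumes all saved UIDs sit in front of all new ones. Returns
    count + 1 as the message number if there is no new message.
    """
    lo, hi = 1, count + 1
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if uid_of(mid) in saved_uids:
            lo = mid + 1      # mid is old, first new one is further right
        else:
            hi = mid          # mid is new, first new one is mid or earlier
    return lo, queries


# Simulated server: 1000 messages, the first 990 already seen.
server_uids = {n: f"uid-{n:04d}" for n in range(1, 1001)}
saved = {server_uids[n] for n in range(1, 991)}

first, queries = first_new_message(1000, server_uids.__getitem__, saved)
print("first new message:", first, "found with", queries, "UID queries")
```

If old and new mails are interleaved, as in the 10-mail example
earlier, the assumption breaks and a full UID listing is still needed
to find every undelivered mail.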

Also, once LAST is removed, the issue of TOP-vs-RETR is meaningless.
The server side flags will anyway not be used. Whether the seen flag
is set or not will not matter.

If a server does not support UIDL, the only safe retrieval is by using
fetchall. Using LAST even as a backup is not recommended due to the
problems mentioned above.

The basis of the UIDL code is quite simple. As you mention, it has
been written in an ugly and complicated way, and quite unnecessarily
so!
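
As a rough illustration of how simple the basic idea is, here is a
Python sketch (not fetchmail's implementation): keep the set of UIDs
that were delivered successfully, and in each poll fetch every message
whose UID is not in that set. The function names are made up; the
input lines follow the "msgnum uid" shape of a UIDL response.

```python
def parse_uidl_lines(lines):
    """Turn UIDL response lines like b'3 abc123' into (msgnum, uid) pairs."""
    listing = []
    for line in lines:
        num, uid = line.decode("ascii").split(" ", 1)
        listing.append((int(num), uid))
    return listing


def messages_to_fetch(listing, delivered_uids):
    """Every message whose UID has not been recorded as delivered."""
    return [num for num, uid in listing if uid not in delivered_uids]


# Simulated UIDL response for a 10-message mailbox.
lines = [f"{n} uid-{n:02d}".encode("ascii") for n in range(1, 11)]

# Only mails 1, 3, 6 and 8 were delivered successfully in earlier polls.
delivered = {"uid-01", "uid-03", "uid-06", "uid-08"}

listing = parse_uidl_lines(lines)
to_fetch = messages_to_fetch(listing, delivered)
print("fetch in this poll:", to_fetch)
```

A UID is added to the delivered set only after the mail has really
been delivered, so skipped or failed mails are retried in the next
poll regardless of any server-side seen flag.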

I am not sure if the pop3_slowuidl() code should be kept. Is there
anyone using the slow uidl method which downloads the headers of all
mails and uses the message-id as the UID? For multidrop, the
assumption that each mail has a unique message-id is incorrect: if
the same mail is delivered once per recipient to the same multidrop
mailbox, every copy carries the same message-id. Also, downloading
the headers of all mails just to get the message-id wastes a lot of
bandwidth.
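
The multidrop problem can be shown with two copies of one mail. This
is a Python sketch using the standard email module; the addresses,
the message-id and the X-Envelope-To header are invented for the
example.

```python
from email import message_from_string

# Two copies of the same mail in one multidrop mailbox, one per
# recipient. Only the (illustrative) envelope header differs.
copy_for_alice = (
    "X-Envelope-To: alice@example.org\n"
    "Message-ID: <20031010.1234@sender.example.com>\n"
    "Subject: meeting\n"
    "\n"
    "body\n"
)
copy_for_bob = copy_for_alice.replace("alice@", "bob@")

messages = [message_from_string(raw) for raw in (copy_for_alice, copy_for_bob)]

# Using the message-id as a substitute UID collapses both copies into one.
pseudo_uids = {msg["Message-ID"] for msg in messages}

print(len(messages), "mails on the server,",
      len(pseudo_uids), "distinct message-id(s)")
```

Once the first copy is recorded as seen, the second copy looks seen as
well, although it was meant for a different recipient.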

If the code for LAST and pop3_slowuidl() is removed, the UID code is
going to look much better and cleaner.

-- 
Sunil Shetye.