fetchmail-friends

[fetchmail] [PATCH] fetching uids of mails

2003-10-01 01:42:26
[ This idea is similar to the one in the thread "fetching sizes of
mails"; the patch is to be applied after the patches in that thread. ]

Currently, fetchmail gets the UIDs of all mails right at the start.
This is a problem when there are too many mails in the mailbox
(especially when the connection is flaky). The current transaction
goes as:

POP3> STAT
POP3< +OK 10000 2000000                 # there are 10000 mails!
POP3> UIDL
POP3< +OK
POP3< 1 1064918388.20227_0.myserver.mydomain.com
POP3< 2 1064918389.20228_0.myserver.mydomain.com
POP3< 3 1064918390.20229_0.myserver.mydomain.com
...
POP3< 10000 1064928388.30227_0.myserver.mydomain.com
POP3< .

The size of each UID depends on the POP server. Assuming that each UID
is 40 bytes long, the average line is about 47 bytes (message number,
space, UID and CRLF), so nearly 470 kbytes are transferred before the
first mail is downloaded. This bandwidth is also wasted if a socket
error occurs.
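
The estimate above can be checked with a few lines of Python. This is
only a sketch under the stated assumption of 40-byte UIDs; the function
name uidl_listing_bytes is made up for illustration, and the "+OK" and
"." framing lines are ignored.

```python
def uidl_listing_bytes(num_mails, uid_len=40):
    """Estimate the size of the body of a full UIDL response.

    Each line is "<message number> <uid>\r\n". The uid_len default of 40
    is the assumption made in the text, not a property of any server.
    """
    total = 0
    for number in range(1, num_mails + 1):
        # digits of the message number + space + UID + CRLF
        total += len(str(number)) + 1 + uid_len + 2
    return total


total = uidl_listing_bytes(10000)
print(total, "bytes in total,", total / 10000, "bytes per line on average")
```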

Getting all the UIDs the first time may be a good idea. However,
getting the same UIDs again and again (especially in daemon mode) is
not strictly required.

Here is a patch which (in daemon mode) gets all the UIDs on the first
poll, then uses binary search to look for new UIDs in the subsequent
9 polls. The cycle is repeated every 10 polls.
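
A minimal sketch of that cycle, assuming polls are counted from zero.
The names FULL_UIDL_INTERVAL and uidl_strategy are hypothetical and do
not come from the patch.

```python
FULL_UIDL_INTERVAL = 10  # one full UIDL listing, then 9 binary-search polls


def uidl_strategy(poll_number):
    """Return which UID lookup a given poll would use in this sketch."""
    if poll_number % FULL_UIDL_INTERVAL == 0:
        return "full"
    return "binary-search"


for n in range(12):
    print(n, uidl_strategy(n))
```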

Now, the transaction (during binary search) will go as:

POP3> STAT
POP3> UIDL 5000                         # get the uid of 5000th mail, is old
POP3> UIDL 7500                         # is old
POP3> UIDL 8750                         # is old
POP3> UIDL 9375                         # is old
POP3> UIDL 9688                         # is old
POP3> UIDL 9844                         # is old
POP3> UIDL 9922                         # is new
POP3> UIDL 9883                         # is old
POP3> UIDL 9902                         # is new
POP3> UIDL 9892                         # is old
POP3> UIDL 9897                         # is old
POP3> UIDL 9899                         # is old
POP3> UIDL 9900                         # is old
POP3> UIDL 9901                         # is old

[ so 9901 is old and 9902 is new, start getting mails from 9902
onwards ]

POP3> LIST 9902                         # due to the previous patch on sizes
POP3> RETR 9902
POP3> UIDL 9903
POP3> LIST 9903
POP3> RETR 9903
...

When using binary search, this patch assumes that all new mails come
after all old mails in the mailbox (the basic idea has been borrowed
from pop3_slowuidl()!).
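
The search in the transaction above can be sketched as follows. This is
an illustration in Python, not the code from the patch: is_old(n)
stands in for sending "UIDL n" and looking the returned UID up in the
saved list, and first_new_mail is a made-up name. The sketch relies on
the same assumption that every new mail follows every old mail.

```python
def first_new_mail(count, is_old):
    """Return the number of the first new mail, or count + 1 if none is new.

    count  -- number of mails reported by STAT
    is_old -- callable taking a 1-based message number (stand-in for
              "UIDL n" plus a lookup in the saved UID list)
    """
    lo = 0          # highest number known to be old (0 is a sentinel)
    hi = count + 1  # lowest number known to be new (count + 1 is a sentinel)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_old(mid):
            lo = mid
        else:
            hi = mid
    return hi


# Simulate the mailbox from the example: 10000 mails, 9902 is the first new one.
probes = []

def simulated_is_old(n):
    probes.append(n)  # record every "UIDL n" that would be sent
    return n < 9902

first_new = first_new_mail(10000, simulated_is_old)
print("first new mail:", first_new)
print("UIDL queries sent:", len(probes))
```

With these bounds the simulated run sends 14 single-message UIDL
queries, in the same order as the transaction shown earlier, instead of
listing all 10000 UIDs.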

This patch also fixes the following bugs:

- If both 'keep' and 'expunge' are specified, fetchmail does a
  spurious logout+login after every 'expunge' mails even though no
  mails are getting deleted. Say:
  
  $ fetchmail -e 1 -k -p pop3
    # logs out after every mail download!

- If there is a socket error, the newsaved list is not freed.

- the numbers (associated with UIDs) in the oldsaved list are not
  reset at the end of a poll.

- the numbers (associated with UIDs) are stored in a 'short int'. This
  can cause an overflow when there are too many mails in the mailbox.

- fetchlimit does not work with uidl correctly in daemon mode. In the
  first poll, all the new UIDs are added to the newsaved list. When the
  fetchlimit is reached, the lists are swapped. In the next poll,
  fetchmail also sees the new UIDs in the oldsaved list and assumes
  that the mails are all old. Those mails never get downloaded.

- The assumption that a UID in the oldsaved list is necessarily old is
  not correct. This is a problem in daemon mode when not all mails get
  downloaded due to options like 'fetchlimit' or 'limit' and the UID
  lists get swapped. This assumption is made in pop3_getrange() and
  pop3_is_old(). Say:

  $ fetchmail -B 1 -U -d 100 -p pop3
    # gets only one of all the existing new mails!

- If there is a socket error, the mails which were seen before the
  socket error are downloaded again during the next poll. The correct
  solution is to add a new UID to both newsaved and oldsaved and mark
  it as seen in both the lists after download. Then, if the UID lists
  don't get swapped (say, due to a socket error), the already
  downloaded mails will not be downloaded again.
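
The last fix can be illustrated with a toy model. This is a sketch
only: the lists are modelled as Python dicts mapping UID to a "seen"
flag, whereas fetchmail itself is written in C, and the names poll,
SimulatedSocketError and fail_after are invented for the example.

```python
class SimulatedSocketError(Exception):
    """Stand-in for a dropped connection in this toy model."""


def poll(server_uids, oldsaved, fail_after=None):
    """Simulate one poll; return (downloaded UIDs, list kept for next poll).

    oldsaved maps UID -> seen flag. If fail_after is given, the
    connection "breaks" once that many mails have been downloaded.
    """
    newsaved = {}
    downloaded = []
    try:
        for uid in server_uids:
            if oldsaved.get(uid, False):
                newsaved[uid] = True  # already seen, carry it over
                continue
            if fail_after is not None and len(downloaded) >= fail_after:
                raise SimulatedSocketError("connection lost")
            downloaded.append(uid)
            # The fix: mark the UID as seen in BOTH lists after download.
            newsaved[uid] = True
            oldsaved[uid] = True
    except SimulatedSocketError:
        return downloaded, oldsaved  # lists are not swapped
    return downloaded, newsaved      # normal end of poll: lists are swapped


saved = {"uid-a": True}
server = ["uid-a", "uid-b", "uid-c"]

first, saved = poll(server, saved, fail_after=1)  # breaks after one download
second, saved = poll(server, saved)               # next poll, no error
print(first, second)
```

Because "uid-b" was also marked as seen in oldsaved before the
simulated error, the second poll in this model fetches only "uid-c".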

-- 
Sunil Shetye.

Attachment: fetchmail-6.2.4-fastuidl.patch
Description: Text document
