fetchmail-friends

Re: [fetchmail] Re: POP3 LAST vs UIDL

2003-10-12 05:10:07
Sunil Shetye <shetye(_at_)bombay(_dot_)retortsoft(_dot_)com> writes:

IMO, support for LAST should be removed from fetchmail itself. Also,
RFC 1725 was right in removing the LAST command as it is unreliable in
tracking unseen mails.

So, the only sane option left is UIDL. Matthias Andree is right!

Thanks for your support.

In spite of the complexity required in tracking new mails, UIDL is
in fact quite a safe and secure method for tracking new mails. One
problem was the extra traffic in downloading all UIDs. However, my
patch for using binary search to get new mails should at least
alleviate this problem.
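
The binary-search idea can be sketched roughly as follows. This is not
the patch itself, only an illustration in Python under one loud
assumption: the server lists messages in arrival order and the already
seen messages form a contiguous prefix of the mailbox. The names
first_unseen and get_uid are hypothetical; get_uid(n) stands for one
"UIDL n" round trip.

```python
def first_unseen(count, get_uid, seen):
    """Return the 1-based number of the first unseen message.

    Returns count + 1 if every message has been seen.
    Assumes (hypothetically) that seen messages form a contiguous prefix.
    """
    lo, hi = 1, count + 1            # the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if get_uid(mid) in seen:     # one "UIDL mid" round trip
            lo = mid + 1             # everything up to mid is seen
        else:
            hi = mid                 # first unseen is at or before mid
    return lo
```

This needs about log2(n) single-message UIDL queries instead of one
full listing. If the prefix assumption does not hold (for example,
after messages were deleted by another client), a full UIDL is still
needed.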

There are two issues with UIDL currently:

1. the linear list makes each UID lookup O(n), so checking every
   message is O(n^2), which is prohibitively expensive. I reduced some
   of the (function call) overhead by making the recursive function
   iterative instead, which is a constant-factor speed-up of about
   three, but it doesn't fix the problem that with --keep, fetchmail
   takes many seconds to find out which mail it has seen and which it
   has not.

2. Eric is chary about touching UID code, and he's probably right: it's
   delicate equipment.

   As suggested before, I'll repeat that there should be one UID file
   per account - that is, per (user, server) tuple - so we don't need
   to worry about swapping and saving. Saves memory as well. There are
   several approaches:

   * use a database (BerkeleyDB, GDBM),
   * use a flat text file, but read it into a hash or rbtree.
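
To illustrate the flat-file-plus-hash variant, here is a minimal Python
sketch. It is not fetchmail code; the file naming scheme and the
function names (uid_file_path, load_seen, save_seen) are made up for
the example. The point is only that a hash set gives O(1) average
lookups, so checking n messages is O(n) rather than O(n^2).

```python
import os
import tempfile

def uid_file_path(base_dir, user, server):
    # Hypothetical naming scheme: one file per (user, server) pair.
    return os.path.join(base_dir, f"uids-{user}@{server}")

def load_seen(path):
    """Read one UID per line into a set; a missing file means nothing seen."""
    try:
        with open(path, encoding="ascii") as f:
            return {line.rstrip("\n") for line in f if line.strip()}
    except FileNotFoundError:
        return set()

def save_seen(path, seen):
    """Write the set via a temp file in the same directory, then rename."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="ascii") as f:
        for uid in sorted(seen):
            f.write(uid + "\n")
    os.replace(tmp, path)   # replaces the old file in one step
```

With this, "have we seen this mail?" becomes `uid in seen`, and each
account's file can be loaded and dropped independently of the others.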

Also, once LAST is removed, the issue of TOP-vs-RETR is meaningless.

Sort of. TOP still has one potential use: aiding filtering. Assume we're
running with an antispam list or a potential future "policy"
extension(1) (that would be a program that is shown the mail headers
and then says ACCEPT, BOUNCE or DISCARD). IMAP4 allows retrieving
headers only, and so does POP3. With "TOP 1234 0", we'd peek at the
headers, pass these to the policy extension and, if it says BOUNCE or
DISCARD, we can drop the mail without saying RETR. Such a mechanism is
very useful when a mail virus or spam can be told from the headers
already. Of course, such bandwidth reduction is only useful if the mail
is large enough that download time outweighs transit time(2).
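
A rough sketch of that flow, in Python rather than fetchmail's C. The
policy interface does not exist yet, so example_policy and
fetch_filtered are hypothetical, as is the X-Spam-Flag rule. The
client is assumed to offer top() and retr() methods shaped like those
of Python's poplib.POP3, which return (response, lines, octets).

```python
ACCEPT, BOUNCE, DISCARD = "ACCEPT", "BOUNCE", "DISCARD"

def example_policy(header_lines):
    # Hypothetical rule: discard mail already flagged as spam upstream.
    for line in header_lines:
        if line.lower().startswith(b"x-spam-flag: yes"):
            return DISCARD
    return ACCEPT

def fetch_filtered(client, msgnum, policy):
    """Peek at the headers with TOP; RETR only if the policy accepts.

    Returns (verdict, message_lines), with message_lines None
    when the mail was not retrieved.
    """
    _resp, headers, _octets = client.top(msgnum, 0)   # "TOP msgnum 0"
    verdict = policy(headers)
    if verdict != ACCEPT:
        return verdict, None                          # no RETR issued
    _resp, lines, _octets = client.retr(msgnum)
    return verdict, lines
```

What to do with a BOUNCE verdict (generate a bounce, delete the
message, and so on) is left to the caller here.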

I am not sure if the pop3_slowuidl() code should be kept. Is there
anyone using the slow uidl method which downloads the headers of all
mails and uses the message-id as the UID?

I have been running "proto pop3 uidl" for ages. ALL my upstream servers,
without any exception, support UIDL. That is a standard service users
should expect; if their ISP doesn't support POP3 + UIDL, the user should
complain - or switch ISPs.

Footnotes:

(1) Such a program may be run twice, once for "headers only", once for
    the full mail. It is a necessary replacement for "antispam" when MDA
    is configured, since MDA will not have anything to match UIDL
    against. (Of course, we could list exit codes in antispam and
    discard anything that caused the MDA to exit with code 77,
    EX_NOPERM).

(2) fetchmail should consider pipelining commands (except authentication
    and CAPA) and should also eliminate all voluntary delays. Any POP3
    server that needs client delays after sending a reply is broken and
    should be declared unsupported. Users will urge their ISP to replace
    the broken upstream software. I know of no user who ever complained
    about "getmail" (by Charles Cazabon) not "waiting" for server locks
    to be cleared or something. With pipelining, the mail fetch rate can
    be higher than 1/RTT - and the RTT can be as high as 1/4 s for
    interleaved DSL links, which means that without pipelining,
    regardless of bandwidth, only 4 mails per second make it.

    The changes to fetchmail would be minor: it would need to send off,
    say, 100 RETR commands in a row and then read back the replies. Care
    must be taken not to exhaust the TCP send buffer, to prevent
    deadlocks.
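
A minimal Python sketch of the batching idea, not fetchmail code. The
function name retr_pipelined and the batch parameter are made up for
the example, and it assumes the server tolerates pipelined commands at
all. Bounding the batch is one simple way to keep the outstanding
command bytes far below the send buffer size; a real implementation
might prefer non-blocking I/O.

```python
def retr_pipelined(rfile, wfile, msgnums, batch=100):
    """Send RETR commands in batches, then read the replies in order.

    rfile/wfile are binary file-like objects for the connection, e.g.
    sock.makefile("rb") and sock.makefile("wb") for a socket.
    Returns one entry per message: a list of lines, or None on -ERR.
    """
    messages = []
    for start in range(0, len(msgnums), batch):
        chunk = msgnums[start:start + batch]
        # One write for the whole batch; the commands are tiny.
        wfile.write(b"".join(b"RETR %d\r\n" % n for n in chunk))
        wfile.flush()
        for _ in chunk:
            status = rfile.readline()
            if not status.startswith(b"+OK"):
                messages.append(None)          # -ERR: no body follows
                continue
            lines = []
            while True:
                line = rfile.readline()
                if line in (b".\r\n", b""):    # terminator or EOF
                    break
                if line.startswith(b".."):     # undo dot-stuffing
                    line = line[1:]
                lines.append(line)
            messages.append(lines)
    return messages
```

Each batch then costs roughly one round trip plus transfer time,
instead of one round trip per message.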

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95