fetchmail-friends
[Top] [All Lists]

[fetchmail][PATCH][FIX] fetchmail 5.9.0/5.9.7 ssl-related hang

2002-02-10 06:44:12
Hello,

I believe I finally tracked and fixed the long-standing "server troubles
with SSL cause fetchmail to hang with 100% CPU" bug.


This mail is somewhat long, I'm sectioning it to make it more concise.
This mail has six sections:

  1   non-technical problem description - for users
2,3,4 technical problem description - for hackers
  5   workaround for users - for users
  6   patch to fix the problems - for Eric and people who want to go
                                  figure if this fixes their problems


1. PROBLEM:
   The chat is POP3/SSL: USER, +OK, PASS, -ERR, QUIT -> hang

   The server drops the connection right after -ERR, so there's no
   chance the QUIT can get through.

   fetchmail then hangs with 100% CPU. I recall that I had these hangs
   in a different place when the server died earlier after accepting the
   connection.

2. DIAGNOSIS: (you can safely skip to section 5 below unless you
   understand fetchmail's internals, without missing important
   information.)

   strace shows that fetchnews loops on these syscalls:

   getpid()                                = 2826
   read(3, "", 5)                          = 0
   read(3, "", 5)                          = 0

   lsof says about fd #3:

   fetchmail 2826 emma    3u  sock    0,0         129440 can't identify protocol
   At that time, the IPv4 connection has gone away.

3. REPRODUCTION: I have a test program (Perl) that allows me - when used
   with stunnel - to reproduce the problem:

------------------------------------------------------------------------
#! /usr/bin/perl
# NAME THIS PROGRAM fakepop3.pl ####################################
$| = 1;
for(;;) {
        print "+OK POP3 Fakedaemon ready.\r\n";
        if (<> !~ /^user\s/i) { print "-ERR\r\n"; next; }
        print "+OK\r\n";
        if (<> !~ /^pass\s/i) { print "-ERR syntax\r\n"; next; }
        print "-ERR Datenbankfehler\r\n"; close(STDOUT); last;
}
------------------------------------------------------------------------

   Save this script as fakepop3.pl, create some fake /tmp/server.pem and
   launch the server with (maybe stunnel can work without certificate,
   didn't check):
   /usr/sbin/stunnel -d 1995 -l fakepop3.pl -p /tmp/server.pem

   Then add this to .fetchmailrc (you may need to comment out other "poll
   localhost" sections)
   poll localhost with proto pop3 uidl port 1995
   user x password y is junkme here ssl

   Then run fetchmail -Nvd0 localhost.

4. BUGS FOUND: I tracked this down to some bugs in socket.c.

   #1 socket.c uses ERR_get_error rather than SSL_get_error, however,
   SSL_read/SSL_peek are supposed to use SSL_get_error. Not sure if that
   matters, I've fixed this anyhow.

   #2 socket.c does not call SSL_get_error when SSL_peek or SSL_read
   return 0, but it must call SSL_get_error in that case to distinguish
   between "no data" and "error" conditions. (Talk about a strange API.)

5. WORKAROUND: For fetchmail users, a workaround is to install stunnel
   and reconfigure .fetchmailrc to use something similar to this:

   plugin "/usr/sbin/stunnel -c -r pop3.web.de:995"

   Put this after "poll" and before "user". Remember to adjust your
   server and port. Mind you, stunnel had some issues, be sure to get
   the latest version (at least 3.22) from http://stunnel.mirt.net/
   (official) or http://www.stunnel.org/ (trustworthy inofficial).

6. PATCH/FIX: Here's a patch against fetchmail 5.9.7 that fixes the
   problem for me. Tested against OpenSSL 0.9.6something on SuSE Linux
   7.3, not sure about older SSL versions. The patch is (C) 2002 by
   Matthias Andree, and is put under a triple LGPL/GPL/MIT
   license. Choose what suits your needs.

   This patch also applies against fetchmail 5.9.0, at least when
   applied with GNU patch, it will report offsets of consistently 33
   lines. A patch against fetchmail 5.9.0 can be found at
   http://studserver.uni-dortmund.de/~su0186/fetchmail-5.9.0.patch --
   but it is untested, I only tested fetchmail 5.9.7.

   Eric, please consider the patch for 5.9.8, and if you apply it,
   please release another 5.9.x to get a wide test base before jumping
   to 5.10.0 or 6.0.0, as I fear build woes with older OpenSSL/SSLEAY
   versions.

--- ./socket.c.orig     Sun Feb 10 13:54:53 2002
+++ ./socket.c  Sun Feb 10 14:20:29 2002
@@ -563,13 +563,14 @@
                        later change the behavior of SSL_peek
                        to "fix" this problem...  :-(   */
                if ((n = SSL_peek(ssl, bp, len)) < 0) {
+                       (void)SSL_get_error(ssl, n);
                        return(-1);
                }
                if( 0 == n ) {
                        /* SSL_peek says no data...  Does he mean no data
                        or did the connection blow up?  If we got an error
                        then bail! */
-                       if( 0 != ( n = ERR_get_error() ) ) {
+                       if( 0 != ( n = SSL_get_error(ssl, n) ) ) {
                                return -1;
                        }
                        /* We didn't get an error so read at least one
@@ -581,8 +582,13 @@
                        newline = NULL;
                } else if ((newline = memchr(bp, '\n', n)) != NULL)
                        n = newline - bp + 1;
-               if ((n = SSL_read(ssl, bp, n)) == -1) {
-                       return(-1);
+               /* Matthias Andree: SSL_read can return 0, in that case
+                * we must cal SSL_get_error to figure if there was
+                * an error or just a "no data" condition */
+               if ((n = SSL_read(ssl, bp, n)) <= 0) {
+                       if ((n = SSL_get_error(ssl, n))) {
+                               return(-1);
+                       }
                }
                /* Check for case where our single character turned out to
                 * be a newline...  (It wasn't going to get caught by
@@ -635,16 +641,20 @@
 #ifdef SSL_ENABLE
        if( NULL != ( ssl = SSLGetContext( sock ) ) ) {
                n = SSL_peek(ssl, &ch, 1);
+               if (n < 0) {
+                       (void)SSL_get_error(ssl, n);
+                       return -1;
+               }
                if( 0 == n ) {
                        /* This code really needs to implement a "hold back"
                         * to simulate a functioning SSL_peek()...  sigh...
                         * Has to be coordinated with the read code above.
                         * Next on the list todo...     */
 
-                       /* SSL_peek says no data...  Does he mean no data
+                       /* SSL_peek says 0...  Does that mean no data
                        or did the connection blow up?  If we got an error
                        then bail! */
-                       if( 0 != ( n = ERR_get_error() ) ) {
+                       if( 0 != ( n = SSL_get_error(ssl, n) ) ) {
                                return -1;
                        }
 

-- 
Matthias Andree

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."         Benjamin Franklin


<Prev in Thread] Current Thread [Next in Thread>