ietf-clear

[clear] pyCSV

2005-08-18 09:41:10
John,

Thanks for your excellent review of this code.  Your understanding of 
Python is good enough to have caught some subtle problems.

The main issue is whether we should include an MTA action other than 
'ACCEPT', 'REJECT', or 'TEMPFAIL'.  I guess I should have said that this 
routine is to be called *after* it is known that the sender offers CSV 
authentication.  In that case it makes sense to reject when the promised 
CSV records are not there.

That still leaves the "weight==3" case, which I did not know about, and 
don't understand.  If a sender can't make a clear statement as to the 
authorization of all servers with A records under a name that he chose, I 
don't see any value in having CSV records for those servers.  We can 
certainly add an "unknown" action to our list, and let receivers decide 
what to do, but I would much rather have a simple script returning a clear 
recommended action.  SPF suffers from this kind of ambiguity, and I thought 
CSV was supposed to avoid it.

If there is a clear consensus among the CSV folks, I will add another 
possible action to this script, but I recommend avoiding this 
ambiguity.  If a sender promises CSV, then can't be definite about his CSV 
authorizations, he shouldn't get any advantages from "offering" CSV.  Note 
also that all *actions* are ultimately the choice of the 
receiver.  Modification of this script should be easy enough that you can 
change 'REJECT' to 'ACCEPT' and not even have to re-compile the MTA.
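
As a minimal sketch of that idea (not taken from pycsv.py), the receiver's choice could live in a small table that maps a result to an action. The names POLICY and action_for, and the 'DNS_ERROR' result, are hypothetical and used only for illustration.

```python
# Hypothetical receiver-side policy table (an assumption, not part of pycsv.py).
# Editing this table changes the MTA action without touching the MTA itself.
POLICY = {
    'PASS': 'ACCEPT',
    'FAIL': 'REJECT',        # a lenient receiver could change this to 'ACCEPT'
    'DNS_ERROR': 'TEMPFAIL',
}

def action_for(result, policy=POLICY):
    """Return the MTA action that the given policy assigns to a CSV result."""
    return policy[result]
```

A receiver who prefers to accept on failure would pass a modified copy, for example action_for('FAIL', dict(POLICY, FAIL='ACCEPT')).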

I've updated the code to correct the other problems you pointed out.  See 
http://purl.net/macquigg/email/python/pycsv.py for the latest rev.

At 11:23 PM 8/17/2005 -0400, John Leslie wrote:

David MacQuigg <dmquigg-clear(_at_)yahoo(_dot_)com> wrote:

Following is the first draft of my script to check CSV records, and it
seems to work correctly on the examples in the docstrings.  The script was
written with Sendmail in mind, but it should interface easily with just
about any MTA, since the return values are pretty much universal.

Suggestions are welcome, particularly anything I haven't thought of.

# pyCSV.py  8/16/05  David MacQuigg

   I assume this is written in Python, which I'm not familiar with.
(So my comments may be off-base, alas...)

Your comments are right on target.  I'm glad to see Python is living up to 
its promise of self-documenting clarity.  Any remaining obscurity is due to 
my being new to the language.  I'm just learning about executable 
doc-strings and other cool stuff.

def csv(IP, helo):

   I assume "IP" is the remote IP address of the TCP connection, and
"helo" is the string from the HELO/EHLO command.

Correct.  These items are passed in from the MTA some time after the HELO 
command.

    Returns ( action, SMTP_reply, header )

      action: 'ACCEPT', 'REJECT', 'TEMPFAIL'

   CSV most often produces a result I call "unknown", in that the IP
address is neither known to be an authorized SMTP client nor known to
not be an authorized SMTP client. There are several ways this can happen.

See comments above.  In my SPF wrapper, I use 'CONTINUE' for an uncertain 
result, meaning - This test was a waste, try some other method.

      SMTP_reply = ( SMTP_code, Xcode, explanation )
        SMTP_code: SMTP Reply Code per RFC-2821
        Xcode:  Enhanced Mail System Status Code per RFC-3463

      header = {'label': 'Authent:',
                 'text': '%s %s CSV %s' % (IP, helo, result) }
        result: 'PASS', 'FAIL'

Examples:
csv('168.61.5.27', 'harry.mail-abuse.org')
('ACCEPT', (250, '', 'Sender CSV OK'), \
[{'text': '168.61.5.27 harry.mail-abuse.org CSV PASS', 'label':
'Authent:'}])

   This is an "authenticated and authorized" case, deserving the full
trust that local policy assigns to "harry.mail-abuse.org".

csv('192.168.0.64', 'harry.mail-abuse.org')
('REJECT', (550, '5.7.1', \
"'192.168.0.64' not authorized by 'harry.mail-abuse.org'"), [])

   This is a "client is not authenticated" case, where a "complete"
list of IP addresses for SMTP clients authorized to use the HELO string
"harry.mail-abuse.org" was returned, but the actual IP address of the
connection is not one of them. Rejection of all email is appropriate.

csv(IP, 'yahoo.com')
('REJECT', (550, '5.7.1', "No SRV record for '_client._smtp.yahoo.com'."),
[])

   This is an "unknown" case, where no SRV record is published. CSV
yields no information on whether the SMTP client issuing that HELO is
authorized or not. Rejection of the email is NOT appropriate.

I expect yahoo and other reputable senders will tell us what authentication 
methods they offer by publishing that information in DNS, at an agreed 
location like '_auth.yahoo.com'.  Senders with unknown methods will get the 
same spam filtering as the rest of the unauthenticated mail.  In the 
example above, we assume that yahoo has declared that it offers CSV, but 
there were, in fact, no CSV records at the expected location.  This is like 
a hardware failure, something that should be resolved quickly by the 
sender.  They should either remove CSV from the declarations in their _auth 
record, or add the required CSV records.
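
A rough sketch of the calling order assumed here: the receiver first learns which methods the sender declares, and only then runs csv(). How the declaration is fetched and parsed is left out, since no record format has been agreed; choose_check and its arguments are hypothetical names.

```python
def choose_check(declared_methods, supported=('CSV',)):
    """Return the first declared method the receiver supports, or None.

    declared_methods: method names the sender has published (assumed to have
    been read from DNS elsewhere; the record format is not defined here).
    """
    for method in declared_methods:
        if method.upper() in supported:
            return method.upper()
    return None
```

If choose_check() returns 'CSV', the receiver would call csv(IP, helo); if it returns None, the mail gets the same filtering as other unauthenticated mail.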

    ## Get an SRV record for the helo name:
    name = '_client._smtp.' + helo
    try:
        reqobj = DNS.Request(name, qtype='SRV', timeout=DNS.timeout)
        resp = reqobj.req()

    except DNSError, expln:
        exp = str(expln)
        if exp == 'Timeout':
            msg = ("Timeout getting SRV record for '%s'.\n" % name
                 + "Try again later."  )
            return ('TEMPFAIL', (450, '', msg), [] )

   This differs (slightly) from the "unknown" case, in that there might
be a SRV record, and a later retry might yield a different result. It
is reasonable to return a temporary error or to treat this as "unknown".

        else:
            msg = exp + "\nDNS error getting SRV record for '%s'" % name
            return ('REJECT', (550, '?.?.?', msg), [] )

   This is a case of messed-up DNS. CSV expresses no opinion on what
is the most reasonable action; but I personally tend towards treating
it as an "unknown" case.

I've changed this to a 'TEMPFAIL', and added an ALERT to the receiver's 
admin.  After looking at the DNS.py code, and trying a few extreme tests, 
like disconnecting my router in the middle of a session, I see that most of 
the error messages relate to problems on the receiver's end ("No working 
nameservers", "No route to host", etc.).
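
One way the revised handling might look, as a sketch only: every DNS error becomes a TEMPFAIL, and anything other than a timeout also produces an alert string for the admin. The function name, the fourth return value, and the use of code 450 for the non-timeout case are assumptions, not the contents of the updated pycsv.py.

```python
def classify_dns_error(explanation, name):
    """Map a DNS error string to (action, SMTP_reply, headers, alert).

    alert is None for a plain timeout, otherwise a message for the
    receiver's admin (hypothetical fourth element, for illustration).
    """
    if explanation == 'Timeout':
        msg = "Timeout getting SRV record for '%s'.\nTry again later." % name
        return ('TEMPFAIL', (450, '', msg), [], None)
    # Anything else is likely a problem on the receiver's side.
    msg = "%s\nDNS error getting SRV record for '%s'" % (explanation, name)
    alert = "ALERT: DNS problem while checking '%s': %s" % (name, explanation)
    return ('TEMPFAIL', (450, '', msg), [], alert)
```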

    ## Check for too few or too many SRV records:
    lr = len(resp.answers)
    if lr == 0:
        exp = "No SRV record for '%s'." % name   # or non-existent domain
        return ('REJECT', (550, '5.7.1', exp), [])

   I don't know what the DNS.Request response format is. No SRV RR is
a normal case, and should yield an "unknown" result.

    if lr  > 1:
        exp = "Found %s SRV records for '%s'. Should be 1." % (lr, name)
        return ('REJECT', (550, '5.7.1', exp), [])

   This does not allow for multiple CSV versions (distinguished by the
"priority" field). It would be better to count the priority==1 records.
(More than one of those _is_ an error.)

Good catch.  I'm now counting only the priority=1 records, ignoring others 
that may have a different version number.

    ## Extract the needed info from a version 1 record:
    count = 0
    for ans in resp.answers:
        data = ans['data']
        version = data[0]   # CSV version (SRV priority field)
        if version != 1:
            continue  # ignore any records that are not version 1
        count += 1
        weight   = data[1]  # authorization ( 1 = NO, 2 = YES )
        port     = data[2]  # subdomain authorization
                            # ( 0 = unspecified, 1 = CSV required )
        target   = data[3]  # authorized hostname (ID)
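
The same selection can be sketched as a standalone function that also reports the "too few" and "too many" cases. It assumes each answer is a dict whose 'data' entry is a (priority, weight, port, target) sequence, as in the excerpt above; select_v1 and its status strings are hypothetical.

```python
def select_v1(answers):
    """Pick the single version 1 SRV record from a list of answers.

    Returns (status, data), where status is 'OK', 'NONE' or 'MULTIPLE'
    and data is the record's (priority, weight, port, target) or None.
    """
    # Records with any other priority (CSV version) are ignored.
    v1 = [ans['data'] for ans in answers if ans['data'][0] == 1]
    if not v1:
        return ('NONE', None)
    if len(v1) > 1:
        return ('MULTIPLE', None)   # more than one version 1 record is an error
    return ('OK', v1[0])
```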

    ## Extract the needed info from the response:
    ra0 = resp.answers[0]
    rad = ra0['data']
    priority = rad[0]  # CSV version

   This should be checked. The recommended action is to ignore all
SRV records with a version you don't know (currently only version 1
is knowable).

Done.

    weight   = rad[1]  # authorization ( 1 = NO, 2 = YES )
    port     = rad[2]  # subdomain authorization
                       # ( 0 = unknown, 1 = CSV required )
    target   = rad[3]  # authorized hostname (ID)

    if weight != 2:
        exp = "'%s' not authorized to send mail" % target
        return ('REJECT', (550, '5.7.1', exp), [])

   Hopefully, weight==3 won't be a common case, but you should allow
for it. It means that some SMTP client(s) are authorized to use that
HELO string, but the list of IP addresses is not available in DNS.
Thus, weight==3 is another "unknown" case.

Assuming we run a CSV check *only* for domains that have declared their 
offering of CSV, I think this is the *only* case with an "unknown" 
result.  The question for CSV folks is whether we should add an ambiguous 
*action* for this one edge case, or insist that all CSV publishers figure 
out where their public mail servers are and make their CSV records unambiguous.
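
To make the question concrete, here is a sketch of how the weight field could be interpreted if the "unknown" case were kept as a result rather than an action. The result names are hypothetical; the weight meanings (1 = NO, 2 = YES, 3 = authorized clients exist but are not listed in DNS) are the ones discussed in this thread.

```python
def interpret_weight(weight):
    """Translate the SRV weight field into an intermediate CSV result."""
    if weight == 2:
        return 'CHECK_ADDRESS'    # authorized: go on to compare A records
    if weight == 3:
        return 'UNKNOWN'          # the one edge case under discussion
    return 'NOT_AUTHORIZED'       # weight 1, or anything unexpected
```

The receiver's policy would then decide whether 'UNKNOWN' maps to REJECT, as recommended above, or to some softer action.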

    ## Check the A records for the authorized name:
    try:
        reqobj = DNS.Request(target, qtype='A', timeout=DNS.timeout)
        resp = reqobj.req()

   (This, of course, doesn't work for IPv6.)

I've added a comment at the appropriate place, so we don't forget to add 
IPv6 later.
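
A possible starting point for that later work, written for Python 3 (the ipaddress module is not available in the Python version the script targets): choose the record type from the address family, and compare addresses as parsed values so that different spellings of the same IPv6 address match. Both function names are hypothetical.

```python
import ipaddress

def address_qtype(ip):
    """Return the DNS record type that holds addresses of this IP's family."""
    return 'AAAA' if ipaddress.ip_address(ip).version == 6 else 'A'

def same_address(ip, record_data):
    """Compare two address strings by value, not by spelling."""
    return ipaddress.ip_address(ip) == ipaddress.ip_address(record_data)
```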

    except DNSError, expln:
        exp = str(expln)
        if exp == 'Timeout':
            msg = ("Timeout getting A records for '%s'.\n" % target
                 + "Try again later."  )
            return ('TEMPFAIL', (450, '', msg), [] )

   A temporary SMTP error seems the most appropriate here.

        else:
            msg = exp + "\nDNSError getting A records for '%s'" % target
            return ('REJECT', (550, '?.?.?', msg), [] )

   This is another messed-up DNS situation. Again, CSV expresses no
opinion on what action is most reasonable.

Changed to TEMPFAIL, as above.

    ## Make a list of the authorized IP addresses:
    aa = []
    for ans in resp.answers:
        aa.append(ans['data'])

    ## Check incoming IP against the list:
    if IP in aa:
        action = 'ACCEPT'
        SMTP_reply = (250, '', 'Sender CSV OK')
        header = {'label': 'Authent:',
                    'text': '%s %s CSV PASS' % (IP, helo) }

   In principle, I quite approve of adding a header; however, the
devil is in the details...

The header here includes only the essential information needed by 
downstream receivers.  I have allowed for more than one header, in case a 
method has its own (e.g. Received-SPF:).  We can also add optional 
key=value pairs at the end of the Authent: header.
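
A small sketch of how such a header might be built, with the optional key=value pairs appended in sorted order. The function name and the way extras are passed are assumptions; only the 'label' and 'text' layout comes from the docstring.

```python
def authent_header(ip, helo, method, result, **extras):
    """Build an Authent: header dict in the {'label', 'text'} form used above.

    extras: optional key=value pairs appended after the result
    (hypothetical; no particular keys are defined in this thread).
    """
    text = '%s %s %s %s' % (ip, helo, method, result)
    for key in sorted(extras):
        text += ' %s=%s' % (key, extras[key])
    return {'label': 'Authent:', 'text': text}
```

Since the routine returns a list of headers, a method with its own header could simply append a second dict to that list.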

        return (action, SMTP_reply, [header] )

   This is the "authenticated and authorized" case.

    else:
        action = 'REJECT'
        SMTP_reply = (550, '5.7.1',
            "'%s' not authorized by '%s'" % (IP, helo) )
        return (action, SMTP_reply, [] )

   Here we've collected a complete list (assuming DNS.Request works
the way I guess it does), and the actual IP address used is not on
the list. Rejection of email is appropriate.

   IMHO, it would be better to design this routine to have an "unknown"
return action.

Unknown is a result, not an action.  The SMTP action could be 'ACCEPT', and 
we would add an UNVERIFIED keyword to the header.

Authent: 168.61.5.27 harry.mail-abuse.org CSV1 UNVERIFIED

Or we could do as we have done with SPF and add a 'CONTINUE' action.
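
The 'CONTINUE' idea could be sketched roughly as follows: each check returns the usual (action, SMTP_reply, headers) tuple, and a caller moves on to the next method whenever the action is 'CONTINUE'. The run_checks name and the fallback of accepting with an UNVERIFIED header are assumptions for illustration, combining the two options mentioned above.

```python
def run_checks(checks, ip, helo):
    """Try each authentication check in turn until one gives a definite action.

    checks: callables taking (ip, helo) and returning
    (action, SMTP_reply, headers), like csv().
    """
    for check in checks:
        action, reply, headers = check(ip, helo)
        if action != 'CONTINUE':
            return (action, reply, headers)
    # No method was conclusive: accept, but mark the message as unverified.
    header = {'label': 'Authent:', 'text': '%s %s UNVERIFIED' % (ip, helo)}
    return ('ACCEPT', (250, '', 'Sender not verified'), [header])
```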

   Also note that this routine makes no attempt to search for parent
domains which specify whether CSV records are published for all the
subdomains which are authorized SMTP servers. No such search is
_required_ by the spec, but it is helpful to have some way of detecting
this case.

It seems to me that the need to search parent domains is an unfortunate 
consequence of not knowing up front what methods a domain offers.  Again, 
my assumption is that the csv() check is called *after* we know that CSV 
records are offered for a specified HELO name.  If the promised records are 
not there, it's an immediate REJECT.

--
Dave
************************************************************     *
* David MacQuigg, PhD     email: david_macquigg at yahoo.com     *  *
* IC Design Engineer            phone:  USA 520-721-4583      *  *  *
* Analog Design Methodologies                                 *  *  *
*                                 9320 East Mikelyn Lane       * * *
* VRS Consulting, P.C.            Tucson, Arizona 85710          *
************************************************************     *

