On Wed, 01 Dec 2010 21:39:37 PST, Jon Steinhart said:
One of the big pieces that's needed is a modern mail parser. As per earlier emails, I think that this is complex enough that it's a job for lex and yacc. A big thing that someone could do to help me with this would be to collect all of the various grammar into a single document. I'm willing to write the code for it, but I'm not a complete rfc junkie and find the whole thing hard to read. If some of you could slog through the rfcs and collect this stuff we could make some real progress.
There are several ways to go here. The actual grammar is (mostly) in RFC5322, except for the MIME headers, which are mostly simple enough that a simple ad-crock parser should be able to deal with them: just "Fieldname: [tag=value]*" for the most part. Parsing the tag/value pairs is easy. The semantics are a pain because they're often context-sensitive (ignore this tag unless this other tag doesn't say 'inline', etc.).

Large chunks of the grammar are there only for crufty corner cases (if anybody is interested, read section 3.1.4 of RFC822 for an example of its awesomeness).

Just because I remembered seeing it before, here's an RFC822 address validator, done as one Perl regexp:

http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

Yes, you really want to use lex/yacc to build a parser instead. :)

And then there's the question of what to do when certain other common MUAs and MTAs manage to ignore the RFCs and produce something ugly, although the biggest offender is still the various poorly written spamware out there. But since no spam filter is 100% effective, we *do* have to be robust in the face of crap. Unfortunately, parsers created from a BNF or similar tend to be a tad brittle when recovering from syntactically incorrect input (anybody ever had a missing ) or } leave an error message 500+ lines away from the actual error?).
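To make the "Fieldname: [tag=value]*" idea concrete, here is a rough sketch of that kind of ad hoc parser in Python. It is only an illustration, not nmh code: the names parse_mime_header and split_unquoted are made up for this example, and it assumes the header has already been unfolded onto one line. It ignores RFC822 comments and RFC 2231 parameter continuations, and it silently skips malformed parameters rather than giving up, which is the kind of tolerance for bad input discussed above.

```python
import re


def split_unquoted(text, sep):
    """Split text on sep, ignoring any sep inside a double-quoted string."""
    pieces, current = [], []
    in_quotes = escaped = False
    for ch in text:
        if escaped:
            # Character following a backslash inside quotes: keep verbatim.
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            current.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current))
    return pieces


def parse_mime_header(line):
    """Parse 'Name: value; tag=value; ...' into (name, value, params).

    Hypothetical helper for illustration; assumes an unfolded header line.
    """
    name, colon, rest = line.partition(":")
    if not colon:
        raise ValueError("not a header line: %r" % (line,))
    parts = split_unquoted(rest, ";")
    value = parts[0].strip()
    params = {}
    for part in parts[1:]:
        tag, equals, val = part.partition("=")
        tag = tag.strip().lower()
        if not equals or not tag:
            continue  # be lenient: skip junk instead of failing
        val = val.strip()
        if len(val) >= 2 and val[0] == '"' and val[-1] == '"':
            # Drop the surrounding quotes and undo backslash escapes.
            val = re.sub(r"\\(.)", r"\1", val[1:-1])
        params[tag] = val
    return name.strip(), value, params


if __name__ == "__main__":
    print(parse_mime_header('Content-Type: text/plain; charset="utf-8"; format=flowed'))
```

Note that this only covers the easy part (splitting out the tag/value pairs). Deciding what a given tag means in the context of the other tags is left entirely to the caller.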