procmail

Re: new features for procmail

2005-11-14 02:41:40
Gary Funck wrote:

Ruud adds:
- time-based arithmetic, by conversion to epoch seconds,
  other mktime functions (presumably conversions between
  time zones)

I assumed UTC/GMT epoch seconds.
(either seems OK to me)


- unicode should be used as fully as possible

... without the general user having to know any specifics about it.

Because US-ASCII is a subset of ISO-8859-1, which in turn is a subset
of Unicode, you don't notice it until you need to go beyond ISO-8859-1.
Even people working in different 8-bit code pages often limit themselves
to ASCII for searches.


Gary adds:
[...]
- extensions should be "context free" in that when we exchange new
  procmail constructs on this list, it should be obvious from reading
  the recipe that it is new syntax (this argues against dependency
  upon the name of an .rc file, or a global variable, or a command
  line argument).

Maybe a 4-flag on the recipe, after (but maybe unconnected to) the :0.
That will certainly look strange enough. The '4' maybe implies the 'p'
we talked about earlier.

   :0 4   # using procmail-4 features
   * ^^From \S+ (.*)(?{ FROM_TIME = mktime($^N) })

Maybe a name other than mktime() would be better, as long as it returns
the seconds since the start of Jan 1, 1970 (UTC).
(64 bit!)
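
To make the intended return value concrete, here is a minimal Python sketch (not procmail syntax). The function name `from_line_epoch` and the example address are made up for illustration. It assumes the usual asctime-style date in the mbox "From " line and simply treats that date as UTC, as suggested above. Python integers are not limited to 32 bits, so the 64-bit concern does not show up here.

```python
import calendar
import time


def from_line_epoch(from_line):
    """Return epoch seconds (UTC assumed) for an mbox 'From ' line, or None."""
    # Expected shape: "From <sender> <asctime-style date>"
    parts = from_line.split(None, 2)
    if len(parts) < 3 or parts[0] != "From":
        return None
    try:
        parsed = time.strptime(parts[2].strip(), "%a %b %d %H:%M:%S %Y")
    except ValueError:
        # Date is not in the assumed format
        return None
    # timegm() interprets the broken-down time as UTC (mktime() would use local time)
    return calendar.timegm(parsed)


print(from_line_epoch("From ruud@example.org Mon Nov 14 02:41:40 2005"))
```

Note that `calendar.timegm` is used rather than `time.mktime`, because the latter interprets its argument in the local time zone.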


- add 'expr' functionality for evaluating simple expressions

Maybe implemented as a calc() function.
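
A rough idea of what a calc() might accept, sketched in Python rather than in procmail. This is only an assumption about the scope (integer arithmetic with + - * // % and parentheses); the name `calc` comes from the suggestion above, everything else is illustrative.

```python
import ast
import operator

# Operators this sketch allows; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}


def calc(expression):
    """Evaluate a simple integer expression without calling eval()."""

    def evaluate(node):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("unsupported expression")

    return evaluate(ast.parse(expression, mode="eval"))


# Number of whole days between two epoch timestamps
print(calc("(1131936100 - 1131331300) // 86400"))
```

Walking the parsed tree and whitelisting node types keeps variables, function calls and attribute access out of the evaluator.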


- mail address parsing support.  Reliably and correctly parse the
  addresses in TO: FROM: and any other headers.  Once the address
  has been broken out, support splitting it into name and address
  and host name part.

Read `perldoc -q address`, which points to
http://www.cpan.org/authors/Tom_Christiansen/scripts/ckaddr.gz
but also warns: there are deliverable addresses that aren't
RFC-822 (the mail header standard) compliant, and addresses
that aren't deliverable which are compliant.
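
For comparison, this is roughly what the requested splitting looks like with Python's standard `email.utils.getaddresses`. The helper name `split_addresses` and the example addresses are made up, and the caveat above still applies: parsing an address says nothing about whether it is deliverable.

```python
from email.utils import getaddresses


def split_addresses(header_value):
    """Split a To:/From:-style header value into name, address, local part and host."""
    result = []
    for name, address in getaddresses([header_value]):
        if "@" in address:
            # Split on the last '@' so a quoted local part may contain one
            local, _, host = address.rpartition("@")
        else:
            local, host = address, ""
        result.append({"name": name, "address": address, "local": local, "host": host})
    return result


for entry in split_addresses("Gary Funck <gary@example.com>, ruud@example.org"):
    print(entry)
```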



- provide support for parsing the TLD from a hostname

   :0 4
   * SOMEHOST ?? ([^.]+)\.([^.]+)$(?# to do: 'co.uk' etc.)
   {
      DOMAIN = $1
      TLD    = $2
   }

or even

   # the following match() returns a list of 2 elements
   (DOMAIN, TLD) = match( $SOMEHOST, /([^.]+)\.([^.]+)$/ )

   FROM_TIME = mktime( match( /^^From \S+ (.*)/ ) )



Maybe a variable name should always have a $-prefix:

   :0 4
   * $SOMEHOST =~ ([^.]+)\.([^.]+)$(?# to do: 'co.uk' etc.)
   {
      $DOMAIN = $1
      $TLD    = $2
   }

   ($DOMAIN, $TLD) = match( $SOMEHOST, /([^.]+)\.([^.]+)$/ )

   $FROM_TIME = mktime( match( /^^From \S+ (.*)/ ) )
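
The 'co.uk' to-do in the recipes above is the hard part: a regex alone cannot know which suffixes span two labels. A minimal Python sketch of one way around it, assuming a lookup table of such suffixes. The set `TWO_LABEL_SUFFIXES` below is a tiny hypothetical sample, not a complete list, and `split_host` is an illustrative name.

```python
# Hypothetical, deliberately tiny sample; a real table would be much larger.
TWO_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au"}


def split_host(host):
    """Return (domain, tld) for a host name, or None if it has fewer than two labels."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2:
        return None
    suffix_len = 1
    # Treat e.g. 'co.uk' as the suffix only if a label is left over for the domain
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES:
        suffix_len = 2
    domain = labels[-suffix_len - 1]
    tld = ".".join(labels[-suffix_len:])
    return domain, tld


print(split_host("lists.RWTH-Aachen.DE"))
print(split_host("mail.example.co.uk"))
```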


- ability to extract URL's and/or e-mail addresses from message bodies
  (arguably could be implemented with the new PCRE support, but this
  extraction would be builtin, faster, and syntax added for looping
  through the addresses/URL's)

These URLs often need quite some decoding, because they contain all
kinds of tricks to not get recognized, including Unicode-lookalikes of
familiar characters.
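
A small Python sketch of the extract-and-loop idea, with one decoding step. The regex is a simplification and the names are illustrative. It only undoes percent-encoding; the Unicode-lookalike problem mentioned above is not addressed here.

```python
import re
from urllib.parse import unquote

# Simplified pattern: http(s) URLs up to whitespace, angle brackets or quotes
URL_RE = re.compile(r"""https?://[^\s<>"']+""", re.IGNORECASE)


def extract_urls(body):
    """Return the URLs found in a message body, with percent-encoding decoded."""
    urls = []
    for match in URL_RE.finditer(body):
        # Drop punctuation that usually belongs to the sentence, not the URL
        url = match.group(0).rstrip(".,;:!?)")
        urls.append(unquote(url))
    return urls


body = "Buy at http://example.com/%76iagra now, or see <https://example.org/a>."
for url in extract_urls(body):
    print(url)
```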


- In addition to PCRE matching, implement "approximate matching",
  ala String::Approx (on CPAN)

http://search.cpan.org/~jhi/String-Approx/Approx.pm
I think it is hard to make that practical, since it basically works on
short strings like single words. It would not work very well against the
variant spellings of 'viagra', because the 'Levenshtein edit distance'
can be made arbitrarily big.
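
To illustrate the point about edit distance, here is a plain Levenshtein implementation in Python (a textbook dynamic-programming version, not what String::Approx does internally). One substituted character costs 1, but padding the word with separators already pushes the distance close to the length of the word itself.

```python
def levenshtein(a, b):
    """Return the Levenshtein edit distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


for variant in ("viagra", "v1agra", "v-i-a-g-r-a"):
    print(variant, levenshtein("viagra", variant))
```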



- Add a "lint" option to procmail, to have it scan the rc file for
  blatant errors and to diagnose possibly unsafe/incorrect recipes,
  without actually executing the script.

Like perl -c, see `man perlrun` or `perldoc perlrun`.


- Improve procmail's performance by having it statically compile
  scripts, where possible (recursive includes and includes that can't
  be statically evaluated throw this out), and use the statically
  compiled .pc-procmailrc as long as it is newer than .procmailrc,
  for example.

Many recursive includes can be statically evaluated.
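
The "newer than" test itself is simple; a Python sketch of it, using the file names from the suggestion above. The function name is made up, and the sketch deliberately ignores included rc files, which would each need the same check.

```python
import os


def compiled_is_usable(source_path, compiled_path):
    """Return True if the compiled file exists and is newer than its source."""
    try:
        return os.path.getmtime(compiled_path) > os.path.getmtime(source_path)
    except OSError:
        # Either file is missing or unreadable: fall back to the source
        return False


home = os.path.expanduser("~")
print(compiled_is_usable(os.path.join(home, ".procmailrc"),
                         os.path.join(home, ".pc-procmailrc")))
```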


Something else: the AND/OR issues of conditions.

Current way:

  EITHER  = '9876543210^0'
  OR      = $EITHER
  OR_EVEN = $EITHER

  NEITHER  = '!'
  NOR      = $NEITHER

  IF       = ''
  AND_ALSO = $IF

  IF_NOT        = '!'
  AND_NOT       = $IF_NOT
  AND_ALSO_NOT  = $IF_NOT

  :0
  *$ $EITHER   condition-A
  *$ $OR       condition-B
  *$ $OR_EVEN  condition-C
  action

  :0
  *$ $NEITHER  condition-A
  *$ $NOR      condition-B
  *$ $NOR      condition-C
  action

  :0
  *$ $IF        condition-A
  *$ $AND_ALSO  condition-B
  *$ $AND_ALSO  condition-C
  action

  :0
  *$ $IF_NOT        condition-A
  *$ $AND_NOT       condition-B
  *$ $AND_ALSO_NOT  condition-C
  action

I skipped the NAND(A,B,C), which is the same as OR(NOT A, NOT B, NOT C).
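
The equivalences used above can be checked mechanically. A short Python sketch (the function names mirror the variable names in the recipes but are otherwise illustrative) that runs through all eight truth assignments and confirms that NEITHER/NOR is the same as IF_NOT/AND_NOT, and that NAND(A,B,C) equals OR(NOT A, NOT B, NOT C):

```python
from itertools import product


def either_or(*conditions):
    return any(conditions)


def neither_nor(*conditions):
    return not any(conditions)


def if_and_also(*conditions):
    return all(conditions)


def if_not_and_not(*conditions):
    return all(not c for c in conditions)


def nand(*conditions):
    return not all(conditions)


def check_equivalences():
    """Verify the identities over every combination of A, B, C."""
    for a, b, c in product([False, True], repeat=3):
        # Both prefixes are '!' in the recipes, so these must agree
        assert neither_nor(a, b, c) == if_not_and_not(a, b, c)
        # De Morgan: NAND(A,B,C) == OR(NOT A, NOT B, NOT C)
        assert nand(a, b, c) == either_or(not a, not b, not c)
    return True


print(check_equivalences())
```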

-- 
Grtz, Ruud


____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
