Gary Funck wrote:
Ruud adds:
- time-based arithmetic, by conversion to epoch seconds
other mktime functions (presumably conversions between
time zones)
I assumed UTC/GMT epoch seconds.
(either seems OK to me)
- unicode should be used as fully as possible
... without the general user having to know any specifics about it.
Because US-ASCII is a subset of ISO-8859-1, which in turn is a subset of
Unicode, you don't notice it until you need to go beyond ISO-8859-1. Even
people working in different 8-bit code pages often limit themselves to
ASCII for searches.
Gary adds:
[...]
- extensions should be "context free" in that when we exchange new
procmail constructs on this list, it should be obvious from reading
the recipe that it is new syntax (this argues against dependency
upon the name of an .rc file, a global variable, or a command-line
argument).
Maybe a '4' flag on the recipe, after (but maybe unconnected to) the :0.
That will certainly look strange enough. The '4' might also imply the 'p'
we talked about earlier.
:0 4 # using procmail-4 features
* ^^From \S+ (.*)(?{ FROM_TIME = mktime($^N) })
Maybe another name than mktime() is better, as long as it returns the
seconds from the start of Jan 1, 1970 (UTC).
(64 bit!)
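To make the mktime() idea concrete, here is a minimal Python sketch of the conversion. The function name from_line_epoch and the asctime-style date format are my assumptions, not procmail syntax; real "From " lines vary more than this.

```python
import calendar
import re
import time

# Same shape as the recipe condition: capture everything after the sender.
FROM_LINE = re.compile(r"^From \S+ +(.*)$")

def from_line_epoch(line):
    """Return UTC epoch seconds for an mbox 'From ' line, or None."""
    m = FROM_LINE.match(line)
    if m is None:
        return None
    # Assumption: asctime-style date such as 'Thu Jan  1 00:00:10 1970'.
    parsed = time.strptime(m.group(1).strip(), "%a %b %d %H:%M:%S %Y")
    # timegm() treats the struct as UTC; Python ints do not overflow at 2**31.
    return calendar.timegm(parsed)
```

The second test date below is the first second that no longer fits in a signed 32-bit counter, which is why the 64-bit remark matters.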
- add 'expr' functionality for evaluating simple expressions
Maybe implemented as a calc() function.
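A rough Python sketch of what such a calc() could accept, assuming integer arithmetic only and variables passed in as a dictionary. The name calc comes from the suggestion above; everything else here is illustrative.

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def calc(expression, variables=None):
    """Evaluate a simple integer expression; reject anything else."""
    variables = variables or {}

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return int(variables[node.id])
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))
```

Function calls, attribute access and strings all end in ValueError, so an expression taken from a mail header cannot run arbitrary code.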
- mail address parsing support. Reliably and correctly parse the
addresses in To:, From: and any other headers. Once the address
has been broken out, support splitting it into name, address,
and host name parts.
Read `perldoc -q address`, which points to
http://www.cpan.org/authors/Tom_Christiansen/scripts/ckaddr.gz
but also warns: there are deliverable addresses that aren't
RFC-822 (the mail header standard) compliant, and addresses
that aren't deliverable which are compliant.
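For comparison, a small Python sketch of the name/address/host split using the standard email.utils module. The helper name split_addresses and the dictionary keys are made up for illustration; like ckaddr, this says nothing about deliverability.

```python
from email.utils import getaddresses

def split_addresses(header_value):
    """Split a To:/From: header value into name, address, local part, host."""
    result = []
    for name, addr in getaddresses([header_value]):
        if "@" in addr:
            local, _, host = addr.rpartition("@")
        else:
            # No host part at all, e.g. a bare local user name.
            local, host = addr, ""
        result.append(
            {"name": name, "address": addr, "local": local, "host": host}
        )
    return result
```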
- provide support for parsing the TLD from a hostname
:0 4
* SOMEHOST ?? ([^.]+)\.([^.]+)$(?# to do: 'co.uk' etc.)
{
DOMAIN = $1
TLD = $2
}
or even
# the following match() returns a list of 2 elements
(DOMAIN, TLD) = match( $SOMEHOST, /([^.]+)\.([^.]+)$/ )
FROM_TIME = mktime( match( /^^From \S+ (.*)/ ) )
Maybe a variable name should always have a $-prefix:
:0 4
* $SOMEHOST =~ ([^.]+)\.([^.]+)$(?# to do: 'co.uk' etc.)
{
$DOMAIN = $1
$TLD = $2
}
($DOMAIN, $TLD) = match( $SOMEHOST, /([^.]+)\.([^.]+)$/ )
$FROM_TIME = mktime( match( /^^From \S+ (.*)/ ) )
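The same match, sketched in Python, with one possible answer to the 'co.uk' to-do. The function name domain_and_tld and the tiny two_level_suffixes set are assumptions for illustration; a real list of such suffixes would be much longer.

```python
import re

# Same pattern as in the recipes above: the last two dot-separated labels.
HOST_TAIL = re.compile(r"([^.]+)\.([^.]+)$")

def domain_and_tld(hostname, two_level_suffixes=frozenset({"co.uk"})):
    """Return (domain, tld) for a host name, or None if it has no dot."""
    hostname = hostname.lower()
    m = HOST_TAIL.search(hostname)
    if m is None:
        return None
    domain, tld = m.group(1), m.group(2)
    suffix = domain + "." + tld
    if suffix in two_level_suffixes:
        # 'co.uk' is itself the suffix, so step one label to the left.
        rest = hostname[: m.start()].rstrip(".")
        if not rest:
            return None
        domain, tld = rest.rsplit(".", 1)[-1], suffix
    return domain, tld
```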
- ability to extract URLs and/or e-mail addresses from message bodies
(arguably could be implemented with the new PCRE support, but this
extraction would be builtin, faster, and syntax added for looping
through the addresses/URLs)
These URLs often need quite some decoding, because they contain all
kinds of tricks to not get recognized, including Unicode-lookalikes of
familiar characters.
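A minimal Python sketch of extraction plus two of the cheaper decoding steps, percent-decoding and NFKC normalization. The pattern and the name extract_urls are my own simplifications. Note that NFKC only folds compatibility forms such as fullwidth letters; true lookalikes from other scripts (a Cyrillic 'a', say) would need a separate confusables table.

```python
import re
import unicodedata
from urllib.parse import unquote

# Deliberately loose: anything after http(s):// up to whitespace or a quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def extract_urls(body):
    """Return the URLs found in a message body, lightly decoded."""
    urls = []
    for raw in URL_PATTERN.findall(body):
        candidate = raw.rstrip(".,;:!?)")      # drop trailing punctuation
        candidate = unquote(candidate)          # %76iagra -> viagra
        candidate = unicodedata.normalize("NFKC", candidate)
        urls.append(candidate)
    return urls
```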
- In addition to PCRE matching, implement "approximate matching",
à la String::Approx (on CPAN)
http://search.cpan.org/~jhi/String-Approx/Approx.pm
I think it is hard to make that practical, since it basically works on
short strings like single words. It would not work very well against the
variant spellings of 'viagra', because the 'Levenshtein edit distance'
can be made arbitrarily big.
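To illustrate the point, a plain Python version of the Levenshtein distance (the textbook dynamic-programming form, not String::Approx's own algorithm). The name edit_distance is just for this sketch.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]
```

One substituted character costs 1, but padding every letter with a separator already costs 5 for a six-letter word, and more padding pushes it as high as the sender likes.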
- Add a "lint" option to procmail, to have it scan the rc file for
blatant errors and to diagnose possibly unsafe/incorrect recipes,
without actually executing the script.
Like perl -c, see `man perlrun` or `perldoc perlrun`.
- Improve procmail's performance by having it statically compile
scripts, where possible (recursive includes and includes that can't
be statically evaluated throw this out), and use the statically
compiled .pc-procmailrc as long as it is newer than .procmailrc,
for example.
Many recursive includes can be statically evaluated.
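The freshness test itself is small. A Python sketch, assuming the two file names from the proposal and ignoring include files, which a real implementation would also have to check:

```python
import os

def use_compiled(source_path, compiled_path):
    """True if the compiled file exists and is newer than its source."""
    try:
        return os.path.getmtime(compiled_path) > os.path.getmtime(source_path)
    except OSError:
        # Either file is missing or unreadable: fall back to the source.
        return False
```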
Something else: the AND/OR issues of conditions.
Current way:
EITHER = '9876543210^0'
OR = $EITHER
OR_EVEN = $EITHER
NEITHER = '!'
NOR = $NEITHER
IF = ''
AND_ALSO = $IF
IF_NOT = '!'
AND_NOT = $IF_NOT
AND_ALSO_NOT = $IF_NOT
:0
*$ $EITHER condition-A
*$ $OR condition-B
*$ $OR_EVEN condition-C
action
:0
*$ $NEITHER condition-A
*$ $NOR condition-B
*$ $NOR condition-C
action
:0
*$ $IF condition-A
*$ $AND_ALSO condition-B
*$ $AND_ALSO condition-C
action
:0
*$ $IF_NOT condition-A
*$ $AND_NOT condition-B
*$ $AND_ALSO_NOT condition-C
action
I skipped the NAND(A,B,C), which is the same as OR(NOT A, NOT B, NOT C).
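The De Morgan equivalences behind that remark can be checked over the whole truth table. A short Python sketch; the function names are mine:

```python
from itertools import product

def nand(*conditions):
    return not all(conditions)

def or_of_nots(*conditions):
    return any(not c for c in conditions)

def nor(*conditions):
    return not any(conditions)

def and_of_nots(*conditions):
    # What the NEITHER/NOR recipe does: every condition is negated.
    return all(not c for c in conditions)

def equivalent(f, g, arity=3):
    """Compare two boolean functions on every input combination."""
    return all(
        f(*row) == g(*row) for row in product([False, True], repeat=arity)
    )
```

equivalent(nand, or_of_nots) holds, and so does equivalent(nor, and_of_nots), which is why the NEITHER/NOR recipe can be written as plain negated conditions.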
--
Grtz, Ruud