Re: for John Hardin's wish list

On 21 August 1998, Philip Guenther <guenther(_at_)gac(_dot_)edu> wrote:

Liviu Daia <daia(_at_)stoilow(_dot_)imar(_dot_)ro> writes:

   Ok, how about this:

4. A more normal syntax: the _usual_ logical operators, the _usual_
arithmetical operators, the _usual_ control constructs (if-then-else,
do, while, break, continue, maybe switch too), the _usual_ {}
grouping, the _usual_ string functions, the _usual_ regexp matching
rules.  The _usual_ free format newline handling. ...


Umm, what is "usual"?  C?  Perl?  Python?


    You're missing my point.  _All_ these languages are "usual".  Add
also C++, Java, Pascal, Basic and Fortran (in its reasonably recent
incarnations).  Independently of the actual notations, they _have_ the
constructs above.  Both C and Fortran have an equality operator.  Even
if it's written "==" in C and ".eq." in Fortran, it's still an equality
operator.  Conceptually, all these languages are instances of a kind
of generic pseudocode --- and pseudocode structures are one of the
first things taught in school.  No matter what language you're using
these days, you should still be familiar with the mental paradigm of
pseudocode, from your days in school.  It is so thoroughly taught that
its constructs map almost directly to mental mechanisms for anyone
who has taken a first course in programming.

    It doesn't really matter how you choose to denote the above
constructs.  I might argue that most people using Perl, Python and so on
are also likely to be at least somewhat familiar with C, so C notations
would be a good choice.  But, as I said, that doesn't really matter.
The important point is to _have_ those constructs in the language.

    Now, procmail doesn't have those constructs.  The effects of, say,
the numeric equality operator can be emulated most of the time, but
not _always_ (not without pain anyway).  Is this enough?  I claim it
isn't.  The reason is, of course, mental convenience.  All procedural
languages can be translated into code for, say, the Turing machine.  The
Turing machine is in fact at least as powerful as any procedural language
out there.  Yet I haven't seen many programs translated into Turing
machine code.  Why?  Because you lose the one-to-one mapping between your
mental model of the code you write and the actual code.  For any
non-trivial program, you basically end up not understanding what the damn
thing does --- and that's what usually happens with procmail recipes too.
Yes, procmail has its own unique language, but that language also has
almost all the characteristics of a bad design (not all of them, because I
suppose procmail recipes are fairly easy to parse...).
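
    To make the point concrete, here is a minimal sketch, in Python, of
what a delivery rule looks like when the usual constructs (a numeric
equality operator, if/else) are simply available.  The function name
choose_folder, the folder names and the size threshold are all made up
for illustration; this is not procmail syntax, nor any existing filter.

```python
from email import message_from_string


def choose_folder(raw_message, max_size=100_000):
    """Pick a folder name for a raw message (hypothetical rule set)."""
    msg = message_from_string(raw_message)

    # X-Priority often looks like "1 (Highest)"; keep only the first field.
    fields = msg.get("X-Priority", "3").split()
    value = fields[0] if fields else "3"

    # Plain numeric equality and comparison, written the "usual" way.
    if value.isdigit() and int(value) == 1:
        return "urgent"
    elif len(raw_message) > max_size:  # size of the raw text in characters
        return "large"
    else:
        return "inbox"
```

The code reads the same way the rule is stated in words, which is the
one-to-one mapping I'm talking about.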

The last, for example, does _not_ have "free format newline handling"
(and neither, strictly speaking, does Perl or awk).


    True.  My mistake.  Newline handling and {} grouping are mainly
notation details, and they are not really relevant to this discussion.

I happen to have written a couple procmailrcs that would have been
easy in Scheme, but most people appear to break out in hives at the
sight of that many parens, so I guess it's out.


    Incidentally, did you ask yourself what the reasons for that
reaction are?  You don't need any fancy classification theory to
understand that.  Scheme doesn't fit in the scheme above (minor pun
intended) because it's a descendant of Lisp, and Lisp was initially
designed as a list and symbol manipulation language, not as a general
purpose one (a "pseudocode incarnation" that is).  Try to use common
sense instead of pure formalism here.

How about all the visual basic users out there?  VB is quite
structured, you know, so don't laugh.


    I'm not laughing, VB is yet another "usual" language in the sense
above.

                                    Something not unlike awk.  Keep
also the _usual_ operator precedence and associativity rules, and it
will look familiar to everybody.


Since you appear to have a particular "everybody" in mind, could you
clarify what background you're assuming they have?


    I think I already answered that: my "everybody" consists of people
reasonably familiar with the pseudocode paradigm above (whether they
realize that or not).  IMO those people make up a significant share of
the people knowledgeable enough to configure and use procmail.

   Code bloat?  If you insist, you can make it just as obfuscated as
it is now, and most people will think it's clever coding, not bloat.
Efficiency?  Just compile everything to bytecode (better yet: to
ASTs), and voila, you get something _much_ more efficient than what
you have now.


Given that >99% of all procmail invocations involve no looping but
rather just a single pass through an rcfile, I cannot believe that
compiling to bytecode will do anything but slow down procmail almost
all of the time.


    Again, you're trying hard to miss the point here.  You don't need
to compile the bytecode every time you run the program, you can do that
on demand, save the result on disc, and load the digested form when
you actually need to process a message.  Provided that you include the
compiled forms of the regexps in the bytecode, that would probably be
faster than what you have now most of the time.
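
    A rough sketch of the compile-on-demand idea, in Python, assuming a
made-up rule format of "regexp => folder" lines.  The names parse_rules,
load_rules and deliver, and the file layout, are hypothetical.  Note the
limitation: Python's pickle stores only the parsed rules, and the regexps
are recompiled on load, so this shows only the "digest once, save to
disk, reload while the source is unchanged" part.  A real implementation
would also have to serialize the compiled regexps.

```python
import os
import pickle
import re


def parse_rules(text):
    """Parse 'regexp => folder' lines into (pattern, folder) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=>" not in line:
            continue  # skip blanks, comments and malformed lines
        pattern, folder = (part.strip() for part in line.split("=>", 1))
        rules.append((pattern, folder))
    return rules


def load_rules(src_path, cache_path):
    """Return rules, re-parsing only when the source is newer than the cache."""
    src_mtime = os.path.getmtime(src_path)
    if os.path.exists(cache_path) and os.path.getmtime(cache_path) >= src_mtime:
        with open(cache_path, "rb") as fh:
            rules = pickle.load(fh)  # digested form, no parsing needed
    else:
        with open(src_path) as fh:
            rules = parse_rules(fh.read())
        with open(cache_path, "wb") as fh:
            pickle.dump(rules, fh)
    # Regexps are compiled here, on load (see the caveat above).
    return [(re.compile(p, re.MULTILINE), folder) for p, folder in rules]


def deliver(rules, headers):
    """Return the folder of the first matching rule, or 'inbox'."""
    for regex, folder in rules:
        if regex.search(headers):
            return folder
    return "inbox"
```

The rule file is parsed once; every later invocation just loads the
saved form until the source file changes.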

Have you considered writing a tool that'll translate from some awkish
language into a set of procmailrcs?


    No, for two reasons:

(1) _That_ would be slow, simply because some awkish constructs would
    have to be translated to a lot of recipes (I don't even want to
    think about writing a code optimizer for procmail);

(2) Why stop there, when I can write my own filter?



    Regards,

    Liviu

-- 
Dr. Liviu Daia                   e-mail:   daia(_at_)stoilow(_dot_)imar(_dot_)ro
Institute of Mathematics         web page: http://www.imar.ro/~daia
of the Romanian Academy          PGP key:  finger daia(_at_)stoilow(_dot_)imar(_dot_)ro