Re: [Nmh-workers] Request for sortm feature to implement arbitrary message ordering

2014-06-26 06:16:27
Hi Norm,

This request is for sortm to have a '-program path-name' feature

path-name would name a command which would be invoked by sortm. It
would be given 2 arguments, full path names of messages. Its standard
output would either

    begin with "+", meaning that the first argument is to be regarded
    as greater than the second,

    or

    begin with "-", meaning that the first argument is to be regarded
    as less than the second

    or

    be empty or begin with some other character, meaning that sortm
    would use its usual mechanisms to relate the two arguments.

What if they are equal?  How about the result characters for a
comparison of foo and bar being `<', `=', and `>'?  Should the sort be
defined as stable given equal comparisons?  I think that would be useful,
else each run could annoyingly permute equal emails into a new order.

A non-zero exit status would be fatal to sortm.

I see Ken's point about using exit status, but I think it's too easy for
a script to `exit 1' without meaning to give a comparison result, e.g.
`set -e' is in place and grep, unexpectedly, doesn't find any matches.
So I think stdout is probably the better channel.  I'd like to see it
be precisely defined as two bytes then EOF, the second being `\n'.
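
As a rough sketch of that stricter stdout protocol, here is how a caller
might validate a comparison program's output.  It is in Python purely for
illustration; sortm itself is C.  The names parse_cmp_output and run_cmp
are hypothetical, and the mapping of `<', `=', `>' to -1, 0, 1 is only my
assumption about how the result would be consumed.

```python
import subprocess

# Result characters suggested above, mapped to qsort-style integers.
RESULTS = {b"<": -1, b"=": 0, b">": 1}

def parse_cmp_output(out):
    """Accept exactly two bytes: a result character, then a newline."""
    if len(out) != 2 or out[1:] != b"\n" or out[:1] not in RESULTS:
        raise ValueError("malformed comparison output: %r" % (out,))
    return RESULTS[out[:1]]

def run_cmp(program, path_a, path_b):
    """Run the comparison program on two message paths.

    check=True raises CalledProcessError on a non-zero exit status,
    so a failing script is fatal rather than read as a result.
    """
    proc = subprocess.run([program, path_a, path_b],
                          stdout=subprocess.PIPE, check=True)
    return parse_cmp_output(proc.stdout)
```

Anything other than the two expected bytes, including empty output from
a script that died early, is rejected instead of being guessed at.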

sortm would assume, without checking, that the ordering imposed by the
command was transitive and anti-symmetric. That is, that a<b and b<c
implies a<c and that a<b implies b>a.

By implication, the comparison program is buggy if that doesn't hold?
sortm(1) punts to qsort(3) for the hard graft and that demands
consistency;  I think I'd like sortm to protect me from a buggy
comparison program.
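
One cheap form of protection, sketched in Python for illustration only,
is to sort first and then re-check each adjacent pair of the result in
both directions.  The name checked_sort and the exception class are
hypothetical.  This costs about 2(n-1) extra comparisons and catches
only some inconsistencies; a cyclic ordering can still slip through.

```python
from functools import cmp_to_key

class InconsistentComparison(Exception):
    """Raised when the comparison function contradicts itself."""

def checked_sort(items, cmp):
    """Sort with cmp, then verify the result rather than trusting cmp."""
    result = sorted(items, key=cmp_to_key(cmp))  # sorted() is stable
    for a, b in zip(result, result[1:]):
        # In a correctly sorted list a <= b, so cmp(a, b) must not be
        # positive and the reversed call must not be negative.
        if cmp(a, b) > 0 or cmp(b, a) < 0:
            raise InconsistentComparison(
                "%r and %r compare inconsistently" % (a, b))
    return result
```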

kre has brought up the issue of fork/exec overhead, and I was wondering
whether a qsort-like cmp program is the right interface.  What about
taking a leaf out of Python's sort and providing a program to generate a
`key' line for each argument.  So, to sort by size of the mail file a
wrapper for stat(1) could be used.

    $ stat -c '%010s' /etc/passwd /etc/group
    0000002243
    0000001105
    $ 

The program could be invoked several times, at most once per mail
message, to gather all the keys.  They'd be compared by strcoll(3) by
default, not strcmp(3).  There must be one, possibly empty, line per
argument, and it must exit(0).
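
The key-gathering scheme might look roughly like this.  Again it is
Python for illustration, not a proposal for sortm's C code, and the
function names and the batch size of 100 are made up.  locale.strxfrm()
stands in for strcoll(3); it follows LC_COLLATE, which stays "C" unless
the caller has run locale.setlocale(locale.LC_COLLATE, "") first.

```python
import locale
import subprocess

def split_key_lines(stdout, expected):
    """Split output into exactly one, possibly empty, line per argument."""
    lines = stdout.split("\n")
    if lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    if len(lines) != expected:
        raise ValueError("expected %d key lines, got %d"
                         % (expected, len(lines)))
    return lines

def gather_keys(argv, paths, batch=100):
    """Run the key program over the paths, a batch at a time."""
    keys = []
    for i in range(0, len(paths), batch):
        chunk = paths[i:i + batch]
        # check=True makes a non-zero exit status an error.
        proc = subprocess.run(argv + chunk, stdout=subprocess.PIPE,
                              universal_newlines=True, check=True)
        keys.extend(split_key_lines(proc.stdout, len(chunk)))
    return keys

def sort_by_keys(paths, keys):
    """Stable sort of paths by their keys, collated by locale."""
    order = sorted(range(len(paths)),
                   key=lambda i: locale.strxfrm(keys[i]))
    return [paths[i] for i in order]
```

With the stat(1) wrapper above, the call would be something like
gather_keys(["stat", "-c", "%010s"], paths) followed by
sort_by_keys(paths, keys).  Each message costs at most one line of
output, not a fork/exec per comparison.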

(I see subsort() and txtsort() use strcmp(3) rather than strcoll(3).
http://git.savannah.gnu.org/cgit/nmh.git/tree/uip/sortm.c#n495 Is this
still desired?)

If there were several '-program path-name' pairs the last one would
be used.  -noprogram would cancel any previous '-program path-name'
pair.

I know this is in the MH tradition, but I find it a bit restrictive.

Taking sort(1)'s multiple -k options as inspiration, what about allowing
the sortm options multiple times, as kre suggests, including the new
key-program one?

    sortm -kp ./msgsize -kh x-mailer -kd date -keyd resent-date

I've always found sortm's logic over -textfield and -datefield very
contorted and not useful, e.g.

    With -textfield field, if -limit days is specified, messages with
    similar textfields that are dated within `days' of each other appear
    together.  Specifying -nolimit makes the limit infinity.  With
    -limit 0, the sort is instead made textfield-major, date-minor.

New -kheader, -kdate, -kprogram options could respectively strcoll()
decoded headers, compare dates, and kick off an external key-generating
program.  A character not acceptable in header names (colon, perhaps?) could be
used to suffix flags, e.g. `-kh x-priority:nr' to have X-Priority's `42'
be sorted numerically in reverse order.  This allows -kdate to be
dropped;  it's -kheader with a `d' for date flag.  These flags apply to
-kp too, so stat(1) need not pad with 0s if `:n' is given.
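
To make the flag suffix and the multiple-key ordering concrete, here is
a small Python sketch.  Everything in it is hypothetical: messages are
modelled as dicts of lower-cased header name to value, only the `n' and
`r' flags are handled (`d' would need a date parser), and plain string
comparison stands in for strcoll().  Applying stable sorts from the last
key to the first makes the first key the major one.

```python
def parse_key_spec(spec):
    """Split 'x-priority:nr' into ('x-priority', {'n', 'r'})."""
    name, _, flags = spec.partition(":")
    unknown = set(flags) - set("nr")
    if unknown:
        raise ValueError("unknown key flags: %s" % "".join(sorted(unknown)))
    return name.lower(), set(flags)

def sort_messages(msgs, specs):
    """Sort message dicts by several header keys, first spec is major."""
    result = list(msgs)
    # Stable sorts applied minor key first leave the major key in charge.
    for spec in reversed(specs):
        name, flags = parse_key_spec(spec)

        def key(msg, name=name, flags=flags):
            value = msg.get(name, "")
            # Numeric flag: a missing or empty header counts as 0.
            return float(value or 0) if "n" in flags else value

        # reverse=True still keeps equal elements in their prior order.
        result.sort(key=key, reverse="r" in flags)
    return result
```

So sort_messages(msgs, ["x-priority:nr", "x-mailer"]) would order by
X-Priority numerically, highest first, with ties broken by X-Mailer.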

The two new -k[hp] options would be mutually exclusive with the old
-{text,date}field;  both could not be given.  This allows for a clean
break with the old.

Does that cover your needs?  As kre said, this can all be done with the
anno+sortm+anno dance I showed in
http://lists.nongnu.org/archive/html/nmh-workers/2014-06/msg00164.html
but it would be nice to give nmh native modern sorting.

Cheers, Ralph.
