nmh-workers

Re: [nmh-workers] INCing of email archives

2019-07-25 03:19:30
Hi Bakul,

Once in a while I download email archives of some mailing list
and unpack them using "inc -file <archive-file>". But more
than once I have seen that inc gets confused and doesn't
unpack the whole thing. The cause seems to be a line starting
with From in some message body.

Then it isn't any of the four mbox formats described at
https://en.wikipedia.org/wiki/Mbox#Family ?

Ideally inc should check that a "From ..." line is immediately followed
by header lines.  If it is not, inc should assume the line is part of
the message body.

I agree that would be one heuristic to help, but it would also have
problems:

    From the outset, was clear we failed 42
    times: the first on attempting to read faulty input...
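
A rough sketch of that heuristic, and of how the example above defeats it. The function name quote_body_from is hypothetical and not part of nmh, and the header test is only an approximation: it treats any line that starts with non-blank characters followed by a colon as a header field.

```shell
# Hypothetical helper, not part of nmh.
# Prefix ">" to every "From " line whose following line does not look
# like a header field ("Name: value"). Output goes to stdout.
quote_body_from() {
    awk '
        # A "From " line is pending: judge it by the current line.
        pending != "" {
            if ($0 !~ /^[^ \t:]+:/) pending = ">" pending
            print pending
            pending = ""
        }
        # Hold back each "From " line until the next line has been seen.
        /^From / { pending = $0; next }
        { print }
        # A "From " line at end of file has no header after it.
        END { if (pending != "") print ">" pending }
    ' "$1"
}
```

Run on the two lines above, this leaves "From the outset..." unquoted, because "times: the first..." passes the header test.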

fix() {
      grep -n '^From .*[^0-9]$' $1 | sed 's/:.*/s|^|>|/' > ,$1
      if [ -s ,$1 ]; then echo wq >> ,$1; cat ,$1 | ed $1; fi
      rm ,$1
}

This prepends a > to any line beginning with "From " and not
ending with a digit.

    sed -i '/^From .*[^0-9]$/s/^/> /' "${1?}"
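
Using -i without a suffix argument is a GNU sed extension; BSD sed expects an explicit suffix such as -i ''. A minimal sketch of a more portable equivalent, assuming mktemp is available. The name fix_portable is hypothetical; the pattern and replacement are the same as in the sed one-liner above.

```shell
# Hypothetical name; same pattern and replacement as the sed one-liner.
fix_portable() {
    file=${1?}
    tmp=$(mktemp) || return 1
    # Write the edited copy first; overwrite the original only if sed succeeds.
    sed '/^From .*[^0-9]$/s/^/> /' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Like the original, it skips any body line that begins with "From " and ends in a digit, such as the "failed 42" example earlier.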

-- 
Cheers, Ralph.

-- 
nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers
