nmh-workers

Re: [Nmh-workers] refile handling of corrupt .mh_sequences

2013-02-28 10:38:06
I frequently have corrupt .mh_sequences files, most likely due
to interaction between procmail (using rcvstore) and claws-mail
(which updates .mh_sequences, but seems to ignore rcvstore locks,
unsurprisingly).

I don't suppose you could get the developers of claws-mail to fix
that, could you?  Ah, okay, I see that Steve Rader tried to get people
interested in improving the sequence support, and was basically told
"screw you hippie, it kills performance".  Which I guess is why he
wrote MH-V :-)

[1] Perhaps there could be an option (yet-another-nmh-option!) to always
re-write poorly formatted files (delete lines that do not begin with a
sequence name, i.e., a string without whitespace, terminated by a colon).

[2] Perhaps refile (and other things that read sequences files) could
treat lines that do NOT begin with a string terminated with a colon and
consist only of [0-9 -] as if they were a continuation of the previous
sequence.
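
For illustration, proposals [1] and [2] can be sketched as a standalone
cleanup pass that runs outside nmh. This is only a sketch under assumptions:
repair_sequences() and its join_numeric parameter are hypothetical names, not
nmh code, and it assumes a valid entry is a name containing no whitespace or
colon, followed by a colon.

```python
import re

# Assumed shape of a valid entry: a name with no whitespace, then a colon.
NAME_LINE = re.compile(r"^[^\s:]+:")
# A stray line made up only of digits, spaces and hyphens (proposal [2]).
NUMBERS_ONLY = re.compile(r"^[0-9 -]+$")


def repair_sequences(text, join_numeric=True):
    """Return a cleaned copy of a sequences file passed in as one string.

    Hypothetical helper, not part of nmh.
    """
    kept = []
    for line in text.splitlines():
        if not line.strip():
            continue  # drop blank lines
        if NAME_LINE.match(line):
            kept.append(line)
        elif line[0] in (" ", "\t") and kept:
            # Already a properly indented continuation line: keep it as-is.
            kept.append(line)
        elif join_numeric and kept and NUMBERS_ONLY.match(line):
            # Proposal [2]: treat it as a continuation of the previous entry.
            kept[-1] = kept[-1].rstrip() + " " + line.strip()
        # Anything else is dropped, as in proposal [1].
    return "".join(entry + "\n" for entry in kept)


corrupt = "unseen: 1-3 7\n9 12-14\ncur: 5\ngarbage here\n"
print(repair_sequences(corrupt), end="")
```

With join_numeric=False the stray "9 12-14" line is deleted instead of being
merged, which matches proposal [1] on its own.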

You're not going to love my answer .. but both of these things are hard.

Why are they hard?  Well, the short answer is that a line that has no leading
whitespace and doesn't contain a colon isn't a valid RFC-822 header.

I suspect you're thinking, "Huh?  What does that have to do with sequences
files?"  Well, you may have noticed that a sequence file looks a lot like
an RFC-822 message header.  That's not a coincidence.  The same function
that reads email messages (m_getfld()) is used to read sequence files.
As long as it looks like an email header, it all works.  But otherwise ...
well, it's not clear what we can do when we hit an invalid header.  It
looks like if we return a FMTERR state, that's it for the m_getfld() state
machine; we can't call m_getfld() again.  It _may_ transition into the
BODY state, but that's not reversible.  So that would wipe out any sequences
after the corrupted part.  But I think you'll find that there is little
interest in messing with m_getfld() for this problem, as it would affect
how email is parsed.  Maybe we could add a flag or something that would
not cause the sequence code to bail if we hit the BODY state (if that is
indeed what is happening), but that's a few layers deep, so changing it would
take a significant amount of code (a global variable is possible, but I think
nmh has too many of those already).  A new function to change that behavior
is also an option.
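
To make the failure mode concrete, here is a toy sketch of the difference
between a reader that gives up at the first malformed line and one that skips
it. It is not m_getfld() and does not model its real states; read_strict()
and read_tolerant() are hypothetical names, and continuation lines are
ignored to keep it short.

```python
import re

# Assumed field shape: a name with no whitespace or colon, a colon, a value.
FIELD = re.compile(r"^([^\s:]+):\s*(.*)$")


def read_strict(lines):
    """Stop at the first malformed line (no recovery after an error)."""
    seqs = {}
    for line in lines:
        m = FIELD.match(line)
        if m is None:
            break  # everything after the bad line is never read
        seqs[m.group(1)] = m.group(2)
    return seqs


def read_tolerant(lines):
    """Skip malformed lines and keep reading (the hypothetical new behavior)."""
    seqs = {}
    for line in lines:
        m = FIELD.match(line)
        if m is None:
            continue
        seqs[m.group(1)] = m.group(2)
    return seqs


corrupt = ["unseen: 1-3", "9 12-14", "cur: 5"]
print(read_strict(corrupt))    # {'unseen': '1-3'}
print(read_tolerant(corrupt))  # {'unseen': '1-3', 'cur': '5'}
```

The strict reader loses "cur" because it comes after the corrupt line, which
is the "wipe out any sequences after the corrupted part" effect described
above.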

As for [2], well, you _can_ continue sequence lines, as long as you follow
RFC-822 rules; the continuation lines have to be prefixed by whitespace.
Again, fixing that would be hard.
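
As a small illustration of the RFC-822 continuation rule, the sketch below
joins whitespace-prefixed lines onto the line they continue. unfold() is a
hypothetical helper written for this example, not an nmh function, and the
sample sequence contents are made up.

```python
def unfold(text):
    """Join RFC-822 style continuation lines onto the line they continue."""
    logical = []
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and logical:
            # Leading whitespace marks a continuation of the previous line.
            logical[-1] += " " + line.strip()
        else:
            logical.append(line)
    return logical


folded = "unseen: 1-3 7\n 9 12-14\ncur: 5\n"
print(unfold(folded))  # ['unseen: 1-3 7 9 12-14', 'cur: 5']
```

Without the leading space, "9 12-14" would start a new logical line with no
colon in it, which is exactly the invalid-header case discussed above.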

--Ken

_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers
