
Re: need a sane approach to MIME attachment organization

2012-10-23 12:41:55
On 2012-10-21, LuKreme <kremels(_at_)kreme(_dot_)com> wrote:
 * you lose the ability to automatically process a file attachment
   and carry out realtime actions on that file on the spot (unless you
   keep your MUA running 24/7 and code it to do the job of procmail).

Actually, you don’t. Processing the file does not necessitate
stripping the file.

Sure, but solving one problem causes another.  Either you lose the
ability to automate the management of your files, or you lose the
ability to avoid redundant copies.  An inherently flawed structure
forces you to choose a problem.  Theoretically, there is no reason we
can't have it all.

To my mind, the original message stays intact, period. Anything you
do after that should never change the message that was received.

Why?  You seem to lack confidence that automated tools could handle
this, no?  Is a dedicated special purpose tool more likely to corrupt
a file than a fully featured MUA?

If doing verbatim bit-for-bit forensics, say for a law enforcement
office, it would be understandable to preserve precisely the original
message.  But for a large majority of users it makes sense to be
practical.

I have all kinds of rules adding various internal headers to messages
as they run through the scripts.  I would perhaps only be tempted to
take your position and not alter cryptographically signed portions of
content.  Beyond that, I'm quite willing to manipulate messages, in
the same way that I'm willing to scan and shred snail-mail.

   Thus wasting space,

As I said, if space is an issue, limit the size of
attachments. Encourage users to delete unneeded email.

This is another problematic hack that's driven by this idea of holding
email to a forensic standard.

My email archives, including attachments, for my personal mail are
under 5GB and go back to 1999, and I never delete anything but
spam. That’s a trivial amount of space.

Duplication is not just a waste of space.  There are update anomalies.
It's generally poor management of data to have multiple copies running
around.  When you need to destroy something, you potentially have to
find two copies to delete... or just one, because you won't necessarily
know where the file came from or if it's replicated.  And when you
update the metadata on an external file, the mailed copy doesn't get
the update.

Now suppose I accept your idea that there should be two copies, one in
the email and a proper file that exists independently on a filesystem.
What links or associates the proper file to the email containing the
duplicate attachment?  ATM, I've found no tools that make it trivial
to jump between the two.

Unless you are still using mbox files, large attachments have
basically no impact on MUAs; only the count of messages in the
folder is an issue.

I would be all in favor of dumping mbox for a format that maintains a
relationship between the message and the file.  But I would not be
keen to replace mbox with something else that also does not adequately
establish links and metadata between the message and external content.

Suggestions?

I could of course invent a series of hacks, like sticking message ids
in PDF metadata, and adding custom mail headers that link back to the
file.  But then I would be putting myself in a situation of having to
craft every tool I need.  Is there a standardized or perhaps common
approach?
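
To make the "series of hacks" concrete, here is a minimal sketch in
Python of one possible scheme, not a standard one.  It saves each
top-level attachment under a name derived from its SHA-256 digest,
removes it from the message, and records the link in both directions.
The header name X-Detached-File, the ".msgid" sidecar file, and the
function name detach_attachments are all made up for illustration.

```python
import hashlib
from email import message_from_bytes, policy
from pathlib import Path


def detach_attachments(raw_message: bytes, store: Path) -> bytes:
    """Save attachments under `store`, strip them, and link both ways."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    message_id = msg.get("Message-ID", "")
    store.mkdir(parents=True, exist_ok=True)

    for part in list(msg.iter_attachments()):
        data = part.get_payload(decode=True)  # undoes base64 etc.
        if data is None:  # e.g. an attached message; leave it alone
            continue
        # Content-derived name: identical files with the same name
        # land on the same path instead of piling up as duplicates.
        digest = hashlib.sha256(data).hexdigest()
        safe_name = Path(part.get_filename() or "unnamed").name
        target = store / f"{digest[:16]}-{safe_name}"
        target.write_bytes(data)

        # File -> message link (hypothetical sidecar convention).
        with open(f"{target}.msgid", "a") as sidecar:
            sidecar.write(f"{message_id}\n")

        # Message -> file link (hypothetical header name).
        msg["X-Detached-File"] = target.name
        msg.get_payload().remove(part)

    return msg.as_bytes()
```

The header stores only the file name relative to the store, so the
store can be moved without rewriting the mail.  Nothing here avoids
the underlying complaint: every other tool would still need to be
taught about the header and the sidecar.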

   The space waste and potential for excessive mbox files is in
   part due to keeping the file as an ASCII payload.

No one who is serious about email should be using mbox files. They
were great when a really large mailspool was 1 megabyte. They are
quite terrible with mailboxes that are hundreds of megabytes or
more.

I'll give another example.  Having the file encoded in base64 ASCII,
and embedded in a message, is a lousy way to store a file.  Consider
having a need to search for text or some object embedded within your
files.  Your search tool must then be aware of your email archive
format, and it must extract each attachment, decode it, and search it.
It's CPU intensive, and human intensive as well if files are embedded
in encrypted messages.
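
The extract-decode-search steps described above can be sketched with
Python's standard email package.  This is only an illustration for a
single unencrypted message; the function name attachments_containing
is made up, and a real tool would also need to iterate over whatever
archive format holds the messages.

```python
from email import message_from_bytes, policy


def attachments_containing(raw_message: bytes, needle: bytes) -> list[str]:
    """Return filenames of attachments whose decoded bytes contain needle."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    hits = []
    for part in msg.walk():
        if part.get_content_disposition() != "attachment":
            continue
        # decode=True reverses the Content-Transfer-Encoding, so the
        # search runs on the original bytes, not on the base64 text.
        payload = part.get_payload(decode=True)
        if payload is not None and needle in payload:
            hits.append(part.get_filename() or "(unnamed)")
    return hits
```

A plain grep over the stored message would miss the match, because
the search string does not survive base64 encoding in readable form.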

And regarding the space: if you have extra space to throw around,
redundancy is more effective if you have proper par files or some
other form of parity data.  Having it as ASCII does not leverage the
reliability that you can get from the redundancy.

 * the file must be extracted and potentially decrypted every single
   time it is opened from the message (if not duplicated).

This is a good thing.

Not if you've forgotten the password that decrypts the private key
needed to decrypt the file.  This has actually happened to me, btw.
And keys expire.

Public key crypto is good for transport, but not ideal for local
storage.  It's better to use symmetric key crypto on archives.

 * if you opt not to duplicate the file in the filesystem, then
   finding the file later on is non-trivial

Depends entirely on your MUA. finding attachments in my MUA is
trivial.

Hold on.  A /search/ is one way to find things, and
browsing/navigation is another.  Having a good search capability is
only half of it.  If you optimize your organization for browsing
attachments, you compromise on browsing for messages, and vice-versa.

In fact, I have particular uses where there is no need to deliver
the message at all.  I need to act on a file attachment contained
in an auto-generated message, process it, and throw away the email.
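
A rough sketch of that "act on the attachment, discard the mail"
pattern, assuming the message arrives on a binary stream (for example
standard input, when a procmail recipe pipes the message to a script).
The names handle_stream and handler are hypothetical.

```python
from email import message_from_binary_file, policy
from typing import BinaryIO, Callable


def handle_stream(stream: BinaryIO,
                  handler: Callable[[str, bytes], None]) -> int:
    """Call handler(filename, data) per attachment; return the count."""
    msg = message_from_binary_file(stream, policy=policy.default)
    handled = 0
    for part in msg.walk():
        if part.get_content_disposition() != "attachment":
            continue
        data = part.get_payload(decode=True)
        if data is None:  # e.g. an attached message/rfc822
            continue
        handler(part.get_filename() or "unnamed", data)
        handled += 1
    return handled
```

A wrapper script could call handle_stream(sys.stdin.buffer, handler)
and exit without writing the message anywhere, so only the processed
file survives.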

Yes, I have recipes like that as well. Your original message didn’t
lead me to believe you were talking about your personal mailspool.

Well, I have a couple of problems to solve involving attachments.  I wish
procmail were equipped for file handling.  There are some external
tools but they seem to turn the task into a hackjob.

So presumably this task is confined to a hackjob.  But I'd like to
know how others are approaching this hackjob.

That's a shame, because it means that procmail does not perform its
job well in circumstances where automation on inbound MIME objects
is required,

procmail was written long before MIME email became common.

Does this mean the language (and thus tool) does not evolve?  If I
wanted to join the project to expand the language and add some basic
features, would this be welcome?

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)de
http://mailman.rwth-aachen.de/mailman/listinfo/procmail