procmail

Re: Undesirable characters in subject text

2003-05-27 19:16:39
At 15:41 2003-05-27 -0700, procmail(_at_)deliberate(_dot_)net wrote:
        Actually I did, thanks. I really do thoroughly read and
try to understand most of your posts here, Sean. Despite your
gruff nature

I think of it more as MOF - Matter Of Fact.

The implication was that if your problem was unix wildcards in the string, doublequoting it would resolve the specific problem you were reporting:

| echo "$SUBJECT" >> somefile

Seems to work just fine even when there is an unpaired quote within the string (even of the same type that you're enclosing the string in).
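
As a quick illustration of the difference (the subject text is made up, and this is plain shell rather than a procmail recipe), the quoted expansion should come out as-is, while the unquoted one is subject to word splitting and filename expansion against whatever happens to be in the current directory:

```shell
# Hypothetical subject with wildcards and an unpaired doublequote.
SUBJECT='Re: *** is "this broken?'

# Quoted: the shell hands the value to echo untouched.
quoted=$(echo "$SUBJECT")

# Unquoted: the shell splits the value into words and expands the
# wildcards against the current directory before echo ever sees them.
unquoted=$(echo $SUBJECT)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

In a directory containing files, the second line will list them in place of the asterisks; the first line will not.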

BTW, if your db is comma delimited, you'll likely need to contend with quoting strings which contain commas, and where there may be quotes, ensuring that they're not blasted by that same process.
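
A minimal sketch of that field quoting, assuming the common convention of wrapping the field in doublequotes and doubling any embedded ones (whether your import step expects that convention is an assumption; csv_quote is a hypothetical helper name):

```shell
# csv_quote TEXT: wrap a field in doublequotes, doubling any embedded
# doublequotes so that commas and quotes survive a CSV-style import.
csv_quote() {
    printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

SUBJECT='Sale, 50% off "everything"'   # hypothetical subject
csv_quote "$SUBJECT"; echo
```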

=> You may want to use:
=> | echo -n "$SUBJECT" >> somefile
=> Which would omit the trailing newline which echo would otherwise tack onto
=> the emitted text.

        It's only the "odd" ones that I wanted to suppress, the
ones that are the result of the unescaped wildcards.

Once again, if you enclose it in quotes as shown, that seems to be resolved. I still don't follow what about that solution isn't working for you, which is what led me to believe you had not invoked the examples provided.

The suggestion to use -n was if you were building a db file where you might really want each field of the record to appear on a single line. If that isn't a concern for you, don't use the -n. I did include the explanation for the variation. Perhaps offering up that tidbit misled you to believe I was saying removal of the echo-provided linefeed was my proffered solution to your problem, which isn't the case - the doublequotes are.
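
To show what the -n variation buys you, here is a sketch that builds one record per line from two fields. It uses printf '%s' rather than echo -n, since the handling of -n differs between echo implementations; append_record and the two-field layout are made up for the example:

```shell
# append_record FILE SENDER SUBJECT  (hypothetical helper and layout)
# Writes "sender,subject" as a single line. printf '%s,' emits no
# newline after the first field, which is what 'echo -n' is meant to do.
append_record() {
    printf '%s,' "$2" >> "$1"
    printf '%s\n' "$3" >> "$1"
}

DB=$(mktemp)
append_record "$DB" 'someone@example.com' 'Re: test'
cat "$DB"
```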

        Guess I'm back to square one.  I don't know sed and I'm
not sure I want to incur the costs of the call.

The manpage for 'tr' is pretty straightforward, and tr covers what you're likely to need - a simple character deletion or translation (versus string operations, inclusive of regexp and the like, for which you'd use the much larger sed).

If this were something big, in a production environment, the wise programmer might consider simply writing a daemon which reads from a named pipe. This would resolve several things: procmail wouldn't invoke anything - it'd simply output to a file (which, because it is a named pipe, would never actually be written to disk, though your lockfile would be); no commandline invocation of anything, so no issues with wildcard, hibit, or control characters; no shell invocations; no additional process being fired up each time a message comes through (beyond all the processes already invoked as a matter of course in your system) - the daemon monitoring the named pipe is simply sitting there always running. Since you'd have a lockfile on its input, only one process at a time would be writing to it, so it handily keeps the input serialized.
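
A bare-bones sketch of the reading side, in shell for brevity (a production daemon would more likely be a compiled program). The paths and the drain_fifo name are hypothetical, and the echo stands in for procmail delivering to the pipe:

```shell
# drain_fifo FIFO OUT: read lines from the pipe until the writer closes
# it, appending each line to OUT.
drain_fifo() {
    while IFS= read -r line; do
        printf '%s\n' "$line" >> "$2"
    done < "$1"
}

dir=$(mktemp -d)
mkfifo "$dir/subjects"

drain_fifo "$dir/subjects" "$dir/subjects.log" &   # the "daemon"
echo 'Re: *** test' > "$dir/subjects"              # what procmail would do
wait

cat "$dir/subjects.log"
```

A real daemon would wrap the drain_fifo call in an endless loop, so the pipe is reopened each time a writer finishes.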

For those who are blissfully unaware, named pipes are exactly how one creates a dynamic .sigfile - you run a program which creates a named pipe of ~/.signature, and each time that named pipe is read from, it writes a new signature (fortune, whatever) to it.
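
Roughly, the writing side of such a sigfile program looks like this. It is a sketch with a temporary path standing in for ~/.signature and a counter standing in for fortune; serve_signature is a made-up name:

```shell
# serve_signature FIFO COUNT: hand out COUNT signatures, one per reader.
# Opening the pipe for writing blocks until some program opens it for
# reading, so each pass of the loop serves one reader.
serve_signature() {
    n=0
    while [ "$n" -lt "$2" ]; do
        n=$((n + 1))
        printf 'Signature number %s\n' "$n" > "$1"
        sleep 1   # give the reader time to see end-of-file and close
    done
}

sigdir=$(mktemp -d)
mkfifo "$sigdir/signature"        # stand-in for ~/.signature
serve_signature "$sigdir/signature" 2 &

first=$(cat "$sigdir/signature")
second=$(cat "$sigdir/signature")
wait
echo "$first"
echo "$second"
```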

A certain beauty exists with the named pipe: one of the two programs in the equation (in this case, procmail) doesn't need to know that it's a named pipe - to that program it's simply a file - absolutely nothing special is performed by the process which doesn't control the pipe.

The only real risk with a named pipe is if the monitoring process dies: either the pipe isn't properly deleted (and so processes continue to attempt to write to something which isn't being read), or the pipe goes away, and the processes are now creating and writing to an actual file. Properly written, there shouldn't be much of a risk that the daemon will spontaneously die. The daemon could catch signals and properly close the named pipe, and you could have a cron task that periodically restarts the daemon as necessary, and when restarted, before creating the named pipe, it could see if a physical file exists and process the contents of that as if they were written to the pipe (when done, remove the file and open a pipe in its place). Chances are, most people don't bother with the extra precautions.
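
The startup and shutdown precautions might be sketched like so. start_pipe, cleanup_pipe, and the paths are hypothetical, and a careful version would take the same lockfile the writers use while it swaps the file for the pipe:

```shell
# start_pipe PIPE SPOOL: if a regular file has taken the pipe's place
# while no daemon was running, salvage its contents into SPOOL, remove
# it, and create the pipe in its place.
start_pipe() {
    if [ -f "$1" ]; then
        cat "$1" >> "$2"
        rm -f "$1"
    fi
    [ -p "$1" ] || mkfifo "$1"
}

# cleanup_pipe PIPE: remove the pipe so writers don't block on a pipe
# that nobody is reading.
cleanup_pipe() { rm -f "$1"; }

work=$(mktemp -d)
printf 'stray subject\n' > "$work/subjects"   # file left by a writer
start_pipe "$work/subjects" "$work/spool"

trap 'exit 1' INT TERM                        # route signals through EXIT
trap 'cleanup_pipe "$work/subjects"' EXIT     # remove the pipe on the way out
```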

Just as a regular file would, the named pipe actually has an owner and file permissions, allowing the creator to limit who can write to it - indeed, processes owned by other users on the system could be permitted to write to it.
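
For instance (the path is a throwaway and the mode is only one possible choice), the following leaves the owner able to read and write, lets group members write only, and shuts everyone else out:

```shell
pdir=$(mktemp -d)
mkfifo "$pdir/subjects"

# Owner: read and write. Group: write only. Others: nothing.
chmod 620 "$pdir/subjects"

ls -l "$pdir/subjects"
```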

The technique is also useful for allowing listserv programs to tack on a standardized list footer, but with a variable web password (that say, cycles every 24 or 48 hours) for limiting access to the list archives to list subscribers only.

BTW, the mkfifo mechanism can be used to manage files used with an INCLUDERC in procmail (i.e. some program which reads a db of some nature and emits procmail code). Procmail doesn't invoke any special program - it simply opens the file that is being INCLUDERC'd. As changes are made to the db which generates the RCFILE (in memory, not on disk), the file is dynamically regenerated.
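
The generating half could be as simple as the sketch below. The "address folder" db format and the emit_rc name are invented for the example; a daemon along the lines described above would write this output to the named pipe each time procmail opens it:

```shell
# emit_rc DB: turn "address folder" lines (a made-up format) into
# procmail delivery recipes on stdout.
emit_rc() {
    while read -r addr folder; do
        # Escape dots so they match literally in the procmail regexp.
        pattern=$(printf '%s' "$addr" | sed 's/\./\\./g')
        printf ':0:\n* ^From:.*%s\n%s\n\n' "$pattern" "$folder"
    done < "$1"
}

rcdb=$(mktemp)
printf 'list@example.com lists\n' > "$rcdb"
emit_rc "$rcdb"
```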

Also if I must, I'd rather be lazy and try a cook-book sed example given by
another than try to stretch my brain around more *nix stuff.

Well, then you're going to need to meet us halfway and at a minimum, present the character class which you either want to retain, or which you want to discard or translate. If you're translating, you'll need to define what you want disallowed symbols translated to.

To delete, say, all control chars, and all hibit chars:

| echo "$SUBJECT" | tr -d "[\000-\037][\200-\377]" >> somefile

Or, to translate the characters to some other symbol:

| echo "$SUBJECT" | tr "[\000-\037][\200-\377]" "%" >> somefile
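
A variation worth testing on your own system, offered as a sketch: on a tr that follows POSIX (GNU tr does), square brackets around a range are ordinary characters, so the bracketed sets above would also catch '[' and ']' in a subject, and \012 (the newline which echo adds) falls inside \000-\037. The functions below drop the brackets and skip \012. Their names are made up, and printf is used in place of echo only so the examples behave the same under different shells:

```shell
# clean_subject TEXT: delete control and hibit bytes, keep the newline.
# LC_ALL=C makes tr operate on single bytes regardless of locale.
clean_subject() {
    printf '%s\n' "$1" | LC_ALL=C tr -d '\000-\011\013-\037\200-\377'
}

# mask_subject TEXT: same set, but turn each such byte into '%'.
# '[%*]' tells tr to repeat '%' as often as needed to fill the set.
mask_subject() {
    printf '%s\n' "$1" | LC_ALL=C tr '\000-\011\013-\037\200-\377' '[%*]'
}

sample=$(printf 'a\001b\tc [d]')   # hypothetical subject with control chars
clean_subject "$sample"
mask_subject "$sample"
```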

If this invocation actually proves to be too much of a burden on your host, a CPU upgrade may be in order. Sadly, tr doesn't support having the original text to be translated provided to it by any means other than its stdin, so you still have to pipe from echo.

If you use 'time' with a large saved mailbox and a sandbox config, you could get an idea as to the CPU overhead for the additional processes (bear in mind that you'll have caching going on).

        Of course, the easiest solution might be to make the
program which periodically reads the file and writes it to the
MySQL database a bit more intelligent when it comes to handling
each line - as long as the wildcard problem is limited to an
extraneous embedded linefeed.

Ah, the "dbase" file you refer to isn't the live database, it is merely an interim file? Why isn't the data submitted directly to the live SQL db? If you did that in realtime, the CPU cycles (as well as intermediate disk writes) you would save would probably make up for any added overhead of making the initial submitter a bit more intelligent, and you'd have just one helper app - the app which takes the provided data and outputs it to the SQL database.

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail