From sendmail+per(_at_)Sendmail(_dot_)ORG Mon Apr 10 15:51:50 2000
Date: Fri, 7 Apr 2000 02:00:52 +0200 (MET DST)
From: Per Hedeland
To: mmokrejs(_at_)natur(_dot_)cuni(_dot_)cz
Cc: sendmail+per(_at_)Sendmail(_dot_)ORG, sendmail-questions(_at_)Sendmail(_dot_)ORG
Subject: Re: doc/op/op.ps

Martin Mokrejs wrote:
>How would I dig out this explanation from current docs? ;-)

As far as the meaning of the flags goes, what I wrote is almost verbatim what is in op.ps. As for the background, it's a very small piece of what the RFCs and the (E)SMTP protocol spec are all about - the sendmail documentation can't reproduce all that, op.ps does reference all the relevant RFCs on the very first page though...

>> If you wanted both, you'd need to give both, but it's highly unlikely
>> that you'd want both: '8' is basically only meaningful on SMTP mailers,
>> and as Neil wrote, it's almost always a mistake to use '9' for anything
>
>Because sendmail doesn't convert at all ...

No (sendmail does convert of course), but because even if the remote announces 8BITMIME support, it may need to in turn hand the message over to an MTA that does *not* have 8BITMIME support, and thus undo your 7->8 conversion with an 8->7 conversion - and you will just have wasted CPU cycles, and increased the risk of an error getting introduced by the conversions.

>Hmm, so the explanation of `8' and `9' flags applies only to body of
>message, right? So the body is sometimes converted (by sendmail), right?

Yes.

>Thank you, I'll happily start using it, I've just received a similar one.
>So your script converts MIME headers into iso-8859-1, right?

No, and I'm afraid you're missing the whole issue with header conversions: My script (like the one you sent) converts "MIME headers", i.e. those where some parts are encoded according to RFC 2047, into *"bytes"*. The *meaning* of those bytes, i.e. the charset used, i.e. the information needed to translate the bytes into the appropriate glyphs on your screen, is *lost*.
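
To illustrate the point about decoding into bytes, here is a minimal Python sketch (not the Perl script discussed in this thread). It uses the standard library's email.header.decode_header on a made-up Subject value; the sample header and the variable names are assumptions for illustration only. Decoding yields raw bytes, and the glyphs you see depend entirely on which charset the reader assumes afterwards.

```python
from email.header import decode_header

# Hypothetical header value: a Czech word in iso-8859-2, Q-encoded per RFC 2047.
encoded = "=?iso-8859-2?Q?=E8e=B9tina?="

# decode_header() returns (bytes, charset) pairs; a script that keeps only
# the bytes has thrown away the charset label.
raw, declared_charset = decode_header(encoded)[0]

# The same bytes rendered under two different assumptions about the charset.
as_latin2 = raw.decode("iso-8859-2")  # what the sender meant
as_latin1 = raw.decode("iso-8859-1")  # what an 8859-1-only setup displays

print(raw, declared_charset)
print(as_latin2, as_latin1)
```

The bytes are identical in both cases; only the assumed charset differs, which is why the decoded header shows the wrong characters on a setup configured for a different charset.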

Of course *I*, when using this script, make the implicit assumption that any such headers in mail I receive will actually be iso-8859-1 - all the software I use is set up to translate bytes into glyphs as if the bytes were representing 8859-1 characters - which means that if you sent me a message that used 8859-2, parts of it would come up with the wrong characters on my screen - but I don't care as I probably wouldn't be able to read the correct characters anyway. :-) The alternative would be for my script to leave anything that isn't 8859-1 as-is, encoded - there's no way to turn your encoded 8859-2 into decoded 8859-1. :-) This is certainly feasible but I had no reason to bother with it. Likewise, when using the script you can assume that you will only receive 8859-2 encoded - see, it's a portable script. :-)

>What do you think of the attached recipe. I just can't compare them.
>Does it the same as yours? ;)

This is getting rather far outside the scope of sendmail support, and I don't really have the time to look into the details of what you sent, but of course I think mine is better. :-) Anyway a few notes:

The script you sent also decodes message bodies, mine doesn't do that at all, I leave it to sendmail's '9' flag - I'm not sure if the script decodes more types of messages than sendmail does. I also don't know what the $MIME_BIN_QP is (the script is obviously part of some larger package, something *I* don't like), but if it's something that does "standard" QP decoding, it isn't quite correct - header encoding differs slightly from that, notably in the handling of whitespace where space can be (and typically is) encoded as '_' in an "encoded-word" - there are also some issues with whitespace between the encoded-words.
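
The difference between "standard" QP decoding and header decoding can be seen in a small Python sketch. It uses the standard quopri module, whose decodestring function takes a header flag; the sample text is a made-up payload of an encoded-word, not something taken from either script in this thread.

```python
import quopri

# Hypothetical text part of a Q-encoded word (the bit between "?Q?" and "?=").
encoded_text = b"Caf=E9_au_lait"

# Body-style quoted-printable decoding leaves '_' alone.
body_style = quopri.decodestring(encoded_text)

# Header-style ("Q" encoding, RFC 2047) decoding maps '_' to a space.
header_style = quopri.decodestring(encoded_text, header=True)

print(body_style)
print(header_style)
```

A tool that applies body-style decoding to headers would therefore leave underscores where the sender meant spaces.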

Then of course your method is likely to be more processing-intensive, as it starts up multiple processes (shell, sed, $MIME_BIN_QP) for *each header*, whereas I handle all headers within a single invocation of perl - also your method only handles Subject: and From:, these encodings can be used in various other headers, personally I find To: and Cc: more annoying than From:. And it doesn't handle Base64-encoded headers.

>What I wish is:
>- to convert from MIME to iso-8859-1 or iso-8859-2, depending on some
>arguments to these scripts.
> (I'm aware that I don't speak technically correct, I want as a result
>either us-ascii 7-bit or iso-8859-2 8-bit output, is that clear?)

See above, the charset is what it is, encoded or not, no script can change that, only encode or decode. Your method limits itself to decoding iso-8859-[1-9], I don't see any particular point in that - either you do only one charset, the one the user "expects", or you might as well do them all (and not run into issues with broken charset specs :-).

--Per
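
For readers who want to see the single-pass idea in concrete form, the following is a rough Python sketch, not the Perl script mentioned above. It decodes every encoded-word in a whole header block in one go, whatever the header name or charset, handles both Q and Base64, and drops the whitespace that separates adjacent encoded-words. The function names, regular expressions and sample header are all assumptions made for illustration.

```python
import base64
import binascii
import re

# One RFC 2047 encoded-word: =?charset?encoding?text?=
ENCODED_WORD = re.compile(rb"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")
# Whitespace (including a folded line break) between two adjacent
# encoded-words is not part of the text and is removed before decoding.
BETWEEN_WORDS = re.compile(rb"(\?=)[ \t\r\n]+(=\?)")


def _decode_word(match):
    """Turn one encoded-word into raw bytes; the charset label is ignored."""
    encoding, text = match.group(2).upper(), match.group(3)
    try:
        if encoding == b"B":
            return base64.b64decode(text, validate=True)
        # header=True applies the "Q" rule that '_' means space.
        return binascii.a2b_qp(text, header=True)
    except (binascii.Error, ValueError):
        return match.group(0)  # leave a broken word encoded, as-is


def decode_header_block(block: bytes) -> bytes:
    """Decode all encoded-words in all headers in a single pass."""
    joined = BETWEEN_WORDS.sub(rb"\1\2", block)
    return ENCODED_WORD.sub(_decode_word, joined)


if __name__ == "__main__":
    sample = (b"To: =?iso-8859-1?Q?J=F8rgen?=\r\n"
              b" =?iso-8859-1?B?IEhhbnNlbg==?= <j@example.org>\r\n")
    print(decode_header_block(sample))
```

Because the charset is never consulted, this does "all of them" in the sense described above: the output is just bytes, and it is up to the reader's software to display them under whatever charset it assumes.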