From sendmail+per(_at_)Sendmail(_dot_)ORG Mon Apr 10 15:51:50 2000
Date: Fri, 7 Apr 2000 02:00:52 +0200 (MET DST)
From: Per Hedeland <sendmail+per(_at_)Sendmail(_dot_)ORG>
To: mmokrejs(_at_)natur(_dot_)cuni(_dot_)cz
Cc: sendmail+per(_at_)Sendmail(_dot_)ORG, sendmail-questions(_at_)Sendmail(_dot_)ORG
Subject: Re: doc/op/op.ps

Martin Mokrejs <mmokrejs(_at_)natur(_dot_)cuni(_dot_)cz> wrote:
>How would I dig out this explanation from current docs? ;-)

As far as the meaning of the flags goes, what I wrote is almost verbatim
what is in op.ps. As for the background, it's a very small piece of what
the RFCs and the (E)SMTP protocol spec are all about - the sendmail
documentation can't reproduce all that, though op.ps does reference all
the relevant RFCs on the very first page...

>> If you wanted both, you'd need to give both, but it's highly unlikely
>> that you'd want both: '8' is basically only meaningful on SMTP mailers,
>> and as Neil wrote, it's almost always a mistake to use '9' for anything
>
>Because sendmail doesn't convert at all ...

No (sendmail does convert of course), but because even if the remote
announces 8BITMIME support, it may need to in turn hand the message over
to an MTA that does *not* have 8BITMIME support, and thus undo your 7->8
conversion with an 8->7 conversion - and you will just have wasted cpu
cycles, and increased the risk of an error getting introduced by the
conversions.
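
A minimal sketch of that wasted round trip, using Python's standard
quopri module as a stand-in for the MTA conversions. The sample body and
variable names are made up for illustration, and sendmail's own
conversion code is not shown here:

```python
import quopri

# Hypothetical 8-bit body line (bytes that happen to be iso-8859-1 text).
original_8bit = b"sm\xf6rg\xe5sbord for everyone\n"

# The message as it sits in the queue: 7-bit, quoted-printable.
queued_7bit = quopri.encodestring(original_8bit)

# 7->8: the conversion the '9' flag requests for this mailer.
sent_8bit = quopri.decodestring(queued_7bit)

# 8->7: the next hop has to relay to an MTA without 8BITMIME support.
relayed_7bit = quopri.encodestring(sent_8bit)

print(queued_7bit)
print(relayed_7bit == queued_7bit)
```

In this sketch the relayed form is the same 7-bit data that was queued
in the first place, so both conversions bought nothing.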

>Hmm, so the explanation of `8' and `9' flags applies only to body of
>message, right? So the body is sometimes converted (by sendmail), right?

Yes.

>Thank you, I'll happily start using it, I've just received a similar one.
>So your script converts MIME headers into iso-8859-1, right?

No, and I'm afraid you're missing the whole issue with header
conversions: My script (like the one you sent) converts "MIME headers",
i.e. those where some parts are encoded according to RFC 2047, into
*"bytes"*. The *meaning* of those bytes, i.e. the charset used, i.e. the
information needed to translate the bytes into the appropriate glyphs on
your screen, is *lost*.
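
To illustrate the point, here is a small sketch using Python's
email.header.decode_header (not the perl script discussed here). The
header value is a made-up example. The decoder hands back the bytes and
the charset label separately; a filter that writes out only the bytes
leaves the reader's software to guess the charset:

```python
from email.header import decode_header

# Hypothetical header fragment: one RFC 2047 encoded-word, Q encoding.
raw = "=?iso-8859-2?Q?Mokrej=B9?="

# decode_header returns (bytes, charset) pairs; here there is exactly one.
(data, charset), = decode_header(raw)

print(data)     # the decoded bytes, byte 0xB9 included
print(charset)  # the label that is lost if only the bytes are kept

# The same byte is a different glyph depending on the assumed charset.
as_latin2 = data.decode("iso-8859-2")
as_latin1 = data.decode("iso-8859-1")
print(as_latin2, as_latin1)
```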

Of course *I*, when using this script, make the implicit assumption that
any such headers in mail I receive will actually be iso-8859-1 - all the
software I use is set up to translate bytes into glyphs as if the bytes
were representing 8859-1 characters - which means that if you sent me a
message that used 8859-2, parts of it would come up with the wrong
characters on my screen - but I don't care as I probably wouldn't be
able to read the correct characters anyway.:-)

The alternative would be for my script to leave anything that isn't
8859-1 as-is, encoded - there's no way to turn your encoded 8859-2 into
decoded 8859-1.:-) This is certainly feasible but I had no reason to
bother with it. Likewise, when using the script you can assume that you
will only receive 8859-2 encoded headers - see, it's a portable
script.:-)
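
The "leave anything else as-is" alternative could look roughly like the
sketch below. It is written in Python rather than perl, and the function
name, the regular expression and the sample header are all assumptions
made for illustration. Since Python strings are Unicode, the sketch
decodes with the expected charset instead of emitting raw bytes:

```python
import re
from email.header import decode_header

# Simplified pattern for an RFC 2047 encoded-word: =?charset?B-or-Q?text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")

def decode_if_expected(value, expected="iso-8859-1"):
    """Decode only encoded-words labelled with the expected charset."""
    def repl(match):
        if match.group(1).lower() != expected:
            return match.group(0)  # other charset: leave it encoded
        (data, _charset), = decode_header(match.group(0))
        return data.decode(expected)
    return ENCODED_WORD.sub(repl, value)

mixed = "=?iso-8859-1?Q?Bj=F6rn?= and =?iso-8859-2?Q?Mokrej=B9?="
print(decode_if_expected(mixed))
print(decode_if_expected(mixed, expected="iso-8859-2"))
```

Passing a different expected charset is all it takes to "port" the
sketch to a reader whose software assumes 8859-2.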

>What do you think of the attached recipe. I just can't compare them.
>Does it do the same as yours? ;)

This is getting rather far outside the scope of sendmail support, and I
don't really have the time to look into the details of what you sent,
but of course I think mine is better.:-) Anyway, a few notes: The script
you sent also decodes message bodies; mine doesn't do that at all, I
leave it to sendmail's '9' flag - I'm not sure if the script decodes
more types of messages than sendmail does. I also don't know what the
$MIME_BIN_QP is (the script is obviously part of some larger package,
something *I* don't like), but if it's something that does "standard" QP
decoding, it isn't quite correct - header encoding differs slightly from
that, notably in the handling of whitespace where space can be (and
typically is) encoded as '_' in an "encoded-word" - there are also some
issues with whitespace between the encoded-words.
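
The difference can be seen with Python's standard library, used here
purely as a stand-in since I don't know what $MIME_BIN_QP actually does.
quopri.decodestring has a header argument that switches on the '_'
handling, and email.header.decode_header drops the whitespace between
adjacent encoded-words. The sample strings are made up:

```python
import quopri
from email.header import decode_header

# Hypothetical encoded-text taken from inside a "Q" encoded-word.
payload = b"Hello_World=21"

# Body-style QP decoding leaves the underscore alone...
as_body = quopri.decodestring(payload)
# ...while header-style decoding turns it into a space.
as_header = quopri.decodestring(payload, header=True)
print(as_body, as_header)

# Whitespace that merely separates two encoded-words is not content.
two_words = "=?iso-8859-1?Q?a?= =?iso-8859-1?Q?b?="
joined = b"".join(data for data, _charset in decode_header(two_words))
print(joined)
```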

Then of course your method is likely to be more processing-intensive, as
it starts up multiple processes (shell, sed, $MIME_BIN_QP) for *each
header*, whereas I handle all headers within a single invocation of perl
- also, your method only handles Subject: and From:, but these encodings
can be used in various other headers; personally I find To: and Cc: more
annoying than From:. And it doesn't handle Base64-encoded headers.
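
As a rough illustration of handling every header, with both "Q" and "B"
encodings, inside one process: the sketch below uses Python's email
package rather than perl. The function name, the addresses and the
sample message are invented for the example. Like the approach
described above, it keeps only the bytes and discards the charset
labels:

```python
import email
from email.header import decode_header

# Hypothetical message: a Q-encoded From:, a B-encoded To:, plain Subject:.
raw_message = (
    "From: =?iso-8859-1?Q?Bj=F6rn?= <bjorn@example.org>\n"
    "To: =?iso-8859-1?B?SvZyZ2Vu?= <jorgen@example.org>\n"
    "Subject: plain ascii\n"
    "\n"
    "body\n"
)

def decode_all_headers(text):
    """Return (name, bytes) for every header, decoding Q and B alike."""
    msg = email.message_from_string(text)
    result = []
    for name, value in msg.items():
        chunks = []
        for data, _charset in decode_header(value):
            # Headers without encoded-words come back as str, not bytes.
            chunks.append(data if isinstance(data, bytes) else data.encode("ascii"))
        result.append((name, b"".join(chunks)))
    return result

for name, value in decode_all_headers(raw_message):
    print(name, value)
```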

>What I wish is:
>- to convert from MIME to iso-8859-1 or iso-8859-2, depending on some
>arguments to these scripts.
> (I'm aware that I don't speak technically correct, I want as a result
>either us-ascii 7-bit or iso-8859-2 8-bit output, is that clear?)

See above, the charset is what it is, encoded or not, no script can
change that, only encode or decode. Your method limits itself to
decoding iso-8859-[1-9], I don't see any particular point in that -
either you do only one charset, the one the user "expects", or you might
as well do them all (and not run into issues with broken charset
specs:-).

--Per