From sendmail+per(_at_)Sendmail(_dot_)ORG Mon Apr 10 15:49:34 2000
Date: Thu, 6 Apr 2000 23:39:59 +0200 (MET DST)
From: Per Hedeland <sendmail+per(_at_)Sendmail(_dot_)ORG>
To: mmokrejs(_at_)natur(_dot_)cuni(_dot_)cz, sendmail-questions(_at_)Sendmail(_dot_)ORG
Cc: sendmail-questions(_at_)Sendmail(_dot_)ORG
Subject: Re: doc/op/op.ps

Neil W Rickert <sendmail+rickert(_at_)Sendmail(_dot_)ORG> wrote:
>Martin Mokrejs <mmokrejs(_at_)natur(_dot_)cuni(_dot_)cz> wrote:
>
>>So servers behaving as relay convert to 7-bit by default?
>
>The SMTP standards are for 7-bit.  Therefore 8-bit should only be
>sent when you know that the destination server will handle it
>correctly.

Just an addition that may help Martin's understanding of the '8' and '9'
flag issues: The primary way to "know" that the remote can handle 8-bit
is that it announces support for the 8BITMIME SMTP extension (RFC 1652)
in its EHLO response - if it does, sendmail will not convert any
messages it sends to this remote to 7-bit.
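
To make the "announces support" step concrete, here is a minimal Python
sketch that looks for the 8BITMIME keyword in the text of an EHLO reply.
It is an illustration only, not how sendmail does it; the function name
and the sample reply (including the example.org host names) are made up
for the example.

```python
def announces_8bitmime(ehlo_reply: str) -> bool:
    """Return True if a multi-line EHLO reply lists the 8BITMIME keyword."""
    lines = ehlo_reply.splitlines()
    # The first reply line is the server greeting, not an extension keyword.
    for line in lines[1:]:
        # Extension lines look like "250-KEYWORD [params]" or "250 KEYWORD [params]".
        if len(line) < 4 or not line.startswith("250"):
            continue
        fields = line[4:].split()
        if fields and fields[0].upper() == "8BITMIME":
            return True
    return False


# Hypothetical reply text, invented for illustration.
SAMPLE_REPLY = (
    "250-mail.example.org Hello client.example.org\r\n"
    "250-8BITMIME\r\n"
    "250-SIZE 10000000\r\n"
    "250 HELP"
)

if __name__ == "__main__":
    print(announces_8bitmime(SAMPLE_REPLY))
```

Against a live server, Python's smtplib offers the same check via
SMTP.has_extn("8bitmime") after calling ehlo().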

However, there are some cases where the remote doesn't announce 8BITMIME
support, but you still *know*, by means external to the SMTP protocol,
that it can handle 8-bit, and *then* it might be appropriate to use the
'8' flag. A typical example would be an intranet where you have
control over the deployed MTAs and know their capabilities in this
respect, but haven't had the time or opportunity to upgrade them all to
versions with RFC 1652 support - this scenario is rapidly becoming less
relevant, though, as almost all modern MTAs are capable of announcing
8BITMIME support.

>>Hmm, so that's more or less my question above, anyway this sentence
>>doesn't help me much. In different words, are the features of option 9
>>available also under 8?
>
>That is not the part of the sendmail code I am most familiar with.
>I'll have to skip the question.

I'm reasonably familiar with it, but the question doesn't make much
sense - the '8' and '9' flags do completely different things:

'8': Send 8-bit, i.e. don't convert to 7-bit, even though the remote
doesn't announce 8BITMIME support.

'9': Decode text/plain messages with a 7-bit encoding (i.e. quoted-
printable or base64) into 8bit.

If you wanted both, you'd need to give both, but it's highly unlikely
that you'd want both: '8' is basically only meaningful on SMTP mailers,
and as Neil wrote, it's almost always a mistake to use '9' for anything
but local delivery (where it is the default in the standard configs).
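
As a rough picture of the kind of conversion '9' refers to, the Python
sketch below turns a quoted-printable or base64 body back into raw 8-bit
bytes. It only illustrates the transformation and is not sendmail's
code; the function name to_8bit is hypothetical, and it ignores the MIME
headers that a real conversion would also have to rewrite.

```python
import base64
import quopri


def to_8bit(body: bytes, cte: str) -> bytes:
    """Undo a 7-bit Content-Transfer-Encoding, returning the raw bytes."""
    cte = cte.strip().lower()
    if cte == "quoted-printable":
        return quopri.decodestring(body)
    if cte == "base64":
        return base64.b64decode(body)
    # 7bit, 8bit and binary bodies carry no encoding to undo.
    return body


if __name__ == "__main__":
    # "=E5" and "=E6" are ISO-8859-1 bytes written in quoted-printable.
    print(to_8bit(b"Bl=E5b=E6r", "quoted-printable"))
```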

But anyway, Martin, from your other messages it seems you are mostly
concerned with the *header* encoding, and as we discussed earlier on,
sendmail *never* does anything with this - neither encoding nor
decoding. You might also want to note that even with the 8BITMIME
extension, the standards *never* allow for 8-bit characters in the
*headers* in SMTP.
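
The header point can be illustrated with a short Python sketch: an
RFC 2047 encoded-word is pure 7-bit ASCII on the wire, and only decoding
it yields 8-bit text. The helper names is_7bit and decode_words and the
sample word are made up for the example; decode_header and make_header
are from Python's standard email.header module.

```python
from email.header import decode_header, make_header


def is_7bit(raw: bytes) -> bool:
    """True if no byte has the high bit set."""
    return all(byte < 0x80 for byte in raw)


def decode_words(value: str) -> str:
    """Decode RFC 2047 encoded-words in a header value."""
    return str(make_header(decode_header(value)))


# Sample encoded-word, invented for illustration.
ENCODED = "=?ISO-8859-1?Q?Bl=E5b=E6r?="

if __name__ == "__main__":
    decoded = decode_words(ENCODED)
    print(is_7bit(ENCODED.encode("ascii")))   # encoded form, as sent over SMTP
    print(is_7bit(decoded.encode("latin-1"))) # decoded form
```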

You specifically mentioned header encoding messing things up for
procmailrc recipes - so why not use a procmailrc recipe to decode the
headers? It's rather trivial actually, and it just so happens that I
have one that I've been using for a year or so without ill effects. It's
enclosed below - but this is *not* "official" sendmail.org stuff,
especially not the comments...:-)

--Per

-------------------------------------------------------------------

Procmailrc recipe - could theoretically be put in /etc/procmailrc to
"benefit" all users if you have procmail as LDA:

# De-mangle RFC 2047 header mangling
:0Hhfw
* =\?[^?]+\?[qb]\?[^?]+\?=
| $HOME/bin/dmmh
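
To see which headers the condition line in the recipe above selects, the
same regular expression can be tried in Python. This is only a sketch:
procmail matches conditions case-insensitively unless the 'D' flag is
given, which is mirrored here with re.IGNORECASE, and the function name
and sample headers are made up for the example.

```python
import re

# Same pattern as the recipe's condition line.
CONDITION = re.compile(r"=\?[^?]+\?[qb]\?[^?]+\?=", re.IGNORECASE)


def needs_demangling(header_block: str) -> bool:
    """True if the header block contains at least one encoded-word."""
    return CONDITION.search(header_block) is not None


if __name__ == "__main__":
    print(needs_demangling("Subject: =?ISO-8859-1?Q?Qu=6ft=65d?=\n"))
    print(needs_demangling("Subject: plain text?\n"))
```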

And the filter script:

#!/usr/bin/perl
#
# dmmh [ old-prefix ]
#
# De-Mangle MIME Headers (dmmh), more precisely headers that have been
# mangled according to RFC 2047 et al - i.e. the
# =?ISO-8859-1?Q?Qu=6ft=65d-Unr=65=61d=61bl=65_t=65xt?= stuff. Stdin is
# expected to be an RFC822-type message (or preferably the headers only,
# e.g. called via a .procmailrc 'hf' recipe) - stdout is the input
# unchanged, except that any mangled headers have a prefix (default
# $PREFIX below, can be overridden by cmdline arg - a null arg means
# throw away the mangled header) prepended to their field name, and are
# followed by a de-mangled version of the header.
# QP/B64 decoding funcs shamelessly stolen from the MIME module by
# Gisle Aas (somewhat modified for header encoding etc).
#
# Per Hedeland <per(_at_)erix(_dot_)ericsson(_dot_)se> 99-03-08

$PREFIX="Old-";

$prefix = $#ARGV >= 0 ? $ARGV[0] : $PREFIX;

while (<STDIN>) {	# headers
    last if /^$/;
    if (/^\S/) {
	&do_hdr if $hdr;
	$hdr = $_;
	next;
    }
    $hdr .= $_;
}
&do_hdr if $hdr;
print if /^$/;		# separator (if any)
while (<STDIN>) {	# body (if any)
    print;
}

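sub do_hdr {
    # Decode every RFC 2047 encoded-word in $hdr, collecting the result
    # in $new ($new stays empty if there was nothing to decode).
    $new = "";
    $rest = $hdr;
    while (($pre, $enc, $code, $post) =
	    ($rest =~ /^(.*?)=\?[^?]+\?([qb])\?([^?]+)\?=(.*)$/is)) {
	# Keep the text before the encoded-word unless it is only
	# whitespace (i.e. the gap between two adjacent encoded-words).
	$new .= $pre if $pre =~ /\S/;
	$new .= $enc =~ /q/i ? &decode_qp($code) : &decode_b64($code);
	$rest = $post;
    }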
    if ($new) {
	$new .= $rest;
	print $prefix . $hdr if $prefix;
	# Unfold excessive(?) folding... (but don't re-fold - too hard:-)
	$max = 75;
	while (($pre, $middle, $post) =
	       ($new =~ /^(.*[\S])\s*\n\s+(.*)((\n|.)*)$/)) {
	    if (length($pre) + length($middle) > $max) {
		print $pre . "\n\t";
		$max = 67;
		$new = $middle . $post;
	    } else {
		$new = $pre . " " . $middle . $post;
	    }
	}
	print $new;
    } else {
	print $hdr;
    }
}

sub decode_qp {
    my $res = shift;

    $res =~ s/_/=20/g;		# code hex 20 may be encoded as '_'
    $res =~ s/=([\da-fA-F]{2})/pack("C", hex($1))/ge;
    $res;
}

sub decode_b64 {
    local($^W) = 0; # unpack("u",...) gives bogus warning in 5.001m

    my $str = shift;
    my $res = "";
   
    $str =~ tr|A-Za-z0-9+/||cd;             # remove non-base64 chars (padding)
    $str =~ tr|A-Za-z0-9+/| -_|;            # convert to uuencoded format
    while ($str =~ /(.{1,60})/gs) {
        my $len = chr(32 + length($1)*3/4); # compute length byte
        $res .= unpack("u", $len . $1 );    # uudecode
    }
    $res;
}
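
For readers who want to check the "Q" decoding rule without running the
script, here is a rough Python equivalent of decode_qp above, applied to
the sample text from the script's header comment. It is a sketch of the
same two steps ('_' stands for a space, '=XX' is a hex-coded byte), not
a replacement for the script; the name decode_q is made up.

```python
import re


def decode_q(text: str) -> bytes:
    """Decode the payload of a 'Q'-encoded word into raw bytes."""
    # In encoded-words, '_' is shorthand for hex 20 (space).
    data = text.replace("_", "=20").encode("ascii")
    # Replace each =XX escape with the byte it names.
    return re.sub(
        rb"=([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        data,
    )


if __name__ == "__main__":
    print(decode_q("Qu=6ft=65d-Unr=65=61d=61bl=65_t=65xt"))
```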