Hi Ken,
> Probably the best way to do that is using mhbuild directives.
> That is, you can today do stuff like:
>
> #<text/plain; charset=utf-8
> [... utf-8 text here ...]
> #<text/plain; charset=iso-8859-1
> [... iso-8859-1 text here ...]
> #<text/html; charset=utf-8
> [... HTML text here ...]
The input to mhbuild can be that, it's true, though a text editor might
only cope with the mixed encodings in the C locale. And then nmh treats
a NUL byte as the end of a string, so, e.g., charset=ucs-2le doesn't
work. Worse than just truncating the UCS-2LE input, it corrupts earlier
parts, as this experiment shows.
$ cat build
#! /bin/bash
(
    printf '%s\n' \
        'subject: Test.' \
        '' \
        'Disappears.' \
        '#<text/plain; charset=iso-8859-1' \
        $'Fiat: $ \xa3' \
        '#<text/plain; charset=ucs-2le'
    iconv -t ucs-2le <<<'† Footnote.'
) >draft
sed -n l draft
echo
cp draft mimed
mhbuild -list -realsize -headers -verbose mimed
echo
sed -n l mimed
$
$ ./build
subject: Test.$
$
Disappears.$
#<text/plain; charset=iso-8859-1$
Fiat: $ \243$
#<text/plain; charset=ucs-2le$
¹ \000F\000o\000o\000t\000n\000o\000t\000e\000.\000$
\000$
 msg part  type/subtype              size description
   0       multipart/mixed             99
             boundary="----- =_aaaaaaaaaa0"
     1     text/plain                  34
             charset="UTF-8"
     2     text/plain                   3
             charset="ucs-2le"
subject: Test.$
MIME-Version: 1.0$
Content-Type: multipart/mixed; boundary="----- =_aaaaaaaaaa0"$
Content-ID: <21398.1623492782.0@orac.inputplus.co.uk>$
Content-Transfer-Encoding: 8bit$
$
------- =_aaaaaaaaaa0$
Content-Type: text/plain; charset="UTF-8"$
Content-ID: <21398.1623492782.1@orac.inputplus.co.uk>$
Content-Transfer-Encoding: 8bit$
$
² ain; charset=iso-8859-1$
Fiat: $ \243$
$
------- =_aaaaaaaaaa0$
Content-Type: text/plain; charset="ucs-2le"$
Content-ID: <21398.1623492782.2@orac.inputplus.co.uk>$
$
³ $
$
------- =_aaaaaaaaaa0--$
$
1. sed happily displays the NUL bytes in the draft.
2. The ‘Disappears’ part in the draft has vanished. The Fiat part
starts with part of the preceding directive. Altering the length of the
UCS-2LE part changes how far back this part erroneously starts;
I suspect some pointer subtraction.
3. All that makes it into the UCS-2LE part is three spaces: the first
three of the four bytes that encode the U+2020 dagger and its following
U+0020 space. The fourth byte is the first NUL.
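The byte arithmetic behind point 3 can be checked separately from nmh.
This is only a sketch, assuming an iconv that knows UCS-2LE plus the
usual od and xargs; the variable names hex and n are just for
illustration. It encodes the dagger and the space, dumps the bytes, and
counts how many come before the first NUL, which is all a
NUL-terminated string function would see.

```shell
#!/bin/sh
# "† " as UTF-8 octal escapes (e2 80 a0 20), converted to UCS-2LE
# and dumped as space-separated hex bytes.
hex=$(printf '\342\200\240 ' |
    iconv -f UTF-8 -t UCS-2LE |
    od -An -v -tx1 |
    xargs)
echo "bytes: $hex"

# Count the bytes before the first NUL.
n=0
for b in $hex; do
    [ "$b" = 00 ] && break
    n=$((n + 1))
done
echo "bytes before first NUL: $n"
```

U+2020 is 20 20 in little-endian order and U+0020 is 20 00, so the count
is 3, matching the size of 3 that mhbuild lists for the ucs-2le part.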
This isn't a complaint, just passing on the observation having made the
effort.
--
Cheers, Ralph.