Re: Bug reported regarding Unicode handling in email address

Hi Ken,

> Probably the best way to do that is using mhbuild directives.
> That is, you can today do stuff like:
>
> #<text/plain; charset=utf-8
> [... utf-8 text here ...]
> #<text/plain; charset=iso-8859-1
> [... iso-8859-1 text here ...]
> #<text/html; charset=utf-8
> [... HTML text here ...]


The input to mhbuild can be that, it's true, though a text editor might
only cope with such mixed encodings in the C locale.  And then nmh
treats a NUL byte as the end of a string, so e.g. charset=ucs-2le
doesn't work.  Worse than just truncating the UCS-2LE input, it
corrupts earlier parts, as this experiment shows.

    $ cat build
    #! /bin/bash

    (
        printf '%s\n' \
            'subject: Test.' \
            '' \
            'Disappears.' \
            '#<text/plain; charset=iso-8859-1' \
            $'Fiat: $ \xa3' \
            '#<text/plain; charset=ucs-2le'
        iconv -t ucs-2le <<<'† Footnote.'
    ) >draft
    sed -n l draft
    echo

    cp draft mimed
    mhbuild -list -realsize -headers -verbose mimed
    echo

    sed -n l mimed
    $
    $ ./build
    subject: Test.$
    $
    Disappears.$
    #<text/plain; charset=iso-8859-1$
    Fiat: $ \243$
    #<text/plain; charset=ucs-2le$
 ¹     \000F\000o\000o\000t\000n\000o\000t\000e\000.\000$
    \000$

     msg part  type/subtype              size description
       0       multipart/mixed             99
                 boundary="----- =_aaaaaaaaaa0"
         1     text/plain                  34
                 charset="UTF-8"
         2     text/plain                   3
                 charset="ucs-2le"

    subject: Test.$
    MIME-Version: 1.0$
    Content-Type: multipart/mixed; boundary="----- =_aaaaaaaaaa0"$
    Content-ID: <21398.1623492782.0@orac.inputplus.co.uk>$
    Content-Transfer-Encoding: 8bit$
    $
    ------- =_aaaaaaaaaa0$
    Content-Type: text/plain; charset="UTF-8"$
    Content-ID: <21398.1623492782.1@orac.inputplus.co.uk>$
    Content-Transfer-Encoding: 8bit$
    $
 ²  ain; charset=iso-8859-1$
    Fiat: $ \243$
    $
    ------- =_aaaaaaaaaa0$
    Content-Type: text/plain; charset="ucs-2le"$
    Content-ID: <21398.1623492782.2@orac.inputplus.co.uk>$
    $
 ³     $
    $
    ------- =_aaaaaaaaaa0--$
    $ 

1. sed happily displays the NUL bytes in the draft.

2. The ‘Disappears’ part in the draft has vanished.  The Fiat part
starts with the tail of the directive that precedes it,
‘ain; charset=iso-8859-1’.  Altering the length of the UCS-2LE part
changes how far back this part erroneously starts; I suspect some
pointer subtraction.

3. All that makes it into the UCS-2LE part is the three spaces which
represent the first three-quarters of the U+2020 dagger and its
following U+0020 space.
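
To make the byte arithmetic in note 3 concrete, here is a minimal
sketch, assuming an iconv that knows the UCS-2LE name as the build
script above does.  The function name dagger_space_hex is made up for
illustration.  It dumps the UCS-2LE encoding of the dagger and the space
after it: three 0x20 bytes come before the first NUL, which matches the
three spaces that survive.

```shell
#!/bin/sh
# Print, as hex, the UCS-2LE bytes of U+2020 DAGGER followed by
# U+0020 SPACE.  The input is given as octal UTF-8 bytes so the result
# does not depend on the locale or on this script's own encoding.
# (dagger_space_hex is a hypothetical name, used only for this sketch.)
dagger_space_hex() {
    printf '\342\200\240 ' |
        iconv -f UTF-8 -t UCS-2LE |
        od -An -tx1 |
        tr -d ' \n'
}

dagger_space_hex; echo    # prints 20202000
```

U+2020 becomes 20 20 and U+0020 becomes 20 00, so anything that stops at
the first NUL keeps exactly three of the four bytes.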

This isn't a complaint, just passing on the observation having made the
effort.
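
For anyone wanting to spot such input before mhbuild sees it, a rough
sketch follows.  It assumes only POSIX tr and wc; count_nuls and
check_draft are hypothetical helper names, not part of nmh, and the
warning text reflects only what this one experiment showed.

```shell
#!/bin/sh
# count_nuls FILE: print the number of NUL bytes in FILE.
# tr -cd deletes every byte except NUL; wc -c counts what is left.
# LC_ALL=C keeps tr from rejecting bytes that are invalid in the locale.
count_nuls() {
    LC_ALL=C tr -cd '\000' <"$1" | wc -c | tr -d ' '
}

# check_draft FILE: return non-zero, with a warning on stderr, if FILE
# contains any NUL byte.  (Hypothetical helper, not an nmh command.)
check_draft() {
    n=$(count_nuls "$1")
    if [ "$n" -gt 0 ]; then
        echo "$1: $n NUL byte(s); mhbuild may mishandle this draft" >&2
        return 1
    fi
}
```

Running ‘check_draft draft && mhbuild draft’ would then skip mhbuild for
a draft like the one above.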

-- 
Cheers, Ralph.
