nmh-workers

Re: Bug reported regarding Unicode handling in email address

2021-06-12 04:04:50
Hi Valdis,

Your email was interesting.  Ken wrote

    ¯\_(ツ)_/¯

which in UTF-8 is

    $ hd <<<'¯\_(ツ)_/¯'
    00000000  c2 af 5c 5f 28 e3 83 84  29 5f 2f c2 af 0a  |..\_(...)_/...|
    0000000e
    $ 

and as Unicode code points is

    $ iconv -f utf-8 -t ucs-4le <<<'¯\_(ツ)_/¯' |
    > hexdump -ve '8/4 "% 8x" "\n"'
          af      5c      5f      28    30c4      29      5f      2f
          af       a                                                
    $
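
The same two views can be reproduced without hd or iconv. This is a minimal Python sketch, assuming Python 3.8 or later for `bytes.hex(" ")`; the variable names are illustrative only.

```python
# The shrug plus the trailing newline that the here-string adds.
text = "¯\\_(ツ)_/¯\n"

# UTF-8 view: the bytes that go on the wire, as space-separated hex.
utf8_hex = text.encode("utf-8").hex(" ")

# Code-point view: one hex number per character, as in the UCS-4 dump.
code_points = [f"{ord(ch):x}" for ch in text]

print(utf8_hex)
print(" ".join(code_points))
```

U+00AF takes two bytes (c2 af) and U+30C4 takes three (e3 83 84) in UTF-8, which is why the byte dump is longer than the code-point dump.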

Your MIME email which quoted it arrived here containing

    Content-Type: text/plain; charset=utf-8
    Content-Transfer-Encoding: quoted-printable

    =AF=5C_(ツ)_/=AF

I think that's faulty.  The initial U+00AF has been QP'd as =AF when it
should be the UTF-8 =C2=AF.  The U+30C4 has been put in as the UTF-8 ツ
without being QP'd at all.

It doesn't display correctly here when decoded: each =AF decodes to a
lone 0xAF byte, which isn't valid UTF-8.
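
The difference between the expected and the received body can be checked mechanically. This is a minimal sketch, assuming Python's standard `quopri` module; the names `qp_encode_utf8` and `decodes_as_utf8` are hypothetical helpers for illustration, not part of nmh or any MUA.

```python
import quopri

SHRUG = "¯\\_(ツ)_/¯"


def qp_encode_utf8(text: str) -> bytes:
    """Encode text as UTF-8 first, then quoted-printable the bytes."""
    return quopri.encodestring(text.encode("utf-8"))


def decodes_as_utf8(qp_body: bytes) -> bool:
    """Undo quoted-printable and report whether the result is valid UTF-8."""
    raw = quopri.decodestring(qp_body)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# What a body declared as charset=utf-8 plus quoted-printable should carry.
expected = qp_encode_utf8(SHRUG)

# The body as received: =AF on its own, and the raw UTF-8 bytes of U+30C4.
faulty = b"=AF=5C_(" + "ツ".encode("utf-8") + b")_/=AF"

print(expected)
print(decodes_as_utf8(expected), decodes_as_utf8(faulty))
```

The received body fails because 0xAF is a UTF-8 continuation byte with no lead byte in front of it.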

What sorry excuse for an MUA are you using over there?  :-)
And why doesn't it complain at you when it spots the attempt to send
these transgressions onto the wire?

-- 
Cheers, Ralph.
