nmh-workers

Re: [Nmh-workers] Troublesome messages

2017-10-14 19:06:12
Hi Jon,

Don't know if there's anything that can be done about this given
the nature of unicode and all, but I've been getting a lot of spam
recently that looks like this:

代开发票。点数优惠。可验证后付款。13651402207叶先生 微信一致

Not saying that it's not unicode, just that it makes a mess of my
window.  Using mate-terminal on Linux, UTF-8 locale.
...
Screws up the display after message 5.

I poked about that email a bit on this UTF-8 xfce4-terminal.

    $ scan -width 0 -forma '%{from}\n%{subject}\n%{body}' .
    =?GB2312?B?wdbPyMn6?= <baoguan@hotmail.com>
    =?GB2312?B?tPq/qreixrE=?=
    ??Ʊ???????Żݡ?????֤?󸶿13651402207Ҷ???? ???һ??
    $

The `%{body}' output is nmh trying to take the GB2312 body as UTF-8,
struggling with many of the bytes and producing a `?' for them instead.
Some GB2312 bytes do happen to form a valid UTF-8 sequence, though, so
the odd `Ʊ' gets invented.
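
To see that effect in isolation, here is a minimal Python sketch.  It is
not nmh's code, only an illustration: it takes the base64 payload from
the Subject's encoded-word shown above, which starts with the same four
characters as the body, and decodes the bytes both ways.  Python's
"replace" error handler stands in for whatever substitution nmh or the
terminal makes.

```python
import base64

# Payload of the Subject's encoded-word above; its charset label is GB2312.
raw = base64.b64decode("tPq/qreixrE=")

as_gb2312 = raw.decode("gb2312")                 # the intended reading
as_utf8 = raw.decode("utf-8", errors="replace")  # the mistaken reading

print(raw.hex(" "))  # b4 fa bf aa b7 a2 c6 b1
print(as_gb2312)     # the four intended characters
print(as_utf8)       # six U+FFFD, then U+01B1
```

The last two bytes, c6 b1, are one GB2312 character but also happen to
be the valid UTF-8 encoding of U+01B1, which is where the stray `Ʊ'
comes from.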

    $ scan -width 0 -forma '%(decode{from})\n%(decode{subject})' .
    林先生 <baoguan@hotmail.com>
    代开发票
    $

`%(decode)' works.
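
For comparison, the same RFC 2047 decoding can be sketched with Python's
standard email.header module.  The helper name decode_rfc2047 is made up
for this example and says nothing about how nmh implements `%(decode)'.

```python
from email.header import decode_header


def decode_rfc2047(value: str) -> str:
    """Decode any RFC 2047 encoded-words in a header value to text."""
    out = []
    for part, charset in decode_header(value):
        if isinstance(part, bytes):
            # Encoded-words arrive as bytes plus their declared charset.
            out.append(part.decode(charset or "ascii", errors="replace"))
        else:
            # A header with no encoded-words comes back as a plain str.
            out.append(part)
    return "".join(out)


subject = decode_rfc2047("=?GB2312?B?tPq/qreixrE=?=")
print(subject)
```

The point is that the charset is named in the encoded-word itself, so
the header can be converted reliably, whereas `%{body}' gets raw bytes.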

    $ mhstore -outfile -
    ������Ʊ�������Żݡ�����֤�󸶿13651402207Ҷ���� ΢��һ��
    storing message 5 to stdout
    $

This time, nmh gets out of the way and just flings the bytes at the TTY.
xfce4-terminal spots they're not valid UTF-8 and shows U+FFFD `�'
instead; `Ʊ' is still there.

    $ mhstore -outfile - | iconv -f gb2312
    storing message 5 to stdout
    代开发票。点数优惠。可验证后付款。13651402207叶先生 微信一致
    $

It's valid GB2312 according to iconv(1), which has converted it to UTF-8.
uniq(1) says that's identical to the line you gave above.
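
The same validity check and comparison can be sketched in Python.  The
body bytes here are an assumption: they are rebuilt by encoding the
expected line as GB2312 rather than read from the message, and is_valid
is a helper invented for the example.

```python
line = "代开发票。点数优惠。可验证后付款。13651402207叶先生 微信一致"
body = line.encode("gb2312")  # stand-in for the bytes mhstore wrote


def is_valid(data: bytes, codec: str) -> bool:
    """Return True if data decodes under codec without any error."""
    try:
        data.decode(codec)  # strict decoding, like iconv without -c
        return True
    except UnicodeDecodeError:
        return False


print(is_valid(body, "gb2312"))       # valid in the declared charset
print(is_valid(body, "utf-8"))        # not valid as UTF-8
print(body.decode("gb2312") == line)  # same text as the expected line
```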

    $ mhshow | sed '$! d'
    代开发票。点数优惠。可验证后付款。13651402207叶先生 微信一致
    $

And that's the same line again, so nmh can do it too.

I think historically there have been various problems with
sbr/fmt_scan.c, e.g. its cpstripped(), and those could have included
putting out partial UTF-8; I don't recall.  You could capture the bytes
from the scan that messes up and send them here.  I've been using

    $ scan -version
    scan -- nmh-1.7-RC3 1.7-RC3-4-g3dfc049a built 2017-09-26 14:24:31 +0000 on orac
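
If you do capture the scan output, a rough way to look for partial or
invalid UTF-8 in it is sketched below.  invalid_utf8_spans is a name
made up for this example, and the sample bytes are invented, not taken
from a real scan.

```python
def invalid_utf8_spans(data: bytes) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets of each invalid UTF-8 run."""
    spans = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest decodes cleanly
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice decoded.
            spans.append((pos + err.start, pos + err.end))
            pos += err.end
    return spans


# A lone 0xc6 lead byte with no continuation byte after it.
sample = b"ok \xc6\n"
for start, end in invalid_utf8_spans(sample):
    print(start, end, sample[start:end].hex())
```

Redirecting scan's output to a file and passing that file's bytes to
the function would show where, if anywhere, a multi-byte sequence has
been cut short.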

Also, try xterm instead.  I find it handy when another terminal's
quality is in doubt.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy

_______________________________________________
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers
