nmh-workers

Re: mhfixmsg character set conversion

2022-02-12 05:51:53
Hi Steven,

I assume vim(1) will read up to a certain amount until it either
makes up its mind or assumes the default.

That makes sense.

Yes, but I was wrong...

        - Lines 85-110 are the text/plain portion, with

             Content-Transfer-Encoding: 8bit
             Content-Type: text/plain; charset="UTF-8"
             Mime-Version: 1.0

        - Lines 112-336 are the text/html portion, with

             Content-Transfer-Encoding: 8bit
             Content-Type: text/html; charset=iso-8859-1
             Mime-Version: 1.0

...so it seems that tr is reporting exactly what we'd expect to see.

Agreed.
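To see which lines of the saved message are not valid UTF-8, one could decode it line by line. The sketch below is an illustration only, not the tr(1) check used in the thread. The function name `non_utf8_lines` and the sample bytes are hypothetical. The sample mimics the file: a UTF-8 line followed by the same text encoded as ISO 8859-1.

```python
def non_utf8_lines(data: bytes) -> list[int]:
    """Return 1-based numbers of lines that do not decode as UTF-8."""
    bad = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(lineno)
    return bad

# Hypothetical sample: line 1 is UTF-8, line 2 is the same word in ISO 8859-1.
sample = "caf\u00e9\n".encode("utf-8") + "caf\u00e9\n".encode("latin-1") + b"plain\n"
print(non_utf8_lines(sample))  # [2]
```

On the real message, this would be expected to flag only lines in the text/html part (112-336) that contain non-ASCII bytes.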

The file has UTF-8 and later ISO 8859-1.  vim(1)'s logic is to keep
trying to parse the bytes of the file as one encoding after another,
stopping at the first which is successful.  The list of encodings comes
from ‘:se fileencodings?’ which defaults to
‘ucs-bom,utf-8,default,latin1’ here.

There's no BOM so ucs-bom fails.  The ISO 8859-1 bytes don't happen to
be valid UTF-8.  ‘default’ means use your environment, which is probably
UTF-8 again; fails.  Which means we arrive at ‘latin1’, AKA ISO 8859-1,
which is happy.
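The fallback described above can be modelled roughly as follows. This is a simplified model of the behaviour, not vim's own code. The function `decode_like_vim` is hypothetical. It makes two assumptions: only the UTF-8 BOM is checked for ‘ucs-bom’, and ‘default’ is taken to mean a UTF-8 locale.

```python
import codecs

def decode_like_vim(data: bytes,
                    fileencodings=("ucs-bom", "utf-8", "default", "latin1"),
                    default="utf-8"):
    """Return (name, text) for the first entry that decodes the whole file."""
    for name in fileencodings:
        if name == "ucs-bom":
            # Simplified: vim also recognises UTF-16/32 BOMs; only UTF-8 here.
            if not data.startswith(codecs.BOM_UTF8):
                continue
            codec = "utf-8-sig"
        elif name == "default":
            codec = default  # assumption: the environment's encoding is UTF-8
        else:
            codec = name  # Python accepts "utf-8" and "latin1" as codec names
        try:
            return name, data.decode(codec)
        except UnicodeDecodeError:
            continue  # this encoding fails; try the next one in the list
    raise ValueError("no encoding in the list matched")

# UTF-8 text followed by ISO 8859-1 text, as in the message being discussed.
mixed = "caf\u00e9\n".encode("utf-8") + "caf\u00e9\n".encode("latin-1")
name, text = decode_like_vim(mixed)
print(name)  # latin1
print(text.splitlines()[0])  # cafÃ© (the UTF-8 half is now mojibake)
```

Because ‘latin1’ maps every byte to a character, it never fails. That is why it works as the last resort, and also why the UTF-8 part of the file then displays as mojibake.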

...but in bash, although the line gets pasted, the newline at the end
of it somehow doesn't.

Another difference is that the pasted text is normally highlighted in some
way, e.g. inverse video, until it's committed with Enter.

-- 
Cheers, Ralph.
