nmh-workers

Re: mhfixmsg character set conversion

2022-02-10 05:22:55
Hi Steven,

I expect the bad file has something earlier on which fixes vim's
idea of the encoding to ISO 8859-1

That does seem to be the case.  Do you have any idea what kind of
thing that might be?  (I know you can't diagnose a file you haven't
seen, but in general, what sorts of things should I look for?)

Non-ASCII bytes from the start of the file.  I assume vim(1) will read
up to a certain amount until it either makes up its mind or assumes the
default.

Try this to remove the boring printable ASCII bytes (space through ‘~’)
and see what's left.

    tr -d ' -~' <bad | env LC_ALL=C grep -n .
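For anyone wanting to try that pipeline without the original mail, here is a small sketch on a made-up file. The file contents and the names `sample` and `hits` are assumptions for illustration only; the tr(1) and grep(1) stages are the ones from the command above, with a cut(1) added to keep just the line numbers.

```shell
# Hypothetical sample file: line 2 holds 'é' as the UTF-8 bytes c3 a9,
# written with octal escapes so this script itself stays pure ASCII.
sample=$(mktemp)
printf 'plain ascii\ncaf\303\251\nmore ascii\n' >"$sample"

# Delete printable ASCII (space through ~).  Newlines survive, so
# grep -n still reports the original line number of whatever is left.
hits=$(LC_ALL=C tr -d ' -~' <"$sample" | LC_ALL=C grep -n . | cut -d: -f1)
echo "lines with non-ASCII bytes: $hits"

rm -f "$sample"
```

Lines that were entirely printable ASCII become empty and are dropped by ‘grep .’, so only the interesting line numbers remain.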

   $ grep -n ^Veuillez good | cut -c1-68
   108:Veuillez ne pas répondre au présent courriel. Il a été gén�
...
(The ‘�’ at the end is to be expected.)
...
Until now, I've only ever seen that glyph when a character doesn't
exist in the font being used

No, it's not related to a Unicode code point not being in the font, or
only historically.
https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character
describes ‘�’ and it's being seen above because cut(1) is cutting bytes
and the ‘108:’ at the start of the line has shifted the 68/69 cut-off
point to part-way through the UTF-8 for a single code point AKA rune.
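A minimal sketch of that effect, assuming a cut(1) that supports ‘-b’. It uses ‘-b’ rather than the ‘-c’ above so the byte-wise cut is explicit whatever the implementation does with ‘-c’. The three-character string ‘gén’ and the names `s` and `head` are made up for the illustration.

```shell
# 'gén' as UTF-8 is four bytes: 67 c3 a9 6e.  Octal escapes keep this
# script pure ASCII.
s=$(printf 'g\303\251n')

# Keep only the first two bytes: 'g' plus the first half of 'é'.
# od shows the result in hex; tr strips od's spacing.
head=$(printf '%s' "$s" | LC_ALL=C cut -b1-2 | od -An -tx1 | tr -d ' \n')
echo "$head"
```

This prints 67c30a: the ‘g’, a lone c3 with its a9 missing, and the newline cut(1) adds. A UTF-8 terminal can't decode the lone c3 and shows ‘�’ in its place.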

   $ setenv LC_ALL C
   $ perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet
   Veuillez ne pas r<c3><a9>pondre au pr<c3><a9>sent courriel. Il a <c3><a9>t<c3><a9> g<c3><a9>n<c3><a9>r<c3><a9>

Good.

As expected, this returned pretty much instantly.  Then I tried this:

   $ sh
   $ LC_ALL=C
   $ echo $LC_ALL
   C
   $ perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet

That's setting a local shell variable LC_ALL unless LC_ALL already
exists in the environment, and it probably doesn't.  Try

    sh
    LC_ALL=C; export LC_ALL
    locale
    perl -lpe 's/[^ -~]/sprintf "<%02x>", ord($&)/ge' good_snippet
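The difference between a shell variable and an environment variable can be seen with a throwaway name. `DEMO_VAR` below is hypothetical and stands in for LC_ALL so the experiment doesn't disturb the real locale settings; the child ‘sh -c’ plays the part of perl(1).

```shell
# Start from a clean slate: unset also drops any export attribute.
unset DEMO_VAR

# Plain assignment: visible in this shell, not passed to children.
DEMO_VAR=C
child_before=$(sh -c 'echo "${DEMO_VAR-unset}"')

# After export the same value reaches child processes.
export DEMO_VAR
child_after=$(sh -c 'echo "${DEMO_VAR-unset}"')

echo "before export: $child_before"
echo "after export:  $child_after"
```

The child reports ‘unset’ before the export and ‘C’ after it, even though ‘echo $DEMO_VAR’ in the parent shell would show ‘C’ both times.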

Which in a way is good, because at least it means bash is behaving
consistently.

Beware that invoking bash(1) as ‘sh’ is not the same as running ‘bash’.
Might not make a difference in this case, but in general it's better to
run whichever is desired.

I propose to forget this particular clupea harengus of the crimson
variety unless you find it interesting in and of itself.

It is odd.  And odd might affect other things, including to do with nmh.
:-)

-- 
Cheers, Ralph.
