nmh-workers

Re: mhfixmsg character set conversion

2022-02-12 09:17:17
Steven wrote:

   1) Replacing par does indeed fix one of the three failed tests.

Progress!

...so clearly I need to replace elinks in my html_to_text script, and doing
that will solve the problem that prompted this discussion, leaving the
following questions:

   1) What's the best replacement for elinks?

mhn.defaults.sh looks for text/html helpers in this order:
    1. w3m
    2. lynx
    3. elinks

I don't know if one is necessarily "better" than another.
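
As a rough illustration of that lookup order, here is a minimal sketch that reports the first of the three helpers found on PATH. The function name pick_html_helper is made up for this example; it is not part of nmh, and it does not reproduce the arguments that mhn.defaults.sh passes to each program.

```shell
# Hypothetical helper (not part of nmh): print the first text/html
# converter found on PATH, trying them in the order listed above.
pick_html_helper() {
    for helper in w3m lynx elinks; do
        # command -v succeeds only if the name resolves to something runnable
        if command -v "$helper" >/dev/null 2>&1; then
            printf '%s\n' "$helper"
            return 0
        fi
    done
    # None of the three is installed
    return 1
}
```

A script such as html_to_text could call pick_html_helper once and treat a non-zero return as "no converter installed".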

If you have suggestions on how to improve the arguments that mhn.defaults.sh
uses for elinks, please let us know.

   2) Should I replace my 1.7.1 installation by the version I just built?
      Basically I'm asking what benefits the current snapshot has over
      1.7.1,

See docs/pending-release-notes.

      and how far away the next numbered release might be.

Unknown.  Ken appears to be busy.  One of us here could push it out.  It's
been almost 4 years so I think that would be a good idea.  Perhaps after
things here settle down a bit.

   3) How can I guarantee that messages will be saved with quoted-printable
      or base64 parts decoded, without patching mhfixmsg to deal with
      messages in which the decoded text would be more than 998 characters
      long?

I don't know your reason for patching mhfixmsg.  IIRC, you were using
-decodetext 8bit; binary instead of 8bit might help.  The mhfixmsg man
page might provide some insight.

      That raises some further questions:

         - Why wasn't the text/html part converted to utf-8?

mhfixmsg only converts the character set of text/plain.  That was a
design decision.  Other subtypes can be extracted with mhstore and run
through iconv.  If there's a use for converting them in place in
mhfixmsg, it wouldn't be difficult but I'm not sure how useful it
would be.
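
For the iconv step, a minimal sketch follows. It assumes the text/html part has already been saved to a file (for instance with mhstore) and that its declared charset is ISO-8859-1; the file names and the sample content are stand-ins invented for this example.

```shell
# Stand-in for a part saved to disk: "café" encoded as ISO-8859-1.
# In practice this file would come from the message, not from printf.
workdir=$(mktemp -d)
printf 'caf\351\n' > "$workdir/part.html"

# Convert to UTF-8. The -f value is an assumption here; it should match
# the charset declared in the part's Content-Type.
iconv -f ISO-8859-1 -t UTF-8 "$workdir/part.html" > "$workdir/part.utf8.html"
```

If the HTML itself declares a charset in a meta element, that declaration would also need updating after the conversion.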

         - Regardless of the answer to the previous question, after a
           message has been refiled (and assuming I'm not planning to
           resend it to anyone), is there a practical difference between
           binary and 8bit encoding?

"Note that -decodetext binary can produce messages that are not compliant
with RFC 5322, §2.1.1."
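
RFC 5322, §2.1.1 limits each line to 998 characters, excluding the CRLF. One way to see whether a decoded message runs into that limit is to count its over-long lines. This is only a sketch: count_long_lines is a hypothetical helper, not an nmh command, and it measures bytes by forcing the C locale.

```shell
# Hypothetical helper (not part of nmh): print how many lines in the
# file named by $1 are longer than 998 bytes, line terminator excluded.
count_long_lines() {
    # LC_ALL=C makes awk's length() count bytes rather than characters
    LC_ALL=C awk 'length($0) > 998 { n++ } END { print n + 0 }' "$1"
}
```

A result of 0 suggests the message stays within the line-length limit after decoding; anything else means some line exceeds it.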

         - Why are the headers of the decoded message identical to those
           of the input, despite the use of -decodeheaderfieldbodies?

           (...and yes, the unmodified version of the message does contain
            some encoded headers that my decode_headers program found and
            decoded; mhfixmsg appears not to have done so).

Is it a proper MIME message (does mhfixmsg return with a non-zero exit
status)?  If so, can you send it to me off-line?

The test suite has a case, boiled down a bit here:

$ cat test1
To: recipient@example.com
From: sender@example.com
Date: Wed, 28 Sep 2016 11:24:28 -0400
Subject: =?utf-8?B?dGhpcyBTdWJqZWN0IHdhcyBVVEYtOCBlbmNvZGVk?=
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=001a114dd3e8fe9c56053d92f414
Content-Transfer-Encoding: 8bit

--001a114dd3e8fe9c56053d92f414
Content-Type: text/plain; charset=UTF-8

This is a test.

--001a114dd3e8fe9c56053d92f414--
$ mhfixmsg -file test1 -out - -decodeheader utf-8 | diff - test1
4c4
< Subject: this Subject was UTF-8 encoded
---
> Subject: =?utf-8?B?dGhpcyBTdWJqZWN0IHdhcyBVVEYtOCBlbmNvZGVk?=
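
To check by hand what the encoded Subject in that test case contains, the base64 payload can be decoded directly. This sketch assumes a base64 utility that accepts -d for decoding, as GNU coreutils does; some systems spell that option differently.

```shell
# Payload copied from the Subject field of the test message above.
payload='dGhpcyBTdWJqZWN0IHdhcyBVVEYtOCBlbmNvZGVk'

# Decode it; -d is the GNU coreutils spelling of the decode option.
decoded=$(printf '%s' "$payload" | base64 -d)
printf '%s\n' "$decoded"
# prints: this Subject was UTF-8 encoded
```

The same check on a header from the problem message would show whether it really carries an encoded word that mhfixmsg should have decoded.
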
David

