
Re: [Nmh-workers] why does mhfixmsg dislike long text lines?

2018-01-22 15:26:34
> To answer your larger question (on the subject line):
>
> - MH/nmh doesn't handle lines greater than 998 characters because such
>   messages are not valid according to RFC 5322, and mhfixmsg isn't going
>   to generate a message that nmh cannot handle.  Whether or not nmh SHOULD
>   handle such messages is a different question.
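As a rough illustration of the limit under discussion: RFC 5322 section 2.1.1 says each line MUST be no more than 998 characters, not counting the CRLF. A minimal C sketch of checking a message body against that limit might look like this. The names longest_line(), fits_rfc5322() and RFC5322_MAX_LINE are hypothetical and are not taken from nmh.

```c
#include <stdbool.h>
#include <stddef.h>

/* RFC 5322 section 2.1.1: a line MUST be no more than 998 characters,
 * not counting the terminating CRLF. */
#define RFC5322_MAX_LINE 998

/* Hypothetical helper: length of the longest line in a NUL-terminated
 * string, not counting a trailing LF or CRLF. */
size_t longest_line(const char *s)
{
    size_t longest = 0, cur = 0;

    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '\n') {
            /* A CR just before the LF belongs to the line terminator. */
            if (cur > 0 && p[-1] == '\r')
                cur--;
            if (cur > longest)
                longest = cur;
            cur = 0;
        } else {
            cur++;
        }
    }
    if (cur > longest)  /* final line with no terminator */
        longest = cur;
    return longest;
}

/* Hypothetical helper: true if no line exceeds the RFC 5322 limit. */
bool fits_rfc5322(const char *s)
{
    return longest_line(s) <= RFC5322_MAX_LINE;
}
```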

Thank you, that helps.

And I won't presume to suggest what nmh should do, but I will point out
that I recently received a message with a text/html part which was a
single line of 42027 characters.  Clearly there are at least some senders
who have as much respect for RFC 5322 as Microsoft has for standards in
general. :-/

But I'm confused, because I didn't have any problems reading that message.
The structure on it is as follows:

 msg part  type/subtype              size description
   4       multipart/alternative    2213K
     1     multipart/related        2211K
     1.1   text/html                  41K
     1.2   image/jpeg                 28K
     1.3   image/jpeg                 42K
     [...]
     1.33  image/jpeg                 350
     2     text/plain                 808

...and parts 1 and 1.1 have these headers:

   --Apple-Mail=_7C2BA5CB-FA71-4036-9FAD-C693FF38AF09
   Content-Type: multipart/related;
           type="text/html";
           boundary="Apple-Mail=_B4252506-2E52-4348-A3AD-C92C9A9FBD3D"

   --Apple-Mail=_B4252506-2E52-4348-A3AD-C92C9A9FBD3D
   Content-Transfer-Encoding: quoted-printable
   Content-Type: text/html;
           charset=us-ascii

This part is 670 lines before decoding, and exactly one line afterward.
This arrived before I started using mhfixmsg, but given what I've just
learned I'd certainly expect mhfixmsg to refuse to decode it.
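The 670-to-1 ratio is what quoted-printable soft line breaks produce: RFC 2045 section 6.7 limits encoded lines to 76 characters and marks each artificial break with a trailing "=", which the decoder removes. That plausibly explains why the message was readable in its encoded form while the decoded part would not satisfy the 998-character limit. The sketch below is a simplified decoder, assuming NUL-terminated input and skipping the RFC's rule about stripping trailing whitespace; qp_decode(), count_lines() and hexval() are hypothetical names, not nmh functions.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit.
 * RFC 2045 wants uppercase; lowercase is accepted here leniently. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = toupper(c);
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Simplified quoted-printable decoder (RFC 2045 section 6.7).
 * Handles "=XX" escapes and "=" soft line breaks (CRLF or bare LF).
 * out needs room for strlen(in) + 1 bytes, since decoding never makes
 * the text longer.  Returns the decoded length. */
size_t qp_decode(const char *in, char *out)
{
    size_t n = 0;

    while (*in != '\0') {
        if (*in == '=') {
            if (in[1] == '\r' && in[2] == '\n') {   /* soft break, CRLF */
                in += 3;
                continue;
            }
            if (in[1] == '\n') {                     /* soft break, bare LF */
                in += 2;
                continue;
            }
            int hi = hexval((unsigned char)in[1]);
            int lo = hi >= 0 ? hexval((unsigned char)in[2]) : -1;
            if (hi >= 0 && lo >= 0) {                /* "=XX" escape */
                out[n++] = (char)(hi * 16 + lo);
                in += 3;
                continue;
            }
            /* Malformed escape: fall through and copy the '=' as-is. */
        }
        out[n++] = *in++;
    }
    out[n] = '\0';
    return n;
}

/* Number of lines, counting an unterminated final line. */
size_t count_lines(const char *s)
{
    size_t lines = 0;
    const char *p = s;

    for (; *p != '\0'; p++)
        if (*p == '\n')
            lines++;
    if (p > s && p[-1] != '\n')
        lines++;
    return lines;
}
```

For example, the three encoded lines "<p>Hello=", ", world=3D=" and "!</p>" decode to the single line "<p>Hello, world=!</p>".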


> - The line length limit is imposed by m_getfld(), and that function is ...
>   hairy.  I think changing that might have unexpected consequences; it
>   might be fine, but I don't make any guarantees.  But the fact you said
>   you could "easily modify" it suggests to me that you have not actually
>   LOOKED at the code in question :-)

What I'd looked at was the content_encoding() function in uip/mhfixmsg.c,
where there are a few instances of the literal 998, which really would be
easy to change.
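Replacing scattered literal 998s with one named limit is the kind of change being described. The sketch below is hypothetical: it does not reproduce nmh's actual content_encoding(), whose logic may well differ, and MAX_UNENCODED_LINE, enum cte and choose_cte() are invented for illustration. It picks 7bit or 8bit when every line fits within the limit and falls back to quoted-printable otherwise (NUL bytes and other 7bit/8bit restrictions are ignored for brevity).

```c
#include <stddef.h>

/* Hypothetical: one named constant instead of several literal 998s. */
#ifndef MAX_UNENCODED_LINE
#define MAX_UNENCODED_LINE 998
#endif

enum cte { CTE_7BIT, CTE_8BIT, CTE_QP };

/* Hypothetical: choose a Content-Transfer-Encoding for buf[0..len).
 * Any line longer than max_line (excluding LF or CRLF) forces QP. */
enum cte choose_cte(const unsigned char *buf, size_t len, size_t max_line)
{
    size_t cur = 0;
    int eight_bit = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c == '\n') {                /* end of line: reset the count */
            cur = 0;
            continue;
        }
        if (c == '\r' && i + 1 < len && buf[i + 1] == '\n')
            continue;                   /* CR of a CRLF pair is not counted */
        if (c >= 0x80)
            eight_bit = 1;
        if (++cur > max_line)
            return CTE_QP;              /* too long to send unencoded */
    }
    return eight_bit ? CTE_8BIT : CTE_7BIT;
}
```

With a single constant, trying a larger limit becomes a one-line change, though the separate limit in m_getfld() would still determine whether nmh can read the result.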

You're right that I hadn't looked at the larger context, mostly because
I didn't know there was one.  This is the main reason why I asked before
doing anything.

I just took a quick look at sbr/m_getfld.c.  The first thing that struck me
was this comment at lines 158-163 (of the nmh 1.7 version):

   [...] I considered
   using a Vax "scanc" to locate the end of the field followed by a
   "memmove" but the routine call overhead on a Vax is too large for this
   to work on short names.  If Berkeley ever makes "inline" part of the
   C optimiser (so things like "scanc" turn into inline instructions) a
   change here would be worthwhile.

I'm beginning to get a sense of (and becoming impressed by) just how old
this code base is.

But you're quite right that this code isn't easy to understand.  If I were
to modify uip/mhfixmsg.c without touching sbr/m_getfld.c, am I risking
anything other than generating messages that nmh won't be able to read?

     - Steven
-- 
___________________________________________________________________________
Steven Winikoff                |
Concordia University           | Celibacy is hereditary.  If your parents
Montreal, QC, Canada           | didn't have children, chances are you
Steven.Winikoff@concordia.ca   | won't either.

-- 
Nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers
