ietf-822

Re: Dealing with invalid UTF-8

2003-09-11 09:48:58

I have seen similarly misguided mailing list software which prepended an announcement to messages, regardless of the MIME structure. In many cases, because the announcement did not base64-decode to a whole number of output bytes, the rest of the output ended up bit-shifted. (Of course, the '=' padding at the end of the input was then wrong as well, but by that point it was too late for the decoder to recover.)

One approach is to ignore whole lines which contain invalid base64 characters, rather than ignoring just the invalid characters themselves. That would work in these cases, at any rate; other GIGO cases might behave worse, though.
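
A rough sketch of that line-dropping idea in Python, for illustration only. The function name decode_skipping_bad_lines and the rule "letters, digits, '+', '/', with at most two trailing '='" are my own assumptions, not taken from any existing mail software.

```python
import base64
import re

# Assumption: a "clean" line holds only base64 alphabet characters,
# optionally followed by one or two '=' padding characters.
_B64_LINE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def decode_skipping_bad_lines(body: str) -> bytes:
    """Decode a base64 body, dropping every line that has a stray character."""
    kept = []
    for line in body.splitlines():
        line = line.strip()  # tolerate trailing whitespace and CR
        if line and _B64_LINE.fullmatch(line):
            kept.append(line)
    # b64decode raises binascii.Error if what survives is not validly padded.
    return base64.b64decode("".join(kept))
```

An announcement line containing spaces or punctuation is dropped whole, so the real payload is not bit-shifted. A junk line made up purely of letters and digits (a bare "Regards", say) would still slip through, which is one of the GIGO cases where this does worse.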

--Grant

On Sep 11, 2003, at 2:39, Arnt Gulbrandsen wrote:


Hi,

this morning I received a single-part mail message looking like this:

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
...
b2VrcmlzIEVuZ2luZWVyaW5nLCB0ZWNobmljYWwgZGlzY3Vzc2lvbiBtYWlsaW5nIGxpc3Q NCgk+ IFt1bl1zdWJzY3JpYmU6IGh0dHA6Ly9saXN0cy5zb2VrcmlzLmNvbS9tYWlsbWFuL2xpc3R pbmZv
L3NvZWtyaXMtdGVjaA0KCT4NCgkNCgkNCg0K
_____________________________________________________________________
Soekris Engineering, technical discussion mailing list
[un]subscribe: http://lists.soekris.com/mailman/listinfo/soekris-tech

What's considered the best approach for dealing with something like this? ("Complain to GNU Mailman maintainer", while doubtless satisfying to some, is not my thing. I suppose I'm growing old.)

The RFC states that "Any characters outside of the base64 alphabet are to be ignored in base64-encoded data". That approach leads to appended garbage. Stopping decoding as soon as an illegal character is seen works in this case, but might it lead to truncation in other error cases?
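
To make the two options concrete, here is a small Python sketch of both. It uses a hand-rolled sextet decoder rather than a library routine, so the behaviour on bad input is fully visible; the function names and the sample body are made up for illustration, and '=' padding is deliberately left out to keep it short.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
VALUE = {ch: i for i, ch in enumerate(ALPHABET)}


def _decode_sextets(chars):
    """Pack 6-bit values into bytes; leftover bits at the end are dropped."""
    acc, nbits, out = 0, 0, bytearray()
    for ch in chars:
        acc = (acc << 6) | VALUE[ch]
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    return bytes(out)


def decode_ignoring_invalid(body):
    """Option 1: skip every character outside the alphabet and carry on."""
    return _decode_sextets(ch for ch in body if ch in VALUE)


def decode_until_invalid(body):
    """Option 2: skip line breaks, stop at the first other illegal character."""
    good = []
    for ch in body:
        if ch in "\r\n":
            continue
        if ch not in VALUE:  # '=' also ends the data in this sketch
            break
        good.append(ch)
    return _decode_sextets(good)


# Hypothetical body: a valid base64 line followed by a plain-text footer.
sample = (base64.b64encode(b"Hello world!").decode("ascii") + "\r\n"
          + "_____\r\n"
          + "Soekris Engineering, technical discussion mailing list\r\n")
```

On this sample, decode_until_invalid(sample) gives back just the 12 payload bytes, while decode_ignoring_invalid(sample) returns the payload followed by bytes decoded from the letters of the footer. The truncation risk is also easy to reproduce: decode_until_invalid("SGVs bG8g") stops at the stray space and returns only b"Hel".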

Comments? Advice?

--Arnt


