Re: Dealing with invalid UTF-8

I have seen similarly misguided mailing list software which prepended an announcement to messages, regardless of the MIME structure. In many cases, because base64-decoding the announcement didn't correspond to an integral number of output bytes, the rest of the output ended up being bit-shifted. (Of course, the '=' padding at the end of the input wasn't correct, but it was too late for the decoder to recover.)
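To make the bit-shifting concrete, here is a small Python sketch. It assumes a decoder that simply skips characters outside the base64 alphabet. The announcement text and the helper decode_skipping_junk are made up for illustration. Each base64 character carries 6 bits, so 4 characters make 3 bytes. An announcement containing 13 alphabet characters therefore leaves the decoder 6 bits out of step for everything after it.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_skipping_junk(text: str) -> bytes:
    """Permissive base64 decoder: every character outside the alphabet
    (line breaks, punctuation, '=' padding) is ignored, and any bits
    left over at the end are dropped."""
    bits = 0   # bits read but not yet emitted as a byte
    nbits = 0  # number of such pending bits (always < 8 between chars)
    out = bytearray()
    for ch in text:
        value = ALPHABET.find(ch)
        if value < 0:
            continue  # not a base64 character: skip it
        bits = (bits << 6) | value  # each character carries 6 bits
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append(bits >> nbits)  # top 8 pending bits form one byte
            bits &= (1 << nbits) - 1   # keep only the leftover bits
    return bytes(out)


payload = base64.b64encode(b"Hello, world!\r\n").decode()

# Hypothetical announcement prepended by the list software. It contains
# 13 alphabet characters (N o t e l i s t m o v e d), and 13 * 6 = 78 bits
# is 6 bits past a byte boundary, so the payload is decoded misaligned.
announcement = "Note: list moved\r\n"

clean = decode_skipping_junk(payload)
garbled = decode_skipping_junk(announcement + payload)

print(clean)                                      # b'Hello, world!\r\n'
print(garbled.decode("utf-8", errors="replace"))  # mojibake, no "Hello"
```

The shifted output is also no longer valid UTF-8, so decoding it with errors="replace" produces U+FFFD replacement characters.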

One approach is to ignore lines which contain invalid base64 characters, rather than just ignoring the invalid characters. That would work in these cases, at any rate; maybe other GIGO cases would behave worse, though.
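A rough sketch of that line-based filter, in Python. decode_valid_lines is a hypothetical helper, and the test message is synthetic, with a footer copied from Arnt's example. A line survives only if every character on it is a base64 letter, a digit, '+', '/' or '='. The surviving lines are then decoded strictly.

```python
import base64
import string

# Characters that can legitimately appear on a line of base64 body text.
B64_LINE_CHARS = set(string.ascii_letters + string.digits + "+/=")


def decode_valid_lines(body: str) -> bytes:
    """Hypothetical helper: discard every line containing any character
    outside the base64 alphabet (or '='), then strictly decode the rest."""
    kept = []
    for line in body.splitlines():
        line = line.strip()
        if line and set(line) <= B64_LINE_CHARS:
            kept.append(line)
    return base64.b64decode("".join(kept), validate=True)


original = b"Some text in the message body.\r\n" * 4
footer = (
    "_____________________________________________________________________\r\n"
    "Soekris Engineering, technical discussion mailing list\r\n"
    "[un]subscribe: http://lists.soekris.com/mailman/listinfo/soekris-tech\r\n"
)
# base64.encodebytes wraps the encoded data at 76 characters per line.
body = base64.encodebytes(original).decode() + footer

print(decode_valid_lines(body) == original)  # True
```

The weak spot is the GIGO case: a footer line made only of letters and digits (say, a bare "Thanks") would pass the filter and be fed to the decoder as data.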


--Grant

On Sep 11, 2003, at 2:39, Arnt Gulbrandsen wrote:

Hi,

this morning I received a single-part mail message looking like this:

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
...
b2VrcmlzIEVuZ2luZWVyaW5nLCB0ZWNobmljYWwgZGlzY3Vzc2lvbiBtYWlsaW5nIGxpc3QNCgk+IFt1bl1zdWJzY3JpYmU6IGh0dHA6Ly9saXN0cy5zb2VrcmlzLmNvbS9tYWlsbWFuL2xpc3RpbmZv
L3NvZWtyaXMtdGVjaA0KCT4NCgkNCgkNCg0K
_____________________________________________________________________
Soekris Engineering, technical discussion mailing list
[un]subscribe: http://lists.soekris.com/mailman/listinfo/soekris-tech
What's considered the best approach for dealing with something like this? ("Complain to GNU Mailman maintainer", while doubtless satisfying to some, is not my thing. I suppose I'm growing old.)
The RFC (RFC 2045, section 6.8) states that "Any characters outside of the base64 alphabet are to be ignored in base64-encoded data". That approach leads to appended garbage. Stopping decoding as soon as an illegal character is seen works in this case, but might it lead to truncation in other error cases?
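A sketch of both policies side by side, in Python with a hand-rolled decoder. lenient_decode is hypothetical, not a library function, and the messages are synthetic. Line breaks are tolerated in both modes, since they are outside the alphabet but always present in MIME bodies. Case 1 mimics the appended footer. Case 2 instead adds one stray character after the first 76-character line.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def lenient_decode(text: str, stop_on_invalid: bool) -> bytes:
    """Sketch of two error-recovery policies for base64 bodies.

    stop_on_invalid=False: skip every non-alphabet character (the RFC 2045
    "ignore" rule), so trailing junk made of letters is decoded as data.
    stop_on_invalid=True: tolerate CR/LF but stop at the first other
    non-alphabet character; '=' padding therefore also ends decoding.
    """
    bits = nbits = 0
    out = bytearray()
    for ch in text:
        value = ALPHABET.find(ch)
        if value < 0:
            if stop_on_invalid and ch not in "\r\n":
                break  # illegal character (or padding): stop here
            continue  # otherwise just skip it
        bits = (bits << 6) | value  # 6 bits per base64 character
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append(bits >> nbits)  # emit the top 8 pending bits
            bits &= (1 << nbits) - 1   # keep the remainder
    return bytes(out)


original = b"Hello from the list.\r\n" * 10  # 220 bytes, 4 encoded lines
lines = base64.encodebytes(original).decode().splitlines(keepends=True)
footer = (
    "_____________________________________________________________________\r\n"
    "Soekris Engineering, technical discussion mailing list\r\n"
)

# Case 1: footer appended after the base64 data, as in the message above.
with_footer = "".join(lines) + footer
ignored = lenient_decode(with_footer, stop_on_invalid=False)
stopped = lenient_decode(with_footer, stop_on_invalid=True)

# Case 2: one stray character after the first 76-character line.
glitched = lines[0] + "#\n" + "".join(lines[1:])
glitch_ignored = lenient_decode(glitched, stop_on_invalid=False)
glitch_stopped = lenient_decode(glitched, stop_on_invalid=True)
```

With the footer, ignoring junk yields the original followed by garbage, while stopping yields exactly the original. With the stray character the results flip: ignoring recovers everything, and stopping keeps only the first 57 bytes (one 76-character line). Neither rule is safe on its own.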
Comments? Advice?

--Arnt