
[Nmh-workers] Only outputting "valid" characters

2014-07-09 19:05:44
We've got a long-standing bug report here:

    https://savannah.nongnu.org/bugs/?36056

There's no easy fix for this, since we are now actually handling 8-bit
characters, but other things have recently occurred that made me revisit it.

Mikhail brought up a badly-formatted message here:

    http://lists.nongnu.org/archive/html/nmh-workers/2014-06/msg00166.html

which was the fault of someone else, but it made me think of a number of
issues.  First off, it's really impossible for us to deal with this perfectly;
trying to guess that something is UTF-8 and handle it appropriately would
just lead to madness.

It seems like there are two core issues:

- Handle the case of invalid character set conversion properly (basically,
  substitute a character when you come across something that cannot be
  converted).  I assume this is non-controversial.  It's also pretty
  straightforward to implement.

- Don't output characters that are not valid in the target character set.
  Now, some people suggest that we assume that 8-bit characters should be
  in a particular configurable character set.  I'm not a fan of that solution,
  as a) it's inevitably going to be wrong some of the time, and b) because
  of a) you still need to deal with not outputting invalid characters.
  Fixing this would also solve the problem mentioned in the first bug report.

  The problem here is that I'm not sure _how_ to solve this.
  I am unsure if there is a standards-based API that lets us detect invalid
  characters (I'm not interested in something like Recode for this).  I
  wonder if mbtowc() and friends would throw errors if they encounter an
  invalid character in the current locale.  More investigation is needed.
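
To make the mbtowc() question concrete, here is a minimal sketch of what
such a check might look like. It assumes the restartable variant mbrtowc()
from <wchar.h>, which standard C specifies as returning (size_t)-1 (with
errno set to EILSEQ) for an invalid sequence and (size_t)-2 for a truncated
one. The helper name copy_valid_chars() and the substitute character are
hypothetical and not part of nmh. Which bytes a given C library treats as
invalid in a given locale still needs to be checked.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not nmh code): copy `len` bytes from `in` to `out`,
 * replacing every byte that does not begin a valid multibyte character in
 * the current LC_CTYPE locale with `subst`.  `out` must have room for
 * len + 1 bytes.  Returns the number of bytes that were replaced.
 */
size_t
copy_valid_chars(const char *in, size_t len, char *out, char subst)
{
    mbstate_t state;
    size_t i = 0, o = 0, replaced = 0;

    assert(in != NULL && out != NULL);
    memset(&state, 0, sizeof state);

    while (i < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, in + i, len - i, &state);

        if (n == (size_t) -1 || n == (size_t) -2) {
            /* Invalid (-1) or truncated (-2) sequence: substitute one byte. */
            out[o++] = subst;
            replaced++;
            i++;
            /* The conversion state is unspecified after an error; reset it. */
            memset(&state, 0, sizeof state);
        } else {
            if (n == 0)
                n = 1;          /* embedded NUL byte; copy it through */
            memcpy(out + o, in + i, n);
            o += n;
            i += n;
        }
    }
    out[o] = '\0';
    return replaced;
}
```

A caller would run setlocale(LC_CTYPE, "") once at startup and then pass
each outgoing line through the helper. Note that this only validates
against the locale's encoding; it says nothing about what character set
the message was actually written in.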

Thoughts?

--Ken
