nmh-workers

Re: [Nmh-workers] nmh architecture discussion: format engine character set

2015-08-10 12:29:55
Hi Ken,

So ... what would that mean, exactly?  Ignore the locale setting
and always output UTF-8?

Well, yes, the code would be writing UTF-8, with the knowledge of
how many cells have been occupied, e.g. one for the combining `a⃞',
but it could complain about the non-UTF-8 locale setting, or try and
set up a `fire and forget' converter when opening files if it was
easy enough to be worth the bother.
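A minimal sketch of that cell counting, assuming the output is UTF-8 and that every code point occupies one cell except combining marks, which occupy none. utf8_cells() and is_combining() are hypothetical names, and the combining table below is a deliberately tiny stand-in covering only two Unicode blocks (it includes U+20DE, the enclosing square in `a⃞'); double-width East Asian characters are not handled at all. A real implementation would need full Unicode width tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s into *cp. Returns the number of bytes
 * consumed, or 0 for malformed, overlong, or surrogate sequences. Because
 * continuation bytes are checked left to right, a NUL terminator stops the
 * scan before anything past the end of the string is read. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        if ((s[1] & 0xC0) != 0x80)
            return 0;
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return *cp >= 0x80 ? 2 : 0;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return (*cp >= 0x800 && (*cp < 0xD800 || *cp > 0xDFFF)) ? 3 : 0;
    }
    if ((s[0] & 0xF8) == 0xF0) {
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
            (s[3] & 0xC0) != 0x80)
            return 0;
        *cp = ((uint32_t)(s[0] & 0x07) << 18) |
              ((uint32_t)(s[1] & 0x3F) << 12) |
              ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return (*cp >= 0x10000 && *cp <= 0x10FFFF) ? 4 : 0;
    }
    return 0; /* stray continuation byte or invalid lead byte */
}

/* Hypothetical, deliberately incomplete table of zero-width combining marks. */
static int is_combining(uint32_t cp)
{
    return (cp >= 0x0300 && cp <= 0x036F)   /* Combining Diacritical Marks */
        || (cp >= 0x20D0 && cp <= 0x20FF);  /* ... for Symbols, e.g. U+20DE */
}

/* Number of terminal cells the NUL-terminated UTF-8 string s occupies,
 * or -1 if s is not valid UTF-8. */
int utf8_cells(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    int cells = 0;

    while (*p != '\0') {
        uint32_t cp;
        size_t n = utf8_decode(p, &cp);
        if (n == 0)
            return -1;
        if (!is_combining(cp))
            cells++;
        p += n;
    }
    return cells;
}
```

With this, "a" followed by U+20DE (bytes 61 E2 83 9E) counts as one cell, independent of whatever the locale says.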

Help me out here, because I'm trying to translate your concepts into
actual code and I'm having some problems seeing how it would work.

Geez, how much hand-waving do you want a guy to do?  :-)

Assuming we don't bring in a library like ICU,

GNU's libunistring might be an alternative to ICU.
http://www.gnu.org/software/libunistring/

it's difficult for us to reliably determine the width of a Unicode
character.  Specifically:

- The POSIX standard functions for this, wcwidth() and wcswidth(), work
  on the current locale, which is not guaranteed to support UTF-8 (or
  even support 8-bit characters).

Agreed, POSIX is useless in this area.

- The xlocale functions, which allow one to specify a particular locale
  for functions like wcwidth(), are not part of POSIX.

No.

- Even if we used xlocale (or just overrode the global locale in every
  nmh program), it turns out there's no reliable UTF-8 compatible
  default we can use.  We ran into this in the test suite: some people
  just don't install all of the locales, so we can't assume en_US.UTF-8
  (or en_GB.UTF-8, or whatever).

That wouldn't matter if we stopped on a non-UTF-8 locale?
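Stopping on a non-UTF-8 locale could look something like this minimal sketch. locale_is_utf8() and require_utf8_locale() are hypothetical names, and it assumes nl_langinfo(CODESET) spells the codeset "UTF-8" (as glibc and macOS do) or "utf8"; other systems may use other spellings. Because it only inspects the user's own locale, it never needs en_US.UTF-8 or any other particular locale to be installed.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the current LC_CTYPE locale's codeset is UTF-8, else 0.
 * Assumes the codeset name is spelled "UTF-8" or "utf8". */
int locale_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    return cs != NULL && (strcmp(cs, "UTF-8") == 0 || strcmp(cs, "utf8") == 0);
}

/* Hypothetical startup check: adopt the user's locale from the
 * environment, then refuse to continue unless it is UTF-8.
 * Returns 0 if output may be written as UTF-8, -1 after complaining. */
int require_utf8_locale(const char *progname)
{
    if (setlocale(LC_ALL, "") == NULL) {
        fprintf(stderr, "%s: cannot set locale from the environment\n",
                progname);
        return -1;
    }
    if (!locale_is_utf8()) {
        fprintf(stderr, "%s: locale codeset is %s; a UTF-8 locale is required\n",
                progname, nl_langinfo(CODESET));
        return -1;
    }
    return 0;
}
```

A program would call require_utf8_locale(argv[0]) at the top of main() and exit if it returns -1.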

I'm unclear how you wanted to use the iconv utility; is the idea just
to output everything in UTF-8 and run iconv as a filter for all text
output?

Yes, as a last-ditch attempt if we carry on.

I think that might have unintended consequences but, putting
that aside, there are other issues.  For one, iconv can't do character
substitution on conversion failure (at least the POSIX iconv cannot; I
am aware that GNU iconv can).  Even if it can, I am unsure we can
maintain the correct column position when dealing with things like
combining characters.

Yes, either iconv isn't bothered with, because it's too awkward and the
results are ropey, or it is used because it's good enough most of the
time for the small minority that want it.
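To make the substitution and column-position points concrete, here is a minimal sketch using the POSIX iconv_open()/iconv() API, assuming we do our own '?' substitution on EILSEQ rather than relying on GNU's //TRANSLIT suffix. convert_with_substitution() and utf8_seq_len() are hypothetical names. POSIX leaves the handling of characters that are valid input but absent from the target codeset implementation-defined: glibc reports EILSEQ, while some other implementations substitute a character of their own and carry on.

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Length of the UTF-8 sequence introduced by lead byte c; 1 for bytes
 * that cannot start a sequence, so the caller always makes progress. */
static size_t utf8_seq_len(unsigned char c)
{
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Convert NUL-terminated UTF-8 `in` to `tocode`, writing at most
 * outsize - 1 bytes plus a NUL into out. Input that iconv rejects is
 * replaced by '?', one per UTF-8 sequence skipped. Returns the number
 * of bytes written (excluding the NUL), or -1 on error. */
long convert_with_substitution(const char *in, const char *tocode,
                               char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;
    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *ip = (char *)in;          /* iconv() does not modify the input */
    size_t inleft = strlen(in);
    char *op = out;
    size_t outleft = outsize - 1;   /* reserve room for the NUL */

    while (inleft > 0) {
        if (iconv(cd, &ip, &inleft, &op, &outleft) != (size_t)-1)
            break;                  /* all remaining input converted */
        if (errno == EILSEQ) {
            /* Unconvertible or malformed: emit '?' and skip one sequence. */
            if (outleft == 0)
                goto fail;
            *op++ = '?';
            outleft--;
            size_t skip = utf8_seq_len((unsigned char)*ip);
            if (skip > inleft)
                skip = inleft;
            ip += skip;
            inleft -= skip;
        } else if (errno == EINVAL) {
            /* Truncated sequence at the end of the input. */
            if (outleft == 0)
                goto fail;
            *op++ = '?';
            outleft--;
            break;
        } else {
            goto fail;              /* E2BIG: output buffer too small */
        }
    }
    /* Write any final shift sequence (matters for stateful encodings). */
    if (iconv(cd, NULL, NULL, &op, &outleft) == (size_t)-1)
        goto fail;
    *op = '\0';
    iconv_close(cd);
    return (long)(op - out);

fail:
    iconv_close(cd);
    return -1;
}
```

With glibc, converting "a" followed by U+0301 (a combining acute accent) to "ASCII" gives "a?": a character that occupied zero cells now occupies one, which is exactly the column-position drift described above.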

But hey, if I'm wrong I'd be glad to hear about it.  I think it's a
much tougher problem than people realize.

I'm sure it is.

Cheers, Ralph.

_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers
