[Nmh-workers] Anyone know of a UTF-8-compatible text formatter?

2012-04-05 13:41:31
As I've said in previous messages, I've been working on the "replyfilter"
Perl script to improve the functionality of replying to MIME messages.  So
far I am pretty happy with the results (check out the latest version if
you're interested, it's in $(srcdir)/docs/contrib/replyfilter), but I
have run into one annoying wrinkle.

Right now the script uses "par" to format long text in the reply
message.  But I have discovered that in some cases par mangles the
output when dealing with UTF-8.  Specifically, the problem occurs when
the to-be-quoted text contains a non-breaking space (U+00A0), which is
encoded in UTF-8 as 0xc2 0xa0.  My guess is that par sees the 0xa0 as a
space and replaces it with 0x20, which results in an invalid UTF-8 sequence.
So far that's the only problem I've run into; other UTF-8 sequences work
fine.
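
To make the failure concrete, here is a small Python sketch of what I
suspect is happening.  The function bytewise_space_normalize() is
hypothetical; it only mimics the guessed behavior (a byte-oriented tool
treating 0xa0 as a space) and is not taken from par's source.

```python
def bytewise_space_normalize(data: bytes) -> bytes:
    """Mimic a byte-oriented formatter that treats 0xA0 as a space.

    This is an assumption about the failure mode, not par's actual code.
    """
    return data.replace(b"\xa0", b"\x20")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+00A0 (no-break space) is encoded in UTF-8 as the two bytes 0xC2 0xA0.
original = "foo\u00a0bar".encode("utf-8")

# Rewriting only the second byte leaves a stray 0xC2 lead byte behind.
mangled = bytewise_space_normalize(original)

print(original, is_valid_utf8(original))
print(mangled, is_valid_utf8(mangled))
```

The stray 0xc2 is a lead byte that must be followed by a continuation
byte in the range 0x80-0xbf, so the 0x20 after it makes the sequence
undecodable.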

My simple solution is to replace any occurrences of U+00A0 with a
space, and that seems to solve the problem.  But I am thinking that it
is only a matter of time before I run into other UTF-8 sequences that par
handles poorly.  Does anyone know of any par-like utilities that are
UTF-8 aware?
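
For illustration, here is a rough Python sketch of both ideas: the
U+00A0 workaround, and wrapping on decoded text instead of raw bytes.
replyfilter itself is Perl, so this is not the script's code; the names
replace_nbsp() and wrap_quoted() and the "> " prefix are made up for the
example.  It relies on Python's standard textwrap module, which works on
decoded strings and only breaks on ASCII whitespace.

```python
import textwrap

NBSP = "\u00a0"  # no-break space


def replace_nbsp(text: str) -> str:
    """Workaround: turn each no-break space into an ordinary space."""
    return text.replace(NBSP, " ")


def wrap_quoted(text: str, width: int = 72, prefix: str = "> ") -> str:
    """Re-wrap paragraphs of decoded text and prefix every line.

    Operates on str (code points), so multi-byte UTF-8 sequences are
    never split or partially rewritten.
    """
    paragraphs = text.split("\n\n")
    wrapped = [
        textwrap.fill(p, width=width,
                      initial_indent=prefix, subsequent_indent=prefix)
        for p in paragraphs
    ]
    # Separate paragraphs with a quoted empty line.
    return ("\n" + prefix.rstrip() + "\n").join(wrapped)


if __name__ == "__main__":
    sample = "na\u00efve r\u00e9sum\u00e9 " * 20
    print(wrap_quoted(sample, width=40))
```

Note that the width here counts code points, not display columns, so
text with wide (e.g. CJK) or combining characters would still need extra
handling.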

Before people mention it ... yes, I am aware that there is an i18n patch
for par.  I tried that, but it did not help (a brief look at it leads me
to think that the core problem is that even with that patch par is calling
isspace(), where it should be calling iswspace()).

--Ken

_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers
