nmh-workers

Re: [Nmh-workers] Non-ASCII Characters in bodies and subjects

2014-06-18 08:21:26
That's not universally true anymore.  Some newer filesystems are
mandating that filenames are UTF-8 and enforcing normalization rules
(MacOS X and Solaris are two notable examples).

Thanks, I didn't know.  Haven't used Solaris in years, and never bought
Apple.

Let me amend this a bit; as I understand it, you have to enable that
behavior on Solaris.  It's the default behavior on MacOS X.

Solaris is better; the original bytes are preserved, but lookup is
done using normalized names so you can't have two filenames with the
same characters.
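
As a rough illustration of "preserve the original bytes, but look up by
normalized name", here is a toy in-memory model. It is only a sketch: the
class name NormalizingDirectory and the choice of NFC as the lookup form
are assumptions made for the example, not a description of how Solaris
actually implements this.

```python
import unicodedata


class NormalizingDirectory:
    """Toy directory: stores names as created, compares them normalized."""

    def __init__(self, form="NFC"):
        self.form = form          # normalization form used for lookups (assumed)
        self._entries = {}        # normalized key -> (original name, contents)

    def _key(self, name):
        return unicodedata.normalize(self.form, name)

    def create(self, name, contents):
        key = self._key(name)
        if key in self._entries:
            # Same characters after normalization: refuse a second entry.
            raise FileExistsError(name)
        self._entries[key] = (name, contents)

    def lookup(self, name):
        return self._entries[self._key(name)][1]

    def listing(self):
        # What a readdir()-like call would hand back: the names as created.
        return [original for original, _ in self._entries.values()]


composed = "caf\u00e9"       # e-acute as a single code point
decomposed = "cafe\u0301"    # "e" followed by a combining acute accent

d = NormalizingDirectory()
d.create(decomposed, "data")
```

In this model d.lookup(composed) finds the entry that was created under the
decomposed spelling, d.listing() still returns the decomposed spelling
unchanged, and d.create(composed, ...) raises FileExistsError.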

What about globbing, especially on Mac OS X?  Given your two examples on
Linux with bash,
[...]

So, clearly we need some userspace support.  AFAIK, the globbing isn't
Unicode-aware; it's just matching on whatever readdir() returns.  Should
a ? match on a byte?  A Unicode codepoint?  An abstract character?  I am
not sure, and I am not sure if anyone has decided on this from a standards
point of view.
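
To make the byte / code point question concrete, here is a small sketch
that uses Python's fnmatch module as a stand-in for a glob matcher (an
assumption for illustration; it says nothing about what any particular
shell does). The helper question_marks_needed is hypothetical. It counts
how many ? a pattern needs to match one visible "é", depending on whether
the name is handled as text or as UTF-8 bytes, and on its normalization.

```python
import fnmatch
import unicodedata

nfc = unicodedata.normalize("NFC", "\u00e9")   # one code point
nfd = unicodedata.normalize("NFD", "\u00e9")   # "e" + combining acute


def question_marks_needed(name):
    """Return the number of '?' needed to match name exactly, or None."""
    q = b"?" if isinstance(name, bytes) else "?"
    for n in range(1, len(name) + 1):
        # fnmatchcase avoids platform-specific case folding.
        if fnmatch.fnmatchcase(name, q * n):
            return n
    return None


counts = {
    "NFC as text": question_marks_needed(nfc),
    "NFD as text": question_marks_needed(nfd),
    "NFC as UTF-8 bytes": question_marks_needed(nfc.encode("utf-8")),
    "NFD as UTF-8 bytes": question_marks_needed(nfd.encode("utf-8")),
}
```

The same visible character needs one, two, two, and three ? respectively,
which is exactly the ambiguity in question: none of the four counts is
"one per abstract character" for every spelling.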

Do you think NFKC would be better, so ? often matches what appears as a
single rune and fi matches ligature ﬁ (U+FB01)?

Hm.  I believe some network filesystems use NFKC, but I am neutral on
what should be done.  Should fi match ﬁ (U+FB01)?  I cannot decide; I see
arguments for both.
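
For reference, a short sketch of what the normalization forms do to the
ligature, using Python's standard unicodedata module. The helper
same_under is a hypothetical name introduced only for this example.

```python
import unicodedata

ligature = "\ufb01"   # LATIN SMALL LIGATURE FI

# Apply each normalization form to the ligature.
folded = {
    form: unicodedata.normalize(form, ligature)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}


def same_under(form, a, b):
    """True if a and b are identical after normalizing both to form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A combining sequence with no precomposed code point: q + combining tilde.
q_tilde = unicodedata.normalize("NFKC", "q\u0303")
```

Only the compatibility forms fold the ligature: same_under("NFKC", "fi",
ligature) is true, while same_under("NFC", "fi", ligature) is false. The
"often" above matters too: q_tilde is still two code points after NFKC, so
a code-point-based ? would not match it as a single unit.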

--Ken

