if not for file names?
The Unix kernel stores filenames as a run of bytes, not including `/'
and NUL.
That's not universally true anymore. Some newer filesystems mandate
that filenames are UTF-8 and enforce normalization rules (Mac OS X and
Solaris are two notable examples). Obviously some charset conversion is
happening for non-UTF-8 locales. I think that's inevitable, given the
issues with composed and decomposed characters.
For example, let's say you see this:
% ls
Résumé.txt Résumé.txt
How can that be? Well, they aren't the same sequence of bytes. In the
first one the “é” is U+00E9. In the second, it's U+0065 U+0301 (a regular
“e” followed by a combining acute accent). The only way of resolving
this is to apply the Unicode normalization rules and do filename
lookups on the normalized names. Mac OS X actually rewrites all of the
filenames using Normalization Form D (all characters in decomposed form,
which means the base character followed by its combining accents). I
think that sucks, but they didn't ask me. Solaris is better: the original
bytes are preserved, but lookup is done using normalized names, so you
can't have two filenames that normalize to the same characters.
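If you want to see the difference for yourself, here is a rough Python
sketch. It only uses the standard unicodedata module. The helper names
lookup_key and find_collisions are made up for illustration, and using
NFD as the comparison form is my assumption; this is not how any
particular kernel or filesystem implements its lookup.

```python
import unicodedata

# The two names that look identical in the ls output.
composed = "R\u00e9sum\u00e9.txt"      # each é is the single code point U+00E9
decomposed = "Re\u0301sume\u0301.txt"  # each é is U+0065 followed by U+0301


def lookup_key(name: str, form: str = "NFD") -> str:
    """Hypothetical helper: the key a normalizing lookup might compare on."""
    return unicodedata.normalize(form, name)


def find_collisions(names):
    """Group names that differ as strings but match after normalization."""
    groups = {}
    for name in names:
        groups.setdefault(lookup_key(name), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Different code points, hence different UTF-8 byte sequences.
    print(composed == decomposed)
    print(composed.encode("utf-8"))
    print(decomposed.encode("utf-8"))
    # Once both are normalized, they compare equal.
    print(lookup_key(composed) == lookup_key(decomposed))
    print(find_collisions([composed, decomposed, "notes.txt"]))
```

Storing the original string and comparing only on lookup_key() is
roughly the Solaris-style behavior described above; storing the result
of lookup_key() in place of the original would be closer to the
rewrite-everything approach.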
--Ken
_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers