For posterity, I want to be sure that Barry's reply below gets into the
mailing list archive. - Mike
Mike -
That is generally the way these things proceed, but even that is still
intractable.
1: Some names/places may have multiple readings
2: The correct reading may not be known to the typist
3: Taking the reading from the IME input doesn't work either, because, I am
told, many Japanese typists will enter a shorter reading for the kanji, maybe
even one that doesn't make sense, in order to save keystrokes. There is no
practical way to correct those, or even to know when you have them.
The best you can really do is set your client's expectations: there may be
errors, because readings are not universally agreed on.
Oh, and one tractable matter that hasn't been mentioned here is that there
exist half-width and full-width forms of katakana in any Japanese-capable
encoding. You can convert from one to the other; it doesn't matter which you
pick. But if you are relying on them as sort keys, you can't mix the two
without complicating things.
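
To illustrate the half-width/full-width point, here is a minimal Python
sketch, assuming the readings are already Unicode strings. It leans on NFKC
normalization from the standard unicodedata module, which folds half-width
katakana into the full-width forms. The function name and the sample names
are made up for the example. Be aware that NFKC also folds other
compatibility characters (full-width Latin letters become plain ASCII, for
instance), so check that this is acceptable for your data before adopting it.

```python
import unicodedata


def katakana_sort_key(reading: str) -> str:
    """Hypothetical helper: fold half-width katakana to full-width.

    NFKC maps half-width katakana (U+FF66..U+FF9F) to the full-width
    forms and composes a base kana plus half-width voicing mark into a
    single voiced kana. It also folds other compatibility characters.
    """
    return unicodedata.normalize("NFKC", reading)


# Sample readings (invented): Tanaka in half-width, Yamada and Tanaka
# in full-width.
readings = ["ﾀﾅｶ", "ヤマダ", "タナカ"]

# Sorting the raw strings separates the two spellings of Tanaka, because
# half-width katakana sit in a much higher code point range.
raw_order = sorted(readings)

# Sorting on the normalized key keeps both spellings of Tanaka together.
normalized_order = sorted(readings, key=katakana_sort_key)

print(raw_order)
print(normalized_order)
```

Normalizing once when the reading is stored, rather than at sort time, is
probably the simpler design, since every later comparison then sees one
form only.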
A great way to get solid info on all this is to go to Amazon.com and look
up Ken Lunde's most recent book from O'Reilly.
Best,
Barry