The most important thing is to get "surname, forename" so that I can
more easily query and transform that later (into more finely parsed
bibliographic records, for example). While there will be exceptions to
this rule, I'm content enough to just say:
- a name is all caps
- within a name the last word is the surname
- any words before that are the forenames
- multiple names are delimited by either ", " or " and "
There are two approaches to this: do it all with regex analysis; or tokenize
it first into words, and then use for-each-group to group the words.
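
As a rough illustration of the tokenize-then-group approach, here is a
minimal sketch in Python rather than XSLT, so it shows only the logic and
not the for-each-group mechanics. The function name to_surname_first is
hypothetical, and the sketch assumes the four rules quoted above hold
without exception: it splits on ", " or " and ", keeps only the all-caps
words, and treats the last of them as the surname.

```python
import re


def to_surname_first(text: str) -> list[str]:
    """Return each name in text as "SURNAME, FORENAMES".

    Assumes the simple rules from the post: names are all caps, the last
    word of a name is the surname, and names are separated by ", " or
    " and ". Real data will have exceptions this does not handle.
    """
    results = []
    # Split the input into candidate names on either delimiter.
    for chunk in re.split(r", | and ", text):
        # Keep only all-caps words; this drops lead-in words such as "by".
        words = [w for w in chunk.split() if w.isupper()]
        if not words:
            continue
        surname, forenames = words[-1], words[:-1]
        if forenames:
            results.append(f"{surname}, {' '.join(forenames)}")
        else:
            # A single-word name has no forenames to append.
            results.append(surname)
    return results
```

Called on "by JOHN SMITH and MARY JANE DOE", this would yield
"SMITH, JOHN" and "DOE, MARY JANE". A list ending in ", and " is not
covered by the two stated delimiters and would need an extra rule.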
Titlecasing would be nice (though I note there's no such function in
XSLT 2.0).
Titlecasing is very sensitive to local rules. Rules that work for English
wouldn't work for German. In fact, rules that work for American English
wouldn't work for British English. In Britain it would be unthinkable to
write "In" or "Is" in a headline, but I'm sure I've seen US newspapers
that do it. Microsoft Word (even the UK edition) certainly does, though
its grammar checker then flags the result as incorrect.
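
One way to cope with that variation is to make the list of words left in
lower case a parameter instead of building it into the function. The
following is only a sketch under that assumption, again in Python; the
name titlecase and the word lists used with it are hypothetical examples,
not authoritative style rules for any locale.

```python
def titlecase(text: str, minor_words: frozenset[str] = frozenset()) -> str:
    """Capitalize each word except those listed in minor_words.

    The first word is always capitalized. Which words count as minor is
    the caller's decision, because the convention differs between locales
    and house styles.
    """
    out = []
    for i, word in enumerate(text.lower().split()):
        if i > 0 and word in minor_words:
            out.append(word)
        else:
            # Upper-case only the first character, keep the rest as is.
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)
```

With an empty list, "THE CAT IS IN THE HAT" becomes "The Cat Is In The
Hat"; passing "is", "in" and "the" as minor words gives "The Cat is in the
Hat" instead.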
Michael Kay