xsl-list

Re: [xsl] Inverting names with Jr and Sr considered

2012-11-06 03:58:11
Hopefully you won't have names like "Augustus De Morgan", which should
not be transformed to "Morgan, Augustus De".

And I think this is the time and the place to quote this article once more:
http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

-W

On 06/11/2012, Mark <mark(_at_)knihtisk(_dot_)org> wrote:
I agree, my specification is likely not complete. However, my input is a
single document written by one person indexing a single journal. There is a
great deal of consistency to the data and I doubt that there are as many as
1000 names. That said:

I received an answer off the list (and thus do not feel authorized to post
it here) that will help me discover what oddities I have not covered. It
explained the regular expressions it used, so that if modification is
required, I may be able to do it myself.

Thanks for your time, Michael; as always this list provides the most
consistent and practical advice around, something you all can be proud of.

Mark

-----Original Message-----
From: Michael Kay
Sent: Tuesday, November 06, 2012 2:10 AM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Inverting names with Jr and Sr considered

I wouldn't even attempt to write any code based on this as the
specification. For this to work at all well, you're going to need to
iteratively adapt the solution to handle all the names in your dataset,
or at least a sample of a couple of thousand of them. There's just too
much variation in the names you might encounter. Are "Jr" and "Sr"
really the only suffixes, and are they always spelt this way, or do you
also get "III" and "Jnr" and "Jnr."?

If I'm wrong, and the names are all regular and in the pattern you
describe, then I think you can just tokenize on whitespace and do
something like

suffix := $tokens[last()][. = ('Jr', 'Sr')]
stem := if ($suffix) then remove($tokens, count($tokens)) else $tokens
value-of select="concat($stem[last()], ','), remove($stem, count($stem)),
if ($suffix) then concat('(', $suffix, ')') else ()"
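
The tokenize-based approach sketched above could be written as an XSLT 2.0 function along the following lines. This is only a minimal sketch under the assumptions stated in the thread (surname last, optionally followed by exactly 'Jr' or 'Sr'). The function name f:invert-name, its namespace, and the input element name "name" are hypothetical and chosen purely for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:names"
    exclude-result-prefixes="xs f">

  <!-- f:invert-name is a hypothetical helper, not part of any library. -->
  <xsl:function name="f:invert-name" as="xs:string">
    <xsl:param name="name" as="xs:string"/>

    <!-- normalize-space() collapses runs of whitespace to single spaces,
         so splitting on one space copes with multiple spaces in the input. -->
    <xsl:variable name="tokens"
                  select="tokenize(normalize-space($name), ' ')"/>

    <!-- The last token, kept only if it is one of the two known suffixes. -->
    <xsl:variable name="suffix"
                  select="$tokens[last()][. = ('Jr', 'Sr')]"/>

    <!-- Everything except the suffix, if there is one. -->
    <xsl:variable name="stem"
                  select="if (exists($suffix))
                          then $tokens[position() lt last()]
                          else $tokens"/>

    <!-- Surname plus comma, then the remaining names, then "(suffix)". -->
    <xsl:sequence
        select="string-join(
                  (concat($stem[last()], ','),
                   $stem[position() lt last()],
                   if (exists($suffix))
                   then concat('(', $suffix, ')')
                   else ()),
                  ' ')"/>
  </xsl:function>

  <!-- Assumes each personal name sits in an element called "name". -->
  <xsl:template match="name">
    <xsl:copy>
      <xsl:value-of select="f:invert-name(.)"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Under those assumptions, "Bill  T Wilson Jr" (with two spaces) comes out as "Wilson, Bill T (Jr)". To accept further suffixes such as 'III' or 'Jnr', extend the list in the $suffix predicate. A single-token name yields just that token followed by a comma, so such names would need separate handling.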

Michael Kay
Saxonica

On 05/11/2012 23:45, Mark wrote:
This must have been done many times, so can someone show me where to find
the answer?

I have a series of personal names in natural order that I need to invert.

The surname is always last except when followed by ‘Jr’ or ‘Sr’ (neither
of which need be present). I want to represent:

J Allen Rogers –> Rogers, J Allen
Bill T Wilson Jr –> Wilson, Bill T (Jr)
A B Brown –> Brown, A B
John Victor Case Sr –> Case, John Victor (Sr)

and so on. There may be a single space or multiple spaces between some of
the elements of the name.

It looks like <xsl:analyze-string> will do this, but I do not know how to
write regex.

Thanks,
Mark
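
For readers who want to see the regex route the question mentions, one possible <xsl:analyze-string> formulation is sketched below. It covers only the four patterns listed in the question. The element name "name" is an assumption about the input, and the regular expression is one illustration, not the off-list answer referred to elsewhere in this thread.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumes each personal name sits in an element called "name". -->
  <xsl:template match="name">
    <xsl:copy>
      <!-- normalize-space() reduces multiple spaces to one, so the
           regex only has to deal with single spaces.
           Group 1: (.+?)   the given names, as few characters as possible
           Group 2: (\S+)   the surname, one run of non-space characters
           Group 3: optional space plus suffix
           Group 4: (Jr|Sr) the suffix itself -->
      <xsl:analyze-string select="normalize-space(.)"
                          regex="^(.+?) (\S+)( (Jr|Sr))?$">
        <xsl:matching-substring>
          <xsl:value-of select="regex-group(2)"/>
          <xsl:text>, </xsl:text>
          <xsl:value-of select="regex-group(1)"/>
          <!-- regex-group() returns '' for a group that did not match. -->
          <xsl:if test="regex-group(4) != ''">
            <xsl:value-of select="concat(' (', regex-group(4), ')')"/>
          </xsl:if>
        </xsl:matching-substring>
        <!-- Anything that does not fit the pattern (for example a
             single-word name) is passed through unchanged. -->
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the first group is reluctant (.+?), the regex engine prefers the reading in which a trailing Jr or Sr is taken as the suffix and not as the surname. Non-capturing groups are not available in XSLT 2.0 regular expressions, which is why the suffix is picked up from group 4.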


--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



