xsl-list

Re: [xsl] creating tags around a string

2006-04-18 08:53:30
On 4/18/06, TGolshan(_at_)computer(_dot_)org <TGolshan(_at_)computer(_dot_)org> wrote:
> Wendell,
>
> Thanks for the insight. Perhaps I need to explain myself a little more.

I'd recommend paying attention to Wendell.  He addressed at least one
of your problems.  You need to think about generating elements, not
"tags".  The code is a bit clearer when you do:

<fname><xsl:value-of select="." /></fname>

instead of

<xsl:text>&lt;fname&gt;</xsl:text>
<xsl:value-of select="."/>
<xsl:text>&lt;/fname&gt;</xsl:text>



> I am taking an InDesign inx file and trying to build some structure (i.e.,
> an XML document) that I can use later. I am working with an army of
> editors who will not style first or last names in InDesign. They will,
> however, style every name as author, so my inx file looks like this:
>
> <AUTHOR>Al Stick, Tom She, Dick Burg, and Harry Ward</AUTHOR>
>
> and I want to add <fname> and <lname> elements to the mix.
>
> What is the best way to do this? I wrote the function below but realize
> that this approach is difficult at best.

The reason you're not necessarily getting a ton of help on your
question is that it's a lot deeper and more complex than any simple
trick with XSLT.  This mailing list is concerned with XSLT, while your
problem is more fundamentally one of markup systems and publishing.
My gut feeling would be to do these things:

1) I know you say it's impossible to get the editors to format it
correctly, but this is really the easiest place to fix the problem.
Do it at the source.  Perhaps treat anything without the proper
markup as an error and send it back to them to fix.  This sounds like
an organizational problem, not a technical one.

2) Barring that, have them normalize their input so that they always
enter at least "first name initial last name".  You need some
convention, since names are ambiguous by nature and vary by cultural
background.  Anne Marie Scott might consider her first name to be
"Anne Marie".  And what about someone who goes by a longer name, say
Michael Tyler Ryan Smith?

3) Use a language good at processing raw text.  XSLT is not a very
good match for this in my opinion, but I will be the first to admit my
XSLT 2.0 skills are weak.  Perl of course springs to mind.

4) Do some research into heuristics for identifying names in text.

I think the biggest thing right now is to narrow the focus of the
problem.  Saying you need something that can automatically identify
the first and last parts of a name and mark them up is tricky
business.  If you want to keep it basic (and accept that it will be
error-prone), do something similar to what you were doing, but modify
it a bit:

1) Strip out the "and"s.  I don't think your previous code did that.
2) Split on whitespace, but grab only the first and last tokens.
Assume these are the first and last name.
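Here is a minimal sketch of those two steps in Python, standing in for the Perl suggested in point 3. The function names `split_names` and `mark_up` are hypothetical, and the code assumes that authors are separated by commas and/or the word "and". It is deliberately as naive as described: middle names and initials are simply discarded.

```python
import re
from xml.sax.saxutils import escape


def split_names(author_text):
    """Naive heuristic: one (first, last) pair per author.

    Assumes authors are separated by commas and/or the word "and";
    within each author the first token is taken as the first name and
    the last token as the last name. Middle names/initials are dropped.
    """
    names = []
    # Step 1: strip out the "and"s by treating them as separators, like commas.
    for part in re.split(r",|\band\b", author_text):
        # Step 2: split on whitespace and keep only the first and last tokens.
        tokens = part.split()
        if not tokens:
            continue  # e.g. the empty piece left between ", and"
        names.append((tokens[0], tokens[-1]))
    return names


def mark_up(author_text):
    """Wrap each guessed name in <fname>/<lname> inside an <AUTHOR> element."""
    pieces = [
        f"<fname>{escape(first)}</fname> <lname>{escape(last)}</lname>"
        for first, last in split_names(author_text)
    ]
    return "<AUTHOR>" + ", ".join(pieces) + "</AUTHOR>"
```

For example, `mark_up("Al Stick and Harry Ward")` returns `<AUTHOR><fname>Al</fname> <lname>Stick</lname>, <fname>Harry</fname> <lname>Ward</lname></AUTHOR>`. Note that "Anne Marie Scott" comes out as Anne/Scott and "Michael Tyler Ryan Smith" as Michael/Smith. Those are precisely the failures point 2 warns about.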

Are you sure names are always entered as "Tom Smith, Henry Foo" and
that no one ever writes "Smith, Tom; Foo, Henry"?
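The question above can be partly answered with a mechanical check. The Python sketch below uses hypothetical function names (`looks_inverted`, `split_inverted_names`) and assumes that the inverted convention always puts a semicolon between authors and a comma between last and first name. A lone "Smith, Tom" with no semicolon is still indistinguishable from two authors named Smith and Tom, which is exactly the ambiguity the question points at.

```python
import re


def looks_inverted(author_text):
    """Guess that a semicolon means the 'Last, First; Last, First' convention."""
    return ";" in author_text


def split_inverted_names(author_text):
    """Parse an assumed 'Last, First; Last, First' list into (first, last) pairs."""
    names = []
    for entry in author_text.split(";"):
        if not entry.strip():
            continue  # tolerate a trailing semicolon
        # Drop a leading conjunction, as in "Smith, Tom; and Foo, Henry".
        entry = re.sub(r"^\s*and\s+", "", entry)
        last, sep, first = entry.partition(",")
        if not sep:
            raise ValueError(f"expected 'Last, First' but got {entry.strip()!r}")
        names.append((first.strip(), last.strip()))
    return names
```

Entries that fail the check raise an error rather than guessing. That fits point 1: send them back to the editors.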

Jon Goman

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--