Re: [xsl] The Oxford Comma - A Gift Worth At Least 5 Cents
2008-06-20 19:37:54
Hi Wendell,
I agree and thank you for your input and time.
Cheers,
ac
Wendell Piez wrote:
AC,
As it relates to XSL, the topic of natural language processing is in
scope. In the previous messages in this thread, it appeared to me that
the discussion had quickly and decisively moved away from XSLT towards
more philosophical questions -- which themselves weren't actually
being engaged with much of the sort of discipline that might warrant
the hope that they would come back to XSLT. (I admit we digress
sometimes, but we really shouldn't, and mostly when we do it's with at
least an eye on why we're here.)
At 01:43 PM 6/20/2008, you wrote:
Thank you for ensuring that we keep in scope. I instigated this with
an almost caricatural remark on human languages at the end of my
initial post. What I had in mind, though I did not make it clear then,
was that we have to process human languages a lot. I have not checked,
but it is possible that most of the processing questions that come
through this list are related to language processing, and that does not
seem likely to change soon. There are also many documents, in many
languages, and we need to process them all of the time, more and
more.
I think it's necessary to make a distinction between processing
natural language inputs and generating natural language.
For example,
A.
<item>Cisco IP phone end user training</item>
<item>Cisco attendant console operator training</item>
<item>Cisco call center agent training</item>
B.
"Cisco IP phone end user training, Cisco attendant console operator
training, and Cisco call center agent training"
It makes a big difference whether A is to be transformed to B, or B is
to be transformed to A.
B to A is indeed the concern of an entire subdiscipline of
computational linguistics.
A to B is tractable in XSLT, and remains so (although it becomes more
complex) when working with arbitrary sets of items. But depending on
the requirements, one might have to adjust not only for different
numbers of items, but for anomalous inputs of various kinds. (For
example, one might want to defend against duplicate items in the set.
Or, if one avoids the Oxford comma in generating English, does one
restore it when given a final or penultimate item containing the word
"and"?)
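As a rough illustration of the A-to-B direction, here is a minimal XSLT 2.0 sketch; it is not the stylesheet discussed earlier in this thread. It assumes the items sit inside a wrapper element named items (a hypothetical name, not given above), drops duplicate items by their whitespace-normalized string value, and always emits the Oxford comma for three or more items. It does not attempt the harder question of items that themselves contain "and".

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <!-- Assumption: the <item> elements are children of a wrapper
       element called <items>; that name is hypothetical. -->
  <xsl:template match="items">
    <!-- Defend against duplicates: one string per distinct item.
         for-each-group visits groups in order of first appearance. -->
    <xsl:variable name="list" as="xs:string*">
      <xsl:for-each-group select="item" group-by="normalize-space(.)">
        <xsl:sequence select="current-grouping-key()"/>
      </xsl:for-each-group>
    </xsl:variable>
    <xsl:variable name="n" select="count($list)"/>

    <xsl:for-each select="$list">
      <xsl:value-of select="."/>
      <xsl:choose>
        <!-- Last item: no separator. -->
        <xsl:when test="position() = $n"/>
        <!-- Exactly two items: "X and Y", no comma. -->
        <xsl:when test="$n = 2"> and </xsl:when>
        <!-- Penultimate of three or more: the Oxford comma. -->
        <xsl:when test="position() = $n - 1">, and </xsl:when>
        <xsl:otherwise>, </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the three items in A, wrapped in an items element, this should produce the text in B. Dropping the Oxford comma is a matter of changing the ", and " branch to " and ", which is exactly where the question about items containing "and" starts to matter.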
XSLT2 is possibly our best tool so far.
For A to B, probably (though I note that your rewrite might as well be
XQuery). For B to A, probably not outside simple cases (though without
looking into it, I couldn't actually tell you what the
state-of-the-art NLP parsers are using). Someone could be ready to
prove me wrong, which would be very cool.
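To make "simple cases" concrete, here is a hedged XSLT 2.0 sketch of the B-to-A direction. It assumes the prose list arrives in a hypothetical list element, that items are separated by commas, and that the last item is introduced by ", and" (that is, the Oxford comma is present). Any item containing a comma of its own will be split wrongly, which is the point at which a real parser would be needed.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: input looks like <list>a, b, and c</list>;
       the element names list, items and item are hypothetical. -->
  <xsl:template match="list">
    <items>
      <!-- Split on a comma, optional whitespace, and an optional
           "and" followed by whitespace. -->
      <xsl:for-each
          select="tokenize(normalize-space(.), ',\s*(and\s+)?')">
        <item><xsl:value-of select="."/></item>
      </xsl:for-each>
    </items>
  </xsl:template>

</xsl:stylesheet>
```

This handles the text in B, but only because none of its items contains a comma. A two-item list such as "X and Y" is deliberately left unsplit, since without a comma there is no reliable way to tell a separator from an "and" inside an item.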
On this track, we are primarily looking at processing the
Oxford comma, in English, with XSLT. Since here, by law, we have to
support and process at least both English and French, in both
directions, on input and output, and the rules are different for each
case, I may be a bit sensitive on the subject. I am sure that Ronnie
took some serious time to work out the stylesheet as he did, and I only
tried to optimize it further, for English, in XSLT2. I am not sure
whether I succeeded or, better, how we can further improve on this
case, but I did spend some time on it, and I am frightened by all that
is left to do, wondering how the members of this list (e.g., in the EU)
cope with so many documents and languages.
Probably with libraries, but we have list members with better
information on this topic.
Do we have the tools that we need for the job at hand? Do not worry,
though: I tame my fears, and that is why I also try to optimize the
logic and processing for the Oxford comma, hoping that the solution
is good. Better still, and especially if it is, I hope that others can
optimize it further so that we can someday settle this logic and
move beyond the Oxford comma. What do you think?
On balance, I think that even if we etched the solution in stone, we'd
continue to get questions about it from time to time. There are some
questions that are far more common; just recently the thread recurred
on how to parse raw markup in XSLT. And even when questions repeat,
sometimes answers change.
But fortunately for everyone, this isn't up to me. If the task of
publishing a general solution is worth the overhead of doing it, you
or someone else can put it on the web, and maybe in time (maybe soon)
it will get as many hits as, say, Jeni Tennison's page explaining
Muenchian grouping. There is actually a FAQ where Dave Pawson has
included many such nuggets.
OT on the general question: sometimes structured data maps into
natural language quite readily. More often, not. But that's what prose
is for, and prose plus typography when prose alone doesn't suffice.
(Or, if spoken language is what we need, we can add a whiteboard and
hand gestures. Or promise to send email.) The number of ways natural
language can be refactored for greater clarity or elegance of
expression can't be counted, but there isn't a machine that can do any
of it. Writing a good prose paragraph requires judgement, which is
more than any algorithm. It is harder than chess.
Cheers,
Wendell
======================================================================
Wendell Piez
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--