xsl-list

Re: [xsl] Implementation Advice: Grouping Strings by Character Range in XSLT 2

2016-04-29 14:44:27
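Don't kick yourself too hard: the regex attribute does not hold an XPath string expression; its value is used directly as the regular expression (it is an attribute value template). The single quotes you wrapped around the pattern are therefore literal characters in it. Just take out the surrounding single quotes.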
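
As a rough illustration of why only the middle group matched: with the quotes left in, the three alternatives are effectively '([©®℠™]+) then ([¦²³...]+) then ([➤]+)' so the first alternative needs a leading apostrophe in the text and the last needs a trailing one. A minimal sketch of the corrected instruction follows, combined with the xsl:choose on regex-group() that the original question describes. The mode name "ranges-example", the class names R1 to R3, and the shortened character classes are assumptions made for the example, not the generated code from the question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical mode name and class names; the regex is a shortened
       form of the one quoted in the question. -->
  <xsl:template match="text()" mode="ranges-example">
    <!-- No single quotes inside the attribute: the attribute value
         itself is the regular expression. -->
    <xsl:analyze-string select="." regex="([©®℠™]+)|([ÝÞ]+)|([➤]+)">
      <xsl:matching-substring>
        <span>
          <xsl:attribute name="class">
            <!-- regex-group(n) is the empty string when group n
                 did not take part in the match. -->
            <xsl:choose>
              <xsl:when test="regex-group(1) != ''">R1</xsl:when>
              <xsl:when test="regex-group(2) != ''">R2</xsl:when>
              <xsl:otherwise>R3</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <xsl:value-of select="."/>
        </span>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Text outside every range is copied through unchanged. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Because the attribute is an attribute value template, any literal curly braces in a generated regex (for example in a quantifier such as {2}) would have to be doubled.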

I hope this helps.

. . . . . . Ken

At 2016-04-29 18:38 +0000, Eliot Kimber ekimber(_at_)contrext(_dot_)com wrote:
I have my generated analyze-text approach working generally. However, some
of my regular expressions are not matching when I would expect them to.

For example, given this @regex value:


regex="'([©®℠™]+)|([¦²³¹¼&#xbd;¾Ð×ÝÞðýþŠš∂∏∑−∫≠≤≥]+)|([➤]+)'"
>

And this text:

"©®"

The regular expression does not match, even though the first group clearly
matches on \uA9 and \uAE.


However, this text:

"ÝÞ"

does match (second group).

If I copy the entire regex or any group from the @regex value and try it
in Oxygen against the same text I get the expected matches.

Have I made a stupid syntax mistake in my regular expression? Is there
some subtlety to matching groups that makes XSLT different from what
Oxygen is doing? I can't see any obvious syntax error in the regular
expression.

Thanks,

Eliot


----
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com




On 4/29/16, 11:54 AM, "Eliot Kimber ekimber(_at_)contrext(_dot_)com"
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

>Dimitre,
>
>I see how that can work.
>
>Cheers,
>
>E.
>
>On 4/29/16, 11:38 AM, "Dimitre Novatchev dnovatchev(_at_)gmail(_dot_)com"
><xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
>
>>I am at work and don't have the time for a complete/tested
>>implementation, but one can use the function string-to-codepoints()
>>and then perform on the result:
>>
>><xsl:for-each-group select="$theCodepoints"
>>group-adjacent="f:codepointToRange(.)">
>>
>> . . . . . . . .
>></xsl:for-each-group>
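
One way the codepoint-grouping idea above might be filled in is sketched here. It is untested against the real range definitions: the body of f:codepointToRange, the f namespace URI, the mode name, and the use of an empty string as the "no range" key are all assumptions, and the ranges R1 = "cde" and R2 = "g" are taken from the example in the original question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/ns/functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical mapping: returns a range name for a codepoint, or
       the empty string when the character belongs to no range. -->
  <xsl:function name="f:codepointToRange" as="xs:string">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:variable name="ch" select="codepoints-to-string($cp)"/>
    <xsl:sequence select="
      if (contains('cde', $ch)) then 'R1'
      else if (contains('g', $ch)) then 'R2'
      else ''"/>
  </xsl:function>

  <xsl:template match="text()" mode="ranges-by-codepoint">
    <!-- Adjacent codepoints with the same range name form one group. -->
    <xsl:for-each-group select="string-to-codepoints(.)"
                        group-adjacent="f:codepointToRange(.)">
      <xsl:variable name="text"
                    select="codepoints-to-string(current-group())"/>
      <xsl:choose>
        <!-- Characters outside every range are emitted without markup. -->
        <xsl:when test="current-grouping-key() = ''">
          <xsl:value-of select="$text"/>
        </xsl:when>
        <xsl:otherwise>
          <span class="{current-grouping-key()}">
            <xsl:value-of select="$text"/>
          </span>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the text node "abcdefg", this sketch should produce abc<span class="R1">cde</span>f<span class="R2">g</span>, the result asked for in the original question.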
>>
>>Cheers,
>>Dimitre
>>
>>On Fri, Apr 29, 2016 at 8:04 AM, Eliot Kimber ekimber(_at_)contrext(_dot_)com
>><xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
>>> Using XSLT 2, I have a requirement to take text and group contiguous
>>> sequences of characters in markup according to the character range the
>>> characters are in. This is to support the application of range-specific
>>> fonts to text in HTML.
>>>
>>> I have a static definition of the character ranges for a given national
>>> language and there shouldn't be any overlap between ranges. Given this
>>> static definition, I'm generating XSLT code to operate on text nodes in
>>> order to apply the range markup. The
>>>
>>> For example, given the text string "abcdefg" where range "R1" is "cde"
>>> and R2 is "g", the marked-up result should be:
>>> abc<span class="R1">cde</span>f<span class="R2">g</span>
>>>
>>> My initial approach is to generate a template that takes the current
>>> language and the text node and then applies templates in a
>>> language-specific mode.
>>>
>>> For each language I'm then generating a template to do the range matching.
>>>
>>> My question: once I'm in a language-specific template for a text node,
>>> what is the most efficient and/or easiest-to-code way to map the string
>>> to ranges? Since I'm generating the code it doesn't have to be concise.
>>>
>>> I'm thinking along the lines of using analyze-string to match on any of
>>> the groups and then within the matching-substring clause have a choice
>>> group to determine which range actually matched. But it feels like I'm
>>> missing a more elegant way to determine the actual range.
>>>
>>> Or maybe there's a clearer/simpler/more efficient way using tail recursion?
>>>
>>> Thanks,
>>>
>>> Eliot
>>>
>>>
>>
>>
>>
>>--
>>Cheers,
>>Dimitre Novatchev
>>
>>
>
>



--
Crane Softwrights Ltd. _ _ _ _ _ _ http://www.CraneSoftwrights.com/s/ |
G Ken Holman _ _ _ _ _ _ _ _ _ _ mailto:gkholman(_at_)CraneSoftwrights(_dot_)com |
Google+ blog _ _ _ _ _ http://plus.google.com/+GKenHolman-Crane/posts |
Legal business disclaimers: _ _ http://www.CraneSoftwrights.com/legal |

