Re: find capital letters in string and split it

2003-02-10 05:44:18
---- "bryan" <bry(_at_)itnisk(_dot_)com> wrote:

In Rdf/Xml it's often the habit to camel-case strings in IDs and 
such. 

Let's suppose I want to split the string at the upper case letters, 
the easiest way I can see to do that (the only way that pops into my 
mind) is to parse the string twice, using translate() and replacing 
upper-case letters with a string sequence not very likely to occur 
normally, and then reparse the string splitting it at these 
occurrences. This is of course resource intensive and not foolproof. 
Anybody have any thoughts on how to do this?

Hi Bryan,

It seems to me that you want to preserve the capital letters? If you do
*not* need to, then the following is a very straightforward solution
using the "str-split-to-words" template of FXSL:

This transformation:
-------------------
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">


   <xsl:import href="strSplit-to-Words.xsl"/>
<!-- This transformation must be applied to:
        testSplitToWords4.xml               
-->

   <xsl:output indent="yes" omit-xml-declaration="yes"/>

   <xsl:variable name="vCaps" 
    select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
    
    <xsl:template match="/">
      <xsl:call-template name="str-split-to-words">
        <xsl:with-param name="pStr" select="/*"/>
        <xsl:with-param name="pDelimiters" 
                        select="$vCaps"/>
      </xsl:call-template>
    </xsl:template>
</xsl:stylesheet>

when applied against this source.xml:

<t>thisIsACamelCasedWord</t>

Produces:

<word>this</word>
<word>s</word>
<word>amel</word>
<word>ased</word>
<word>ord</word>


In case you need to preserve the capital letters, the solution is
slightly different. A first pass over the string inserts a space in
front of every capital letter. The resulting string is then tokenised
on the space character. The first pass uses the "str-map" template
from FXSL.

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:myMark="f:MarkAnUppercase" 
 exclude-result-prefixes="myMark">


   <xsl:import href="str-map.xsl"/>
   <xsl:import href="strSplit-to-Words.xsl"/>
<!-- This transformation must be applied to:
        testSplitToWords4.xml               
-->

   <xsl:output indent="yes" omit-xml-declaration="yes"/>

   <xsl:variable name="vCaps" 
    select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
    
    <myMark:myMark/>
    <xsl:template match="myMark:*">
      <xsl:param name="arg1"/>
      
      <xsl:if test="contains($vCaps, $arg1)">
        <xsl:text> </xsl:text>
      </xsl:if>
      <xsl:value-of select="$arg1"/>
    </xsl:template>
    
    <xsl:template match="/">
    
      <xsl:variable name="vSpaceDelimited">
        <xsl:call-template name="str-map">
          <xsl:with-param name="pFun" 
            select="document('')/*/myMark:*[1]"/>
          <xsl:with-param name="pStr" select="/*"/>
        </xsl:call-template>
      </xsl:variable>
      
      <xsl:call-template name="str-split-to-words">
        <xsl:with-param name="pStr" select="$vSpaceDelimited"/>
        <xsl:with-param name="pDelimiters" 
                        select="' '"/>
      </xsl:call-template>
    </xsl:template>
</xsl:stylesheet>

when applied against the same source.xml produces:

<word>this</word>
<word>Is</word>
<word>A</word>
<word>Camel</word>
<word>Cased</word>
<word>Word</word>
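
If importing FXSL is not an option, the same idea can probably be
expressed with a single recursive named template in plain XSLT 1.0.
What follows is only a minimal sketch, not part of FXSL: the template
name "splitAtCaps" and its parameters "pStr" and "pWord" are made up
for this illustration. It assumes an ASCII-only alphabet (the same
$vCaps as above) and recurses once per character, so it is meant for
short strings such as IDs.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output indent="yes" omit-xml-declaration="yes"/>

   <xsl:variable name="vCaps"
    select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

    <xsl:template match="/">
      <xsl:call-template name="splitAtCaps">
        <xsl:with-param name="pStr" select="string(/*)"/>
      </xsl:call-template>
    </xsl:template>

    <!-- Hypothetical helper, not an FXSL template.
         pStr  : the part of the string still to be scanned
         pWord : the word accumulated so far                 -->
    <xsl:template name="splitAtCaps">
      <xsl:param name="pStr"/>
      <xsl:param name="pWord" select="''"/>

      <xsl:variable name="vChar" select="substring($pStr, 1, 1)"/>

      <xsl:choose>
        <!-- End of input: emit the last word, if there is one -->
        <xsl:when test="not($pStr)">
          <xsl:if test="$pWord">
            <word><xsl:value-of select="$pWord"/></word>
          </xsl:if>
        </xsl:when>
        <!-- A capital starts a new word: emit the current word
             first, then continue with the capital as its start -->
        <xsl:when test="contains($vCaps, $vChar)">
          <xsl:if test="$pWord">
            <word><xsl:value-of select="$pWord"/></word>
          </xsl:if>
          <xsl:call-template name="splitAtCaps">
            <xsl:with-param name="pStr" select="substring($pStr, 2)"/>
            <xsl:with-param name="pWord" select="$vChar"/>
          </xsl:call-template>
        </xsl:when>
        <!-- Any other character extends the current word -->
        <xsl:otherwise>
          <xsl:call-template name="splitAtCaps">
            <xsl:with-param name="pStr" select="substring($pStr, 2)"/>
            <xsl:with-param name="pWord"
                            select="concat($pWord, $vChar)"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Applied to the same source.xml, this should give the same six words as
the FXSL-based transformation above. The trade-off is that the
splitting rule is hard-wired into the template, whereas with "str-map"
the marking function is passed in as a parameter and can be swapped.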


Hope this helped.






=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


