xsl-list
[xsl] Generate identifier

2009-12-29 10:15:09
Hello!

I need to convert a string into an identifier.
Earlier I was using the following function:

  <!--
    Creates a normalized name from the specified name components.
      $components - name components to generate a normalized name for.
      $default-name - a default name in case a name cannot be built.
      Returns a normalized name (upper-cased parts joined with '-').
  -->
  <xsl:function name="t:create-name" as="xs:string?">
    <xsl:param name="components" as="xs:string*"/>
    <xsl:param name="default-name" as="xs:string?"/>

    <xsl:variable name="parts" as="xs:string*">
      <xsl:for-each select="$components">
        <xsl:analyze-string
          regex="(\p{{L}}|\d)+"
          flags="imx"
          select=".">
          <xsl:matching-substring>
            <xsl:sequence select="."/>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </xsl:variable>

    <xsl:choose>
      <xsl:when test="empty($parts)">
        <xsl:sequence select="$default-name"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="
          string-join
          (
            (
              for
                $i in 1 to count($parts),
                $part in $parts[$i]
              return
                if
                (
                  ($i = 1) and
                  (
                    for $c in substring($part, 1, 1) return
                      ($c ge '0') and ($c le '9')
                   )
                )
                then
                  (
                    ($default-name, 'name')[1],
                    upper-case($part)
                  )
                else
                  (
                    upper-case($part)
                  )
            ),
            '-'
          )"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>
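
As an illustration of what the function does, here is a minimal sketch of
two calls. The input strings and variable names are made up for the
example, and the second result assumes the default collation is the
Unicode codepoint collation (which the digit test relies on):

  <!-- Hypothetical inputs; "t" must be bound to the same namespace
       as the function above. -->
  <xsl:variable name="name1" as="xs:string?"
    select="t:create-name(('order item', 'id'), ())"/>
  <!-- $name1 is 'ORDER-ITEM-ID': the parts are ORDER, ITEM and ID. -->

  <xsl:variable name="name2" as="xs:string?"
    select="t:create-name('2nd try', 'n')"/>
  <!-- $name2 is 'n-2ND-TRY': the first part starts with a digit,
       so the default name is put in front of it. -->

Note that any letter matched by \p{L} is kept, so a component such as
'déjà vu' yields parts that still contain accented letters.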

Now I have to build a name containing only characters from [A-Za-z0-9].
My problem is that I often see characters with modifiers (diacritical
marks), such as:
00E0 à LATIN SMALL LETTER A WITH GRAVE
00E1 á LATIN SMALL LETTER A WITH ACUTE
00E2 â LATIN SMALL LETTER A WITH CIRCUMFLEX
00E3 ã LATIN SMALL LETTER A WITH TILDE
00E4 ä LATIN SMALL LETTER A WITH DIAERESIS
...

My questions:
  Is it acceptable, from the perspective of a western language, to replace
those characters with the corresponding character without a modifier?
  Is there a way to do this in XSLT?
  Is there any better option?
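
For the second question, one possible approach is sketched below. It is
only a sketch under stated assumptions: the function name
t:strip-diacritics is hypothetical, and it assumes an XSLT 2.0 processor
whose normalize-unicode() supports the NFD form (processors are only
required to support NFC). The string is decomposed, so that "à" becomes
"a" followed by a combining grave accent, and then all combining marks
(\p{M}) are removed:

  <!-- Hypothetical helper, not part of the code above. -->
  <xsl:function name="t:strip-diacritics" as="xs:string">
    <xsl:param name="value" as="xs:string?"/>

    <!-- Decompose to NFD, then drop every combining mark. -->
    <xsl:sequence select="
      replace(normalize-unicode($value, 'NFD'), '\p{M}', '')"/>
  </xsl:function>

With this, t:strip-diacritics('àáâãä') returns 'aaaaa'. Letters that have
no canonical decomposition, such as ø, æ or ß, pass through unchanged, so
a further filter on [A-Za-z0-9] would still be needed for them.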

Thanks
--
Vladimir Nesterovsky
http://www.nesterovsky-bros.com