Hello!
I need to convert a string into an identifier.
Earlier I was using the following function:
<!--
Creates a normalized name for the specified name components.
$components - name components to generate a normalized name for.
$default-name - a default name in case a name cannot be built.
Returns a normalized name (upper case), or $default-name when no name can be built.
-->
<xsl:function name="t:create-name" as="xs:string?">
  <xsl:param name="components" as="xs:string*"/>
  <xsl:param name="default-name" as="xs:string?"/>
  <xsl:variable name="parts" as="xs:string*">
    <xsl:for-each select="$components">
      <xsl:analyze-string
        regex="(\p{{L}}|\d)+"
        flags="imx"
        select=".">
        <xsl:matching-substring>
          <xsl:sequence select="."/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:for-each>
  </xsl:variable>
  <xsl:choose>
    <xsl:when test="empty($parts)">
      <xsl:sequence select="$default-name"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="
        string-join
        (
          (
            for
              $i in 1 to count($parts),
              $part in $parts[$i]
            return
              if
              (
                ($i = 1) and
                (
                  for $c in substring($part, 1, 1) return
                    ($c ge '0') and ($c le '9')
                )
              )
              then
              (
                ($default-name, 'name')[1],
                upper-case($part)
              )
              else
              (
                upper-case($part)
              )
          ),
          '-'
        )"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>
Now I have to build a name containing only characters from [A-Za-z0-9].
My problem is that I often see characters with modifiers (diacritics), such as:
00E0 à LATIN SMALL LETTER A WITH GRAVE
00E1 á LATIN SMALL LETTER A WITH ACUTE
00E2 â LATIN SMALL LETTER A WITH CIRCUMFLEX
00E3 ã LATIN SMALL LETTER A WITH TILDE
00E4 ä LATIN SMALL LETTER A WITH DIAERESIS
...
My questions:
Is it acceptable, from the perspective of a Western language, to replace
those characters with the corresponding character without a modifier?
Is there a way to do this in XSLT?
Is there any better option?
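To make the second question concrete, here is a minimal sketch of one possible direction, not a verified solution. It assumes an XSLT 2.0 processor that supports the optional 'NFD' form in normalize-unicode() (only 'NFC' is required by the specification). The function name t:to-ascii and the namespace URI bound to the t prefix are hypothetical, chosen for illustration only.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.example.org/t"
  exclude-result-prefixes="xs t">

  <!--
    Hypothetical helper, not part of t:create-name.
    1. Decomposes each character into a base character followed by
       combining marks (assumes the processor supports 'NFD').
    2. Removes the combining marks (Unicode category M).
    3. Drops every remaining character outside [A-Za-z0-9].
  -->
  <xsl:function name="t:to-ascii" as="xs:string">
    <xsl:param name="value" as="xs:string?"/>
    <xsl:variable name="decomposed" as="xs:string"
      select="normalize-unicode($value, 'NFD')"/>
    <xsl:variable name="unmarked" as="xs:string"
      select="replace($decomposed, '\p{M}', '')"/>
    <xsl:sequence select="replace($unmarked, '[^A-Za-z0-9]', '')"/>
  </xsl:function>

</xsl:stylesheet>
```

With this sketch, t:to-ascii('àáâãä') should yield 'aaaaa'. Letters that have no canonical decomposition, such as ø, ß or æ, are not mapped to a base letter and would simply be dropped by the last step, so they would need an explicit mapping (for example with translate()) if they matter.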
Thanks
--
Vladimir Nesterovsky
http://www.nesterovsky-bros.com