I assume that the text never contains a run of two or more ー
characters. If so, I'd just go ahead and append ー (&#12540;) to every
matching substring and then, in a second call, replace every ーー
with a single ー.
Sometimes it is simpler to do something that can easily be undone
than to try to avoid doing it.
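Below is the stylesheet. Substrings are sorted by descending length,
since I don't know whether there are substrings similar to 'abcd' and
'bc', where the suffix must be appended to 'abcd' but not to the 'bc'
within. The order matters most for a prefix such as 'ab': when two
alternatives match at the same position, the first one listed wins.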
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:wl="w.l">

  <!-- Build '(s1|s2|...)' with the longest substrings first. -->
  <xsl:function name="wl:make-pattern" as="xs:string">
    <xsl:param name="reps" as="xs:string*"/>
    <xsl:variable name="sorted" as="xs:string*">
      <xsl:perform-sort select="$reps">
        <xsl:sort select="string-length(.)" order="descending"/>
      </xsl:perform-sort>
    </xsl:variable>
    <xsl:sequence select="concat('(', string-join($sorted, '|'), ')')"/>
  </xsl:function>

  <!-- Append ー to every match, then collapse each ーー to ー. -->
  <xsl:function name="wl:rep-subs" as="xs:string">
    <xsl:param name="text" as="xs:string"/>
    <xsl:param name="pattern" as="xs:string"/>
    <xsl:sequence select="replace(replace($text, $pattern, '$1ー'),
                                  'ーー', 'ー')"/>
  </xsl:function>

  <xsl:variable name="pattern"
      select="wl:make-pattern(('ab', 'abcd', 'cd', 'bc'))"/>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="text">
    <xsl:copy>
      <!-- string(.) also copes with an empty text element -->
      <xsl:value-of select="wl:rep-subs(string(.), $pattern)"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
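If a legitimate ーー could occur elsewhere in the text, the second replace() would damage it. A possible variant, which I have not run against your data, is to let the pattern swallow an existing ー so that a single replace() is enough. This is only a sketch: the function name wl:rep-subs-once is made up for illustration, and the hard-coded pattern '(abcd|ab|cd|bc)' stands in for whatever wl:make-pattern would build from your real substrings.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:wl="w.l">

  <!-- Hypothetical single-pass variant of wl:rep-subs.
       $pattern is assumed to be one parenthesised alternation,
       longest alternative first, e.g. '(abcd|ab|cd|bc)'. -->
  <xsl:function name="wl:rep-subs-once" as="xs:string">
    <xsl:param name="text" as="xs:string"/>
    <xsl:param name="pattern" as="xs:string"/>
    <!-- 'ー?' consumes one existing ー if it is there, so writing
         '$1ー' leaves exactly one ー after the matched substring. -->
    <xsl:sequence
        select="replace($text, concat($pattern, 'ー?'), '$1ー')"/>
  </xsl:function>

  <xsl:template match="text">
    <xsl:copy>
      <xsl:value-of
          select="wl:rep-subs-once(string(.), '(abcd|ab|cd|bc)')"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With <text>xabcdーyabz</text> as input this should give <text>xabcdーyabーz</text>: 'abcd' keeps its single ー and 'ab' gains one. Unlike the two-call version, a ーー that is not directly preceded by one of the substrings is left alone.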
On 6 August 2010 22:14, Hoskins & Gretton
<hoskgret(_at_)rochester(_dot_)rr(_dot_)com> wrote:
Hi, I have to convert some Katakana strings from "original" to "new" by
adding ー (&#x30fc;), a pronunciation character (see
http://www.fileformat.info/info/unicode/char/30fc/index.htm).
In Japanese, there aren't any word boundaries, so essentially all of my
search strings are substrings of the text of the current element.
When substring "a" is followed by the character ー I do not want to
make the replacement.
example: ブラウザ is a search string, but it is already
followed by ー: do nothing
When substring "a" is not followed by the character ー I want to make
the replacement to create "a" followed by ー.
example: ブラウザ is a search string and it is not yet
followed by ー (&#x30fc;): add ー to the end to make it
ブラウザー
If I just wanted to add the ー unconditionally, I could do that with
analyze-string and a regex containing the strings that I want to find,
where $regexSearch contains all of my search Katakana strings:
<xsl:analyze-string select="." regex="({$regexSearch})">
  <xsl:matching-substring>
    <xsl:value-of select="regex-group(1)"/>
    <xsl:text>ー</xsl:text>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <xsl:value-of select="."/>
  </xsl:non-matching-substring>
</xsl:analyze-string>
However, I can't figure out how to fit this into an overall XSLT, where
I need to look ahead in the element text before I decide to make the
substitution. Currently, if there is a string:
ブラウザー
it becomes: ブラウザーー (doubling
the last character).
If someone has some experience with this type of search and replace problem,
I would appreciate some guidance.
Regards, Dorothy
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--