xsl-list

Re: [xsl] Katakana substitution regex

2010-08-06 15:57:23
On 8/6/2010 3:14 PM, Hoskins & Gretton wrote:
Hi, I have to convert some Katakana strings from "original" to "new"
by adding ー (&#x30fc;), a pronunciation character (see
http://www.fileformat.info/info/unicode/char/30fc/index.htm).
In Japanese, there aren't any word boundaries, so essentially all of
my search strings are substrings of the text of the current element.
When substring "a" is followed by the character ー I do not want
to make the replacement.

example:        ブラウザ is a search string
but it is followed by ー already -- do nothing

When substring "a" is not followed by the character ー I want to
make the replacement to create "a" followed by ー.

example:        ブラウザ is a search string
but it is not followed by ー (&#x30fc;) already
                add to the end to make it
                ブラウザー

If I only had to add the ー unconditionally, I could do that with
analyze-string and a regex built from the strings I want to find,
where $regexSearch contains all of my search Katakana strings:

                <xsl:analyze-string select="." regex="({$regexSearch})">
                    <xsl:matching-substring>
                        <xsl:value-of select="regex-group(1)"/>
                        <xsl:text>&#12540;</xsl:text>
                    </xsl:matching-substring>
                    <xsl:non-matching-substring>
                        <xsl:value-of select="."/>
                    </xsl:non-matching-substring>
                </xsl:analyze-string>
However, I can't figure out how to fit this into an overall
XSLT, where I need to look ahead in the element text before I
decide to make the substitution. Currently, if there is a
string:                &#12502;&#12521;&#12454;&#12470;&#12540;
it becomes:     &#12502;&#12521;&#12454;&#12470;&#12540;&#12540;
(doubling the last character).

If someone has some experience with this type of search and replace
problem, I would appreciate some guidance.
Regards, Dorothy

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



How about
   select="replace(., '&#12470;([^&#12540;])', '&#12470;&#12540;$1')"
?

And if that fails to catch &#12470; when it occurs at the end of a text
node, wrap the result in
    replace(., '&#12470;$', '&#12470;&#12540;')
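The same idea can probably be extended to the whole search list. XSLT 2.0 regexes have no lookahead, so the sketch below instead matches each search string together with an optional trailing &#12540; and always writes back the string plus exactly one &#12540;. It assumes $regexSearch is an alternation such as 'A|B' containing no curly braces. The second search string, the identity template, and the priority value are only illustrative and are not from the original post.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical search list: replace with the real alternation of Katakana strings. -->
  <xsl:param name="regexSearch"
             select="'&#12502;&#12521;&#12454;&#12470;|&#12469;&#12540;&#12496;'"/>

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Explicit priority so this template wins over node() in the identity template. -->
  <xsl:template match="text()" priority="1">
    <!-- Group 1 is the search string; the optional &#12540; after it is consumed if present. -->
    <xsl:analyze-string select="." regex="({$regexSearch})&#12540;?">
      <xsl:matching-substring>
        <!-- Emit the search string followed by exactly one &#12540;. -->
        <xsl:value-of select="regex-group(1)"/>
        <xsl:text>&#12540;</xsl:text>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Because an existing &#12540; is swallowed by the match and then written back once, a string that already ends in &#12540; should come out unchanged, while one without it gains the character. This also covers a match at the very end of a text node, so no second pass with '$' is needed.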

HTH,
Lars



