xsl-list

[xsl] Re: replace() and translate() second try

2011-06-04 12:36:15
Thanks to Michael Kay and David Carlisle for their responses.

In an example like
        translate(string, "wxyz", "ABCD")

where w, x, y and z are supplementary characters, I find (using saxonhe9-3)
that it works if the supplementary characters are indicated with the hex-escape
&#xHHHHHHHH; notation, but _not_ if the supplementary characters are simply
typed in using a Unicode-savvy text editor that handles supplementary
characters.  In case the hex-escape sequence I just typed got garbled by email
filters, it consists of an ampersand, a pound/hash sign, an 'x', and a sequence
of hex digits, terminated with a semicolon.
Thanks to David Carlisle for suggesting the hex-escape notation.

My original XML file (containing supplementary Unicode characters from
the Deseret Alphabet block) and my XSLT script are both in UTF-8 encoding.

So something like this works:

        translate(string, '&#x10428;&#x10429;&#x1042A;&#x1042B;', 'ABCD')

In case things get garbled again by email filters, the second argument to 
translate() contains (without the spaces shown here)
four supplementary characters indicated by the hex code point values:

                & #x 10428 ;
                & #x 10429 ;
                & #x 1042A ;
                & #x 1042B ;
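
For anyone who wants to generate these escapes rather than type them by hand,
here is a minimal sketch in Python (used only for illustration; it is not part
of the stylesheet).  The helper name to_hex_refs is made up for this example.
It writes every non-ASCII character as a hex character reference and then
checks the round trip with the standard library's html.unescape, which
resolves numeric character references.

```python
import html


def to_hex_refs(text):
    """Write every non-ASCII character as a hex character reference."""
    return "".join(
        ch if ord(ch) < 128 else "&#x{:X};".format(ord(ch))
        for ch in text
    )


# The four Deseret letters U+10428 .. U+1042B, given as Python escapes.
deseret = "\U00010428\U00010429\U0001042A\U0001042B"
escaped = to_hex_refs(deseret)
print(escaped)

# Resolving the references again should give back the original string.
print(html.unescape(escaped) == deseret)  # True
```

ASCII characters pass through unchanged, so the helper could be run over a
whole string literal and not just over the supplementary characters.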

If, using a Unicode-savvy text editor, with UTF-8 encoding for the file, I
simply type the four supplementary characters into the second string
argument, the script does not work.  This is a shame because the script with
the real characters is far more readable (if you have a Unicode editor that
can display the supplementary character glyphs).

The same holds for the replace(string, 'orig', 'repl') function.  If the second
argument contains supplementary characters, they need to be indicated in the
hex-escape notation, or I get results that are at least inconsistent.

I have a little example that I would gladly forward to anyone who is interested.

Thanks,

Ken
                




----------------------------------------------------------------------
Date: Fri, 3 Jun 2011 00:00:47 -0600
To: xslt <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
From: Kenneth Reid Beesley <krbeesley(_at_)gmail(_dot_)com>
Subject: replace() and translate() second try
Message-Id: <ACE8B20D-B21E-4551-8852-4EEE0EF398A7(_at_)gmail(_dot_)com>

I see that my previous message got rather garbled.  Here's a simplified
version of the question.
Assume we have an XSLT transform with something like

      translate(string, 'abcd', 'ABCD')

Obviously 'a' gets replaced with 'A', 'b' with 'B', etc.

Should this still work if the 'abcd' is replaced by a string of 4
Unicode _supplementary_ characters?
That is, does translate() work with Characters (including supplementary
characters) or just chars?
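
To make the Character/char distinction concrete, here is a small sketch in
Python (illustration only, not XSLT).  It counts one Deseret letter three
ways: as code points, as 16-bit UTF-16 code units (what a Java char holds),
and as UTF-8 bytes.  The variable names are just for this example.

```python
# One Deseret letter, U+10428, which lies outside the Basic Multilingual Plane.
letter = "\U00010428"

code_points = len(letter)                           # Python strings count code points
utf16_units = len(letter.encode("utf-16-le")) // 2  # 16-bit code units
utf8_bytes = len(letter.encode("utf-8"))

print(code_points, utf16_units, utf8_bytes)  # 1 2 4

# The two UTF-16 code units form a surrogate pair (high, then low).
raw = letter.encode("utf-16-be")
units = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
print([hex(u) for u in units])  # ['0xd801', '0xdc28']
```

A function that paired up 16-bit units instead of code points would see eight
items in a string of four such letters, which is why the question matters.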

Thanks,

Ken

******************************
Kenneth R. Beesley, D.Phil.
P.O. Box 540475
North Salt Lake, UT
84054  USA

------------------------------

Date: Fri, 03 Jun 2011 08:05:08 +0100
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
From: Michael Kay <mike(_at_)saxonica(_dot_)com>
Subject: Re: [xsl] replace() and translate() second try
Message-ID: <4DE887A4(_dot_)8000905(_at_)saxonica(_dot_)com>

On 03/06/2011 07:00, Kenneth Reid Beesley wrote:
I see that my previous message got rather garbled.  Here's a simplified 
version of the question.
Assume we have an XSLT transform with something like

     translate(string, 'abcd', 'ABCD')

Obviously 'a' gets replaced with 'A', 'b' with 'B', etc.

Should this still work if the 'abcd' is replaced by a string of 4 Unicode
_supplementary_ characters?
That is, does translate() work with Characters (including supplementary
characters) or just chars?


Yes, it should work correctly, and I have tests to show that it does, so 
please raise a bug report with a reproducible test case.

The replace() function should also work with all Unicode characters, 
though there may be question marks here about which version of Unicode 
the characters are defined in, especially if you are trying to match 
them against Unicode character categories such as \p{Ll}.
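
As a rough way to see what a category such as Ll means for a supplementary
character, the following Python sketch (illustration only; it says nothing
about which Unicode tables a given XSLT processor uses) looks up one Deseret
letter in the Unicode database bundled with the Python interpreter and prints
the version of that database.

```python
import unicodedata

# U+10428, one of the Deseret letters discussed in this thread.
ch = "\U00010428"

print(unicodedata.name(ch))
print(unicodedata.category(ch))     # Ll (lowercase letter)
print(unicodedata.unidata_version)  # version of the bundled Unicode data
```

The category reported for a character assigned in a newer Unicode version can
differ between tools built against different versions of the database, which
is the kind of question mark mentioned above.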

Michael Kay
Saxonica


------------------------------

Date: Fri, 03 Jun 2011 09:09:40 +0100
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
From: David Carlisle <davidc(_at_)nag(_dot_)co(_dot_)uk>
Cc: Kenneth Reid Beesley <krbeesley(_at_)gmail(_dot_)com>
Subject: Re: [xsl] replace(), translate() and Unicode supplementary characters
Message-ID: <4DE896C4(_dot_)6060309(_at_)nag(_dot_)co(_dot_)uk>

On 03/06/2011 05:00, Kenneth Reid Beesley wrote:
Questions:  Are translate() and replace() supposed to work with Unicode 
supplementary characters?

yes

If so, what am I doing wrong?

hard to say as there is rather a large chance that the input you showed 
has been through several aggressive mail filters.

Chances are it's an encoding error and one way to avoid those is to code 
your stylesheet in ASCII.

translate(
.,
'& #x10428;& #x10429;& #x1042A;& #x1042B;',
'& #x0069;& #x0065;& #x0251;& #x0254;'
)

without the spaces after the &
should work for the translation you cited.
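
For comparison, the same mapping can be modelled outside XSLT.  This is only a
sketch in Python, assuming that translate() pairs the two strings position by
position, one code point at a time; Python's str.translate is a stand-in here,
not the XPath function itself.  The sample string is invented for the example,
and the file stays ASCII-only because every character is written as an escape.

```python
# Second and third arguments of the translate() call above, as Python escapes.
source = "\U00010428\U00010429\U0001042A\U0001042B"
target = "\u0069\u0065\u0251\u0254"

# str.maketrans pairs the characters of the two strings by position.
table = str.maketrans(source, target)

# A made-up input: two Deseret letters, some ASCII text, one more letter.
sample = "\U00010428\U0001042B x \U00010429"
result = sample.translate(table)

print(result == "\u0069\u0254 x \u0065")  # True
```

If a processor produced something else for the equivalent stylesheet, that
would point to the characters not reaching it as intended, for example
because of an encoding mismatch.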

David









--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
