Re: [xsl] Unicode and child element

2008-08-29 12:15:36
On Fri, Aug 29 2008 16:01:59 +0100, davidc(_at_)nag(_dot_)co(_dot_)uk wrote:
>> So presumably it's okay to use them between consenting pieces of
>> software.
>
> yes but
>
> either that's true, in which case it is OK for Ken to use them in
> XSLT, except that it means his assumption that they never occur in XML
> input is not valid, as other people may consent to use these characters
> as well.

I really meant your own software.  Other people shouldn't send you
anything with those code points.

You are allowed to strip them if you're not expecting to see them.  Your
average XML software isn't going to do that automagically, so you'd have
to set up some sort of filter if you expected that you might see some.
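
As a rough sketch of the sort of filter I mean (Python here purely for illustration; the function names are made up, not from any library), this drops the 66 Unicode noncharacters, U+FDD0..U+FDEF plus the last two code points of each of the 17 planes, from text that has already been decoded:

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacter code points (hypothetical helper)."""
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF for every plane n = 0..16.
    return (0xFDD0 <= cp <= 0xFDEF) or ((cp & 0xFFFE) == 0xFFFE and cp <= 0x10FFFF)


def strip_noncharacters(text: str) -> str:
    """Return text with every noncharacter removed; everything else is kept."""
    return "".join(ch for ch in text if not is_noncharacter(ord(ch)))
```

You would run the input through something like strip_noncharacters() before handing it to the rest of your processing. Private use characters pass through untouched.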

You can then put in other instances of a noncharacter (you'd think that
Unicode had enough hyphens that they could spare one for this word) in
your internal processing, but they should come out before you
interchange the XML with anybody else.

> or it is not true, in which case Ken's assumption that they do not
> appear in input documents is valid, but it means that he can't use them
> in XSLT either.

You could use them in your own XSLT, but you shouldn't be wanting to
interchange them with just anybody.

For you to put them in, say, an RSS feed would not be so fine.

> So either way I don't think they should be used (even though they work)
> and using private use characters is safer (especially if you wander up
> into the higher planes where there will be less legacy usage)

Using either is fraught with difficulty and requires all involved to
know what the code points mean.

If receiving a particular character in your input is going to break your
processing -- irrespective of whether it's a private use character or
noncharacter -- then you'd take precautions in proportion to how badly
it would affect you.  If it would cripple your multi-million dollar
business, then you'd take different precautions than if it was a one-off
XML file for a one-off stylesheet that you were just playing with.
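
At the cheap end of that scale, a check along these lines would at least tell you whether any private use characters turned up in the input. It is only a sketch: find_private_use is a made-up name, and it relies on Python's unicodedata reporting general category "Co" for private use code points.

```python
import unicodedata


def find_private_use(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each private use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" = Other, private use
    ]
```

An empty list means the input is clean. Anything else tells you where to look before deciding whether to reject, strip or remap.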

For standards, the story is a bit different.  According to the Character
Model for the World Wide Web [1]:

 C070 [S] Specifications SHOULD NOT arbitrarily exclude code points from
 the full range of Unicode code points from U+0000 to U+10FFFF
 inclusive.

 C079 [S] Specifications SHOULD NOT allow the use of codepoints reserved
 by Unicode for internal use.

 C038 [S] Specifications MUST NOT require the use of private use area
 characters with particular assignments.

 C039 [S] Specifications MUST NOT require the use of mechanisms for
 defining agreements of private use code points.

 C040 [S] [I] Specifications and implementations SHOULD NOT disallow the
 use of private use code points by private agreement.

C079 could be read as an argument for never using noncharacters, but
what about internal use?

And for your content:

 C073 [C] Publicly interchanged content SHOULD NOT use codepoints in the
 private use area.

So you can send me a stylesheet with private use characters in it, but
you shouldn't put the same stylesheet on your public web site.

Regards,


Tony Graham                         
Tony(_dot_)Graham(_at_)MenteithConsulting(_dot_)com
Director                                  W3C XSL FO SG Invited Expert
Menteith Consulting Ltd
XML, XSL and XSLT consulting, programming and training
Registered Office: 13 Kelly's Bay Beach, Skerries, Co. Dublin, Ireland
Registered in Ireland - No. 428599   http://www.menteithconsulting.com
  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
xmlroff XSL Formatter                               http://xmlroff.org
xslide Emacs mode                  http://www.menteith.com/wiki/xslide
Unicode: A Primer                               urn:isbn:0-7645-4625-2

[1] http://www.w3.org/TR/2005/REC-charmod-20050215
