On Fri, Aug 29 2008 16:01:59 +0100, davidc(_at_)nag(_dot_)co(_dot_)uk wrote:
>> So presumably it's okay to use them between consenting pieces of
>> software.
> yes but
> either that's true, in which case it is OK for Ken to use them in
> XSLT, except that it means his assumption that they never occur in XML
> input is not valid, as other people may consent to use these characters
> as well.
I really meant your own software. Other people shouldn't send you
anything with those code points.
You are allowed to strip them if you're not expecting to see them. Your
average XML software isn't going to do that automagically, so you'd have
to set up some sort of filter if you expected that you might see some.
You can then put in other instances of a noncharacter (you'd think that
Unicode had enough hyphens that they could spare one for this word) in
your internal processing, but they should come out before you
interchange the XML with anybody else.
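A minimal sketch of such a filter, in Python rather than XSLT, assuming the input has already been decoded to a string. The name strip_noncharacters is hypothetical; the ranges are the 66 noncharacters that Unicode defines (U+FDD0 to U+FDEF, plus the last two code points of each of the 17 planes).

```python
import re

# The 66 Unicode noncharacters: U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF
# for each of the 17 planes (n = 0x0 .. 0x10).
_NONCHARACTERS = re.compile(
    "[\uFDD0-\uFDEF"
    + "".join(
        chr((plane << 16) | 0xFFFE) + chr((plane << 16) | 0xFFFF)
        for plane in range(17)
    )
    + "]"
)


def strip_noncharacters(text: str) -> str:
    """Return text with every noncharacter code point removed.

    Hypothetical helper: run it on incoming text before your own
    processing, and again on output before interchange.
    """
    return _NONCHARACTERS.sub("", text)
```

The same function would serve at both ends: once on input, so that anything you didn't expect is gone before you add your own markers, and once on output, so that your markers don't escape.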
> or it is not true, in which case Ken's assumption that they do not
> appear in input documents is valid, but it means that he can't use them
> in XSLT either.
You could use them in your own XSLT, but you shouldn't be wanting to
interchange them with just anybody.
For you to put them in, say, an RSS feed would not be so fine.
> So either way I don't think they should be used (even though they work)
> and using private use characters is safer (especially if you wander up
> into the higher planes where there will be less legacy usage)
Using either is fraught with difficulty and requires all involved to
know what the code points mean.
If receiving a particular character in your input is going to break your
processing -- irrespective of whether it's a private use character or
noncharacter -- then you'd take precautions in proportion to how badly
it would affect you. If it would cripple your multi-million dollar
business, then you'd take different precautions than if it was a one-off
XML file for a one-off stylesheet that you were just playing with.
For standards, the story is a bit different. According to the Character
Model for the World Wide Web [1]:
C070 [S] Specifications SHOULD NOT arbitrarily exclude code points from
the full range of Unicode code points from U+0000 to U+10FFFF
inclusive.
C079 [S] Specifications SHOULD NOT allow the use of codepoints reserved
by Unicode for internal use.
C038 [S] Specifications MUST NOT require the use of private use area
characters with particular assignments.
C039 [S] Specifications MUST NOT require the use of mechanisms for
defining agreements of private use code points.
C040 [S] [I] Specifications and implementations SHOULD NOT disallow the
use of private use code points by private agreement.
C079 could be an argument for never using noncharacters, but what about
for internal use?
And for your content:
C073 [C] Publicly interchanged content SHOULD NOT use codepoints in the
private use area.
So you can send me a stylesheet with private use characters in it, but
you shouldn't put the same stylesheet on your public web site.
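As a rough illustration of checking content against C073 before publishing it, here is a small Python sketch that lists the private use code points found in a string. The function names are hypothetical; the ranges are the three private use areas Unicode defines (U+E000 to U+F8FF, U+F0000 to U+FFFFD, and U+100000 to U+10FFFD).

```python
def is_private_use(ch: str) -> bool:
    """True if ch is in the BMP Private Use Area or in the private use
    ranges of planes 15 and 16."""
    cp = ord(ch)
    return (
        0xE000 <= cp <= 0xF8FF
        or 0xF0000 <= cp <= 0xFFFFD
        or 0x100000 <= cp <= 0x10FFFD
    )


def private_use_code_points(text: str) -> list[str]:
    """Return the distinct private use code points in text, in code point
    order, formatted as U+XXXX (hypothetical pre-publication check)."""
    found = sorted({ord(ch) for ch in text if is_private_use(ch)})
    return [f"U+{cp:04X}" for cp in found]
```

An empty list means there is nothing for C073 to object to; a non-empty one tells you which code points to replace before the file goes anywhere public.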
Regards,
Tony Graham
Tony(_dot_)Graham(_at_)MenteithConsulting(_dot_)com
Director W3C XSL FO SG Invited Expert
Menteith Consulting Ltd
XML, XSL and XSLT consulting, programming and training
Registered Office: 13 Kelly's Bay Beach, Skerries, Co. Dublin, Ireland
Registered in Ireland - No. 428599 http://www.menteithconsulting.com
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
xmlroff XSL Formatter http://xmlroff.org
xslide Emacs mode http://www.menteith.com/wiki/xslide
Unicode: A Primer urn:isbn:0-7645-4625-2
[1] http://www.w3.org/TR/2005/REC-charmod-20050215