David,
Thank you for the excellent reply. See my comments below.
On May 29, 2006, at 11:53 PM, David Carlisle wrote:
Currently my sanitizing function just escapes <, >, ', and " in the
If you are taking in a string and want to ensure that it is encoded in
XML as itself (in character data) rather than markup then you need
to escape < and & (and > if it follows ]]). You don't need to escape " or
' unless you are putting the string in attribute values.
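As a rough illustration of the rule described above, here is a minimal Python sketch. The function names escape_text and escape_attr are hypothetical (they are not from the original sanitizing function), and the sketch assumes the input is already a decoded Unicode string.

```python
import xml.etree.ElementTree as ET


def escape_text(s: str) -> str:
    """Escape a string so it appears as itself in XML character data."""
    # '&' must go first so the entities added below are not escaped again.
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    # '>' only needs escaping where it would complete the sequence ']]>'.
    s = s.replace("]]>", "]]&gt;")
    return s


def escape_attr(s: str) -> str:
    """Escape a string for use inside a quoted XML attribute value."""
    s = s.replace("&", "&amp;").replace("<", "&lt;")
    # Quotes matter only here, where they could end the attribute early.
    s = s.replace('"', "&quot;").replace("'", "&apos;")
    return s


sample = "a < b & c ]]> d"
escaped = escape_text(sample)
# Round trip: an XML parser should hand back exactly the original string.
round_trip = ET.fromstring("<r>" + escaped + "</r>").text
```

The round trip at the end is the property that matters: whatever the user typed comes back from the parser as plain text, never as markup.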
Excellent clarification. In some instances users are allowed to insert
XHTML and for those instances I'm running standard HTML input
sanitization routines (encoding potentially dangerous elements as
entities, for example).
Are these characters recognized by the XSLT engine
if they are hex or unicode encoded?
All XML text is Unicode encoded in one way or another, so it's not quite
clear what you mean there. Encoding issues are resolved by the XML
parser before XSLT really sees the input. If you are taking unknown text
you should be escaping & as &amp; so that a character ref such as &#a0;
would be escaped to &amp;#a0;.
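A small Python sketch of that point, using the standard library parser in place of whatever parser feeds the XSLT engine. The helper name parse_text and the sample input are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def parse_text(fragment: str) -> str:
    """Wrap a fragment in an element, parse it, and return the text seen downstream."""
    return ET.fromstring("<comment>" + fragment + "</comment>").text


# Hypothetical user input that tries to smuggle '<' and '>' in as character references.
untrusted = "&#x3C;script&#x3E;"

# Inserted as-is, the parser resolves the references before any stylesheet runs.
seen_unescaped = parse_text(untrusted)

# With '&' escaped first, the references are inert and survive as literal text.
escaped = untrusted.replace("&", "&amp;").replace("<", "&lt;")
seen_escaped = parse_text(escaped)
```

In the first case the stylesheet would see the characters `<script>` as text; in the second it sees the literal string the user typed, ampersands and all.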
It's not clear what I mean because the whole Unicode/UTF-n topic is unclear
to me, in spite of how much I read about it, but I understand what
you're saying and you seem to have understood where I'm coming from.
The bottom line is I want to avoid the kinds of attacks that are common
in URLs, where the less-than and greater-than symbols of a SCRIPT
element can be URL encoded and in some browsers/servers, go undetected.
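One way to picture the concern is the sketch below, which assumes the value reaches the sanitizer still percent-encoded (many frameworks decode it already, in which case decoding a second time would be wrong). The function name decode_then_escape is hypothetical.

```python
from urllib.parse import unquote


def decode_then_escape(raw: str) -> str:
    """Percent-decode a value first, then escape it for XML character data."""
    # Decode before checking, so '%3C' cannot slip past a test for '<'.
    decoded = unquote(raw)
    # Escape '&' first, then the angle brackets.
    return (decoded.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;"))


# Hypothetical request parameter with URL-encoded angle brackets.
raw_value = "%3Cscript%3Ealert(1)%3C/script%3E"
safe_value = decode_then_escape(raw_value)
```

The ordering is the design choice here: escaping the raw value and decoding afterwards would leave the angle brackets intact.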
but I was wondering if anyone knows of other vectors by which
attackers can enter
attacks are as likely to come from what is inserted into XML character
data as from any XML markup that is inserted. Specifically if the
stylesheets are generating html then if there is a danger of script
being inserted you need to quote (or disable) possible script syntax.
Yes. These situations are handled with standard HTML sanitizing
routines prior to insertion, but it did make me wonder what other doors
I might be leaving open by providing users with completely valid XHTML on
the output. This article, in particular, opened my eyes to what is
possible with JavaScript. Now that more and more browsers are shipping
with XSLT processors built in (or could ship that way), it opens the
door for client-side processing with somewhat unpredictable results,
doesn't it?
http://www.webappsec.org/projects/articles/071105.shtml
Thanks again for your concise reply!
Ted Stresen-Reuter