xsl-list
Re: [xsl] using xsl:message with UTF-8 characters

2007-04-23 05:28:56
Michael Kay wrote:
I don't know how good Java is at getting the encoding right, for example
whether it will use a different encoding if you use configuration options
such as "cmd /u" identified by Abel. I'll do some experiments.

Java will choose the default encoding of the underlying system, which on Windows is the codepage set in International and Regional settings. Beyond the ASCII range, that codepage is not compatible with IBM-437 (CP437), the age-old (1981) codepage used for the command window. When the Regional settings are set to US or some Western European country, the codepage defaults to CP1252 (windows-1252), which, as I said, is incompatible with the console codepage; that is what gives the weird characters for anything above U+007F.
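
To make the mismatch concrete, here is a minimal sketch in plain Java (nothing Saxon-specific). It encodes a string as windows-1252 and then decodes the same bytes as IBM437, which is roughly what happens when the JVM writes in the Windows codepage and the console interprets the bytes in its own. The class and method names are made up for illustration, and it assumes the JDK in use ships the IBM437 charset (standard JDK builds normally do). It prints code points rather than the characters themselves, so the result does not depend on the console you run it in.

```java
import java.nio.charset.Charset;

public class CodepageMismatch {

    // Encode the text with one charset, then decode the resulting bytes
    // with another: the same bytes, interpreted two different ways.
    static String reinterpret(String text, Charset written, Charset displayed) {
        return new String(text.getBytes(written), displayed);
    }

    // Render a string as a list of code points, e.g. "U+0063 U+0061".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Charset windows = Charset.forName("windows-1252");
        Charset console = Charset.forName("IBM437"); // assumed to be available

        // What the JVM picked up from the underlying system.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        String original = "caf\u00e9"; // "cafe" with an acute accent on the e
        String shown = reinterpret(original, windows, console);

        System.out.println("written:   " + codePoints(original));
        System.out.println("displayed: " + codePoints(shown));
    }
}
```

The ASCII letters survive unchanged; only the last character, the one above U+007F, comes out as something else.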

It is very awkward that Microsoft never upgraded the default codepage of the DOS console to match the Windows one. You can set your own default in the registry or in some system *.cmd file (I forgot the name), but then again, you can't make it default to whatever is in your Regional Settings.

In Saxon, xsl:message by default uses a Java Writer, whereas "normal" result
documents use a Java OutputStream.

I'd like to argue in favor of defaulting to one particular encoding instead (i.e., UTF-8), because as it stands it is a lottery which codepage the underlying system ends up choosing (and "write once, run anywhere" no longer means "run anywhere and behave the same", which I consider a pity). But such a discussion would be better suited to the Saxon list, I believe.
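
As a rough illustration of what I mean by a fixed encoding, here is a sketch in plain Java. It is not Saxon's code and does not use any Saxon API; the class and method names are hypothetical. The point is only that a Writer built with an explicit charset no longer depends on the platform default, whereas a Writer built without one does.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Utf8MessageWriter {

    // Wrap a byte stream in a Writer that always encodes as UTF-8,
    // regardless of the codepage the underlying system reports.
    static PrintWriter utf8Writer(OutputStream out) {
        // 'true' enables auto-flush on println, handy for diagnostic output.
        return new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) {
        PrintWriter messages = utf8Writer(System.err);
        messages.println("message with non-ASCII text: caf\u00e9");
    }
}
```

Of course the bytes are then always UTF-8, so a console still set to CP437 will not display them correctly; the gain is that the output is the same on every system, and redirecting it to a file gives a predictable result.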
