RE: encoding of text files

2002-11-14 04:31:56
Oh yes. Forgot to mention byte-order-marks. Wasn't sure if they applied
to UTF-8 or just UTF-16!

Rgds,

Dan.

-- 
Danny Yates
Technical Architect
Abbey National Treasury Services
E-mail: Danny(_dot_)Yates(_at_)ants(_dot_)co(_dot_)uk
Phone: +44 20 7756 5012
Fax: +44 20 7612 4342


-----Original Message-----
From: Julian Reschke [mailto:julian(_dot_)reschke(_at_)gmx(_dot_)de]
Sent: 14 November 2002 11:07
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: RE: [xsl] encoding of text files


Just a thought: it may make sense to prefix the text file with a UTF-8 BOM
(as far as I remember, at least Notepad on Windows honors this).
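
A rough illustration of the BOM suggestion, in Python for brevity. The file name, sample text and the helper `write_with_bom` are made up for this sketch; the `utf-8-sig` codec writes the three-byte UTF-8 BOM (EF BB BF) ahead of the text:

```python
import codecs
import os
import tempfile

def write_with_bom(path, text):
    """Write text as UTF-8, prefixed with the UTF-8 byte order mark."""
    # "utf-8-sig" emits the BOM once, at the start of the file.
    # newline="" stops Python from translating "\n" on Windows.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)

# Hypothetical file name and content, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_with_bom(path, "M\u00e4dchen\n")

with open(path, "rb") as f:
    raw = f.read()

# The file starts with the BOM, followed by the UTF-8 encoded text.
print(raw.startswith(codecs.BOM_UTF8))
```

Whether a given viewer actually honours the BOM still has to be checked per program.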

--
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760

-----Original Message-----
From: owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
[mailto:owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com] On Behalf Of Yates, Danny (ANTS)
Sent: Thursday, November 14, 2002 11:49 AM
To: 'xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com'
Subject: RE: [xsl] encoding of text files


Hi Joerg,

If you are outputting UTF-8 then your a-umlaut will be written as
a two-byte sequence. If your output is serialised XML or HTML then
this is fine, as the XML declaration or an HTML meta element can
declare that the content is UTF-8 encoded. If, however, you are
writing a plain text file (as you say you are), there is no
reliable way for the process which reads it in to determine whether
it is UTF-8, ASCII, iso-8859-1 or whatever.
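
A small sketch of the two-byte point, in Python purely for illustration (the variable names are arbitrary). It encodes a-umlaut as UTF-8 and also shows that plain ASCII cannot represent the character at all:

```python
A_UMLAUT = "\u00e4"  # a-umlaut, U+00E4

# UTF-8 needs two bytes for this character.
utf8_bytes = A_UMLAUT.encode("utf-8")
print(len(utf8_bytes), utf8_bytes.hex())  # 2 c3a4

# ASCII has no representation for it.
try:
    A_UMLAUT.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```

Nothing in those two bytes says "this is UTF-8"; a reader has to be told, or has to guess.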

The first string you give would appear to indicate that there are,
as expected, two bytes in the output stream where you expect your
a-umlaut character to appear, and the program you are using to
view this file doesn't understand this.
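
To see the effect described above, here is a minimal Python sketch. The string Joerg actually saw is not quoted in this message, so the sample word is an assumption; the mechanism is a viewer decoding UTF-8 bytes as if they were iso-8859-1:

```python
# Sample text (assumed, not from the original post), encoded as UTF-8.
utf8_bytes = "M\u00e4dchen".encode("utf-8")

# A viewer that assumes iso-8859-1 maps every byte to one character,
# so the two-byte a-umlaut shows up as two separate characters.
misread = utf8_bytes.decode("iso-8859-1")
print(len("M\u00e4dchen"), len(misread))  # 7 8

# Decoding with the encoding that was actually used recovers the text.
correct = utf8_bytes.decode("utf-8")
print(correct == "M\u00e4dchen")  # True
```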

When you ask XSLT to output using iso-8859-1, it knows that in this
encoding there is a single-byte representation of a-umlaut, so it
uses that, and it is correctly interpreted by your viewing program.
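
For comparison, a short Python sketch of the iso-8859-1 case (names are illustrative only). The character becomes the single byte E4, and that lone byte is not valid UTF-8, which is why the two encodings cannot be swapped freely:

```python
# In iso-8859-1 the a-umlaut is the single byte 0xE4.
latin1_bytes = "\u00e4".encode("iso-8859-1")
print(len(latin1_bytes), latin1_bytes.hex())  # 1 e4

# The same byte on its own is not a complete UTF-8 sequence.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False
```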

So, if you must write out UTF-8 (and it's quite possible that you
may be able to survive with iso-8859-1 if you're just using a few
simple accented characters, such as those found in French and
German), then you need to tell your viewing program that the byte
stream you are feeding it is a UTF-8 encoded character stream.
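
One way to check whether iso-8859-1 is enough for a given text is simply to try the encoding. The sketch below is in Python, and the helpers `fits_latin1` and `choose_encoding` are hypothetical names invented for this example; the fallback to a BOM-prefixed UTF-8 is an assumption borrowed from the BOM suggestion elsewhere in this thread:

```python
def fits_latin1(text):
    """Return True if every character in text exists in iso-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

def choose_encoding(text):
    """Pick iso-8859-1 when it suffices, else UTF-8 with a BOM."""
    return "iso-8859-1" if fits_latin1(text) else "utf-8-sig"

# Common German and French accented letters fit.
print(choose_encoding("M\u00e4dchen, caf\u00e9"))  # iso-8859-1

# The euro sign (U+20AC) and the oe ligature (U+0153) do not.
print(choose_encoding("10 \u20ac"))                # utf-8-sig
print(fits_latin1("c\u0153ur"))                    # False
```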

Regards,

Dan.


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list



