perl-unicode

Re: Encode::Guess fails on UTF-16BE string w/ newline characters

2003-04-13 06:30:05
Jay,

Thanks for your report. I have to confess UTF-16(BE|LE) as possible suspects.

On Sunday, April 13, 2003, at 02:59  AM, Jay Lawrence wrote:
Points
- what is the best way to open and read data that might be: UTF-8, UTF-16, UTF-16BE, or UTF-16LE?
- is there a good way to chop the line endings reliably for the above 4 sets?
- maybe detecting the flavour of unicode is better left to a different process? Encode::Guess::Unicode?

One possible solution is to detect the presence of \x00 and, when it is found, assume UTF-(16|32)(BE|LE). The ones with a BOM are already supported.
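
A rough sketch of that \x00 idea, not what Encode::Guess actually does: guess_by_nul is a hypothetical helper, and it assumes the text is mostly ASCII/Latin-1, so that the zero bytes land at predictable offsets. Text dominated by other scripts would need a different heuristic.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper (not part of Encode::Guess): guess a BOM-less
# UTF-16/32 flavour from where the \x00 bytes fall.
# Assumption: mostly ASCII/Latin-1 text, so each code unit carries
# zero bytes in its high-order positions.
sub guess_by_nul {
    my ($octets) = @_;
    return if index($octets, "\x00") < 0;    # no NUL byte: nothing to guess

    # Count NUL bytes at each offset modulo 4.
    my @nul = (0, 0, 0, 0);
    my $i = 0;
    for my $byte (unpack 'C*', $octets) {
        $nul[$i % 4]++ if $byte == 0;
        $i++;
    }

    my $even = $nul[0] + $nul[2];
    my $odd  = $nul[1] + $nul[3];

    # UTF-32: three zero bytes before (BE) or after (LE) the low byte.
    return 'UTF-32BE' if $nul[0] && $nul[1] && $nul[2] && !$nul[3];
    return 'UTF-32LE' if $nul[1] && $nul[2] && $nul[3] && !$nul[0];

    # UTF-16: zero byte first (BE) or second (LE) in each 2-byte unit.
    return 'UTF-16BE' if $even && !$odd;
    return 'UTF-16LE' if $odd && !$even;

    return;    # mixed pattern: give up rather than guess wrong
}

# Usage: guess on the whole buffer, decode, and only then split on
# line endings, so that no multi-byte code unit is cut in half.
my $text   = "line one\nline two\n";
my $octets = encode('UTF-16BE', $text);
my $enc    = guess_by_nul($octets);
my @lines  = defined $enc ? split(/\n/, decode($enc, $octets)) : ();
```

Guessing on the complete buffer rather than on a line read with the default record separator matters here, because splitting raw UTF-16 octets at a \x0A byte leaves a dangling half code unit.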

Plz advise - perhaps just documentation expansion is necessary and can help w/ that based on this matter.

I definitely will.

Dan the Encode Maintainer