I am trying to decode strings of suspect UTF origin, and Encode::Guess
seems to be the way to go.
So I am opening a file "normally" and reading it line by line, then
passing each line through Encode::Guess, which I have used like this:
use Encode::Guess qw(UTF-8 UTF-16BE); #I may add more in future
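For reference, a minimal sketch of that loop (the in-memory filehandle
stands in for my real file; per the Encode::Guess docs, guess() returns
an encoding object on success and a plain error string on failure or
ambiguity, hence the ref check):

```perl
use strict;
use warnings;
use Encode::Guess qw(UTF-8 UTF-16BE);    # my suspect list; I may add more

# In-memory stand-in for the real file; read raw octets, no decoding layer.
my $bytes = "line one\nline two\n";
open my $fh, '<:raw', \$bytes or die "open: $!";

while ( my $octets = <$fh> ) {
    my $enc = Encode::Guess->guess($octets);
    if ( ref $enc ) {                    # an object means a unique winner
        my $line = $enc->decode($octets);
        chomp $line;
        print "$line\n";
    }
    else {                               # a string is an error/ambiguity message
        warn "guess failed: $enc\n";
    }
}
```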
Now, what I read in is *usually* UTF-8, and all is good. But when a
UTF-16BE string comes along, here is what happens:
Encode/Guess.pm: 92
DB<2> x $octet
0
"\c@B\c@E\c@G\c@I\c@N\c@:\c@V\c@C\c@A\c@R\c@D\c@\cM\c@\cJ"
Encode/Guess.pm: 94
DB<3> x $line
0
"\c@B\c@E\c@G\c@I\c@N\c@:\c@V\c@C\c@A\c@R\c@D\c@"
*NOW*, when it is testing the decode of a UTF-16BE string, the line
will _always_ come up one byte short, so it never registers a
successful decode even though that is what the data really is.
We should have:
0 "\c@B\c@E\c@G\c@I\c@N\c@:\c@V\c@C\c@A\c@R\c@D"
Changing the split to include "\000+" fixes this problem, but it would
break for UTF-16LE, right?
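Right, because the NUL high byte sits on opposite sides of the newline
bytes in the two byte orders. A small demonstration with the core
Encode module (the %v flag to printf just prints the octets in hex):

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "BEGIN:VCARD\r\n";
my $be = encode( 'UTF-16BE', $text );   # NUL *before* each low byte: 00 42 00 45 ...
my $le = encode( 'UTF-16LE', $text );   # NUL *after*  each low byte: 42 00 45 00 ...

printf "BE: %v02x\n", $be;
printf "LE: %v02x\n", $le;

# Splitting the BE octets on a bare "\x0a" strands the newline's 00 high
# byte at the end of the line; in LE the 00 trails the "\x0a" instead, so
# a split that also eats "\000+" on one side breaks the other byte order.
```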
Points:
- what is the best way to open and read data that might be: UTF-8,
UTF-16, UTF-16BE, or UTF-16LE?
- is there a good way to chop the line endings reliably for the above
four encodings?
- maybe detecting the flavour of Unicode is better left to a different
process?
Encode::Guess::Unicode?
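For what it is worth, the direction I am leaning for the first three
points: sniff a BOM from the first raw bytes (deterministic when one is
present), fall back to a default or to Encode::Guess otherwise, then
reopen the handle with the matching :encoding() layer so readline and
end-of-line chopping happen on characters instead of octets. A sketch
under those assumptions (the temp-file part exists only to make it
self-contained; UTF-32 BOMs are omitted for brevity, and CPAN's
File::BOM does this more thoroughly):

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Map a file's leading raw bytes to an encoding name; undef means no BOM.
sub sniff_bom {
    my ($head) = @_;
    return 'UTF-8'    if $head =~ /^\xEF\xBB\xBF/;
    return 'UTF-16BE' if $head =~ /^\xFE\xFF/;
    return 'UTF-16LE' if $head =~ /^\xFF\xFE/;
    return undef;
}

# Self-contained demo input: a BOM-prefixed UTF-16BE file.
my ( $out, $path ) = tempfile();
binmode $out, ':raw';
print {$out} "\xFE\xFF", encode( 'UTF-16BE', "BEGIN:VCARD\r\nEND:VCARD\r\n" );
close $out;

open my $fh, '<:raw', $path or die "open: $!";
read $fh, my $head, 4;
my $name = sniff_bom($head) || 'UTF-8';  # no BOM: assume UTF-8, or ask Encode::Guess
seek $fh, 0, 0 or die "seek: $!";
binmode $fh, ":encoding($name)";         # decode happens in the I/O layer now

while ( my $line = <$fh> ) {
    $line =~ s/^\x{FEFF}//;              # the layer leaves the BOM as a character
    $line =~ s/\r?\n\z//;                # chops CRLF or LF for all four encodings
    print "$line\n";
}
```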
Please advise. Perhaps just a documentation expansion is necessary; I
can help with that based on this matter.
Jay