perl-unicode

Re: Don't use the \C escape in regexes - Why not?

2010-05-04 06:22:13
On 04.05.2010 at 13:06, Michael Ludwig wrote:

Is it this (theoretically fragile) implicitness in handling character strings 
that makes \C a bad idea?

But it is probably not as bad an idea as relying on the default platform 
encoding in Java ("default charset" in Java API doc lingo), which may differ 
from country to country and from installation to installation.

http://java.sun.com/javase/6/docs/api/java/lang/String.html#String%28byte[]%29

Or, more symmetrically to encoding via \C in Perl:

http://java.sun.com/javase/6/docs/api/java/lang/String.html#getBytes%28%29

  public byte[] getBytes()
    Encodes this String into a sequence of bytes
    using the platform's default charset, storing
    the result into a new byte array.

That is a much more serious and real problem than implicitly encoding via \C 
in Perl, given that Java installations do not all use the same platform 
encoding, while all current Perl installations use the same internal 
encoding. (All Java installations use the same internal encoding, UTF-16, I 
think, but this fact is well hidden from the interface.)
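
To make the Java side concrete, here is a minimal sketch of how the result of 
encoding and decoding depends on the charset, and how naming the charset 
explicitly removes the dependency on the platform default. The class name 
CharsetDemo and the helpers encode and decode are made up for illustration; 
the only API calls used are String.getBytes(Charset), the String(byte[], 
Charset) constructor, Charset.forName and Charset.defaultCharset.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class CharsetDemo {

    // Encode with an explicitly named charset instead of the platform default.
    public static byte[] encode(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName));
    }

    // Decode with an explicitly named charset instead of the platform default.
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE

        // Implicit: the output of the next two lines depends on the installation.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("getBytes():      " + Arrays.toString(s.getBytes()));

        // Explicit: the same on every installation.
        // UTF-8 needs two bytes for U+00E9, ISO-8859-1 needs one.
        System.out.println("UTF-8:           " + Arrays.toString(encode(s, "UTF-8")));
        System.out.println("ISO-8859-1:      " + Arrays.toString(encode(s, "ISO-8859-1")));

        // Decoding UTF-8 bytes as ISO-8859-1 silently yields two characters
        // instead of one, which is the failure mode when the defaults differ.
        String garbled = decode(encode(s, "UTF-8"), "ISO-8859-1");
        System.out.println("chars after mismatched decode: " + garbled.length());
    }
}
```

Passing the charset on both sides means that the bytes written on one machine 
are read back identically on another, whatever either platform's default is.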

-- 
Michael.Ludwig (#) XING.com