On 2002.03.10, at 23:37, Nick Ing-Simmons wrote:
Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> writes:
We have to keep binmode for a while as that is what Camel-III describes.
But we could make it an alias or wrapper on something better if we
can agree what would be better.
IMHO we should also provide an OO interface to layers, so we can write something like
use FileHandle;
STDIN->encoding("iso-2022-jp");
or something like that. This is definitely more intuitive than
binmode(). Or does such an interface already exist? I can find no
mention of IO disciplines in the IO::Handle or FileHandle POD.
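For comparison, the PerlIO layer syntax (as it later shipped in Perl 5.8) spells this as binmode(STDIN, ':encoding(iso-2022-jp)'). A minimal sketch of the proposed OO spelling, assuming it is nothing more than a thin wrapper over that binmode() call. The method name `encoding` is hypothetical; it exists in neither IO::Handle nor FileHandle:

```perl
use strict;
use warnings;
use IO::File;    # loads IO::Handle so plain filehandles accept method calls

# Hypothetical method: not part of IO::Handle or FileHandle.
# It only wraps the layer-based binmode() call and returns the handle.
sub IO::Handle::encoding {
    my ($fh, $enc) = @_;
    binmode($fh, ":encoding($enc)")
        or die "cannot push :encoding($enc) layer: $!";
    return $fh;
}
```

With this loaded, STDIN->encoding('iso-2022-jp') pushes the same layer as the binmode() call, so binmode() can stay the documented primitive (as Camel-III requires) while the method is just sugar on top.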
Another problem I would like to raise is that this "encoding on the IO
layer" approach works only when you know which encoding to use a
priori. In real life that is rather rare: in most cases you have no
idea which encoding is appropriate until you peek at the contents.
Suppose you want to write a program that mirrors web content, but
this time you also want to convert every text to UTF-8 so you can build a
multilingual index of it. In this case you won't know which encoding
to use until you read the Content-Type: header.
Okay, so you decide to make your program read ASCII until the header
ends and then use binmode() to switch the encoding according to the header.
(I have yet to test whether this is possible, but the idea comes naturally.)
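A minimal sketch of that header-then-switch idea, assuming PerlIO lets you push an :encoding layer mid-stream on top of a :raw handle, so that bytes not yet read are decoded by the new layer. The helper name `open_body_with_charset` and its fallback-charset argument are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: read the header as raw bytes, pick the charset from
# Content-Type (or fall back to $default), then push an :encoding layer so
# that everything after the blank line is read as decoded characters.
sub open_body_with_charset {
    my ($fh, $default) = @_;
    binmode($fh, ':raw');
    my $charset;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';    # blank line ends the header
        if ($line =~ /^Content-Type:.*?\bcharset="?([\w.:-]+)"?/i) {
            $charset = $1;
        }
    }
    $charset //= $default;
    binmode($fh, ":encoding($charset)")
        or die "cannot switch to charset '$charset'";
    return $charset;
}
```

It returns the charset it chose, and every later <$fh> yields decoded characters. It cannot help when the response declares no charset at all.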
But even this will fail on many older web sites in Japan that have been
in service since the HTTP/0.9 days and declare no charset at all. In that
case you have to resort to code guessing.
My humble Jcode implements code guessing, and it works fairly well
except for ambiguous cases between EUC-JP and Shift_JIS. But it works
only because Jcode hubristically assumes that the string in question is
in some Japanese encoding.
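To illustrate why that assumption matters, here is a minimal trial-decoding sketch. It is not Jcode's actual algorithm: it simply keeps every Japanese encoding (using Encode's names) under which the bytes decode without error, so an EUC-JP vs Shift_JIS ambiguity shows up as more than one survivor. `guess_japanese` is a hypothetical name:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: list the Japanese encodings that could have produced
# $bytes. It assumes the input is Japanese, as Jcode does.
sub guess_japanese {
    my ($bytes) = @_;
    # ISO-2022-JP is 7-bit and announces kanji with escape sequences.
    return ('iso-2022-jp') if $bytes =~ /\e\$[\@B]/;
    # 7-bit text without those escapes is plain ASCII.
    return ('ascii') if $bytes !~ /[\x80-\xFF]/;
    # Otherwise keep every 8-bit candidate that decodes cleanly.
    my @ok;
    for my $enc (qw(euc-jp shiftjis)) {
        my $valid = eval {
            Encode::decode($enc, $bytes,
                Encode::FB_CROAK() | Encode::LEAVE_SRC());
            1;
        };
        push @ok, $enc if $valid;
    }
    return @ok;
}
```

A single survivor is a confident guess; two survivors is exactly the EUC-JP/Shift_JIS case where some further heuristic (or a hint) is needed.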
My experience with Jcode tells me that code guessing is possible as long
as you have some idea of which language is used. Japanese is among the
hardest cases, so it should work for other languages too. But once again,
how are you going to tell which language your string is written in? Some
sort of hinting is imperative, and since we are dealing with *external*
text here, the locale is useless.
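One way a language hint could drive guessing, sketched under stated assumptions: the caller supplies a language tag, a hypothetical table maps it to candidate encodings, and the first candidate that decodes cleanly wins. The table contents and `decode_with_hint` are illustrative, not an existing API:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical hint table: candidate encodings per language, most likely
# first. 7-bit ISO-2022 variants are left out because any ASCII text
# would also decode under them.
my %CANDIDATES = (
    ja => [qw(euc-jp shiftjis utf-8)],
    ko => [qw(euc-kr utf-8)],
    zh => [qw(euc-cn big5-eten utf-8)],
);

# Decode $bytes with the first candidate for $lang that decodes cleanly.
# Returns (encoding, decoded string), or an empty list if nothing fits.
sub decode_with_hint {
    my ($bytes, $lang) = @_;
    my $list = $CANDIDATES{$lang} or die "no hint table for '$lang'";
    for my $enc (@$list) {
        my $text = eval {
            Encode::decode($enc, $bytes,
                Encode::FB_CROAK() | Encode::LEAVE_SRC());
        };
        return ($enc, $text) if defined $text;
    }
    return;
}
```

The list order encodes the hint's priorities: for `ja`, bytes that happen to be valid in both EUC-JP and Shift_JIS come back as EUC-JP, the same ambiguity Jcode faces.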
We still have more than 3 weeks before April Fools' Day, but how about
this?
study SCALAR LANGUAGE
Takes extra time to study SCALAR ("$_" if unspecified)
in anticipation of doing many pattern matches and charset
conversions on the string before it is next modified.
This recycled reserved word makes me puke, but at least it is more
intuitive than binmode :).
Dan the Man with too Many Encodings to Deal with