perl-unicode

Re: use Encode; # on Japanese; LONG!

2002-01-09 20:52:02
jhi,

On 2002.01.10, at 11:18, Jarkko Hietaniemi wrote:
Yes, I've heard of you :-)

  Thanks.

(I'm CCing Nick Ing-Simmons, the original author of Encode, and
SADAHIRO Tomoyuki, who has worked on it a little bit, and who might
also know some Japanese :-)

  Therefore I'm cc'ing this to all recipients.

First off, I'm really thankful that you took a careful look at the
current state of Encode with respect to Japanese encodings.

And my apologies for not moving quickly enough despite being the maintainer of Jcode. Well, I have an excuse. My 1st ($[ = 0, of course :) child was born 18 days ahead of ETA. She was supposed to be born on the 7th of this month but "hello world"ed on the 20th of last month. So my holiday season schedule was in quite a disarray....

I won't (can't) comment much on the Encode details, since I'm pretty
unfamiliar with the design or the implementation; all I've done is to add
some (eight-bit) encodings many moons ago.  I'm hoping Nick and Sadahiro
will join in and comment.

The surprising thing is that, however broken, it functioned somewhat. Not bad for a character set you have virtually no idea about. It's as miraculous as assembling the Machine out of the blueprint sent over the stars (Read/Seen 'Contact' by the late Carl Sagan?).

How about "not at all"? :-)

How do you say that in Finnish? In Japanese it would be "Zenzen Wakarimasen".

don't have to; I don't grok Finnish either :).  It takes more than a
simple table lookup to handle Japanese well enough to make native
grokkers happy.  It has to automatically detect which of many charsets
are used, it has to be robust, and most of all, it must be documented in
Japanese :)  I can do all that.

Excellent.

Or is it? As a matter of fact, the Jcode POD contains no Japanese since the pod parser groks no Japanese. It just has a web page in both languages and a mailing list, however....

   If I submit Encode::Japanese, are you going to merge it as a standard
module?

Definitely, yes.  Implementation-wise you'll have to discuss with Nick
since whatever we use should work with the Tcl/Tk scheme (hence the
name Encode::Tcl, as you no doubt guessed.)  Sadahiro can comment on
both Encode and Japanese.

  I'm honored to be a gene donor of the beast!

Dan the Man with Too Many Charsets to Deal With

Sounds good :-)

One nit, though: the sooner you can start *and* finish the task,
the better.  For delivery dates, I would prefer "yesterday"... Why?
I want to release a 5.7.3 really, REALLY soon now, so that module
authors and users can test their stuff against it, so that 5.8.0 can
be released in a few months.  So I hope you haven't got any previous
commitments, like a day job or a family :-)

Okay, I'll move as quickly as possible, but if worse comes to worst I can still upload it to CPAN (I just want to make sure the name space remains untouched). If I just coded a bridging module to Jcode, that would be just a few hours away, but I wouldn't want to do that knowing I can implement it much more simply and elegantly. I also believe the same scheme can be applied to other CJKV languages/charsets. But once again I need some help to get that far. I know some Chinese (perhaps enough to debug the code; I can at least tell whether a certain string is a sentence or line noise :) but I know little Korean and absolutely no Vietnamese...
  Well, enough mumbling done.  Back to coding....

Dan the Man with Too Many Breeds of Camels (that is, too many versions of Camels to babysit; I still have a customer that sticks with perl4, y'know).