perl-unicode

Terminology; byte? char? charset? encoding?

2002-03-19 15:17:21
On Wednesday, March 20, 2002, at 06:55 , Anton Tagunov wrote:
Hello, Dan!
Hello, Jarkko!
Hello, Nick!

I'm a bit confused about perl-unicode(_at_)perl(_dot_)org. Is that a better place
for our conversation?  Is it alive? Does it have any traffic?

perl-unicode(_at_)perl(_dot_)org is good for me. Its traffic is moderate. (I ought to subscribe to p5p, and I once did, but its traffic is too heavy for me. I may do so again in the future, but for the time being perl-unicode is the place I use.)

[snip]
   So
     "CES" === "coded character set"
     "CCS"  ne "coded character set"
     "CES"  ne "CCS"

When it comes to character handling, we tend to be rather casual about terminology, but I think that's okay so long as we can tell the difference. I try to be careful not to say 'char' when I mean 'byte', but even I, living in a multibyte world, fail sometimes....
As for CCS and CES, here is my implicit glossary.

byte      = octet, i.e. 8 bits
character = the smallest chunk of data that can be, ahem, is supposed to be,
            handled by text editors
CCS       = character set, often abbreviated as 'charset'
CES       = character encoding, or simply encoding
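
To make the glossary a little more concrete, here is a minimal sketch in Python (picked only for illustration; nothing here is Perl's Encode API). It assumes Unicode as the character set and UTF-8 and UTF-16BE as two encodings of it. The helper name `describe` and the sample string are made up for this example.

```python
def describe(text):
    """Summarize one string from the character, CCS and CES points of view."""
    return {
        # character view: how many characters the string holds
        "characters": len(text),
        # CCS view: the number each character is assigned in the character set
        "code_points": ["U+%04X" % ord(ch) for ch in text],
        # CES view: how many bytes (octets) the same characters occupy
        # once serialized under a particular encoding
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16be_bytes": len(text.encode("utf-16-be")),
    }


# "\u65e5\u672c\u8a9e" is the three-character word for "Japanese language".
info = describe("\u65e5\u672c\u8a9e")
print(info["characters"])     # 3
print(info["code_points"])    # ['U+65E5', 'U+672C', 'U+8A9E']
print(info["utf8_bytes"])     # 9
print(info["utf16be_bytes"])  # 6
```

The character count and the code points stay the same whichever encoding is chosen; only the byte counts change, which is why saying 'char' for 'byte' goes wrong as soon as the text is not ASCII.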

But as you see, even MIME headers are confused here, as in "Content-Type: text/plain; charset=iso-2022-jp", where the 'charset' parameter actually names an encoding. So we have to tweak Perl's motto here: "There is more than one way to say it" :)
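
As a rough illustration of that MIME header, the sketch below (Python again, standard library only) pulls the 'charset' parameter out of a Content-Type value and uses it as an encoding name. The helper name `charset_param` and the sample text are assumptions made for this example.

```python
import codecs
from email.message import Message


def charset_param(content_type):
    """Return the 'charset' parameter of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


label = charset_param("text/plain; charset=iso-2022-jp")
print(label)  # iso-2022-jp

# The label is looked up as an encoding (a CES), not as a single character set.
codec = codecs.lookup(label)

# Mixed ASCII and kanji: the encoder emits an escape sequence to switch
# from ASCII to JIS X 0208 in the middle of the byte stream.
encoded = "abc \u65e5\u672c".encode(codec.name)
print(encoded.startswith(b"abc \x1b$B"))  # True
```

ISO-2022-JP switches among several coded character sets (ASCII and JIS X 0208 among them) by means of escape sequences, so one 'charset' label covers more than one character set in the CCS sense.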


And have some sleep :-)))

My best regards, Anton

I need just that, too. It is seven fifteen JST. I am a nocturnal creature, but it has been a long time since the sun came up....

Dan the Man with Too Many Words to Define