perl-unicode

Re: Use of UTF-8 under Perl and Unix

1999-11-02 16:43:58
Then why use the very restricted Unicode BOMs, which can signal only the
various Unicode encodings and nothing else? ISO 2022 provides ESC
sequences that you can place at the start of a file to signal EVERY
encoding in the ECMA registry. Several hundred different ASCII
extensions have registered ISO 2022 codes to announce them. If you want
a stateful encoding with all its ugliness, then better say so by
admitting that what you really want is ISO 2022. ISO 2022 is in no way
worse than BOMs. It has exactly the same problems.
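To make the point about BOMs concrete: the only marks a reader can look for are the handful of Unicode byte-order marks, so a sniffer can report at most five Unicode encoding forms and has nothing to say about Latin-1, KOI8-R or any other ASCII extension. The following is a minimal Python sketch (the `sniff_bom` name is hypothetical) built on the BOM constants in Python's standard `codecs` module. It checks the 4-byte UTF-32 marks first, because the UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).

```python
import codecs

# Every byte-order mark a reader can test for; all of them are Unicode.
# Longest marks come first so FF FE 00 00 is not read as UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # Any non-Unicode encoding (Latin-1, KOI8-R, ...) ends up here.
    return None, 0
```

For example, `sniff_bom(b"\xef\xbb\xbfhi")` gives `("utf-8", 3)`, while a Latin-1 file yields `(None, 0)` because there is no mark to find.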

I don't know ISO 2022.  The term "ESC sequences" worries me.  Does this mean
it is not a single Unicode character, but a sequence of Unicode characters?
How many programs would interpret this as being part of the actual text
instead of ignoring it?  That would be bad.  You would in fact have created a
new file type, which causes more trouble than it solves.  Hopefully I'm wrong
here.

ISO 2022 is the ISO standard for designating character-set information
in a byte stream.  Use of ISO 2022 would allow a file to contain not
just UTF-8 but any other internationally registered character set.
ISO 2022 is what is used by ANSI X3.64-based terminals such as the DEC
VT line, SCO ANSI, Linux Console, ... to control character-set
display.

When using ISO 2022, a UTF-8 byte stream would be prefixed with

  <ESC> % G

To return to ISO 2022 mode, the byte stream would be followed by

  <ESC> % @
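The escape sequences are plain bytes rather than Unicode characters: ESC % G is the three bytes 0x1B 0x25 0x47, and ESC % @ is 0x1B 0x25 0x40. The following is a minimal Python sketch of wrapping UTF-8 text in them and splitting the UTF-8 segment back out. It assumes everything between the two sequences is UTF-8; the helper names `wrap_utf8` and `split_utf8_segment` are hypothetical, and a real ISO 2022 decoder would also have to interpret whatever designations follow ESC % @, which this sketch simply returns as raw bytes.

```python
from typing import Tuple

ESC = b"\x1b"
ENTER_UTF8 = ESC + b"%G"   # ESC % G: switch from ISO 2022 into UTF-8
RETURN_2022 = ESC + b"%@"  # ESC % @: return to ISO 2022


def wrap_utf8(text: str) -> bytes:
    """Encode text as UTF-8 and bracket it with ESC % G ... ESC % @."""
    return ENTER_UTF8 + text.encode("utf-8") + RETURN_2022


def split_utf8_segment(data: bytes) -> Tuple[str, bytes]:
    """Decode the UTF-8 segment after ESC % G; return it together with the
    bytes following ESC % @ (still ISO 2022 data, left undecoded here)."""
    if not data.startswith(ENTER_UTF8):
        raise ValueError("stream does not begin with ESC % G")
    body, _sep, rest = data[len(ENTER_UTF8):].partition(RETURN_2022)
    return body.decode("utf-8"), rest
```

A program that knows nothing about ISO 2022 would see the first bytes of `wrap_utf8("café")` as an ESC control character followed by the visible text `%G`, which is the risk of the sequence leaking into the text raised earlier in the thread.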


    Jeffrey Altman * Sr. Software Designer * Kermit-95 for Win32 and OS/2
                 The Kermit Project * Columbia University
              612 West 115th St #716 * New York, NY * 10025
  http://www.kermit-project.org/k95.html * kermit-support(_at_)kermit-project(_dot_)org

