perl-unicode

Re: byte order mark

1999-10-06 12:34:20

    Larry> The alternative seems to be to make sure that Perl removes every
    Larry> byte-order mark from its input.  But if every string or file has to
    Larry> be treated as a special case, why didn't we just send the information
    Larry> out-of-band in the first place?  Where does it stop?  There's just
    Larry> an awful lot of type information that can be associated with
    Larry> strings--just look at MIME.

My opinion is that BOMs should be stamped out and that de facto practice
should be strictly UTF-8 for interchange, and UTF-8 or UTF-16 internally
(where endianness doesn't matter).  Remove those suckers.  Otherwise life just
gets more complicated.
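
To make "remove those suckers" concrete, here is a minimal sketch in Python
(not Perl, and not anything Perl does today).  The helper name
decode_without_bom and the choice to handle only UTF-8 and UTF-16 marks are my
own assumptions for illustration; input without a mark is assumed to be UTF-8.

```python
import codecs

# Hypothetical helper (the name is an assumption for illustration): use a
# leading BOM only to pick the decoder, then drop it so U+FEFF never reaches
# the rest of the program.
def decode_without_bom(data: bytes, default: str = "utf-8") -> str:
    known_boms = [
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, encoding in known_boms:
        if data.startswith(bom):
            # Strip the mark before decoding; the explicit -le/-be codecs
            # would otherwise keep it as a U+FEFF character.
            return data[len(bom):].decode(encoding)
    # No mark present: fall back to the assumed interchange encoding.
    return data.decode(default)
```

Every reader of every file would need something like this at its front door,
which is exactly the special-casing Larry is objecting to.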

    : Having Perl come up with a good mixed-text-encoding solution and lead
    : the way would encourage similar support from other tools (e.g. BASH on
    : Linux).

You obviously are not aware of ISO 2022.  It allows mixing of encodings in one
file.  As a matter of fact, if you build the latest Emacs properly, you can
get some experience with it.  Having worked constantly with ISO 2022 encoded
text in many different ways over the last 10 years, my opinion is this: way,
way too complicated.

The Mozilla folks have acquired/developed automatic encoding recognizers that
work *most* of the time, but can still make mistakes.  Even they avoid the
problem of mixing encodings in one file (with a small handful of well-known
exceptions).

    : That is, a tool that used U+2029 to separate paragraphs would treat
    : U+2028 as "soft", free to rearrange as needed.

Assuming U+2028 is a "soft" line break that can migrate and U+2029 is the
"hard" line break (the convention of many commercial systems), U+2029 is the
only choice for line separators when writing code.  For plain text, I
personally prefer viewing them as being the same except in GUI environments
where the spacing provided by U+2029 can be set by the user.
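
A rough sketch of the hard/soft convention described above, in Python.  The
function name split_paragraphs and the decision to turn soft breaks into
single spaces are assumptions of mine, not something any standard mandates.

```python
LINE_SEPARATOR = "\u2028"       # "soft" break: free to migrate on reflow
PARAGRAPH_SEPARATOR = "\u2029"  # "hard" break: always ends a paragraph

# Hypothetical helper: split on the hard separator only, and treat each soft
# separator as ordinary inter-word space that a formatter may re-place.
def split_paragraphs(text: str) -> list[str]:
    paragraphs = []
    for chunk in text.split(PARAGRAPH_SEPARATOR):
        words = chunk.replace(LINE_SEPARATOR, " ").split()
        paragraphs.append(" ".join(words))
    return paragraphs
```

For example, split_paragraphs("one\u2028two\u2029three") gives two
paragraphs, "one two" and "three"; the soft break inside the first paragraph
is gone and can be reinserted wherever the display width calls for it.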
-----------------------------------------------------------------------------
Mark Leisher
Computing Research Lab            The first virtue is to restrain the tongue;
New Mexico State University       he approaches nearest to the gods who knows
Box 30001, Dept. 3CRL             how to be silent, even though he is in the
Las Cruces, NM  88003             right.    -- Cato the Younger (95-46 B.C.E)
