perl-unicode

Re: Correct use of UTF-8 under Unix

1999-11-04 10:34:38
Karlsson Kent - keka writes:
:       So, the interoperable line (or 'stronger') separators in
: "plain text" are:
: 
:       \X{2028}|\X{2029}|\r\n|\n|\r|\f|\v|\X{85}

This is slightly wrong if you broaden the picture to more than Unix.
You shouldn't really use \r\n to mean \015\012 because \n is (according
to K&R) a logical newline, not \012.  On a Mac, the "physical" meanings
of \r and \n are reversed, but the logical meanings are the same.  That
is, \n is newline, but it happens to be represented with \015.  So when
a Mac looks at a Windows newline, it sees \n\r, not \r\n.
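
The logical/physical distinction above can be sketched in a few lines.
This is an illustration in Python rather than Perl, and the two tables
and the spell() helper are hypothetical names invented for the example;
the octet values are the ones given in the paragraph above.

```python
# Hypothetical tables: which octet each logical escape denotes on a platform.
# Per the description above, classic Mac OS swaps the two values.
UNIX_LIKE = {"\\r": 0o015, "\\n": 0o012}
CLASSIC_MAC = {"\\r": 0o012, "\\n": 0o015}


def spell(octets: bytes, table: dict) -> str:
    """Spell a byte sequence using one platform's logical escapes."""
    # Invert the table so each octet maps back to its escape name.
    names = {octet: escape for escape, octet in table.items()}
    # Octets with no escape name are shown as three-digit octal.
    return "".join(names.get(b, "\\%03o" % b) for b in octets)


windows_newline = bytes([0o015, 0o012])  # the physical pair \015\012

print(spell(windows_newline, UNIX_LIKE))    # \r\n
print(spell(windows_newline, CLASSIC_MAC))  # \n\r
```

The same two octets get opposite spellings, which is why writing
\015\012 is the safer way to name the Windows sequence.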

: (I'm probably mixing Perl and C (and flex) syntax here.) Some
: of them are "stronger" in some senses than line separation,
: but for the purposes of counting logical lines, and deciding
: logical line begin and logical line end, there should be no
: difference.  A single logical line may be *dynamically* wrapped 
: into several displayed lines, but that is a different matter.
: 
:       Note that there are some "legacy" encodings which do not
: have any or all of \f|\v|\X{85}.

As I mentioned earlier, Perl doesn't count \f as a new line for line
counting purposes.  This seems to be how editors treat form feeds (or
don't treat them, depending on how you look at it).  It's also
consistent with what wc thinks.

The other interesting thing is that we *removed* support for \v from
Perl some time ago, since nobody we were acquainted with had any idea
what it really meant, or if anyone actually used it for anything.
There have been no complaints.  Paint \v dead.

As for \X{85}, I've never heard of it.  But then, I'm not of Latin
extraction.

:       (I still think the idea of having two different kinds
: of "plain text" is a bad idea.  I haven't heard anyone else
: entertain it either.)

I don't think we should entertain it here either, except to
repudiate it.  :-)

But I think we still need to agree on what's a line number, or confusion
will ensue.
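
To make the disagreement concrete, here is a minimal sketch, again in
Python purely for illustration.  It is not a description of how Perl
counts lines internally.  BROAD is an assumed respelling of the
separator set from the quoted message, and both function names are
made up for the example.

```python
import re

# The separator set from the quoted message, respelled as a Python regex.
# \r\n is listed first so that a CR LF pair is counted once, not twice.
BROAD = re.compile("\r\n|[\n\r\f\v\x85\u2028\u2029]")


def count_newlines(text: str) -> int:
    """wc-style count: only \\n ends a line."""
    return text.count("\n")


def count_broad(text: str) -> int:
    """Count every separator in the broader set."""
    return len(BROAD.findall(text))


sample = "one\ntwo\fthree\r\nfour\u2028five\n"

print(count_newlines(sample))  # 3
print(count_broad(sample))     # 5
```

Two tools applying these two rules to the same text would report
different line numbers for anything after the first form feed.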

Larry
