
RE: Last Call: draft-klensin-net-utf8 (Unicode Format for Network Interchange) to Proposed Standard

2008-02-10 18:36:32


--On Monday, 07 January, 2008 22:30 +0100 Kent Karlsson
<kent(_dot_)karlsson14(_at_)comhem(_dot_)se> wrote:

Comment on draft-klensin-net-utf8-07.txt:

--------------------------

"Network Virtual Terminal (NVT)" occurs first in Appendix A.
The explanation of the abbreviation should (also) be given at
the first occurrence of "NVT" in the document.

Fixed in -09

--------------------------

Section 2, point 2, "Line-endings..."

       "discussion.  The newer control characters IND (U+0084) and
       NEL ("Next Line", U+0085) might have been used to
       disambiguate the"

I have a hard time figuring out what IND was supposed to be
used for, but I don't think it was for line endings. Chain
printer "font" change is the closest I get...
(http://www.freepatentsonline.com/3699884.html).

As far as I can tell, and based on the comments that came from
those who suggested that I make that addition, it is an index
(same position on next line) function.
 
NEL was used in EBCDIC originally (IIUC), and is still used in
EBCDIC...

This is just notation.   Whether the functions are the same may
or may not be relevant.

The description "might have been used to disambiguate" is more
appropriate for U+2028 and U+2029.

That is why the next sentence says "Similar observations
apply...".  These things represent, as far as I can tell,
iterative attempts to get things right.
 
--------------------------

       "it, lines end in CRLF and only in CRLF.  Anything that
       does not end in CRLF is either not a line or is severely
       malformed."

The sentence starting with "Anything" seems severely
malformed... You don't really mean to say "Anything", I hope.
"Using other line ending or line separation conventions"
perhaps. And "severely malformed", I hope you did not mean
that either. "is lacking in conversion to
'net-utf8'/'net-Unicode'" perhaps.

Sentence has been rewritten into a conformance statement.

To be "restrictive in what one emits and permissive/liberal in
what one receives" might be applicable here.

Upon receipt, the following SHOULD be seen as at least line
ending (or line separating), and in some cases more than that: 

LF, CR+LF, VT, CR+VT, FF, CR+FF, CR (not followed by NUL...),
NEL, CR+NEL, LS, PS
where
LF    U+000A
VT    U+000B
FF    U+000C
...
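
A rough sketch of what such a permissive receiver could look
like, for illustration only. The function name is hypothetical,
and the code points for CR, NEL, LS and PS are filled in from
Unicode because the list above is elided. Mapping everything to
CRLF is a simplifying assumption: it discards the "more than
that" meaning of FF or PS.

```python
import re

# Sequences from the proposed list. CR (U+000D), NEL (U+0085),
# LS (U+2028) and PS (U+2029) are taken from Unicode, not from the
# elided list in the message.
_LINE_BREAK = re.compile(
    r"\r[\n\x0b\x0c\x85]"              # CR+LF, CR+VT, CR+FF, CR+NEL
    r"|\r(?!\x00)"                     # bare CR, but not the pair CR NUL
    r"|[\n\x0b\x0c\x85\u2028\u2029]"   # LF, VT, FF, NEL, LS, PS
)


def normalize_line_breaks(text: str) -> str:
    """Map every sequence on the proposed list to a single CRLF."""
    return _LINE_BREAK.sub("\r\n", text)
```

The two-character sequences are tried first so that CR+LF
becomes one line ending rather than two.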

The reasons why the robustness principle should not be applied
as you are trying to apply it are an interesting philosophical
discussion that does not, IMO, help here.  The bottom line is
that this is a spec for a single standard format, not a whole
series of variations that senders have the right to assume that
receivers will support.
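
By contrast, a strict check in the spirit of "lines end in CRLF
and only in CRLF" might be sketched as follows. This assumes
conformance means no bare CR and no bare LF. The function name is
hypothetical, and the draft's actual conformance statement is not
quoted in this message.

```python
import re

# A CR not followed by LF, or an LF not preceded by CR.
_BARE_CR_OR_LF = re.compile(r"\r(?!\n)|(?<!\r)\n")


def uses_only_crlf(text: str) -> bool:
    """Return True if every CR and every LF occurs only within a CRLF pair."""
    return _BARE_CR_OR_LF.search(text) is None
```

Under this sketch a receiver makes a yes/no decision and does not
try to guess what a sender meant by some other sequence.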

I've elided comments below that seem to be just different ways
to pursue the theme of "why don't we support every character
that might imaginably be a line-ending as if it were one".

--------------------------

Section 2, point 3:

You have made an exception for FF (because they occur in
RFCs?). I think FF SHOULD be avoided, just like VT, NEL, and
more (see above). Even when it is allowed, it, and CR+FF,
should be seen as line separating.

No. See above.  The question of what characters should be on
that list has been discussed endlessly and the text has been
changed repeatedly to explain why various proposals were not
adopted.  If this work is to be completed, we need to stop
somewhere.

You have also (by implication) dismissed HT, U+0009. The
reason for this is unclear. Especially since HT is so common
in plain texts (often with some default tab setting). Mapping
HT to SPs is often a bad idea. I don't think a default tab
setting should be specified, but the effect of somewhat (not
wildly) different defaults for that is not much worse than
using variable width fonts.

An explanation appears in -08.
 
SP, U+0020, is nowadays not seen as a control character, not
even in your own text... (same paragraph).


--------------------------

   "However, because they were optional in NVT applications
   and this specification is an NVT superset, they cannot be
   prohibited entirely."

Why not? Why must this be a strict NVT superset? I think it
would be rather important to rule these strange beasts out
from net-utf8. These were really ASCII (ISO 646) features, but
have been ruled out much before Unicode.

But you have argued that some of them should be treated as line
separators, and any system that supports VT100 controls (i.e.,
U**x or almost any of its children) still requires them.

--------------------------

     "[ISO10646]
              International Organization for Standardization,
              "Information Technology - Universal Multiple-Octet
              Coded Character Set (UCS) - Part 1: Architecture
              and Basic Multilingual Plane"", ISO/IEC
              10646-1:2000, October 2000."

That seems a bit old... Better with the current revision:

ISO/IEC 10646:2003   Information technology -- Universal
Multiple-Octet Coded Character Set (UCS)

with the amendments (which I don't think you should reference
explicitly):

ISO/IEC 10646:2003/Amd 1:2005  Glagolitic, Coptic, Georgian and
other characters
ISO/IEC 10646:2003/Amd 2:2006  N'Ko, Phags-pa, Phoenician and
other characters

(and more amendments in the works).

Changed in -09.   I hope you like the new form better.

      john



_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
http://www.ietf.org/mailman/listinfo/ietf
