
Re: I-D file formats and internationalization

2005-11-30 19:35:03
At 5:59 PM -0800 11/30/05, Douglas Otis wrote:
On Nov 30, 2005, at 2:23 PM, Paul Hoffman wrote:
At 1:54 PM -0800 11/30/05, Douglas Otis wrote:

Rather than opening RFCs to text utilizing any character-set anywhere, as this draft suggests,

That is not what the RFC suggests at all. The character set is Unicode. The encoding is UTF-8. That's it.
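The distinction between a character set (Unicode code points) and an encoding (UTF-8 bytes) can be illustrated with a small sketch. The sample string is an arbitrary choice for illustration and does not come from the draft under discussion.

```python
# Illustrative sketch: Unicode assigns numbers (code points) to characters;
# UTF-8 is one way of serializing those numbers as bytes.
text = "caf\u00e9"  # hypothetical sample text; U+00E9 is a non-ASCII character

# The character set: one code point per character.
code_points = [f"U+{ord(ch):04X}" for ch in text]

# The encoding: ASCII characters take one byte, U+00E9 takes two.
encoded = text.encode("utf-8")

print(code_points)
print(encoded)

# Decoding the bytes as UTF-8 recovers the original text exactly.
assert encoded.decode("utf-8") == text
```

Fonts and display are a separate layer: the bytes above are the same whether or not a given reader's system can render the character.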

Unicode provides a unique number for every possible character within a current range of about 97,000 characters. These characters include punctuation marks, diacritics, mathematical and technical symbols, arrows, dingbats, etc. Displaying one of these characters requires a character-set (synonymous with a display system's font-set or character-repertoire), or using the Unicode vernacular, a script. It is not just a matter of which character is displayed or which character-repertoire is used; there are also Middle Eastern right-to-left issues.

It may be better to use a single vocabulary for discussing things such as internationalization and character sets. That's the purpose of RFC 3536.

Being able to review the ID as it would appear as an RFC would also seem to be a requirement.

That means changing the Internet Drafts process as well. Certainly possible, but more daunting than changing one process at a time.

As an ID becomes an RFC, it seems expecting last minute changes to the document would be even more daunting.

Yep, that's the tradeoff. We already make some automatic changes after an Internet Draft is approved by the IESG, and we allow others without IESG oversight. This would be another class. That scares some people, and not others. Having Internet Drafts use Unicode in UTF-8 instead of ASCII scares some people, and not others.

It seems problematic for protocol examples to use non-ASCII characters owing to there not being ubiquitously displayable character-sets.

Unicode is universally displayable if you have the right font(s). Regardless of that, however, any sane document author would not assume that every person reading the document could display it. They would put a legend or explanation near the example.

Assume such characters cannot be displayed, at least not with the ASCII version that excludes the extended character-set allowed by Unicode. An escape mechanism would be needed to accommodate alternative text. Displaying '?' for the Unicode characters that extend beyond ASCII would not be a very satisfactory solution, as this would make the ASCII version less authoritative, to say the least, and break the way many use the RFC text files.
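The concern about '?' substitution can be made concrete with a short sketch. It assumes a hypothetical conversion step that forces UTF-8 text down to ASCII; nothing in the thread says such a step is part of any actual RFC tooling.

```python
# Illustrative sketch: forcing non-ASCII text into ASCII with '?' substitution
# is lossy, so the result cannot serve as an equivalent of the original.
original = "caf\u00e9"  # hypothetical sample text

# errors="replace" substitutes '?' for every character ASCII cannot represent.
ascii_version = original.encode("ascii", errors="replace")

print(ascii_version)

# The replaced character cannot be recovered from the ASCII bytes.
round_trip = ascii_version.decode("ascii")
assert round_trip != original
```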

No escape mechanism is needed. Non-displayable characters are still in the RFC, they simply can't be displayed by everyone (but they can be displayed by many). This is infinitely simpler, and a much better long-term solution, than "an escape mechanism". Further, there would be no more "ASCII version" to be authoritative. The Internet Draft clearly says that there is a single RFC, and it has a single encoding.

I liked the idea that Frank suggested: use the HTML escape sequence to declare the Unicode character. This allows the ASCII version to remain authoritative.

... as well as unreadable and unsearchable using normal search mechanisms. The purpose of the proposal is to allow RFCs to be readable and searchable using the encoding that is common on the Internet, without resorting to sorta-kinda-HTML or an "escape mechanism". Remaining with plain ASCII would be better than either of the latter.
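The searchability point can be sketched as follows. The example assumes that "HTML escape sequence" means numeric character references such as `&#239;`, which is one plausible reading of the suggestion; the sample word is arbitrary.

```python
import html

# Illustrative sketch: text stored with HTML numeric character references
# stays pure ASCII, but a plain substring search for the real word fails.
word = "na\u00efve"  # hypothetical sample text containing U+00EF

# Store the text as ASCII, escaping non-ASCII characters as &#NNN; references.
escaped = word.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(escaped)

# A normal search for the word does not match the escaped form.
found_in_escaped = word in escaped

# The same text stored as UTF-8 is found by an ordinary byte search.
utf8_bytes = word.encode("utf-8")
found_in_utf8 = word.encode("utf-8") in utf8_bytes

# Readers must unescape before the original text is visible or searchable.
assert html.unescape(escaped) == word
```

The escaped form is lossless, unlike '?' substitution, but every search and display tool has to know about the escaping convention first.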

--Paul Hoffman, Director
--VPN Consortium

_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf