Re: I-D file formats and internationalization

At 5:59 PM -0800 11/30/05, Douglas Otis wrote:

On Nov 30, 2005, at 2:23 PM, Paul Hoffman wrote:
At 1:54 PM -0800 11/30/05, Douglas Otis wrote:
Rather than opening RFCs to text utilizing any character-set anywhere, as this draft suggests,
That is not what the RFC suggests at all. The character set is Unicode. The encoding is UTF-8. That's it.
Unicode provides a unique number for every possible character within a current range of about 97,000 characters. These characters include punctuation marks, diacritics, mathematical and technical symbols, arrows, dingbats, etc. Displaying one of these characters requires a character-set (synonymous with a display system's font-set or character-repertoire), or, in the Unicode vernacular, a script. It is not just a matter of which character is displayed or which character-repertoire is used; there are also Middle Eastern right-to-left issues.
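To make the code-point/encoding distinction concrete, here is a minimal Python 3.8+ sketch (the helper name `describe` is hypothetical). Each character maps to one number, and UTF-8 turns that number into a fixed byte sequence, regardless of whether any installed font can draw it.

```python
# Hypothetical helper: shows a character's Unicode code point and its UTF-8 bytes.
def describe(ch: str) -> str:
    code_point = ord(ch)              # the unique number Unicode assigns
    utf8_bytes = ch.encode("utf-8")   # the fixed UTF-8 encoding of that number
    return f"U+{code_point:04X} -> {utf8_bytes.hex(' ')}"

# Sample characters: ASCII letter, accented Latin letter, euro sign, CJK ideograph.
for ch in ["A", "\u00e9", "\u20ac", "\u4e2d"]:
    print(describe(ch))
# U+0041 -> 41
# U+00E9 -> c3 a9
# U+20AC -> e2 82 ac
# U+4E2D -> e4 b8 ad
```

Displaying the character is a separate step that depends on fonts; the stored bytes are the same either way.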

It may be better to use a single vocabulary for discussing things such as internationalization and character sets. That's the purpose of RFC 3536.

Being able to review the ID as it would appear as an RFC would also seem to be a requirement.
That means changing the Internet Drafts process as well. Certainly possible, but more daunting than changing one process at a time.
As an ID becomes an RFC, it seems expecting last-minute changes to the document would be even more daunting.

Yep, that's the tradeoff. We already make some automatic changes after an Internet Draft is approved by the IESG, and we allow others without IESG oversight. This would be another class. That scares some people, and not others. Having Internet Drafts use Unicode in UTF-8 instead of ASCII scares some people, and not others.

It seems problematic for protocol examples to use non-ASCII characters owing to there not being ubiquitously displayable character-sets.
Unicode is universally displayable if you have the right font(s). Regardless of that, however, any sane document author would not assume that every person reading the document could display it. They would put a legend or explanation near the example.
Assume such characters cannot be displayed, at least not with the ASCII version that excludes the extended character-set allowed by Unicode. An escape mechanism would be needed to accommodate alternative text, since displaying '?' for the Unicode characters that extend beyond ASCII would not be a very satisfactory solution: it would make the ASCII version less authoritative, to say the least, and break the way many use the RFC text files.
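The lossiness of a '?' fallback is easy to demonstrate. A minimal Python sketch, assuming the fallback behaves like Python's ASCII codec with `errors="replace"` (the helper name `ascii_fallback` is hypothetical): two different words collapse to the same ASCII text.

```python
# Hypothetical helper: an ASCII rendering where each non-ASCII character becomes '?'.
def ascii_fallback(text: str) -> str:
    return text.encode("ascii", errors="replace").decode("ascii")

word1 = "caf\u00e9"  # "cafe" with e-acute
word2 = "caf\u00e8"  # "cafe" with e-grave
print(ascii_fallback(word1), ascii_fallback(word2))    # caf? caf?
print(ascii_fallback(word1) == ascii_fallback(word2))  # True: the distinction is gone
```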

No escape mechanism is needed. Non-displayable characters are still in the RFC; they simply can't be displayed by everyone (but they can be displayed by many). This is infinitely simpler, and a much better long-term solution, than "an escape mechanism". Further, there would be no more "ASCII version" to be authoritative. The Internet Draft clearly says that there is a single RFC, and it has a single encoding.

I liked the idea that Frank suggested: use the HTML escape sequence to declare the Unicode character. This allows the ASCII version to remain authoritative.

... as well as unreadable and unsearchable using normal search mechanisms. The purpose of the proposal is to allow RFCs to be readable and searchable using the encoding that is common on the Internet, without resorting to sorta-kinda-HTML or an "escape mechanism". Remaining with plain ASCII would be better than either of the latter.
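Both sides of this exchange can be seen in a short Python sketch, assuming "HTML escape sequence" means a decimal numeric character reference such as `&#231;` (the hex form `&#xE7;` is equally valid HTML; the helper name `to_ascii_escaped` is hypothetical). The escape round-trips losslessly, but a plain substring search for the real text no longer matches.

```python
import html

# Hypothetical helper: replace each non-ASCII character with a decimal
# numeric character reference, leaving ASCII characters untouched.
def to_ascii_escaped(text: str) -> str:
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

name = "Fran\u00e7ois"                 # "Francois" with c-cedilla
escaped = to_ascii_escaped(name)
print(escaped)                         # Fran&#231;ois
print(html.unescape(escaped) == name)  # True: nothing is lost
print(name in escaped)                 # False: a search for the real name misses
```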


--Paul Hoffman, Director
--VPN Consortium

_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf