perl-unicode

Re: Re[4]: ISO 8859-11 (Thai) cross-mapping table

2002-10-09 06:30:05
Robert Allerstorfer <roal(_at_)anet(_dot_)at> writes:

I agree that perl should accept all the IANA names.
As for the default names, _I_ decided to use the MIME name as the preferred
name when it existed - they seemed to be more "usable" (less embedded or at
least more systematic-looking punctuation, more familiar from e-mail
and HTTP headers etc.). We can revisit that if people think it would
help.

Yes, I also think that the MIME names, where they exist, are preferred. But,
continuing my example of 'shiftjis' used as default name by Encode,
this is not what happens there.

Whoops - you are right, I had missed the _ removal. I think this is 
a result of the historical fact that very early Encode was based 
on Tcl's data (and to a lesser extent code) and Tcl uses "shiftjis"
or rather their file is ".../library/encoding/shiftjis.enc".

Tcl has/had two things which added "spin" to its names:
  A. At least once-upon-a-time it was fitting in an 8.3 DOS-oid filename space
  B. Some of its encodings are targeted at X11 font encodings - hence 
     its 'jis0212' is a 16-bit fixed-length font-friendly one 
     which "we" call 'jis0212-raw'.


If you look at the entry for MIBenum 17 at
http://www.iana.org/assignments/character-sets
its preferred MIME name is 'Shift_JIS'. If there is a name marked as
'preferred MIME name' by IANA, this name is the recommended one. This
also meets the W3C guidelines. W3C also recommends using them all in
lowercase. Since they are case insensitive, I don't see any advantage
in not using them in all lowercase. The only allowed aliases for
shift_jis approved by IANA are 'MS_Kanji' and 'csShiftJIS', but not
'shiftjis'.

I concur. We should change the name _in_ our .ucm file, possibly 
_of_ our .ucm file (though that is not really important to our scheme).


Another example, where Perl meets IANA's convention as well as their
'preferred MIME name', is MIBenum 4, whose official name is
'ISO_8859-1:1987' but whose preferred MIME name is the alias
'ISO-8859-1'. I would find it useful if Encode were revised to
know all names listed in the IANA list mentioned and to default to their
preferred MIME names, all in lowercase. Maybe the unique ID number
("MIBenum") could also be taken into account.

I have no objection to that - and I doubt Dan will either.
Would you care to at least enumerate the cases where we fail - or ideally 
provide patch(es)?
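
For illustration only, the lookup scheme discussed above (accept every
IANA name case-insensitively, default to the lowercased preferred MIME
name, optionally accept the MIBenum) might be sketched like this. It is
written in Python purely as a language-neutral sketch and is not Encode's
code: the REGISTRY table and the canonical_name, build_index and resolve
helpers are hypothetical, and the table holds only the two entries
mentioned in this thread, with only the names quoted here.

```python
# Hypothetical in-memory excerpt of the IANA character-sets registry.
# Only the two entries discussed in this thread are included.
REGISTRY = [
    {
        "mibenum": 17,
        "name": "Shift_JIS",
        "aliases": ["MS_Kanji", "csShiftJIS"],
        "preferred_mime": "Shift_JIS",
    },
    {
        "mibenum": 4,
        "name": "ISO_8859-1:1987",
        "aliases": ["ISO-8859-1"],  # the real entry lists further aliases
        "preferred_mime": "ISO-8859-1",
    },
]


def canonical_name(entry):
    """Preferred MIME name if there is one, else the official name, lowercased."""
    return (entry["preferred_mime"] or entry["name"]).lower()


def build_index(registry):
    """Map every registered name (case-folded) and MIBenum to a canonical name."""
    index = {}
    for entry in registry:
        canon = canonical_name(entry)
        for label in [entry["name"], *entry["aliases"]]:
            index[label.lower()] = canon
        index[entry["mibenum"]] = canon
    return index


INDEX = build_index(REGISTRY)


def resolve(label):
    """Return the canonical name for a charset label or MIBenum, or None."""
    key = label.lower() if isinstance(label, str) else label
    return INDEX.get(key)
```

With this table, resolve("MS_Kanji") and resolve(17) both give
'shift_jis', while resolve("shiftjis") gives None because that spelling
is not a registered name. An implementation that wants to keep accepting
it would have to add it as a local, non-IANA alias.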

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/