perl-unicode

Re: [Encode] Encode::Supported revised

2002-04-03 17:21:48
Hello, Dan!

DK>    Encode is near completion.  I am still building djgpp environment for
DK> possible fixes needed but anything else is over.

My congratulations! :-)

DK>    (*9) Nicknamed Latin0; Euro sign as well as  French and Finnish
DK>         letters that are missing from 8859-1 are added.
Hmm... The table never refers to footnote (*9), but it does contain
a dangling (*15) that has no matching footnote.

DK> in the Net.   L<Encode> comes with the following KOI charsets.  for
DK> gory details, See <http://czyborra.com/charsets/cyrillic.html> for
         ^^
DK> details.
    ^^
"details" appears twice in this one sentence.


DK> "Encoding vs Charset"

Hmm... I have a "special opinion" on this one!

Though I'm still rewriting it, I'm making a half-cooked version
available:
http://tagunov.tripod.com/survey2.html
(under construction)
(The original version, http://tagunov.tripod.com/survye.html, is the
one I have decided to nuke: it is too complex and a bit of a hash.)

In short,
- [RFC 2130] and [RFC 2278] established the CCS and CES terminology

- the full phrase "Coded Character Set" is ambiguous, because ISO
  uses it in a different sense, as cited by [RFC 1345].

- My opinion is that 'ISO Coded Character Set' = 'CES + CCS'

- the abbreviation CCS, on the other hand, does not clash with ISO
  terminology: it was first introduced by [RFC 2130] and does not
  seem to have been used before.

  I have already seen some articles use CCS to mean "Coded
  Character Set" in the [RFC 2130] sense.

- note that [RFC 2278] (the logical successor to [RFC 2130])
  recommends separating the term "charset" from "character set"
  and "coded character set".

  It recommends using "charset" in a meaning identical to CES.

So we have

'RFC 2130 Coded Character Set'             = CCS
'RFC 2130 Character Encoding Scheme'       = CES
                                           = encoding(?)
'MIME charset, as recommended by RFC 2278' = CES

'ISO Coded Character Set' = CCS+CES
'Coded Character Set' and 'Character Set' are ambiguous
                                       outside of context
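
To make the table concrete, here is a small sketch. It is written in
Python only because its standard codecs show the two layers compactly;
it is not Encode's API, and the codec names are Python's. One CCS
(Unicode) gives the euro sign a single number, U+20AC, while several
CESs serialize that number to different bytes. ISO 8859-15 (the
"Latin0" above) defines its repertoire and its one-byte encoding
together, which is the 'ISO Coded Character Set' = CCS+CES case.

```python
# Illustration only: Python codecs, not Perl's Encode module.
# CCS layer: Unicode assigns the EURO SIGN exactly one code point.
euro = "\u20ac"
code_point = ord(euro)

# CES layer: different schemes serialize that same code point
# into different byte sequences.
schemes = {
    "utf-8": euro.encode("utf-8"),
    "utf-16-be": euro.encode("utf-16-be"),
    "utf-16-le": euro.encode("utf-16-le"),
}

# ISO 8859-15 specifies both layers at once: its own table puts the
# euro sign at a single byte, so CCS and CES come as one package.
latin9 = euro.encode("iso8859-15")

if __name__ == "__main__":
    print(f"CCS code point: U+{code_point:04X}")
    for name, data in schemes.items():
        # Decoding with the matching CES recovers the same character.
        assert data.decode(name) == euro
        print(f"{name:>10}: {data.hex(' ')}")  # hex(sep) needs Python 3.8+
    print(f"iso8859-15: {latin9.hex(' ')}")
```

Labels like "UTF-8" or "UTF-16BE" in a MIME charset parameter name the
CES row here, which is the sense [RFC 2278] recommends.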


So perhaps this heading would be better as "Encoding vs CCS" or
"CES vs CCS"? I know it sounds less approachable, but it may be the
less controversial choice.

DK> However, the word I<charset> is casually used even in Internet
DK> Assigned Number Authority to actually mean I<encoding>.

Oops! I hadn't seen this when I wrote my previous set of comments.
Still, CCS sounds more accurate to me (if also scarier and less
understandable). I prefer accuracy; what about you?

DK> Encode tries to soothe this misconception via aliases.

Hmm... this gives the impression that soothing this misconception is
the only thing aliases do. Was that intended?

Otherwise the description of the difference between CCS and CES
sounds _very_ good to me :-)


DK> The very dspecification of ISO-2022 is available from the link above.
            -dspecification
            + specification

DK> I could not find this page because the hostname doesn't resolve!

DK>   Brief description for most of the mentioned CJK encodings
DK> L<http://www.debian.org.ru/doc/manuals/intro-i18n/ch-codes.html>

Okay, let's nuke this!
- czyborra.com already covers all of it and more
- the host resolves fine for me, but the page never finishes
  loading anyway

- Anton