ietf-822

Re: charsets and glyphs

1993-02-17 18:28:21
> "glyph" doesn't seem like the right term...but I'm not sure what is.  Maybe:
> "...an algorithm for converting an octet stream into characters" is
> sufficient.

I agree that the term "character" is better than "glyph".  The
question then becomes "What is a character?"  In standards like
ISO-8859-1, an e-acute is a single character, but in Unicode, e-acute
may be represented either as a single precomposed character or as "e"
followed by a separate acute accent.  In Unicode, that accent is a
character in its own right.
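
To make the two representations concrete, here is a minimal sketch in
Python. It assumes the standard unicodedata module and the NFC
normalization form, both of which come from later Unicode practice and
are not part of the MIME discussion itself. The variable names are
only illustrative.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points

# The two strings display alike but are different code point sequences.
same_code_points = precomposed == decomposed
lengths = (len(precomposed), len(decomposed))

# Normalizing to NFC composes the pair into the single precomposed code point.
equal_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
```

A receiver that compares the strings as-is would treat them as
different; one that normalizes first would treat them as the same.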

But I would like to argue that MIME should not be concerned with such
details.  Instead, it should allow the document(s) that define a
charset to use whatever definition of the term "character" they want.

In the world of networking, striving for interoperability is of utmost
importance.  If a sender uses "e" followed by acute, and the
receiver's software cannot cope with that representation, then we have
an *interoperability* problem.

But that would not be MIME's fault.  That would be the fault of the
document that defines the Unicode/10646-based charset.  That document
would have to be pretty damn clear about what is and what is not
permitted as far as those cute accents are concerned.

So, I suggest the following prose:

    The terms "character set" and "charset", where used in this document,
    refer to an algorithm for converting between octet streams and
    character sequences.  Definitions for the term "character" are found
    in the documents that define each charset.
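
As a rough illustration of "an algorithm for converting between octet
streams and character sequences", the sketch below uses Python's
built-in codecs. The codec names "iso-8859-1" and "utf-8" are Python's
labels, chosen here as an assumption for the example, and the sample
octets are made up.

```python
octets = b"caf\xe9"   # four octets; 0xE9 is e-acute in ISO-8859-1

# Octet stream -> character sequence, as ISO-8859-1 defines it.
chars = octets.decode("iso-8859-1")

# Character sequence -> octet stream, the other direction.
round_trip = chars.encode("iso-8859-1")

# The same characters under a different charset yield different octets.
utf8_octets = chars.encode("utf-8")
```

What counts as one "character" in chars is decided by the charset
definition, not by the code that calls decode.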


Erik

