Re: Character-set header

Risto Kankkunen writes:

-- what I want is the ability to be able to distinguish character set
information for subtypes that I've never heard of (subject to having a
reasonable top-level type, of course).

But what can you do with the character set information?

I think it isn't safe to do character set conversions, for example,
without knowing the subtype. For example, if I have a TEXT-PLUS message
encoded in Latin 1, and I convert it to our national 7-bit character
set, O dieresis ends up where the backslash character is in ASCII.
Backslash might be a special character in that particular subtype and
need some kind of quoting.


This is a strawman argument. If you have a TEXT-PLUS message that contains O
diereses, and you convert it to your national variant, the assumption is that
your software is then prepared to be presented with your national variant
character set, in which the character in the position where backslash usually
sits is now an O dieresis. If your software is not prepared to operate under
these conditions, you should not have converted the document.

Similarly, if I get a document coded in your national variant, I then know
that the character in the backslash slot is actually an O dieresis, and I
have to convert before my Latin-1-expecting software will work properly.
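
As a rough sketch of the round trip described above (not anybody's actual
gateway code), the Python below converts between Latin-1 and a 7-bit national
variant. The mapping table is an assumption modelled on the Finnish/Swedish
7-bit set, and the names NATIONAL_7BIT, latin1_to_national and
national_to_latin1 are made up for this illustration.

```python
# Assumed mapping, modelled on the Finnish/Swedish 7-bit national variant:
# the accented letters occupy the slots that ASCII gives to [ \ ] { | }.
NATIONAL_7BIT = {
    "\u00c4": 0x5B,  # A dieresis sits in the '[' slot
    "\u00d6": 0x5C,  # O dieresis sits in the backslash slot
    "\u00c5": 0x5D,  # A ring sits in the ']' slot
    "\u00e4": 0x7B,  # a dieresis sits in the '{' slot
    "\u00f6": 0x7C,  # o dieresis sits in the '|' slot
    "\u00e5": 0x7D,  # a ring sits in the '}' slot
}

# ASCII characters that have no slot left in the national variant.
DISPLACED = {chr(code) for code in NATIONAL_7BIT.values()}


def latin1_to_national(data: bytes) -> bytes:
    """Convert Latin-1 bytes to the assumed national 7-bit variant."""
    out = bytearray()
    for ch in data.decode("latin-1"):
        if ch in NATIONAL_7BIT:
            out.append(NATIONAL_7BIT[ch])
        elif ch in DISPLACED or ord(ch) > 0x7F:
            # No representation in the target set; refuse rather than guess.
            raise ValueError(f"cannot represent {ch!r} in the national variant")
        else:
            out.append(ord(ch))
    return bytes(out)


def national_to_latin1(data: bytes) -> bytes:
    """Convert bytes in the assumed national variant back to Latin-1."""
    reverse = {code: ch for ch, code in NATIONAL_7BIT.items()}
    text = "".join(reverse.get(b, chr(b)) for b in data)
    return text.encode("latin-1")
```

After conversion the byte in the backslash slot is an O dieresis, so whatever
reads the result has to treat it as one; converting back restores the Latin-1
byte.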

Also, I think you can't decide whether you can display the message
solely on the character set information. Some TEXT-PLUS subtypes you
have might not support that particular character set even if your
hardware were capable. Or the viewer for that subtype might be able to
show the message appropriately even if your hardware didn't support the
character set directly.


There are several possibilities for a given text-plus subtype:

(1) It can be one that's totally character-set ignorant. A good example
    would be DEC Standard RUNOFF, which, apart from a limited repertoire
    of prefix characters, simply does not care what character set it is
    dealing with -- all characters that are not "specials" are simply
    character data.

    Such a text-plus subtype can be converted to any character set within
    reason (an EBCDIC conversion would even be reasonable if there is an
    EBCDIC variant of DSR); the criterion here is that it has to be
    compatible with whatever output device is used. The software is just
    a (more or less) character-set independent filter.

(2) Other text-plus types are character-set specific. This breaks down into
    two subcases: (2a) one where the character set never changes, and (2b)
    one where the character set is implementation-defined. Case (2b) is
    one where conversion is not only useful, it is essential for proper
    operation (text-plus/tex is the example I previously posted). Case (2a)
    is one where a character set specification is irrelevant; as far as the
    transport is concerned the data must remain untouched as much as possible,
    and a character set should not be specified since it is implied by the
    subtype itself.

(3) Finally, a text-plus type can specify a character set or character sets
    using internal rules. This is another case where an external specification
    is bound to be wrong and should not be allowed.

So, in cases (1) and (2b) a character set specification is not only useful,
it is the only way to ensure that gateways do something reasonable. In cases
(2a) and (3) a character set specification is not a good idea, and gateways
probably should keep their hands off the data if possible.
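
To make the four cases concrete, here is a minimal Python sketch of the
decision a gateway would face. Everything in it (SubtypeKind, gateway_action,
the "convert" and "pass-through" results) is hypothetical naming for
illustration, not part of any proposal.

```python
from enum import Enum
from typing import Optional


class SubtypeKind(Enum):
    CHARSET_IGNORANT = "1"         # e.g. a filter like DEC Standard RUNOFF
    FIXED_CHARSET = "2a"           # character set implied by the subtype
    IMPLEMENTATION_DEFINED = "2b"  # e.g. text-plus/tex
    INTERNALLY_SPECIFIED = "3"     # character set(s) declared inside the data


# Cases where an external character set label is meaningful.
LABEL_IS_MEANINGFUL = {
    SubtypeKind.CHARSET_IGNORANT,
    SubtypeKind.IMPLEMENTATION_DEFINED,
}


def gateway_action(kind: SubtypeKind,
                   labelled_charset: Optional[str],
                   target_charset: str) -> str:
    """Return "convert" or "pass-through" for a message body."""
    if kind not in LABEL_IS_MEANINGFUL:
        # Cases (2a) and (3): keep hands off the data.
        return "pass-through"
    if labelled_charset is None:
        # Nothing to convert from; do not guess.
        return "pass-through"
    if labelled_charset.lower() == target_charset.lower():
        return "pass-through"
    # Cases (1) and (2b) with a differing label: conversion is reasonable.
    return "convert"
```

The point of the split is that only in cases (1) and (2b) does the label tell
the gateway anything it can safely act on.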

                                        Ned