ietf-822

Re: Language tags and 10646

1993-03-07 16:03:37
In <731420798(_dot_)741006(_dot_)KLENSIN(_at_)INFOODS(_dot_)UNU(_dot_)EDU>, 
John wrote:
   (ii) We can use language tagging at the body part level and accept the
fact that this implies that poly-c/j/k doesn't go with text/plain and
must be handled either with multipart or with in-text language markup
(even if the character codes come from 10646).

Exactly right.

As John suggests, two important questions are

     1. What percentage of e-mail messages will require full
        multilingual capabilities, and

     2. How expensive is it to provide full multilingual
        capabilities?

I think the answers are "high" and "very," respectively.  (The
expense is not implementation, which in an ideal world would have
to be paid only once, but rather coming up with a specification
which everyone can agree with and then getting it deployed, which
is of course staggeringly difficult.)  I am by no means
suggesting that the full capabilities should not be pursued,
merely that their pursuit shouldn't impede progress on the easier
problems.

  This also suggests that a per-body-part language tag structure might
want to permit a short list of languages present in the message.  As
long as we define the tags as advice the sender is providing the
receiver in a canonical way (rather than something that the sender must
provide and the receiver must interpret), I don't see this as a problem.

I do.  Permitting a list seems to me to relegate the new tag to a
purely advisory capacity, upon which no automated decisions could
be based.

To return to an example I brought up a few days ago, <o-diaeresis> [ö]
can be transliterated as "oe" if it appears in German text, but
not in English, where it should be replaced with a single "o"
(assuming the diacritical form is not available).  In other words,
"coöperate" goes to "cooperate", but "schön" goes to "schoen".
(If you don't have a MIME-compliant reader, that was supposed to
be ``"co<o-diaeresis>perate" goes to "cooperate", but
"sch<o-diaeresis>n" goes to "schoen".'')
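
A minimal sketch of this language-dependent fallback, in Python.  The
table, the function name, and the two-letter keys ("en", "de") are
hypothetical, chosen only for illustration; a real table would cover far
more characters and languages.

```python
# Hypothetical fallback table: what to substitute for a character when
# the diacritical form is unavailable, keyed by a language code.
FALLBACK = {
    "en": {"\u00f6": "o"},   # o-diaeresis -> "o" in English text
    "de": {"\u00f6": "oe"},  # o-diaeresis -> "oe" in German text
}

def transliterate(text, lang):
    """Replace characters using the fallback table for one language.

    Characters with no entry, and languages with no table, pass
    through unchanged.
    """
    table = FALLBACK.get(lang, {})
    return "".join(table.get(ch, ch) for ch in text)

print(transliterate("co\u00f6perate", "en"))  # cooperate
print(transliterate("sch\u00f6n", "de"))      # schoen
```

The point of the sketch is that the function needs exactly one language
to pick a table; the same character maps to different replacements
depending on that choice.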

However, if I compose a message, mostly in English, which
contains both the English word "coöperate" ("cooperate") and
the German word "schön" ("schoen"), then a body-level language
tag of "English,German" (using the sufficiently concise ISO 639
codes) is not going to do the "right thing."

If the body-level language tag is limited to a single language, it
can be defined as "the only language, or the predominant language,
in which the message is composed", and a mail display program
can render the entire message using conventions appropriate to
that language without thwarting any expectations.

If, on the other hand, the body-level tag can contain a list, I'm
no longer sure what its proper definition is.  I don't think
receiving software can make any automated use of it, and we would
be suggesting that software somewhere will treat the various
languages within the message appropriately when (without
explicit, finer-grained tags) it obviously can't.
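
To make the difference concrete, here is a rough sketch of the decision
a mail display program would face.  The comma-separated tag syntax
follows the "English,German" example above; the function name and the
choice to return None for a list are assumptions for illustration, not
part of any proposal.

```python
def rendering_language(tag_value):
    """Return the single language whose conventions should be applied
    to the whole body part, or None if no automated choice is possible.

    tag_value is a hypothetical body-level language tag, either a
    single language or a comma-separated list of languages.
    """
    languages = [part.strip() for part in tag_value.split(",") if part.strip()]
    if len(languages) == 1:
        # Only, or predominant, language: safe to apply its conventions.
        return languages[0]
    # Empty tag or a list: nothing says which span is in which language.
    return None

print(rendering_language("English"))         # English
print(rendering_language("English,German"))  # None
```

With a single tag the program has something to act on; with a list it
can only fall back to language-neutral handling.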

It seems to me that a body-level tag containing a list of
languages would be useful only to a human reader, yet we've agreed
that a human reader can usually distinguish languages by reading
them (and several people have further asserted that the
Content-Type and/or Language: lines aren't intended for human
consumption).

Perhaps a list of languages is intended to be of use for a
hypothetical heuristic Han demultiplexer, to tell it that the
message is worst-case?

(1) We reaffirm the principle that text/plain implies text that is not
expected to contain markup, for languages or anything else...
(2) We provide a body-part language tagging capability that can take a
list of languages...
(3) People who are worried about precise identification of multilingual
texts should go off and put together a definition...
(4) If lightweight intra-body-part language switching is needed, then
someone should make a specific proposal...

I agree completely, with the exception of the "list of
languages."  If the possibility of a list is perceived as
important and/or useful, I'd like to see some examples of how it
might be used (both on the composing and viewing ends).  I can
see how a single tag would be useful; at the moment a list tag
seems only to decrease the possibility of automated use and
increase the possibility of unmet expectations (i.e. the user
specifies "English,German" or "Japanese,Chinese" but the system
doesn't make use of that information in any useful way).

                                        Steve Summit
                                        scs(_at_)adam(_dot_)mit(_dot_)edu
