
draft                        Language Tag                       May 94


                 Language tags for MIME content portions

                     Fri May 13 09:50:25 MET DST 1994


                         Harald Tveit Alvestrand
                                 UNINETT
                       Harald.Alvestrand@uninett.no






    Abstract

    This document describes a Content-Language: header for use with
    body parts of MIME.

    It also describes a new parameter to the Multipart/Alternative
    type, to aid in the usage of the Content-Language: header.


    Status of this Memo

    This draft document is being circulated for comment.

    If consensus is reached it may be submitted to the RFC editor as a
    Proposed Standard protocol specification.

    Please send comments to the author, or to the IETF-822 mailing
    list <ietf-822@dimacs.rutgers.edu>.

    The following text is required by the Internet-draft rules:

    This document is an Internet Draft.  Internet Drafts are working
    documents of the Internet Engineering Task Force (IETF), its
    Areas, and its Working Groups. Note that other groups may also
    distribute working documents as Internet Drafts.

    Internet Drafts are draft documents valid for a maximum of six
    months. Internet Drafts may be updated, replaced, or obsoleted by
    other documents at any time.  It is not appropriate to use
    Internet Drafts as reference material or to cite them other than
    as a "working draft" or "work in progress."

    Please check the I-D abstract listing contained in each Internet
    Draft directory to learn the current status of this or any other
    Internet Draft.

    The filename of this document is draft-alvestrand-language-tag-01.txt


Alvestrand                  Expires Nov 94                    [Page 1]


    1.  The Language tag

    A language tag is composed of two parts: a primary language tag
    and an optional subtag.

    The syntax of this header in RFC-822 EBNF is:


    Language-Header = "Content-Language" ":" 1#Language
    Language        = 1*8ALPHA [ "-" 1*8ALPHA ]

    Note that the Language-Header is allowed to list several languages
    in a comma-separated list.
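
    As a non-normative illustration, the following Python sketch
    checks a header value against the grammar above. The function name
    parse_content_language is hypothetical. It assumes that any RFC 822
    comments in the value are simple, non-nested parentheses (such as
    the one in the dictionary example in section 2), and it lower-cases
    each tag so that later comparisons are case-insensitive.

```python
import re

# One tag per the grammar: 1-8 ASCII letters, optionally "-" and 1-8 more.
TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})?")

# Assumption: comments are simple "( ... )" runs with no nesting.
COMMENT_RE = re.compile(r"\([^()]*\)")


def parse_content_language(value: str) -> list[str]:
    """Split a Content-Language header value into lower-cased tags.

    Raises ValueError if an element is not a syntactically valid tag
    or if the list contains no tags at all.
    """
    value = COMMENT_RE.sub(" ", value)
    tags = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # RFC 822 "#" lists allow null elements
        if not TAG_RE.fullmatch(item):
            raise ValueError(f"invalid language tag: {item!r}")
        tags.append(item.lower())  # tags are case-insensitive
    if not tags:
        raise ValueError("Content-Language needs at least one tag")
    return tags
```

    For instance, the value "en, fr (This is a dictionary)" yields the
    list ["en", "fr"], while "en_US" is rejected.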

    All tags are case-insensitive. Conventions exist for capitalizing
    some of them, but capitalization should not be taken to carry
    meaning.

    The namespace of language tags and subtags is administered by the
    IANA. The following registrations are predefined:

    In the primary language tag:


    -    All 2-letter codes are interpreted according to ISO 639.

    -    All 3-letter codes are reserved for a (hopefully) forthcoming
         extension to ISO 639.

    -    The value "IANA" is reserved for IANA-defined
         subregistrations.

    -    The value "X" is reserved for private use. Subtags of "X"
         will not be registered by the IANA.

    -    No other registration is allowed.
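
    The rules for the primary tag amount to a small lookup. This
    Python sketch (the function name classify_primary and its result
    strings are hypothetical) checks only the shape of the code;
    confirming that a 2-letter code actually appears in ISO 639 would
    require a copy of that list.

```python
def classify_primary(tag: str) -> str:
    """Classify the primary part of a language tag per the rules above.

    Only the shape of the code is checked, not membership in ISO 639.
    """
    primary = tag.split("-", 1)[0].lower()
    if not (primary.isascii() and primary.isalpha()):
        return "invalid"
    if primary == "x":
        return "private-use"  # subtags are never registered by the IANA
    if primary == "iana":
        return "iana"         # subtag names an IANA-defined registration
    if len(primary) == 2:
        return "iso639"       # interpreted according to ISO 639
    if len(primary) == 3:
        return "reserved"     # held for a future extension to ISO 639
    return "invalid"          # no other registration is allowed
```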

    In the sublanguage tag:


    -    All 2-letter codes are interpreted as ISO 3166 country codes,
         according to the rules laid down in ISO 639.

    -    Codes of 3 to 8 letters may be registered with the IANA by
         anyone who needs one. The IANA may reject registrations that
         it considers misleading.
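
    A matching sketch for the sublanguage tag follows. Because the IANA
    registry itself is not available to it, the sketch takes the set of
    known registrations as a hypothetical parameter, and it assumes the
    tag has already passed the syntax check of the grammar above.

```python
def classify_subtag(tag: str, iana_registered: frozenset = frozenset()) -> str:
    """Classify the optional sublanguage part of an already parsed tag.

    iana_registered is a hypothetical, locally maintained set of
    lower-cased 3-8 letter subtags known to be registered with the IANA.
    """
    primary, sep, sub = tag.partition("-")
    if not sep:
        return "none"
    sub = sub.lower()
    if primary.lower() == "x":
        return "private-use"       # subtags of "x" are never registered
    if len(sub) == 2:
        return "iso3166-country"   # e.g. the "US" in en-US
    if 3 <= len(sub) <= 8:
        return "iana-registered" if sub in iana_registered else "unregistered"
    return "undefined"             # e.g. a 1-letter subtag
```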

    The information in the sublanguage tag may, for instance, be:


    -    Country identification, such as en-US (this usage is
         described in ISO 639).

    -    Dialect information, such as no-NYNORSK or en-COCKNEY.

    -    Languages not listed in ISO 639, which can be registered
         under the "IANA" primary tag, such as IANA-CHEROKEE.


    If multiple languages are used in the MIME body part, they are
    listed with commas between them.

    NOTE: The ISO 639/ISO 3166 convention is that language names are
    written in lower case, while country codes are written in upper
    case. This convention is recommended, but not enforced; the tags
    are case insensitive.
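
    A display routine might apply this convention while still comparing
    tags case-insensitively. In the Python sketch below (all names are
    hypothetical), 2-letter subtags are upper-cased as country codes;
    other subtags are shown in lower case, which is an assumption, since
    no convention is given for them.

```python
def conventional_form(tag: str) -> str:
    """Return tag capitalized per the ISO 639 / ISO 3166 convention.

    This is cosmetic only: the tags themselves stay case-insensitive.
    """
    primary, sep, sub = tag.partition("-")
    primary = primary.lower()  # language codes: lower case
    if sep and len(sub) == 2:
        return f"{primary}-{sub.upper()}"  # country codes: upper case
    # Assumption: other subtags are displayed in lower case.
    return primary + sep + sub.lower()


def tags_equal(a: str, b: str) -> bool:
    """Compare two tags case-insensitively."""
    return a.lower() == b.lower()
```

    For example, conventional_form("EN-us") returns "en-US", and
    tags_equal("en-US", "EN-us") is true.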

    NOTE: ISO 639 defines a registration authority for additions to
    and changes in the list of languages in ISO 639. This authority
    is:


         International Information Centre for Terminology (Infoterm)
         P.O. Box 130
         A-1021 Wien
         Austria
         Phone: +43 1  26 75 35 Ext. 312
         Fax:   +43 1 216 32 72

    The following codes were added in 1989 (there have been no later
    additions): ug (Uigur), iu (Eskimo), za (Zhuang), he (Hebrew,
    replacing iw), yi (Yiddish, replacing ji), and id (Indonesian,
    replacing in).
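
    Software that compares tags might map the withdrawn codes to their
    replacements first. This Python sketch (the names REPLACED_1989 and
    modernize are hypothetical) does so for the three replacements
    listed above; whether to rewrite tags at all is an implementation
    choice that this document does not make.

```python
# The 1989 replacements listed above: withdrawn code -> new code.
REPLACED_1989 = {"iw": "he", "ji": "yi", "in": "id"}


def modernize(tag: str) -> str:
    """Rewrite a withdrawn primary code to its 1989 replacement.

    Any subtag is kept unchanged; unaffected tags are returned as-is.
    """
    primary, sep, sub = tag.partition("-")
    new_primary = REPLACED_1989.get(primary.lower(), primary)
    return new_primary + sep + sub
```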


    2.  Meaning

    The meaning of the header depends on the kind of object it labels:


    -    For a single information object, it should be taken as the
         set of languages required for complete comprehension of the
         object. Example: simple text.

    -    For an aggregation of information objects, it should be taken
         as the set of languages used inside components of that
         aggregation. Examples: document stores and libraries.

    -    For information objects whose purpose is to provide
         alternatives, it should be regarded as a hint that the
         material inside is provided in several languages, and that
         one has to inspect each of the alternatives in order to find
         its language or languages. In this case, multiple languages
         need not mean that one needs to be multilingual to get
         complete understanding of the document. Example: MIME
         multipart/alternative.

         EXAMPLES:

         NOTE: NONE of the sublanguage codes shown in this document
         have actually been assigned; they are used for illustration
         purposes only.

         Norwegian official document, with parallel text in both
         official versions of Norwegian. Both versions are readable by
         all Norwegians.

           Content-Language: no-nynorsk, no-bokmaal

         Voice recording from the London docks

           Content-Language: en-cockney

         Document in Sami, which has no ISO 639 code and is spoken in
         several countries, with about half of its speakers in Norway.

           Content-Language: iana-sami

         An English-French dictionary

           Content-Language: en, fr (This is a dictionary)

         An official EC document (in a few of its official languages)

           Content-Language: en, fr, de, da, el, it

         An excerpt from Star Trek dialogue

           Content-Language: x-klingon
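
    The first and third cases above differ in how a reader should treat
    a list of several languages. The Python sketch below contrasts them
    for a reader who knows a given set of languages; the aggregation
    case is not modelled. The names are hypothetical, and treating
    knowledge of a primary language as knowledge of all its
    sublanguages is an assumption, not a rule of this document.

```python
def understands(user_langs: set[str], tag: str) -> bool:
    """True if a reader knowing user_langs (lower-cased) understands tag.

    Assumption: knowing a primary language ("no") is taken to cover all
    of its sublanguages ("no-nynorsk").
    """
    tag = tag.lower()
    return tag in user_langs or tag.split("-", 1)[0] in user_langs


def can_read(tags: list[str], user_langs: set[str],
             alternatives: bool = False) -> bool:
    """Apply the section 2 meaning of a parsed Content-Language list.

    For a single information object every listed language is required.
    For an object that provides alternatives, one understood language is
    enough to hope for a readable alternative; each alternative still has
    to be inspected to find out which one it is.
    """
    user_langs = {u.lower() for u in user_langs}
    if alternatives:
        return any(understands(user_langs, t) for t in tags)
    return all(understands(user_langs, t) for t in tags)
```

    With these definitions, the Norwegian document above is readable by
    anyone who knows "no", while "en, fr" on a single object requires
    both languages but on multipart/alternative requires only one.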


    3.  Usage examples

    Examples of protocol usage of this header are:


    -    WWW selection of an appropriate version of information for
         display, based on a profile for the user listing languages
         that are understood

    -    MIME usage of alternate body parts in E-mail
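
    For the WWW case, a server or client holding a user profile might
    select a version roughly as follows. This is a Python sketch only:
    the function choose_version, the version identifiers, and the
    matching rule (a preference for "en" also accepts "en-US") are
    hypothetical.

```python
from typing import Optional


def choose_version(versions: dict[str, list[str]],
                   preferences: list[str]) -> Optional[str]:
    """Pick the version whose languages best match the user's profile.

    versions maps a hypothetical version identifier (e.g. a URL) to the
    Content-Language tags of that version; preferences lists the user's
    languages, most preferred first. Returns None if nothing matches.
    """
    for wanted in (p.lower() for p in preferences):
        for key, tags in versions.items():
            for tag in (t.lower() for t in tags):
                # Match the exact tag, or a preference naming only the
                # primary language.
                if tag == wanted or tag.split("-", 1)[0] == wanted:
                    return key
    return None
```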


    4.  The differences parameter to multipart/alternative

    As defined in RFC 1521, Multipart/Alternative has only one
    parameter: boundary.

    The common usage of Multipart/Alternative is to carry the same
    message in more than one format (for example, PostScript and plain
    ASCII text).

    Merely labelling the alternatives with language tags will not lead
    all MIME user agents (UAs) to present the most sensible body part
    by default.

    Therefore, a new parameter is defined, to allow the configuration
    of MIME readers to handle language differences in a sensible
    manner.

    Name: Differences
    Value: One or more of
         Content-Type
         Content-Language

    Further values can be registered with IANA; each must be the name
    of a header that is defined in a published document. If the
    parameter is not present, Differences=Content-Type is assumed.
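
    A reader could extract the parameter, including its default, with
    Python's standard email package, as in this sketch. The function
    name differences is hypothetical, and since this document does not
    state how several header names are separated, the sketch assumes a
    quoted, comma-separated list.

```python
import warnings
from email.message import Message


def differences(content_type: str) -> list[str]:
    """Header names listed in the Differences parameter, lower-cased.

    Falls back to Content-Type when the parameter is absent.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param matches parameter names case-insensitively.
    value = msg.get_param("differences")
    if value is None:
        return ["content-type"]
    names = [n.strip().lower() for n in value.split(",") if n.strip()]
    for name in names:
        if not name.startswith("content-"):
            # RFC 1521 says such headers are generally ignored in body parts.
            warnings.warn(f"{name}: non-Content- header in Differences")
    return names
```

    For example, differences('multipart/alternative; boundary="limit"')
    returns ["content-type"], the default.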

    The intent is that the MIME reader can examine these headers in
    each component of the message to make an informed choice of what
    to present to the user, based on knowledge of the user's
    preferences and capabilities.

    (The purpose of registering with IANA the header names used in
    this context is to maintain a list of usages that a mail UA may
    expect to see, not to reject usages.)

    (NOTE: The MIME specification [RFC 1521], section 7.2, states that
    header fields whose names do not begin with "Content-" are
    generally to be ignored in body parts. Anyone defining a header
    for use with "differences=" should take note of this.)

    The mechanism for deciding which body part to present is outside
    the scope of this document.

    MIME EXAMPLE:

    Content-Type: multipart/alternative; differences=Content-Language;
              boundary="limit"
    Content-Language: en, fr

    --limit
    Content-Language: fr

    (French version of the text)
    --limit
    Content-Language: en

    (English version of the text)
    --limit--

    When composing a message, the order of the alternatives may be
    somewhat arbitrary. However, non-MIME mail readers will show the
    first body part first, so it should most likely be in the
    language understood by most of the recipients.
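
    Although the selection mechanism is outside the scope of this
    document, one possible approach is sketched below in Python using
    the standard email package. The message repeats the example above
    with hypothetical one-line bodies; the function pick_alternative and
    its fallback to the first body part (which non-MIME readers would
    show anyway) are assumptions, not requirements.

```python
from email import message_from_string
from typing import Optional

# The example message, with short hypothetical bodies added.
RAW = """\
Content-Type: multipart/alternative; differences=Content-Language;
          boundary="limit"
Content-Language: en, fr

--limit
Content-Language: fr

Bonjour.
--limit
Content-Language: en

Hello.
--limit--
"""


def pick_alternative(raw: str, preferences: list[str]) -> Optional[str]:
    """Return the body of the alternative best matching preferences."""
    msg = message_from_string(raw)
    if not msg.is_multipart():
        return None
    parts = msg.get_payload()
    if not parts:
        return None
    diffs = str(msg.get_param("differences", "Content-Type")).lower()
    if "content-language" in diffs:
        for wanted in (p.lower() for p in preferences):
            for part in parts:
                header = part.get("Content-Language", "")
                tags = [t.strip().lower() for t in header.split(",")]
                primaries = [t.split("-", 1)[0] for t in tags]
                if wanted in tags or wanted in primaries:
                    return part.get_payload()
    # No usable match: fall back to the first part, which is also what
    # a non-MIME reader would show.
    return parts[0].get_payload()
```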


    5.  Security considerations

    Security issues are not discussed in this memo.


    6.  Character set considerations

    Language tags are always written in US-ASCII. The issue of
    choosing how to render a character set based on the language tag
    is not addressed in this memo; however, the author cautions
    against assuming that such a decision can be made correctly in all
    cases (for example, a rendering engine that chooses fonts based on
    a Japanese or Chinese language tag will fail when it encounters
    mixed Japanese-Chinese text).


    7.  Gatewaying considerations

    RFC 1327 defines a Language: header. Its use is no longer
    recommended, because it is defined to carry a single 2-letter
    language code, while the X.400 header it is meant to gateway
    carries a list of language codes.

    It is suggested that RFC 1327 be updated to produce the
    Content-Language: header, and to map this header to the ISO/CCITT
    specified Language components rather than to the RFC-822-headers
    heading extension.


    8.  References


    [ISO 639]
         ISO 639:1988 (E/F) - Code for the representation of names of
         languages - The International Organization for
         Standardization, 1st edition, 1988, 17 pages. Prepared by
         ISO/TC 37 - Terminology (principles and coordination).


    [ISO 3166]
         ISO 3166:1988 - Codes for the representation of names of
         countries


    [RFC 1521]
         MIME Part One: Mechanisms for Specifying and Describing the
         Format of Internet Message Bodies - Borenstein and Freed -
         September 1993


    [RFC 1327]
         Mapping between X.400(1988) / ISO 10021 and RFC 822 - Kille -
         May 1992