draft Language Tag May 94
Language tags for MIME content portions
Fri May 13 09:50:25 MET DST 1994
Harald Tveit Alvestrand
UNINETT
Harald.Alvestrand@uninett.no
Abstract
This document describes a Content-Language: header for use with
body parts of MIME.
It also describes a new parameter to the Multipart/Alternative
type, to aid in the usage of the Content-Language: header.
Status of this Memo
This draft document is being circulated for comment.
If consensus is reached it may be submitted to the RFC editor as a
Proposed Standard protocol specification.
Please send comments to the author, or to the IETF-822 mailing
list <ietf-822@dimacs.rutgers.edu>.
The following text is required by the Internet-draft rules:
This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its
Areas, and its Working Groups. Note that other groups may also
distribute working documents as Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use
Internet Drafts as reference material or to cite them other than
as a "working draft" or "work in progress."
Alvestrand Expires Nov 94 [Page 1]
Please check the I-D abstract listing contained in each Internet
Draft directory to learn the current status of this or any other
Internet Draft.
The filename of this document is
draft-alvestrand-language-tag-01.txt
1. The Language tag
The language tag is composed of two parts: a primary language tag
and an optional subtag.
The syntax of this header in RFC-822 EBNF is:
Language-Header = "Content-Language" ":" 1#Language
Language = 1*8ALPHA [ "-" 1*8ALPHA ]
Note that the Language-Header is allowed to list several languages
in a comma-separated list.
All tags are to be treated as case insensitive; there exist
conventions for capitalization of some of them, but these should
not be taken to carry meaning.
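The grammar above can be checked mechanically. The following is a minimal sketch in Python, not part of the specification; the function name parse_content_language is hypothetical, and RFC 822 comments in parentheses are assumed to have been stripped already.

```python
import re

# One language tag per the grammar above: 1*8ALPHA [ "-" 1*8ALPHA ]
_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})?")


def parse_content_language(value):
    """Split a Content-Language field value into (tag, subtag) pairs.

    Tags are folded to lower case because they are case insensitive.
    Raises ValueError for an element that does not match the grammar.
    """
    result = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # RFC 822 "#" lists tolerate null elements
        if not _TAG.fullmatch(item):
            raise ValueError("malformed language tag: %r" % item)
        tag, _, subtag = item.lower().partition("-")
        result.append((tag, subtag or None))
    if not result:
        raise ValueError("at least one language tag is required")
    return result
```

Folding to lower case here is only a convenience for comparison; it is an implementation choice of this sketch and carries no meaning.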
The namespace of language tags and subtags is administered by the
IANA. The following registrations are predefined:
In the language tag:
- All 2-letter codes are interpreted according to ISO 639.
- All 3-letter codes are reserved for a (hopefully) forthcoming
extension to ISO 639.
- The value "IANA" is reserved for IANA-defined
subregistrations.
- The value "X" is reserved for private use. Subtags of "X"
will not be registered by the IANA.
- No other registration is allowed.
In the sublanguage tag:
- All 2-letter codes are interpreted as ISO 3166 country codes,
according to the rules laid down in ISO 639.
- Codes of 3 to 8 letters may be registered with the IANA by
anyone who feels a need for it. IANA has the right to reject
registrations that are felt to be misleading.
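The registration rules above can be sketched as a small classifier. This is an illustrative Python sketch only: the function names and the returned descriptions are hypothetical, the input is assumed to be syntactically valid already, and no lookup against the actual ISO 639, ISO 3166 or IANA lists is attempted.

```python
def classify_primary(tag):
    """Describe which predefined rule covers a primary language tag."""
    tag = tag.lower()  # tags are case insensitive
    if tag == "x":
        return "private use"
    if tag == "iana":
        return "IANA subregistration"
    if len(tag) == 2:
        return "ISO 639 language code"
    if len(tag) == 3:
        return "reserved for ISO 639 extension"
    return "not allowed"


def classify_subtag(tag, subtag):
    """Describe which rule covers a subtag, given its primary tag."""
    if tag.lower() == "x":
        return "private use, never registered"
    if len(subtag) == 2:
        return "ISO 3166 country code"
    if 3 <= len(subtag) <= 8:
        return "registered with IANA"
    # The grammar permits 1-letter subtags, but the rules do not
    # assign them a meaning.
    return "not covered by the rules"
```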
The information in the sublanguage tag may for instance be:
- Country identification, such as en-US (this usage is
described in ISO 639)
- Dialect information, such as no-NYNORSK or en-COCKNEY
- Languages not listed in ISO 639, which can be registered with
the IANA prefix, such as IANA-CHEROKEE
If multiple languages are used in the MIME body part, they are
listed with commas between them.
NOTE: The ISO 639/ISO 3166 convention is that language names are
written in lower case, while country codes are written in upper
case. This convention is recommended, but not enforced; the tags
are case insensitive.
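As an illustration of the capitalization convention, a generator of headers might apply it on output while still comparing tags without regard to case. This is a sketch in Python under stated assumptions: the function names are hypothetical, and subtags that are not 2-letter country codes are left untouched because the convention says nothing about them.

```python
def recommended_case(tag):
    """Lower-case the language part; upper-case a 2-letter country subtag."""
    primary, sep, sub = tag.partition("-")
    if len(sub) == 2:
        sub = sub.upper()  # ISO 3166 country codes are written in upper case
    return primary.lower() + sep + sub


def tags_equal(a, b):
    """Compare two tags; capitalization never carries meaning."""
    return a.lower() == b.lower()
```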
NOTE: ISO 639 defines a registration authority for additions to
and changes in the list of languages in ISO 639. This authority
is:
International Information Centre for Terminology (Infoterm)
P.O. Box 130
A-1021 Wien
Austria
Phone: +43 1 26 75 35 Ext. 312
Fax: +43 1 216 32 72
The following codes were added in 1989 (none have been added since): ug
(Uigur), iu (Eskimo), za (Zhuang), he (Hebrew, replacing iw), yi
(Yiddish, replacing ji), and id (Indonesian, replacing in).
2. MEANING
The meaning of the header is:
- For a single information object, it should be taken as the
set of languages required for complete comprehension of the
whole object. Examples: Simple text.
- For an aggregation of information objects, it should be taken
as the set of languages used inside components of that
aggregation. Examples: Document stores and libraries.
- For information objects whose purpose in life is providing
alternatives, it should be regarded as a hint that the
material inside is provided in several languages, and that
one has to inspect each of the alternatives in order to find
its language or languages. In this case, multiple languages
need not mean that one needs to be multilingual to get
complete understanding of the document. Examples: MIME
multipart/alternative.
EXAMPLES:
NOTE: NONE of the sublanguage codes shown in this document
have actually been assigned; they are used for illustration
purposes only.
Norwegian official document, with parallel text in both
official versions of Norwegian. Both versions are readable by
all Norwegians.
Content-Language: no-nynorsk, no-bokmaal
Voice recording from the London docks
Content-Language: en-cockney
Document in Sami, which does not have an ISO 639 code, and is
spoken in several countries, but with about half the speakers
in Norway
Content-Language: iana-sami
An English-French dictionary
Content-Language: en, fr (This is a dictionary)
An official EC document (in a few of its official languages)
Content-Language: en, fr, de, da, el, it
An excerpt from Star Trek dialogue
Content-Language: x-klingon
3. Usage examples
Examples of protocol usage of this header are:
- WWW selection of an appropriate version of information for
display, based on a profile for the user listing languages
that are understood
- MIME usage of alternate body parts in E-mail
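The first usage, selection against a user profile, might look like the following Python sketch. The matching rule is an assumption of this example and is not defined by this document: an exact tag match is tried first, then a match on the primary tag alone. The function name and data layout are hypothetical.

```python
def choose_version(versions, profile):
    """Pick a version of a document for a user.

    versions: list of (name, [language tags]) pairs, in server order.
    profile:  list of tags the user understands, most preferred first.
    Returns the chosen name, or None if nothing matches.
    """
    for wanted in profile:
        wanted = wanted.lower()
        # Pass 1: exact, case-insensitive match on the whole tag.
        for name, tags in versions:
            if wanted in [t.lower() for t in tags]:
                return name
        # Pass 2 (assumed rule): compare only the primary language tags.
        primary = wanted.split("-")[0]
        for name, tags in versions:
            if primary in [t.lower().split("-")[0] for t in tags]:
                return name
    return None
```

With versions tagged "fr" and "en-US", a profile of "de, en" would select the "en-US" version under the assumed fallback rule.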
4. The differences parameter to multipart/alternative
As defined in RFC 1521, Multipart/Alternative only has one
parameter: boundary.
The common usage of Multipart/Alternative is to have more than one
format of the same message (for example, PostScript and ASCII).
Using language tags alone to distinguish between alternatives
will not, by itself, lead every MIME UA to present the most
sensible body part by default.
Therefore, a new parameter is defined, to allow the configuration
of MIME readers to handle language differences in a sensible
manner.
Name: Differences
Value: One or more of
Content-Type
Content-Language
Further values can be registered with IANA; each must be the name of
a header for which a definition exists in a published document.
If not present, Differences=Content-Type is assumed.
The intent is that the MIME reader can look at these headers of
the message component to make an intelligent choice of what to
present to the user, based on knowledge about the user preferences
and capabilities.
(The intent of having registration with IANA of the fields used in
this context is to maintain a list of usages that a mail UA may
expect to see, not to reject usages.)
(NOTE: The MIME specification [RFC 1521], section 7.2, states that
headers not beginning with "Content-" are generally to be ignored
in body parts. People defining a header for use with "differences="
should take note of this.)
The mechanism for deciding which body part to present is outside
the scope of this document.
MIME EXAMPLE:
Content-Type: multipart/alternative; differences=Content-Language;
    boundary="limit"
Content-Language: en, fr
--limit
Content-Language: fr
--limit
Content-Language: en
--limit--
When composing a message, the choice of sequence may be somewhat
arbitrary. However, non-MIME mail readers will show the first body
part first, meaning that this should most likely be the language
understood by most of the recipients.
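Although the selection mechanism is outside the scope of this document, one possible reader-side approach can be sketched with the Python standard library email package. Everything below is an assumption made for illustration: the function name, the exact-match rule, the comma-separated form of a multi-valued parameter, and the fallback to the first body part when no language matches.

```python
import email

# A complete message modelled on the MIME example above, with bodies added.
EXAMPLE = (
    'Content-Type: multipart/alternative; differences=Content-Language;\n'
    ' boundary="limit"\n'
    'Content-Language: en, fr\n'
    '\n'
    '--limit\n'
    'Content-Language: fr\n'
    '\n'
    'Bonjour\n'
    '--limit\n'
    'Content-Language: en\n'
    '\n'
    'Hello\n'
    '--limit--\n'
)


def pick_alternative(message, user_languages):
    """Choose a body part by language, most preferred language first.

    Returns None when the differences parameter does not name
    Content-Language, so the caller can fall back to the ordinary
    Content-Type based choice.
    """
    # A missing parameter means Differences=Content-Type.
    differs = message.get_param("differences", "Content-Type")
    names = [n.strip().lower() for n in differs.split(",")]
    if "content-language" not in names:
        return None
    parts = message.get_payload()  # list of sub-messages for multipart
    for wanted in user_languages:
        for part in parts:
            header = part.get("Content-Language", "")
            tags = [t.strip().lower() for t in header.split(",")]
            if wanted.lower() in tags:
                return part
    return parts[0]  # assumed fallback: the part most recipients understand
```

For instance, pick_alternative(email.message_from_string(EXAMPLE), ["en"]) selects the second body part of the example message.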
5. Security considerations
Security issues are not considered in this memo.
6. Character set considerations
Codes are always US-ASCII. The issue of deciding upon the
rendering of a character set based on the language encoding is not
addressed in this memo; however, the author cautions against
thinking that such a decision can be made correctly for all cases
(for example, a rendering engine that decides font based on
Japanese or Chinese language will fail to work when a mixed
Japanese-Chinese text is encountered).
7. Gatewaying considerations
RFC 1327 defines a Language: header. This header is not
recommended now, because it is defined to be a single 2-letter
language code, and the X.400 header it is supposed to gateway is a
list of language codes.
It is suggested that RFC 1327 be updated to produce the Content-
Language: header, and to turn this header into the ISO/CCITT
specified Language components rather than the RFC-822-headers
heading extension.
8. References
[ISO 639]
ISO 639:1988 (E/F) - Code for the representation of names of
languages - The International Organization for
Standardization, 1st edition, 1988, 17 pages. Prepared by
ISO/TC 37 - Terminology (principles and coordination).
[ISO 3166]
ISO 3166:1988 - Codes for the representation of names of
countries
[RFC 1521]
MIME Part One: Mechanisms for Specifying and Describing the
Format of Internet Message Bodies - Borenstein and Freed -
September 1993
[RFC 1327]
Mapping between X.400(1988) / ISO 10021 and RFC 822 - Kille -
May 1992