ietf-xml-mime

MIME types and fragment identifiers in HTML and XML

1999-10-21 00:35:00
Here is my understanding of MIME types and fragment identifiers.  I think 
that the semantics of the "type" attribute in HTML 4.0 [1] and in 
"Associating Style Sheets with XML documents" [2] are unclear.


1. URI and URI reference

A URI does not have a fragment identifier, but a URI reference (as defined by 
RFC 2396) may have a fragment identifier.  (Note: HTML and XML very often 
say "URI" when they should actually say "URI reference".)

1) URI

An entity is returned by some protocol  (here, the word "entity" is used as 
in RFC 2616).  The protocol should provide some mechanism for transmitting or 
inferring media type. In HTTP and email, this is done explicitly with the  
'content-type' header.  

2) URI reference

A URI is first constructed from the URI reference (that is, the reference 
without its fragment identifier).  An entity is returned or accessed 
interactively by some protocol.  The protocol should indicate the media type 
of the entity.  Then, the user agent for this media type may extract or 
locate some fragment of this entity by using the fragment identifier.
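The first step above can be sketched in Python with the standard library.  
This is only an illustration: the helper name split_uri_reference and the 
example.org address are assumptions of mine, not taken from any specification.

```python
from urllib.parse import urldefrag


def split_uri_reference(uri_reference):
    """Split a URI reference into (URI, fragment identifier).

    The URI part is what the protocol sees; the fragment identifier is
    kept aside for the user agent to apply after the entity arrives.
    """
    uri, fragment = urldefrag(uri_reference)
    return uri, fragment


# Hypothetical address, used only for illustration.
uri, fragment = split_uri_reference("http://example.org/doc.html#section2")
print(uri)       # http://example.org/doc.html
print(fragment)  # section2
```

Only the first value would be sent to the server; the second is interpreted 
afterwards, according to the media type of the returned entity.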

The protocol does not indicate the media type for that fragment.  Thus, the 
fragment has no content type of its own, unless the fragment contains some 
other way of specifying media types.  (For example, RFC 2397, the "data" URL 
scheme, provides a way of including a MIME content-type along with encoded 
data.)
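As a rough sketch of how a "data" URL carries its own media type, the 
following Python function reads the type out of the URL itself.  The function 
name data_url_media_type is hypothetical, and the parsing is deliberately 
simplified (parameters such as charset are discarded).

```python
def data_url_media_type(url):
    """Return the media type embedded in a "data" URL (RFC 2397).

    Simplified sketch: parameters (charset, ";base64") are ignored and
    only the type/subtype is returned, in lower case.
    """
    if url[:5].lower() != "data:":
        raise ValueError("not a data URL")
    header, comma, _data = url[5:].partition(",")
    if not comma:
        raise ValueError("data URL has no comma before the data")
    media_type = header.split(";", 1)[0].strip().lower()
    # RFC 2397 defaults to text/plain when the media type is omitted.
    return media_type or "text/plain"


print(data_url_media_type("data:image/gif;base64,R0lGODdh"))  # image/gif
print(data_url_media_type("data:,A%20brief%20note"))          # text/plain
```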

2.  Media types specified by HTML or XML language constructs

HTML and XML provide many constructs which specify both a URI reference 
and a media type.  The HTML and XML specifications are rather silent about 
the intended semantics.

1) URI

One could argue that the specified media type is used when the protocol does 
not indicate the content type of the entity.  One could even argue that 
the specified media type always overrides the content type indicated by the 
protocol.  (Note: Many implementations fail to indicate media types correctly.) 

One could also argue that the specified media type is used to predict or 
restrict the content type of the desired entity.  That is, if an A link 
contains a 'type' attribute and the resulting URI returns an entity with a 
different content-type, then an error has occurred.

This isn't so different from getting a '404 not found'.  Something happened 
which wasn't expected.  There are various ways of recovering, but any attempt 
to override one piece of MIME data with something that's "fresher" and more 
authoritative seems wrong.
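The three readings above can be put side by side in a small Python sketch.  
Nothing here comes from the HTML or XML specifications: the function 
resolve_media_type and the policy names "fallback", "override" and "strict" 
are labels I made up to name the three interpretations.

```python
def _normalize(content_type):
    """Lower-case a content type and drop parameters; None stays None."""
    if not content_type:
        return None
    return content_type.split(";", 1)[0].strip().lower()


def resolve_media_type(declared, served, policy):
    """Pick a media type under one of three hypothetical policies.

    declared: value of the 'type' attribute, or None if absent
    served:   content type indicated by the protocol, or None
    """
    declared, served = _normalize(declared), _normalize(served)
    if policy == "fallback":   # attribute used only if protocol is silent
        return served or declared
    if policy == "override":   # attribute always wins when present
        return declared or served
    if policy == "strict":     # attribute restricts; a mismatch is an error
        if declared and served and declared != served:
            raise ValueError(f"expected {declared}, got {served}")
        return served or declared
    raise ValueError(f"unknown policy: {policy}")


print(resolve_media_type("text/css", "text/plain", "fallback"))  # text/plain
print(resolve_media_type("text/css", "text/plain", "override"))  # text/css
```

Under "strict", the same pair of values raises an error, which corresponds 
to the reading that a mismatch is comparable to a '404 not found'.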

2) URI reference

If a construct in HTML or XML specifies a URI reference containing a fragment 
identifier, the construct also specifies a media type, and the protocol  
indicates the content type of the entity, what will happen?

One could argue that the specified media type is used for the desired fragment, 
unless the fragment contains some other way of specifying media types.

One could also argue that the fragment must indicate the media type and that 
it must coincide with the media type specified by the HTML or XML construct 
(fragment).  
If the fragment does not explicitly specify the same media type, an error has 
occurred.


[1] HTML 4.0

(http://www.w3.org/TR/html40/types.html#h-6.7)

6.7 Content types (MIME types)

Note. A "media type" (defined in [RFC2045] and [RFC2046]) specifies
the nature of a linked resource. This specification employs the term
"content type" rather than "media type" in accordance with current
usage. Furthermore, in this specification, "media type" may refer to
the media where a user agent renders a document.

This type is represented in the DTD by %ContentType;.

Content types are case-insensitive.

Examples of content types include "text/html", "image/png",
"image/gif", "video/mpeg", "audio/basic", "text/tcl",
"text/javascript", and "text/vbscript". For the current list of
registered MIME types, please consult [MIMETYPES].

Note. The content type "text/css", while not currently registered with
IANA, should be used when the linked resource is a [CSS1] style sheet.

http://www.w3.org/TR/REC-html40/present/styles.html#h-14.2.3

14.2.3 Header style information: the STYLE element

<!ELEMENT STYLE - - %StyleSheet        -- style info -->
<!ATTLIST STYLE
  %i18n;                               -- lang, dir, for use with title --
  type        %ContentType;  #REQUIRED -- content type of style language --
  media       %MediaDesc;    #IMPLIED  -- designed for use with these media --
  title       %Text;         #IMPLIED  -- advisory title --
  >


Start tag: required, End tag: required

Attribute definitions

type = content-type [CI] 

This attribute specifies the style sheet language of the element's
contents and overrides the default style sheet language. The style
sheet language is specified as a content type (e.g.,
"text/css"). Authors must supply a value for this attribute; there is
no default value for this attribute.


[2] Associating Style Sheets with XML documents

http://www.w3.org/TR/xml-stylesheet/

The following pseudo attributes are defined

href CDATA #REQUIRED
type CDATA #REQUIRED
title CDATA #IMPLIED
media CDATA #IMPLIED
charset CDATA #IMPLIED
alternate (yes|no) "no"

The semantics of the pseudo-attributes are exactly as with <LINK
REL="stylesheet"> in HTML 4.0, with the exception of the alternate
pseudo-attribute. If alternate="yes" is specified, then the processing
instruction has the semantics of <LINK REL="alternate stylesheet">
instead of <LINK REL="stylesheet">.
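To make the pseudo-attributes quoted above concrete, here is a minimal Python 
sketch that reads them from an xml-stylesheet processing instruction.  The 
function name stylesheet_pseudo_attributes is hypothetical, and the regular 
expression is a simplification that does not expand character or entity 
references in the values.

```python
import re
from xml.dom import minidom

# name="value" or name='value'; a simplified pseudo-attribute pattern.
PSEUDO_ATTR = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*("[^"]*"|'[^']*')""")


def stylesheet_pseudo_attributes(xml_text):
    """Return one dict of pseudo-attributes per xml-stylesheet PI."""
    document = minidom.parseString(xml_text)
    found = []
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            attrs = {name: quoted[1:-1]
                     for name, quoted in PSEUDO_ATTR.findall(node.data)}
            attrs.setdefault("alternate", "no")  # default from the quoted text
            found.append(attrs)
    return found


sample = ('<?xml version="1.0"?>'
          '<?xml-stylesheet href="style.css" type="text/css"?>'
          '<doc/>')
print(stylesheet_pseudo_attributes(sample))
```

The 'type' value obtained this way is exactly the declared media type whose 
relationship to the protocol's content type is in question.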

Makoto
 
Fuji Xerox Information Systems
 
Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata(_dot_)makoto(_at_)fujixerox(_dot_)co(_dot_)jp