ietf-822

Re: the gap regarding Archived-At

2004-10-29 04:46:14

>1. There really are significant differences between
>    a) how email clients would use archived-at,
>    b) how humans would use it to cut-and-paste into HTML, and
>    c) how web browsers would use it.

I'm not really sure about the difference between b) and c), or
about c) as such. If you put a URI into an href attribute in
HTML, then ultimately, it will be used by a browser, anyway.

Actually, I miswrote that. I'm not sure there's a significant difference between b) and c) either, just that there's a significant difference between a) and either b) or c).

>2. For archived-at to be useful by email clients generally requires
>    one or two of the following, in addition to the obvious client
>    support for the protocol and server support for the message format:
>
>    - mail archives that support IMAP access (or possibly NNTP)

My understanding (not being an IMAP expert) is that this would
not be too difficult. However, I can't see W3C doing that any
time soon.

I suspect the easiest way to do it would be to have the archiver store messages in maildir format; then just use an IMAP server that supports maildir and read-only access.
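
A minimal sketch of the archiver side, in Python, assuming the archiver receives each list message as a parsed message object and that some maildir-capable IMAP server is separately configured to export the directory read-only (that configuration is not shown). The function name and the directory layout are hypothetical.

```python
import mailbox
from email.message import EmailMessage

def archive_message(maildir_path, msg):
    """Store one message in a maildir and return the key it was stored under.

    maildir_path is a hypothetical location; create=True builds the
    cur/new/tmp subdirectories if the maildir does not exist yet.
    """
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        key = box.add(msg)   # writes the message as its own file
        box.flush()
    finally:
        box.close()
    return key
```

Since maildir keeps one file per message, the archiver never rewrites existing messages, which fits a read-only export.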

>    - a specification for making collections of mail messages available
>      via HTTP (maybe WebDav) and/or FTP

I'm not sure I understand this. List-Archive: points to a full
list archive. Archived-At: points to a single message. It seems
to me you either have the two confused, or what you want is really
a format (or similar) that not only contains the actual message
(message/rfc822 or equivalent), but also some additional context
(that WebDav may be able to give you with some properties, for example).

That's it. I want to be able to refer to a single message in such a way that the surrounding context can also be found. List-Archive (and the other List-* fields) have some of the same problems as Archived-At in that the contents can be arbitrary and there's no way for the mail reader to use them. It's partially because of problems I've observed with List-* fields that I'm expressing concerns over Archived-At.

Please note that any kind of inference from the actual URI to
other URIs should be avoided, i.e. it would be a bad idea to
say that if Archived-At contained
   http://www.imc.org/ietf-822/mail-archive/msg05043.html
we may assume that things like
   http://www.imc.org/ietf-822/mail-archive/msg05042.html
and
   http://www.imc.org/ietf-822/mail-archive/msg05044.html
are somewhat related (by being in the same mailing list,
or whatever).

I share that concern to some degree, which is why I would have a keyword in Archived-At that says "you can assume that other files in the same directory/collection/whatever are also part of the same archive". I don't think the MUA should assume this without an explicit indication.
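
As a rough sketch of that rule in Python: the MUA derives a collection location only when an explicit keyword is present. It is assumed here that the keyword is spelled "collection" (as in the proposal quoted further down), that keywords have already been parsed into a dict, and that the collection is simply the parent of the message URI, which only makes sense for hierarchical schemes such as FTP or HTTP. The function name is hypothetical.

```python
from urllib.parse import urljoin

def collection_uri(message_uri, keywords):
    """Return the URI of the enclosing collection, or None.

    keywords: dict mapping keyword name -> value (or None).
    Without an explicit "collection" keyword nothing is inferred
    from the shape of the URI.
    """
    if "collection" not in keywords:
        return None
    # Resolving "." against the message URI yields its parent "directory".
    return urljoin(message_uri, ".")
```

With a bare HTTP archive URI and no keywords this returns None, so msg05042.html and msg05044.html are never guessed at.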

>3. The obvious compromise that makes sense in the short term (let
>    archives be in other formats besides message/rfc822, and don't
>    require message/rfc822 support) is harmful in the long term.

Why? If there is uptake on mailers with such support, it's not
too difficult to also upgrade the servers.

If, by the time mailers get upgraded to support that, there are a lot of messages whose Archived-At fields point to archives that aren't usable by mail readers, the feature will be too awkward/dysfunctional to use: either it always launches a web browser, or it's only supported by mailers that have built-in web browsers. It may also be too late to upgrade archives that were stored only in HTML.

>Best compromise I see at this point:  Define some sort of
>keywords for archived-at.  e.g.
>
>Archived-at: "<" URI ">" *(";" keyword [ "=" value ] )
>
>where keywords might include
>
>"native"      message available in native message/rfc822 format
>              (either because that's the only format available
>              at this URI or via content-negotiation)
>
>"collection"  other messages in the same collection are also
>              accessible, where the collection is defined by
>              the IMAP folder, NNTP newsgroup, FTP directory,
>              WebDav collection, etc. indicated by the URI.
>
>              the value associated with this keyword would
>              indicate the name of the collection associated
>              with the URI (since a message might appear in
>              more than one collection)

Ah, I see, so there would essentially be two URIs, one for
the message itself, and one for the collection.

Not what I was proposing. Here's a concrete example:

Archived-At: <ftp://ftp.cs.utk.edu/pub/moore/mail-archives/ietf-822/20041030.2214>;
 native; collection=ietf-822

which would indicate (a) the message is available in message/rfc822 format (and perhaps others), (b) other messages in the "ietf-822" collection could also be accessed (in this case by doing a LIST or whatever of the ftp directory).
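
A minimal sketch, in Python, of how a mail reader might parse a field body in the proposed form `"<" URI ">" *(";" keyword [ "=" value ] )`. This is only an illustration of the syntax under discussion, not a defined format: the function name is hypothetical, keyword names are lowercased as an assumption, and quoted values or values containing ";" are not handled.

```python
import re

# "<" URI ">" followed by an optional ";"-separated keyword list.
_FIELD = re.compile(r'^\s*<([^<>\s]+)>\s*(.*)$', re.DOTALL)

def parse_archived_at(value):
    """Return (uri, keywords); keywords maps name -> value, or None if bare."""
    # Unfold the header body: a line break followed by whitespace
    # is a continuation of the same field.
    unfolded = re.sub(r'\r?\n[ \t]+', ' ', value)
    m = _FIELD.match(unfolded)
    if m is None:
        raise ValueError("no <URI> found in field body")
    uri, rest = m.group(1), m.group(2)
    keywords = {}
    for part in rest.split(';'):
        part = part.strip()
        if not part:
            continue
        name, sep, val = part.partition('=')
        keywords[name.strip().lower()] = val.strip() if sep else None
    return uri, keywords

example = ("<ftp://ftp.cs.utk.edu/pub/moore/mail-archives/ietf-822/20041030.2214>;\n"
           " native; collection=ietf-822")
uri, keywords = parse_archived_at(example)
```

For the example above, `keywords` contains a bare "native" entry and "collection" with the value "ietf-822"; a field with no keywords yields an empty dict.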

I'd rather avoid having two URIs, though maybe it is cleaner overall. (would WebDav need two URIs? it's been too long...)

>This would (a) let tools record locations of ordinary HTTP/HTML archives
>such as are (too) often the only format available today,

What would the keyword be for that case?

the absence of a keyword could mean "you can make no assumptions about the content-type of this resource". if desired an "html" keyword could be added. (or we could go back to Arnt Gulbrandsen's suggestion of having a content-type parameter, which is looking better to me now than it did when he proposed it.)
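
One way a mail reader could act on this, sketched in Python. The keyword names "native" and "html" come from the discussion above; the function name, the dict representation of parsed keywords, and the three return labels are assumptions made for illustration.

```python
def choose_handler(keywords):
    """Decide who should fetch the archived copy.

    keywords: dict mapping keyword name -> value (or None), already parsed.
    """
    if "native" in keywords:
        # message/rfc822 is available, so the mail reader can use it directly.
        return "mail-reader"
    if "html" in keywords:
        # Explicitly an HTML archive page: hand it to a web browser.
        return "browser"
    # No keyword: make no assumption about the content type.
    return "unknown"
```

What a reader does in the "unknown" case (fetch and inspect the media type, or simply launch a browser) is left open here.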

>(b) encourage
>archives to provide messages in their original format without requiring
>them to do so, (c) give implementors a hint that better functionality
>can be had, (d) give email readers a clue as to whether they could
>actually make use of the URI internally or whether they needed to pass
>it to a separate browser (this is a common problem in HTML also)

Well, this is solved on the level of media types, not URIs, for browsers.

And the implication is that anything that uses URIs needs to be able to handle whatever comes back. This often fails when "whatever comes back" can't be handled by the browser, and the program launched to deal with it needs information about the context. For instance, I've seen links in PDF documents that couldn't be followed, apparently because the PDF viewer didn't know what URI the document was retrieved from.

>and (e)
>leave room for expanded functionality without needing to define a new
>header field.  And if you don't use any keywords, they don't take up
>space.
>
>It might be better to defer the definition of any keywords to a
>separate document, because I can imagine needing to define specifics
>of how collections look within the context of various protocols.
>I can also imagine needing to define things like collections that
>consist of a directory of several mbox files, each containing
>multiple messages.

If that would mean that the only change to my current draft
would be something like "there may be keywords, but currently,
there are none defined", then I probably could live with that.
But I'm still not convinced that we need keywords, things
should work without them.

I think there's pretty good experience that things don't work without them, at least not without requiring the email reader to also be a web browser. Even then you can get into a conflict between what the email reader can do and users' preferences: what if the user wants HTML documents to be handled by a web browser? Does he then lose the ability to handle relative links in HTML documents initially linked to from email?

Keith

