Re: the gap regarding Archived-At

>1. There really are significant differences between
>    a) how email clients would use archived-at,
>    b) how humans would use it to cut-and-paste into HTML, and
>    c) how web browsers would use it.

I'm not really sure about the difference between b) and c), or
about c) as such. If you put a URI into an href attribute in
HTML, then ultimately it will be used by a browser anyway.

actually I miswrote that. I'm not sure there's a significant difference between b and c either - just that there's a significant difference between a and either b or c.

>2. For archived-at to be useful by email clients generally requires
>    one or two of the following, in addition to the obvious client
>    support for the protocol and server support for the message format:
>
>    - mail archives that support IMAP access (or possibly NNTP)

My understanding (not being an IMAP expert) is that this would
not be too difficult. However, I can't see W3C doing that any
time soon.

I suspect the easiest way to do it would be to have the archiver store messages in maildir format; then just use an IMAP server that supports maildir and read-only access.
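A minimal sketch of the archiver side of that idea, using Python's standard-library `mailbox.Maildir` class. It assumes the archiver is handed each list message as raw bytes and that the maildir it writes (the `ietf-822` directory name is hypothetical) is the one a maildir-capable IMAP server exports read-only; the IMAP server itself is not shown.

```python
import mailbox
import os
import tempfile


def archive_message(maildir_path: str, raw: bytes) -> str:
    """Store one raw RFC 822 message in a maildir and return its key.

    Hypothetical helper: assumes the archiver receives each list
    message as raw bytes, exactly as it was distributed.
    """
    # create=True builds the maildir (tmp/, new/, cur/) if it does not exist.
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        # Maildir.add() writes into tmp/ and only then moves the file into
        # new/, so a reader scanning new/ and cur/ never sees a partial file.
        return box.add(raw)
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        path = os.path.join(top, "ietf-822")  # hypothetical archive name
        archive_message(path, b"From: a@example.org\nSubject: test\n\nbody\n")
        print(len(mailbox.Maildir(path, create=False)))  # 1
```

The original message bytes are kept untouched, which is what would let a mail reader later fetch them as message/rfc822.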

>    - a specification for making collections of mail messages available
>      via HTTP (maybe WebDav) and/or FTP

I'm not sure I understand this. List-Archive: points to a full
list archive. Archived-At: points to a single message. It seems
to me you either have the two confused, or what you want is really
a format (or similar) that not only contains the actual message
(message/rfc822 or equivalent), but also some additional context
(that WebDav may be able to give you with some properties, for
example).

That's it. I want to be able to refer to a single message in such a way that the surrounding context can also be found. List-Archive (and the other List-* fields) have some of the same problems as Archived-At in that the contents can be arbitrary and there's no way for the mail reader to use them. It's partially because of problems I've observed with List-* fields that I'm expressing concerns over Archived-At.

Please note that any kind of inference from the actual URI to
other URIs should be avoided, i.e. it would be a bad idea to
say that if Archived-At contained
   http://www.imc.org/ietf-822/mail-archive/msg05043.html
we may assume that things like
   http://www.imc.org/ietf-822/mail-archive/msg05042.html
and
   http://www.imc.org/ietf-822/mail-archive/msg05044.html
are somewhat related (by being in the same mailing list,
or whatever).

I share that concern to some degree, which is why I would have a keyword in Archived-At that says "you can assume that other files in the same directory/collection/whatever are also part of the same archive". I don't think the MUA should assume this without an explicit indication.
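One way a mail reader could honour that rule, sketched in Python: the enclosing collection is derived only when the field carries an explicit keyword (called `collection` here, as in the proposal quoted below), and is otherwise never guessed from the shape of the URI. The function name, and the assumption that the collection is simply the URI's parent directory, are hypothetical.

```python
import posixpath
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit


def collection_uri(uri: str, keywords: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the URI of the collection holding `uri`, or None.

    Hypothetical helper: `keywords` is the already-parsed keyword list of
    the Archived-At field. Without an explicit 'collection' keyword,
    nothing is inferred from the URI.
    """
    if "collection" not in keywords:
        return None
    parts = urlsplit(uri)
    # Assumption: the collection is the parent "directory" of the message.
    parent = posixpath.dirname(parts.path.rstrip("/"))
    if not parent.endswith("/"):
        parent += "/"
    return urlunsplit((parts.scheme, parts.netloc, parent, "", ""))


if __name__ == "__main__":
    uri = "http://www.imc.org/ietf-822/mail-archive/msg05043.html"
    print(collection_uri(uri, {}))                          # None
    print(collection_uri(uri, {"collection": "ietf-822"}))  # http://www.imc.org/ietf-822/mail-archive/
```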

>3. The obvious compromise that makes sense in the short term (let
>    archives be in other formats besides message/rfc822, and don't
>    require message/rfc822 support) is harmful in the long term.

Why? If there is uptake on mailers with such support, it's not
too difficult to also upgrade the servers.

if, by the time mailers get upgraded to support that, there are a lot of messages that have Archived-At pointing to archives that aren't usable by mail readers, the feature is too awkward/dysfunctional to use. either it always launches a web browser, or it's only supported by mailers that have built-in web browsers. it may also be too late to upgrade archives that were stored only in HTML.

>Best compromise I see at this point:  Define some sort of
>keywords for archived-at.  e.g.
>
>Archived-at: "<" URI ">" *(";" keyword [ "=" value ] )
>
>where keywords might include
>
>"native"  message available in native message/rfc822 format
>            (either because that's the only format available
>                 at this URI or via content-negotiation)
>
>"collection"      other messages in the same collection are also
>            accessible, where the collection is defined by
>            the IMAP folder, NNTP newsgroup, FTP directory,
>            WebDav collection, etc. indicated by the URI.
>
>            the value associated with this keyword would
>            indicate the name of the collection associated
>            with the URI (since a message might appear in
>            more than one collection)

Ah, I see, so there would essentially be two URIs, one for
the message itself, and one for the collection.


not what I was proposing.  here's a concrete example

Archived-At: <ftp://ftp.cs.utk.edu/pub/moore/mail-archives/ietf-822/20041030.2214>;
    native; collection=ietf-822

which would indicate (a) the message is available in rfc822 format (and perhaps others), (b) other messages in the "ietf-822" collection could also be accessed (in this case by doing a LIST or whatever of the ftp directory).
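A rough Python sketch of how a mail reader might parse the proposed syntax, `"<" URI ">" *(";" keyword [ "=" value ])`, applied to the example above. It assumes the field body arrives still folded, treats keyword names case-insensitively, and uses `None` for a keyword given without a value; none of these choices come from a published specification.

```python
import re
from typing import Dict, Optional, Tuple

# "<" URI ">" with optional surrounding whitespace
_URI_RE = re.compile(r"\s*<([^<>\s]*)>\s*")


def parse_archived_at(body: str) -> Tuple[str, Dict[str, Optional[str]]]:
    """Split an Archived-At field body into its URI and a keyword dict."""
    # Unfold: a line break followed by whitespace is a continuation line.
    body = re.sub(r"\r?\n(?=[ \t])", "", body)
    m = _URI_RE.match(body)
    if not m:
        raise ValueError("field body must start with <URI>")
    rest = body[m.end():]
    if rest and not rest.startswith(";"):
        raise ValueError("expected ';' after the URI")
    keywords: Dict[str, Optional[str]] = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # the leading ';' leaves an empty first item
        name, sep, value = item.partition("=")
        keywords[name.strip().lower()] = value.strip() if sep else None
    return m.group(1), keywords


if __name__ == "__main__":
    field = ("<ftp://ftp.cs.utk.edu/pub/moore/mail-archives/ietf-822/20041030.2214>;\r\n"
             "    native; collection=ietf-822")
    uri, kw = parse_archived_at(field)
    print(uri)
    print(kw)  # {'native': None, 'collection': 'ietf-822'}
```

A field with no keywords at all parses to an empty dict, so plain `<URI>` values stay valid.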

I'd rather avoid having two URIs, though maybe it is cleaner overall. (would WebDav need two URIs? it's been too long...)

>This would (a) let tools record locations of ordinary HTTP/HTML archives
>such as are (too) often the only format available today,

What would the keyword be for that case?

the absence of a keyword could mean "you can make no assumptions about the content-type of this resource". if desired, an "html" keyword could be added. (or we could go back to Arnt Gulbrandsen's suggestion of having a content-type parameter, which is looking better to me now than it did when he proposed it.)
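Under that reading, the decision a mail reader faces can be sketched in a few lines of Python. The `html` keyword and the `content-type` parameter are hypothetical (the latter stands in for Arnt Gulbrandsen's suggestion, for which no parameter name was fixed), and the input is assumed to be an already-parsed keyword dict.

```python
from typing import Dict, Optional


def choose_handler(keywords: Dict[str, Optional[str]]) -> str:
    """Return "mail-reader" if the reader can fetch and show the message
    itself, or "browser" if the URI should go to a separate web browser.

    Hypothetical policy for the keywords discussed in this thread.
    """
    ctype = (keywords.get("content-type") or "").strip().lower()
    if "native" in keywords or ctype == "message/rfc822":
        # message/rfc822 is promised, so the reader can display it natively.
        return "mail-reader"
    # "html", any other type, or no keyword at all: nothing can be assumed
    # about what the URI returns, so hand it to an external browser.
    return "browser"


if __name__ == "__main__":
    print(choose_handler({"native": None, "collection": "ietf-822"}))  # mail-reader
    print(choose_handler({}))                                          # browser
```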

>(b) encourage
>archives to provide messages in their original format without requiring
>them to do so, (c) give implementors a hint that better functionality
>can be had, (d) give email readers a clue as to whether they could
>actually make use of the URI internally or whether they needed to pass
>it to a separate browser (this is a common problem in HTML also)

Well, this is solved at the level of media types, not URIs, for browsers.

and the implication is that anything that uses URIs needs to be able to handle whatever comes back. this often loses if "whatever comes back" can't be handled by the browser, and whatever program is launched to deal with "whatever comes back" needs information about the context. for instance, I've seen links in PDF documents that couldn't be followed (apparently) because the PDF viewer didn't know what URI the document was retrieved from.

>and (e)
>leave room for expanded functionality without needing to define a new
>header field.  And if you don't use any keywords, they don't take up
>space.
>
>It might be better to defer the definition of any keywords to a
>separate document, because I can imagine needing to define specifics
>of how collections look within the context of various protocols.
>I can also imagine needing to define things like collections that
>consist of a directory of several mbox files, each containing
>multiple messages.

If that would mean that the only change to my current draft
would be something like "there may be keywords, but currently,
there are none defined", then I probably could live with that.
But I'm still not convinced that we need keywords; things
should work without them.

I think there's pretty good experience that things don't work without them, at least not without requiring the email reader to also be a web browser. even then you can get into a conflict between what the email reader can do and users' preferences - what if the user wants HTML documents to be handled by a web browser? does he then lose the ability to handle relative links in HTML documents initially linked to from email?


Keith