>1. There really are significant differences between
> a) how email clients would use archived-at,
> b) how humans would use it to cut-and-paste into HTML, and
> c) how web browsers would use it.
I'm not really sure about the difference between b) and c), or
about c) as such. If you put a URI into an href attribute in
HTML, then ultimately it will be used by a browser anyway.
Actually, I miswrote that. I'm not sure there's a significant
difference between b and c either, only that there's a significant
difference between a and either b or c.
>2. Making archived-at useful to email clients generally requires
> one or two of the following, in addition to the obvious client
> support for the protocol and server support for the message
format:
>
> - mail archives that support IMAP access (or possibly NNTP)
My understanding (not being an IMAP expert) is that this would
not be too difficult. However, I can't see W3C doing that any
time soon.
I suspect the easiest way to do it would be to have the archiver store
messages in maildir format; then just use an IMAP server that supports
maildir and read-only access.
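A minimal sketch of the storage half of that suggestion, using Python's standard mailbox module. The helper name archive_message and the directory name ietf-822 are illustrative assumptions; exporting the Maildir read-only through an IMAP server is not shown.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def archive_message(maildir_path: str, msg: EmailMessage) -> str:
    """Store one list message in a Maildir and return its unique key.

    Sketch only: serving the directory read-only over IMAP is left to
    whatever Maildir-capable IMAP server the archive runs.
    """
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        # add() writes the message into tmp/ and then moves it into new/,
        # so a concurrent reader never sees a partially written file.
        return box.add(msg)
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ietf-822")  # hypothetical archive folder
        msg = EmailMessage()
        msg["From"] = "someone@example.org"
        msg["Subject"] = "Re: Archived-At"
        msg.set_content("Message body.")
        key = archive_message(path, msg)
        print(key, mailbox.Maildir(path)[key]["Subject"])
```

Each message stays a separate file in its original format, which is what makes a Maildir easy to hand to an existing IMAP server.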
> - a specification for making collections of mail messages available
>   via HTTP (maybe WebDav) and/or FTP
I'm not sure I understand this. List-Archive: points to a full
list archive. Archived-At: points to a single message. It seems
to me you either have the two confused, or what you want is really
a format (or similar) that not only contains the actual message
(message/rfc822 or equivalent), but also some additional context
(that WebDav may be able to give you with some properties, for
example).
That's it. I want to be able to refer to a single message in such a
way that the surrounding context can also be found. List-Archive (and
the other List-* fields) have some of the same problems as Archived-At
in that the contents can be arbitrary and there's no way for the mail
reader to use them. It's partially because of problems I've observed
with List-* fields that I'm expressing concerns over Archived-At.
Please note that any kind of inference from the actual URI to
other URIs should be avoided, i.e. it would be a bad idea to
say that if Archived-At contained
http://www.imc.org/ietf-822/mail-archive/msg05043.html
we may assume that things like
http://www.imc.org/ietf-822/mail-archive/msg05042.html
and
http://www.imc.org/ietf-822/mail-archive/msg05044.html
are somewhat related (by being in the same mailing list,
or whatever).
I share that concern to some degree, which is why I would have a
keyword in Archived-At that says "you can assume that other files in
the same directory/collection/whatever are also part of the same
archive". I don't think the MUA should assume this without an explicit
indication.
>3. The obvious compromise that makes sense in the short term (let
> archives be in other formats besides message/rfc822, and don't
> require message/rfc822 support) is harmful in the long term.
Why? If there is uptake of mailers with such support, it's not
too difficult to upgrade the servers as well.
If, by the time mailers get upgraded to support that, there are a lot
of messages whose Archived-At fields point to archives that aren't
usable by mail readers, the feature will be too awkward or dysfunctional
to use. Either it always launches a web browser, or it's only supported
by mailers that have built-in web browsers. It may also be too late to
upgrade archives that were stored only in HTML.
>Best compromise I see at this point: Define some sort of
>keywords for archived-at. e.g.
>
>Archived-at: "<" URI ">" *(";" keyword [ "=" value ] )
>
>where keywords might include
>
>"native" message available in native message/rfc822 format
> (either because that's the only format available
> at this URI or via content-negotiation)
>
>"collection" other messages in the same collection are also
> accessible, where the collection is defined by
> the IMAP folder, NNTP newsgroup, FTP directory,
> WebDav collection, etc. indicated by the URI.
>
> the value associated with this keyword would
> indicate the name of the collection associated
> with the URI (since a message might appear in
> more than one collection)
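A rough sketch of how a mail reader might parse the keyword syntax proposed above. This is not a published grammar: the sketch assumes keywords are case-insensitive tokens and that a value is either a token or a quoted string without embedded quotes. The function name parse_archived_at is hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# One ';' keyword [ '=' value ] element (assumed token / quoted-string forms).
_KEYWORD = re.compile(r';\s*([A-Za-z0-9-]+)(?:\s*=\s*("[^"]*"|[^;\s"]+))?\s*')


def parse_archived_at(field_body: str) -> Tuple[str, Dict[str, Optional[str]]]:
    """Parse '<URI> *(; keyword [= value])' into (uri, {keyword: value or None})."""
    body = field_body.strip()
    if not body.startswith("<") or ">" not in body:
        raise ValueError("Archived-At must begin with <URI>")
    end = body.index(">")
    uri = "".join(body[1:end].split())  # drop folding whitespace inside <...>
    keywords: Dict[str, Optional[str]] = {}
    pos = end + 1
    while pos < len(body) and body[pos].isspace():
        pos += 1
    while pos < len(body):
        m = _KEYWORD.match(body, pos)
        if m is None:
            raise ValueError(f"unexpected text at offset {pos}: {body[pos:]!r}")
        value = m.group(2)
        if value is not None and value.startswith('"'):
            value = value[1:-1]  # strip the surrounding quotes
        keywords[m.group(1).lower()] = value  # keyword without '=' maps to None
        pos = m.end()
    return uri, keywords


if __name__ == "__main__":
    header = ("<ftp://ftp.cs.utk.edu/pub/moore/mail-archives/ietf-822/20041030.2214>;\r\n"
              " native; collection=ietf-822")
    print(parse_archived_at(header))
```

A field with no keywords parses to the URI and an empty dict, so keywords cost nothing when they are not used.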
Ah, I see, so there would essentially be two URIs, one for
the message itself, and one for the collection.
Not what I was proposing. Here's a concrete example:
Archived-At:
<ftp://ftp.cs.utk.edu/pub/moore/mail-archives/ietf-822/20041030.2214>;
native; collection=ietf-822
which would indicate (a) the message is available in rfc822 format (and
perhaps others), and (b) other messages in the "ietf-822" collection
could also be accessed (in this case by doing a LIST or whatever of the
ftp directory).
I'd rather avoid having two URIs, though maybe that is cleaner overall.
(Would WebDav need two URIs? It's been too long...)
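For the ftp example, listing the collection could look roughly like this sketch. The function names are hypothetical, keywords are assumed to be already parsed into a dict (lower-cased keyword to value, or None), and anonymous FTP access is assumed. It uses NLST rather than LIST because only file names are needed, and it contacts no server unless the "collection" keyword is present.

```python
import posixpath
from ftplib import FTP
from typing import List, Mapping, Optional, Tuple
from urllib.parse import unquote, urlsplit


def ftp_location(uri: str) -> Tuple[str, str, str]:
    """Split an ftp: URI into (host, directory, file name).

    Simplification: the path is treated as absolute on the server, whereas
    RFC 1738 interprets ftp URL path segments relative to the login directory.
    """
    parts = urlsplit(uri)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an ftp URI with a host: {uri!r}")
    directory, name = posixpath.split(unquote(parts.path))
    return parts.hostname, directory or "/", name


def list_collection(uri: str, keywords: Mapping[str, Optional[str]]) -> Optional[List[str]]:
    """Return the names of the other messages in the collection, or None.

    Sibling files are listed only when the field carries an explicit
    "collection" keyword; without it nothing is inferred from the URI.
    """
    if "collection" not in keywords:
        return None
    host, directory, _ = ftp_location(uri)
    with FTP(host) as ftp:  # anonymous access is an assumption
        ftp.login()         # no arguments: logs in as "anonymous"
        return ftp.nlst(directory)
```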
>This would (a) let tools record locations of ordinary HTTP/HTML archives
>such as are (too) often the only format available today,
What would the keyword be for that case?
The absence of a keyword could mean "you can make no assumptions about
the content-type of this resource". If desired, an "html" keyword could
be added. (Or we could go back to Arnt Gulbrandsen's suggestion of
having a content-type parameter, which looks better to me now than
it did when he proposed it.)
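A sketch of how a mail reader might use these hints to decide, before fetching anything, whether to display the archived message itself or hand the URI to a browser. "native" and "html" follow the keywords discussed here; the "content-type" parameter name is a hypothetical spelling of the content-type parameter idea, and Handler and choose_handler are illustrative names. Keywords are assumed to be already parsed into a dict.

```python
from enum import Enum
from typing import Mapping, Optional


class Handler(Enum):
    MAIL_READER = "display inside the mail reader"
    WEB_BROWSER = "hand the URI to a web browser"
    UNKNOWN = "no assumption possible; fetch and check the media type"


def choose_handler(keywords: Mapping[str, Optional[str]]) -> Handler:
    """Pick a handler from Archived-At keywords (hypothetical names)."""
    # Ignore any media-type parameters such as "; charset=..." in the value.
    ctype = (keywords.get("content-type") or "").split(";")[0].strip().lower()
    if "native" in keywords or ctype == "message/rfc822":
        return Handler.MAIL_READER
    if "html" in keywords or ctype == "text/html":
        return Handler.WEB_BROWSER
    # No keyword: make no assumptions about the content type of the resource.
    return Handler.UNKNOWN
```

A reader that gets MAIL_READER would still ask for message/rfc822 when fetching, since "native" may mean the format is only available via content negotiation.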
>(b) encourage
>archives to provide messages in their original format without requiring
>them to do so, (c) give implementors a hint that better functionality
>can be had, (d) give email readers a clue as to whether they could
>actually make use of the URI internally or whether they needed to pass
>it to a separate browser (this is a common problem in HTML also)
Well, for browsers, this is solved at the level of media types,
not URIs.
And the implication is that anything that uses URIs needs to be able to
handle whatever comes back. This often fails when "whatever comes back"
can't be handled by the browser and the program launched to deal with
it needs information about the context. For instance, I've seen links
in PDF documents that couldn't be followed (apparently) because the PDF
viewer didn't know what URI the document was retrieved from.
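The context problem can be shown in a few lines: a relative reference inside a fetched document can only be resolved against the URI the document was retrieved from (its base URI, per RFC 3986 section 5). This sketch uses urllib.parse.urljoin; the helper name resolve_link is hypothetical and the link targets are illustrative.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def resolve_link(href: str, retrieved_from: Optional[str]) -> Optional[str]:
    """Resolve a link found inside a fetched document.

    A helper program handed only the document's bytes, without the URI it
    came from, can follow absolute links but not relative ones.
    """
    if urlsplit(href).scheme:
        return href            # absolute reference: no base URI needed
    if retrieved_from is None:
        return None            # relative reference with no known base
    return urljoin(retrieved_from, href)


if __name__ == "__main__":
    base = "http://www.imc.org/ietf-822/mail-archive/msg05043.html"
    print(resolve_link("msg05044.html", base))
    print(resolve_link("msg05044.html", None))
```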
>and (e)
>leave room for expanded functionality without needing to define a new
>header field. And if you don't use any keywords, they don't take up
>space.
>
>It might be better to defer the definition of any keywords to a
>separate document, because I can imagine needing to define specifics
>of how collections look within the context of various protocols.
>I can also imagine needing to define things like collections that
>consist of a directory of several mbox files, each containing
>multiple messages.
If that would mean that the only change to my current draft
would be something like "there may be keywords, but currently,
none are defined", then I could probably live with that.
But I'm still not convinced that we need keywords; things
should work without them.
I think there's pretty good evidence from experience that things don't
work without them, at least not without requiring the email reader to
also be a web browser. Even then you can get into a conflict between
what the email reader can do and users' preferences: what if the user
wants HTML documents to be handled by a web browser? Does he then lose
the ability to handle relative links in HTML documents initially linked
to from email?
Keith