Re: Cacheing and mhonarc indexes

1998-02-19 16:19:38
Earl Hood wrote:

Basically, an index page is updated whenever the archive is modified,
but if a user has cached an earlier copy of the index, that copy won't
reflect the changes.  So, every time a user looks at the index,
should he/she reload it?

When no Expires or Cache-Control header is sent by the server to the
client, the behavior of the client will depend on how it is
configured.  For example, IE and Netscape can be configured to check
whether a page has changed once per session (i.e. after the browser is
exited and restarted), never, or every time.

Or is the HTTP protocol clever enough to ensure that if a page is
requested from a proxy server that returns a cached copy, the proxy
will first ensure that the cached page is up-to-date, and request a
fresh copy if necessary?

I think it depends on how the proxy server is configured
to deal with data not given an expiration.
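
As a rough illustration of the revalidation being asked about: a cache
that wants to confirm its copy can send a conditional GET carrying an
If-Modified-Since header, and the server answers 304 (Not Modified) if
nothing has changed.  The Python sketch below shows only the
server-side decision.  The function name revalidate and its arguments
are made up for illustration; they are not part of MHonArc or of any
particular server.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def revalidate(if_modified_since, last_modified):
    """Return (status, headers) for a possibly conditional GET.

    if_modified_since: value of the If-Modified-Since header, or None.
    last_modified: timezone-aware datetime of the page's last change.
    """
    # HTTP dates are in GMT with one-second resolution.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    cached = None
    if if_modified_since:
        try:
            cached = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            cached = None  # unparsable date: treat as an ordinary GET
    if cached is not None:
        if cached.tzinfo is None:
            cached = cached.replace(tzinfo=timezone.utc)
        if last_modified <= cached:
            return 304, headers  # the cached copy is still current

    return 200, headers  # send a fresh copy
```

Whether a given proxy actually issues such a request for a page with
no expiration is still a matter of its configuration.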

I was wondering about using the "expires" META tag to expire pages.
But unfortunately, from what I understand of the specs, this tag only
allows specification of an absolute date, not of an offset, which is
a bit limiting.  I would like to be able to specify a "24 hours"
value for "expires", because whilst an absolute date of now+24hours
would work fine whilst the archives are current, I don't like to
think of the ugliness of indexes to old periods being
permanently stamped as "expired".

You may want to see if you can specify a Cache-Control directive in a
META tag and see if it will work.  It does assume that the client
supports HTTP/1.1 and that the Cache-Control directive is honored when
it appears in a META tag.  Check the HTTP/1.1 spec for more information
about Cache-Control.  There seems to be no relative time specification
for Expires.
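
On the relative-time point: the HTTP/1.1 Cache-Control header has a
max-age directive that takes a lifetime in seconds, so a 24-hour value
would be max-age=86400.  Whether a browser honors it from a META tag is
a separate, unverified question.  The Python sketch below assumes the
headers are sent by the server itself (for example from a CGI wrapper);
freshness_headers is a hypothetical helper, not a MHonArc feature.  It
also computes an absolute Expires at response time for HTTP/1.0
clients, so no fixed date is ever stamped into the page.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def freshness_headers(lifetime_seconds=24 * 60 * 60, now=None):
    """Build response headers that give a page a relative lifetime."""
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc).replace(microsecond=0)
    expires = now + timedelta(seconds=lifetime_seconds)
    return {
        # HTTP/1.1: lifetime relative to the time of the response.
        "Cache-Control": "max-age=%d" % lifetime_seconds,
        # HTTP/1.0 fallback: absolute date, recomputed on every response.
        "Expires": format_datetime(expires, usegmt=True),
    }
```

Because both values are generated per response, an index for an old
period is served with the same 24-hour lifetime as a current one.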

BTW, if you get something that does work, post your solution to
this group.  This topic looks worthy enough to include in the FAQ.

I found the following on Microsoft's site:

Summary: the client has control over cacheing.  It is a setting in
their browser.  I am still looking into this.  If I find more/different
information I will post.

To cache or not to cache
Dear Web Men: 

I am looking for a solution that will minimize the problem of lost ad
revenue due to page caching. I have seen references to the following

<meta http-equiv="pragma" content="no-cache"> 

<meta http-equiv="expires" content="0"> 

<meta name="expires" content="Wed, 01 Jan 1997 00:00:00 GMT"> 

Will any of these solutions work? 

Thanks in advance for your efforts. 

David Cost 

The Web Men reply: 

Well, David, this turned out to be a very interesting question. With
pipes in hand, we donned our Sherlock Holmes hats and went out snooping
for an answer. The general answer is, yes, it is possible to prevent
caching; unfortunately, the ways you have listed aren't exactly best to
accomplish the purpose. Indeed, a lot of folks have run across the same
references, but the fact is that you must be connected to a secure site
to achieve what you are looking for. For our discussion, we have
provided some general information on caching and some links to
information on working with secure sites. Hopefully, all of this
together will get you moving in the right direction. 

The term "caching" means different things to different folks. One might
look at caching as the processes of a browser looking into the cache,
which is a directory on your hard drive, to retrieve data when browsing
the Internet. If you have recently visited a page on the Internet and
you return to it, the browser can check this cache directory, and, if
none of the data has changed since your last visit, can get the data
from your hard drive -- thereby speeding up the access time. 

With Internet Explorer you have control over how the caching will
work. Other browsers can handle caching a little differently, so we
encourage you to refer to your preferred browser's documentation for
more information. As for Internet Explorer, you can control caching in
Internet Explorer 4.0 by selecting Internet Options from the View Menu;
on the General Tab, under Temporary Internet Files, click the Settings
Button. (In Internet Explorer 3.0, choose Options from the View Menu;
click the Advanced Tab, and, under Temporary Internet Files, click the
Settings Button.) You have three options, which correspond to the
following processes: "Always", "Never" and "Once". 

"Always" means that Internet Explorer checks the cache and, if the page
is found, will check with the server to see if the page has changed. If
it has, Internet Explorer gets the page from the server; otherwise, it
pulls the cached copy of the page on your hard drive. 

"Never" means that Internet Explorer checks the cache and, if the page
is found, does not bother checking with the server; it automatically
pulls your cached copy of the page. 

"Once" means Internet Explorer checks the cache the first time a page is
browsed during that Internet Explorer session. If the page is found,
Internet Explorer checks with the server once to see if the page has
changed; however, for the rest of the current session that page is
handled as if the "Never" option had been set. 
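
The three settings described above can be summarized as a small
decision rule.  This Python sketch only models the description in the
article; it is not Internet Explorer's actual implementation, and
contacts_server and its parameters are names invented here.

```python
def contacts_server(setting, in_cache, checked_this_session):
    """Decide whether the browser goes to the server for a page.

    setting: "Always", "Never" or "Once", as described in the article.
    in_cache: True if a copy of the page is in the cache directory.
    checked_this_session: True if the page was already checked since
        the browser was started.
    """
    if not in_cache:
        return True  # nothing cached, so the page has to be fetched
    if setting == "Always":
        return True  # check with the server on every visit
    if setting == "Never":
        return False  # use the cached copy without checking
    if setting == "Once":
        # Check on the first visit of the session, then act like "Never".
        return not checked_this_session
    raise ValueError("unknown setting: %r" % (setting,))
```

For the index-page problem, a reader whose browser is set to "Never"
would keep seeing the old index until reloading it by hand.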

Another way to regard "caching" is the way in which David is referring
to it above -- a way to prevent the browser from storing data, or
"caching", to the hard drive. The reason one would want to do this is to
prevent any type of secure data from residing on the hard drive during
or after the browser session is completed. 

The idea of preventing the browser from placing incoming data on the
hard drive is something that is handled by both the server and client.
First, the server must be set up to handle secure connections through a
technology such as the Secure Socket Layer (SSL). Then you must connect
to that server via "HTTPS" to initiate the secure connection. This will
allow a "pragma: no-cache" header to be passed to the server, preventing
data from being stored on the hard drive. You can find a wealth of
information on this subject in the Programming section of Site Builder
Workshop and at the Microsoft Security Advisor site.

We should stress again that none of the <META> tags suggested will
prevent data from being placed on the hard drive; this is only something
for a secure connection. You should also use caution when using <META>
tags because they may not always be handled as one might think. As the
HTML specification, RFC1866, mentions, not all headers will work as a
<META> tag: 

"HTTP servers may read the content of the document <HEAD> to generate
header fields corresponding to any elements defining a value for the
attribute HTTP-EQUIV. 

"NOTE - The method by which the server extracts document
meta-information is unspecified and not mandatory. The <META> element
only provides an extensible mechanism for identifying and embedding
document meta-information -- how it may be used is up to the individual
server implementation and the HTML user agent." 
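
To make the quoted passage concrete: a server that chose to honor
HTTP-EQUIV could scan the document for <META> elements and turn each
one into a response header.  The Python sketch below is one
hypothetical way to do that.  The class and function names are invented
for illustration, and nothing in the article says that any particular
server works this way.

```python
from html.parser import HTMLParser


class HttpEquivCollector(HTMLParser):
    """Collect (name, value) pairs from <META HTTP-EQUIV=...> elements."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("http-equiv")
        content = attributes.get("content")
        # <META NAME=...> has no HTTP-EQUIV and is deliberately skipped.
        if name and content is not None:
            self.headers.append((name, content))


def headers_from_document(html_text):
    """Return the header fields a server might generate from a document."""
    collector = HttpEquivCollector()
    collector.feed(html_text)
    collector.close()
    return collector.headers
```

Fed the three tags from David's letter, this picks up the "pragma" and
"expires" HTTP-EQUIV entries and ignores the one that uses NAME, which
has no header equivalent at all.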

With Internet Explorer, <META> tags are processed after the data has
been rendered to the hard drive, thereby defeating our purpose. For more
information on this RFC and HTML specifications, see the World Wide Web
Consortium site.