Re: multipart/mixed filter for netscape mail

1998-07-27 16:43:45
On July 27, 1998 at 14:34, Joe Brennan wrote:

    ## Get/remove BASE url
    if ($data =~ s%(<base\s[^>]*>)%%i) {
        $tmp = $1;
        ($base) = $tmp =~ m%href\s*=\s*['"]([^'"]+)['"]%i;
        $base =~ s%(.*/).*%$1%;
    } elsif (defined($tmp = $fields{'content-base'}) ||
             defined($tmp = $fields{'content-location'})) {
        $base = $tmp;
        $base =~ s/['"]//g;
        $base =~ s%(.*/).*%$1%;
    }

If you do any variation of
      ($base) = $tmp =~ s/['"]//g;
$base gets set to the return value of the s/// operator (the number of
substitutions made), not the modified string.  This one's got me before.
Found by printing out what the variables were getting set to.

Another typo on my part.  The following will work:

        ($base = $tmp) =~ s/['"]//g;

This is what I meant.
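
To make the difference concrete, here is a minimal sketch comparing the two forms. The header value in $tmp is a made-up example, and $copy and $wrong are names introduced only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical header value, quoted the way a Content-Base field might be.
my $tmp = q{"http://www.example.com/docs/page.html"};

# Mistaken form: s/// edits its target in place and returns the number
# of substitutions, so $wrong receives a count, not a URL.
my $copy = $tmp;                       # work on a copy so $tmp survives
my ($wrong) = $copy =~ s/['"]//g;

# Intended form: assign first, then run the substitution on the copy.
(my $base = $tmp) =~ s/['"]//g;

print "wrong: $wrong\n";   # prints "wrong: 2" (two quotes were removed)
print "base:  $base\n";    # prints the URL without the quotes
```

The parentheses around the assignment are what make =~ bind to $base rather than to $tmp, which is why $tmp is left untouched.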

I also put the last $base = inside the if/elsif so $base remains
undefined if neither is true.

Having it outside of the conditionals should be okay, since if
$base is not set it is just a no-op ($base is initialized to "").

Note.  According to RFC 2110, Content-Location is the page being
included in the mail, expressed either as an absolute URL or as a
relative URL to Content-Base or a BASE= tag.  Content-Base, like the
BASE= tag, is not a complete URL but the path to be prepended to other
relative URLs.  But when Netscape 4.05 does "send page" it puts the
absolute URL of the page being mailed into Content-Base, which is
wrong, to my understanding of RFC 2110 from a couple of readings.
Luckily, MHonArc handles this because it already knows to strip off
from the last "/" to the end.  Good!  Don't change that.

If Content-Base is to mirror the BASE HTML element, then including the
document as part of the URI is valid since the BASE HTML element can be
used to identify the "authoritative" location of the document.  And
base has been used this way since the early days (and the HTML 4.0 spec
confirms this).  Hence, MHonArc always stripped the trailing component
(as browsers do).
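
As a rough illustration of that stripping, the sketch below wraps the substitution from the code above in a helper. The name strip_last_component and the example.com URLs are assumptions made for this example only, not part of MHonArc.

```perl
use strict;
use warnings;

# Hypothetical helper; the substitution is the one quoted in this thread.
sub strip_last_component {
    my ($base) = @_;
    $base =~ s%(.*/).*%$1%;   # keep everything up to and including the last "/"
    return $base;
}

# A Netscape-style Content-Base that names the document itself:
print strip_last_component('http://www.example.com/docs/page.html'), "\n";

# A Content-Base that is already a directory path is left unchanged:
print strip_last_component('http://www.example.com/docs/'), "\n";
```

Both calls should yield the same directory URL, and an empty string passes through untouched because the pattern needs at least one "/" to match.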

RFC 2110 does not state explicitly, where Content-Base is described,
whether Content-Base is to behave exactly the same as the BASE HTML
element, but I think that is the intent.  This intent is confirmed
later in the RFC by the following:

   If there is a Content-Base header, then the recipient MUST employ
   relative to absolute resolution as defined in RFC 1808 [RELURL] of
   relative URIs in both the HTML markup and the Content-Location header
   before matching a hyperlink in the HTML markup to a Content-Location
   header. The same applies if the Content-Location contains an absolute
   URI, and the HTML markup contains a BASE element so that relative
   URIs in the HTML markup can be resolved.

If you check RFC 1808, there is a rule about removing the last segment
from the base (Step 6).
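
A very simplified sketch of that step follows, assuming plain http-style URLs. The helper name resolve_relative is hypothetical, and this is not a full RFC 1808 resolver: it ignores "." and ".." segments, queries, fragments, and "//host" net paths.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating only the "remove the last segment of
# the base, then append the relative path" rule.
sub resolve_relative {
    my ($base, $rel) = @_;

    # A URL that already carries a scheme is returned as-is.
    return $rel if $rel =~ m%^[A-Za-z][A-Za-z0-9+.-]*:%;

    # An absolute path replaces the whole path of the base.
    if ($rel =~ m%^/%) {
        my ($root) = $base =~ m%^([A-Za-z][A-Za-z0-9+.-]*://[^/]+)%;
        return defined $root ? $root . $rel : $rel;
    }

    # Relative path: drop the last segment of the base and append.
    (my $dir = $base) =~ s%(.*/).*%$1%;
    return $dir . $rel;
}

print resolve_relative('http://www.example.com/docs/page.html',
                       'images/logo.gif'), "\n";
```

With the base naming the document itself, as Netscape sends it, the relative image URL still resolves into the document's directory.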


             Earl Hood              | University of California: Irvine
      ehood(_at_)medusa(_dot_)acs(_dot_)uci(_dot_)edu      |      Electronic Loiterer
                                    | Dabbler of SGML/WWW/Perl/MIME
