
[spf-discuss] Re: Error in RFC4408: URL encoding

2006-10-01 07:48:32
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Frank Ellermann wrote:
wayne wrote:
it looks like we should never have mentioned "uric" at all.  Is
"unreserved" the correct set to use?

| 3986 UNRESERVED3: ALNUM           - . _ ~

With that we'd percent-encode everything minus LDH and "._~".

For Mail::SPF, I decided to do exactly that:

| use constant uri_unreserved_chars   => 'A-Za-z0-9\-._~';
|     # "unreserved" characters according to RFC 3986 -- not the "uric"
|     # chars!  This deliberately deviates from what RFC 4408 says.
|     # This is a bug in RFC 4408.
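
To make the effect of that character class concrete, here is a minimal sketch in Python (not the actual Mail::SPF code, which is Perl). The function name percent_encode and the choice of UTF-8 for non-ASCII input are my assumptions for illustration; only the character class itself comes from the constant above.

```python
import re

# Same character class as the Mail::SPF constant above:
# RFC 3986 "unreserved" = ALPHA / DIGIT / "-" / "." / "_" / "~"
_NOT_UNRESERVED = re.compile(rb"[^A-Za-z0-9\-._~]")


def percent_encode(value: str) -> str:
    """Percent-encode every byte that is not an RFC 3986 unreserved character.

    Hypothetical helper for illustration only.
    """
    # Assumption: non-ASCII input is serialized as UTF-8 before encoding.
    raw = value.encode("utf-8")
    # Replace each offending byte with "%" followed by two upper-case hex digits.
    encoded = _NOT_UNRESERVED.sub(lambda m: b"%%%02X" % m.group(0)[0], raw)
    return encoded.decode("ascii")
```

With this, percent_encode("user+tag@example.com") yields "user%2Btag%40example.com", while a string made up only of LDH and "._~" passes through unchanged.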

Characters that must be percent-encoded are CTL and SP.

And I thought that  " < > \ ^ ` { | } also "must" be encoded.  But RFC
4622 happily allows almost everything unencoded, the "must" depends on
the scheme, and the part of the URI. 

Right.  But remember that, within the SPF spec, URL-escaping AKA 
percent-encoding is only applied for upper-case macro expansion, which 
itself is usually only used in "exp=" explanation string expansion, e.g.

  exp  TXT  "http://www.%{d2}/why.html?s=%{S}&i=%{I}";

In that case, you don't want the expansion of %{S} to include any ? & % # 
characters, because that would clash with the other, literal ? & #s and 
any literal % percent codes in the raw explanation string.
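
A rough Python illustration of that clash, using the standard urllib.parse module. The sender address, the IP address and the example.org host are made-up values, and the f-strings only stand in for what a real macro expander would produce.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical macro values; the sender deliberately contains & = # characters.
sender = "a&i=10.0.0.1#x@example.org"
ip = "192.0.2.1"

# What a lower-case macro (%{s}) would give: the value is inserted as-is.
raw_url = f"http://www.example.org/why.html?s={sender}&i={ip}"

# What an upper-case macro (%{S}) is meant to give: percent-encoded first.
safe_url = (
    "http://www.example.org/why.html"
    f"?s={quote(sender, safe='')}&i={quote(ip, safe='')}"
)

raw = urlsplit(raw_url)
safe = urlsplit(safe_url)

# Parse the query strings the way a CGI library typically would.
raw_query = parse_qs(raw.query)
safe_query = parse_qs(safe.query)
```

In the unencoded case the receiving side sees s=a and i=10.0.0.1, and the real IP address disappears into the fragment. In the encoded case both parameters come back exactly as they went in.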

The intention of 8.1/26 ("Uppercased macros...") is not merely to produce 
syntactically valid URLs, but to protect the expanded values of macros 
from interpretation by HTTP clients (e.g. web browsers) and client- or 
server-side CGI libraries.

The characters + / , ; = are subject to potentially unwanted 
interpretation by HTTP clients and CGI libs for similar reasons.  I don't 
know about ! $ ' ( ) * @ [ ] -- those might actually be safe, but you 
never know which archaic implementations out there go crazy about them.

#-fragments are now (3986) considered as part of the URL.  But any #
before the fragment has to be encoded.  Unless you parse the URL
(depending on the scheme, some schemes have no concept of fragment) you
can't know what to do with a #.

The other difference between 2396 and 3986 is [ and ], that's for
IPv6-literals in the "authority" part (host, port, etc.) of a URL. 
Unless you parse the URL (see above, some schemes have no concept of
authority, e.g. im:, pres:, mailto:) you can't know what to do with
[ and ].

You are always free to use literal # [ ] (and other non-"unreserved" 
characters) _literally_ in your explanation string:

  exp  TXT  "http://www.%{d2}/why.html?s=%{S}&i=[%{I}]#fragment";
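
As a sketch of how that works out, here is a deliberately simplified expander in Python. It is not how Mail::SPF or any other implementation does it: the expand function and the idea of passing pre-computed values in a dict keyed by the lower-cased macro body ("d2", "s", "i") are assumptions for illustration, and it ignores the %%, %_ and %- escapes as well as the transformer and delimiter rules of RFC 4408. The point is only that literal text in the explanation string is left alone, while upper-case macro values are percent-encoded.

```python
import re
from urllib.parse import quote

# Matches %{x...}: one macro letter followed by optional transformers.
_MACRO = re.compile(r"%\{([A-Za-z])([^}]*)\}")


def expand(template: str, values: dict) -> str:
    """Expand %{...} macros; upper-case macro letters are percent-encoded.

    Hypothetical helper: `values` maps the lower-cased macro body
    (e.g. "s", "i", "d2") to an already-computed string.
    """
    def replace(match):
        letter, rest = match.group(1), match.group(2)
        value = values[letter.lower() + rest]
        # Only the macro's value is encoded, never the surrounding literal text.
        return quote(value, safe="") if letter.isupper() else value

    return _MACRO.sub(replace, template)


template = "http://www.%{d2}/why.html?s=%{S}&i=[%{I}]#fragment"
url = expand(
    template,
    {"d2": "example.org", "s": "a#b@example.org", "i": "192.0.2.1"},
)
```

Here url comes out as http://www.example.org/why.html?s=a%23b%40example.org&i=[192.0.2.1]#fragment, so the literal [ ] and #fragment survive untouched and only the # and @ inside the sender value are escaped.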

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)

iD8DBQFFH9T8wL7PKlBZWjsRAjbaAJ4miE/GJcN2lbHUDMAid0BBPiChUACfR+Nj
9p2xHgcfSrczKKhKK6HBICw=
=K/69
-----END PGP SIGNATURE-----
