Pete Resnick wrote:
On 12/24/02 at 12:43 PM -0500, Bruce Lilly wrote:
That depends on the context. In the case of the List-Owner example and
a number of fairly widely-used MUAs, the URL is properly decoded when
using the List-Owner field content to generate a message to the list
owner, and some even decode the 2047 encoding for display.
Absolutely irrelevant. As far as 822/2822/2047 is concerned, List-Owner
does not contain a phrase.
[...]
Now,
the contents of a List-Owner field can be passed to a 1738/2396 parser
and, upon discovering a mailto: URL (as against any other kind of URL),
one might parse the URL according to 2368 and discover within the
scheme-specific part of the URL something that is a mailbox list. The
*result* could then be passed to a 822/2822 parser and *then* to a 2047
parser for display purposes.
We're saying the same thing different ways.
And as far as I can tell, you can't have anything in a URL in a
header field that a 2047 parser would recognize as a phrase or a
comment.
URLs can contain parentheses
But URLs can't contain "=" or "?", so a 2047 parser is not going to find
anything interesting in the parentheses.
Per 2396, an opaque_part can contain '=' and/or '?', and opaque_part
can be part of an absoluteURI. Also, '?' can delimit a query as part
of a relativeURI, and a query can contain '=' and/or '?'.
Examples have been given in earlier messages in this thread; a
simplistic matching of header field text using regular expressions
might incorrectly match a "comment" where there is none.
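The character-class point can be checked mechanically. What follows is only a sketch: the patterns are my own transcription of the uric, uric_no_slash, query and opaque_part rules from the RFC 2396 ABNF, the helper names is_query and is_opaque_part are made up for illustration, and the example.com address is hypothetical.

```python
import re

# Character classes transcribed by hand from the RFC 2396 ABNF
# (an assumption of this sketch; check against the RFC before relying on it).
UNRESERVED = r"A-Za-z0-9\-_.!~*'()"   # alphanum | mark
ESCAPED = r"%[0-9A-Fa-f]{2}"          # "%" hex hex

# uric = reserved | unreserved | escaped; the reserved set includes "=" and "?".
URIC = rf"(?:[;/?:@&=+$,{UNRESERVED}]|{ESCAPED})"
# uric_no_slash is the same set without "/".
URIC_NO_SLASH = rf"(?:[;?:@&=+$,{UNRESERVED}]|{ESCAPED})"

QUERY = re.compile(rf"{URIC}*")                        # query = *uric
OPAQUE_PART = re.compile(rf"{URIC_NO_SLASH}{URIC}*")   # uric_no_slash *uric


def is_query(text):
    """True if text matches the transcribed query rule in full."""
    return QUERY.fullmatch(text) is not None


def is_opaque_part(text):
    """True if text matches the transcribed opaque_part rule in full."""
    return OPAQUE_PART.fullmatch(text) is not None


if __name__ == "__main__":
    # Both strings contain "=" and "?" and both are accepted.
    print(is_query("us-ascii?q?=3D?=)"))
    print(is_opaque_part("owner@example.com?subject==?us-ascii?q?hi?="))
```

Under that transcription, both a query and an opaque_part may contain text shaped like an encoded-word.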
But that wasn't the issue. The question that was being asked was whether
a simplistic regexp "parser" would accidentally find text which it
thought was 2047 syntax. Your claim was that such text could occur in a
field which contained a URL.
Yes, and I stand by that claim.
Since "=" and "?" can't appear in a URL
according to 1738 and 2396, a 2047 parser should never be tripped up by
a URL in a header field.
See 2396 as referenced above.
If you can come up with a serious example of where one might find
something that looked like 2047 text in a field where it shouldn't find
any, I'd be significantly more concerned.
I've already given several. The bottom line is that simplistic matching
via regular expressions simply is inadequate by itself for parsing header
fields sufficiently to correctly identify RFC 2047 encoded-words. To
repeat an earlier example (slightly modified):
Content-Location:
http://users.erols.com/blilly/mailparse/(=?us-ascii?q?=3D?=)
That contains a valid RFC 2396 URI. A simplistic regular expression match
as proposed in Charles' Usefor draft would incorrectly identify a comment
and an encoded-word (neither exists in the example). A *correct* grammar-based
parser would not identify a comment (presuming that the inherent ambiguity
in RFC 2557 is resolved by changing CFWS to FWS in the ABNF); it would
identify a URI. The URI is parsed per 2396 as follows:
http              -> scheme                                          \
:                 -> ":"                                             |
//                -> "//"                                 \          |
users.erols.com   -> authority                            |          |
/                 -> "/"                       \          |          |
blilly            -> segment   \               |          |          |
/                 -> "/"       |               |          |          |
mailparse         -> segment   > path_segments > abs_path > net_path > absoluteURI
/                 -> "/"       |               |          |          |
(=                -> segment   /               /          /          |
?                 -> "?"                                             |
us-ascii?q?=3D?=) -> query                                           /
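
To make the contrast concrete, here is a rough sketch in Python. The two regular expressions are hypothetical context-free patterns of my own, not the ones in any draft, and the function names are illustrative. urlsplit from the standard library follows later URI specifications rather than 2396 itself, but for this example it splits at the first "?" just as the breakdown above does.

```python
import re
from urllib.parse import urlsplit

# Field body from the Content-Location example above.
FIELD_BODY = "http://users.erols.com/blilly/mailparse/(=?us-ascii?q?=3D?=)"

# Hypothetical context-free patterns (assumptions of this sketch).
NAIVE_COMMENT = re.compile(r"\(([^()]*)\)")
NAIVE_ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def naive_scan(body):
    """Return the (comment, encoded-word) text a context-free regex scan reports."""
    comment = NAIVE_COMMENT.search(body)
    word = NAIVE_ENCODED_WORD.search(body)
    return (
        comment.group(0) if comment else None,
        word.group(0) if word else None,
    )


def uri_components(body):
    """Split the field body as a URI: scheme, authority, path, query."""
    parts = urlsplit(body)
    return parts.scheme, parts.netloc, parts.path, parts.query


if __name__ == "__main__":
    # The regex scan reports a comment and an encoded-word that do not exist.
    print(naive_scan(FIELD_BODY))
    # The URI split shows the same text is really a path segment plus a query.
    print(uri_components(FIELD_BODY))
```

The regex scan reports both a "comment" and an "encoded-word"; the URI split shows that the same characters are the tail of a path segment followed by a query.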