ietf-822

Re: Interpretation of RFC 2047

2003-01-05 15:38:47

Pete Resnick wrote:
On 12/24/02 at 12:43 PM -0500, Bruce Lilly wrote:


That depends on the context. In the case of the List-Owner example and a number of fairly widely-used MUAs, the URL is properly decoded when using the List-Owner field content to generate a message to the list owner, and some even decode the 2047 encoding for display.


Absolutely irrelevant. As far as 822/2822/2047 is concerned, List-Owner does not contain a phrase.
[...]
Now, the contents of a List-Owner field can be passed to a 1738/2396 parser and, upon discovering a mailto: URL (as against any other kind of URL), one might parse the URL according to 2368 and discover within the scheme-specific part of the URL something that is a mailbox list. The *result* could then be passed to a 822/2822 parser and *then* to a 2047 parser for display purposes.

We're saying the same thing different ways.

And as far as I can tell, you can't have anything in a URL in a header field that a 2047 parser would recognize as a phrase or a comment.


URLs can contain parentheses


But URLs can't contain "=" or "?", so a 2047 parser is not going to find anything interesting in the parentheses.

Per 2396, an opaque_part can contain '=' and/or '?', and opaque_part
can be part of an absoluteURI.  Also, '?' can delimit a query as part
of a relativeURI, and a query can contain '=' and/or '?'.
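
As a rough illustration of the query case (a sketch only, not a full 2396 parser), the `query = *uric` production can be transcribed into a regular expression and tried against text containing '=' and '?'. The names RESERVED, MARK, URIC, QUERY_RE and is_valid_query are made up for this example:

```python
import re

# Character sets transcribed from RFC 2396, sections 2.2 to 2.4.
RESERVED = r";/?:@&=+$,"      # reserved
MARK = r"\-_.!~*'()"          # mark (hyphen escaped for use in a class)

# uric = reserved | unreserved | escaped
URIC = rf"(?:[{RESERVED}A-Za-z0-9{MARK}]|%[0-9A-Fa-f]{{2}})"

# query = *uric
QUERY_RE = re.compile(rf"{URIC}*")


def is_valid_query(text: str) -> bool:
    """Return True if text matches the RFC 2396 query production."""
    return QUERY_RE.fullmatch(text) is not None


if __name__ == "__main__":
    # '=' and '?' are both reserved characters, hence both are uric.
    print(is_valid_query("us-ascii?q?=3D?=)"))
    # A space is not a uric character.
    print(is_valid_query("a b"))
```

Both '=' and '?' fall in the reserved set, so a query containing them is accepted by this check.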

Examples have been given in earlier messages in this thread; a simplistic matching of header field text using regular expressions might incorrectly match a "comment" where there is none.


But that wasn't the issue. The question that was being asked was whether a simplistic regexp "parser" would accidentally find text which it thought was 2047 syntax. Your claim was that such text could occur in a field which contained a URL.

Yes, and I stand by that claim.

Since "=" and "?" can't appear in a URL according to 1738 and 2396, a 2047 parser should never be tripped up by a URL in a header field.

See 2396 as referenced above.

If you can come up with a serious example of where one might find something that looked like 2047 text in a field where it shouldn't find any, I'd be significantly more concerned.

I've already given several.  The bottom line is that simplistic matching
via regular expressions simply is inadequate by itself for parsing header
fields sufficiently to correctly identify RFC 2047 encoded-words. To
repeat an earlier example (slightly modified):

   Content-Location: http://users.erols.com/blilly/mailparse/(=?us-ascii?q?=3D?=)

That contains a valid RFC 2396 URI. A simplistic regular expression match
as proposed in Charles' Usefor draft would incorrectly identify a comment
and an encoded-word (neither exists in the example).  A *correct* grammar-based
parser would not identify a comment (presuming that the inherent ambiguity
in RFC 2557 is resolved by changing CFWS to FWS in the ABNF); it would
identify a URI.  The URI is parsed per 2396 as follows:

http              -> scheme                                           \
:                 -> ":"                                               |
//                -> "//"                                 \            |
users.erols.com   -> authority                             |           |
/                 -> "/"                      \            |           |
blilly            -> segment \                 |           |           |
/                 -> "/"      |                 > abs_path  > net_path  > absoluteURI
mailparse         -> segment   > path_segments |           |           |
/                 -> "/"      |                |           |           |
(=                -> segment /                /           /            |
?                 -> "?"                                               |
us-ascii?q?=3D?=) -> query                                            /
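
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: NAIVE_ENCODED_WORD is a deliberately simplistic pattern written for this example (it is not the expression from any draft), URI_SPLIT is the generic component-splitting expression given in RFC 2396 Appendix B, and the function names are hypothetical:

```python
import re

# Field body of the Content-Location example above.
FIELD_BODY = "http://users.erols.com/blilly/mailparse/(=?us-ascii?q?=3D?=)"

# Deliberately naive pattern: =?charset?encoding?text?= anywhere in the text.
NAIVE_ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")

# Generic URI-splitting expression from RFC 2396, Appendix B.
URI_SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def naive_encoded_words(text):
    """Return every substring the naive pattern takes for an encoded-word."""
    return [m.group(0) for m in NAIVE_ENCODED_WORD.finditer(text)]


def split_uri(uri):
    """Split a URI into the components named in RFC 2396 Appendix B."""
    m = URI_SPLIT.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
    }


if __name__ == "__main__":
    # Context-free matching reports an encoded-word inside the URI.
    print(naive_encoded_words(FIELD_BODY))
    # Splitting the same text as a URI accounts for every character.
    print(split_uri(FIELD_BODY))
```

The naive pattern reports `=?us-ascii?q?=3D?=`, while the URI split assigns the same characters to the end of the path (`(=`) and to the query (`us-ascii?q?=3D?=)`), matching the breakdown above. Appendix B's expression is itself only a component splitter, not a validating parser; it stands in here for the grammar-based approach.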

