Re: Parsing a header

2007-12-03 07:02:14

At 12:29 03/12/2007, you wrote:
Paul Smith wrote:
How do you think you should parse this header according to RFC 2822?
From: Joe \(Joseph\) Bloggs <joe(_at_)joe(_dot_)com>
My reading of RFC 2822 says that this is three "tokens": 'Joe', '\', and an unfinished comment '(Joseph) Bloggs <joe(_at_)joe(_dot_)com>'. As far as I can see, you can have quoted-pairs like '\)' in comments but not outside them, so the '\' before the '(' isn't a quoting '\' but a literal '\'. Outside comments you use " characters for quoting.
Is this right, or am I missing something?
AFAICS, syntactically correct formations of the line could be:
From: "Joe (Joseph) Bloggs" <joe(_at_)joe(_dot_)com>
From: (Joe \(Joseph\) Bloggs) joe(_at_)joe(_dot_)com
or even
From: Joe (\(Joseph\)) Bloggs <joe(_at_)joe(_dot_)com>
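
The reading above can be checked mechanically. This is only a minimal Python sketch, not a full RFC 2822 tokenizer: the helper name comment_end is made up for illustration, the addresses are written with a plain '@' and '.' instead of the archive's (_at_) and (_dot_) forms, and it assumes that a backslash forms a quoted-pair only inside a comment.

```python
def comment_end(value, start):
    """Return the index of the ')' closing the comment opened at value[start],
    or -1 if the comment never closes. Assumes value[start] == '('."""
    depth = 0
    i = start
    while i < len(value):
        ch = value[i]
        if ch == "\\":
            i += 2          # quoted-pair: skip the backslash and the next char
            continue
        if ch == "(":
            depth += 1      # comments may nest
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1


# The '\' before the first '(' is outside any comment, so it is a literal
# character and the comment starts at the '(' that follows it.
original = r"Joe \(Joseph\) Bloggs <joe@joe.com>"
print(comment_end(original, original.index("(")))   # -1: unfinished comment

fixed = r"Joe (\(Joseph\)) Bloggs <joe@joe.com>"
print(comment_end(fixed, fixed.index("(")))         # 15: the second ')'
```

In the original header the '\)' is swallowed as a quoted-pair, so the comment runs to the end of the line. In the third rewritten form the outer parentheses close properly.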

Hi Paul,

I am not sure the second one is valid legacy format.

Why not?

My view is why are you parsing the display name?

To display in an email client. I want to split the field into two parts - the address and the display name.

There are really just two parts here:

    angle-addr   -->  <joe(_at_)joe(_dot_)com>
    display-name -->  everything else, who cares!

Generally, you have these two formats to check:

    displayname <addr-spec>      current
    addr-spec (displayname)      legacy

Generally - but it isn't that simple AFAICS. In the second, the '(displayname)' is simply a comment (included in the CFWS part of 'obs-mbox-list').

obs-mbox-list in RFC 2822 is defined as

obs-mbox-list   =       1*([mailbox] [CFWS] "," [CFWS]) [mailbox]

So, for all except the last mailbox, the comment should come after the mailbox, but for the last one, it should come before. It also looks like you aren't allowed a comment if you only have one mailbox?

Our parser will handle the comment either before or after the mailbox in both cases, as we've seen both of these.

AFAICS RFC 822 just allows comments anywhere, so before or after are both allowed by RFC 822. Potentially, RFC 822 would allow
From: (Joe) joe(that's Joseph)@joe(that's joe's company (oh yes it is)).com
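
To show what "comments anywhere" means for a parser, here is a rough Python sketch that removes nested comments and keeps whatever is left. The function name strip_comments is hypothetical, quoted-strings are deliberately ignored, and the address is written with a plain '@' and '.' instead of the archive's obfuscated form.

```python
def strip_comments(value):
    """Remove (possibly nested) comments from a header value.

    Quoted-pairs are honoured inside comments only; quoted-strings are not
    handled in this sketch.
    """
    out = []
    depth = 0                   # current comment nesting depth
    i = 0
    while i < len(value):
        ch = value[i]
        if depth and ch == "\\":
            i += 2              # quoted-pair inside a comment: skip both chars
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
        elif depth == 0:
            out.append(ch)      # keep only text outside comments
        i += 1
    if depth:
        raise ValueError("unterminated comment")
    return "".join(out)


header = "(Joe) joe(that's Joseph)@joe(that's joe's company (oh yes it is)).com"
print(strip_comments(header).strip())   # joe@joe.com
```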

Also the form
displayname <addr-spec>
says that 'displayname' is a 'phrase', which can be 1*word or obs-phrase, and obs-phrase can contain CFWS.

So, RFC 2822 lets you have

From: Joe (Joseph) Bloggs <joe(_at_)joe(_dot_)com>

Here, "Joseph" is an allowed comment in the displayname part of the header field. Comments support quoted-pairs, but the rest of the displayname part doesn't seem to. Hence my original question :)

If you don't handle this correctly, you'd have problems with:

From: Joe (Joseph \) Bloggs <fred(_at_)fred(_dot_)com> \(my company) <joe(_at_)joe(_dot_)com>

Here, the display name is "Joe", with a comment "Joseph ) Bloggs <fred(_at_)fred(_dot_)com> (my company", and an email address <joe(_at_)joe(_dot_)com>. You have to parse the comments to know that the '\)' after the '(' is not the closing of the comment, but a quoted-pair character.

Also, simply checking for <address> first here would break things, as <fred(_at_)fred(_dot_)com> isn't the address.
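
The difference between the two approaches can be sketched in Python. Both the helper name first_angle_addr and the naive regular expression are illustrative assumptions, not anyone's actual parser. The header is written with plain '@' and '.' characters, and with the real <joe@joe.com> address following the comment as described above.

```python
import re


def first_angle_addr(value):
    """Return the first <...> that lies outside comments and quoted-strings,
    or None if there is none. Comments inside the <> are not handled."""
    depth = 0            # comment nesting depth
    in_quotes = False
    i = 0
    while i < len(value):
        ch = value[i]
        if (depth or in_quotes) and ch == "\\":
            i += 2       # quoted-pair: valid in comments and quoted-strings
            continue
        if in_quotes:
            in_quotes = ch != '"'
        elif ch == '"' and depth == 0:
            in_quotes = True
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
        elif ch == "<" and depth == 0:
            end = value.find(">", i)
            return value[i:end + 1] if end != -1 else None
        i += 1
    return None


header = r"Joe (Joseph \) Bloggs <fred@fred.com> \(my company) <joe@joe.com>"
naive = re.search(r"<[^>]*>", header).group(0)
print(naive)                      # <fred@fred.com>
print(first_angle_addr(header))   # <joe@joe.com>
```

The naive search picks up the address inside the comment, while the scan that tracks comment depth and quoted-pairs skips it.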

Also, '<addr-spec>' is angle-addr, which can be obs-angle-addr, which is defined as:
[CFWS] "<" [obs-route] addr-spec ">" [CFWS]

So, you could have

From: Joe (Joseph) Bloggs <joe(_at_)joe(_dot_)com> (Another comment)

or (to be really awkward)
From: Joe (Joseph\) Bloggs <fred(_at_)fred(_dot_)com> ) <joe(_at_)joe(_dot_)com> (Another comment \) <bloggs(_at_)bloggs(_dot_)com> )

Both are possible because comments are allowed after the <addr-spec> in obs-angle-addr.

So - parsing it isn't that simple...

To make things even more fun, you can have comments inside the <> as well in RFC 2822 - see obs-route. You're unlikely to have them since people don't generally use source routes any more, but you need to be able to parse them.
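
Putting the pieces together, a display-name/address splitter might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the name split_mailbox, the choice to use the comments as the display name in the legacy form, and the plain '@' and '.' in the sample addresses. It handles one mailbox only and does not parse obs-route source routes.

```python
def split_mailbox(value):
    """Split a From-style value into (display_name, address, comments)."""
    name, addr, comments = [], [], []
    buf = []             # text of the top-level comment being read
    depth = 0            # comment nesting depth
    in_quotes = in_angle = saw_angle = False
    i = 0
    while i < len(value):
        ch = value[i]
        if depth:                                   # inside a comment
            if ch == "\\" and i + 1 < len(value):
                i += 1
                buf.append(value[i])                # quoted-pair: keep escaped char
            elif ch == ")" and depth == 1:
                depth = 0
                comments.append("".join(buf))
                buf = []
            else:
                if ch == "(":
                    depth += 1                      # nested comment opens
                elif ch == ")":
                    depth -= 1                      # nested comment closes
                buf.append(ch)
        elif in_quotes:                             # inside a quoted-string
            if ch == '"':
                in_quotes = False
            else:
                if ch == "\\" and i + 1 < len(value):
                    i += 1
                    ch = value[i]
                (addr if in_angle else name).append(ch)
        elif ch == "(":
            depth = 1
        elif ch == '"':
            in_quotes = True
        elif ch == "<" and not saw_angle:
            in_angle = saw_angle = True
        elif ch == ">" and in_angle:
            in_angle = False
        else:
            (addr if in_angle else name).append(ch)
        i += 1
    if depth or in_quotes or in_angle:
        raise ValueError("unterminated comment, quoted-string or angle-addr")
    name = " ".join("".join(name).split())          # collapse whitespace
    addr = "".join("".join(addr).split())
    if not saw_angle:
        # Legacy form "addr-spec (displayname)": what is left is the address,
        # and (a choice made in this sketch) the comments supply the name.
        addr, name = name.replace(" ", ""), " ".join(comments)
    return name, addr, comments


awkward = (r"Joe (Joseph\) Bloggs <fred@fred.com> ) <joe@joe.com> "
           r"(Another comment \) <bloggs@bloggs.com> )")
print(split_mailbox(awkward)[:2])                     # ('Joe', 'joe@joe.com')
print(split_mailbox("joe@joe.com (Joe Bloggs)")[:2])  # ('Joe Bloggs', 'joe@joe.com')
```

Comments are collected wherever they appear, including inside the <>, so neither of the bracketed addresses hidden in the comments of the awkward example is mistaken for the real one.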
