At 12:29 03/12/2007, you wrote:
Paul Smith wrote:
How do you think you should parse this header according to RFC 2822?
From: Joe \(Joseph\) Bloggs <joe(_at_)joe(_dot_)com>
My reading of RFC 2822 says that this is three "tokens", 'Joe', '\'
and an unfinished comment '(Joseph) Bloggs <joe(_at_)joe(_dot_)com>'
As far as I can see you can have quoted pairs like '\)' in
comments, but not outside comments, so the '\' before the '(' isn't
a quoting '\' but a real '\'. Outside comments you use '"' characters
(a quoted-string) to quote.
Is this right, or am I missing something?
AFAICS, syntactically correct formations of the line could be:
From: "Joe (Joseph) Bloggs" <joe(_at_)joe(_dot_)com>
From: (Joe \(Joseph\) Bloggs) joe(_at_)joe(_dot_)com
From: Joe (\(Joseph\)) Bloggs <joe(_at_)joe(_dot_)com>
I am not sure the second one is a valid legacy format.
My view is why are you parsing the display name?
To display in an email client. I want to split the field into two
parts - the address and the display name.
There are really just two parts here:
angle-addr --> <joe(_at_)joe(_dot_)com>
display-name --> everything else, who cares!
Generally, you have these two formats to check:
displayname <addr-spec> current
addr-spec (displayname) legacy
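
A minimal sketch of that two-format check in Python, assuming the header value holds exactly one mailbox. The names `ANGLE_FORM`, `LEGACY_FORM` and `naive_split` are made up for illustration, and addresses are written with a real '@' and '.' rather than the archive's obfuscated form. It is deliberately naive: it knows nothing about comments or quoted-pairs.

```python
import re

# Hypothetical patterns for the two common layouts only.
# "displayname <addr-spec>" (current form)
ANGLE_FORM = re.compile(r'^\s*(?P<name>.*?)\s*<(?P<addr>[^<>]+)>\s*$')
# "addr-spec (displayname)" (legacy form)
LEGACY_FORM = re.compile(r'^\s*(?P<addr>\S+)\s*\((?P<name>.*)\)\s*$')

def naive_split(value):
    """Return (display_name, address) for the two simple layouts, else None."""
    m = ANGLE_FORM.match(value)
    if m:
        # Drop surrounding double quotes from a quoted display name
        return m.group('name').strip('"'), m.group('addr')
    m = LEGACY_FORM.match(value)
    if m:
        return m.group('name'), m.group('addr')
    return None

print(naive_split('Joe Bloggs <joe@joe.com>'))
print(naive_split('joe@joe.com (Joe Bloggs)'))
```

This is fine for well-behaved headers, but it has no notion of where a comment starts or ends, so it cannot be trusted on anything awkward.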
Generally - but it isn't that simple AFAICS. In the second, the
'(displayname)' is simply a comment (included in the CFWS part of
obs-mbox-list). In RFC 2822, obs-mbox-list is defined as
obs-mbox-list = 1*([mailbox] [CFWS] "," [CFWS]) [mailbox]
So, for all except the last mailbox, the comment should come after
the mailbox, but for the last one, it should come before. It also
looks like you aren't allowed a comment if you only have one mailbox?
Our parser will handle the comment either before or after the mailbox
in both cases, as we've seen both of these.
AFAICS RFC 822 just allows comments anywhere, so before or after are
both allowed by RFC 822. Potentially, RFC 822 would allow
From: (Joe) joe(that's Joseph)@joe(that's joe's company (oh yes it is)).com
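
To show what "comments anywhere" means for a parser, here is a rough Python sketch that removes comments from a header value. `strip_comments` is a hypothetical helper, not taken from any RFC or library. It assumes comments nest, that a '\' inside a comment or quoted-string starts a quoted-pair, and that parentheses inside a quoted-string are not comments.

```python
def strip_comments(value):
    """Remove comments, honouring nesting, quoted-pairs and quoted-strings."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a "quoted-string"
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == '\\' and i + 1 < len(value) and (depth or in_quotes):
            # quoted-pair: the next character is literal, whatever it is
            if not depth:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth:
            # inside a comment: only track nesting, keep nothing
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
        elif in_quotes:
            out.append(ch)
            if ch == '"':
                in_quotes = False
        elif ch == '(':
            depth = 1
        else:
            out.append(ch)
            if ch == '"':
                in_quotes = True
        i += 1
    return ''.join(out)

header = "(Joe) joe(that's Joseph)@joe(that's joe's company (oh yes it is)).com"
print(strip_comments(header).strip())
```

With the comments gone, what is left of the header above is just the addr-spec.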
Also the form
says that 'displayname' is a 'phrase', which can be 1*word or obs-phrase
and obs-phrase can contain CFWS
So, RFC 2822 lets you have
From: Joe (Joseph) Bloggs <joe(_at_)joe(_dot_)com>
Here, "Joseph" is an allowed comment in the displayname part of the
header field. Comments support quoted-pairs, but the rest of the
displayname part doesn't seem to. Hence my original question :)
If you don't handle this correctly, you'd have problems with:
From: Joe (Joseph \) Bloggs <fred(_at_)fred(_dot_)com> \(my company) <joe(_at_)joe(_dot_)com>
Here, the display name is "Joe", with a comment "Joseph ) Bloggs
<fred(_at_)fred(_dot_)com> (my company", and an email address <joe(_at_)joe(_dot_)com>
You have to parse the comments to know that the '\)' after the '(' is
not a closing of the comment, but a quoted-pair character.
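
As a sketch of that comment scan in Python: `skip_comment` is a hypothetical helper that, given the index of an opening '(', returns the index just past the matching ')'. It assumes nesting is allowed and that '\' always starts a quoted-pair inside a comment. The example uses the header described above, ending in <joe@joe.com>, with a real '@' and '.' in place of the archive's obfuscation.

```python
def skip_comment(text, start):
    """Return the index just past the comment that opens at text[start].

    Assumes text[start] == '('. Returns None if the comment never closes.
    """
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == '\\':
            i += 2          # quoted-pair: skip the backslash and the next char
            continue
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return None

value = r'Joe (Joseph \) Bloggs <fred@fred.com> \(my company) <joe@joe.com>'
start = value.index('(')
end = skip_comment(value, start)
print(value[start:end])   # the whole comment, including the fake address
print(value[end:])        # what is left for the real angle-addr
```

The same function returns None for the header in the original question, since the '\)' there is swallowed as a quoted-pair and the comment is never finished.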
Also, simply checking for <address> first here would break things, as
<fred(_at_)fred(_dot_)com> isn't the address.
Also, '<addr-spec>' is angle-addr, which can be obs-angle-addr which
is defined as
[CFWS] "<" [obs-route] addr-spec ">" [CFWS]
So, you could have
From: Joe (Joseph) Bloggs <joe(_at_)joe(_dot_)com> (Another comment)
or (to be really awkward)
From: Joe (Joseph\) Bloggs <fred(_at_)fred(_dot_)com> ) <joe(_at_)joe(_dot_)com> (Another
comment \) <bloggs(_at_)bloggs(_dot_)com> )
since comments are allowed after the <addr-spec> in obs-angle-addr.
So - parsing it isn't that simple...
To make things even more fun, you can have comments inside the <> as
well in RFC 2822 - see obs-route. You're unlikely to have them since
people don't generally use source routes any more, but you need to be
able to parse them.
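
Putting the pieces together, a rough single-pass splitter in Python might look like the following. `split_mailbox` is a hypothetical name, and this is only a sketch under several assumptions: one mailbox per header value, comments dropped everywhere (including inside the <>), quoted-strings unquoted, and, when there is no angle-addr, the first comment used as the display name. It does not treat obs-route or quoted local-parts specially.

```python
def split_mailbox(value):
    """Split one mailbox into (display_name, address)."""
    name, addr, comments = [], [], []
    depth = 0                      # comment nesting level
    in_quotes = in_angle = saw_angle = False
    i = 0
    while i < len(value):
        ch = value[i]
        target = addr if in_angle else name
        if ch == '\\' and (depth or in_quotes) and i + 1 < len(value):
            # quoted-pair: keep the escaped character, drop the backslash
            (comments[-1] if depth else target).append(value[i + 1])
            i += 2
            continue
        if depth:
            depth += (ch == '(') - (ch == ')')
            if depth:              # still inside: record the comment text
                comments[-1].append(ch)
        elif in_quotes:
            if ch == '"':
                in_quotes = False
            else:
                target.append(ch)
        elif ch == '(':
            depth = 1
            comments.append([])
        elif ch == '"':
            in_quotes = True
        elif ch == '<' and not in_angle:
            in_angle = saw_angle = True
        elif ch == '>' and in_angle:
            in_angle = False
        else:
            target.append(ch)
        i += 1
    text = ' '.join(''.join(name).split())   # collapse runs of whitespace
    if saw_angle:
        return text, ''.join(addr).strip()
    # Legacy layout: what is left is the addr-spec, the first comment the name
    legacy_name = ''.join(comments[0]) if comments else ''
    return legacy_name, text

awkward = (r'Joe (Joseph\) Bloggs <fred@fred.com> ) <joe@joe.com> '
           r'(Another comment \) <bloggs@bloggs.com> )')
print(split_mailbox(awkward))
print(split_mailbox('joe@joe.com (Joe Bloggs)'))
```

The point of the single pass is that the '<' is only honoured when the scanner is outside both comments and quoted-strings, which is what stops the decoy addresses from being picked up.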