++ 16/10/99 08:59 -0500 - Philip Guenther:
While it is true that the high-level syntax of a Message-Id: header does
not mention comments or whitespace, this is because they both disappear
during the lexical analysis. To quote rfc822, section 3.1.2:
Note: Any field which has a field-body that is defined as
other than simply <text> is to be treated as a structured
field.
Then in section 3.1.4:
To aid in the creation and reading of structured fields, the
free insertion of linear-white-space (which permits folding
by inclusion of CRLFs) is allowed between lexical tokens.
Then follows a precise listing of the lexical tokens of a structured
header field.
Wow... I had never noticed that before! So, to put it in simpler
words: one is allowed to insert spaces to increase readability?
And that would mean that:
Message-ID: < local_part @ domain_part >
...is an RFC-valid Message-ID?
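To make the whitespace question concrete, here is a minimal sketch in Python (not procmail) of a pattern that tolerates whitespace between the lexical tokens. The names WS, LOCAL_PART and DOMAIN and their definitions are assumptions made for illustration only; the real $ws, $local_part and $domain definitions are not shown in this thread. re.IGNORECASE stands in for procmail's default case-insensitive matching, so both 'Message-Id' and 'Message-ID' are accepted.

```python
import re

# Hypothetical stand-ins for the procmail variables; the actual
# definitions used in the recipe are not shown in the thread.
WS = r"[ \t]*"                 # optional spaces/tabs between tokens
LOCAL_PART = r"[^\s()<>@]+"    # simplified: a run of non-special characters
DOMAIN = r"[^\s()<>@]+"        # simplified in the same way

# Whitespace is allowed between every pair of lexical tokens.
MESSAGE_ID = re.compile(
    rf"^Message-Id:{WS}<{WS}{LOCAL_PART}{WS}@{WS}{DOMAIN}{WS}>",
    re.IGNORECASE,
)


def looks_like_message_id(header: str) -> bool:
    """Return True if the header line matches the tolerant pattern."""
    return MESSAGE_ID.match(header) is not None
```

With these assumed definitions, both the compact form `Message-Id: <abc@example.org>` and the spaced-out form quoted above are accepted, while a header without an at-sign is not.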
Additionally, though it is not very important: should the trailing 'D' be
capitalized or not? In the RFC I only see it with a capital 'D' at
the end, while you almost always write it with a small 'd'.
The reason your condition matches too often is that the at-sign is
doubled in it:
*$ ! ^Message-Id:$ws<$ws$local_part$ws@@$ws$domain$ws>
                                      ^^
Remove one of those.
Stupid... typo.
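The effect of the doubled at-sign can be sketched in Python (an approximation, not procmail itself). The leading '!' in the condition negates the match, so a pattern that can almost never match makes the negated condition true for nearly every message. WS and PART are hypothetical stand-ins for the recipe's variables, and build() and flagged() are helper names invented for this example.

```python
import re

# Hypothetical stand-ins for $ws, $local_part and $domain.
WS = r"[ \t]*"
PART = r"[^\s()<>@]+"


def build(at_sign: str) -> re.Pattern:
    """Build the Message-Id pattern with the given at-sign text."""
    return re.compile(
        rf"^Message-Id:{WS}<{WS}{PART}{WS}{at_sign}{WS}{PART}{WS}>",
        re.IGNORECASE,
    )


BROKEN = build("@@")  # the typo: two at-signs in a row
FIXED = build("@")    # the corrected condition


def flagged(pattern: re.Pattern, header: str) -> bool:
    """Mimic the leading '!': flagged when the pattern does NOT match."""
    return pattern.search(header) is None
```

Under these assumptions, flagged(BROKEN, "Message-Id: <abc@example.org>") is true even though the header is well formed, while flagged(FIXED, ...) is true only for headers that really lack the local_part@domain shape.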
Finally, I'll note that rfc822 actually allows comments in Message-Id:
headers (indeed, comments are one of the lexical tokens listed in section
3.1.4). While it is impossible to match arbitrarily nested parens with
a regular expression, it is simple to match one level of parens, and
given that there's a Banyan Vines MTA that includes a comment in the
local part of the Message-Id: header, I would recommend changing the
'ws' definition to the following:
ws="[ ]*(\([^()]*\)[ ]*)?"
(Yes, that _could_ be
ws="[ ]*(\([^()]*\)[ ]*)*"
but I have yet to see a Message-Id: header with two comments in a row,
and I don't feel like giving that much slack to loser MTA/MUA writers.)
But strictly speaking (going by what the RFC says) it is possible to have
two comments in a row. Correct? If so, I prefer the latter one.
Also, does the RFC allow comments in both the local and the domain parts? If
not, I'll change the regexp a little so that it will only match comments in
the local part.
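The difference between the two 'ws' definitions above can be sketched in Python (again an approximation of the procmail regexps, not the recipe itself). One assumption to note: the bracket expression is written here as [ \t], a space and a tab, whereas the definitions as shown in this message contain only a space. PART and build() are hypothetical names used only for this example.

```python
import re

# Assumption: the bracket holds a space and a tab.
WS_ONE = r"[ \t]*(\([^()]*\)[ \t]*)?"    # at most one comment
WS_MANY = r"[ \t]*(\([^()]*\)[ \t]*)*"   # any number of comments in a row

# Hypothetical stand-in for both $local_part and $domain.
PART = r"[^\s()<>@]+"


def build(ws: str) -> re.Pattern:
    """Build a Message-Id pattern using the given whitespace definition."""
    return re.compile(
        rf"^Message-Id:{ws}<{ws}{PART}{ws}@{ws}{PART}{ws}>",
        re.IGNORECASE,
    )


ONE = build(WS_ONE)
MANY = build(WS_MANY)
```

Because the same ws is used between every pair of tokens, both variants accept a single comment on either side of the at-sign. Only the '*' variant accepts two comments in a row, and neither accepts a nested comment such as (a (b)), since [^()]* cannot cross an inner paren.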
Thanks for the clarification.
-Rejo.
--
= Rejo Zenger [Sister Ray Crisiscentrum]
rejo(_at_)sisterray(_dot_)xs4all(_dot_)nl
= http://mediaport.org/~sister PGP: RSA FAE40065, DSS/DH 2C8059B5