Pete Resnick wrote:
Hi Bruce,
As I said on the ietf-822 list, I was getting started on updates to
RFC 2822 to move it along to draft and was looking at:
<http://users.erols.com/blilly/mparse/rfc2822grammar_simplified.txt>
I notice that the ABNF in there has a few things that are non-2822
such as encoded words. Do you have a copy of the ABNF which is purely
the 2822 replacement that we can post to the ietf-2822 list?
pr
I have an older version which does not have the encoded-word grammar
(full text below).
The rationale for adding encoded-word grammar is:
a) 822 (as amended by RFC 2047 section 5 and as further amended by RFC
   2231) had it, though spread out over three documents
b) it is necessary for MIME-conforming implementations
c) the encoded-word rules are rather complex -- I believe that the
   grammar in the current document (URI above) covers everything except
   the rule prohibiting encoded-words in Received fields. In particular,
   the rules regarding adjacent linear whitespace are quite complex.
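To illustrate the adjacent-whitespace point in (c): RFC 2047 section 6.2 says that, for display, linear white space separating two adjacent encoded-words is ignored, while white space next to ordinary text is kept. The following is only a rough sketch of that one rule, not the grammar from the document at the URI above; the pattern `EW` is a simplified stand-in for the real encoded-word syntax, and the function name is made up for illustration.

```python
import re

# Simplified encoded-word pattern (assumption): =?charset?B-or-Q?encoded-text?=
EW = r"=\?[^?\s]+\?[BbQq]\?[^?\s]+\?="

# White space that sits between two encoded-words; the lookahead leaves the
# second encoded-word in place so that runs of three or more are handled.
BETWEEN = re.compile(r"(%s)[ \t\r\n]+(?=%s)" % (EW, EW))


def drop_ws_between_encoded_words(value: str) -> str:
    """Remove white space separating adjacent encoded-words (hypothetical helper)."""
    return BETWEEN.sub(r"\1", value)
```

White space between an encoded-word and plain text is left untouched, which is the asymmetry that makes the rule awkward to express in a grammar.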
From an implementor's perspective, I'd like to see all of the relevant
base grammar (i.e. base field and supporting grammar) in a single
document; indeed, one of the benefits of 2822 is that it consolidated
most of the "... amends RFC 822" piecemeal details into a single
document (obviously, the 2047/2231 amendments somehow didn't make it
into 2822). I don't believe there is any harm in including the
encoded-word grammar as encoded-words appear in the higher-level
constructs as alternatives to ccontent, word, and utext.
Following is the text of the modified grammar w/o encoded-word grammar,
interspersed with some notes:
rfc2822grammar_simplified.txt version 0.13 2001/08/08 16:02:35
excerpted from RFC 2822 and modified by Bruce Lilly
NO-WS-CTL = %d1-8 /     ; US-ASCII control characters
            %d11 /      ;  that do not include the
            %d12 /      ;  carriage return, line feed,
            %d14-31 /   ;  and white space characters
            %d127
text = %d1-9 /          ; Characters excluding CR and LF
       %d11 / %d12 / %d14-127 / obs-text
specials = "(" / ")" / ; Special characters used in
"<" / ">" / ; other parts of the syntax
"[" / "]" / ":" / ";" / "@" / "\" / "," / "." /
DQUOTE
quoted-pair = ("\" text)
[N.B. had redundant obs-qp alternative]
FWS = ([*WSP CRLF] 1*WSP) / ; Folding white space
obs-FWS
ctext = NO-WS-CTL / ; Non white space controls
%d33-39 / ; The rest of the US-ASCII
%d42-91 / ; characters not including "(",
%d93-126 ; ")", or "\"
[N.B. RFC 822 ASCII NUL not permitted, even with obs- rules]
ccontent = ctext / quoted-pair / comment
comment = "(" *([FWS] ccontent) [FWS] ")"
CFWS = *([FWS] comment) (([FWS] comment) / FWS)
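Because ccontent refers back to comment, comments nest, and a scanner has to count parentheses while honouring quoted-pair. A minimal sketch, assuming the input is already a Python string; the function name is hypothetical, and it is looser than the rules above in that any character other than "(", ")" and "\" is accepted as ctext or FWS.

```python
def skip_comment(s: str, i: int = 0) -> int:
    """Return the index just past the comment that starts at s[i]."""
    if i >= len(s) or s[i] != "(":
        raise ValueError("no comment at index %d" % i)
    depth = 0
    while i < len(s):
        c = s[i]
        if c == "\\":
            # quoted-pair: skip the backslash and the character it quotes
            i += 2
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated comment")
```

For example, given `(a (nested \) one) b) rest`, the scanner stops just before ` rest`.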
atext = ALPHA / DIGIT / ; Any character except controls,
"!" / "#" / ; SP, and specials.
"$" / "%" / ; Used for atoms
"&" / "'" / "*" / "+" / "-" / "/" / "=" / "?" /
"^" / "_" / "`" / "{" / "|" / "}" / "~"
atom = 1*atext [CFWS]
dot-atom = dot-atom-text [CFWS]
dot-atom-text = 1*atext *("." 1*atext)
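The atext and dot-atom-text rules translate fairly directly into a regular expression. This is a sketch only: the names `ATEXT_CHARS`, `DOT_ATOM_TEXT` and `is_dot_atom_text` are illustrative, and the optional trailing [CFWS] of atom and dot-atom is not handled.

```python
import re
import string

# atext per the rule above: ALPHA / DIGIT plus the listed punctuation.
ATEXT_CHARS = string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~"
_ATEXT = "[" + re.escape(ATEXT_CHARS) + "]"

# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = re.compile(r"%s+(?:\.%s+)*" % (_ATEXT, _ATEXT))


def is_dot_atom_text(s: str) -> bool:
    """True if the whole string matches dot-atom-text."""
    return DOT_ATOM_TEXT.fullmatch(s) is not None
```

Note that a leading, trailing, or doubled "." is rejected, since each "." must be followed by at least one atext character.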
qtext = NO-WS-CTL / ; Non white space controls
%d33 / ; The rest of the US-ASCII
%d35-91 / ; characters not including "\"
%d93-126 ; or the quote character
[N.B. RFC 822 ASCII NUL not permitted, even with obs- rules]
qcontent = qtext / quoted-pair
quoted-string = DQUOTE [FWS] *(qcontent [FWS]) DQUOTE [CFWS]
word = atom / quoted-string
phrase = 1*word / obs-phrase
utext = NO-WS-CTL / ; Non white space controls
%d33-126 / ; The rest of US-ASCII
obs-utext
unstructured = *(utext [FWS])
date-time = ([ day-name "," [FWS]] date FWS time [CFWS]) /
obs-date-time
day-name = "Mon" / "Tue" / "Wed" / "Thu" / "Fri" / "Sat" /
"Sun"
date = day FWS month-name FWS year
year = 4*DIGIT
month-name = "Jan" / "Feb" / "Mar" / "Apr" / "May" / "Jun" /
"Jul" / "Aug" / "Sep" / "Oct" / "Nov" / "Dec"
day = 1*2DIGIT
time = time-of-day FWS zone
time-of-day = hour ":" minute [ ":" second ]
hour = 2DIGIT
minute = 2DIGIT
second = 2DIGIT
zone = ( "+" / "-" ) 4DIGIT
[N.B. no CFWS between +- and 4DIGIT]
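A sketch of the non-obsolete date-time rule as a regular expression, to show the effect of the note above (sign and 4DIGIT must be adjacent). Assumptions: the header has already been unfolded, FWS is approximated by plain white space, trailing comments are not handled, matching is case-sensitive for brevity, and the day-name is not checked against the date. The names `DATE_TIME` and `parse_date_time` are illustrative.

```python
import re
from datetime import datetime, timedelta, timezone

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

DATE_TIME = re.compile(
    r"(?:(?P<dow>Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s*)?"          # [day-name "," [FWS]]
    r"(?P<day>\d{1,2})\s+(?P<mon>" + "|".join(MONTHS) + r")"  # day FWS month-name
    r"\s+(?P<year>\d{4,})\s+"                                 # FWS year FWS
    r"(?P<h>\d{2}):(?P<m>\d{2})(?::(?P<s>\d{2}))?\s+"         # time-of-day FWS
    r"(?P<sign>[+-])(?P<zh>\d{2})(?P<zm>\d{2})\s*"            # zone, no gap after sign
)


def parse_date_time(s: str) -> datetime:
    m = DATE_TIME.fullmatch(s)
    if m is None:
        raise ValueError("not a (non-obsolete) date-time: %r" % s)
    offset = timedelta(hours=int(m.group("zh")), minutes=int(m.group("zm")))
    if m.group("sign") == "-":
        offset = -offset
    return datetime(int(m.group("year")), MONTHS.index(m.group("mon")) + 1,
                    int(m.group("day")), int(m.group("h")), int(m.group("m")),
                    int(m.group("s") or 0), tzinfo=timezone(offset))
```

A value such as "8 Aug 2001 16:02:35 - 0400" is rejected by this sketch because of the space between the sign and the digits.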
address = mailbox / group
mailbox = name-addr / addr-spec
name-addr = [display-name] angle-addr
angle-addr = ("<" [CFWS] addr-spec ">" [CFWS]) / obs-angle-addr
group = display-name ":" [CFWS] [mailbox-list] ";" [CFWS]
display-name = phrase
mailbox-list = (mailbox *("," [CFWS] mailbox)) / obs-mbox-list
address-list = (address *("," [CFWS] address)) / obs-addr-list
addr-spec = local-part "@" [CFWS] domain
local-part = dot-atom / quoted-string / obs-local-part
domain = dot-atom / domain-literal / obs-domain
domain-literal = "[" [FWS] *(dcontent [FWS]) "]" [CFWS]
dcontent = dtext / quoted-pair
dtext = NO-WS-CTL / ; Non white space controls
%d33-90 / ; The rest of the US-ASCII
%d94-126 ; characters not including "[",
; "]", or "\"
[N.B. RFC 822 ASCII NUL not permitted, even with obs- rules]
message = (fields / obs-fields) [CRLF body]
body = *(*998text CRLF) *998text
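The body rule amounts to a per-line length limit of 998 characters, not counting the CRLF. A small sketch of that check, assuming the body is available as bytes and reading text strictly (no obs-text, so no NUL and no bare CR or LF inside a line); the function name is hypothetical.

```python
def body_lines_ok(body: bytes, limit: int = 998) -> bool:
    """True if every CRLF-delimited line fits the body rule (strict reading)."""
    for line in body.split(b"\r\n"):
        if len(line) > limit:
            return False                      # more than 998 text characters
        if b"\r" in line or b"\n" in line:
            return False                      # bare CR or LF is not text
        if any(c == 0 or c > 127 for c in line):
            return False                      # NUL and 8-bit octets are not text
    return True
```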
fields = *(trace *(resent-date / resent-from / resent-sender /
                   resent-to / resent-cc / resent-bcc / resent-msg-id))
         *(orig-date / from / sender / reply-to / to / cc / bcc /
           message-id / in-reply-to / references / subject / comments /
           keywords / optional-field)
orig-date = "Date:" [FWS] date-time CRLF
from = "From:" [CFWS] mailbox-list CRLF
sender = "Sender:" [CFWS] mailbox CRLF
reply-to = "Reply-To:" [CFWS] address-list CRLF
to = "To:" [CFWS] address-list CRLF
cc = "Cc:" [CFWS] address-list CRLF
bcc = "Bcc:" [CFWS] [address-list] CRLF
message-id = "Message-ID:" [CFWS] msg-id CRLF
in-reply-to = "In-Reply-To:" [CFWS] 1*msg-id CRLF
references = "References:" [CFWS] 1*msg-id CRLF
msg-id = ( "<" id-left "@" id-right ">" [CFWS]) / obs-msg-id
id-left = dot-atom-text / no-fold-quote
id-right = dot-atom-text / no-fold-literal
no-fold-quote = DQUOTE *(qtext / quoted-pair) DQUOTE
no-fold-literal = "[" *(dtext / quoted-pair) "]"
subject = "Subject:" [FWS] [("cmsg" / "Re: ") [FWS]]
unstructured CRLF
[ RFC 1036 sect. 2.2.6 "cmsg" Subject hack, sect. 2.1.4 "Re: " ]
comments = "Comments:" [FWS] unstructured CRLF
keywords = "Keywords:" [CFWS] phrase *("," [CFWS] phrase) CRLF
resent-date = "Resent-Date:" [FWS] date-time CRLF
resent-from = "Resent-From:" [CFWS] mailbox-list CRLF
resent-sender = "Resent-Sender:" [CFWS] mailbox CRLF
resent-to = "Resent-To:" [CFWS] address-list CRLF
resent-cc = "Resent-Cc:" [CFWS] address-list CRLF
resent-bcc = "Resent-Bcc:" [CFWS] [address-list] CRLF
resent-msg-id = "Resent-Message-ID:" [CFWS] msg-id CRLF
trace = [return] 1*received
return = "Return-Path:" [CFWS] path CRLF
path = ("<" [CFWS] [addr-spec] ">" [CFWS]) / obs-path
received = "Received:" [CFWS] name-val-list ";" [FWS]
date-time CRLF
name-val-list = [*(name-val-pair CFWS) name-val-pair]
[N.B. 2822 specification does not provide for mandatory CFWS at end of
list (as opposed to RFC 821 (required <SP>) and 2821):
    [name-val-pair CFWS *(name-val-pair CFWS)]
]
name-val-pair = item-name CFWS item-value
item-name = ALPHA *(["-"] (ALPHA / DIGIT))
item-value = 1*angle-addr / addr-spec / atom / domain / msg-id
optional-field = field-name ":" [FWS] unstructured CRLF
field-name = 1*ftext
ftext = %d33-57 / ; Any character except
%d59-126 ; controls, SP, and
; ":".
obs-qp = "\" (%d0-127)
[N.B. unnecessary]
obs-text = %d0-127
[N.B. original 2822 specification was as obs-utext in this file, which
permitted multiple characters]
obs-char = %d0-9 / %d11 / ; %d0-127 except CR and
%d12 / %d14-127 ; LF
obs-utext = *LF *CR *(obs-char *LF *CR)
[N.B. was obs-text]
obs-phrase = word *(word / ("." [CFWS]))
obs-phrase-list = phrase / (1*([phrase] "," [CFWS]) [phrase])
obs-FWS = 1*WSP *(CRLF 1*WSP)
obs-date-time = [ day-name [CFWS] "," [CFWS]] obs-date [CFWS]
FWS [CFWS] obs-time [CFWS]
[N.B. obs- rule does not provide for adjacent date and time permitted by
RFC 822]
obs-date = day CFWS month-name CFWS obs-year
[N.B. obs- rule does not permit (e.g.) 1Jan2001 which was permissible
under RFC 822]
obs-year = 2*DIGIT
obs-time = obs-time-of-day CFWS (zone / obs-zone)
[N.B. obs- rule does not permit adjacent time and zone, which was
permissible under RFC 822]
obs-time-of-day = hour [CFWS] ":" [CFWS] minute [CFWS] ":"
[[CFWS] second]
obs-zone = "UT" / "GMT" /    ; Universal Time
                             ; North American UT
                             ; offsets
           "EST" / "EDT" /   ; Eastern:  - 5/ - 4
           "CST" / "CDT" /   ; Central:  - 6/ - 5
           "MST" / "MDT" /   ; Mountain: - 7/ - 6
           "PST" / "PDT" /   ; Pacific:  - 8/ - 7
           %d65-73 /         ; Military zones - "A"
           %d75-90 /         ; through "I" and "K"
           %d97-105 /        ; through "Z", both
           %d107-122         ; upper and lower case
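An implementation that accepts obs-zone typically maps it to a numeric zone straight away. A sketch of such a table, using the offsets given in the comments above; the treatment of military zones as "-0000" follows RFC 2822 section 4.3, and the names `OBS_ZONE_OFFSETS` and `normalize_zone` are illustrative.

```python
import re

OBS_ZONE_OFFSETS = {
    "UT": "+0000", "GMT": "+0000",
    "EST": "-0500", "EDT": "-0400",
    "CST": "-0600", "CDT": "-0500",
    "MST": "-0700", "MDT": "-0600",
    "PST": "-0800", "PDT": "-0700",
}


def normalize_zone(zone: str) -> str:
    """Map a zone or obs-zone to the ("+" / "-") 4DIGIT form."""
    if re.fullmatch(r"[+-][0-9]{4}", zone):
        return zone
    z = zone.upper()                      # ABNF literals are case-insensitive
    if z in OBS_ZONE_OFFSETS:
        return OBS_ZONE_OFFSETS[z]
    if len(z) == 1 and "A" <= z <= "Z" and z != "J":
        return "-0000"                    # military zone: meaning unreliable
    raise ValueError("unknown zone: %r" % zone)
```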
obs-angle-addr = "<" [CFWS] [obs-route] addr-spec ">" [CFWS]
obs-route = obs-domain-list ":" [CFWS]
obs-domain-list = "@" [CFWS] domain *(1*("," [CFWS]) "@" [CFWS]
domain)
obs-local-part = word *("." [CFWS] word)
obs-domain = atom *("." [CFWS] atom)
obs-mbox-list = 1*([mailbox] "," [CFWS]) [mailbox]
obs-addr-list = 1*([address] "," [CFWS]) [address]
obs-fields = *(obs-return / obs-received / obs-orig-date /
obs-from / obs-sender / obs-reply-to / obs-to / obs-cc / obs-bcc /
obs-message-id / obs-in-reply-to / obs-references / obs-subject /
obs-comments / obs-keywords / obs-resent-date / obs-resent-from /
obs-resent-send / obs-resent-rply / obs-resent-to / obs-resent-cc /
obs-resent-bcc / obs-resent-mid / obs-optional)
obs-orig-date = "Date" *WSP ":" [CFWS] date-time CRLF
obs-from = "From" *WSP ":" [CFWS] mailbox-list CRLF
obs-sender = "Sender" *WSP ":" [CFWS] mailbox CRLF
obs-reply-to = "Reply-To" *WSP ":" [CFWS] address-list CRLF
obs-to = "To" *WSP ":" [CFWS] address-list CRLF
obs-cc = "Cc" *WSP ":" [CFWS] address-list CRLF
obs-bcc = "Bcc" *WSP ":" [CFWS] [address-list] CRLF
obs-message-id = "Message-ID" *WSP ":" [CFWS] msg-id CRLF
obs-in-reply-to = "In-Reply-To" *WSP ":" [CFWS] *(phrase / msg-id)
CRLF
obs-references = "References" *WSP ":" [CFWS] *(phrase / msg-id) CRLF
obs-msg-id = "<" [CFWS] addr-spec ">" [CFWS]
obs-subject = "Subject" *WSP ":" [FWS] [("cmsg" / "Re:")
[FWS]] unstructured CRLF
[ RFC 1036 sect. 2.2.6 "cmsg" hack, 2.1.4 "Re:" (w/ or w/o space) ]
obs-comments = "Comments" *WSP ":" [FWS] unstructured CRLF
obs-keywords = "Keywords" *WSP ":" [CFWS] obs-phrase-list CRLF
obs-resent-from = "Resent-From" *WSP ":" [CFWS] mailbox-list CRLF
obs-resent-send = "Resent-Sender" *WSP ":" [CFWS] mailbox CRLF
obs-resent-date = "Resent-Date" *WSP ":" [CFWS] date-time CRLF
obs-resent-to = "Resent-To" *WSP ":" [CFWS] address-list CRLF
obs-resent-cc = "Resent-Cc" *WSP ":" [CFWS] address-list CRLF
obs-resent-bcc = "Resent-Bcc" *WSP ":" [CFWS] [address-list] CRLF
obs-resent-mid = "Resent-Message-ID" *WSP ":" [CFWS] msg-id CRLF
obs-resent-rply = "Resent-Reply-To" *WSP ":" [CFWS] address-list CRLF
obs-return = "Return-Path" *WSP ":" [CFWS] path CRLF
obs-received = "Received" *WSP ":" [CFWS] name-val-list [ ";"
[CFWS] obs-date-time ] CRLF
[N.B. RFC 822 required date-time stamp]
[N.B. reference online version of 2822 specification does not permit WSP
before colon if date-time stamp is used; RFC 822 permitted (nay, required)
the form:
    "Received" *WSP ":" [CFWS] name-val-list ";" [CFWS]
    obs-date-time CRLF
]
obs-path = obs-angle-addr
obs-optional = field-name *WSP ":" [FWS] unstructured CRLF
--------------------------------------------------------------------------------
Notes not part of modified grammar:
For LR(1) parser compatibility, lexical tokens are grouped such that
trailing WS, FWS, or CFWS is associated with its preceding lexical
token. Therefore, no lexical token handled by the higher-level parser
grammar rules has any ambiguity associated with optional WS, FWS, or
CFWS. So, where this revised grammar has:
obs-mbox-list = 1*([mailbox] "," [CFWS]) [mailbox]
that is handled by the implementation as:
obs-mbox-list = 1*([mailbox] ("," [CFWS])) [mailbox]
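The grouping described above can be sketched as a lexer that returns each token together with whatever white space follows it, so that the parser never sees white space as a separate token. This is not the actual implementation, only an illustration: comments (the "C" in CFWS) are left out, the set of single-character tokens is abbreviated, and the names `TOKEN` and `tokenize` are made up.

```python
import re

# group 1: a special, a quoted-string, or a run of other characters
# group 2: the white space that trails it (attached to that token)
TOKEN = re.compile(r'([,:;<>@]|"(?:[^"\\]|\\.)*"|[^\s,:;<>@"]+)(\s*)')


def tokenize(s: str):
    """Return a list of (token, trailing_white_space) pairs."""
    tokens = []
    pos = len(s) - len(s.lstrip())   # leading white space has no preceding token
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if m is None:
            raise ValueError("cannot tokenize at index %d" % pos)
        tokens.append((m.group(1), m.group(2)))
        pos = m.end()
    return tokens
```

With this arrangement the "," in a list arrives as a single (",", " ") pair, which corresponds to the ("," [CFWS]) grouping shown above.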
Additional rules such as:
start = (":" [FWS]) / obs-start
obs-start = *WSP ":" [FWS]
cstart = (":" [CFWS]) / obs-cstart
obs-cstart = *WSP ":" [CFWS]
dstart = start / obs-cstart
can be used to reduce the number of rules, e.g.:
orig-date = "Date" dstart date-time CRLF
(eliminating obs-orig-date (also applies to resent-date))
subject = "Subject" start ["cmsg" [FWS]] unstructured CRLF
(eliminating obs-subject (start also applies to comments and
optional-field))
from = "From" cstart mailbox-list CRLF
(eliminating obs-from (cstart applies to remaining header fields))
etc., allowing all of the obs- header fields to be eliminated, and
obs-fields to be simplified.
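In code, the effect of folding the obs- forms into start/cstart is that one pattern accepts both "Name:" and "Name   :". A minimal sketch, assuming an unfolded header line and treating the white space after the colon loosely (comments are not stripped); `HEADER` and `split_field` are hypothetical names.

```python
import re

# field-name = 1*ftext (%d33-57 / %d59-126), then *WSP ":" and optional WSP.
HEADER = re.compile(r"([\x21-\x39\x3b-\x7e]+)[ \t]*:[ \t]*(.*)", re.S)


def split_field(line: str):
    """Split a header line into (field-name, field-body)."""
    m = HEADER.fullmatch(line.rstrip("\r\n"))
    if m is None:
        raise ValueError("not a header field: %r" % line)
    return m.group(1), m.group(2)
```

Both `Subject: hi` and `Subject \t: hi` give the same result, so no separate obs- code path is needed.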
And adding:
resent = "Resent-"
allows:
resent-from = resent from
etc., allowing the resent- fields to be simplified and ensuring that the
definitions remain in sync between base and resent- versions.
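The same idea carries over to a table-driven implementation: keep one table of base field syntaxes and derive the resent- variants by stripping the prefix, so the two cannot drift apart. A sketch with an abbreviated, illustrative table; `BASE_FIELDS` and `field_syntax` are hypothetical names, and the values are simply the rule names from the grammar above.

```python
RESENT = "Resent-"

# Base fields that have a resent- counterpart, mapped to their body rule.
BASE_FIELDS = {
    "Date": "date-time",
    "From": "mailbox-list",
    "Sender": "mailbox",
    "To": "address-list",
    "Cc": "address-list",
    "Bcc": "[address-list]",
    "Message-ID": "msg-id",
}


def field_syntax(name: str) -> str:
    """Return the body rule for a field; Resent- variants reuse the base rule."""
    key = name
    if name.lower().startswith(RESENT.lower()):
        key = name[len(RESENT):]
    for base, rule in BASE_FIELDS.items():
        if base.lower() == key.lower():   # field names are case-insensitive
            return rule
    raise KeyError(name)
```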