RFC 822 says that structured field bodies are parsed as follows:

   characters
   (e.g., "6 Feb 1999")
      |
      | tokenize
      V
   spaces, tabs, comments,
   "<", ">", ",", ";", ":", "@", ".", atoms, domain literals, quoted strings
   (e.g., atom 6, space, atom Feb, space, atom 1999)
      |
      | remove spaces, tabs, comments
      V
   "<", ">", ",", ";", ":", "@", ".", atoms, domain literals, quoted strings
   (e.g., atom 6, atom Feb, atom 1999)
      |
      | parse
      V
   higher-level data

It's easy to give precise English descriptions of each of these steps.
See http://pobox.com/~djb/proto/immhf.html.
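
As a rough illustration of the three steps above, here is a minimal
Python sketch. It is not taken from RFC 822 or from any existing
implementation: the names tokenize, strip_noise, and parse_date are made
up for this example, runs of spaces and tabs are merged into one "space"
token, the handling of quoted strings and domain literals is simplified,
and parse_date assumes a bare day-month-year date.

```python
SPECIALS = '()<>@,;:\\".[]'   # the RFC 822 "specials"
SINGLE = '<>,;:@.'            # specials that are tokens by themselves
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def is_atom_char(c):
    # Printable ASCII other than space and the specials.
    return 32 < ord(c) < 127 and c not in SPECIALS

def scan_delimited(s, i, close):
    """s[i] is the opener; return the index just past the closer."""
    j = i + 1
    while j < len(s) and s[j] != close:
        j += 2 if s[j] == "\\" else 1   # backslash quotes the next char
    if j >= len(s):
        raise ValueError("unterminated token at %d" % i)
    return j + 1

def scan_comment(s, i):
    """s[i] is "("; return the index just past the matching ")"."""
    depth, j = 0, i
    while j < len(s):
        if s[j] == "\\":
            j += 1                      # skip the quoted character
        elif s[j] == "(":
            depth += 1
        elif s[j] == ")":
            depth -= 1
            if depth == 0:
                return j + 1
        j += 1
    raise ValueError("unterminated comment at %d" % i)

def tokenize(s):
    """Step 1: characters -> (kind, text) tokens, longest match first."""
    tokens, i = [], 0
    while i < len(s):
        c = s[i]
        if c in " \t":
            j = i
            while j < len(s) and s[j] in " \t":
                j += 1
            kind = "space"
        elif c == "(":
            j, kind = scan_comment(s, i), "comment"
        elif c == '"':
            j, kind = scan_delimited(s, i, '"'), "quoted-string"
        elif c == "[":
            j, kind = scan_delimited(s, i, "]"), "domain-literal"
        elif c in SINGLE:
            j, kind = i + 1, "special"
        elif is_atom_char(c):
            j = i
            while j < len(s) and is_atom_char(s[j]):
                j += 1                  # read as many characters as possible
            kind = "atom"
        else:
            raise ValueError("unexpected character at %d" % i)
        tokens.append((kind, s[i:j]))
        i = j
    return tokens

def strip_noise(tokens):
    """Step 2: remove spaces, tabs, comments."""
    return [t for t in tokens if t[0] not in ("space", "comment")]

def parse_date(tokens):
    """Step 3 (toy version): atom day, atom month, atom year."""
    if [kind for kind, _ in tokens] != ["atom"] * 3:
        raise ValueError("expected three atoms")
    day, month, year = (text for _, text in tokens)
    return int(day), MONTHS.index(month) + 1, int(year)

print(parse_date(strip_noise(tokenize("6 Feb 1999"))))   # (6, 2, 1999)
```

Because each step consumes the output of the previous one, the parser
never sees whitespace or comments, and the atom branch of tokenize can
only ever produce one atom for a run such as "foobar".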

Pete Resnick, over the objections of several implementors on DRUMS,
threw away the RFC 822 tokenizer. He wrote a new ABNF grammar that
starts from sequences of characters, rather than sequences of tokens.

ABNF is a weak programming language in which simple lexing steps such as

   Read as many characters as possible, stopping before the first ...

and

   Now remove all comments

are a royal pain to handle correctly, so it's hardly a surprise that
Resnick made some big mistakes in his grammar, and that the result is
much more difficult to read than RFC 822.

Charles Lindsey writes:
> [ ``foobar'' being parsed as two atoms ]
> However, I gather Pete Resnick has spotted this discussion and is
> taking it up on the DRUMS list. I hope they fix it.

I raised the same issue on the DRUMS mailing list in 1996.
Resnick was claiming that English was error-prone while ABNF was not:
``Having everything in the grammar leaves no ambiguity, and having them
in the prose is almost guaranteeing it.''

I pointed out that the evidence was against him: ``Really? How come your
grammar allows `To: anything I want'? How come your grammar allows the
string `foo' to be parsed as three atoms?'' Of course, these ambiguities
are extremely difficult to eliminate from the formal grammar.
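
To make the ``three atoms'' point concrete, here is a small Python
sketch. It assumes a deliberately simplified character-level grammar in
which a phrase is one or more adjacent atoms and an atom is one or more
atom characters, with nothing required between atoms. That grammar is a
stand-in for illustration, not a quotation of the draft, and the name
atom_splits is made up.

```python
def atom_splits(s):
    """Every way to read s as a sequence of adjacent non-empty atoms."""
    if not s:
        return [[]]          # nothing left to split: one way to finish
    parses = []
    for i in range(1, len(s) + 1):
        # Take s[:i] as the first atom, then split the rest every way.
        for rest in atom_splits(s[i:]):
            parses.append([s[:i]] + rest)
    return parses

for parse in atom_splits("foo"):
    print(parse)
# ['f', 'o', 'o']
# ['f', 'oo']
# ['fo', 'o']
# ['foo']
```

Under this assumed grammar a string of n atom characters has 2^(n-1)
parses. A tokenizer that reads as many characters as possible admits
only the last one.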

Resnick's response: ``I don't think it makes a difference, unless you do
something silly like try to feed tokens you've received and parsed into
something that's going to put comments or whitespace between them.''

I said that an incorrect spec was unacceptable, and suggested writing
the grammar in C instead of ABNF. Resnick didn't respond.

---Dan