Alan Barrett wrote:
On Tue, 17 Dec 2002, Bruce Lilly wrote:
One cannot recognise a comment unless the header field syntax is known.
One can recognise a comment from lexical analysis alone. This was true
in RFC 822, and should still be true in RFC 2822 unless something went
wrong.
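A minimal Python sketch of recognising comments by lexical analysis alone. It assumes the caller already knows the body belongs to a structured field, because in unstructured fields such as Subject parentheses are ordinary text. The quoted-string, domain-literal and quoted-pair rules are simplified from RFC 2822, and the function name `find_comments` is hypothetical. The scanner tracks quoted-strings and domain-literals so that parentheses inside them are not taken as comments. It also keeps a nesting depth, because comments may nest.

```python
def find_comments(body):
    """Return the outermost comments in a structured header field body.

    Purely lexical and simplified: quoted-strings, domain-literals and
    quoted-pairs are tracked so that parentheses inside them are not
    mistaken for comments. The field name is never consulted.
    """
    comments = []
    depth = 0           # current comment nesting level
    in_quote = False    # inside a "quoted-string"
    in_literal = False  # inside a [domain-literal]
    start = 0           # index of the "(" opening the outermost comment
    i = 0
    while i < len(body):
        c = body[i]
        if c == "\\":
            i += 2      # quoted-pair: the next character is taken literally
            continue
        if in_quote:
            in_quote = c != '"'      # a closing quote ends the quoted-string
        elif in_literal:
            in_literal = c != "]"    # a closing bracket ends the literal
        elif c == "(":
            if depth == 0:
                start = i
            depth += 1               # comments nest, so count the depth
        elif depth > 0:
            # Inside a comment, quotes and brackets are plain ctext.
            if c == ")":
                depth -= 1
                if depth == 0:
                    comments.append(body[start:i + 1])
        elif c == '"':
            in_quote = True
        elif c == "[":
            in_literal = True
        i += 1
    return comments
```

For example, `find_comments('Joe (work) <joe@example.com>')` yields `['(work)']`, while the parenthesised text in `'"not (a comment)" <x@y>'` is correctly ignored.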
[...]
Other examples could be given, but the above show that it is necessary
to fully parse header field content in order to determine whether
or not there is an encoded-word; use of regular expressions (or the
equivalent) is inadequate.
I agree on this point. However, lexical analysis plus some guessing
will often be good enough.
Lexical analysis is equivalent to using regular expressions (and hence
insufficiently powerful) -- indeed many lexical analyzers are built by
constructing a finite automaton from a set of regular expressions.