Re: [ietf-smtp] [dispatch] BCP proposal: regular expressions for Internet Mail identifiers

On 3/28/2016 6:18 AM, John C Klensin wrote:


--On Sunday, March 27, 2016 22:41 -0700 "Murray S. Kucherawy"
<superuser(_at_)gmail(_dot_)com> wrote:

...
And if what you're really producing is regular expressions
that match anything that the ABNFs in the mail RFCs will
legitimately produce, you might want to do a standards track
document that explicitly updates those documents where those
ABNFs are listed.

Murray,

That captures my concern about this effort.  Based on prior
experience (including RFC 3696 and even the effort to make
RFCs 2821 and 5321 internally consistent), it is _really_ easy
to express a requirement in two different ways and have them be
_almost_ the same.   That is a problem because different people
will read different docs.

It seems to me that it would be much better to either do this as
an Informational document that is clearly identified as Sean's
opinion about regular expressions that impose the same
requirements as 5321/5322 but that those continue to control or
to do a standards-track document that contains both the regular
expressions and ABNF, makes clear which one is primary, and
updates the syntax requirements of the base specs.

As Dale expressed (thanks!), "BCPs are *standards* not for protocols but
for *things that people do*. So in regard to
[draft-seantek-mail-regexen], the "thing that people do" is "write code
that validates e-mail addresses for further processing". And the point
[...] is that people need to write correct code for validating e-mail
addresses."

Sean's opinion about regular expressions for Mail Identifiers (email
addresses, Message-IDs) is not interesting. If my opinion were all that
interesting, I would just publish it on Stack Overflow and call it a day
(see SO Questions [46155] and [201323]). What is interesting is the
IETF's vetted and (rough)-consensus view on the topic.


This topic is a favorite pet project of programmers. It tends to go:

1) "oh, I know what an email address is! It has dots and alphas and
   maybe a hyphen" (WRONG),
2) "oh, I'll just read RFC 5322 and roll my own" (also wrong, but in
   more subtle ways...for one, RFC 5322 has distinct syntax from
   RFC 5321), or
3) "I'm lazy, let's just copy whatever regex shows up on Google first"
   (pragmatic, usually not right).
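To make step 1 concrete, here is a minimal sketch in Python. The NAIVE pattern is a hypothetical example of the "dots and alphas and maybe a hyphen" idea; it is not taken from any RFC or from the draft. The sample addresses use characters (digits, "+", "_", "'") that RFC 5321 permits in an unquoted local part.

```python
import re

# Hypothetical "dots, alphas, and maybe a hyphen" pattern (assumption:
# this is only an illustration of step 1, not anyone's real code).
NAIVE = re.compile(r"[A-Za-z.-]+@[A-Za-z.-]+")

# Local parts here are valid Dot-strings: atext includes digits and
# characters such as "+", "_" and "'".
valid_addresses = [
    "user+tag@example.com",
    "first_last@example.org",
    "o'brien@example.net",
    "user2016@example.com",
]


def naive_accepts(address: str) -> bool:
    """Return True if the naive pattern matches the whole string."""
    return NAIVE.fullmatch(address) is not None


# Valid addresses that the naive pattern wrongly rejects.
rejected = [a for a in valid_addresses if not naive_accepts(a)]

# The same pattern also wrongly accepts strings that are not valid,
# such as a local part with two consecutive dots.
wrongly_accepted = naive_accepts("a..b@example.com")
```

The pattern fails in both directions: it rejects legitimate addresses and accepts malformed ones.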


Wouldn't it be better if programmers could uniformly go:

4) "Given my email address recognition problem, I'll just copy the regex
   from BCP xyz", rather than spending dozens if not hundreds of hours
   poring over email standards documents and testing them against
   millions of arcane email address combinations.

The current draft-seantek-mail-regexen is pretty clear that it does not
attempt to change the Mail standards. If folks want to change those
documents, may I suggest a separate Standards Track document that does
exactly that.

Just because a document is labeled "BCP" (or, for that matter,
"Standards Track") does not mean that every last single statement in the
document is normative and error-free. Otherwise, the RFC 3280 and RFC
5280 PKIX standards that say that you are supposed to compare an entire
email address case-insensitively (Section 4.1.2.6 of RFC 3280, Section
4.2.1.6 of RFC 5280) would have overridden RFCs 5322, 5321, 2822, 2821,
etc. etc. We have an errata process.
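To illustrate why the whole-address comparison conflicts with the mail specifications: as I read RFC 5321 (Section 2.4), the local part is to be treated as case-sensitive, while domain names are not. A minimal sketch in Python follows; both function names are hypothetical, and lowercasing the domain is an ASCII-only simplification that ignores internationalized domain names.

```python
def same_mailbox_whole_address(a: str, b: str) -> bool:
    """Compare the entire address case-insensitively (PKIX-style)."""
    return a.lower() == b.lower()


def same_mailbox_smtp_style(a: str, b: str) -> bool:
    """Compare local parts exactly and domains case-insensitively.

    Assumption: each address contains an "@"; the last one separates
    the local part from the domain.
    """
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    return local_a == local_b and domain_a.lower() == domain_b.lower()


# Differ only in domain case: both comparisons agree.
domain_case = ("Smith@EXAMPLE.com", "Smith@example.com")

# Differ in local-part case: the two comparisons disagree.
local_case = ("smith@example.com", "Smith@example.com")
```

The two functions agree on the first pair and disagree on the second, which is exactly the kind of near-miss between documents discussed above.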

Basically if the regular expressions are wrong, they need to be made
right. One can complain about problems, or one can fix them.

Turns out that regular expressions and ABNF are homomorphic under
certain conditions. As shown in draft-seantek-mail-regexen, "deliverable
email addresses" (RFC 5321 + RFC 6531) certainly fall in that
definition, as they can be expressed in a regular language (i.e.,
computed with a finite state automaton). Therefore, translating between
the two is basically computationally verifiable. The results may not
look pretty but they will work. Perhaps a bigger problem is one's view
as to how normative ABNF is in the context of IETF standards documents.
It is possible to have ABNF that says somename = *(ALPHA / DIGIT) but
then have normative text that says that <somename> is limited to 31
characters and MUST start with an alphabetic character. Moreover, some
ABNF (RFC 5321 / RFC 5322 in particular) have "obsolete syntax"; whether
to admit such syntax is a highly context-sensitive engineering decision.
Addressing all of these points requires rubbing more than two brain
cells together.
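The <somename> case can be sketched in Python as two patterns: one that translates only the ABNF, and one that also folds in the prose constraints. This assumes "limited to 31 characters" means at most 31 characters in total; the pattern and function names are illustrative and do not come from the draft.

```python
import re

# Direct translation of the ABNF: somename = *(ALPHA / DIGIT)
ABNF_ONLY = re.compile(r"[A-Za-z0-9]*")

# ABNF plus the normative prose: starts with ALPHA, at most 31
# characters in total (one leading letter plus up to 30 more).
ABNF_WITH_PROSE = re.compile(r"[A-Za-z][A-Za-z0-9]{0,30}")


def matches(pattern: re.Pattern, value: str) -> bool:
    """Return True if the pattern matches the entire string."""
    return pattern.fullmatch(value) is not None


# Strings the ABNF alone allows but the prose forbids.
abnf_only_cases = ["", "9lives", "a" * 32]
```

Every string in abnf_only_cases matches ABNF_ONLY and fails ABNF_WITH_PROSE, so a regular expression derived from the ABNF alone would be too permissive.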

[46155]: http://stackoverflow.com/questions/46155/validate-email-address-in-javascript
[201323]: http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address


Perhaps a BCP that recommends use of strings that are clearly a
proper subset of what the standard allows would be ok, but it
needs to be frightfully clear that it is a recommended subset,
not a requirement.

I am not really interested in subsets, except those subsets driven by
the standards themselves. (ASCII-only vs. EAI is a reasonable subset,
provided that both expressions are provided. I would rather do EAI-only
but we can be pragmatic about that.)
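As a rough illustration of providing both expressions, the sketch below builds an ASCII-only and an EAI pattern for just the Dot-string form of the local part. It assumes the RFC 5322 atext set and assumes that RFC 6531 extends atext with non-ASCII characters; quoted strings and the domain are deliberately left out. The names are hypothetical and the patterns are not copied from draft-seantek-mail-regexen.

```python
import re

# atext from RFC 5322, minus "-" (added last so it stays a literal).
ATEXT_ASCII = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~"

# Assumption: EAI adds every code point above ASCII to atext.
NON_ASCII = "\u0080-\U0010FFFF"


def dot_string(atext_class: str) -> re.Pattern:
    """Build a pattern for Dot-string = Atom *("." Atom)."""
    atom = "[" + atext_class + "-]+"
    return re.compile(atom + r"(?:\." + atom + ")*")


DOT_STRING_ASCII = dot_string(ATEXT_ASCII)
DOT_STRING_EAI = dot_string(ATEXT_ASCII + NON_ASCII)


def is_local_part(pattern: re.Pattern, value: str) -> bool:
    """Return True if the whole string is a Dot-string."""
    return pattern.fullmatch(value) is not None
```

With these, is_local_part(DOT_STRING_ASCII, "user.name+tag") holds for both patterns, a local part such as "jos\u00e9" is accepted only by the EAI pattern, and "a..b" is rejected by both.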


Best regards,

Sean

_______________________________________________
ietf-smtp mailing list
ietf-smtp(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf-smtp
