Re: [ietf-smtp] [dispatch] BCP proposal: regular expressions for Internet Mail identifiers

On 3/29/2016 9:28 AM, John C Klensin wrote:


--On Tuesday, March 29, 2016 11:45 -0400 "Dale R. Worley"
<worley(_at_)ariadne(_dot_)com> wrote:

...
Here's another ugly little bit of processing:  On some
systems, library routines that convert dotted-number IP
address strings into four-octet format treat a component that
starts with "0" as being written in octal. E.g.,
"010.010.010.010" is equivalent to "8.8.8.8".  (Try executing
"dig ietf.org @010.010.010.010" on a Linux system.)  As far as
I know, this isn't *specified* anywhere in the RFCs, and some
RFCs (e.g., RFC 997) have leading zeros on numbers that
contain "9".  So it's worth warning people not to use leading
zeros in IPv4 addresses.
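To make the octal behaviour concrete, here is a minimal Python sketch of the classic inet_aton()-style reading of a dotted quad. It is an illustration only, not the code of any particular resolver library; the function names `parse_component`, `legacy_dotted_quad`, and `has_leading_zero` are made up for this example.

```python
def parse_component(text: str) -> int:
    """Read one component the way classic inet_aton()-style parsers do."""
    if text.lower().startswith("0x"):
        return int(text[2:], 16)      # "0x" prefix: hexadecimal
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)           # leading "0": octal, so "09" raises ValueError
    return int(text, 10)              # otherwise: decimal


def legacy_dotted_quad(address: str) -> str:
    """Return the canonical decimal form of a four-part dotted address."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError("expected four components")
    values = [parse_component(p) for p in parts]
    if any(not 0 <= v <= 255 for v in values):
        raise ValueError("component out of range")
    return ".".join(str(v) for v in values)


def has_leading_zero(address: str) -> bool:
    """Flag addresses whose meaning depends on the parser (hypothetical lint check)."""
    return any(len(p) > 1 and p.startswith("0") for p in address.split("."))


print(legacy_dotted_quad("010.010.010.010"))   # 8.8.8.8
print(has_leading_zero("010.010.010.010"))     # True
```

A component such as "009" is rejected under this reading because 9 is not an octal digit, which is why a lint-style check like `has_leading_zero` is the safer thing to apply to input.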

And that comment identifies another ugly little issue.  An email
address of example-user@010.010.010.010 implies that
"010.010.010.010" is a domain name and "010" (the rightmost
label) is a TLD.   Because there is no such TLD (nor is there
one for "8."), such an address is an error, so, if a
mail-related regular expression document pursues that question
at all, it would allow something that violated 5321 no matter
whether 010 is interpreted as "2", "8", "10", "16", STX,
Backspace, DLE, or something else.

I'm not suggesting Sean would do that,

Covered that already. ;-) See the pattern "restricts out all-numeric labels [RFC1912]" in Section 3.1.3.

I hope that this does go to show that raw/blind application of the ABNF in RFC 5321/5322 is not sufficient.
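As a rough illustration of that kind of restriction (this is not the pattern from Section 3.1.3 of the draft, only a sketch of the idea), the following Python regular expression accepts letter-digit-hyphen labels but refuses a domain whose rightmost label is all-numeric. The names `LABEL`, `DOMAIN`, and `is_plausible_domain` are hypothetical, and the sketch assumes that only the rightmost label needs the check; a stricter reading of RFC 1912 would apply it to every label. Overall length limits are not enforced here.

```python
import re

# One LDH label: starts and ends with a letter or digit, at most 63 octets.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"

# Zero or more "label." prefixes, then a final label that is not all digits.
DOMAIN = re.compile(rf"(?:{LABEL}\.)*(?![0-9]+\Z){LABEL}")


def is_plausible_domain(domain: str) -> bool:
    """True if the whole string matches and the last label is not all-numeric."""
    return DOMAIN.fullmatch(domain) is not None


for candidate in ("example.com", "3com.com", "010.010.010.010", "example.123"):
    print(candidate, is_plausible_domain(candidate))
```

With this sketch "3com.com" is accepted, since a leading digit is allowed, while "010.010.010.010" and "example.123" are rejected regardless of how the digits might be interpreted.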

  only emphasizing (again)
the dangers of developing a second spec (or two specs more
generally) that is inadvertently not quite consistent with the
other one.

The danger is real, and noting it is appreciated. It's worth considering that we are not talking about one spec, but two families of specs (the email specs and the DNS specs) that we need to summarize and put together.

It turns out that the domain part is 50% of an email address but generates perhaps 85% of the complexity. The quoting rules for local-part are arcane but at least are fairly systematic. There is a question about how much it's the responsibility of an "email address validator" to validate the domain part.

I do not wish to answer this question in isolation. On the one hand, it's usually a DNS library's "job" to answer that (not an email library's, per se); on the other hand, if it's not a good domain name, the email address is literally pointing to an imaginary place. The answer is, I suppose, "it depends," and the Best Current Practice is to document the issue so qualified engineers can make sound judgments about what to do.

I would analogize this to a US Postal Service validator, validating two-letter state-and-political-division abbreviations. Every state-or-political-division has a two-character alphabetic code: enforcing the two-character requirement and the alphabetic requirement in a validator would each be reasonable if the relevant USPS standards promise the same. However, avoiding repeated characters (AA, BB, CC) seems to be more of a registration practice/requirement, so a validator need not impose such a requirement if the relevant standards do not call for it.
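One way to picture that split of responsibilities is a validator that always performs the syntactic checks but treats "does this name exist?" as a separate, optional question handed to whatever DNS code the caller supplies. The Python sketch below is only an illustration under that assumption; `DomainReport`, `check_domain`, and the `resolver` callback are hypothetical names, and the syntax rules shown (LDH labels, 63-octet labels, 253-octet names, no all-numeric rightmost label) are a simplification.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DomainReport:
    syntax_ok: bool
    resolvable: Optional[bool]   # None means "not checked"


def _label_ok(label: str) -> bool:
    """Letters, digits, and interior hyphens only; 1 to 63 octets."""
    return (
        0 < len(label) <= 63
        and label.isascii()
        and label.replace("-", "").isalnum()
        and not label.startswith("-")
        and not label.endswith("-")
    )


def check_domain(
    domain: str,
    resolver: Optional[Callable[[str], bool]] = None,
) -> DomainReport:
    """Check syntax always; consult the caller's resolver only if one is given."""
    labels = domain.split(".")
    syntax_ok = (
        0 < len(domain) <= 253
        and all(_label_ok(label) for label in labels)
        and not labels[-1].isdigit()     # no all-numeric rightmost label
    )
    resolvable = resolver(domain) if (syntax_ok and resolver is not None) else None
    return DomainReport(syntax_ok, resolvable)


print(check_domain("example.com"))
print(check_domain("010.010.010.010", resolver=lambda name: True))
```

Reporting the two answers separately leaves the "it depends" decision with the engineer calling the validator, rather than baking one policy into the pattern.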


Regards,

Sean

_______________________________________________
ietf-smtp mailing list
ietf-smtp(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf-smtp
