
Re: [ietf-smtp] [dispatch] BCP proposal: regular expressions for Internet Mail identifiers

2016-03-29 14:05:45
On 3/29/2016 9:28 AM, John C Klensin wrote:

--On Tuesday, March 29, 2016 11:45 -0400 "Dale R. Worley"
<worley(_at_)ariadne(_dot_)com> wrote:

Here's another ugly little bit of processing:  On some
systems, library routines that convert dotted-number IP
address strings into four-octet format treat a component that
starts with "0" as being written in octal. E.g.,
"" is equivalent to "".  (Try executing
"dig @" on a Linux system.)  As far as
I know, this isn't *specified* anywhere in the RFCs, and some
RFCs (e.g., RFC 997) have leading zeros on numbers that
contain "9".  So it's worth warning people not to use leading
zeros in IPv4 addresses.
And that comment identifies another ugly little issue.  An email
address of example-user@ implies that
"" is a domain name and "010" (the rightmost
label) is a TLD.   Because there is no such TLD (nor is there
one for "8."), such an address is an error, so, if a
mail-related regular expression document pursues that question
at all, it would allow something that violated 5321 no matter
whether 010 is interpreted as "2", "8", "10", "16", STX,
Backspace, DLE, or something else.

I'm not suggesting Sean would do that,

Covered that already. ;-) See the pattern "restricts out all-numeric labels [RFC1912]" in Section 3.1.3.

I hope that this does go to show that raw/blind application of the ABNF in RFC 5321/5322 is not sufficient.
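
As a rough illustration of the two checks discussed in this thread (and not the draft's actual patterns), a minimal Python sketch might look like the following. The function names and the letter-digit-hyphen label pattern are assumptions made up for this example. It only flags dotted-number components with leading zeros and rejects a domain whose rightmost label is all-numeric; it does not check total length, address literals, or whether a TLD actually exists.

```python
import re

# Assumed for illustration: ASCII digits only, and a conventional
# letter-digit-hyphen label of 1 to 63 characters with no leading
# or trailing hyphen.
_ALL_DIGITS = re.compile(r"[0-9]+")
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def has_leading_zero(dotted: str) -> bool:
    """Return True if any dot-separated numeric component has a leading zero.

    Such components are ambiguous because some library routines read
    them as octal. A lone "0" is not flagged.
    """
    return any(
        len(part) > 1 and part[0] == "0" and _ALL_DIGITS.fullmatch(part)
        for part in dotted.split(".")
    )


def plausible_domain(domain: str) -> bool:
    """Return True if every label is LDH and the rightmost label is not all-numeric."""
    labels = domain.split(".")
    if not all(_LDH_LABEL.fullmatch(label) for label in labels):
        return False
    # An all-numeric rightmost label would have to be a TLD, and none exists.
    return _ALL_DIGITS.fullmatch(labels[-1]) is None
```

With these definitions, has_leading_zero("010.0.0.1") is true, and plausible_domain("example.123") is false while plausible_domain("example.com") is true. Whether a validator should go further than this is exactly the "it depends" question.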

  only emphasizing (again)
the dangers of developing a second spec (or two specs more
generally) that is inadvertently not quite consistent with the
other one.

The danger is real, and noting it is appreciated. It's worth considering that we are not talking about one spec, but two families of specs (the email specs and the DNS specs) that we need to summarize and put together.

It turns out that the domain part is 50% of an email address but generates perhaps 85% of the complexity. The quoting rules for local-part are arcane but at least are fairly systematic. There is a question about how much it's the responsibility of an "email address validator" to validate the domain part.

I do not wish to answer this question in isolation. On the one hand, it's usually a DNS library's "job" to answer that (not an email library's, per se); on the other hand, if it's not a good domain name, the email address is literally pointing to an imaginary place. The answer is, I suppose, "it depends," and the Best Current Practice is to document the issue so qualified engineers can make sound judgments about what to do.

I would analogize this to a US Postal Service validator that checks two-letter state-and-political-division abbreviations. Every state or political division has a two-letter code, so enforcing the two-character requirement and the alphabetic requirement in a validator would each be reasonable if the relevant USPS standards promise the same. However, avoiding repeated characters (AA, BB, CC) seems to be more of a registration practice/requirement, so a validator need not impose such a requirement if the relevant standards do not call for it.



ietf-smtp mailing list
