ietf-smtp

Re: [ietf-smtp] How is EAI mail implemented ?

2021-06-15 13:15:36
This is sort of a followup to the discussion about domain names in Received
lines in EAI messages.

RFC 6531 describes a model where EAI mail is conceptually parallel to
ASCII mail -- clients tag incoming EAI messages with SMTPUTF8 keywords,
those EAI messages can only be relayed to servers that offer SMTPUTF8, and
so forth.  Having looked at a bunch of EAI mail software, I've found that
nobody actually implements it that way.

Since all computers handle 8-bit bytes, mail software generally handles
8-bit data without doing anything special.  You have to change the code
that does DNS lookups to turn U-labels into A-labels, but that's about it.

On my system I turn addresses into A-labels on the way in, and back into
U-labels on the way out (only if there is a UTF-8 local part) and handle
all the local routing and deliveries with A-labels.

It sounds to me like this doesn't allow for UTF-8 domains in aliases. If so,
that's not very friendly.

In any case, this isn't how we do it. We try very hard not to modify addresses,
and instead do various forms of "canonicalization" when looking them up or
comparing them. This is undoubtedly ugly, and to someone like me who also writes
code for AVR microcontrollers it seems pretty inefficient, but as a practical
matter these transformations are down in the noise for modern CPUs.
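
As an illustration only (this is not the code of any MTA discussed here), the
"canonicalize when comparing, never rewrite" idea can be sketched in Python.
The function names are hypothetical, and the sketch assumes Python's built-in
"idna" codec, which implements IDNA 2003; a production system would more
likely use an IDNA 2008 / UTS #46 library. Local parts are compared verbatim,
since local-part comparison is a separate problem.

```python
def canonical_domain(domain: str) -> str:
    """Return a lowercase A-label form of a domain, for comparison only."""
    # The built-in "idna" codec (IDNA 2003) converts U-labels to A-labels and
    # passes ASCII labels through unchanged, so lowercase the result.
    return domain.encode("idna").decode("ascii").lower()


def same_address(addr_a: str, addr_b: str) -> bool:
    """Compare two addresses without modifying either stored address."""
    local_a, _, dom_a = addr_a.rpartition("@")
    local_b, _, dom_b = addr_b.rpartition("@")
    # Assumption: local parts must match exactly; only domains are canonicalized.
    return local_a == local_b and canonical_domain(dom_a) == canonical_domain(dom_b)
```

With this, an alias configured with either the U-label or the A-label form of
a domain matches an incoming address written in the other form.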

But a lot of MTAs
don't even do that. They just allow any IDN in the domain, and if you want
the U-label and A-label versions of an address to deliver to the same
place, you have to configure them both.

In our case configuring either one is sufficient. Of course this leaves open the
question of comparing local parts. We haven't really solved this one yet.

Nobody I've seen tags messages as EAI in their internal queues.

We do. In fact our EAI tagging tells us what aspects of EAI are
in use: MAIL FROM address, RCPT TO address, main headers, body.
It turns out to be handy to know this up front.
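
A rough sketch of what per-aspect tagging could look like, in Python. This is
an assumption-laden illustration, not our implementation: the EaiTags class
and tag_message function are hypothetical names, and "non-ASCII present" is
used as a stand-in for real EAI detection in each part.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EaiTags:
    """Records which parts of a message contain non-ASCII data."""
    mail_from: bool
    rcpt_to: bool
    headers: bool
    body: bool


def tag_message(mail_from: str, rcpt_to: Iterable[str], raw: bytes) -> EaiTags:
    # Split the top-level header block from the body at the first blank line.
    head, _, body = raw.partition(b"\r\n\r\n")
    return EaiTags(
        mail_from=not mail_from.isascii(),
        rcpt_to=any(not r.isascii() for r in rcpt_to),
        headers=not head.isascii(),
        # Simplification: any 8-bit byte sets this flag. Real body-level
        # detection would need MIME parsing (e.g. for message/global parts).
        body=not body.isascii(),
    )
```

Computing the tags once at intake means later decisions (relay, downgrade,
reject) can consult the flags instead of rescanning the message.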

On
outgoing mail, they check on the fly to see if it's an EAI message, i.e.
whether there are non-ASCII characters in the envelope or message headers.
(Exim doesn't even look at the headers, and says that's not a bug.)

How strictly we follow the RFCs is settable in our case. If someone
wants strict behavior they can have it, if someone is content
to send messages with SMTPUTF8 headers to a non-EAI server they can
have that too.

A lot of MTAs add the
SMTPUTF8 MAIL FROM tag to all outgoing mail to servers that offer
SMTPUTF8, because why not.

Because, regardless of what you have observed, it's possible that the server
may then refuse to relay an EAI-tagged but actually non-EAI message to a
non-EAI server?

I think they all notice if an EAI message is
sent to a non-EAI server, and a few in that case do odd things like
turning a UTF-8 local part into a MIME encoded word in the envelope.

Yuck. Frankly, it would be better to just send UTF-8 in this case than to come
up with a private encoding for an address that likely belongs to the server
you're sending to.

This approach is a lot easier to code than trying to tag all the queued
messages, and it can deliver more mail. If, e.g., an incoming message has
an ASCII bounce address and UTF-8 recipients but is relayed to an ASCII
recipient, the relay doesn't need EAI.
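
The on-the-fly check described above might look something like the following
Python sketch. The function name is hypothetical, and it assumes that any
non-ASCII character in an envelope address requires SMTPUTF8; an MTA that
converts domains to A-labels first would only need to look at local parts.

```python
from typing import Iterable


def needs_smtputf8(sender: str, recipients: Iterable[str], header_block: bytes) -> bool:
    """Decide per delivery attempt whether this hop requires SMTPUTF8."""
    # Only the recipients being sent on this hop are considered, not every
    # recipient of the original message.
    envelope_non_ascii = any(not addr.isascii() for addr in [sender, *recipients])
    headers_non_ascii = not header_block.isascii()
    return envelope_non_ascii or headers_non_ascii
```

For the example in the text, calling this with the ASCII bounce address, only
the ASCII recipient, and ASCII headers returns False, so that copy can go to a
server that does not offer SMTPUTF8.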

IME the tagging part was trivial. The canonicalization was considerably harder.
(And writing the tests for all of it was a real PITA.) However, the really
difficult part is balancing standards compliance and all this turning into a
major support call generator.

When looking at IMAP and POP servers, again, since computers all handle
8-bit data, you get most of the way there for free.  I haven't found any
IMAP servers with UTF8=ACCEPT or POP with UTF8 that really works, but I've
found plenty with LOGIN and AUTHENTICATE commands that take UTF-8
strings, and IMAP searches with the complex old character encoding work
remarkably well.  It often seems even to find strings in unencoded UTF-8
headers which I wasn't expecting, perhaps again something that works by
mistake.

I don't think it's a mistake, exactly. More like the optimum handling
for invalid messages turns out to align with the proper handling
for SMTPUTF8 messages.

None of this means the RFCs have to change but it might be time for an
applicability statement or something about how EAI is likely to coexist
with ASCII mail for a long time.

I don't have enough feedback from actual use to be able to say anything
definitive, but my guess is the document EAI needs is something that the IETF
would not be willing/able to write: One that says what parts of the standard
should be followed, what parts should be outright violated, and when.

                                Ned

_______________________________________________
ietf-smtp mailing list
ietf-smtp(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf-smtp