Re: [ietf-822] utf8 messages


On 15/08/2014 15:13, "Ned Freed" <ned(_dot_)freed(_at_)mrochek(_dot_)com> wrote:

On 14/08/2014 01:56, "Ned Freed" <ned(_dot_)freed(_at_)mrochek(_dot_)com> wrote:

I fully agree with Brandon: the standard SHOULD consider the use case
when a message is transferred from one system to another as a blob
(e.g. a flat file) and the only available "metadata" is that the message
is in MIME format. Having some sort of well-defined UTF8 indicator in
the header section of the message would make it much simpler to adopt
the new standard, as it would require substantially less development
effort in most cases.


I'm skeptical of the claim, but if you absolutely have to have
something, why not add a Received: field containing a "with smtputf8"
clause, assuming one isn't there already?

Received: headers are not very reliable, and the syntax is not well
defined.


On the contrary, it's quite well defined. See RFC 5321. The issue isn't
that it's poorly defined, but rather that there are a lot of agents that
don't create it properly.


Maybe because it was defined too late, and not in the right place? (RFC
5321 is about SMTP, not about MIME.) From the parser's point of view the
reason is irrelevant; the reality is that it is better not to rely on
it. Even RFC 5321 says:

"...receiving systems MUST NOT reject mail based on the format of a trace
header field and SHOULD be extremely robust in the light of unexpected
information or formats in those header fields."

Successfully parsing a Received: header itself requires a lot of
heuristics.


A full parse does, and so does looking for IP address information (which
doesn't appear directly as a clause value and whose position was only
standardized late in the game). Looking for a with clause with a
particular value does not.
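A minimal Python sketch of the kind of targeted check being described: it pulls the Received: fields out of a message, unfolds them, strips parenthesized comments, and reads only the protocol value after the first "with" keyword. The set of SMTPUTF8 protocol values below (UTF8SMTP plus its A/S/SA and LMTP variants, believed to be those registered by RFC 6531) is an assumption to check against the IANA Mail Transmission Types registry; the function names are hypothetical, and quoted-pairs inside comments are not handled.

```python
import re
from email import message_from_string
from email.policy import compat32

# Assumption: SMTPUTF8 "with" protocol values (verify against the IANA registry).
UTF8_WITH_VALUES = frozenset({
    "UTF8SMTP", "UTF8SMTPA", "UTF8SMTPS", "UTF8SMTPSA",
    "UTF8LMTP", "UTF8LMTPA", "UTF8LMTPS", "UTF8LMTPSA",
})

_COMMENT = re.compile(r"\([^()]*\)")                       # innermost (...) comment
_WITH = re.compile(r"\bwith\s+([A-Za-z0-9-]+)", re.IGNORECASE)


def strip_comments(value: str) -> str:
    """Remove (possibly nested) parenthesized comments, innermost first."""
    previous = None
    while previous != value:
        previous = value
        value = _COMMENT.sub(" ", value)
    return value


def received_with_protocols(raw_message: str) -> list:
    """Upper-cased 'with' values from the Received: fields, top to bottom."""
    msg = message_from_string(raw_message, policy=compat32)
    protocols = []
    for field in msg.get_all("Received", []):
        # Unfold continuation lines, then drop comments before matching.
        unfolded = re.sub(r"\r?\n[ \t]+", " ", str(field))
        match = _WITH.search(strip_comments(unfolded))
        if match:
            protocols.append(match.group(1).upper())
    return protocols


def looks_smtputf8(raw_message: str, values=UTF8_WITH_VALUES) -> bool:
    """True if any Received: field carries an SMTPUTF8-style 'with' value."""
    return any(p in values for p in received_with_protocols(raw_message))
```

Note that the check only answers "was this seen in transit with SMTPUTF8?"; a missing or malformed Received: field simply yields False, which is the reliability gap raised in reply.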


Looks like we have quite different ideas about reliability and parsing.
I certainly would not consider the partial parsing approach you
suggested reliable.

To be honest, I would not be happy to rely on them. Also, when a message
is transferred between archive stores, no new Received: header is
normally added.


Uh huh. And neither is whatever new header is being proposed here. Why
is one preferable to the other?


Because
1) the Received: header is already used and abused in many ways
2) as you admitted, there are a lot of agents that don't create it properly
3) semantically it does not make sense to put the charset information in
   the Received: header (it is meant to be a trace field)
4) if we define a new field, we don't need to worry about finding the
   newly defined field with bogus syntax in historic emails sent before
   the standard was published

Regarding Ned's concern about inconsistent states, I think it would be a
workable solution to only honour the UTF8 indicator in the headers when
the UTF8 flag is not available from metadata. In a well-known UTF8
context, where the SMTP protocol or the message store already "knows"
that the message is UTF8, the indicator in the headers can be ignored.
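The precedence rule proposed here fits in a few lines. A minimal Python sketch, assuming the in-message indicator has already been extracted into an optional boolean (no header name or syntax was agreed in this thread, so none is assumed) and assuming that "no information at all" means legacy handling; the function name is hypothetical.

```python
from typing import Optional


def effective_utf8(metadata_flag: Optional[bool],
                   header_indicator: Optional[bool]) -> bool:
    """Decide whether to treat a message as UTF-8 (SMTPUTF8-style).

    metadata_flag: what the SMTP session or message store reports, or None
        when the message arrived as a bare blob with no such metadata.
    header_indicator: value of the proposed in-message indicator, or None
        if the message carries no indicator.
    """
    if metadata_flag is not None:
        # Known context: metadata wins and the header is ignored,
        # even when the two disagree.
        return metadata_flag
    if header_indicator is not None:
        # Blob case: the header indicator is the only information available.
        return header_indicator
    # Assumption: with nothing to go on, fall back to legacy handling.
    return False
```

The first branch is where the inconsistent-state concern bites: an agent that skips the metadata check and goes straight to the header gets a different answer whenever the two disagree.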


That assumes people will read the standard. It's far more likely that,
given an obvious indicator, they will simply use it.

Is this a serious argument? Why would you bother writing a standard if
you don't expect people to read it?


It's deadly serious. There's quite a lot of monkey-see-monkey-do
out there.

Your "why bother" argument is bogus though. Other people do read the
standards; enough to make it worthwhile to develop them. And some
finally read them when they find their hack didn't work.


Is that reason enough to design the standard for the monkeys?

I think it is generally desirable to reduce (or at least not increase)
the number of heuristics required to successfully parse a MIME message.
We should try to learn from previous mistakes instead of repeating them.


That's the absolute worst example you could have picked, because the
most serious design error in MIME is the MIME-Version: field. You know,
the field that tells you whether or not a given message is a MIME
message. Sound familiar?

I don't understand this comment. What example are you referring to? (Of
course I am familiar with the MIME-Version: header; I have read the
corresponding RFC many times.)


The MIME-Version field has turned into a wart on the protocol. There's
no way to bump the version since too many things are hard-coded to look
for the 1.0, so its primary purpose of providing a version indicator is
gone. We're stuck at 1.0. You can't even put a comment on the field
since some agents are known to hate that.

And on the other side, a lot of agents will assume MIME even if the
field isn't present. It also gets attached willy-nilly to a lot of
non-MIME messages because that's easier than checking.

As a result the information value of the field is essentially
nonexistent: you have to attach it to MIME messages but you cannot count
on it to tell you anything.

It's a fully worked example of how redundant indicators turn into warts.
And in the case of MIME-Version, despite its being in the standard from
day one, this process was more or less complete within a couple of years.
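A short Python sketch of the hard-coding problem described above. RFC 2045 permits comments in the MIME-Version field (its own examples include forms such as "1.0 (produced by MetaSend Vx.x)"), so an agent that compares the trimmed value against the literal "1.0" rejects valid fields, while a comment-tolerant parse recovers the version pair. Both function names are hypothetical and are not taken from any real mail library.

```python
import re
from typing import Optional, Tuple

_COMMENT = re.compile(r"\([^()]*\)")                 # innermost (...) comment
_VERSION = re.compile(r"\s*(\d+)\s*\.\s*(\d+)\s*")   # major "." minor


def naive_is_mime(value: str) -> bool:
    """The brittle check: only an exact "1.0" counts."""
    return value.strip() == "1.0"


def parse_mime_version(value: str) -> Optional[Tuple[int, int]]:
    """Strip (nested) comments, then read major.minor; None if malformed."""
    previous = None
    while previous != value:
        previous = value
        value = _COMMENT.sub(" ", value)
    match = _VERSION.fullmatch(value)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


for field in ["1.0",
              "1.0 (produced by MetaSend Vx.x)",
              "1.(produced by MetaSend Vx.x)0"]:
    print(repr(field), naive_is_mime(field), parse_mime_version(field))
```

Even the tolerant parser does not rescue the version indicator itself: any value other than (1, 0) would still be rejected by the agents hard-wired to the literal string.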


Thanks for explaining. As I said before, I am familiar with the
MIME-Version: header, but I did not use it as an example. (You said it
was the absolute worst example I could have picked.)

Daniel

_______________________________________________
ietf-822 mailing list
ietf-822(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf-822