
Re: [ietf-smtp] Proper definition of the term "email payload".

2019-03-31 22:05:04
Got it, Mr. Klensin. Thanks for the input. Have a nice day :-)



On Mon, Apr 1, 2019 at 8:23 AM John C Klensin <john-ietf(_at_)jck(_dot_)com> 
wrote:

Hi.

In the hope that we don't need to iterate further, let me try to
express the problem in a different way, one that is entirely
consistent with Dave Crocker's comments and a few others.

The difficulty with a term like "payload" is that the definition
depends on where one is looking from.  Internet email is a
layered system and the perspective depends on the layer.  For
SMTP (from RFC 821 through 5321 fairly consistently), what it is
transferring ("the payload") would be the message content
starting after the DATA command (or equivalent) and continuing
to the end of data indication (normally CRLF.CRLF).  From a
header specification standpoint pre-MIME (e.g., from the
perspective of RFC 822), the payload would probably be the message
body after the blank line that indicates the end of the headers
although I suppose one could construct an argument that would
distinguish between trace information and everything else.  When
we get to MIME (and especially "content-type=multipart/"), one
might claim that multipart messages have multiple payloads, one
per body part after the message headers and MIME body part
headers are excluded.
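The three vantage points above can be sketched with Python's standard email package. This is a minimal illustration under assumptions: the message is made up, and splitting the raw bytes at the first empty line is a simplification of what a real parser does.

```python
from email import message_from_bytes, policy

# A made-up two-part MIME message, as the client would send it after DATA
# (before dot-stuffing and without the final CRLF.CRLF terminator).
raw = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Layers\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"Hello\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"World\r\n"
    b"--XYZ--\r\n"
)

# SMTP view: the whole byte stream after DATA is one opaque unit.
smtp_payload = raw

# RFC 822 / 5322 view: the body is everything after the first empty line.
header_block, body = raw.split(b"\r\n\r\n", 1)

# MIME view: one payload per body part, with the part headers excluded.
msg = message_from_bytes(raw, policy=policy.default)
part_payloads = [part.get_content() for part in msg.iter_parts()]

print(len(smtp_payload), len(header_block), len(body))
print(part_payloads)
```

Each layer gives a different, equally defensible answer to "what is the payload?", which is the point of the paragraph above.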

Dave noted that the term payload "does not appear at all in RFC
5321 or RFC 5322 or RFC 3501". As the author of one of those
documents, I can say that the omission is no accident and is
closely connected to the discussion above.

best,
   john



--On Monday, April 1, 2019 06:34 +0530 Viruthagiri
Thirumavalavan <giri(_at_)dombox(_dot_)org> wrote:

Thanks, Mark. You have written beautifully. And yes, your answer
makes sense.

On Mon, Apr 1, 2019 at 6:21 AM Mark Sapiro <mark(_at_)msapiro(_dot_)net>
wrote:

To elaborate just a bit on what Barry says, as far as the
Python email library is concerned, the stuff that comes over
the wire after the SMTP DATA command (RFC 821 and successors)
becomes a message object from the email.message module
(Message, or EmailMessage in the newer API). If you want to
see the whole thing, you use the as_string() or as_bytes()
methods on that object.

That object consists of headers and body as described in RFC
822 and successors. The Python email library refers to that
body as the payload of that message object.
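A minimal sketch of the split described above, using only the standard library; the message text is invented for illustration. Under the default (compat32) policy, message_from_string() returns an email.message.Message whose headers are read like a mapping and whose get_payload() returns only the body.

```python
from email import message_from_string

# A made-up single-part message: the text that arrived after DATA.
text = (
    "From: alice@example.com\n"
    "Subject: Payload?\n"
    "\n"
    "Just the body.\n"
)

# With the default (compat32) policy this returns an email.message.Message.
msg = message_from_string(text)

print(msg["Subject"])            # header access: Payload?
print(repr(msg.get_payload()))   # only the body: 'Just the body.\n'
print(msg.as_string())           # headers and body together, as in the DATA stream
```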

I think this is all consistent and reasonable in terms of
what the email library is trying to do.

In the RFC 821 context, the metadata is the envelope which
has a sender and recipients and the entire message is the
data, but in the RFC 822 context, the data is split into
headers and body and we choose to call the body the payload.
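To make the envelope/data split concrete, here is a hypothetical helper, smtp_transaction(), that only assembles the text a client would send; it does not talk to a server or wait for replies. In practice, smtplib's SMTP.sendmail() and send_message() take the envelope addresses as arguments separate from the message, and SMTP.data() does the dot-stuffing itself. Note that the envelope sender need not match the From: header.

```python
def smtp_transaction(mail_from, rcpt_to, message):
    """Build the client side of one SMTP mail transaction (hypothetical helper).

    Server replies (250, 354, ...) are ignored; this only shows what is sent.
    The envelope goes out as MAIL FROM / RCPT TO commands, while the whole
    message, headers and body together, follows the DATA command.
    """
    lines = [f"MAIL FROM:<{mail_from}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in rcpt_to]
    lines.append("DATA")
    if message.endswith("\r\n"):
        message = message[:-2]   # the terminator below supplies the final CRLF
    for line in message.split("\r\n"):
        # Transparency (RFC 5321, section 4.5.2): a line starting with "." gets
        # one more "." so it cannot be mistaken for the end-of-data marker.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")            # end of data: CRLF . CRLF
    return "\r\n".join(lines) + "\r\n"


msg = "From: alice@example.com\r\nSubject: hi\r\n\r\n.hidden line\r\n"
wire = smtp_transaction("bounces@example.org", ["bob@example.net"], msg)
print(wire)
```

Everything between DATA and the lone "." is the RFC 821 "data"; the RFC 822 header/body split only exists once that data is parsed.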

This is a semantic issue. In your "box of beer" example, the
service that delivers it considers the payload to be the box
and contents, but the consumer considers the payload to be
only the contents (and maybe just the beer and not the cans).
Take your pick.

I.e., there is no one definitive answer to your question. You
have reasons for considering the RFC 821 DATA to be the
payload, and you are not wrong, and we have reasons for
considering the RFC 822 body to be the payload, and we are
not wrong either.

Forwarded message:
 From: Barry Warsaw <barry(_at_)python(_dot_)org>
 Subject: Re: Proper definition of the term "email payload".
 Date: March 31, 2019 at 17:09:30 PDT
 To: Viruthagiri Thirumavalavan <giri(_at_)dombox(_dot_)org>
 Cc: ietf-smtp(_at_)ietf(_dot_)org, "R. David Murray" <rdmurray(_at_)bitdance(_dot_)com>,
  Mark Sapiro <msapiro(_at_)value(_dot_)net>


 Hi, I hope you (and they!) don't mind me CCing two other
 people who have worked extensively on Python's email
 library, and in fact much more than I have in recent
 years.  RDM has done the bulk of the work on the
 new-in-Python-3 APIs, and Mark is a long time core
 developer on GNU Mailman (the project that spawned
 Python's email library).

 There are two ways I think about this, and I'll use the
 original RFC numbers to clarify.  There's RFC 821, which
 describes the on-the-wire protocol for SMTP transfers,
 embodied in Python's smtplib library. Then there's RFC
 822, which describes the format of the content of that
 SMTP transfer, but not the protocol itself.  Of course
 there are lots of developments along the way, but that's
 unimportant for the way I think about these things.

 What I think you are describing, where the headers are
 part of the payload, is more akin to RFC 821.  That's
 the payload as far as the actual bytes-on-the-wire are
 concerned.  Python's email library is for RFC 822 (and
 the many, many elaborations thereof), so in that case, the
 payload is the body of the message.  On more practical
 terms, the implementation makes this clear, and the APIs
 you use to change headers are different in form and
 function than the ones you use to change the body of the
 message.
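A short sketch of the two API families mentioned above, using the Python 3 EmailMessage class; the addresses and text are placeholders. Header fields are set and deleted through mapping-style access, while the body goes through set_content()/get_content(), which also write the MIME headers that describe the body.

```python
from email.message import EmailMessage

msg = EmailMessage()

# Header API: mapping-style access keyed by field name.
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "First draft"
del msg["Subject"]                # deletes every Subject field
msg["Subject"] = "Second draft"   # assignment adds a field; it does not replace one

# Body API: the content manager stores the payload and also writes the
# MIME headers (Content-Type, Content-Transfer-Encoding) that describe it.
msg.set_content("Hello, Bob.\n")

print(msg["Subject"])           # Second draft
print(msg.get_content_type())   # text/plain
print(msg.get_content(), end="")
```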

 I think the Python documentation is fairly clear about this
 distinction.  At least, I don't remember seeing any
 feedback to the contrary, although RDM may have a better
 sense of that.  Of course, we are always open to
 improvements in Python's documentation.

 Cheers,
 -Barry

On Mar 31, 2019, at 10:57, Viruthagiri Thirumavalavan
<giri(_at_)dombox(_dot_)org> wrote:

Hello IETF,

I need some clarification about the term "email payload".

Wikipedia says

In computing and telecommunications, the payload is the
part of transmitted data that is the actual intended
message. Headers and metadata are sent only to enable
payload delivery

The Python email library documentation says this:

An email message consists of headers and a payload (which
is also referred to as the content). Headers are RFC 5322
or RFC 6532 style field names and values, where the field
name and value are separated by a colon. The colon is not
part of either the field name or the field value. The
payload may be a simple text message, or a binary object,
or a structured sequence of sub-messages each with their
own set of headers and their own payload. The latter type
of payload is indicated by the message having a MIME type
such as multipart/* or message/rfc822.
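The last two sentences of that quote can be checked with a small sketch; the subjects and text are placeholders. Attaching one EmailMessage to another with add_attachment() turns the outer message into multipart/mixed, and the attached message becomes a message/rfc822 part whose payload is itself a complete message with its own headers.

```python
from email.message import EmailMessage

outer = EmailMessage()
outer["Subject"] = "Outer"
outer.set_content("Cover note.\n")

inner = EmailMessage()
inner["Subject"] = "Inner"
inner.set_content("Forwarded text.\n")

# add_attachment() converts outer to multipart/mixed; the existing text
# body becomes the first part and inner is wrapped in a message/rfc822 part.
outer.add_attachment(inner)

print(outer.get_content_type())        # multipart/mixed
print(type(outer.get_payload()))       # the payload is now a list of sub-messages
for part in outer.iter_parts():
    print(part.get_content_type())

# The message/rfc822 part's content is a full message with its own headers.
attached = list(outer.iter_parts())[1].get_content()
print(attached["Subject"], attached.get_content(), end="")
```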

It looks like the Python email library's author, Barry Warsaw,
followed a definition similar to the one found in Wikipedia
when defining his library functions. But I feel that calling
ONLY the email "Body Part" the "payload" is wrong. The term
"payload" should refer to the entire "Message Part" of an
email, i.e., both headers and body.

When you place an order for a "box of beer", you are paying
not only for the "beer cans" but also for the "container
box". So the payload here is the entire box.

HTTP Example:

HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Content-Type: text/html
Content-Length: 1234

<html>

<head>
<title>Hello World!</title>
</head>

<body>
(more contents)
 .
 .
 .
</body>
</html>


If you take a closer look at this HTTP example, the
headers are just instructions for the client. The end
user doesn't need to worry about any piece of information
found in those headers. So the Wikipedia definition is
perfectly suited to applications like HTTP.

But in email, when a message is transferred from Hop A to
Hop C via Hop B, the user at Hop A actually wants to
deliver the whole "message part" to Hop C. If Hop B
strips the headers and transfers only the "Body" part, it
becomes an "anonymous" message. So the end user also
needs the information found in the headers, e.g. From,
Subject, Date, etc. [In HTTP, the title tag is the
equivalent of Subject, and it's found in the HTML "head"
markup, not in the HTTP headers.]

As you can see, the user is interested in the "entire
message". So the term "actual intended message" should
refer to the "whole message" transferred after the DATA
command. The "actual intended message" should be pictured
like this in email.

Also note that when you migrate your mail to another
mail service, you need the whole message with headers, not
just the body.
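A rough sketch of that point, assuming an mbox export as the migration format (an assumption; migrations also go through IMAP, Maildir, and other formats) and using Python's standard mailbox module. The file name and message are made up; what gets stored and read back is the whole message, headers included.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["Subject"] = "Keep me"
msg.set_content("Body text.\n")

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical export file for the migration.
    box = mailbox.mbox(os.path.join(tmp, "export.mbox"))
    box.add(msg)        # the stored entry is the whole message, headers included
    box.flush()
    exported = next(iter(box))
    box.close()

print(exported["From"], exported["Subject"])
print(exported.get_payload())
```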

Based on these points, I believe calling only the "Body" part
the "payload" is wrong. I would love to hear your thoughts
on this. If Barry Warsaw is here, I would love to hear your
opinion too.

PS: I actually asked this question 2 years ago on a
Stack Exchange site. I wasn't satisfied with the answer
I got there. I don't want to use the term incorrectly in
my application. That's why I'm posting it here.

Thanks
--
Best Regards,

Viruthagiri Thirumavalavan
Dombox, Inc.


--
Mark Sapiro <mark(_at_)msapiro(_dot_)net>        The highway is for
gamblers, San Francisco Bay Area, California    better use
your sense - B. Dylan







-- 
Best Regards,

Viruthagiri Thirumavalavan
Dombox, Inc.
_______________________________________________
ietf-smtp mailing list
ietf-smtp(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf-smtp