Re: Angle brackets surrounding Content-ID


Jacob Palme wrote:

I have received e-mail from an implementor, who has noted
a discrepancy between RFC 2110 and 2112 regarding the syntax
of the start parameter to the Content-Type: Multipart/related
defined in RFC2112 and used in some examples in RFC 2110.

RFC2112 says that there should be angle brackets around
the value of the start parameter, e.g.

    Content-Type: Multipart/related; boundary="boundary-example-1";
    type=Text/HTML; start=<foo3*foo1@bar.net>

but RFC2110 has some examples without these angle brackets, e.g.

    Content-Type: Multipart/related; boundary="boundary-example-1";
    type=Text/HTML; start=foo3*foo1@bar.net

My conclusion is that since RFC2112 is the document which
defines this parameter, it should take precedence and
RFC2110 is incorrect in its examples. However, a good
implementor should certainly strip these angle brackets
from all Message-IDs (wherever they occur) before
comparing them with other Message-IDs or using them
in any way. A good implementor should also accept the
incorrect syntax without these angle brackets in what
it receives, but always use the angle brackets in what
it produces.

With this message I only want to check that other e-mail
experts agree with this.


First, both RFC 2110 and 2112 are obsolete, 2110 having been
superseded by 2557 and 2112 having been obsoleted by RFC 2387.

Second, there are several different types of things being discussed:
* Content-ID (RFC 2045)
* Message-ID (RFCs 822/2822)
* start parameters (RFCs 2387/2045/2231 and the RFC editor errata page)

Related constructs include cid and mid URIs (RFC 2392).

RFC 2387 goes to some length to clarify the issue described;
the start parameter is supposed to include the angle brackets
which are an integral part of the RFC 822 msg-id construct.

Content identifiers and message identifiers use similar syntax.
However, they serve distinct purposes. Prior to RFC 2822, the
syntax was identical, viz. RFC 822 msg-id, which in turn was
syntactically identical to an RFC 822 route-addr with no route, i.e.
an angle-bracketed addr-spec (the latter consisting of a
local-part, '@', and a domain).  RFC 2822 defines msg-id
differently, using id-left and id-right, merely recommending
that id-right be a domain.  Having said that, all of the other
RFCs mentioned above use RFC 822 as their basis, not 2822.
There are implications for comparisons (below).

Temporarily leaving aside "strip these angle brackets", there
are several issues that should be taken into consideration
when comparing identifiers:

* domain names are case-insensitive, so "<1234@foo.example.net>"
  is semantically identical to "<1234@FoO.ExAmPlE.nEt>".  RFC
  2822 presents a problem here, because a receiver can never
  be sure whether an RFC 2822 id-right is a case-insensitive
  domain name or something else (which might be
  case-sensitive).
* local-parts and domain literals need to be canonicalized
  w.r.t. quoting conventions prior to comparisons. The
  following are all semantically identical:
  <foo.bar@[1.2.3.4]> (canonical form)
  <"foo.bar"@[1\.2.3.4]>
  <"f\oo.bar"@[1\.2.3.4]>
  <"f\oo\.bar"@[1\.2.3.4]>
  <"f\oo\.bar"@[1\.\2.3.4]>
* if one or more of the identifiers being compared is in a
  parameter of a MIME Content-Type or Content-Disposition
  field, one must reassemble any parameter fragments, remove
  any RFC 2231-specific character encoding and/or quoting
  present, convert to a common charset, and possibly consider
  the specified language, paying particular attention to the
  published errata (http://www.rfc-editor.org/cgi-bin/errata.pl)
  for RFC 2231 and any other relevant RFCs.
* if one or more identifiers was obtained from a URI, any
  URI-encoding (RFC 2396) must be undone prior to comparison
* identifier syntax (modulo RFC 2822 introductions) and the
  context in which identifiers appear generally permit
  comments, whitespace, and line-folding, all of which should
  be removed prior to comparison.
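
As a small illustration of the quoting bullet above, here is a minimal
Python sketch. The helper name dequote is hypothetical (it comes from no
RFC or library), and the sketch assumes comments, whitespace, and folding
have already been removed. It only shows that the five listed forms reduce
to one string; a real comparison should first split the identifier at the
unquoted '@', since dequoting a whole identifier can be ambiguous when the
local-part contains quoted specials.

```python
def dequote(text: str) -> str:
    """Remove quoted-string quotes and quoted-pair backslashes.

    Hypothetical helper: a bare '"' is dropped, and a backslash is dropped
    while the character that follows it is kept literally.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # quoted-pair: keep the escaped character
            i += 2
        elif ch == '"':
            i += 1                   # quoted-string delimiter: drop it
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The five equivalent forms from the list above (raw strings keep the
# backslashes exactly as written).
forms = [
    r'<foo.bar@[1.2.3.4]>',
    r'<"foo.bar"@[1\.2.3.4]>',
    r'<"f\oo.bar"@[1\.2.3.4]>',
    r'<"f\oo\.bar"@[1\.2.3.4]>',
    r'<"f\oo\.bar"@[1\.\2.3.4]>',
]
canonical = {dequote(form) for form in forms}
print(canonical)  # a single element: '<foo.bar@[1.2.3.4]>'
```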

Now, regarding "strip these angle brackets":
* if an implementation chooses to strip the angle brackets
  which are an integral part of a msg-id as used in Content-ID
  and Message-ID fields and in "start" parameters (but NOT in
  CID or MID URIs), it must be done carefully and properly, not
  with reckless abandon by amateurish programmers.  Note that
  '<', '>', '@', or any other special character may appear in
  a local-part if quoted using either a quoted-string or qpair
  backslash quoting; such quoted characters do NOT signal the
  end of the identifier and must NOT be stripped (though quoting
  must be canonicalized for comparison as noted above).
* handling of the angle brackets must also be performed with due
  consideration to security, as these characters may have
  special meaning to some library functions.
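
One way to do the stripping carefully is to scan the value while tracking
quoted-strings and quoted-pairs, so that only the delimiting brackets are
removed. The following Python sketch is an illustration under stated
assumptions, not a complete RFC 822 parser: the function name
strip_delimiting_brackets is hypothetical, and the input is assumed to have
had comments, surrounding whitespace, and line-folding removed already.

```python
def strip_delimiting_brackets(value: str) -> str:
    """Return the text between the delimiting '<' and '>' of a msg-id.

    Hypothetical helper. Quoted '<' and '>' (inside a quoted-string or
    after a backslash) are left alone; anything malformed raises ValueError.
    """
    if not value.startswith("<"):
        raise ValueError("identifier does not start with '<'")
    in_quotes = False
    i = 1
    while i < len(value):
        ch = value[i]
        if ch == "\\":
            i += 2                    # quoted-pair: never a delimiter
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == ">" and not in_quotes:
            if i != len(value) - 1:
                raise ValueError("unexpected text after closing '>'")
            return value[1:i]
        elif ch == "<" and not in_quotes:
            raise ValueError("unquoted '<' inside identifier")
        i += 1
    raise ValueError("no unquoted closing '>' found")


print(strip_delimiting_brackets('<"a>b"@example.net>'))
```

By contrast, a pattern such as `<([^>]*)>`, or splitting at the first '>',
would cut the identifier above short at the quoted '>'.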

In the specific case of CID or MID URIs, the angle brackets are
omitted. When comparing a CID or MID URI to an identifier
obtained from a different source (e.g. when comparing an MID
URI to a Message-ID header field body) an implementation could
either properly and carefully strip the delimiting angle brackets
(ONLY!) from the non-CID/MID identifier or add brackets to the
CID/MID-derived identifier.
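
The second option (adding brackets to the URI-derived identifier) can be
sketched in Python as follows. This is only a sketch: the function name
url_to_msg_ids is hypothetical, urllib.parse.unquote is used to undo the
URI-encoding, and it is assumed that any '/' inside an identifier was
percent-encoded, so that the first literal '/' in a mid URI separates the
message-id from the content-id.

```python
from urllib.parse import unquote


def url_to_msg_ids(url: str) -> list:
    """Convert a cid: or mid: URI into bracketed identifier(s).

    Hypothetical helper. Returns one identifier for a cid URI, and one or
    two (message-id, then optional content-id) for a mid URI.
    """
    scheme, sep, rest = url.partition(":")
    scheme = scheme.lower()           # URI scheme names are case-insensitive
    if not sep or scheme not in ("cid", "mid"):
        raise ValueError("not a cid: or mid: URI")
    # Split before decoding, so an encoded '/' (%2F) is not taken as the
    # separator between message-id and content-id.
    parts = [rest] if scheme == "cid" else rest.split("/", 1)
    if not all(parts):
        raise ValueError("empty identifier")
    # Undo the URI-encoding, then restore the angle brackets.
    return ["<" + unquote(part) + ">" for part in parts]


print(url_to_msg_ids("cid:foo4%25foo1@bar.net"))
print(url_to_msg_ids("mid:1234@foo.example.net/part1@foo.example.net"))
```

The bracketed results can then be compared with Content-ID or Message-ID
field bodies after the canonicalization steps listed earlier.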

My personal recommendation for such comparisons would be to
reassemble fragments, undo quoting and encoding, carefully and
correctly separate the identifier into local-part and domain,
checking for correct identifier syntax (exactly one '@', which
incidentally in very old messages might be " at "), carefully
and correctly canonicalize the local-part and domain, then
perform a case-insensitive comparison of the domains and a
case-sensitive comparison of the local-parts. Specifically
for identifiers, charset and language can probably be ignored
in the event that one identifier is obtained from a parameter,
because identifiers in field bodies (excepting parameters)
are always in a subset of US-ASCII which is invariant across
charsets likely to be encountered, and have no language. I
would verify that the angle brackets were present where
required, and not present where forbidden (e.g. after handling
the considerations mentioned above, if there is a '>' at the
end of the domain part, somebody fouled up badly, or RFC 2822
syntax is being used, because a domain name never contains
that character).
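
The recommendation above can be sketched in Python roughly as follows.
This is a minimal sketch under assumptions, not a definitive
implementation: the names dequote, parse_identifier, and same_identifier
are hypothetical, the input is assumed to have had fragments reassembled
and encodings, comments, and whitespace removed already, and the legacy
" at " form is not handled.

```python
def dequote(text: str) -> str:
    """Drop quoted-string quotes and quoted-pair backslashes (hypothetical)."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
            continue
        if text[i] != '"':
            out.append(text[i])
        i += 1
    return "".join(out)


def parse_identifier(value: str, bracketed: bool = True) -> tuple:
    """Return (local-part, lowercased domain) with quoting removed."""
    if bracketed:
        if len(value) < 2 or value[0] != "<" or value[-1] != ">":
            raise ValueError("angle brackets required here")
        value = value[1:-1]
    elif value.startswith("<"):
        raise ValueError("angle brackets forbidden here")
    at_positions = []
    in_quotes = in_literal = False
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\":
            i += 2                        # quoted-pair: skip both characters
            continue
        if ch == '"' and not in_literal:
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "[":
                in_literal = True         # start of a domain literal
            elif ch == "]":
                in_literal = False
            elif ch == "@" and not in_literal:
                at_positions.append(i)
        i += 1
    if len(at_positions) != 1:
        raise ValueError("expected exactly one unquoted '@'")
    at = at_positions[0]
    local = dequote(value[:at])
    domain = dequote(value[at + 1:]).lower()  # domains are case-insensitive
    if not local or not domain or domain.endswith(">"):
        raise ValueError("malformed identifier")
    return local, domain


def same_identifier(a, b, a_bracketed=True, b_bracketed=True) -> bool:
    """Case-sensitive local-part, case-insensitive domain comparison."""
    return parse_identifier(a, a_bracketed) == parse_identifier(b, b_bracketed)


print(same_identifier("<1234@foo.example.net>", "<1234@FoO.ExAmPlE.nEt>"))
```

Passing bracketed=False for an identifier taken from a CID or MID URI
makes the check reject stray brackets there, including a '>' left at the
end of the domain.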