Re: New I-D: draft-koch-subject-tags-considered-00.txt


On Thu November 18 2004 16:42, Peter Koch wrote:


Dear ``email people'',

motivated by a discussion on the main IETF list a couple of weeks ago I
just submitted this Internet Draft

        Title           : Subject: [tags] Considered Harmful
        Author(s)       : P. Koch
        Filename        : draft-koch-subject-tags-considered-00.txt
        Pages           : 8
        Date            : 2004-11-18


The draft doesn't mention two very important considerations:

1. the defined semantics of the field (RFC 2822):

   These three fields are intended to have only human-readable content
   with information about the message.  The "Subject:" field is the most
   common and contains a short string identifying the topic of the
   message.

Added cruft identifying some list, etc. does not conform to the
defined semantics ("identifying the topic of the message").

2. the defined syntax of the field (RFC 2822 again):

   The "Subject:" and "Comments:" fields are
   unstructured fields as defined in section 2.2.1, and therefore may
   contain text or folding white space.

   subject         =       "Subject:" unstructured CRLF

   Some field bodies in this standard are defined simply as
   "unstructured" (which is specified below as any US-ASCII characters,
   except for CR and LF) with no further restrictions.  These are
   referred to as unstructured field bodies.  Semantically, unstructured
   field bodies are simply to be treated as a single line of characters
   with no further processing (except for header "folding" and
   "unfolding" as described in section 2.2.3).

Quite apart from the fact that there is no need to grope through a
Subject field in a misguided attempt to find information unrelated
to the topic of the message (see below), unstructured fields (as
noted above) are intended to convey human-readable content, not
machine-parsed data.  Attempting to impose structure on an
unstructured field is inappropriate.  Section 2.7 of the draft
illustrates the folly of trying to base machine processing decisions
on text (in the RFC 2277 sense of "text") intended as "only
human-readable" (bearing in mind that RFC 2047 encoded-words may
appear only in a subset of the non-protocol, human-readable portions
of header field bodies).
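
As a rough illustration of the encoded-word problem (a sketch only,
using Python's standard email.header module; the tag "[example-list]"
and the sample subject are made up for the example and are not taken
from the draft): once a Subject containing a non-ASCII character has
been encoded per RFC 2047, the literal tag no longer appears in the
raw field body, so a filter matching on the raw text misses it.

```python
from email.header import Header, decode_header, make_header

TAG = "[example-list]"              # hypothetical list tag
subject = "[example-list] Réunion"  # made-up subject, one non-ASCII character

# Encode the subject as RFC 2047 encoded-words, as a mail program might.
raw = Header(subject, charset="utf-8").encode()

# A naive filter that inspects the raw field body does not see the tag.
tag_in_raw = TAG in raw

# The tag reappears only after the encoded-words have been decoded.
decoded = str(make_header(decode_header(raw)))
tag_in_decoded = TAG in decoded
```

In other words, any tag-matching scheme has to decode the field first,
which is exactly the sort of processing an unstructured field was
never meant to require.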

The task described in draft section 3 (identifying a list message;
three indistinct variants) can easily and reliably be accomplished
using envelope information and List- message header fields, either
with a Sieve (RFC 3028) filter or a similar mechanism.  Here is a
Sieve language snippet for identifying an ietf-822 list message:

# "fileinto" is an optional action, so it must be declared first.
require "fileinto";

if anyof (address :contains ["From", "Sender", "Return-Path", "To", "CC",
                             "Bcc"] "ietf-822",
          header :contains ["List-Post", "List-ID"] "ietf-822") {
        fileinto "INBOX.IETF-822";
        stop;
}
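
For those without Sieve, roughly the same test can be sketched with
Python's standard email package.  This is only an approximation of the
List- header part of the filter above: the function name
is_list_message and its needle parameter are my own inventions, the
sample message is made up, and envelope information is not consulted
at all.

```python
from email import message_from_string
from email.message import Message


def is_list_message(msg: Message, needle: str = "ietf-822") -> bool:
    """Return True if a List-Id or List-Post field mentions the needle.

    The comparison ignores case, as Sieve's default comparator does.
    """
    for name in ("List-Id", "List-Post"):
        for value in msg.get_all(name, []):
            if needle.lower() in value.lower():
                return True
    return False


# Made-up message for illustration; note that the Subject carries no tag.
sample = message_from_string(
    "From: someone@example.org\n"
    "List-Id: <ietf-822.example.org>\n"
    "Subject: Subject tags\n"
    "\n"
    "body\n"
)
found = is_list_message(sample)
```

Note that the Subject field plays no part in the decision.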

For some insight into another reason why adding cruft to the
Subject field is a problem, consider the contortions necessary
to remove such cruft for the purpose of getting to the actual
topic-identifying content, as discussed in
draft-ietf-imapext-sort-17.txt.
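
To give a feel for those contortions, here is a much-simplified sketch
in Python.  It is not the algorithm from the draft, only an
approximation of the general idea: repeatedly strip leading "[tag]"
blobs and "Re:"/"Fw:"/"Fwd:" prefixes and a trailing "(fwd)" until
nothing more comes off.  The function name rough_base_subject and the
regular expressions are my own assumptions.

```python
import re

# A leading "[tag]" blob, or a "Re:", "Fw:" or "Fwd:" prefix
# (optionally written as "Re[tag]:").
_LEADER = re.compile(
    r"^\s*(?:\[[^\[\]]*\]\s*|(?:re|fwd?)\s*(?:\[[^\[\]]*\])?\s*:\s*)",
    re.IGNORECASE,
)
# A trailing "(fwd)" marker.
_TRAILER = re.compile(r"\s*\(fwd\)\s*$", re.IGNORECASE)


def rough_base_subject(subject: str) -> str:
    """Strip list tags and reply/forward markers from a Subject string."""
    text = " ".join(subject.split())  # collapse runs of white space
    while True:
        # Drop a trailing "(fwd)", unless that would leave nothing.
        stripped = _TRAILER.sub("", text) or text
        match = _LEADER.match(stripped)
        # Drop one leading tag or prefix, unless that would leave nothing.
        if match and stripped[match.end():]:
            stripped = stripped[match.end():]
        if stripped == text:
            return text
        text = stripped
```

Even this toy version needs a loop, two regular expressions and
special cases to avoid stripping the whole field, all to recover a
topic that should have been the entire content of the field in the
first place.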