Re: 'header' test and whitespace


I see from Alexey's minutes that I was supposed to post the text that
I proposed.  I didn't remember that; sorry.  Philip started the
discussion, but didn't post the text.  I gave Philip XML, and here's
what I suggested:

<t>
Because the meaning of leading and trailing whitespace characters in header
fields is ambiguous, and their survival in message transport and processing
is inconsistent, ALL handling of message headers in Sieve MUST normalize
the header field values.  The normalization is similar to, but not the same
as, unfolding (see RFC2822), and is done as follows:
<list style="number">
  <t>
    Remove leading and trailing whitespace characters from each line of
    the header field (multiple lines, in the case of multi-line continuation).
  </t>
  <t>
    Remove the delimiting CRLF from each line.
  </t>
  <t>
    Catenate the lines in order, inserting one ASCII space character (0x20)
    between each pair.
  </t>
</list>
</t>

<t>
To show how this normalization works, we use the character "~" (tilde) to
represent the ASCII space character in the following example.
All of the following normalize to the same value for the subject field,
"a~b~~~c~d":
<list style="empty">
  <t>
    Subject:~a~b~~~c~d
  </t>
  <t>
    Subject:a~b~~~c~d~
  </t>
  <t>
    Subject:a~b~~~c~
    <vspace/>
    ~~~~d~~
  </t>
  <t>
    Subject:~~~~~a
    <vspace/>
    ~~~~b~~~c~~~~~
    <vspace/>
    ~~~~d
  </t>
  <t>
    Subject:~a
    <vspace/>
    ~b~~~c
    <vspace/>
    ~d
  </t>
</list>
</t>
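
For anyone who wants to check the examples mechanically, here is a minimal
Python sketch of the three steps. It is only an illustration, not proposed
spec text. The function name is made up, and it makes two assumptions the
text above leaves open: "whitespace" means space and horizontal tab, and
the input is the raw field value after the "Subject:" colon, with folded
lines still separated by CRLF.

```python
def normalize_header_value(raw: str) -> str:
    """Normalize a raw, possibly folded, header field value.

    Hypothetical helper; `raw` is assumed to be everything after the
    colon of the field name, with CRLF between folded lines.
    """
    # Step 2: splitting on CRLF removes the line delimiters.
    lines = raw.split("\r\n")
    # A value that ends in CRLF leaves one empty trailing piece; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    # Step 1: strip leading and trailing whitespace from each line
    # (assumed here to mean space and horizontal tab only).
    stripped = [line.strip(" \t") for line in lines]
    # Step 3: catenate in order with exactly one space between each pair.
    return " ".join(stripped)


def from_tilde(s: str) -> str:
    """Turn the "~" notation used in the examples into real spaces."""
    return s.replace("~", " ")


if __name__ == "__main__":
    examples = [
        "~a~b~~~c~d",
        "a~b~~~c~d~",
        "a~b~~~c~\r\n~~~~d~~",
        "~~~~~a\r\n~~~~b~~~c~~~~~\r\n~~~~d",
        "~a\r\n~b~~~c\r\n~d",
    ]
    for ex in examples:
        value = normalize_header_value(from_tilde(ex))
        print(value.replace(" ", "~"))
```

Note that whitespace inside a line (the three spaces between "b" and "c")
is left alone; only the ends of each line are trimmed.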


Note that I didn't suggest RFC2047-decoding, but I think that's a reasonable
addition to this.  Alternatively, we could specify that strings be decoded
in comparisons (perhaps specified by an option like ":decode" or ":raw").
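
To make the decoding idea concrete, here is a rough sketch of what
RFC2047-decoding a value before comparison might look like, using Python's
standard email.header module. The helper name is hypothetical, and this says
nothing about how Sieve itself would spell the option.

```python
from email.header import decode_header, make_header


def decode_rfc2047(value: str) -> str:
    """Decode RFC 2047 encoded-words in a header value to Unicode text.

    Hypothetical helper for illustration; values without encoded-words
    pass through unchanged.
    """
    return str(make_header(decode_header(value)))


if __name__ == "__main__":
    print(decode_rfc2047("=?ISO-8859-1?Q?caf=E9?="))
```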

Barry

--
Barry Leiba, Pervasive Computing Technology  
(leiba(_at_)watson(_dot_)ibm(_dot_)com)
http://www.research.ibm.com/people/l/leiba
http://www.research.ibm.com/spam