As for drafts submitted within the last few days, I finished the first
version of the draft for the new digest header fields and submitted it
yesterday:
http://www.metasignatures.org/draft-leibzon-content-digest-edigest-00.html
http://www.metasignatures.org/draft-leibzon-content-digest-edigest-00.txt
Note: I tried earlier to send this post with the draft included as an
attachment, but I don't think that worked. Sorry if you receive another
copy of this post later...
There have also been a number of changes from the description I gave before
about EDigest. The biggest is that there are now two header fields -
Content-Digest and EDigest. Content-Digest can be used as a replacement
for MD5-Digest with similar rules, i.e. it should be added only by the
originator of the transmission and not by intermediate agents, and there
cannot be more than one Content-Digest in the MIME or message header. Its
syntax is the same as EDigest but without the "u" tag/parameter, so it
only provides a digest hash for the content in whose header it appears.
The syntax for Content-Digest & EDigest has changed slightly from the
syntax I last described for EDigest:
1. The syntax and description are now similar to those of MIME fields,
and the same parser can be used to extract parameters and work with the
header field. That means that ";" is now the separator between different
tags (which are now called parameters as per MIME convention). Full ABNF
syntax is included in the draft.
2. The canonicalization parameter "c" has been extended to provide
information separately about header field canonicalization and
about content body data canonicalization. The full specification is
now "c=simple,mimeform" (this is the default value). There have also
been changes in the canonicalization method names, and how to do
canonicalization is now described in quite a bit of detail (probably
too much, as it takes a disproportionate amount of space in the draft).
3. The parameter indicating the number of bytes in the canonicalized
content is now called 's', meaning "size".
4. The cryptographic algorithm 'a' parameter is now optional and by default
it is assumed to be "sha1". That means there are now basically only
two required parameters - "v" and "d" - making simple cases of using
Content-Digest and EDigest very easy and compact.
5. The time of creation and unique id 't' parameter now uses a number
in ISO8601 format (instead of unix seconds), like t=20050704142754
6. The URLs in 'u' are now listed in the same way as in the References
header in email, i.e. they are enclosed in "<..>" and separated by FWS.
If the URI is not specified, it defaults to "cid:".
7. 'e' (encoding) has been dropped from the spec. As with MD5-Digest,
the data used should be what exists before transfer-encoding is applied.
Note: I'm still thinking about this. It seems more correct
standards-wise to require decoding the transfer encoding before
creating the hash of the content, but for creating a digest hash at
intermediate servers (EDigest) that is not convenient. This
probably only makes a difference for quoted-printable, though.
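
To make the parameter changes above concrete, here is a minimal sketch of
reading such a header field value. It only assumes what the list states:
";" separates parameters, "v" and "d" are required, "a" defaults to "sha1",
"c" defaults to "simple,mimeform", and "t" is a 14-digit timestamp. The
function names, the value "1" for "v" and the base64-looking value for "d"
are made up for illustration; the authoritative syntax is the ABNF in the
draft, and quoted parameter values are not handled here.

```python
from datetime import datetime

# Defaults taken from the post; the header value below is a made-up example.
DEFAULTS = {"a": "sha1", "c": "simple,mimeform"}
REQUIRED = ("v", "d")


def parse_digest_params(value):
    """Split a ';'-separated parameter list and fill in the defaults."""
    params = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first '=' only, so any '=' inside the value survives.
        name, sep, val = part.partition("=")
        if not sep:
            raise ValueError("parameter without '=': " + part)
        params[name.strip().lower()] = val.strip()
    for name in REQUIRED:
        if name not in params:
            raise ValueError("missing required parameter: " + name)
    for name, default in DEFAULTS.items():
        params.setdefault(name, default)
    return params


def parse_timestamp(t):
    """Read a 't' value such as 20050704142754 as YYYYMMDDhhmmss."""
    # The time zone is not stated in the post, so this stays a naive datetime.
    return datetime.strptime(t, "%Y%m%d%H%M%S")


example = "v=1; d=2jmj7l5rSw0yVb/vlWAYkK/YBwk=; t=20050704142754"
params = parse_digest_params(example)
print(params["a"], params["c"], parse_timestamp(params["t"]))
```

A value that omits "v" or "d" raises ValueError in this sketch; whether a
real implementation should reject or ignore such a field is up to the draft.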
Now regarding canonicalization: as mentioned, the methods of doing it have
been specified in detail and are now different for header and for body.
Both header and body now support the 'bare' canonicalization method, which
is somewhat similar to "simple" in DK and basically means take the
data as is (i.e. no canonicalization - previously I called this "all").
The default canonicalization method for header fields is now 'simple',
which requires that all multi-line header fields be unfolded into a single
line (and properly terminated with CRLF), that repeated whitespace
characters be replaced with a single one, and that the header field name
be changed to lowercase.
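
As a rough illustration of the 'simple' header method just described, the
sketch below unfolds a field, collapses runs of spaces and tabs, lowercases
the field name and ends the result with CRLF. The function name is
hypothetical, and details the post does not cover (for example, which single
character replaces a whitespace run, here a space) are assumptions; the
draft is the authority on the exact rules.

```python
import re


def canon_header_simple(field):
    """Sketch of 'simple' header canonicalization as described in the post."""
    name, sep, value = field.partition(":")
    if not sep:
        raise ValueError("not a header field: missing ':'")
    # Unfold: remove the line breaks of a multi-line (folded) field.
    unfolded = re.sub(r"\r\n|\r|\n", "", value)
    # Replace each run of spaces/tabs with one space (assumed replacement).
    collapsed = re.sub(r"[ \t]+", " ", unfolded)
    # Lowercase the field name and terminate the single line with CRLF.
    return name.lower() + ":" + collapsed + "\r\n"


print(repr(canon_header_simple("Subject: a\r\n  b  c\r\n")))
```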
"Nofws" canonicalization for header fields is now that all non-printable
characters be removed altogether (i.e. only characters with ASCII codes
between 33 and 126 remain, so spaces are removed as well).
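
A minimal sketch of that filter, working on raw bytes and keeping only
codes 33 through 126. The function name is made up for illustration, and
this ignores any other header handling the draft may define for "nofws".

```python
def canon_header_nofws(data):
    """Keep only bytes with ASCII codes 33..126, dropping everything else."""
    return bytes(b for b in data if 33 <= b <= 126)


print(canon_header_nofws(b"Subject: Hi \t there\r\n"))
```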
For body data, the new 'text' canonicalization is somewhat like 'simple'
for headers: it requires that all text lines be properly terminated
with CRLF and that multiple whitespace characters be replaced with a
single one. 'nofws' is also available for the body and requires removal
of all special characters and CRLF or any other line terminators.
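
The body 'text' method could look roughly like the sketch below: every line
ends up terminated by CRLF and runs of spaces and tabs become one space.
The function name is hypothetical, and the treatment of bare CR or bare LF
as line terminators and of a missing final terminator is my assumption
here, not something taken from the draft.

```python
import re


def canon_body_text(body):
    """Sketch of 'text' body canonicalization as described in the post."""
    # Accept CRLF, bare CR or bare LF as line terminators (assumption).
    lines = re.split(r"\r\n|\r|\n", body)
    # A trailing terminator leaves one empty last element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    # Collapse whitespace runs and terminate every line with CRLF.
    return "".join(re.sub(r"[ \t]+", " ", line) + "\r\n" for line in lines)


print(repr(canon_body_text("a  b\nc\t\td\r\nlast")))
```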
As the default for body data there is now a special canonicalization name,
'mimeform', which is not a canonicalization method itself but resolves to
'text' for "Content-Type: text/*" and to 'bare' for all other content.
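
Resolving 'mimeform' is then a small lookup on the media type. In the
sketch below the function name is made up and the caller is assumed to pass
the Content-Type value of the part in question; what to do when no
Content-Type is present is left to the draft.

```python
def resolve_body_canon(method, content_type):
    """Map 'mimeform' to 'text' or 'bare'; pass other method names through."""
    if method != "mimeform":
        return method
    # Ignore parameters such as charset and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return "text" if media_type.startswith("text/") else "bare"


print(resolve_body_canon("mimeform", "text/plain; charset=us-ascii"))
print(resolve_body_canon("mimeform", "image/png"))
```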
I forgot to add an Acknowledgments section (hardly the only thing I forgot;
security considerations are missing too, at the minimum), but several
people sent me comments on the 0.18 and 0.21 EDigest and that is
appreciated. In particular I'd like to thank Earl Hood for his comments,
which helped quite a bit in the production of the draft. More comments are
obviously welcome, both in public and in private.
-----
William Leibzon, Elan Networks:
mailto: william(_at_)elan(_dot_)net
Anti-Spam and Email Security Research Worksite:
http://www.elan.net/~william/emailsecurity/