ietf-mailsig

Feedback on DKIM draft (long)

2005-07-14 16:21:11

[I initially sent my feedback via private mail but, on good advice,
 I was asked to also send it to the ietf-mailsig list.]

General Comments:

* There is definitely some duplication of effort going on with this draft
  and with things like Meta-Signatures, <http://www.metasignatures.org/>.

  The problem with differing digest/signing specs is that implementors
  have to deal with all these variants instead of implementing a single
  algorithm that can be applied to multiple applications.

  I hope that NIH syndrome is not the main cause of having different
  digest/signing algorithms.


Draft-specific Comments:

* Section 1.1

  You mention that a trusted third party is not required.  However, it
  should be allowable, and I do not see anything in this draft that
  supports a trusted-third-party system.  For example, X.509
  certificates for signing keys should be allowed.

* Section 3.2:

  Is there any reason that the tag=value syntax does not follow the
  parameter=value syntax defined in RFC-2045?  I.e., are implementors
  required to implement another "tag" parsing scheme?

  I can guess at the reasons for the variation you came up with, but
  it may help to explicitly state why the scheme was adopted versus
  leveraging existing (standard) schemes.
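
  For comparison, parameter=value parsing along the lines of RFC-2045
  is already available in common mail libraries.  A minimal Python
  sketch using the standard library's email package follows.  The
  header name X-Example-Sig, the helper parse_params, and the parameter
  names and values are made up for illustration; they are not taken
  from the draft.

```python
from email.message import Message


def parse_params(value: str) -> dict:
    """Parse a 'name=value; name="value"' string with the stdlib parser.

    Hypothetical helper: the header name below is illustrative only.
    """
    msg = Message()
    msg["X-Example-Sig"] = value
    # get_params() splits on ';' and removes quoting around quoted-strings.
    return dict(msg.get_params(header="X-Example-Sig"))


params = parse_params('a=rsa-sha1; d="example.com"; q=dns')
print(params["d"])
```

  The point is only that an implementor would not need a new parser if
  the existing parameter syntax were reused.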

  It may also be worth noting why something like RFC-2184 was not
  adopted.

  Note, quoted-printable tends to have a bad (mostly undeserved)
  reputation, so it may help to avoid using it, especially when its
  use has never been applied to header data (at least to my knowledge).

* Section 3.3:

  There is unnecessary information here, and information that can
  lead to ambiguous implementations.

  When it comes to cryptography, you should reference cryptographic
  standards where appropriate since those standards are very explicit
  on algorithms and processes.  For example, you should explicitly
  specify that the RSASSA-PKCS1-v1_5 signing and verification method
  must be used (which is defined in PKCS#1).

  Avoid "re-describing" algorithms unless you plan to use a custom
  signing method that is not defined in the PKCS specs, or other
  cryptographic-related standards.

  The term "native binary form" is ambiguous and riddled with problems.
  From a cryptographic perspective, ASN.1 DER rules are used for
  encoding all data, allowing for portability (another reason why
  crypto specs should be referenced).

* Section 3.4:

  - IMHO, the "nowsp" algorithm is questionable, especially from
    a cryptographic perspective.  You say to ignore all SWSP
    in the body.  I think this allows too much unnecessary data
    variation while still permitting a valid signature verification.
    I.e., "Hello World" is the same as "HelloWorld".  Or do
    I misunderstand the algorithm?
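
    To make the concern concrete, here is a minimal Python sketch.
    It assumes that "nowsp" deletes every space, tab, CR, and LF from
    the body before hashing, which may not match the draft exactly.
    The function name nowsp_digest and the choice of SHA-1 are
    illustrative assumptions.

```python
import hashlib


def nowsp_digest(body: bytes) -> str:
    """Hash a body after deleting all whitespace (assumed "nowsp" behavior)."""
    stripped = bytes(b for b in body if b not in b" \t\r\n")
    return hashlib.sha1(stripped).hexdigest()


# Two bodies a human would read differently hash to the same value.
collide = nowsp_digest(b"Hello World\r\n") == nowsp_digest(b"HelloWorld\r\n")
print(collide)
```

    Under that assumption, a verifier cannot tell the two bodies apart.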

    Here is the canonicalization algorithm I recommend (based upon
    existing work -- OpenPGP and S/MIME -- and common misbehaviors
    of mail software):

    The canonicalization process will be different for header
    fields vs message bodies.  First, let's use the following base
    definitions:

        WSP = HTAB / SP
            ; white space character
        HTAB = %x09
            ; horizontal tab
        SP = %x20
            ; space
        LWSP = *(WSP / CRLF WSP)
            ; linear white space (past newline)

    which are extracted from relevant RFCs.

    For header fields, the following should suffice for canonicalization:

      1. Strip all WSP characters at the end of each line of a header field,
         before any unfolding is done.
      2. Unfold any fields that are folded.
      3. Convert multiple WSP characters into a single SP character.
      4. Convert all WSP characters into SP characters.
         (See Note below about step 4.)
      5. Convert field names to lowercase.
      6. Concatenate each field value together, inserting a CRLF sequence
         between each field value (according to RFC-2822, the
         CRLF is not considered part of the field).

    NOTE: Step (4) may be omitted, but I have encountered software that
    refolds and changes some whitespace (e.g. spaces to tabs).  Since
    whitespace modification is definitely abhorrent, you may not want
    to entertain trying to deal with it.
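
    The header steps above can be sketched in Python as follows.
    This is an illustration under assumptions, not a reference
    implementation: the function names are made up, each input string
    is assumed to be one complete (possibly folded) field without its
    terminating CRLF, and step 6 is read as keeping the field name
    together with its value.

```python
import re


def canon_header_field(raw: str) -> str:
    """Canonicalize one header field (steps 1 to 5)."""
    # Step 1: strip trailing WSP from each physical line, before unfolding.
    lines = [ln.rstrip(" \t") for ln in raw.split("\r\n")]
    # Step 2: unfold by removing the CRLF between physical lines.
    unfolded = "".join(lines)
    # Steps 3 and 4: each run of WSP (SP or HTAB) becomes a single SP.
    unfolded = re.sub(r"[ \t]+", " ", unfolded)
    # Step 5: lowercase the field name only.
    name, sep, value = unfolded.partition(":")
    return name.lower() + sep + value


def canon_headers(fields) -> str:
    """Step 6: join the fields with a CRLF between each pair."""
    return "\r\n".join(canon_header_field(f) for f in fields)


print(repr(canon_headers(["Subject: Hello \r\n\tWorld  ", "FROM:  a@example.com"])))
```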

    For message bodies, I recommend the following canonicalization process:

      1. All EOL sequences MUST be converted to CRLF (the canonical EOL
         specified in RFC-2822).
      2. All trailing WSP characters of each line of text MUST be removed.
      3. All trailing LWSP characters at the end of the body MUST be removed.

    If the digest will include the combination of header fields and
    message body (or message body parts), a CRLF must be included
    between each component during digest calculation.
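
    A minimal Python sketch of the three body steps follows.  The
    function name canon_body is made up, and step 3 is interpreted
    here as dropping all trailing empty lines, including the final
    CRLF; that interpretation is an assumption.

```python
import re


def canon_body(body: bytes) -> bytes:
    """Canonicalize a message body (steps 1 to 3)."""
    # Step 1: convert CRLF, bare CR, and bare LF to CRLF.
    text = re.sub(rb"\r\n|\r|\n", b"\r\n", body)
    # Step 2: remove trailing WSP from every line.
    lines = [ln.rstrip(b" \t") for ln in text.split(b"\r\n")]
    # Step 3: remove trailing empty lines at the end of the body.
    while lines and lines[-1] == b"":
        lines.pop()
    return b"\r\n".join(lines)


print(canon_body(b"Hello \nWorld\t\r\n\r\n \n"))
```

    Unlike "nowsp", interior whitespace survives, so "Hello World" and
    "HelloWorld" remain distinct.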

    From an implementation perspective, such canonicalization processing
    can be done efficiently and in a stream-based manner.  I.e., as
    the data is read, it can be canonicalized before it is
    fed into the cryptographic digest procedures.  Cryptographic
    libraries (like openssl) support incremental digest computation,
    so canonicalized data is very temporary and can be limited
    to a well-defined buffer size in memory.
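
    As a rough illustration of the streaming approach, the Python
    sketch below canonicalizes a body line by line and feeds it into
    an incremental digest via hashlib's update().  The function name
    and the default of SHA-1 are assumptions, and for brevity the
    sketch only recognizes LF and CRLF line endings (not bare CR).

```python
import hashlib
import io


def stream_body_digest(stream, algo: str = "sha1") -> str:
    """Digest a body while canonicalizing it one line at a time."""
    h = hashlib.new(algo)
    pending_eols = 0  # line breaks seen but not yet hashed
    for raw in stream:  # binary file objects yield LF-terminated lines
        line = raw.rstrip(b"\r\n").rstrip(b" \t")
        if line:
            # Flush held-back line breaks only once more text follows,
            # so trailing empty lines never reach the digest.
            h.update(b"\r\n" * pending_eols)
            h.update(line)
            pending_eols = 0
        pending_eols += 1
    return h.hexdigest()


digest = stream_body_digest(io.BytesIO(b"Hello \nWorld\t\r\n\r\n \n"))
print(digest == hashlib.sha1(b"Hello\r\nWorld").hexdigest())
```

    Only the current line and a counter are held in memory; blank
    lines are remembered as a count rather than buffered.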

  - I would add a comment that "simple" is not recommended since
    its failure rate can be high.  Also, the current algorithm for it
    violates the semantics of RFC-2822.  RFC-2822 states:

      Each header field should be treated in its unfolded form for
      further syntactic and semantic evaluation.
            -- RFC-2822, Sec 2.2.3.

    At a minimum, the "simple" algorithm should require unfolding
    of header fields.  If you choose not to require unfolding, you
    should add a note about why this is done in violation of
    RFC-2822 semantics.

    From a performance perspective, I do not think "simple" buys
    much.

  - I find it odd that the header field that will contain the
    signature (DKIM-Signature) must also be included in the
    signing/verification process.

    Why isn't the signature data provided in its own separate
    header field to avoid having to extract out the sig data
    first and dealing with ambiguities of whitespace?  For example,
    is the whitespace before and after the "b=" tag also removed,
    or only the whitespace after (or before)?

    I'd recommend two header fields, one for the meta-information
    and one just to contain the signature.  This way, no unique
    processing is required for the meta-information field, it
    can be processed like all other header fields:

      DKIM-Spec: ...
      DKIM-Signature: ...

    I am speculating that the rationale for putting everything in one
    field is to make multiple DKIM-Signature fields possible without
    the problem of knowing which signature applies to which DKIM-Spec
    field.  I think this problem is solvable.

  - For greatest flexibility, the digest should be separated out, and
    it, along with the meta-information, is what is signed.
    Meta-Signatures takes this approach.

* Section 3.5:

  - The ordering restriction of trace header fields (mainly Received)
    is explicitly defined in RFC-2821, not in RFC-2822.

    Note, since DKIM fields are to be handled like trace fields,
    splitting the signature from the meta-info can be done, with the
    field order in the header defining which field goes with which
    in case of multiple signatures.

  - IMHO, the "v=" tag should be required.  It is always better practice
    to be explicit when possible.  For example, if the "v=" is missing,
    could it be due to an error by the signer?  Data corruption
    during transit?

  - IMHO, I would not support having header fields listed in "h=" unless
    they are present during signing.  Otherwise, when a header field
    is missing, one does not know if this was intentional or if the
    header field got dropped in transmission (either accidentally or by
    filters) before verification.  Unless there is a clear reason to
    support listing fields that are not present, why allow it?

    How are multiple same-named header fields handled?  Are they
    listed multiple times in "h="?  You appear to answer this
    in section 5.2.2.  It should be mentioned here.  For brevity,
    you may want to support a "count" indicator to reduce space:

      h=Received/2
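
    A verifier could expand such a count with very little code.  The
    Python sketch below assumes "h=" is a colon-separated list of
    field names and that "/N" is the suggested count suffix; the
    function name expand_h is made up for illustration.

```python
def expand_h(value: str) -> list:
    """Expand 'Name/N' entries of an h= list into N repeated names."""
    names = []
    for item in value.split(":"):
        name, _, count = item.strip().partition("/")
        # An entry without a "/N" suffix counts once.
        names.extend([name] * (int(count) if count else 1))
    return names


print(expand_h("From:To:Received/2"))
```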

  - Why does "i=" need to be quoted-printable?  This goes back to my
    earlier comment about qp.  Require the "i=" value to be a
    quoted-string.

  - For "l=", the term "octet" should be used instead of "byte."
    Octet appears to be the preferred term used in mail-based
    specs.

    Why is the hash part of the length?

    Because of security implications (which you note) you should
    probably drop this tag.  It is not needed.

  - For the "t=" tag, why not use ISO date/time format?  For example:

      20050714T045532

    Unix second time format is, well, too unixy.  For email, such
    OS-specific type formats should be avoided.  I recommend using
    a well-defined standard date format for all dates.

    Plus, the ISO format is more readable by humans.
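
    Converting between the two forms is simple in most languages.  A
    minimal Python sketch follows; it assumes the ISO basic-format
    timestamp is always in UTC, since the example above carries no
    zone designator.  The function names are made up.

```python
from datetime import datetime, timezone

ISO_BASIC = "%Y%m%dT%H%M%S"


def unix_to_iso_basic(seconds: int) -> str:
    """Format Unix seconds as an ISO basic-format timestamp (UTC assumed)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(ISO_BASIC)


def iso_basic_to_unix(stamp: str) -> int:
    """Parse an ISO basic-format timestamp (UTC assumed) into Unix seconds."""
    dt = datetime.strptime(stamp, ISO_BASIC).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


print(unix_to_iso_basic(iso_basic_to_unix("20050714T045532")))
```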

  - "z=" is a mess, and can eventually lead to a DKIM field that is
    longer than 998 octets (RFC-2822 limit).

    You mention that verifiers should not use copied header fields
    for verification.  I do not agree with this.

    This leads to a discussion about when signing is done.  There is
    an implication that the sender MTA should do this, right before
    transmitting to final destination.  Well, this does limit when
    signing can be done and who can do it.

    It also does not protect from potential address re-write rules.
    I.e. Even if the initial signing MTA does signing after
    rewrite rules, intermediary ones may still do address re-writing.
    One benefit of saved headers is the verifying agent could utilize
    them for signature verification, bypassing any re-write rules
    that may happen by MTAs.

    It would be more flexible to have a signing method that is
    not dependent on where the signing occurs (e.g. an MUA could
    do it vs an MTA).

    We can discuss further if you like.  As of now, "z=" should be
    eliminated or the concept of saved header fields should be
    reconsidered.

* Section 5.2.2:

  RFC-2822 is mistakenly referenced for trace order restrictions; the
  correct reference is RFC-2821, section 4.4.

* Section 9.2:

  - The discussion here implies that signing can be done at the MUA
    level, which ties into my comments above about saved header fields
    and address re-writing cases.

  - The need for key revocation policies is implied here, which is
    touched upon in Section 9.6.  Key management is critical for
    the system to be secure and trusted by users; therefore, it
    definitely should be spelled out, potentially in a separate
    specification.

--ewh
-- 
Earl Hood, <earl(_at_)earlhood(_dot_)com>
Web: <http://www.earlhood.com/>
PGP Public Key: <http://www.earlhood.com/gpgpubkey.txt>

