ietf-822

Re: Draft for signed headers

1999-03-16 21:48:49

Here are my notes on this draft.  Some of its basic principles are
sound, but there are, as I see it, some major difficulties:


To sum up:
        You want to sign the body, or you have no security.
        You don't want users to check signatures; they can't, due to
                revocation problems.
        You don't want users to check MD5s; they want bad articles to
                not even show up in the overview.  Rejecting after
                download is not that exciting, and it also opens the
                door to denial-of-service attacks.
        OPENPGP is really not at all a suitable choice for USENET.
        The canonicalization algorithm is far too complex, and I have
                grave fears about it working.  Indeed, the risks of
                people getting the canonicalization wrong, at least on
                USENET, seem greater than the risks of gateways munging
                articles in ways that invalidate the sig!
        The Verified header needs more thought as to usage.



  An existing experimental protocol "pgpverify" [PGPVERIFY] is

How widespread is it?  We used it but really didn't find a significant
population listening to it.  Has that changed of late?

  There also exist protocols for the cryptographic signature of
  bodies of articles, notably S/Mime and PGP/Mime [RFC2015], and it

Multipart/signed is more likely to have relevance to USENET than these.


  The bodies of articles, Mime messages and multiparts are not
  directly included in the Signature. Rather, the intention is that
  each such body part should have a Content-MD5 (or similar) header
  computed for it, and that header should then be included in the
  Signature instead.

While that works, what is the goal?  The only value I can see in this is
that it saves you the trouble of testing the MD5 or other hash of the
body.  But to encourage non-testing of the body is to invite a security
hole.

In what circumstances would we not want to also verify the body?  The only
ones I can think of are control messages where the body is meaningless and
normally blank.

  There is also provision for Verified headers which may be added by
  agents that have checked a Signed header. Verified headers may
  themselves be included in further Signed headers; this may be
  especially useful in the case of gateways which find it necessary
  to change an article in ways that invalidate an original signature.

If the original signature is invalidated, why propagate it?  It is of
value to nobody.  Anybody who has the original pre-modified headers/body
will also have the old signature.

The main purpose for having some sort of signature other than the original
is signature collapse, where it is inherent that you remove the original.

Another purpose is a gateway where you want to keep both.  However, we
would want to discourage that.  It bloats things.  Better that the
original sig be traceable to a common root where possible.

  Every effort has been made to ensure that signatures remain
  verifiable in spite of all reasonable (and even unreasonable)
  changes to which they may be subjected in transit. These include
  changes to the Content-Transfer-Encoding of body parts (a principal
  reason for including them only via the Content-MD5 header), changes
  in the order of headers and of their layout, and encodings and re-
  encodings of unusual character sets. This is to be achieved by
  converting headers into a canonical form before they are signed.
  New headers, yet to be invented, need provide no problem, and there
  is no commitment to any particular character set (provided header-
  names remain in ASCII, as at present).

We debated this quite a bit earlier.   Several people reported that
in the digital signature community, the idea of canonicalization is
looked upon with great skepticism.  It's easy to get it wrong.  The
closer you can get to "hash this byte stream" the better.  That's
the philosophy of multipart/signed.

As for transfer encodings, I think that's simply out of the question.
It means that to check the signature -- and checking the headers without
checking the body should most definitely not be considered checking
the signature, not if the body means anything -- one must decode and
put in canonical form the entire article, including the body!  It is
probably not appropriate for news transports to do this.  It makes no
sense for news readers to do it -- newsreaders don't want to, and
probably can't, check signatures due to revocation-list issues; only
transports can.


2.1.  Syntax of the Signed header

     Signed         = "Signed" ["-" DIGIT] ":" SPACE
                      [CFWS] protocol CFWS key-id CFWS header-ref-list
                      CFWS signature CRLF

What is the thinking on whether multiple headers should be:

        Signed-1:
        Signed-2:

vs.
        Signed: l=1;
        Signed: l=2;

I know in mail the Received header simply gets repeated multiple times.
Is there any feeling either way?

Other than that your syntax is pretty similar to my own; however, I
recommend the following general principles:

        a) Avoid positional parameters.  They are less extensible,
           harder to read, and easier to get wrong.

        b) Use named parameters, as in:

                Signed: pro=U; key=keyid; Head=+From; Sig=asfasdfsadfasdf

This way you are in MIME standard form and can add extra parameters
as needed; in fact, if you want, the protocol and header-list parameters
can have defaults and simply not be present 99% of the time, which saves
space.
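
Purely as an illustration, here is how such a named-parameter form
might be parsed.  The parameter names (pro, key, head, sig) and the
defaults are my own invention for the sketch, not anything the draft
defines:

    # Hypothetical sketch: parsing a named-parameter Signed header.
    # Parameter names and defaults are illustrative only.
    DEFAULTS = {"pro": "U", "head": "news-standard"}  # assumed defaults

    def parse_signed(value):
        """Parse 'pro=U; key=keyid; ...' into a dict of parameters."""
        params = dict(DEFAULTS)
        for item in value.split(";"):
            name, eq, val = item.partition("=")
            if eq:
                params[name.strip().lower()] = val.strip()
        return params

    print(parse_signed("pro=U; key=keyid; Head=+From; Sig=asfasdfsadfasdf"))
    # {'pro': 'U', 'head': '+From', 'key': 'keyid', 'sig': 'asfasdfsadfasdf'}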


  The key-id identifies the key used to generate the signature in a
  notation dependent upon the protocol (but commonly "0x" followed by
  hexadecimal digits). The CFWS following it MAY include a comment
  containing an identification of the person or entity which created
  the signature.

My expectation is that key-ids need to be allocated in spaces in a
hierarchy, and in fact should probably follow the syntax of email addresses.

In fact, one can have a default key-ID for a given E-mail address that
contains the E-mail address or is the E-mail address.

Most keys may well be issued by sites, and sites will be given the power
to issue keys within their domains, but not in others.  Using the e-mail
address syntax provides an easy way to lay this out.
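
A quick sketch of the check this enables, assuming key-ids follow
e-mail address syntax (the function name and the domains here are
hypothetical):

    # Hypothetical sketch: a site may issue keys only within its own
    # domain, which is trivial to check if key-ids look like addresses.
    def key_within_domain(key_id, issuing_domain):
        """True if key_id ('user@host') falls inside issuing_domain."""
        _, _, host = key_id.rpartition("@")
        host, dom = host.lower(), issuing_domain.lower()
        return host == dom or host.endswith("." + dom)

    assert key_within_domain("mod@news.example.com", "example.com")
    assert not key_within_domain("alice@example.org", "example.com")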

  The header-ref "mail-standard" is a macro representing a set of
  common mail headers that SHOULD normally be included when signing
  the headers of a mail message, and is defined as the list

     Date, From, Reply-To, To, Cc, In-Reply-To, References, Subject,
     Keywords, Content-Type, Content-ID

  The header-ref "news-standard" performs the same function for news
  articles, and is defined as the list

     Date, Newsgroups, Distribution, Message-ID, From, Reply-To,
     Followup-To, References, Subject, Keywords, Control, Content-Type,
     Content-ID
[Well, you can imagine those lists are going to be fiercely argued
over]

Much like my own syntax, so how can I disagree!  Actually the lists need
not attract too much arguing, because you can add modifiers.  The "right"
list is simply the most common one, to avoid bulking out lines.

Note that it's fine to include headers that are not commonly used, but
which are signed when used.  Also, you said you wanted to include
Content-MD5.


  (ii) A header-ref of the form "XXXX/<m>" (or "XXXX/<m>/<n>..."),
       where <m> and <n> are numbers and the current level contains a
       "Content-Type: multipart/*" header, references the header that
       would be referenced by "XXXX" alone (or by "XXXX/<n>...") in
       the <m>th sub-part of that multipart, that sub-part now being
       regarded as the current level.

I have to say this (and the whole body canonicalization) is way too
complex.  The body should be a stream of bytes, and that includes
internal MIME headers.  You should not have to parse MIME to check a
signature.  There should be no references to internal headers.  If you
want to sign a MIME sub-part, that's fine, I guess.  Do it within the
sub-part.  The code to do what you propose is too much, and too slow.

Plus, since a MIME sub-part may be content-transfer-encoded, you will
need to decode it to check these things.
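
To make the contrast concrete, here is the whole of what I have in
mind, as a sketch; it assumes Content-MD5 per RFC 1864 (base64 of the
MD5 digest) computed over the raw bytes:

    # Sketch: the body, internal MIME headers included, is hashed as a
    # raw byte stream.  No MIME parsing, no decoding of
    # Content-Transfer-Encodings.
    import base64, hashlib

    def content_md5(raw_body):
        """Content-MD5 value (RFC 1864) over the raw body bytes."""
        return base64.b64encode(hashlib.md5(raw_body).digest()).decode()

    body = b"Content-Type: text/plain\r\n\r\nHello, world.\r\n"
    print("Content-MD5:", content_md5(body))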


  The signature of a Signed header is constructed in accordance with
  a given header-ref-list as follows:

There is no need to "sign" a "signed" header.   Can you tell me why you
would want to do this?   A "signed" header (and any certificate) is
verifiable on its own when paired with the signed headers and body.  I
can check if it is authentic.   If I don't know the key named in the
header and have no certificate for it, what good does it do me to be
told by another signature that I am seeing the original signed header?

The protocol described seems too complex, but that's beside the point.
There is no reason to sign other people's signatures.

There is also no need to "sign" any elements of your own signed header.
Self-signing of signatures is complex and adds no value.  If the header
list, flags, key name or options are tampered with, the signature is
immediately invalidated.
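
A sketch of why, assuming (as I do) that the signed data already
covers the header's own parameters; the layout here is invented purely
for illustration:

    # Hypothetical sketch: the byte stream fed to the signature
    # algorithm covers the Signed header's own parameters, so tampering
    # with any of them breaks verification with no self-signing needed.
    def data_to_sign(params, canonical_headers):
        preamble = "pro=%s;key=%s;head=%s" % (
            params["pro"], params["key"], params["head"])
        return (preamble + "\n" + canonical_headers).encode()

    p = {"pro": "U", "key": "alice@example.com", "head": "news-standard"}
    good = data_to_sign(p, "From: alice@example.com")
    p["head"] = "From"                # attacker edits the header list...
    assert data_to_sign(p, "From: alice@example.com") != good  # ...sig dies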

  The purpose of a Signed header is solely to establish that the
  headers referenced in it were present in an article when that
  article passed through the hands of the person or entity that
  generated the signature (and hence that it did indeed pass through

I tend to use the term "keyholder" here.  It verifies that the holder of
the key did indeed verify or certify what is signed.  Keyholders can be
people, machines, job titles, etc.

  those hands). It SHOULD NOT be taken as an endorsement of whatever
  is contained in the body of the article. If the contents of the
  body require such endorsement, then the body SHOULD be signed
  separately, for example in accordance with PGP/Mime [RFC2015].
[Hmm! I expect that paragraph to raise some discussion.]

Quite.  Header and body are one.  It's nice to verify my headers, but if
somebody can just replace my body with their own, you've lulled me into
a very false sense of security.


  Signatures will typically be generated by the originators of
  articles (to prove the origin), by moderators of moderated
  newsgroups (to testify to their Approved header), by managers of
  mailing lists, and by gateways. They SHOULD NOT be generated by
  intermediate transports and relayers through which the article
  might pass. This is intended to be an end-to-end protocol, and
  signatures SHOULD ONLY be added when new, hitherto unsigned,
  information is added. Moreover, the set of headers included within
  the signature SHOULD be no more than is necessary to achieve the
  security desired.

Agreed, and I would further say:
        a) Gateways would sign only when moving off USENET, and their
        signatures would be removed on any re-entry to USENET.
        b) Moderators often modify bodies, so they will tend to just
        remove the author's sig (which is no longer of any use if so
        much as a byte of the original article changes).

        On the other hand, moderators that do no more than stick on an
        "Approved" header might only add their own sig.

        c) Injectors generally should not sign if the author signed.  They
        should sign if the author didn't.


       signature. If (as will indeed often be the case) it is
       required to attest that the body (or sub-part) dispatched

As noted, I think that is always the case, not often, so a design based
on its not being always the case may be in error.

Who checks the Content-MD5?  Not the reader, that's for sure.  (Various
reasons making that impossible were outlined in earlier debate.)  So
that means the news server, which in 99% of cases is also the relayer.

So you have every relayer having to undo any content-transfer-encodings
to check the MD5.
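
Here, in sketch form, is the work that lands on every relayer under
that design; the quoted-printable case is just for illustration:

    # Sketch of the per-part burden: undo the Content-Transfer-Encoding,
    # then hash, before the Content-MD5 can even be compared.
    import base64, hashlib, quopri

    def md5_of_decoded(part_bytes, cte):
        if cte == "quoted-printable":
            decoded = quopri.decodestring(part_bytes)
        elif cte == "base64":
            decoded = base64.b64decode(part_bytes)
        else:
            decoded = part_bytes       # 7bit/8bit/binary: taken as-is
        return base64.b64encode(hashlib.md5(decoded).digest()).decode()

    print(md5_of_decoded(b"Caf=E9 au lait", "quoted-printable"))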

If there were some way I could see not checking the body, the MD5 header
would be great.  But it's false security, which can be worse than no
security in some ways.


I'm still not clear on the purpose of the Verified header.  Who will
verify?  What will it mean to do so?  Why would you pass on an article
which failed a check?  It's vital not to do so, since forged articles
can steal message-id space as a clever denial-of-service attack on
somebody's messages.  (You have a daemon at their ISP look for their
messages, and you issue fake messages with the same message-ids, feeding
them into other ISPs.  The only thing that stops this is if your
signatures don't match and thus your attempt to use those message-ids is
rejected.  That doesn't work if people keep passing on articles that
failed a signature check.)

Verification has a useful purpose *outside* the article, in digests.
Digests are lists of several messages, with their hashes and message-ids,
signed by a verifier.  If you trust the verifier, it lets you avoid
checking the signatures on articles; you just check the hash.

Verification is likely to bulk up articles.  If you know your downstreams
can't check the sig, you might as well remove it.  If you know they can,
why would they want to check yours instead?  Only to gain speed, which is
what the digest does.
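
To make the digest idea concrete, a sketch; the line format and the
sign() primitive are invented here, and SHA-1 is just an example hash:

    # Illustrative digest: one signature covers many (message-id, hash)
    # pairs, so a trusting downstream checks one signature plus cheap
    # hash comparisons.
    import base64, hashlib

    def body_hash(raw):
        return base64.b64encode(hashlib.sha1(raw).digest()).decode()

    def make_digest(articles, sign):
        """articles: list of (message_id, raw_article_bytes) pairs."""
        lines = ["%s %s" % (mid, body_hash(raw)) for mid, raw in articles]
        body = "\n".join(lines).encode()
        return body, sign(body)        # one signature for the lot

    def check_article(digest_body, mid, raw):
        wanted = "%s %s" % (mid, body_hash(raw))
        return wanted in digest_body.decode().split("\n")

    digest, sig = make_digest([("<a1@x>", b"one"), ("<a2@x>", b"two")],
                              sign=lambda b: b"(stand-in signature)")
    assert check_article(digest, "<a1@x>", b"one")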

If you don't know your downstreams and what they can do, is it wise to
add verified headers?


       NOTE: The Verified header is also useful in the case that a
       gateway (or a moderator) makes some change to an article that
       renders an original Signed header invalid. Such a gateway can
       therefore certify that the original form of the Signed header
       had been verified, and can then resign the article (including
       his added Verified header). Likewise, a site (such as the
       originator's own server) with a well known public key can
       verify and resign an article whose originator's public key may
       be less well known. However, Verified headers SHOULD NOT be
       added as routine by other intermediate sites.

An invalidated signature is just so many garbage bits.   What this gateway
is doing is re-signing, not verifying.   I suppose a gateway might
have some reason (if it's not on USENET) to pass along an article that
failed a signature test, but within USENET it should not.


  It is normally the business of the reading agent of the ultimate
  recipient to check the correctness of a Content-MD5 or similar
  header. Nevertheless, an earlier agent that has added a Verified
  header and also checked such a Content-MD5 header MAY so indicate
  by mentioning it in a "hashcheck" parameter (or "hashcheck-failed"
  as appropriate).

There are a number of reasons not to do it this way:

        a) Why put complexity in the reader when the server can have it
           with less computational effort?
        b) Readers want bad articles kept out of the overview.  It is of
           limited value to get an overview menu including bad articles,
           select them, and have your reader download them just to say,
           "Sorry, this one's MD5 failed."  I simply don't want the bad
           article to show up in the overview at all (see the sketch
           after this list).
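
The server-side alternative, sketched; reject() and add_to_overview()
are hypothetical hooks standing in for whatever the server does:

    # Sketch: check the hash on arrival, so failures never reach the
    # overview and readers never download them just to throw them away.
    import base64, hashlib

    def reject(msg_id):                 # hypothetical server hook
        print("rejected", msg_id)

    def add_to_overview(headers):       # hypothetical server hook
        print("indexed", headers["Message-ID"])

    def on_article_arrival(headers, raw_body):
        claimed = headers.get("Content-MD5")
        actual = base64.b64encode(hashlib.md5(raw_body).digest()).decode()
        if claimed is not None and claimed != actual:
            reject(headers["Message-ID"])   # kept out of the overview
        else:
            add_to_overview(headers)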

  It is a sad fact of life that those implementing agents for
  handling Netnews and Mail cannot resist the temptation to "improve"
  articles passed through them by rewriting headers that are thought

And this is one thing that a signature algorithm with only limited
canonicalization can stop cold in its tracks.

Screw around with articles and you stop dead as a relay site.  End of
story.  It's a strong incentive not to do it!

As such, canonicalization should be kept to a minimum, i.e. sorting and
perhaps a minimum of FWS modification.

  Furthermore, in the case of Mail it is often required for the
  transport protocols to modify articles en route, most notably when
  articles containing octets with the 8th bit set have to be passed
  through a channel that permits only 7bit.

Let transports using such channels do it and undo it (at least in news).
Transport encoding should be the province of transports, not of the
format itself.

  It is a further sad fact of life that agents which make such
  changes are not going to go away just because some standard says
  so. Therefore, the canonicalization algorithm SHOULD endeavour to

But in fact they will go away.  The moment their downstream site stops
accepting any articles they modify in some improper way, they are
effectively off the net.  This is no different than if they were making
more dangerous changes that caused the articles to vanish for other
reasons (like, say, removing colons).

  o    Headers may be re-folded to fit within some preferred overall
       line length.  This may result in the creation of whitespace
       where none existed before.

I'm against this one.  Relays should do it.  Do they do it?

  o    Trailing whitespace may be removed, and line endings changed
       to/from CRLF.

This is OK; it never had a semantic effect under the standard.

  o    Field-names may be converted into some usual canonical form
       (e.g.  "Mime-Version" into "MIME-Version").

Does somebody do this?

  o    Phrases, or parts thereof, may be converted to or from
       quoted-strings.

Yikes!

  o    Date-times may be rewritten in some preferred format, or into
       some preferred timezone.

Death to any relay that does this.  You want signature checkers to
canonicalize dates?  That's asking for trouble, I fear.

  o    Headers with non-ASCII characters may be converted to or from
       the notation defined in [RFC2047].  Observe that there is no
       canonical way to do this conversion and it is, moreover,
       frequently performed in contexts where it is not strictly
       allowed.
[Other contributions to this list welcomed.]

Rather, I would remove entries and boil the canonicalization down to
something I think most people can get right, namely (sketched below):

        a) Collapse all FWS in multi-line headers to a single NL
           (decimal 10)
        b) Sort the resulting vectors in byte order and concatenate
        c) Hash the resulting stream.
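
In sketch form, reading "FWS" as the CRLF-plus-whitespace folding
sequence and the NL as byte value 10; this is my reading of the three
steps, not a spec:

    # Sketch of the three-step canonicalization above.
    import hashlib, re

    def canonical_hash(headers):
        """headers: list of raw header strings, possibly folded."""
        # a) collapse each run of folding whitespace to a single NL
        collapsed = [re.sub(r"\r?\n[ \t]+", "\n", h) for h in headers]
        # b) sort the headers in byte order (code-point order equals
        #    byte order for ASCII) and concatenate
        stream = "".join(sorted(collapsed)).encode()
        # c) hash the resulting stream
        return hashlib.sha1(stream).hexdigest()

    print(canonical_hash(["From: a@b\r\n", "Subject: hi\r\n\t there\r\n"]))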

Frankly, I don't think this will fail with many USENET relays.  And the
ones it does fail with deserve the quick death (and resulting quick
rebirth, as their authors scurry because they have to) that they will
get.

I'm talking about USENET gateways.  I know mail gateways play games.
Let them.  For mail, multipart/signed or S/MIME are likely to be used
for signatures.

By the way, some people have told me that even the above very simple
algorithm is too complex, so I know they won't buy into something like
what you've detailed.

[Open PGP is the obvious choice for this, since it is widely available
and is blessed by the IETF. My only reservation is that it comes with
a rather poor certification system as compared with, say, SPKI. So
this choice might yet have to be reviewed.]

Open PGP is far from the obvious choice.  In fact, it's not even
a cryptographic algorithm.  It is a very non-USENET-style packed-bits
encoding for a variety of algorithms.  In fact, far from declaring it
an obvious choice, I would have called it an obvious thing to avoid if
you hadn't come out and endorsed it.  Its certificate system, such as
it is, has nothing to do with what USENET certificates want to certify,
except perhaps the most basic e-mail address.

Plus, the certificate problem and the signature problem are somewhat
different.  A signature contains a key name, some flags and a signature
proper, which is just the output of some digital-sig algorithm.

A certificate contains all those, but also the things it certifies,
expressed in some certification language that matches the needs of
USENET.

Sadly, SPKI is losing the battle to X.509 identity certificates.  But
those certificates are clearly not right for USENET.  They are too
large, for one thing.

Right now my thought is to try to push an SPKI certificate system.

For the signature, no system is needed at all.  The signature is just
a number -- a 320-bit number in the case of DSA over SHA-1.  You just
base64-encode it.  That's it.  It's just a single number; why would it
need a "format"?



  The stream of octets resulting from the canonicalization algorithm
  is signed, in binary mode (signature type 0x00), in accordance with
  Open PGP [RFC2440].

You don't include the result of the hash in the signature.  It's
redundant.  You calculate the hash yourself, and then test whether the
signature matches the hash.  There is no purpose in having a pre-computed
copy of the hash in the signature.
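
In sketch form, with verify() standing in for whatever public-key
primitive is in use:

    # Sketch: the checker computes the hash itself and asks the
    # signature primitive whether the signature matches it; a shipped
    # copy of the hash would add nothing.
    import hashlib

    def check(signed_stream, signature, public_key, verify):
        digest = hashlib.sha1(signed_stream).digest()  # computed locally
        return verify(public_key, digest, signature)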



  The output of the algorithm MUST be Ascii-armored [RFC2440], but
  the Armor Header Line ("BEGIN PGP SIGNATURE"), the Armor Headers
  (e.g.  "Version:"), the blank line following the Armor Headers, and
  the Armor Tail ("END PGP SIGNATURE") are to be omitted (thus
  yielding a sequence of base64 characters). Observe that these
  characters will include a CRC checksum, which SHOULD be on a
  separate line from the rest of the signature.

Huh?

  The signature included within the Ascii-armor MAY include
  certificates as evidence that the signing key has the necessary
  authorization to sign articles of that nature, but such usage is in
  general deprecated except between parties that have agreed
  otherwise or where, for some reason, an unusual signatory is
  signing and attaches a certificate from the usual signatory.

Oh, so you want to include certificates in the signature, and that's the
purpose of this encoding form.  Is there value in doing this?  Because
signatures are long, I think they need to be one per header, which is why
I proposed an independent certificate header.