ietf-openpgp

Re: Section 5.2.3 of latest draft: bis14.

2005-07-20 14:52:01


On 15 Jul 2005, at 4:47 PM, Hal Finney wrote:


This is definitely an error and needs to be fixed.


Fixed.

A couple of other relatively minor points relating to this section.

We now use the term "data set" for the hashed and unhashed subpackets:

      - Hashed subpacket data set. (zero or more subpackets)

      - Two-octet scalar octet count for the following unhashed
        subpacket data. Note that this is the length in octets of all of
        the unhashed subpackets; a pointer incremented by this number
        will skip over the unhashed subpackets.

      - Unhashed subpacket data set. (zero or more subpackets)

"Data set" is defined in the next section, 5.2.3.1:

    A subpacket data set consists of zero or more signature subpackets,
    preceded by a two-octet scalar count of the length in octets of all
    the subpackets; a pointer incremented by this number will skip over
    the subpacket data set.

This definition could be interpreted to mean that the data set includes
the two-octet scalar count.  In fact, in the layout in 5.2.3 the data
set does not include the scalar count. 5.2.3.1 could be reworded to say
"A subpacket data set consists of zero or more signature subpackets,
AND IS preceded by a two-octet scalar count..."


Personally, I'd just remove the comma. I also removed a semicolon:

A subpacket data set consists of zero or more signature subpackets preceded by a two-octet scalar count of the length in octets of all the subpackets. A pointer incremented by this number will skip over the subpacket data set.
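To illustrate the layout being discussed, here is a minimal Python sketch that reads a two-octet scalar count and then steps over the subpackets that follow it. The function name and the sample bytes are hypothetical, and the sketch follows the 5.2.3 reading in which the returned data set does not include the count itself. It does not parse the individual subpackets.

```python
import struct

def read_subpacket_data_set(buf: bytes, offset: int):
    """Return (data_set, next_offset) for a count-prefixed subpacket area.

    Hypothetical helper: the count is read as a two-octet big-endian
    scalar, and the returned data set excludes the count itself.
    """
    (count,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    end = start + count  # "a pointer incremented by this number"
    if end > len(buf):
        raise ValueError("subpacket data set runs past end of buffer")
    return buf[start:end], end

# Hypothetical bytes: a 3-octet hashed set followed by an empty unhashed set.
sig_tail = bytes([0x00, 0x03, 0xAA, 0xBB, 0xCC, 0x00, 0x00])
hashed, pos = read_subpacket_data_set(sig_tail, 0)
unhashed, pos = read_subpacket_data_set(sig_tail, pos)
```

A count of zero yields an empty data set, which matches "zero or more subpackets".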

Another slight wording inconsistency is in 5.2.3:

    The data being signed is hashed, and then the signature data from
    the version number through the hashed subpacket data (inclusive) is
    hashed. The resulting hash value is what is signed.

This "x is hashed, and then y is hashed" business has caused confusion
for implementors.  5.2.2 fixed the wording for V3 packets:

    The concatenation of the data to be signed, the signature type and
    creation time from the signature packet (5 additional octets) is
    hashed. The resulting hash value is used in the signature algorithm.

We should make the same change for V4 packets in 5.2.3.  I don't know
if there are any other places where we talk about hashing X and then
Y and then Z instead of, as we should, hashing the concatenation of
X and Y and Z.


How about:

The concatenation of the data being signed and the signature data from the version number through the hashed subpacket data (inclusive) is hashed. The resulting hash value is what is signed. The left 16 bits of the hash are included in the signature packet to provide a quick test to reject some invalid signatures.
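The distinction can be sketched in Python as follows. This is only an illustration under stated assumptions: SHA-256 is picked arbitrarily, the function name and input bytes are hypothetical placeholders, and the sketch covers only the two inputs named in the sentence above. It shows that feeding X and then Y into one running hash context is the same as hashing the concatenation, and is not the same as hashing each piece separately.

```python
import hashlib

def hash_concatenation(data: bytes, sig_fields: bytes) -> bytes:
    """Hash data || sig_fields with a single hash context (assumed SHA-256)."""
    h = hashlib.sha256()
    h.update(data)        # the data being signed
    h.update(sig_fields)  # signature fields, version number through hashed subpackets
    return h.digest()

# Hypothetical placeholder inputs, not a real signature packet.
data = b"hypothetical document"
sig_fields = b"\x04\x00\x01\x08\x00\x00"

digest = hash_concatenation(data, sig_fields)
left16 = digest[:2]  # the left 16 bits used for the quick rejection test
```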


I am diffing against bis-12 which is the only old version I have here.
Another change I notice is that the preferred algorithm signature
subpackets in 5.2.3.7, 5.2.3.8 and 5.2.3.9 have their contents changed
from a "sequence" of one-octet values to an "array" of one-octet values.
However we do not otherwise define "array".  Is that word really
better than "sequence" here?  To me, a sequence of values is a plainer
description while an array perhaps connotes a somewhat more complex
data structure. Of course in C an array is simply bytes in memory so if
that is how it is being read, OK.  I'm just worried that an implementor
is going to look for a definition of array.


I'm worried that an implementor is going to look for a definition of definition. This is something I worry about with each "clarification" we make, that N people think it's better and M people think it's worse. This is why I am resistant to clarifications (even though it seems I make a lot of them).

I agree with you that "sequence" is clearer than "array" and given how much of the text is either yours or mine, it's no wonder it used to say "sequence." I'm more than happy to change it back.

Let me meditate upon it for a bit.


Section 5.5.2:

    V2 keys are identical to V3 keys except for the deprecated V3 keys
    except for the version number. An implementation MUST NOT generate
    them and may accept or reject them as it sees fit.

Two "except for"s here, it doesn't look right.


It's worse than that.

Here's the correction:

V2 keys are identical to the deprecated V3 keys except for the version number. An implementation MUST NOT generate them and may accept or reject them as it sees fit.


Section 5.9 on literal packets:

      - File name as a string (one-octet length, followed by a file
        name). This may be a zero-length string. Commonly, if the source
        of the encrypted data is a file, this will be the name of the
        encrypted file. An implementation MAY consider the file name in
        the literal packet to be a more authoritative name than the
        actual file name.

I know we discussed this here, but I'm not sure this is right yet.
What is the "actual file name"?  And what does it mean for a name to
be authoritative? This is making some assumptions about processing flow
which may not be correct.  I think "actual file name" means the name of
the file being decrypted, assuming that the encrypted data actually came
from a file.  But then, usually the encrypted file name is not used for
the decrypted data, rather some modification of that file name is used,
so perhaps that is the "actual file name"?

Maybe we could change the last sentence to "When decrypting, an
implementation MAY use this name as the name of an output file."
That would hint what we mean it to be used for. Or maybe just leave the
last sentence off entirely and just say that this is commonly the name
of the encrypted file, let the implementor figure out what if anything
he wants to do with it.
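For reference, the field itself is simple to read. The following Python sketch assumes the offset already points at the file name field (the fields before it are not handled here); the function name and sample bytes are hypothetical. It returns the raw octets and deliberately leaves the question of what to do with the name to the caller.

```python
def read_file_name(body: bytes, offset: int = 0):
    """Return (name_octets, next_offset) for a one-octet-length string."""
    length = body[offset]  # one-octet length, may be zero
    start = offset + 1
    end = start + length
    if end > len(body):
        raise ValueError("file name runs past end of packet body")
    return body[start:end], end

# Hypothetical field contents followed by other data.
name, pos = read_file_name(bytes([9]) + b"notes.txt" + b"rest")
empty_name, empty_pos = read_file_name(b"\x00rest")
```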



We refer to RFC 822 in two places, but that's been superseded by
RFC 2822.


I only found one, unless you're counting the citation. I updated to be 2822.

The only reason I noticed this in my diffing was because we changed
to put a space after RFC.  But in the references we have no space,
e.g. [RFC 2045] became [RFC2045].  I guess it's OK to use a space in
the text and no space in the references, but why not do it the same
in both contexts?  I would vote for no space, it looks better to me,
but your eyes may differ.


At the risk of sounding like the White Knight, the reason it is the way it is is that I consider (e.g.) "RFC 2045" to be the name of the RFC, whereas "[RFC2045]" is an identifier that is what the name is called, and years of programming have instilled in me an aversion to spaces there. Heck, I even usually put them in all upper case. (I recently cited 2440 as [OpenPGP] and the mere use of mixed case gave me pause.) Just as I'd never refer in English to Mister and Zuccherato's paper as "[MZ05]" that's still the identifier I used for it.


Section 13, Security Considerations:

     * In winter 2005, Serge Mister and Robert Zuccherato from Entrust
       released a paper describing a way that the "quick check" in
       OpenPGP CFB mode can be used with a random oracle to decrypt two
       octets of every cipher block [MZ05]. They recommend as
       prevention not using the quick check at all.

       Many implementers have taken this advice to heart for any data
       that is both symmetrically encrypted, but also the session key
       is public-key encrypted. In this case, the quick check is not
       needed as the public key encryption of the session key should
       guarantee that it is the right session key. In other cases, the
       implementation should use the quick check with care. On the one
       hand, there is a danger to using it if there is a random oracle
       that can leak information to an attacker. On the other hand, it
       is inconvenient to the user to be informed that they typed in
       the wrong passphrase only after a petabyte of data is decrypted.
       There are many cases in cryptographic engineering where the
       implementer must use care and wisdom, and this is another.

This is good but I think some of the wording could be smoothed.   The
first sentence of the second paragraph should not have a comma after
the first part of the "both" clause, and "but" doesn't seem like the
right connective.  I suggest,

       Many implementers have taken this advice to heart for any data
       that is symmetrically encrypted and for which the session key
       is public-key encrypted.


Done.

I also have a problem with "there is a danger to using it if there is
a random oracle that can leak information".  This makes it sound like
the random oracle is some other entity independent of the implementation.
I would prefer to avoid the word "oracle" as not all implementors may be
familiar with the technical meaning, and in common use it has mystical
or religious connotations.


But that's what it's called -- a random oracle.

I think what we want is something like "there is a danger to using it
if timing information about the check can be exposed to an attacker,
particularly via an automated service that allows rapidly repeated
queries".


I made a few edits. See below.

Finally I think the last clause should say "and this is one" rather
than "and this is another".


Done.

I do have to add that I think this paragraph is perhaps a little informal
or even poetic for a security document.  Implementors "take things to
heart" and use their "care and wisdom".  I could see an implementor
wondering whether he was reading a spec or beginning a study of Zen.
Maybe we should think about changing this to be a little more cool and
just warn them that if they are going to use the check bits, they need
to be aware of the danger of leaking timing data.  The content of the
paragraph is good, it's just the style which struck me as being a bit off.
Again, your taste may differ.


Here's my thinking. This isn't a security document. It's a protocol document. If you want to be really picky, it's a data format document.

We cannot tell people in here everything to do. We can't tell them what to do in something as easy as speculative key ids (which I pick solely because we've been discussing it recently). We certainly can't tell them when it's proper to do a quick check and when it isn't.

We can and do, however, editorialize and wave a few flags and drop a few hints.

I'm glad that leaps out, because that's what I want. I don't want to say SHOULD NOT. I don't want to say MAY. I know my opinion is going to change on this at least once in the next five years. I want to say -- ummm, think long and hard about it and do the best job you can. If the new implementer is shocked by this into reading the Mister and Zuccherato paper, so much the better. If they check to see what other people have done, marvelous.

Here's the final edit I have:


Many implementers have taken this advice to heart for any data that is symmetrically encrypted and for which the session key is public-key encrypted. In this case, the quick check is not needed as the public key encryption of the session key should guarantee that it is the right session key. In other cases, the implementation should use the quick check with care.

On the one hand, there is a danger to using it if there is a random oracle that can leak information to an attacker. In plainer language, there is a danger to using the quick check if timing information about the check can be exposed to an attacker, particularly via an automated service that allows rapidly repeated queries.

On the other hand, it is inconvenient to the user to be informed that they typed in the wrong passphrase only after a petabyte of data is decrypted. There are many cases in cryptographic engineering where the implementer must use care and wisdom, and this is one.
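As a rough illustration of the trade-off, here is a Python sketch. It assumes the usual OpenPGP CFB prefix layout, in which the decrypted prefix is block-size random octets followed by a repeat of the last two of them; that layout is not spelled out in this message. Both function names are hypothetical, and the policy function only mirrors the case singled out in the text. It is not a recommendation for every deployment.

```python
def quick_check_passes(decrypted_prefix: bytes, block_size: int) -> bool:
    """True if the two check octets repeat the last two random octets."""
    if len(decrypted_prefix) < block_size + 2:
        raise ValueError("prefix shorter than block size plus two octets")
    last_two_random = decrypted_prefix[block_size - 2:block_size]
    check_octets = decrypted_prefix[block_size:block_size + 2]
    return last_two_random == check_octets

def should_use_quick_check(session_key_was_public_key_encrypted: bool) -> bool:
    """Skip the check when public-key decryption already vouches for the key.

    In other cases this returns True, meaning only that the check may be
    considered; the caller still has to weigh whether its result or timing
    could be exposed to an attacker.
    """
    return not session_key_was_public_key_encrypted

# Hypothetical decrypted prefixes for an 8-octet block cipher.
good_prefix = bytes(range(8)) + bytes([6, 7])
bad_prefix = bytes(range(8)) + b"\x00\x00"
```

Note that the comparison itself is not where the danger lies. The concern is any externally observable difference between a passing and a failing check.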



Everything else looked good as far as I could see.


Thanks. I appreciate the comments and corrections.

        Jon


Hal Finney