Alicia:
I was just responding to:
 
"The data being signed must remain in binary form,
it is too large to do any type of text encoding
such as base64, in order to just specify S/MIME
related MIME-Type text headers."
 
Of course, there is no need to text encode the
binary data to use S/MIME, so the size of the
binary data need not be a factor.  The overhead
can be only the header and boundary octets (which
happen to be 7-bit).
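
A rough sketch in Python of the size argument. The header
values and the random 1 MB payload are made up for the
example, and wrap_binary is a hypothetical helper, not part
of any S/MIME library:

```python
import base64
import os


def wrap_binary(payload: bytes,
                content_type: str = "application/octet-stream") -> bytes:
    """Prefix the payload with 7-bit MIME headers; the body stays raw octets."""
    headers = (
        f"Content-Type: {content_type}\r\n"
        "Content-Transfer-Encoding: binary\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + payload


payload = os.urandom(1_000_000)  # stand-in for large binary data
entity = wrap_binary(payload)

# Overhead of binary transfer encoding: only the header octets.
binary_overhead = len(entity) - len(payload)
# Overhead if the same payload were base64 encoded instead.
base64_overhead = len(base64.b64encode(payload)) - len(payload)

print("binary transfer encoding overhead:", binary_overhead, "octets")
print("base64 overhead:", base64_overhead, "octets")
```

The first figure is fixed by the header text and does not
grow with the payload; the second grows by roughly a third
of the payload size.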
 
We are also using CMS for a "non-e-mail" application 
but chose to use MIME encoding (with binary transfer 
encoding) since it provided a number of advantages, 
including the ability to bind the content type to 
the signature (as per your second post I think...).  
 
I understand that you have an application where you 
do not wish to MIME encode the content and would 
prefer (some) functions available from MIME to be 
available directly with CMS.
 
In our specific case we avoid detached signatures
(they were not desirable for us), so we did not need 
to consider the issues you raise in your follow-up 
posting.
 
We just mandated that the data formats at the input 
and output of the CMS processes be MIME encoded 
(but allowing binary transport).  This way we could 
use the MIME headers to identify transfer encoding, 
content type, private header extensions, etc. at the 
input and output of the secure channel, knowing that 
the headers are not changed in transit (this allows 
additional secure "out of band" signalling between 
the applications, using the MIME standard for that 
signalling).  So in our case 
the trusted timestamp would be over the MIME encoded 
data including headers.  The MIME overhead was 
considered to be small compared with the benefits of
sticking with a fairly widely used MIME standard.
 
But then again, one size does not fit all.  I agree 
that having MIME text mixed with CMS ASN.1 is a bit 
odd and perhaps peculiar to e-mail applications.  
I also agree that CMS Section 4 could be clearer, but 
there are a lot of e-mail apps which assume MIME ...
Cheers,
Tony
-----Original Message-----
From: Alicia da Conceicao [mailto:alicia(_at_)engine(_dot_)ca] 
Sent: December 13, 2005 10:56 AM
To: Tony Capel
Subject: Re: SignedAttribute for Mime-Type
(You will probably get this response from others.)
I am not sure why you need an OID.  Just specify content encoding as 
binary in the MIME header. There is no mandatory requirement to base64 
encode the MIME content. In fact, it is recommended not to (except 
under certain circumstances) since it leads to unnecessary message 
expansion.
Dear Tony:
Thank you for your response.  CMS requires all data content types to be
arbitrary octet strings, with the interpretation left to the application.
There is no requirement to use any type of textual S/MIME headers in the data.
I attached the relevant sections from the CMS RFC below.
Because of this, I did not implement any type of data formatting for my CMS
structures.  In fact, I specifically avoided it for several reasons, including:
1) many signatures contained detached data, so the hash used in
        the CMS SignedData structure should correspond to the core
        data itself and not include any encapsulation or headers,
        since with detached signatures, the encapsulation and
        headers could not be stored anywhere
2) identical core data can be encapsulated differently or have
        slightly different headers, which would give them different
        hashes; this would make pairing up signatures with their
        detached data nearly impossible
3) time stamp tokens, which are also detached signatures, also only
        hash the core data itself in the TSTInfo, and do not use any
        type of encapsulation or headers, since all time stamp
        tokens are used for detached data
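To illustrate points 1) and 2), here is a small Python sketch. The sample
data, the header text, the with_headers helper and the choice of SHA-256 are
all assumptions made for the example, not taken from my CMS software:

```python
import hashlib

# Stand-in for the "core data" that a detached signature refers to.
core = b"\x00\x01binary payload\xff"


def with_headers(data: bytes, content_type: str) -> bytes:
    """Hypothetical encapsulation: MIME headers followed by the raw octets."""
    headers = (
        f"Content-Type: {content_type}\r\n"
        "Content-Transfer-Encoding: binary\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + data


# Digest of the core data alone, as a detached signature would use.
digest_core = hashlib.sha256(core).hexdigest()

# Same core data, two slightly different encapsulations.
digest_a = hashlib.sha256(
    with_headers(core, "application/octet-stream")).hexdigest()
digest_b = hashlib.sha256(
    with_headers(core, "application/x-example")).hexdigest()

print(digest_core)
print(digest_a)
print(digest_b)
```

All three digests differ, so a signature computed over either encapsulated
form could not be matched to the bare data file without knowing the exact
headers that were used.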
Currently, I am using CMS for lots of things completely unrelated to S/MIME or
e-mail.  In practice, more than 10% of the signatures generated using my CMS
software are for detached data.  It just seems easier to simply specify a MIME
type for the core data as an attribute that would work for both attached and
detached signatures.
Alicia.
=====================================================================
4 Data Content Type
   The following object identifier identifies the data content type:
      id-data OBJECT IDENTIFIER ::= { iso(1) member-body(2)
         us(840) rsadsi(113549) pkcs(1) pkcs7(7) 1 }
   The data content type is intended to refer to arbitrary octet
   strings, such as ASCII text files; the interpretation is left to the
   application.  Such strings need not have any internal structure
   (although they could have their own ASN.1 definition or other
   structure).
   S/MIME uses id-data to identify MIME encoded content.  The use of
   this content identifier is specified in RFC 2311 for S/MIME v2
   [OLDMSG] and RFC 2633 for S/MIME v3 [MSG].
   The data content type is generally encapsulated in the signed-data,
   enveloped-data, digested-data, encrypted-data, or authenticated-data
   content type.
5.2.1  Compatibility with PKCS #7
   This section contains a word of warning to implementers that wish to
   support both the CMS and PKCS #7 [PKCS#7] SignedData content types.
   Both the CMS and PKCS #7 identify the type of the encapsulated
   content with an object identifier, but the ASN.1 type of the content
   itself is variable in PKCS #7 SignedData content type.
   PKCS #7 defines content as:
      content [0] EXPLICIT ANY DEFINED BY contentType OPTIONAL
   The CMS defines eContent as:
      eContent [0] EXPLICIT OCTET STRING OPTIONAL
   The CMS definition is much easier to use in most applications, and it
   is compatible with both S/MIME v2 and S/MIME v3.  S/MIME signed
   messages using the CMS and PKCS #7 are compatible because identical
   signed message formats are specified in RFC 2311 for S/MIME v2
   [OLDMSG] and RFC 2633 for S/MIME v3 [MSG].  S/MIME v2 encapsulates
   the MIME content in a Data type (that is, an OCTET STRING) carried in
   the SignedData contentInfo content ANY field, and S/MIME v3 carries
   the MIME content in the SignedData encapContentInfo eContent OCTET
   STRING.  Therefore, in both S/MIME v2 and S/MIME v3, the MIME content
   is placed in an OCTET STRING and the message digest is computed over
   the identical portions of the content.  That is, the message digest
   is computed over the octets comprising the value of the OCTET STRING,
   neither the tag nor length octets are included.
   There are incompatibilities between the CMS and PKCS #7 signedData
   types when the encapsulated content is not formatted using the Data
   type.  For example, when an RFC 2634 [ESS] signed receipt is
   encapsulated in the CMS signedData type, then the Receipt SEQUENCE is
   encoded in the signedData encapContentInfo eContent OCTET STRING and
   the message digest is computed using the entire Receipt SEQUENCE
   encoding (including tag, length and value octets).  However, if an
   RFC 2634 signed receipt is encapsulated in the PKCS #7 signedData
   type, then the Receipt SEQUENCE is DER encoded [X.509-88] in the
   SignedData contentInfo content ANY field (a SEQUENCE, not an OCTET
   STRING).  Therefore, the message digest is computed using only the
   value octets of the Receipt SEQUENCE encoding.
   The following strategy can be used to achieve backward compatibility
   with PKCS #7 when processing SignedData content types.  If the
   implementation is unable to ASN.1 decode the signedData type using
   the CMS signedData encapContentInfo eContent OCTET STRING syntax,
   then the implementation MAY attempt to decode the signedData type
   using the PKCS #7 SignedData contentInfo content ANY syntax and
   compute the message digest accordingly.
   The following strategy can be used to achieve backward compatibility
   with PKCS #7 when creating a SignedData content type in which the
   encapsulated content is not formatted using the Data type.
   Implementations MAY examine the value of the eContentType, and then
   adjust the expected DER encoding of eContent based on the object
   identifier value.  For example, to support Microsoft AuthentiCode,
   the following information MAY be included:
      eContentType Object Identifier is set to { 1 3 6 1 4 1 311 2 1 4 }
      eContent contains DER encoded AuthentiCode signing information
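
The digest difference described in section 5.2.1 above can be sketched in
Python. This is only an illustration: the toy SEQUENCE { INTEGER 1 } stands in
for a real Receipt, der_length and der_tlv are hypothetical helpers written
for this example, and SHA-256 is an assumed digest algorithm:

```python
import hashlib


def der_length(n: int) -> bytes:
    """DER length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_tlv(tag: int, value: bytes) -> bytes:
    """Tag, length and value octets for a single-byte tag."""
    return bytes([tag]) + der_length(len(value)) + value


# Toy SEQUENCE { INTEGER 1 }, standing in for a Receipt SEQUENCE.
inner = der_tlv(0x02, b"\x01")      # INTEGER 1
sequence = der_tlv(0x30, inner)     # SEQUENCE wrapping it

# CMS: the whole SEQUENCE encoding is the value of the eContent OCTET STRING,
# so the digest covers the SEQUENCE tag, length and value octets.
cms_digest = hashlib.sha256(sequence).hexdigest()

# PKCS #7: the SEQUENCE itself sits in the content ANY field,
# so the digest covers only the SEQUENCE value octets.
pkcs7_digest = hashlib.sha256(inner).hexdigest()

print(sequence.hex())
print(cms_digest)
print(pkcs7_digest)
```

The two digests differ even though the encapsulated structure is the same,
which is the incompatibility the section warns about.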