
Re: WGLC on draft-ietf-sieve-mime-loop

2007-08-10 09:51:39

 > 7 Action extract_text

QUESTION: What do we do if the Content-Transfer-Encoding is anything
other than 7bit?

If it is 8bit, we take the number of characters (not bytes) given by :first,
taking into account the "charset=" parameter of the Content-Type header,
when present. If it is base64 or quoted-printable, we convert it to 8bit and
proceed as if it were 8bit. The same goes for binary, even if the result
will probably be useless. (Useless results can happen with base64 too, but
that is less likely.)

I guess this becomes a problem when you hit some 10MB base64-encoded
attachment and are trying to extract the first 100 bytes.  You either have
to decode the whole 10MB, or write some picky code that extracts as little
text as possible while still decoding legal units of QP/base64 encoding.
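
For what it's worth, the "picky code" for the base64 case can be fairly small. Below is a rough Python sketch, not taken from any Sieve implementation: the function name decode_base64_prefix is made up for illustration, and it assumes well-formed base64 input. It relies only on the fact that every 4 base64 characters decode to at most 3 bytes, so it reads just enough whole 4-character units to cover the requested prefix. It returns bytes; turning them into characters according to "charset=" would be a separate step.

```python
import base64
from itertools import islice


def decode_base64_prefix(encoded: str, want_bytes: int) -> bytes:
    """Return the first want_bytes decoded bytes without decoding everything.

    Hypothetical helper for illustration; assumes well-formed base64 input.
    """
    # Each legal base64 unit is 4 characters and yields at most 3 bytes,
    # so round the request up to a whole number of units.
    units = -(-want_bytes // 3)  # ceiling division
    needed_chars = units * 4

    # Skip the line breaks that MIME inserts, and stop reading as soon as
    # enough significant characters have been collected.
    significant = (ch for ch in encoded if not ch.isspace())
    prefix = "".join(islice(significant, needed_chars))

    return base64.b64decode(prefix)[:want_bytes]
```

Quoted-printable would need a similar but different rule (never cut inside an "=XX" escape or a soft line break), which this sketch does not cover.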

With the body test we offer :raw :content :text to decide what kind of
transform is required, which is why I believe extract_text should be moved
to the body spec, and implemented through an optional argument to force the
setting of the match variables.

:type, :subtype, :contenttype

What is the obvious advantage of having them, compared to header :contains?
I mean, isn't 'header :contains "Content-Type" "text/"' as powerful as
'header :type "text"'? If this was already discussed on this list, could you
tell me the starting date?

It's a matter of precision of parsing.  Consider:

    header :contains "Content-Type" "text"


    header :type "Content-Type" "text"

Against this content-type header:

Content-Type: application/pdf;  name="presentationtext.pdf"
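
The difference can be checked with a rough Python sketch. This only models the two kinds of matching; it is not how any Sieve engine implements them, and the names contains_match and type_match are invented for the example. The substring test sees "text" inside the name parameter, while the parsed test looks only at the media type before the slash.

```python
from email.message import Message


def contains_match(header_value: str, needle: str) -> bool:
    """Model of a :contains test: case-insensitive substring search
    over the whole header value, parameters included."""
    return needle.lower() in header_value.lower()


def type_match(header_value: str, wanted_type: str) -> bool:
    """Model of a parsed :type test: compare only the top-level media type."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_maintype() drops the parameters and the subtype.
    return msg.get_content_maintype() == wanted_type.lower()


header_value = 'application/pdf;  name="presentationtext.pdf"'
print(contains_match(header_value, "text"))  # substring hit in the file name
print(type_match(header_value, "text"))      # media type is "application"
```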