Cyrus Daboo wrote:
I would like to draw your attention to the following draft:
Please review this document and send issues to the list or direct to the
4.1 Body Transform ":raw"
# This will match a message containing the words "MAKE MONEY FAST"
# in body or MIME headers other than the outermost RFC 822 header,
# but will not match a message containing the words in a
# content-transfer-encoded body.
Wouldn't it be more correct to say that it matches the string, or
even the character sequence "MAKE MONEY FAST"? Also I'm not sure I
understand what is meant by "a content-transfer-encoded body". It
*could* still match the character sequence in a quoted-printable
encoded body, couldn't it?
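
To illustrate the question, here is a minimal Python sketch (not taken from the draft). It assumes ":raw" amounts to an octet-level substring search on the undecoded body; `raw_contains` is a hypothetical stand-in for that search, not anything the draft defines.

```python
import base64
import quopri

NEEDLE = b"MAKE MONEY FAST"


def raw_contains(encoded_body: bytes, needle: bytes = NEEDLE) -> bool:
    """Hypothetical stand-in for a :raw match: substring search on undecoded octets."""
    return needle in encoded_body


plain = b"Want to MAKE MONEY FAST? Click here."

# Quoted-printable leaves printable ASCII untouched, so the phrase survives as-is.
qp_body = quopri.encodestring(plain)

# Base64 output never contains spaces, so the phrase cannot appear literally.
b64_body = base64.b64encode(plain)

# A hand-written quoted-printable body in which a soft line break ("=" + CRLF)
# happens to fall inside the phrase.
qp_folded = b"Want to MAKE MON=\r\nEY FAST? Click here."

print(raw_contains(qp_body))                 # phrase visible in the QP octets
print(raw_contains(b64_body))                # phrase hidden by base64
print(raw_contains(qp_folded))               # phrase split by the soft line break
print(NEEDLE in quopri.decodestring(qp_folded))  # visible again after decoding
```

Under that assumption, a raw match on a quoted-printable body can succeed or fail depending on where the encoder folds lines, which is why the wording "will not match ... a content-transfer-encoded body" seems too strong.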
4.2 Body Transform ":content"
MIME parts encoded in "quoted-printable" or "base64" content
transfer encodings MUST be decoded to prior to the match.
I'm not a native English speaker, but the above sentence doesn't
make sense to me. It should probably say "...MUST be decoded prior to...".
If an implementation does not support conversion of a given
charset to UTF-8, it MAY compare against the US-ASCII subset
of the transfer-decoded character data instead.
Does the above rely on all current and future charsets having
a one-to-one mapping to US-ASCII for all characters with code
points 0-127? Is this a safe assumption? Is it even true of all
existing charsets? Maybe it would be better to explicitly
exclude all parts that can't be converted to UTF-8?
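
As a rough check of that assumption, the Python sketch below encodes the same phrase in three charsets and applies a naive "US-ASCII subset" comparison to the resulting octets. The function `ascii_subset_match` is a hypothetical reading of the fallback, not the draft's definition, and the charset names are those of Python's codecs.

```python
NEEDLE = "MAKE MONEY FAST"
ASCII_NEEDLE = NEEDLE.encode("ascii")


def ascii_subset_match(octets: bytes) -> bool:
    """Hypothetical fallback: treat the octets as US-ASCII and search for the phrase."""
    return ASCII_NEEDLE in octets


# The same text as it would appear, transfer-decoded, in three different charsets.
samples = {cs: NEEDLE.encode(cs) for cs in ("iso-8859-1", "utf-16-le", "cp037")}

results = {cs: ascii_subset_match(octets) for cs, octets in samples.items()}

for cs, matched in results.items():
    print(f"{cs}: {matched}")

# Every octet of the UTF-16 sample is below 128, yet the octet sequence
# is not the US-ASCII encoding of the phrase (each letter is followed by 0x00).
print(all(b < 0x80 for b in samples["utf-16-le"]))
```

ISO-8859-1 agrees with US-ASCII on code points 0-127, so the fallback finds the phrase there. It misses the phrase in UTF-16 and in EBCDIC (cp037), where the octets in the 0-127 range do not carry their US-ASCII meaning.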