On Sat, 2007-08-11 at 10:54 +0100, Alexey Melnikov wrote:
Dilyan Palauzov wrote:
If it is 8bit, we take the :first number characters (not bytes),
Actually 'octet' should be used, as 'character' might use one or more
octets, depending on charset.
please, no. it's not good that you can't know if you get a garbage
value, ie. a truncated UTF-8 sequence or UCS-2, Big5, Shift-JIS or any
other character set encoding which uses multiple octets for a single
character.
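
to make the truncation problem concrete, here is a minimal Python sketch. the helper names first_octets and first_characters are made up for illustration and are not part of any draft; the sample string is arbitrary. note that Python slices strings by code point, which is only an approximation of "character".

```python
def first_octets(text: str, n: int, charset: str = "utf-8") -> bytes:
    """Take the first n octets of the encoded text (hypothetical helper)."""
    return text.encode(charset)[:n]


def first_characters(text: str, n: int) -> str:
    """Take the first n code points of the decoded text (hypothetical helper)."""
    return text[:n]


# "blabaer" with a-ring and ae-ligature; each takes two octets in UTF-8.
sample = "bl\u00e5b\u00e6r"

# Cutting after three octets splits the a-ring in the middle.
cut = first_octets(sample, 3)

try:
    cut.decode("utf-8")
    truncated_cleanly = True
except UnicodeDecodeError:
    # The result is a truncated UTF-8 sequence, i.e. garbage.
    truncated_cleanly = False

# Cutting after three characters keeps the a-ring whole.
kept = first_characters(sample, 3)
```

counting in characters only works after the text has been decoded, which is why the transcoding question below matters.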
RFC 3028 (and 3028bis) already has wording which says transcoding into UTF-8
SHOULD be supported. it seems natural that this extension takes its
lead from [SIEVE] section 2.7.2. I wouldn't mind it if this extension
*requires* support similar to what is described in the second paragraph:
[...] Implementations convert text
from header fields in all charsets [MIME3] to Unicode, encoded as
UTF-8, as input to the comparator (see 2.7.3). Implementations
MUST be capable of converting US-ASCII, ISO-8859-1, the US-ASCII
subset of ISO-8859-* character sets, and UTF-8. Text that the
implementation cannot convert to Unicode for any reason MAY be
treated as plain US-ASCII (including any [MIME3] syntax) or
processed according to local conventions. [...]
I can't see any advantage to allowing an implementation to cop out of
this -- it's not a very strenuous requirement, and it would cause
interoperability problems if it is made a quality of implementation
issue. remember that the eventual goal of IETF is to make e-mail pure
UTF-8, with as little extraneous protocol as possible. we shouldn't add
non-i18n aware features now.
:type, :subtype, :contenttype
What is the obvious advantage of having them, compared to header
:contains? I mean, isn't 'header :contains "Content-Type" "text/"' as
powerful,
If I remember correctly, the header test doesn't know anything about
header field structure, so it can match any of the Content-Type parameters.
yes, consider
Content-Type: image/jpeg; x-exif-data="context/something/other"
you could use :matches (since it is an anchored search), but it would
only work on the type, not the subtype. e.g.
header :matches "Content-Type" "*/plain"
wouldn't match
Content-Type: text/plain; format="flowed"
and you can't easily fix the pattern to handle it.
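
the two failure cases above can be reproduced with a rough Python sketch. this only approximates Sieve: contains() and matches() are hypothetical stand-ins for :contains and :matches, lowercasing stands in for the i;ascii-casemap comparator, and fnmatch also gives "[" a special meaning which Sieve does not (the patterns here avoid it). the third header value is invented to show what a loosened pattern lets through.

```python
from fnmatch import fnmatchcase


def contains(value: str, key: str) -> bool:
    """Approximate :contains, an unanchored substring search."""
    return key.lower() in value.lower()


def matches(value: str, pattern: str) -> bool:
    """Approximate :matches, an anchored wildcard match using * and ?."""
    return fnmatchcase(value.lower(), pattern.lower())


jpeg = 'image/jpeg; x-exif-data="context/something/other"'
flowed = 'text/plain; format="flowed"'
# Invented value, only to show a false positive.
tricky = 'image/jpeg; x-note="a/plain"'

# :contains is fooled by the parameter, because "context/" contains "text/".
contains_false_positive = contains(jpeg, "text/")

# :matches is anchored, so "text/*" is safe against the parameter ...
type_match_on_jpeg = matches(jpeg, "text/*")

# ... but "*/plain" fails as soon as a parameter follows the subtype,
subtype_match_on_flowed = matches(flowed, "*/plain")

# and loosening the pattern to "*/plain*" lets the invented header through.
loosened_on_tricky = matches(tricky, "*/plain*")
```

so the anchored match helps for the type but not for the subtype, as long as the test sees the raw field body.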
however, I still don't see the need for all three. IMHO
just :contenttype is enough, and
header :matches :contenttype "Content-Type" "text/*"
is as clear or maybe clearer than
header :type "Content-Type" "text"
I am hard pressed to find a use for the :subtype test, too.
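
as a sketch of what a structure-aware test could compare against, assuming :contenttype means "match only the type/subtype and ignore the parameters" (my reading, not settled wording), Python's email package can do the parsing. content_type_of and matches_content_type are hypothetical names.

```python
from email.message import Message
from fnmatch import fnmatchcase


def content_type_of(field_body: str) -> str:
    """Return the lowercased type/subtype, with any parameters dropped."""
    msg = Message()
    msg["Content-Type"] = field_body
    return msg.get_content_type()


def matches_content_type(field_body: str, pattern: str) -> bool:
    """Wildcard match against type/subtype only (assumed :contenttype semantics)."""
    return fnmatchcase(content_type_of(field_body), pattern.lower())


jpeg = 'image/jpeg; x-exif-data="context/something/other"'
flowed = 'text/plain; format="flowed"'

jpeg_type = content_type_of(jpeg)
flowed_type = content_type_of(flowed)

# With the parameters out of the way, both ends of the pattern are usable.
flowed_is_text = matches_content_type(flowed, "text/*")
flowed_is_plain = matches_content_type(flowed, "*/plain")
jpeg_is_text = matches_content_type(jpeg, "text/*")
```

with that, "text/*" covers what :type would do and "*/plain" covers :subtype, which is why one tag looks sufficient to me.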
In addition to the above, I think this is slightly easier to understand
'header :contains "Content-Type" "text/"'
oh, we agree :-)
--
Kjetil T.