ietf-mta-filters

Re: Three new drafts and a question

2003-04-28 16:48:34

> On Sunday 27 April 2003 19:17, ned.freed@mrochek.com wrote:
> <snip>
> > I think replaceheader is too complex.

> But it's needed in the real world. Adding/removing the [mailing list]
> tag to/from the subject line and fixing mangled/mangling Reply-To
> headers is popular among people who need to handle large (or large
> amounts of) mailing list traffic.

Please explain why you cannot use deleteheader/addheader for this.
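
For concreteness, a minimal sketch of that approach, assuming the
editheader and variables drafts and a hypothetical "[sieve]" list tag
(an illustration, not text from any draft):

    require ["editheader", "variables"];

    # Re-tag the subject: strip an existing "[sieve] " prefix and add
    # it back exactly once, using only deleteheader/addheader.
    if header :matches "Subject" "[sieve] *" {
        deleteheader "Subject";
        addheader "Subject" "[sieve] ${1}";
    }

Adding the tag to an untagged subject, or removing it entirely,
follows the same delete-then-add pattern with the appropriate match.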

> > Think for a bit about the right way to handle modifications to
> > encoded words. It gets very very nasty in a great big hurry. For
> > example, suppose I have an encoded-word in iso-8859-1 and I use
> > replaceheader to change an a with an accent grave to an a with an
> > ogonek. What does the result look like? Are adjacent encoded-words
> > affected? Should they be?

> The base Sieve spec deals with recognizing encoded-words and IMO all
> that this extension needs to say is that if the result of the
> replaceheader action needs encoding, the Sieve implementation MUST do
> so.

This is not even close to sufficient. The interaction with existing
encoded words also has to be considered.
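
A constructed example of that interaction (values invented for
illustration): start from a subject whose non-ASCII text is carried in
an iso-8859-1 encoded-word,

    Subject: =?iso-8859-1?Q?caf=E0?=        (decodes to "cafà")

and use replaceheader to turn the a-grave into an a-ogonek. The
resulting text "cafą" has no iso-8859-1 representation, so the
implementation has to pick a new charset, for instance

    Subject: =?utf-8?Q?caf=C4=85?=          (decodes to "cafą")

and if the original text had been split across several adjacent
encoded-words, it would also have to decide which of them to rewrite.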

> The _how_ is up to the implementation and can range from re-encoding
> only the affected (encoded-)words using a minimal charset (for a
> certain definition of that) to brutally encoding the whole result in
> =?utf-8?b?...?= chunks unconditionally.
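
For illustration, the "brutal" form of the running example, with the
value base64-encoded as UTF-8 whether it needs it or not:

    Subject: =?utf-8?b?Y2FmxIU=?=           (decodes to "cafą")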

A cure that is worse than the disease, given the state of many clients today.

> One could add a :preferred_charsets option that takes a stringlist as
> a parameter, but I think that's hacky. One could specify a preferred
> algorithm that SHOULD be followed, along the lines of:
>
> 1. Only touch encoded-words that are affected.
> 2. If the affected portions of the result are US-ASCII, remove the
>    encoding, else
> 3. If the affected portions of the result can be represented in one of
>    the charsets already used in encoded-words in the original form of
>    the header, encode using that charset, else
> 4. If they can be represented in iso-8859-1 or the server's locale
>    charset or any site-defined sorted list of charsets, use the best
>    (latin-1, locale, first in the list), else
> 5. Use utf-8.
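
Walking the running example through these steps (with iso-8859-2
assumed, purely for illustration, to be on the site-defined list):
"cafą" fails step 2 because ą is not US-ASCII; it fails step 3 because
the only charset already present, iso-8859-1, has no ą; step 4 then
picks iso-8859-2, since neither latin-1 nor a (hypothetically latin-1)
locale charset can represent ą; step 5 is never reached.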

Yeah, and given past experience with encoded-word handling, how many
implementations do you think are going to get this right?

> but I don't see the need to require a particular behavior when encoding.

I'm afraid I do.

                                        Ned