ietf-mta-filters

Re: Proposal for escaping on non-UTF-8 sequences in Sieve

2006-10-20 04:16:16

On Fri, 20 Oct 2006, Michael Haardt wrote:
Do implementations have to support encodings of NUL a la ${hex:0}?  If so,
that will be a barrier to this extension being broadly supported.
(I think it should be a "MAY support".)

nn> Note that =00 in quoted-printable has to be dealt with already.

If you're referring to RFC 2047's Q encoding, 3028bis currently says:

      <...>  An encoded NUL octet
      (character zero) SHOULD NOT cause early termination of the header
      content being compared against.

So an implementation that refused to match beyond the =00 would still be "conditionally compliant".
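
To make that concrete, here is a sketch (it assumes, as the quoted text
does, that encoded-words are decoded before header comparisons; the
mailbox name is made up).  A message arrives with

      Subject: =?US-ASCII?Q?one=00two?=

and the script contains

      require "fileinto";
      if header :contains "subject" "two" {
          fileinto "Matched";
      }

The test ought to match "two" despite the encoded NUL sitting in front
of it.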


An implementation that uses NUL-terminated strings is already broken
and things will not get worse using an encoded NUL character in a string.

So, in your opinion, being able to insert NUL into strings is as important as the ability to encode other, non-UTF-8, non-NUL octets, and support for the two should therefore be tied together? I think the result will be fewer implementations of this extension, as well as implementations that don't actually comply.

The original MIME documents didn't make an exception for NUL...and that had to be changed when they were revised as part of moving to Draft Standard. RFC 2049 says:
    (17)  The definitions of "7bit" and "8bit" have been
          tightened so that use of bare CR, LF can only be used
          as end-of-line sequences.  The document also no longer
          requires that NUL characters be preserved, which brings
          MIME into alignment with real-world implementations.
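
To make the distinction concrete, here is a sketch of the two cases
(the ${hex:...} syntax is from the proposal; "encoded-character" as the
require string and the mailbox name are my guesses):

      require ["encoded-character", "fileinto"];

      # The common case: a lone Latin-1 octet (0xE9), which is not
      # valid UTF-8 on its own.
      if header :contains "subject" "caf${hex:E9}" {
          fileinto "Latin-1";
      }

      # The contentious case: an embedded NUL.
      if header :contains "subject" "abc${hex:00}def" {
          discard;
      }

The question is whether an implementation must support the second case
in order to claim support for the first.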


Which comes first, encoding replacement or variable expansion?  Or are
they concurrent?  Whatever the answer, the variables I-D will need to make
that clear.
(I think encoding replacement should come before variable expansion.)

I think they should be processed inside-out, but not in separate
passes.  Variables introduce the concept of strings (aka test
and action arguments) not being literals, but string expressions that
are evaluated.  It makes sense that arguments of string functions
turn from literals to string expressions, too.

As a result, without variables, "${hex:${hex:4646}}" is a syntax error.

No, it would be replaced with "${hex:FF}"


With variables, it is "FF".


1) Given that variables are *explicitly* not handled that way, why should
   that be true of these?

2) Why would pulling in the variable capability change the behavior of the
   above?

3) The behavior you describe would be useful how?
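
For anyone trying to follow along, the raw ingredients here are simple
enough when taken one at a time (0x46 is "F"):

      "${hex:4646}"   =>  the two octets 0x46 0x46, i.e. the text "FF"
      "${hex:FF}"     =>  the single octet 0xFF

What's in dispute is which of those replacements happen, and in what
order, when the constructs are nested as above.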


I'll note that while a script that uses this extension for all non-UTF-8
octets may be displayable without munging, the result may still be
incomprehensible if, for example, an 8bit ISO-2022-JP MIME part is
included in a string.  Indeed, such encoding will probably make it more
difficult to display that MIME part readably: currently, if a user is
viewing the script with raw ISO-2022-JP in it, they can probably display
that MIME part by overriding their browser's charset encoding for the
page.

Indeed, a hack like that is not as useful any more.  But it was a hack
to begin with, because overriding the charset encoding renders contained
UTF-8 text useless.

No, US-ASCII is a subset of ISO-2022-JP, so at least the syntax portions of the script would still be readable. Is it a hack? Yes. Does it work? Yes. Will there be any equivalent when the raw ISO-2022-JP is encoded using ${hex}? I strongly doubt it.


Convert the ISO-2022-JP part to UTF-8.

My interactions with Sendmail's Japanese customers tell me that ISO-2022-JP is still heavily preferred over UTF-8 in email in Japan. Are you claiming otherwise? Or do you think that it doesn't matter?


Philip Guenther
