At 15:52 09.11.98 -0800, Chris Newman wrote:
7. Request: Implementations are required to decode header charsets.
I could live with this. A lighter-weight alternative would be:
Implementations are required to (a) treat all non-ASCII characters in a
script as a syntax error or (b) decode MIME header encoding. That's the
"you don't have to do it, but if you do it, do it right" approach.
Decoding to what?
3 possibilities:
- Decode to octet string (simple, but dangerous)
- Decode to UTF-8 (requires universal charset translation)
- Decode if you recognize the charset, otherwise leave alone, which
leaves a *slight* incompatibility problem with server upgrades.
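
For illustration only, here is a rough Python sketch of those three options, using the standard library's email.header.decode_header to do the MIME header decoding. The function names to_octets, to_utf8 and decode_if_known are made up for this example and do not come from any draft; the sketch assumes the header is passed in as a raw string.

```python
from email.header import decode_header

def to_octets(raw):
    """Option 1: concatenate the decoded octets and forget the charset labels."""
    out = b""
    for data, _charset in decode_header(raw):
        # decode_header returns str (not bytes) when there are no encoded-words
        out += data if isinstance(data, bytes) else data.encode("ascii", "replace")
    return out

def to_utf8(raw):
    """Option 2: translate every part to UTF-8.

    Raises LookupError when a part is labelled with a charset that this
    Python installation does not know.
    """
    out = ""
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            data = data.decode(charset or "ascii", "replace")
        out += data
    return out.encode("utf-8")

def decode_if_known(raw):
    """Option 3: decode if the charset is recognized, otherwise leave alone."""
    try:
        return to_utf8(raw)
    except LookupError:
        return raw.encode("ascii", "replace")
```

With option 3, the same header can come out decoded on one server and untouched on another, depending on which charsets each one happens to know. That is the upgrade incompatibility mentioned in the list.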
I'm fine with the decoding being to an octet string, and the expected
charset being implicit in the comparator function, but one needs
to *somehow* get access to the charset name(s?) to be able to detect the
case where "shit happens".
Something like this:
   if subject contains "I ordered a Räksmörgås" then
      if matched charset is iso-8859-1 then
         do something
      else
         don't dare to do something
      fi
   fi
Syntax wildly inventive...haven't read -04, I'm afraid....
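
The same idea can be sketched in Python, purely as an illustration and not as proposed syntax. match_subject and its return values are hypothetical names; the sketch assumes the match is done on decoded octets and that the charset label of the matching encoded-word is what "matched charset" refers to.

```python
from email.header import decode_header

def match_subject(raw_subject, needle_octets, trusted_charset):
    """Octet-level 'contains' test that also reports the charset label.

    Returns "act" when the needle is found in a part labelled with
    trusted_charset, "refrain" when it is found under any other label
    (or no label), and "no match" otherwise.
    Simplification: a needle that spans two encoded-words is not found.
    """
    for data, charset in decode_header(raw_subject):
        if isinstance(data, str):
            # Header had no encoded-words; decode_header returned plain text.
            data = data.encode("ascii", "replace")
        if needle_octets in data:
            return "act" if charset == trusted_charset else "refrain"
    return "no match"
```

decode_header lowercases the charset label, so trusted_charset should be given in lower case, for example "iso-8859-1".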
Note that Räksmörgås will be represented in UTF-8 in the script while
the script is being moved around. UTF-8 can't represent the octets
of 8859-1 without an escaping mechanism, and a layman user would go
bonkers if asked to use one.
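
A short Python check makes the mismatch concrete. This is only a sketch of the octet-level difference; the variable names are arbitrary and the word is written with escapes so the snippet does not depend on the file's own encoding.

```python
word = "R\u00e4ksm\u00f6rg\u00e5s"   # Räksmörgås

utf8_octets = word.encode("utf-8")         # how the script carries the word
latin1_octets = word.encode("iso-8859-1")  # how an 8859-1 message carries it

# Each of the three Swedish letters takes two octets in UTF-8, one in 8859-1.
assert len(utf8_octets) == 13
assert len(latin1_octets) == 10

# A plain octet comparison between script and message therefore fails.
assert utf8_octets != latin1_octets

# The 8859-1 octets are not even valid UTF-8 on their own.
try:
    latin1_octets.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
```

So a script stored as UTF-8 cannot match 8859-1 message octets directly; either the message side is translated, or the script needs some way to spell out raw octets.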
And simply declaring matching on anything that isn't English illegal
forever is Not An Option.
We have a problem.
(Räksmörgås is Swedish for an open-faced shrimp sandwich; popular for
testing because it contains the 3 most important special Swedish letters...)
Harald
--
Harald Tveit Alvestrand, Maxware, Norway
Harald(_dot_)Alvestrand(_at_)maxware(_dot_)no