On Tue, 25 Jul 2006, Kjetil Torgrim Homme wrote:
On Tue, 2006-07-25 at 07:46 -0700, Philip Guenther wrote:
...
to US-ASCII characters. Strings and comments may contain octets
outside the US-ASCII range. Specifically, they will normally be in
UTF-8, as specified in [UTF-8]. NUL (US-ASCII 0) is never permitted
in scripts, while CR and LF can only appear as the CRLF line ending.
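For concreteness, the octet-level rules in the quoted paragraph (no NUL, and CR and LF only as a CRLF pair) could be checked along these lines. This is only a rough Python sketch: the function name and the wording of the messages are made up for illustration and come from neither the draft nor any implementation. It deliberately does not check for UTF-8, since the draft only says strings will "normally" be UTF-8.

```python
def check_script_octets(script: bytes) -> list:
    """Return a list of octet-level problems; an empty list means none found.

    Hypothetical helper: checks only the NUL and CR/LF rules, not UTF-8.
    """
    problems = []
    if b"\x00" in script:
        problems.append("NUL octet present")
    # Drop every well-formed CRLF; any CR or LF still left over is bare.
    rest = script.replace(b"\r\n", b"")
    if b"\r" in rest:
        problems.append("bare CR")
    if b"\n" in rest:
        problems.append("bare LF")
    return problems
```

A script that uses only CRLF line endings and contains no NUL yields an empty list; a script saved with Unix line endings would be reported as containing a bare LF.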
"normally" isn't good if you can't know if your script is normal or not.
...thus the text I added to 2.4.2 giving examples of when a script might
include non-UTF-8 text. Given the exact cases mentioned, I think
"normally" is a reasonable word to use. Other email RFCs (1894, 2822,
3463, 3834, 3898) certainly haven't had problems with using it to describe
situations where a strict rule can't be drawn but setting the reader's
expectations is useful.
I suggest you rephrase it as "Strings and comments may sometimes have a
different encoding than UTF-8, so for consistent behaviour across
implementations, it is recommended to avoid non-US-ASCII characters".
(Yes, tongue is firmly in cheek.)
But that's not what we're saying. We expect consistent handling of these
strings across implementations. Thus the "MUST accept" in 2.4.2.
Speaking of CRLF, I'd like a clarification of multi-line strings in
section 2.4.2 (some of its text duplicates the above, and I'm not sure
that's good). Something like:
Any CRLF before the final period are considered part of the string.
How about "The CRLF before the final period is considered part of the
string.", inserted into the penultimate paragraph of section 2.4.2?
to make it a little clearer that implementations should NOT change
the CRLF into their local line delimiter sequence.
I don't understand what this has to do with the text you suggested. Could
you clarify what behavior you think the document should require or ban?
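To make the two readings concrete, here is a rough Python sketch of how a multi-line string's value might be extracted under the proposed wording. It assumes the input holds the octets following the "text:" line, up to and including the terminating "." CRLF line. The function name is hypothetical. The dot-unstuffing step follows the spec's dot-stuffing rule and is not part of the text under discussion.

```python
def multiline_value(body: bytes) -> bytes:
    """Return the value of a multi-line string (illustrative sketch only).

    `body` is assumed to be everything after the "text:" line, up to and
    including the terminating "." CRLF line.
    """
    out = []
    # bytes.splitlines() would also break on a bare CR or LF, so split on
    # CRLF explicitly.
    for line in body.split(b"\r\n"):
        if line == b".":
            # Terminator reached. The CRLF that ended the previous line has
            # already been appended, so it remains part of the value.
            return b"".join(out)
        if line.startswith(b".."):
            line = line[1:]  # undo dot-stuffing
        # Keep the CRLF exactly as written; never substitute the local
        # line delimiter.
        out.append(line + b"\r\n")
    raise ValueError("no terminating '.' line found")
```

Under this reading, a body of "Hello" CRLF "World" CRLF "." CRLF has the value "Hello" CRLF "World" CRLF: the CRLF before the final period is kept, and it is kept as CRLF regardless of the platform's own line ending.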
Philip Guenther