In <ylfzs9pq4c(_dot_)fsf(_at_)windlord(_dot_)stanford(_dot_)edu> Russ Allbery
<rra(_at_)stanford(_dot_)edu> writes:

>Kai Henningsen <kaih(_at_)khms(_dot_)westfalen(_dot_)de> writes:
>> rra(_at_)stanford(_dot_)edu (Russ Allbery) wrote:

>>> that all software will soon expect any untagged 8-bit data to be in
>>> Unicode,

>> That part is baseless exaggeration.

>Quite to the contrary, that's a summary of the basic foundation of the
>current draft, and is exactly what has been argued by multiple people on
>the USEFOR list on multiple occasions.

>To be more precise, the expectation is that any untagged 8-bit data will
>be in UTF-8.

And to be even more precise, the expectation is that software will check
whether the untagged data looks like UTF-8, and treat it as such if it
does. If it doesn't, the data is non-compliant, but a reasonable
workaround would be to treat it as whatever default charset the user has
set up.
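
To make that check concrete, here is a minimal sketch in Python. It is only
an illustration, not text from the draft: the function name and the
ISO-8859-1 default are assumptions, and "looks like UTF-8" is taken to mean
"decodes as strictly well-formed UTF-8".

```python
from typing import Tuple


def decode_untagged(raw: bytes, fallback_charset: str = "iso-8859-1") -> Tuple[str, str]:
    """Decode untagged 8-bit data; return (text, charset_used).

    `fallback_charset` stands in for the user's configured default
    charset (an assumption made for this sketch).
    """
    try:
        # Strict decoding rejects anything that is not well-formed UTF-8.
        # Pure ASCII passes too, since ASCII is a subset of UTF-8.
        return raw.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        # Non-compliant data: fall back to the user's default charset.
        return raw.decode(fallback_charset, errors="replace"), fallback_charset
```

Text in a legacy 8-bit charset almost never happens to be well-formed UTF-8,
which is why a check this simple is usually good enough; it can still
misfire on short or unusual inputs.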
Another possibility I could be persuaded of is a Header-Charset: header
that indicates the actual charset used in the headers (defaulting to
UTF-8). There are problems with that (e.g. if the first header containing
non-ASCII comes earlier than that header), but it might fly well enough to
be useful.
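
One way around the ordering problem is to read the whole header block as
raw bytes first and decode the values only once Header-Charset: has been
found. A rough sketch in Python follows; Header-Charset: is only the
proposal above, not an existing header, the function name is invented for
illustration, and folded continuation lines are ignored for brevity.

```python
import codecs
from typing import Dict


def decode_headers(raw: bytes, default_charset: str = "utf-8") -> Dict[str, str]:
    """Decode a raw header block, honouring a hypothetical Header-Charset:."""
    pairs = []
    for line in raw.splitlines():
        if not line:
            break  # a blank line ends the header block
        name, sep, value = line.partition(b":")
        if sep:
            pairs.append((name.strip().lower(), value.strip()))

    # Pass 1: find the charset wherever the header sits in the block.
    charset = default_charset
    for name, value in pairs:
        if name == b"header-charset":
            charset = value.decode("ascii", errors="replace")
            break
    try:
        codecs.lookup(charset)
    except LookupError:
        charset = default_charset  # unknown charset name: use the default

    # Pass 2: decode every value with the charset found in pass 1.
    return {
        name.decode("ascii", errors="replace"): value.decode(charset, errors="replace")
        for name, value in pairs
    }
```

This only helps software that can buffer all the headers before displaying
any of them; anything that processes headers strictly in order still has
the problem.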
--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk
Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5