People often claim that it's okay to use UTF-8 without tagging in contexts
where other charsets may also appear, because sufficiently long strings of
UTF-8 can be distinguished (more or less reliably) from other charsets by
checking whether the string is valid UTF-8.
Question:
In an environment where either UTF-8 or GB18030 may appear, how reliably
can GB18030 and UTF-8 strings be identified and distinguished from one another?
Offhand, it appears that many GB18030 strings are also valid UTF-8 strings.
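
To make the overlap concrete, here is a rough sketch of the kind of check I
have in mind. It assumes Python's built-in "utf-8" and "gb18030" codecs as the
validators; the helper names decodes_as and plausible_charsets are made up for
illustration. It only tests whether the bytes are well-formed in each charset,
not which one the sender intended.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is a well-formed byte sequence in encoding."""
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def plausible_charsets(data: bytes) -> set:
    """Return the subset of {utf-8, gb18030} under which data decodes."""
    return {enc for enc in ("utf-8", "gb18030") if decodes_as(data, enc)}


samples = [
    # Pure ASCII is well-formed in both charsets.
    b"plain ASCII",
    # U+00A1 in UTF-8; the same two bytes are also a GB18030 two-byte code.
    b"\xc2\xa1",
    # Two hanzi encoded as GB18030; the second byte of each pair is above
    # 0xBF, so it cannot be a UTF-8 continuation byte.
    "\u4e2d\u6587".encode("gb18030"),
]

for s in samples:
    print(s, sorted(plausible_charsets(s)))
```

Any string for which both names come back is ambiguous by validity checking
alone, so a detector would need some further heuristic (or a tag) to decide.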
Keith