distinguishing between utf-8 and gb18030

ietf-822


2003-01-16 08:23:06


people often claim that it's okay to use utf-8 without tagging in contexts
where other charsets may also appear, because sufficiently long strings of
utf-8 can be distinguished (more or less reliably) from other charsets by
checking to see if the string is valid utf-8.
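
as a rough illustration of the validity check people have in mind, here is a minimal sketch in Python. the function name looks_like_utf8 is made up for this example, and it relies only on the strict utf-8 decoder in the standard library. it says nothing about how long a string must be before the answer can be trusted.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes are well-formed UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding rejects stray continuation bytes, truncated
        # sequences, overlong forms, and surrogate code points.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_utf8("café".encode("utf-8")))    # well-formed UTF-8
    print(looks_like_utf8("café".encode("latin-1")))  # lone 0xE9 byte
```

note that pure ascii input passes this test trivially, so the check carries information only when bytes above 0x7f are present.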

question:

in an environment where either utf-8 or gb18030 may appear, how reliably
can gb18030 and utf-8 strings be identified and distinguished from one another?
offhand it appears that many gb18030 strings are also valid utf-8 strings.
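
to make the overlap concrete, here is a small sketch in Python that reports which of the two charsets accept a given byte string. the helper name accepted_by is an assumption of this example, and it uses the "utf-8" and "gb18030" codecs shipped with the Python standard library. it only tests well-formedness, so it cannot say which reading the sender intended.

```python
def accepted_by(data: bytes, charsets=("utf-8", "gb18030")) -> list:
    """Return the names of the charsets that decode data without error.

    Hypothetical helper: a result with two names means the bytes are
    ambiguous and validity checking alone cannot tell the charsets apart.
    """
    accepted = []
    for name in charsets:
        try:
            data.decode(name, errors="strict")
        except UnicodeDecodeError:
            continue
        accepted.append(name)
    return accepted


if __name__ == "__main__":
    # 0xC2 0xA9 is the copyright sign in UTF-8, and is also a valid
    # two-byte GB18030 sequence (lead byte 0xC2, trail byte 0xA9).
    print(accepted_by(b"\xc2\xa9"))
    # 0xC4 0xE3 0xBA 0xC3 is a GB18030 string; 0xE3 is not a valid
    # UTF-8 continuation byte after 0xC4, so UTF-8 rejects it.
    print(accepted_by(b"\xc4\xe3\xba\xc3"))
```

the first input is accepted by both decoders, which is exactly the ambiguous case the question is about; the second is accepted only as gb18030.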

Keith
