The bottom line of this argument is that we should support
only ASCII (read: English), or the security code
will be harder to write.
The article basically says that Unicode is more complex than
ASCII, and therefore security code cannot easily validate input strings.
Here is the last bit of the article
(http://www.counterpane.com/crypto-gram-0007.html#9):
> With Unicode, we probably won't be able to get
> a consistent definition of what to accept, what
> is a delimiter under what circumstance, or how
> to handle arbitrary streams safely. It's just
> a matter of time before simple validators pass
> data and upper layer software, trying to be
> helpful, attach magic-character semantics, and
> we have a brand-new variety of security holes.
> Unicode is just too complex to ever be secure.
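The failure mode the quote describes can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: `naive_path_ok` and `helpful_upper_layer` are hypothetical names, and the assumption is that the "helpful" layer applies NFKC normalization (via the standard `unicodedata.normalize`). Fullwidth dot and slash characters pass the check, then become an ordinary `../` after normalization.

```python
import unicodedata

def naive_path_ok(name: str) -> bool:
    # Hypothetical validator: rejects ASCII path separators and parent references.
    return "/" not in name and ".." not in name

def helpful_upper_layer(name: str) -> str:
    # Hypothetical downstream step that "helpfully" folds compatibility
    # characters (fullwidth forms, etc.) into their ASCII equivalents.
    return unicodedata.normalize("NFKC", name)

# U+FF0E FULLWIDTH FULL STOP and U+FF0F FULLWIDTH SOLIDUS
attack = "\uff0e\uff0e\uff0fetc\uff0fpasswd"

print(naive_path_ok(attack))                       # True: no ASCII '/' or '..' seen
print(helpful_upper_layer(attack))                 # ../etc/passwd
print(naive_path_ok(helpful_upper_layer(attack)))  # False once normalized
```

In this sketch the hole closes if validation runs after the final normalization step rather than before it, which is an ordering problem rather than proof that Unicode input cannot be validated.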
It would be easy to make a similar (perhaps stronger)
argument that handling all encodings would make security
code much harder to write. The multi-byte encodings
(e.g., SJIS, EUC-JP, GB 2312) cover a large range of characters.
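To make the multi-byte point concrete: in Shift_JIS, the second byte of a two-byte character can fall in the ASCII range. For example, 表 encodes as 0x95 0x5C, and 0x5C is the ASCII backslash. A minimal sketch using Python's built-in `shift_jis` codec; both helper functions are hypothetical and only illustrate the contrast between byte-level and character-level checks.

```python
def bytes_contain_backslash(data: bytes) -> bool:
    # Hypothetical byte-level check that assumes one byte == one character.
    return b"\\" in data

def text_contains_backslash(data: bytes, encoding: str) -> bool:
    # Decode with the known encoding first, then inspect characters.
    return "\\" in data.decode(encoding)

word = "表"  # U+8868, encoded in Shift_JIS as 0x95 0x5C
encoded = word.encode("shift_jis")

print(encoded)                                        # b'\x95\\'
print(bytes_contain_backslash(encoded))               # True: a false positive
print(text_contains_backslash(encoded, "shift_jis"))  # False
```

The byte-level check misfires here, and an escaping routine built on the same assumption could corrupt the character or misplace an escape. The character-level check only works because the encoding is known, which is exactly the extra burden that supporting many encodings places on validators.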
So shall we give up on the rest of the world just so that
security code will be easier to write?