By Unicode signature, I'm guessing you mean the BOM? That problem
seems to have been easily dealt with by simply deciding to allow it
in UTF-8. It doesn't appear to have caused any problems in practice
today.
In the case of XML, I think you are right. In general, however, see
http://www.ietf.org/internet-drafts/draft-yergeau-rfc2279bis-05.txt
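To make the signature concrete, here is a minimal Python sketch (my own illustration, not taken from the draft). It assumes the standard library codecs "utf-8" and "utf-8-sig"; the sample payload is hypothetical. A consumer that does not expect the signature sees it as a leading U+FEFF character.

```python
import codecs

# The UTF-8 signature is U+FEFF encoded in UTF-8: the bytes EF BB BF.
signature = codecs.BOM_UTF8

# Hypothetical payload: a tiny document preceded by the signature.
data = signature + b"<a/>"

# A plain UTF-8 decoder keeps the signature as a leading U+FEFF character.
plain = data.decode("utf-8")

# The "utf-8-sig" codec drops one leading signature, if present.
stripped = data.decode("utf-8-sig")

print(repr(plain))
print(repr(stripped))
```

Protocols that concatenate or compare byte strings are where the extra three bytes tend to matter; an XML parser is expected to handle them.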
I don't know what problems you are referring to with "representation of
non-BMP characters". UTF-8 precisely specifies how these characters
are represented. There's no issue here. Did you mean something else?
Quite a few implementations use 6 bytes (rather than 4 bytes) to represent
non-BMP characters: they encode each UTF-16 surrogate as its own 3-byte
sequence. See
http://www.unicode.org/reports/tr26/
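To illustrate the difference, here is a rough Python sketch (my own, not code from TR26). The function name cesu8_encode is hypothetical, and the input is assumed to contain no unpaired surrogates. It encodes each UTF-16 surrogate of a non-BMP character as a separate 3-byte sequence, which is the 6-byte form, and compares it with the 4-byte UTF-8 form.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text in the 6-byte style: BMP characters as in UTF-8,
    non-BMP characters as two 3-byte sequences (one per surrogate)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP character: identical to UTF-8.
            out += ch.encode("utf-8")
        else:
            # Split into a UTF-16 surrogate pair.
            offset = cp - 0x10000
            high = 0xD800 | (offset >> 10)
            low = 0xDC00 | (offset & 0x3FF)
            # Encode each surrogate with the 3-byte UTF-8 bit pattern.
            for unit in (high, low):
                out.append(0xE0 | (unit >> 12))
                out.append(0x80 | ((unit >> 6) & 0x3F))
                out.append(0x80 | (unit & 0x3F))
    return bytes(out)


ch = "\U00010400"  # a non-BMP character
print(ch.encode("utf-8").hex())   # 4-byte UTF-8 form
print(cesu8_encode(ch).hex())     # 6-byte form
```

A strict UTF-8 decoder rejects the 6-byte form, so the two are not interchangeable on the wire.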
--
MURATA Makoto <murata(_at_)hokkaido(_dot_)email(_dot_)ne(_dot_)jp>