This is the problem: HTML does not support mixed character sets.
Also, the charset affects the entire HTML document. Therefore, your
resource settings would have to conform with the charset, and this
can be a big problem if messages existing in the archive have different
specified charsets. It would be hard to guarantee that all messages
will use the same charset.
I think I understand ... is this right?
If a single email contains two different character sets,
you're screwed, I understand that.
If two emails are received, each with a different character set
1) you are screwed on index pages, which will have a bunch
of subject lines from different character sets
2) you are screwed on message pages, because navigational aids
like the word "follow-ups" will be in a different character set
from the messages.
Ok, so I see how Unicode would magically fix everything. But imagine that
wasn't available, and I get a message in an unknown character set.
The result is a message page with no meta tag (which will default to either
iso-8859-1 or some browser heuristic). Assuming iso-8859-1, we get good
navigational aids and an unreadable message. Had we used a meta tag, the
message would be readable and we'd lose the navigational aids. Yuck, yuck,
yuck, it's a choice between two evils. Given just those options, I think a
message page meta tag (generated from the corresponding email's character
set) would be better, though.
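
To make the meta tag idea concrete, here is a rough Python sketch, not
anything the archiver actually does. The function name charset_meta_tag
and the iso-8859-1 default are my own assumptions; the only library calls
are the standard email module's message_from_string and
get_content_charset.

```python
import email
from html import escape


def charset_meta_tag(raw_message: str, default: str = "iso-8859-1") -> str:
    """Build a meta tag from the charset declared in the email's Content-Type.

    Hypothetical helper: falls back to `default` when the message declares
    no charset (get_content_charset returns None in that case).
    """
    msg = email.message_from_string(raw_message)
    charset = msg.get_content_charset() or default
    return (
        '<meta http-equiv="Content-Type" '
        'content="text/html; charset=%s">' % escape(charset, quote=True)
    )
```

The navigational aids on that page would still be at the mercy of
whatever charset the message declared, which is the trade-off above.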
Converting to Unicode won't be graceful either. If one converts everything
unknown to Unicode, I bet in practice a lot of iso-8859-1 messages will go to
Unicode and be unrenderable by legacy browsers. I guess legacy browsers
will have to be replaced.
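
One way to soften that, sketched in Python under my own assumptions
(the name choose_output and the LEGACY_SAFE set are made up for
illustration): leave us-ascii and iso-8859-1 messages, and messages whose
charset is missing or unrecognized, as they are, and convert to UTF-8 only
when the declared charset is a known one that legacy browsers would
mangle anyway.

```python
import codecs

# Codec names as normalized by codecs.lookup(); these pass through untouched.
LEGACY_SAFE = {"ascii", "iso8859-1"}


def choose_output(body: bytes, declared_charset):
    """Return (page_bytes, charset_label) for a message page.

    Hypothetical policy: pass the bytes through labeled iso-8859-1 when the
    charset is missing, unrecognized, or already legacy-safe; otherwise
    decode with the declared charset and re-encode as UTF-8.
    """
    try:
        codec = codecs.lookup(declared_charset).name if declared_charset else None
    except LookupError:
        codec = None  # charset name Python does not know
    if codec is None or codec in LEGACY_SAFE:
        return body, "iso-8859-1"
    text = body.decode(codec, errors="replace")
    return text.encode("utf-8"), "utf-8"
```

With this, a plain iso-8859-1 message never goes to Unicode, so legacy
browsers only lose the messages they could not have shown correctly
alongside iso-8859-1 navigational aids in the first place.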