Still: imho the proper thing to do would be to honor the language
of the incoming message all the way to the generated HTML.
Imagine you have an international list. People post in Korean,
Chinese, some even in English. Which encoding do I want to
force on them? None!
I can even imagine a message with two parts, one in Korean, and
the other in Chinese. In that case, text in two languages must
coexist in a single HTML file. What character encoding scheme
can be used?
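I would recommend UTF-8N (UTF-8 without a byte order mark) in such
a case. Pattern matching in UTF-8N (or UTF-8) is relatively easy,
because a byte-level match of valid UTF-8 text can never start in
the middle of a character.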
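
To illustrate the point about pattern matching, here is a minimal
Python sketch. The sample strings and the helper name find_text are
made up for this example; they are not from any archiver's code.

```python
# Korean and Chinese text side by side in one HTML fragment.
korean = "안녕하세요"
chinese = "你好世界"
document = f"<p>{korean}</p>\n<p>{chinese}</p>\n"

# Python's "utf-8" codec writes no byte order mark, i.e. UTF-8N.
data = document.encode("utf-8")


def find_text(haystack: bytes, needle: str) -> int:
    """Return the character offset of needle in UTF-8 haystack, or -1.

    find_text is a hypothetical helper for this sketch.
    """
    pos = haystack.find(needle.encode("utf-8"))
    if pos < 0:
        return -1
    # Lead bytes and continuation bytes use disjoint bit patterns in
    # UTF-8, so a byte-level hit always begins on a character boundary
    # and the prefix before it decodes cleanly.
    return len(haystack[:pos].decode("utf-8"))


print(find_text(data, "世界"))
print(find_text(data, "하세요"))
```

A plain byte search is enough here; no per-language decoding state
has to be tracked while scanning.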
Another possibility is to use the ISO-2022-JP-2 encoding or one of
its variants. But not many people actually use this encoding, and
I imagine very few people want to put up with the nightmare of
ISO 2022-style escape sequences.
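
For comparison, a small Python sketch of what the ISO 2022 style looks
like on the wire. It assumes the iso2022_jp_2 codec shipped with
CPython; the sample text is made up for this example.

```python
# Korean followed by Simplified Chinese, chosen so that each part
# needs a different character set designation.
text = "한국어 汉语"

encoded = text.encode("iso2022_jp_2")

# The stream is 7-bit; character sets are switched with ESC sequences,
# so a matcher has to track the current shift state to know what a
# given pair of bytes means.
print(encoded)
print(encoded.count(b"\x1b"), "escape sequences")

decoded = encoded.decode("iso2022_jp_2")
```

The same two bytes can stand for different characters depending on
the escape sequence that precedes them, which is what makes searching
such text awkward.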
By the way, regarding
  "This would limit your server to documents of one common encoding,
  so I believe per-document encoding is preferable."
you can also use per-directory encoding.