Arnt Gulbrandsen wrote:
Bruce Lilly writes:
Aside from font issues, Unicode normalization uses huge tables which might not be practical for some devices which do support email. Therefore transcoding a space- and memory-efficient charset into Unicode may well render the transformed message unusable or unreplyable if transferred (forwarded, etc.) to some devices [it is not unusual for mail to be forwarded to PDAs, mobile phones, pagers, etc.].
Some chap at a Unicode conference talked about a trie-based
implementation that squeezed the tables into 35k. I believe he was from
Psion.
35k isn't huge, not even on a handheld.
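(For reference, here is what those tables buy you on a platform that does ship normalization data. This is a hedged sketch using the standard java.text.Normalizer API, which is not assumed to exist on the handhelds discussed here; it just illustrates what NFC composition does:)

```java
import java.text.Normalizer;

public class NormalizeDemo {
    public static void main(String[] args) {
        // "e" followed by a combining acute accent (U+0301): two code points.
        String decomposed = "e\u0301";
        // NFC composes them into the single precomposed code point U+00E9.
        String composed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(composed.length());          // 1
        System.out.println(composed.equals("\u00E9"));  // true
    }
}
```

The composition mappings behind that one call are exactly the table data being squeezed into 35k.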
For the Danger Hiptop (T-Mobile Sidekick) we represent all mail text and
headers on the device as UTF-8 for storage and network transmission and
UTF-16 for display. We do not have normalization tables on the device
and so far have not missed them. The tables we do have are for
character classes (since that data is required to be Java compliant) and
for sorting. We cheat a little on the sorting by not having collation
data for characters that are not in any of our fonts.
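The storage/display split above can be sketched in Java (the Hiptop's application language). A Java String is already a sequence of UTF-16 code units, so the only explicit step is the UTF-8 byte conversion for storage and transmission; the names below are illustrative, not our actual API:

```java
import java.nio.charset.StandardCharsets;

public class MailText {
    // Store and transmit mail text as compact UTF-8 bytes.
    static byte[] toStorage(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode back to a String, whose chars are UTF-16 code units for display.
    static String forDisplay(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String msg = "Gr\u00fc\u00dfe";       // non-ASCII mail text ("Grüße")
        byte[] utf8 = toStorage(msg);
        System.out.println(utf8.length);      // 7: ü and ß each take 2 bytes
        System.out.println(forDisplay(utf8).equals(msg));  // true
    }
}
```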
The only real downside I've found to not keeping the original encodings
on the device is that there is no way to reencode text whose charset was
mislabeled without refetching it over the network. But in practice I
see a lot more text that is completely unlabeled or systematically mislabeled (claiming to be ISO-8859-1 instead of windows-1252, for instance) than text that claims to be one thing but is actually something substantially different.
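The ISO-8859-1 / windows-1252 confusion is systematic because the two differ only in the 0x80-0x9F range: C1 control characters in ISO-8859-1, printable punctuation in windows-1252. A sketch of taking the label at face value versus applying the usual substitution:

```java
import java.nio.charset.Charset;

public class MislabelDemo {
    public static void main(String[] args) {
        // Bytes a Windows mailer might send while labeling them ISO-8859-1:
        // 0x93 and 0x94 are curly double quotes in windows-1252.
        byte[] body = { (byte) 0x93, 'h', 'i', (byte) 0x94 };

        // Believing the label yields C1 control characters...
        String literal = new String(body, Charset.forName("ISO-8859-1"));
        // ...while decoding as windows-1252 recovers the intended text.
        String fixed = new String(body, Charset.forName("windows-1252"));

        System.out.println((int) literal.charAt(0));  // 147 (U+0093, a control char)
        System.out.println(fixed);                    // curly-quoted "hi"
    }
}
```

Once only the decoded UTF-16/UTF-8 form is kept on the device, the original bytes needed for this kind of re-decode are gone, which is the downside described above.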
Eric