ietf-822

Re: internationalization of mail

2004-08-29 16:49:55

Arnt Gulbrandsen wrote:

Bruce Lilly writes:

Aside from font issues, Unicode normalization uses huge tables which might not be practical for some devices which do support email. Therefore transcoding a space- and memory-efficient charset into Unicode may well render the transformed message unusable or unreplyable if transferred (forwarded, etc.) to some devices [it is not unusual for mail to be forwarded to PDAs, mobile phones, pagers, etc.].

Some chap at a Unicode conference talked about a trie-based implementation that squeezed the tables into 35k. I believe he was from Psion.

35k isn't huge, not even on a handheld.
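To illustrate why table size can shrink so much, here is a minimal sketch of a two-stage lookup table, a simple relative of a trie. It is not the implementation mentioned above, which is not described in this thread. The names `build_two_stage` and `lookup` and the 256-entry block size are assumptions for illustration, and the canonical combining class from Python's `unicodedata` stands in for whatever property a device would actually need.

```python
import unicodedata

BLOCK = 256  # assumed block size; real implementations tune this


def build_two_stage(prop, limit=0x10000):
    """Split the code point range into blocks and store each distinct block once."""
    seen = {}    # block contents -> position in `data`
    index = []   # stage 1: one entry per block, pointing into `data`
    data = []    # stage 2: the distinct blocks
    for start in range(0, limit, BLOCK):
        block = tuple(prop(chr(cp)) for cp in range(start, start + BLOCK))
        if block not in seen:
            seen[block] = len(data)
            data.append(block)
        index.append(seen[block])
    return index, data


def lookup(index, data, cp):
    """Two array reads replace one read from a flat table."""
    return data[index[cp // BLOCK]][cp % BLOCK]


index, data = build_two_stage(unicodedata.combining)
print(len(index), "blocks indexed,", len(data), "stored")
```

Most blocks are identical (for example, all zeros), so far fewer blocks are stored than indexed. The actual saving depends on the property and the block size.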

For the Danger Hiptop (T-Mobile Sidekick) we represent all mail text and headers on the device as UTF-8 for storage and network transmission and UTF-16 for display. We do not have normalization tables on the device and so far have not missed them. The tables we do have are for character classes (since that data is required to be Java compliant) and for sorting. We cheat a little on the sorting by not having collation data for characters that are not in any of our fonts.
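As a rough sketch of that split (not the Hiptop code, which is not shown here), the same text can be held as UTF-8 bytes for storage and expanded to UTF-16 code units for display. The function names `to_storage` and `to_display_units` are hypothetical.

```python
def to_storage(text: str) -> bytes:
    """Encode text as UTF-8 for storage or network transmission."""
    return text.encode("utf-8")


def to_display_units(stored: bytes) -> list:
    """Decode stored UTF-8 and return the UTF-16 code units used for display."""
    encoded = stored.decode("utf-8").encode("utf-16-le")
    return [
        int.from_bytes(encoded[i:i + 2], "little")
        for i in range(0, len(encoded), 2)
    ]


stored = to_storage("caf\u00e9 \u20ac")
units = to_display_units(stored)
print(len(stored), "UTF-8 bytes,", len(units), "UTF-16 code units")
```

Neither conversion needs normalization tables; both are fixed arithmetic on code points. Characters outside the Basic Multilingual Plane come out as two UTF-16 code units (a surrogate pair).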

The only real downside I've found to not keeping the original encodings on the device is that there is no way to reencode text whose charset was mislabeled without refetching it over the network. But in practice I see a lot more text that is completely unlabeled or systematically mislabeled (claiming to be ISO-8859-1 instead of windows-1252, for instance) than text that claims to be one thing but is actually something substantially different.
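One way to cope with the systematic case at decode time is sketched below. It assumes a hypothetical override table `LABEL_OVERRIDES` and function `decode_mail_text`; treating unlabeled text as windows-1252 is also an assumption for illustration, not something stated above.

```python
from typing import Optional

# Hypothetical table of labels that are treated as something else in practice.
LABEL_OVERRIDES = {"iso-8859-1": "windows-1252"}

# Assumed default for unlabeled text.
DEFAULT_CHARSET = "windows-1252"


def decode_mail_text(raw: bytes, label: Optional[str]) -> str:
    """Decode raw mail bytes, correcting labels known to be systematically wrong."""
    charset = (label or DEFAULT_CHARSET).strip().lower()
    charset = LABEL_OVERRIDES.get(charset, charset)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or undecodable bytes: fall back to Latin-1,
        # which maps every byte to a code point and so never fails.
        return raw.decode("latin-1")


raw = b"\x93quoted\x94 \x96 caf\xe9"
text = decode_mail_text(raw, "ISO-8859-1")
stored = text.encode("utf-8")  # what would be kept on the device
```

Bytes 0x93, 0x94 and 0x96 are C1 control codes in ISO-8859-1 but curly quotes and an en dash in windows-1252, which is why honoring the label literally produces unreadable text.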

Eric

