Pete Resnick <presnick(_at_)qualcomm(_dot_)com>
I'm baffled by both Arnt's and Barry's comments. If I'm writing an
e-mail client that is in use in Japan, it is completely unacceptable
to always send HTML (legacy receiving apps and for all of the other
obvious reasons). Still, when the user types their message into my
client, the text autowraps, and it autowraps on word (i.e.,
character) boundaries. It is still desirable to send out that
message without irrevocably hard wrapping paragraphs and ridiculous
not to allow it in certain character sets because we couldn't get the
spec right in the first place.
In school my teachers explained to me again and again that I mustn't leave
out steps in my reasoning. Decades later I still do it all too often ;)
Sorry.
Just to state the obvious first, I think it's highly desirable to show
email the way other text is shown. Good wrapping is part of that.
I'll try to divide the world's languages into four groups.
1. Languages like English, where classic e-mail is text/plain, c-t-e 7bit,
and German, where text/plain c-t-e 8bit latin1 used to be the norm. All
languages that use small alphabets and space-separated words are in
this group.
In this group, the existing f=f does just fine.
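To make "does just fine" concrete, here is a minimal sketch of the f=f
soft-break convention (RFC 3676) for space-separated text: a line that
ends in a space continues on the next line, and a line that does not
ends the paragraph. The function names flow_encode and flow_decode are
made up for this illustration, and space-stuffing, quoting and the
"-- " signature separator are deliberately left out.

```python
import textwrap


def flow_encode(paragraph: str, width: int = 77) -> list[str]:
    """Wrap one paragraph at spaces; soft-broken lines keep a trailing space."""
    lines = textwrap.wrap(
        paragraph,
        width=width,
        break_long_words=False,   # never split inside a word
        break_on_hyphens=False,   # only spaces are break opportunities here
    )
    # Every line except the last gets the trailing space that marks a soft break.
    return [line + " " for line in lines[:-1]] + lines[-1:]


def flow_decode(lines: list[str]) -> list[str]:
    """Rejoin flowed lines into paragraphs (simplified: no quoting, no stuffing)."""
    paragraphs: list[str] = []
    current = ""
    for line in lines:
        current += line
        if not line.endswith(" "):   # a fixed line ends the paragraph
            paragraphs.append(current)
            current = ""
    if current:                      # input ended on a soft break
        paragraphs.append(current)
    return paragraphs


paragraph = " ".join(["word"] * 40)
flowed = flow_encode(paragraph)
```

This only works because every break opportunity is already a space in
the text, so the trailing space costs nothing when the lines are
rejoined.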
2. Languages that use _big_ alphabets. Japanese is in this group as
written today. (I believe that formerly the language was sometimes
shoehorned into katakana/hiragana, but that today computer users expect
kanji to be available.)
As I understand it, these languages usually have more than one encoding
standard (e.g. EUC-JP, Shift-JIS, Unicode), so you can't just run "cat
$MAIL" to read your mail. MUAs must exert themselves a bit.
If a sending MUA has to declare the character set and can send Unicode,
sending a zero-width space sounds like a simple way to indicate the
availability of a linebreak opportunity, with excellent backward
compatibility. It requires the use of quoted-printable to handle the
overlong "lines", but that's OK, because "cat $MAIL" doesn't work
anyway.
I don't know whether other big character sets contain features like
Unicode's zero-width space (U+200B). I'd expect so.
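A rough sketch of the zero-width-space idea, in Python. It assumes the
message is sent as UTF-8 with Content-Transfer-Encoding:
quoted-printable, and it naively treats every boundary between two
characters as a break opportunity, which real Japanese line-breaking
rules do not. The helper names mark_breaks, encode_body and decode_body
are hypothetical.

```python
import quopri

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def mark_breaks(paragraph: str) -> str:
    """Insert a ZWSP between every pair of adjacent characters.

    Assumption for illustration only: any character boundary may break.
    """
    return ZWSP.join(paragraph)


def encode_body(paragraph: str) -> bytes:
    """Encode one unwrapped paragraph as quoted-printable UTF-8.

    quoted-printable adds soft line breaks ("=" at end of line), so the
    transport sees short lines while the paragraph stays one logical line.
    """
    return quopri.encodestring(mark_breaks(paragraph).encode("utf-8"))


def decode_body(body: bytes) -> str:
    """Undo the transfer encoding; the ZWSPs remain as break hints."""
    return quopri.decodestring(body).decode("utf-8")


sample = "これは折り返しのない長い段落です。" * 5
body = encode_body(sample)
received = decode_body(body)
```

A receiving client that knows nothing about the convention still shows
the right text, since the zero-width spaces are invisible; one that
does its own wrapping can break at any of them.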
3. Some languages don't have much e-mail yet. There's effectively nothing
with which to be compatible. Rongo-rongo is one, I guess.
4. The remaining languages are the ones for which an extended f=f could
make sense -- if my assumptions and logic above hold.
These must be written with smallish alphabets (the character sets
preferred for mail today must be 8-bit), most linebreak opportunities
must not be at whitespace, and there must be a significant tradition
with which to be compatible.
What languages are actually in this group?
--Arnt