ietf-822

Re: BOF: Format-Flowed and Non-Western Charsets

2002-03-19 10:19:52

Pete Resnick <presnick(_at_)qualcomm(_dot_)com>
> I'm baffled by both Arnt's and Barry's comments. If I'm writing an
> e-mail client that is in use in Japan, it is completely unacceptable
> to always send HTML (legacy receiving apps and for all of the other
> obvious reasons). Still, when the user types their message into my
> client, the text autowraps, and it autowraps on word (i.e.,
> character) boundaries. It is still desirable to send out that
> message without irrevocably hard wrapping paragraphs and ridiculous
> not to allow it in certain character sets because we couldn't get the
> spec right in the first place.

In school my teachers explained to me again and again that I mustn't leave
out steps in my reasoning. Decades later I still do it all too often ;)
Sorry.

Just to state the obvious first, I think it's highly desirable to show
email the way other text is shown. Good wrapping is part of that.

I'll try to divide the world's languages into four groups.

1. Languages like English, where classic e-mail is text/plain, c-t-e 7bit,
   and German, where text/plain c-t-e 8bit latin1 used to be the norm. All
   languages that use small alphabets and space-separated words are in
   this group.

   In this group, the existing f=f does just fine.

2. Languages that use _big_ alphabets. Japanese is in this group as
   written today. (I believe that formerly the language was sometimes
   shoehorned into katakana/hiragana, but that today computer users expect
   kanji to be available.)

   As I understand it, these languages usually have more than one encoding
   standard (e.g. EUC-JP, Shift-JIS, Unicode), so you can't just run "cat
   $MAIL" to read your mail. MUAs must exert themselves a bit.

   If a sending MUA has to declare the character set and can send Unicode,
   sending a zero-width space sounds like a simple way to indicate the
   availability of a linebreak opportunity, with excellent backward
   compatibility. It requires the use of quoted-printable to handle the
   overlong "lines", but that's OK, because "cat $MAIL" doesn't work
   anyway.

   I don't know whether other big character sets contain features like
   Unicode's ZWS. I'd expect so.

3. Some languages don't have much e-mail yet. There's effectively nothing
   with which to be compatible. Rongo-rongo is one, I guess.

4. The remaining languages are the ones for which an extended f=f could
   make sense -- if my assumptions and logic above hold.

   These must be written with smallish alphabets (the character sets
   preferred for mail today must be 8-bit), most linebreak opportunities
   must not be at whitespace, and there must be a significant tradition
   with which to be compatible.

   What languages are actually in this group?
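
To make the zero-width-space idea from group 2 concrete, here is a
minimal Python sketch. It is an illustration under assumptions, not a
proposal for the spec: the function names are hypothetical, the sender
is assumed to know its own break opportunities already (real Japanese
line breaking has rules this ignores), and UTF-8 is assumed as the
declared charset. The sender joins segments with U+200B and lets
quoted-printable fold the overlong "line"; the receiver decodes and
wraps only where a ZWS appears.

```python
import quopri

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE marks a linebreak opportunity


def encode_paragraph(segments):
    """Join pre-segmented text with ZWSP and encode as UTF-8 + quoted-printable.

    `segments` is assumed to be supplied by the sending MUA; how it finds
    the break opportunities is outside this sketch.
    """
    paragraph = ZWSP.join(segments)
    # Quoted-printable inserts soft line breaks ("=" at end of line), so the
    # transmitted lines stay short even though the paragraph is one long line.
    return quopri.encodestring(paragraph.encode("utf-8"))


def decode_paragraph(encoded):
    """Undo the quoted-printable and UTF-8 encoding; ZWSPs are preserved."""
    return quopri.decodestring(encoded).decode("utf-8")


def wrap_at_zwsp(paragraph, width):
    """Greedy wrap: break only at ZWSP, counting one column per character."""
    lines, current = [], ""
    for segment in paragraph.split(ZWSP):
        if current and len(current) + len(segment) > width:
            lines.append(current)
            current = segment
        else:
            current += segment
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    segments = ["日本語", "の", "メール", "を", "送る"] * 10
    wire = encode_paragraph(segments)
    for line in wrap_at_zwsp(decode_paragraph(wire), 20):
        print(line)
```

A receiving MUA that knows nothing of this still shows the paragraph
intact, since a ZWS has no visible width; that is the backward
compatibility argued for above. Counting one column per character is a
simplification, since kanji are usually displayed double-width.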

--Arnt