ietf-822

Re: printable multibyte encodings

1992-12-17 07:54:43
Just a straight 16-bit representation of a 16-bit character has two problems.
First, if most of the characters are in fact ASCII -- often the case -- then
it is twice as big as it needs to be.  Second, it often includes octets that
cause trouble for software, e.g. ASCII NULs (not an issue if it's inside
a MIME encoding, but significant in other contexts).  Straight-16-bit would
often be the preferred representation inside programs, but for storage and
transmission (and use in filenames etc.), an encoding which avoids these
problems is desirable.
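
A minimal sketch of the two problems described above. It assumes that Python's big-endian "utf-16-be" codec is an acceptable stand-in for a "straight 16-bit representation"; the sample string and variable names are purely illustrative.

```python
# Assumption: "utf-16-be" models a straight 16-bit representation,
# two octets per character, high octet first.
text = "Hello, world"

ascii_bytes = text.encode("ascii")      # one octet per character
wide_bytes = text.encode("utf-16-be")   # two octets per character

# Problem 1: pure-ASCII text doubles in size.
size_ratio = len(wide_bytes) / len(ascii_bytes)

# Problem 2: the high octet of every ASCII character is a NUL.
nul_count = wide_bytes.count(0)

print(len(ascii_bytes), len(wide_bytes), size_ratio)
print("NUL octets:", nul_count)
```

Any software that treats NUL as a string terminator would cut such data short after the first octet.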

UTF-2, in particular, is an encoding of 16-bit characters that represents
ASCII characters as themselves (one octet apiece) and is "file-system
safe", avoiding octets that have special meaning to common software.


I have not seen the definition of UTF-2. If it is "file-system safe", I assume
it has no "/" in it except for the real "/". Also, some other characters such as
":", "." and "\" need to be encoded to be "file-system safe".
Does it encode ISO 8859-1 characters as themselves?
What we need is an encoding that is "file-system safe" and represents the
printable characters in ISO 8859-1, and a few of the most common control
characters, as themselves (one octet apiece). This would allow both old
ASCII files and newer (now fairly common) ISO 8859-1 files to be
used without change.
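
The properties asked about here can be checked mechanically once a candidate encoding is available. The sketch below is only an illustration: it assumes Python's "utf-8" codec as a stand-in, since the UTF-2 definition is not given in this thread, and the helper name `encode` is hypothetical.

```python
# Assumption: the "utf-8" codec stands in for the encoding under discussion.
def encode(text):
    return text.encode("utf-8")

ascii_text = "plain/path.txt"
latin1_text = "Malmö"   # contains one ISO 8859-1 character above 0x7F

# Are ASCII characters represented as themselves?
ascii_same = encode(ascii_text) == ascii_text.encode("ascii")

# Are ISO 8859-1 characters represented as themselves?
latin1_same = encode(latin1_text) == latin1_text.encode("latin-1")

# "File-system safe" check: no octet below 0x80 (so no "/", ":", "\" or NUL)
# may appear in the encoding of any non-ASCII 16-bit character.
def is_safe(code_point):
    return all(octet >= 0x80 for octet in encode(chr(code_point)))

# Surrogate code points (0xD800-0xDFFF) are skipped; Python cannot encode them.
all_safe = all(is_safe(cp) for cp in range(0x80, 0x10000)
               if not 0xD800 <= cp <= 0xDFFF)

print("ASCII unchanged:", ascii_same)
print("ISO 8859-1 unchanged:", latin1_same)
print("no ASCII octets inside multi-octet characters:", all_safe)
```

Under that assumption, ASCII files pass through unchanged, but the ISO 8859-1 characters above 0x7F do not, so existing ISO 8859-1 files would need conversion.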

    Dan

--
Dan Oscarsson
Telia Research AB                       Email: Dan(_dot_)Oscarsson(_at_)malmo(_dot_)trab(_dot_)se
Box 85
201 20  Malmo, Sweden
