perl-unicode

RE: Tossing another can of worms into the minefield...

2002-02-05 18:39:26
Nick:

You might consider using Microsoft's fallback mappings when converting
to and from Microsoft encodings - for compatibility with Windows.

The Microsoft mapping tables on ftp.unicode.org do not show their
fallback mappings, but there are mapping tables in the ICU tree that
show them. These tables have been extracted from Windows using a tool
that's in the ICU source tree as well. The files are not distributed in
the normal ICU download but are provided as additional tables one can
incorporate into the ICU 'dat' file if one chooses to do so.

The mappings show both fallbacks (many Unicode -> one code page
character) and 'reverse fallbacks' (many code page characters ->
one Unicode). The tables on ftp.unicode.org show neither.

You can find the tables under [ICU]/charset/data/ucm I believe.
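This is a sketch of how the two kinds of entries differ. It assumes the ICU .ucm convention that the `|n` suffix is a precision indicator: `|0` is a round trip, `|1` is a fallback used only when converting from Unicode, and `|3` is a reverse fallback used only when converting to Unicode. The Python below splits mapping lines into an encoding table and a decoding table. The function name `parse_ucm` and the sample lines are hypothetical, and the code is an illustration rather than how Perl's Encode reads these files. Real .ucm files also carry a header and `CHARMAP` / `END CHARMAP` markers, which this sketch simply skips.

```python
import re

# One mapping line of an ICU .ucm file, e.g. "<U2018> \x91 |0".
# Assumed shape: <Uxxxx>, one or more \xNN bytes, then a "|n" precision
# indicator (0 = round trip, 1 = fallback from Unicode only,
# 3 = reverse fallback to Unicode only).
UCM_LINE = re.compile(r"<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|([0-3])")


def parse_ucm(lines):
    """Split .ucm mapping lines into Unicode->bytes and bytes->Unicode tables."""
    to_bytes = {}    # used when encoding (Unicode -> code page)
    to_unicode = {}  # used when decoding (code page -> Unicode)
    for line in lines:
        m = UCM_LINE.match(line.strip())
        if not m:
            continue  # header lines, CHARMAP markers, comments, etc.
        char = chr(int(m.group(1), 16))
        data = bytes(int(h, 16) for h in m.group(2).split("\\x")[1:])
        precision = int(m.group(3))
        if precision in (0, 1):  # round trip or fallback: usable for encoding
            to_bytes.setdefault(char, data)
        if precision in (0, 3):  # round trip or reverse fallback: usable for decoding
            to_unicode.setdefault(data, char)
        # precision 2 (subchar1) is ignored in this sketch
    return to_bytes, to_unicode


# Hypothetical sample lines (not copied from a real Microsoft table).
sample = [
    r"<U0027> \x27 |0",  # APOSTROPHE, round trip
    r"<U2019> \x27 |1",  # fallback: U+2019 encodes as 0x27, never decoded back
    r"<U0027> \x92 |3",  # reverse fallback: 0x92 decodes as U+0027, never encoded
]
to_bytes, to_unicode = parse_ucm(sample)
```

With the fallback entry, U+2019 encodes to the same byte as U+0027, but that byte still decodes to U+0027. The reverse fallback lets byte 0x92 decode to U+0027 without ever being produced on encoding. A table without `|1`/`|3` lines, like the ftp.unicode.org ones, yields only the round-trip entries.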

=Ed


-----Original Message-----
From: Nick Ing-Simmons [mailto:nick(_dot_)ing-simmons(_at_)elixent(_dot_)com] 
Sent: Tuesday, February 05, 2002 12:51 PM
To: perl-unicode(_at_)perl(_dot_)org
Subject: Tossing another can of worms into the minefield...


What does the list think of the idea of fallbacks for "common"
approximations? E.g. have Unicode->iso8859-1 map these characters from
Microsoft cp1250:

<U2018> \x91 |0 # LEFT SINGLE QUOTATION MARK
<U2019> \x92 |0 # RIGHT SINGLE QUOTATION MARK

Fallback-map those to "'"

<U201C> \x93 |0 # LEFT DOUBLE QUOTATION MARK
<U201D> \x94 |0 # RIGHT DOUBLE QUOTATION MARK

And those to '"'
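In Python, the proposal corresponds roughly to a custom encode error handler. This is not how Perl's Encode implements fallbacks. It is a minimal sketch, assuming the fallback table is exactly the four quotation marks listed above, that substitutes an ASCII approximation only when the target encoding cannot represent a character. The handler name `quote-fallback` and the table name are made up for this example.

```python
import codecs

# Hypothetical fallback table: the "common approximations" from the proposal.
QUOTE_FALLBACKS = {
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201C": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',  # RIGHT DOUBLE QUOTATION MARK
}


def quote_fallback(err):
    """Encode error handler: replace one unencodable char if it has a fallback."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    ch = err.object[err.start]
    if ch in QUOTE_FALLBACKS:
        # Resume encoding right after the replaced character.
        return QUOTE_FALLBACKS[ch], err.start + 1
    raise err  # no approximation known: keep the normal error


codecs.register_error("quote-fallback", quote_fallback)

text = "\u2018single\u2019 and \u201Cdouble\u201D"
encoded = text.encode("latin-1", "quote-fallback")
```

Characters outside the table still raise `UnicodeEncodeError`, so the fallback never silently drops text it has no approximation for.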

Likewise perhaps map iso8859-15's

<U20AC> \xA4 |0 # EURO SIGN

To

<U00A4> \xA4 |0 # CURRENCY SIGN
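The euro-sign idea relies on both characters sitting at byte 0xA4 in their respective encodings. One way to see which ISO 8859-15 characters have such a "same byte" counterpart in ISO 8859-1 is to decode every byte with both codecs and compare, sketched here in Python. The helper name is hypothetical, and the comparison only finds candidates. Most candidates are not sensible approximations, so the sketch keeps only the euro sign, as the message proposes.

```python
def same_byte_candidates():
    """Map each ISO 8859-15 char to the Latin-1 char at the same byte, where they differ."""
    candidates = {}
    for b in range(256):
        raw = bytes([b])
        c15 = raw.decode("iso8859_15")
        c1 = raw.decode("latin-1")
        if c15 != c1:
            candidates[c15] = c1
    return candidates


candidates = same_byte_candidates()
for src, dst in candidates.items():
    print(f"U+{ord(src):04X} -> U+{ord(dst):04X}")

# Only the euro sign is proposed; e.g. U+0160 (S WITH CARON) -> U+00A6
# (BROKEN BAR) is a same-byte pair but not a meaningful approximation.
EURO_FALLBACK = {"\u20ac": candidates["\u20ac"]}
```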


--
Nick Ing-Simmons
http://www.ni-s.u-net.com/
