RE: Unicode / Transliteration

  where 'transliteratorX' is the name of a transliteration class to use
  (i.e. Unicode::Transliterate::ISO_8859_15::ASCII for the ISO_8859_15
  to ASCII transliteration table).


Your main example was accent-stripping. That operation isn't really related
to encodings. It is a general text operation that simply happens to yield
output in the ASCII subset for certain input. Maybe you want to define your
module as something both more specific and more inclusive: a
Unicode-to-human-readable-URI transliterator. That would be a non-trivial
module to implement in full! Still, something pretty useful could be done
fairly simply using nothing more than regexes.
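
To make the "fairly simply" part concrete, here is a rough sketch of
accent-stripping plus a regex pass to get a URI-friendly string. It is in
Python only for brevity; the function names strip_accents and to_uri_slug
are made up for illustration and are not part of any proposed module. It
assumes canonical decomposition (NFD) is an acceptable way to separate base
letters from their accents.

```python
import re
import unicodedata


def strip_accents(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for accents and other marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_uri_slug(text):
    """Hypothetical helper: lower-case, strip accents, hyphenate the rest."""
    ascii_text = strip_accents(text).lower()
    # Every run of characters outside a-z and 0-9 collapses to one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text)
    return slug.strip("-")
```

Note that letters with no decomposition (such as "ø" or "ß") are not
handled by this approach and simply turn into hyphens, which is one reason
the full problem is non-trivial.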

Script-to-script transliterators (Japanese->Latin, for example) would be
useful, but encoding-to-encoding transliterators are not really so useful:
there are too many dimensions to the problem. Fallback characters or
fallback routines are probably the best design for generating useful output
when text is converted between encodings whose character repertoires don't
match.
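
As an illustration of the fallback-routine idea, here is a small Python
sketch that hooks a custom handler into the encoder, so that only the
characters the target encoding can't represent get a substitute. The
FALLBACKS table, the handler name "translit", and the to_ascii helper are
all assumptions of mine, and the table is deliberately tiny.

```python
import codecs
import unicodedata

# Hypothetical fallback table; a real one would be much larger
FALLBACKS = {"€": "EUR", "ß": "ss", "ø": "o"}


def translit_fallback(error):
    """Called by the codec for each run of unencodable characters."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    out = []
    for ch in error.object[error.start:error.end]:
        if ch in FALLBACKS:
            out.append(FALLBACKS[ch])
            continue
        # Otherwise try the base letter of the canonical decomposition
        base = unicodedata.normalize("NFD", ch)[0]
        out.append(base if ord(base) < 128 else "?")
    # Return the replacement text and the position to resume encoding at
    return ("".join(out), error.end)


codecs.register_error("translit", translit_fallback)


def to_ascii(text):
    return text.encode("ascii", errors="translit").decode("ascii")
```

With this in place, to_ascii("naïve café costs 5€") gives
"naive cafe costs 5EUR", and characters with no fallback at all come out
as "?".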

Here's an interesting article on transliteration in general and ICU's
implementation in particular:

http://oss.software.ibm.com/icu/userguide/Transliteration.html

Transliteration itself pre-dates computers by centuries. It is a fascinating
topic for anyone interested in linguistics.

=Ed
