perl-unicode

Unicode / Transliteration

2001-12-10 09:49:46
Hi guys,

  In the laborious process of 'unicodizing' my application, I'm coming
  across quite a few "funny" issues.
  
  In the context of a web application, URI-encoded Unicode strings are
  absolutely awful. It's actually nothing new: I had the same problem
  with the 8-bit ISO-8859-15 character set, where spaces are turned into
  %20 and so on.

  The way I got around this was to build a lossy table mapping
  ISO-8859-15 to US-ASCII, and then apply a few simple regexes so
  that a sentence like "Le rêve du café" gets turned into
  "le-reve-du-cafe". Not only is this useful for getting cool-looking
  URIs, but it's also useful for building search engines that actually
  match 'café', 'cafe', 'CäFê', etc.
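
  To make the idea concrete, here is a minimal sketch of the table plus
  regexes approach. The table contents, the %ASCII_FOR name and the
  slugify() function are made up for illustration and are not the actual
  code from my application. Non-ASCII characters are written as \x{..}
  escapes so the snippet does not depend on the source file encoding.

```perl
use strict;
use warnings;

# Hypothetical lossy table (deliberately incomplete): a few ISO-8859-15
# characters mapped to their closest US-ASCII spelling.
my %ASCII_FOR = (
    "\x{e0}" => 'a',  "\x{e2}" => 'a',  "\x{e4}" => 'a',   # a-grave, a-circumflex, a-diaeresis
    "\x{e7}" => 'c',                                       # c-cedilla
    "\x{e8}" => 'e',  "\x{e9}" => 'e',  "\x{ea}" => 'e',   # e-grave, e-acute, e-circumflex
    "\x{eb}" => 'e',  "\x{c9}" => 'E',                     # e-diaeresis, capital E-acute
    "\x{ee}" => 'i',  "\x{ef}" => 'i',                     # i-circumflex, i-diaeresis
    "\x{f4}" => 'o',  "\x{f6}" => 'o',                     # o-circumflex, o-diaeresis
    "\x{f9}" => 'u',  "\x{fb}" => 'u',  "\x{fc}" => 'u',   # u-grave, u-circumflex, u-diaeresis
    "\x{153}" => 'oe',                                     # oe ligature
);

# Turn a sentence into a lowercase, hyphen-separated ASCII slug.
sub slugify {
    my ($text) = @_;

    # 1. Replace each non-ASCII character through the table; characters
    #    missing from the table are dropped (this is the lossy part).
    $text =~ s/([^\x00-\x7F])/exists $ASCII_FOR{$1} ? $ASCII_FOR{$1} : ''/ge;

    # 2. Everything left is ASCII, so a plain lc is safe.
    $text = lc $text;

    # 3. Collapse runs of anything that is not a letter or digit into '-'.
    $text =~ s/[^a-z0-9]+/-/g;

    # 4. Trim leading and trailing hyphens.
    $text =~ s/^-+|-+$//g;

    return $text;
}

print slugify("Le r\x{ea}ve du caf\x{e9}"), "\n";
```

  Because the mapping happens before the lowercasing, the same function
  can be applied to both the indexed text and the search query, which
  is what makes 'café' and 'CäFê' compare equal.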

  The problem is that, as you can imagine, on a character set wider than
  Latin-1 things get slightly trickier, especially when you realize
  that 'Dingbats' is in the Unicode charset ;-)

  Ideally I would like to write a CPAN Unicode::Transliterate module
  that would be modular enough to dynamically import transliteration
  tables from any charset to any other charset, possibly depending
  on the target language (for example, the Japanese word 'roku' might
  actually sound better if written 'lok' when read in French).
  
  I would like to know if you had any suggestions on how I should do
  that.
  
As for the interface, I was thinking of:

  package Unicode::Transliterate

  +@ Unicode::Transliterate new ('transliterator1', 'transliterator2', etc)
  + Unicode::String process (Unicode::String $string)

  where 'transliteratorX' is the name of a transliteration class to use
  (e.g. Unicode::Transliterate::ISO_8859_15::ASCII for the ISO-8859-15
  to ASCII transliteration table).
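
  A rough pure-Perl sketch of what that interface could look like is
  below. Several things in it are assumptions on my part: each
  transliterator class exposes a table() class method returning a hash
  of single characters to replacement strings, plain Perl character
  strings stand in for Unicode::String objects, and the tiny table is
  illustrative only. Each table is compiled once into a character class
  in new(), so process() does one substitution pass per transliterator.

```perl
use strict;
use warnings;

# Hypothetical table class; a real one would live in its own .pm file.
package Unicode::Transliterate::ISO_8859_15::ASCII;

sub table {
    return {
        "\x{e9}"  => 'e',    # e-acute
        "\x{ea}"  => 'e',    # e-circumflex
        "\x{e4}"  => 'a',    # a-diaeresis
        "\x{153}" => 'oe',   # oe ligature
    };
}

package Unicode::Transliterate;

# new() takes the names of transliteration classes, applied in order.
sub new {
    my ($class, @names) = @_;
    my @stages;
    for my $name (@names) {
        # Load the class from disk unless it is already defined.
        unless ($name->can('table')) {
            (my $file = "$name.pm") =~ s{::}{/}g;
            require $file;
        }
        my $table = $name->table;

        # Assumes every key is a single character and the table is not
        # empty: build one character class, compiled once per table.
        my $chars = join '', map { sprintf '\x{%X}', ord } keys %$table;
        push @stages, { table => $table, match => qr/([$chars])/ };
    }
    return bless { stages => \@stages }, $class;
}

# process() runs the string through each table in turn.
sub process {
    my ($self, $string) = @_;
    for my $stage (@{ $self->{stages} }) {
        my $table = $stage->{table};
        $string =~ s/$stage->{match}/$table->{$1}/g;
    }
    return $string;
}

package main;

my $t = Unicode::Transliterate->new(
    'Unicode::Transliterate::ISO_8859_15::ASCII',
);
print $t->process("caf\x{e9}"), "\n";
```

  Passing several class names to new() chains the tables, so a
  language-specific table could be applied before a generic one.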


As for the implementation, I'm still trying to get my head around doing
something that isn't too slow without having to resort to XS. Any ideas
or suggestions?

Cheers,
-- 
IT'S TIME FOR A DIFFERENT KIND OF WEB
================================================================
  Jean-Michel Hiver - Software Director
  jhiver(_at_)mkdoc(_dot_)com
  +44 (0)114 221 4968
================================================================
                                      VISIT HTTP://WWW.MKDOC.COM
