Hi guys,
In the laborious process of 'unicodizing' my application, I'm coming
across quite a few "funny" issues.
In the context of a web application, URI-encoded Unicode strings are
absolutely awful. This is nothing new: I had the same problem with the
8-bit ISO-8859-15 character set, where a space is turned into %20, an
'é' into %E9, and so on.
The way I got around this was to build a lossy table mapping
ISO-8859-15 to US-ASCII and then apply a few simple regexes, so
that a sentence like "Le rêve du café" gets turned into
"le-reve-du-cafe". Not only does this give cool-looking URIs, it is
also useful for building search engines that treat 'café', 'cafe',
'CäFê', etc. as the same word.
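
For illustration, here is a minimal sketch of the table-plus-regexes
approach described above. The table is a deliberately tiny,
hypothetical excerpt (not my real one), slugify() is a made-up name,
and the code assumes Perl 5.10 or later with a UTF-8 encoded source
file:

```perl
use strict;
use warnings;
use utf8;   # the table below contains literal non-ASCII characters

# Hypothetical, incomplete lossy table (lowercase forms only).
my %TO_ASCII = (
    'à' => 'a', 'â' => 'a', 'ä' => 'a',
    'ç' => 'c',
    'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
    'î' => 'i', 'ï' => 'i',
    'ô' => 'o', 'ö' => 'o',
    'ù' => 'u', 'û' => 'u', 'ü' => 'u',
    'œ' => 'oe',
);

sub slugify {
    my ($text) = @_;
    $text = lc $text;                                     # fold case first
    # Map each non-ASCII character through the table; unmapped ones are dropped.
    $text =~ s{([^\x00-\x7f])}{ $TO_ASCII{$1} // '' }ge;
    $text =~ s/[^a-z0-9]+/-/g;                            # runs of other chars become one hyphen
    $text =~ s/^-+|-+$//g;                                # trim leading/trailing hyphens
    return $text;
}

my $slug = slugify('Le rêve du café');   # 'le-reve-du-cafe'
```

The same function maps 'café', 'cafe' and 'CäFê' to 'cafe', which is
what makes it usable as a search key as well as a URI fragment.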
The problem is that, as you can imagine, with a character set wider
than Latin-1 things get slightly trickier, especially when you realize
that Unicode even has a 'Dingbats' block ;-)
Ideally I would like to write a Unicode::Transliterate module for CPAN,
modular enough to dynamically import transliteration tables from any
charset to any other charset, and possibly to vary the result with the
target language (for example, the Japanese word 'roku' might actually
sound better written as 'lok' when read in French).
I would like to know if you have any suggestions on how I should go
about that.
As for the interface, I was thinking of:
package Unicode::Transliterate
+@ Unicode::Transliterate new ('transliterator1', 'transliterator2', etc)
+ Unicode::String process (Unicode::String $string)
where 'transliteratorX' is the name of a transliteration class to use
(e.g. Unicode::Transliterate::ISO_8859_15::ASCII for the ISO-8859-15
to ASCII transliteration table).
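
To make the proposed interface a bit more concrete, here is a rough
sketch of how new() and process() could fit together. Nothing here
exists on CPAN: the table() method, the four-entry table and the use
of native Perl strings instead of Unicode::String objects are all
assumptions made for the example, and a real version would
presumably require() each table class rather than define it inline:

```perl
use strict;
use warnings;
use utf8;   # the table below contains literal non-ASCII characters

# Hypothetical table class; a real one would cover the whole source charset.
package Unicode::Transliterate::ISO_8859_15::ASCII;

sub table { return { 'é' => 'e', 'ê' => 'e', 'ä' => 'a', 'œ' => 'oe' } }

package Unicode::Transliterate;

# Takes the names of the transliteration classes to apply, in order.
sub new {
    my ($class, @names) = @_;
    my @tables = map { my $pkg = $_; $pkg->table } @names;
    return bless { tables => \@tables }, $class;
}

# Runs the string through every table in turn; characters that a table
# does not know about are passed through unchanged.
sub process {
    my ($self, $string) = @_;
    for my $table (@{ $self->{tables} }) {
        $string =~ s/(.)/exists $table->{$1} ? $table->{$1} : $1/gse;
    }
    return $string;
}

package main;

my $t = Unicode::Transliterate->new('Unicode::Transliterate::ISO_8859_15::ASCII');
my $ascii = $t->process('café');   # 'cafe'
```

Chaining the tables in order means a 'charset A to charset B' class
can be followed by a 'charset B to ASCII' class without either one
knowing about the other.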
As for the implementation, I'm still trying to get my head around
doing something that isn't too slow without having to resort to XS.
Any ideas or suggestions?
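
One pure-Perl idea, sketched below under some assumptions: compile
each table once into a single character class and let one s///g pass
do the work, so that only characters actually present in the table
trigger a hash lookup. compile_table() is a hypothetical helper name,
the table must be non-empty with single-character keys, and I have
not benchmarked this, so treat it as something to measure rather
than as a known win:

```perl
use strict;
use warnings;
use utf8;   # the example table contains literal non-ASCII characters

# Hypothetical helper: turn a table into a reusable closure.
sub compile_table {
    my ($table) = @_;
    # Build one character class from the (single-character) keys.
    my $class = join '', map { quotemeta } keys %$table;
    my $re    = qr/([$class])/;
    return sub {
        my ($string) = @_;
        $string =~ s/$re/$table->{$1}/g;   # only table characters are touched
        return $string;
    };
}

my $to_ascii = compile_table({ 'é' => 'e', 'ê' => 'e', 'œ' => 'oe' });
my $out = $to_ascii->('rêve, café, cœur');   # 'reve, cafe, coeur'
```

For tables that are strictly one character to one character, tr///
would be another candidate, but it cannot express mappings such as
'œ' to 'oe'.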
Cheers,
--
================================================================
Jean-Michel Hiver - Software Director
jhiver(_at_)mkdoc(_dot_)com
+44 (0)114 221 4968
================================================================