I don't know the details of what you're trying to do, so I'll just tell
you what I have done to deal with multiple character sets. It may help
you; it may not.
If all you want to do is recognize and save or pass along strings in
other character sets, try replacing
    use utf8;
with
    use bytes;
"use bytes;" makes Perl treat strings as sequences of bytes rather than
characters, so it works for UTF-8 strings as well as strings in other
character sets.
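
To illustrate what byte semantics means in practice, here is a minimal sketch. The sample string and variable names are made up for illustration, and the core Encode module is used only to build a character string to compare against; it is not part of the setup described in this message.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the UTF-8 bytes for "cafe" with an acute e
# (the accented letter takes two bytes).
my $raw = "caf\xC3\xA9";

# Decoded into a character string, Perl sees 4 characters.
my $chars    = decode('UTF-8', $raw);
my $char_len = length $chars;

# Inside a "use bytes" scope, length counts bytes instead.
my $byte_len = do { use bytes; length $chars };

print "characters: $char_len, bytes: $byte_len\n";
```

The pragma is lexically scoped, which is why wrapping it in a do block limits its effect to that one expression.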
I think it is best to run regular expressions against UTF-8 strings,
because then you can use Unicode property classes such as \p{IsAlpha}.
For these regular expressions I switch to "use utf8;" for that one
statement and then switch back to "use bytes;". Both pragmas are
lexically scoped, so a bare block is enough to limit the switch.
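
As a rough sketch of a property-class match, assuming Perl 5.8 or later: on those versions the utf8 pragma only declares the encoding of the source file, so this example decodes the data explicitly with the core Encode module instead of toggling pragmas. The input string and variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: UTF-8 bytes for two accented words and a number.
my $raw = "na\xC3\xAFve caf\xC3\xA9 42";

# Decode so the regex engine matches characters rather than bytes.
my $text = decode('UTF-8', $raw);

# \p{IsAlpha} matches any alphabetic character, accented ones included.
my @words = $text =~ /(\p{IsAlpha}+)/g;

printf "%d alphabetic runs found\n", scalar @words;
```

The digits and spaces are skipped, and each accented letter counts as a single alphabetic character rather than as two separate bytes.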
I use Text::Iconv for converting data between UTF-8 and other
character sets. With the project I'm working on, we always do our
processing in UTF-8 and convert to or from other character sets only for
saving and returning data, and only when absolutely necessary (e.g. HTTP
file downloads to operating systems that only understand certain
character sets for file names and file contents). We want our inner
modules to be as generic as possible, and UTF-8 solves that problem
better than anything else, since it can represent all languages. Some
people might think this is too much work, but for our complex framework
it's the only way it will work.
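
The convert-only-at-the-edges idea might look roughly like the sketch below. It swaps in the core Encode module for Text::Iconv so that it runs without extra installs; with Text::Iconv the equivalent calls would be Text::Iconv->new($from, $to) followed by $converter->convert($bytes). The helper name convert_charset and the sample data are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: re-encode a byte string from one charset to another.
sub convert_charset {
    my ($bytes, $from, $to) = @_;
    my $chars = decode($from, $bytes);   # bytes -> Perl characters
    return encode($to, $chars);          # characters -> bytes in target set
}

# Incoming data in ISO-8859-1: "cafe" with an acute e as the single byte 0xE9.
my $latin1 = "caf\xE9";

# Convert once on the way in; everything internal then works on UTF-8.
my $utf8 = convert_charset($latin1, 'ISO-8859-1', 'UTF-8');

# Convert back only when a consumer needs the legacy charset.
my $back = convert_charset($utf8, 'UTF-8', 'ISO-8859-1');

print "round trip ok\n" if $back eq $latin1;
```

Keeping the conversion in one small function at the boundary means the inner modules never need to know which charset the data arrived in.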
For web forms: in order to always get UTF-8 from form posts, we serve
our web pages in UTF-8 and put the following <meta> tag inside the
<head> element.
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
Thanks,
Mary
Mags Doheny wrote:
Hi,
I need to get my Perl scripts to recognize strings encoded in other
charsets. The utf8 pragma does the trick for Unicode; does anyone know
of other pragmas available for, say, the ISO-8859-x charsets?
Thanks,
Mags