perl-unicode

Some problem with UTF-8 charset

2001-12-12 10:48:01
Hi All,

Greetings!

I am using Perl 5.6.1.

I am running a script which uses regular expressions to parse an input
data file. Until now the data file contained only plain ASCII character
strings, but from now on it will contain special characters encoded in
UTF-8 as well as plain ASCII. How do I handle this in my script?
Do I need to do anything differently? Will the following sample regular
expressions from my script still work?

***********************************************************************
@DATA = <IN>;
foreach $_ (@DATA) {
    chomp($_); ## remove the newline (chomp only strips a trailing newline)
    s/^\s*//; ## remove the spaces from the beginning of line
    s/\s*$//; ## remove the spaces from end of the line
    @list = /<IDOC .*?>(.*?)<\/IDOC>/gm;
    foreach $_ (@list) {
       @srslist = /<E1LFA1M .*?>(.*?)<\/E1LFA1M>/gm;
       foreach $_ (@srslist) {

           # AT this point we have single record
           if (/(.*?)<E1LFB1M .*?>(.*?)<\/E1LFB1M><E1LFM1M .*?>(.*?)<\/E1LFM1M>/) {
               $srs_line = $1;
               $apv_line = $2;
               $prv_line = $3;
               ## At this point we can take individual elements and write to a file
               $_ = $srs_line;
               @srs_list = /<\w+>\s*(.*?)\s*<\/\w+>/gm;
               %srs_list = /<(\w+)>\s*(.*?)\s*<\/\1>/gm;
***********************************************************************

I have seen the pragma "use utf8". Do I need to use it? I have not been
able to find much information on it.
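
To make the question concrete, here is a minimal sketch of the kind of handling I am asking about. It assumes Perl 5.8 or later, where the Encode module ships with Perl; it is not part of the 5.6.1 I have. The record and the field names NAME1 and ORT01 are hypothetical sample data, not taken from my real file. In 5.8 and later, "use utf8" only declares that the script source itself is UTF-8, so the input data still has to be decoded explicitly, roughly like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);   # assumption: Perl 5.8+, where Encode is a core module

# Hypothetical sample record, given as the raw UTF-8 bytes read from a file.
# "\xC3\xBC" is the two-byte UTF-8 encoding of U+00FC, "\xC3\xB6" of U+00F6.
my $bytes = "<NAME1> M\xC3\xBCller </NAME1><ORT01>K\xC3\xB6ln</ORT01>";

# Decode the bytes into Perl characters so that ., \w and \s operate on
# whole characters instead of on the individual bytes of a sequence.
my $text = decode('UTF-8', $bytes);

# Same tag/value pattern as in the script above, collected into a hash
# of tag name => trimmed value.
my %fields = $text =~ /<(\w+)>\s*(.*?)\s*<\/\1>/g;

# length() now counts characters rather than bytes.
my $name_chars = length $fields{NAME1};
print "NAME1 has $name_chars characters\n";
```

With decoded input the patterns themselves should not need to change. Before the values are written to a file or sent to the database they would have to be encoded again in whatever character set the target expects.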

Basically, I want to parse this file and then load its contents into an
Oracle database that supports multibyte characters or data of type
NVARCHAR2.

Please let me know how I can do this.

Thanks for all your help.

Thanks,
Nilanjan

