perl-unicode

Re: detecting and opening unicode files?

1999-08-07 15:55:39
Hi Dan,
Thanks for your help. It turns out that on my Windows NT 4.0 system the lstat
command is not returning anything:


($dev,$ino,$mode,$nlink) = lstat($_);  # where $_ happens to be a unicode file
print "lstat output:$dev,$ino,$mode,$nlink \n";


Only the commas get printed when $_ is a unicode file (it turns out to be a
Japanese-language file).  If I copy my unicode file to my Unix system (SGI
Irix) then perl is able to stat the file, but on Windows NT 4.0 it isn't.
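
One way to narrow this down: lstat returns an empty list when it fails and
sets $!, so printing $! should at least say why the call is being refused.
A minimal sketch of that check follows; describe_lstat is a hypothetical
helper name, not something from Dan's script, and I have only reasoned about
this on Unix, not on NT.

```perl
use strict;
use warnings;

# Hypothetical helper: return a one-line report for a path,
# including the OS error text when lstat fails.
sub describe_lstat {
    my ($path) = @_;
    my @st = lstat($path);
    # lstat returns an empty list on failure and sets $!
    return "lstat failed for " . $path . ": " . $! unless @st;
    my ($dev, $ino, $mode, $nlink) = @st;
    return "lstat output: $dev,$ino,$mode,$nlink";
}

# Report on every path given on the command line.
print describe_lstat($_), "\n" for @ARGV;
```

If the failure message is something like "No such file or directory", the
name perl was handed probably does not match the name on disk.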

Any ideas?

Thanks,

Tom




On Aug 4,  4:07pm, Tom Shou wrote:
Subject: Re: detecting and opening unicode files?
Dan,
Thanks for the sample code. Someone else here is bringing me a unicode
Japanese file to test, so I haven't had a chance to run the script against a
non-English file, but it should run fine. The question I have now is how do I
'cd' to a directory that has a unicode name? The script you wrote takes the
directory as an argument, but on my NT system the path is a unicode name that
just appears as "??????" when I do a dir command, so there is no way for me
to pass the directory name. Is there a way to open a directory in binary mode
and still generate an array of the files and directories in the directory?
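
For reference, the usual way to get such an array without typing each name
is opendir/readdir on a parent directory whose name can be typed. The sketch
below shows only that mechanism. Whether the names come back intact rather
than as "?" depends on how the perl build talks to the NT filesystem, which
is an open question here. list_entries is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names inside $parent,
# leaving out the "." and ".." entries.
sub list_entries {
    my ($parent) = @_;
    opendir(my $dh, $parent) or die "cannot opendir $parent: $!";
    my @names = grep { $_ ne '.' and $_ ne '..' } readdir($dh);
    closedir($dh);
    return @names;
}

# Print each entry of each directory given on the command line,
# marking whether perl sees it as a directory or something else.
for my $parent (@ARGV) {
    for my $name (list_entries($parent)) {
        my $kind = -d "$parent/$name" ? "dir " : "file";
        print "$kind $parent/$name\n";
    }
}
```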


Thanks,

Tom Shou



On Aug 5,  6:51am, Dan Kogai wrote:
Subject: Re: detecting and opening unicode files?
Tom Shou said;
I have a perl script running on a Windows NT 4.0 system which calculates
file checksums in a directory tree but it turns out that some of the
files and directories in the tree are unicode files (e.g. Japanese) so
when my script encounters these files/directories it can't figure out
what to do with them. I've been searching the web but I'm not sure how
to open a unicode file (I don't care about the contents since I'm just
doing a byte checksum). My script is shown below.

  I don't think it's a problem of Unicode per se; it's a problem of
binmode.  When you feed the camel a non-ASCII text file, you often have to
treat the file as binary. Just binmode(MYFILE) as soon as you open it and see
what happens.
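
  A small sketch of what binmode buys you: read the file as raw bytes and
compare the byte count with the size on disk. On Win32 a text-mode read
translates CRLF and can stop at ^Z, so the two numbers may differ without
binmode. slurp_raw is a made-up helper name for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file as raw bytes and return them.
sub slurp_raw {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    binmode($fh);    # no CRLF translation, no early stop at ^Z on Win32
    local $/;        # slurp mode: read everything in one go
    my $bytes = <$fh>;
    close($fh);
    return defined $bytes ? $bytes : '';
}

# Compare the size on disk with the number of bytes actually read.
for my $path (@ARGV) {
    my $bytes = slurp_raw($path);
    printf "%s: %d bytes on disk, %d bytes read\n",
        $path, (-s $path), length($bytes);
}
```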

  BTW, sorry to tell you this, but your code is yucky!  Here is my version
that does the same (well, not exactly; see SYNOPSIS).  I couldn't help
rewriting it (took me 5 minutes, so it didn't hurt me much).  Tested
on FreeBSD. Untested on Win32 env (but I know it will work).

#!/usr/local/bin/perl

=head1 SYNOPSIS

perl scriptname [list of directories]

=cut

use strict;

for my $ARGV (@ARGV){
    &do_sum($ARGV);
}

sub do_sum{
    use File::Find;
    use Fcntl;
    my $dir = shift;
    my ($count, $csum, $gsum, $content);
    find(sub {
        return unless lstat($_) and -f _; # check only files
        $count++;
        sysopen MYFILE, $_, O_RDONLY or die "$_:$!";
        # "open MYFILE, $_" may fail if $_ contains spaces and other
        # strange characters.
        binmode(MYFILE); # should not be necessary because
                         # we no longer use <MYFILE> but
                         # it won't hurt either
        read MYFILE, $content, -s $_; # another way to read entire file
        $csum = unpack ("%32C*", $content) & 32767;
        $gsum += $csum;
    }, $dir);
    print "$dir: num files = $count, sum of checksums= $gsum\n";
}

__END__
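
A variation on the script above, in case the files are large: the same
byte-sum checksum can be accumulated in fixed-size chunks instead of reading
each file whole. This is only a sketch under assumptions; file_checksum and
the 64 KB chunk size are illustrative choices, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: sum the byte values of a file chunk by chunk,
# then keep the low 15 bits as the script above does.
sub file_checksum {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    binmode($fh);
    my $sum = 0;
    my $buf;
    while (1) {
        my $n = read($fh, $buf, 65536);
        die "read error on $path: $!" unless defined $n;
        last if $n == 0;    # end of file
        # %32C* yields the 32-bit sum of the byte values in this chunk
        $sum = ($sum + unpack("%32C*", $buf)) & 0xFFFFFFFF;
    }
    close($fh);
    return $sum & 32767;
}

# Print the checksum of every file given on the command line.
printf "%s: %d\n", $_, file_checksum($_) for @ARGV;
```

Because only the low 15 bits are kept at the end, summing per chunk gives
the same value as summing the whole file at once.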

Dan the Camel Tamer
________ DAN Kogai (CEO, DAN co. ltd.)
      _/ __  Tel:+81 3-5433-7565          Fax:+81 3-5433-7566
     /_ /+/  6-35-5 Shimouma Setagaya Tokyo 154-0002 Japan
     _/-/---- http://www.dan.co.jp/ -----------------------------------
-- End of excerpt from Dan Kogai



--
____________________________________________________________________
Tom Shou | shou(_at_)engr(_dot_)sgi(_dot_)com | SGI | 650.933.5362 | 650.932.0687 fax

http://reality.sgi.com/shou_engr
-- End of excerpt from Tom Shou



