perl-unicode

Re: Help slurping a file? -- Solved

2005-10-28 07:07:55
Hi,
Thanks for responding to my email.
I'll look into the perlmonks list. I'm still getting used to the quirks of
Perl. I was really trained in Java.

I found out, since I posted this question, that the problem wasn't with the
slurping at all.
It was kind of a stupid error on my part, actually.
I was testing by printing to standard output. In TextPad and in my Java
interpreter, the carriage return character displays as a newline, but my
Perl interpreter's standard output does not start a new line at a carriage
return, so each output line is printed over the previous one. Since every
line is the same length, it looked like only one line was being read in
total.
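
For anyone who hits the same symptom, here is a minimal sketch of one way to
make the line terminators visible while debugging. The helper name show_eol
is made up for this illustration and is not part of the original exchange.

```perl
use strict;
use warnings;

# Hypothetical helper: replace CR and LF with visible escapes so that
# output lines cannot silently overwrite each other on a terminal.
sub show_eol {
    my ($s) = @_;
    $s =~ s/\r/\\r/g;       # show each carriage return as the two characters \r
    $s =~ s/\n/\\n\n/g;     # show each newline as \n, then keep the real break
    return $s;
}

# Three CR-terminated records, as in the situation described above.
print show_eol("line one\rline two\rline 3\r"), "\n";
```

Printed this way, all three records appear on one visible line with a literal
\r after each, instead of the last record overwriting the others.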

So, basically, your solution is exactly right.

Thanks,
Renee


Quoting David Graff <graff(_at_)ldc(_dot_)upenn(_dot_)edu>:


reneeh(_at_)stanford(_dot_)edu said:
I'm not sure if this is the correct group to post this question to. If
there is a better forum for this kind of question, please let me know.

There doesn't seem to be any reference to Unicode in your question, so
it probably lacks relevance to the perl-unicode list... (You can try
www.perlmonks.org -- they love the kind of stuff you're describing.)

In any case, I have found that "typical" line-termination patterns on
Mac OS X depend on the application that creates the file.  I use mostly
unix-based apps, so on my powerbook, most of my text files have just
"\n", but I have seen both "\r" and "\r\n" as well.
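
A quick way to see which of these patterns a given string contains is to
tally them. The following is only a sketch; the helper name
count_line_endings and the sample string are assumptions made for this
example.

```perl
use strict;
use warnings;

# Hypothetical helper: tally each line-ending style found in a string.
sub count_line_endings {
    my ($text) = @_;
    my %count = ( crlf => 0, cr => 0, lf => 0 );

    # "\r\n" is listed first so a CRLF pair is not counted as a CR plus an LF.
    while ( $text =~ /(\r\n|\r|\n)/g ) {
        my $eol = $1;
        if    ( $eol eq "\r\n" ) { $count{crlf}++ }
        elsif ( $eol eq "\r" )   { $count{cr}++ }
        else                     { $count{lf}++ }
    }
    return \%count;
}

# Made-up sample mixing all three styles.
my $sample = "one\r\ntwo\rthree\nfour\n";
my $c      = count_line_endings($sample);
printf "CRLF=%d CR=%d LF=%d\n", @{$c}{qw(crlf cr lf)};
```

If only one of the three counters is non-zero, the file uses a single
convention and could also be read line by line with $/ set to that
terminator.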

Have you tried something like this:

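#!/usr/local/bin/perl
use strict;
use warnings;

my $fname = "path/name_of.file";
open( my $in, '<', $fname ) or die "Cannot open $fname: $!";
{
    local $/ = undef;    # slurp mode; $/ is restored when the block exits
    $_ = <$in>;          # read the whole file into $_
}
close $in;

printf( "File size = %d, slurped string size = %d\n", -s $fname, length() );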

__END__

If things are kosher, the two numbers shown by the printf should be
equal, and if that's the case, the next question is figuring out how to
split $_ into lines.  This should work for just about every case:

  @lines = split /[\r\n]+/;

(That will obliterate blank lines.  If it's important to keep track of
blank lines, put parens around the regex to capture the line termination
characters -- each string of ([\r\n]+) will be saved in @lines,
interleaved between the non-empty lines that they separate.)
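
To illustrate the difference between the two forms of split, here is a
small sketch. The helper names and the sample string are made up for this
example.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.

# Plain split: runs of terminators are discarded, so blank lines vanish.
sub split_dropping_blanks {
    my ($text) = @_;
    return split /[\r\n]+/, $text;
}

# Capturing split: each run of terminators is returned between the fields.
sub split_keeping_terminators {
    my ($text) = @_;
    return split /([\r\n]+)/, $text;
}

# Made-up sample with one blank line and mixed terminators.
my $text  = "alpha\n\nbeta\r\ngamma";
my @plain = split_dropping_blanks($text);
my @kept  = split_keeping_terminators($text);

print scalar(@plain), " fields without capture\n";
print scalar(@kept),  " fields with capture\n";
```

In the capturing form, the odd-numbered elements of the result are the
terminator runs, so a run such as "\n\n" shows where a blank line was.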

      Dave Graff





Renee Halbrook
Bioinformatics Programmer
The Carnegie Institution of Washington
Department of Plant Biology
260 Panama Street
Stanford, CA 94305
