perl-unicode

Re: bytes::substr() ?

2003-09-02 17:30:12
At 9:07 am -0500 27/8/03, ed-perluni(_at_)inkdroid(_dot_)org wrote:

I'm working with a byte oriented protocol, and need to extract byte n1 through
byte n2 from a string. Problem is, the string can be UTF8, and substr() is
character oriented. What (if anything) is the best way to do this in Perl?

Untitled 3.txt contains the two Chinese characters 一 二 (one, two) and is saved as UTF-8.

When I run the script below in the bash shell, the contents of the file are read as six bytes, and I can take whatever substring I like of that six-byte string. What you see in the shell is not necessarily what you are getting if you have Terminal set to display UTF-8 as characters. The results will look far more like what you want in BBEdit.


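perl -e 'open F, "<", "$ENV{HOME}/Desktop/Untitled 3.txt" or die $!;
$s = <F>;                          # read one line; here it comes back as bytes
print "length: ", length($s), $/;  # $/ is the record separator, "\n" by default
print "$s\n";
# first four bytes: all of the first character plus one byte of the second
print substr $s, 0, 4'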
length: 6
一二
一?
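
The script above works on bytes because the string was read from the file without any decoding. If the string has already been decoded into characters, plain substr() counts characters instead. A minimal sketch for that case, assuming the protocol wants the UTF-8 encoding of the string, is to encode first and then slice. The helper name byte_substr is hypothetical, not an existing function:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: return $len bytes, starting at byte $offset,
# of the UTF-8 encoding of the character string $str.
# Assumes $str is a decoded character string; a string that already
# holds raw bytes should be passed to plain substr() instead.
sub byte_substr {
    my ($str, $offset, $len) = @_;
    my $bytes = encode_utf8($str);    # characters -> UTF-8 bytes
    return substr($bytes, $offset, $len);
}

my $s = "\x{4e00}\x{4e8c}";                        # the characters one, two
print length($s), "\n";                            # 2 (characters)
print length(encode_utf8($s)), "\n";               # 6 (bytes)
print unpack("H*", byte_substr($s, 0, 4)), "\n";   # e4b880e4
```

As in the output above, a byte range can end in the middle of a character, so the result is only safe to treat as bytes, not as text.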
