Re: bytes::substr() ?

At 9:07 am -0500 27/8/03, ed-perluni(_at_)inkdroid(_dot_)org wrote:

I'm working with a byte oriented protocol, and need to extract byte n1 through
byte n2 from a string. Problem is, the string can be UTF8, and substr() is
character oriented. What (if anything) is the best way to do this in Perl?

Untitled 3.txt contains the two Chinese characters 一 二 (one, two) and is saved as UTF-8.

When I run this script in the bash shell, the contents of the file are read as six bytes and I can get whatever substring I like of that six-byte string. What you see in the shell is not necessarily what you are getting if you have Terminal set to display UTF-8 as characters. The results will look far more like what you want in BBEdit.



perl -e 'open F, "$ENV{HOME}/Desktop/Untitled 3.txt" or die $!;
$s = <F>;
print "length: ", length $s, $/ ;
print "$s\n" ;
print substr $s, 0, 4'
length: 6
一二
一?
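
If the string has already been decoded into characters (so that length reports 2 rather than 6), one possible approach is to encode it back to UTF-8 octets first and take the substring of those. The following is only a sketch: byte_substr is a hypothetical helper name, not an existing function, and the code points U+4E00 and U+4E8C are my assumption for the two characters "one" and "two".

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: return $len bytes, starting at byte $offset,
# of the UTF-8 encoding of the character string $str.
sub byte_substr {
    my ($str, $offset, $len) = @_;
    my $octets = encode_utf8($str);    # character string -> byte string
    return substr($octets, $offset, $len);
}

# Assumed test data: the characters "one" and "two" as a character string.
my $chars = "\x{4E00}\x{4E8C}";

printf "characters: %d\n", length $chars;
printf "bytes: %d\n", length encode_utf8($chars);

# The first four bytes: all of the first character plus one byte of the second.
printf "bytes 0..3: %s\n", unpack 'H*', byte_substr($chars, 0, 4);
```

The result of byte_substr is a byte string and may end in the middle of a character, as in the four-byte case above, so it should not be treated as text again without checking.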
