perl-unicode

Re: Warnings on illegal UTF8

1998-10-11 08:35:12
"GA" == Gisle Aas <gisle(_at_)aas(_dot_)no> writes:

GA> One could suggest that we should make length() return undef for bad
GA> UTF-8 strings (inside 'use utf8') and no warnings.

GA> Problem with this proposal is that it would make length() much slower.
GA> Today it only looks at the first byte of each UTF-8 byte-sequence and
GA> then skips all the 10xx xxxx bytes without looking at them.  Problem
GA> with your string is that both '\xFC' and '\xDF' are legal UTF-8 start bytes.
GA> I agree that there should be some simple way to determine if a
GA> sequence is valid UTF-8.  Some new pragma to make length() more
GA> careful?
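Here is a minimal Python sketch of the two strategies Gisle describes, to make the trade-off concrete. It is not Perl's actual C code. fast_length() reads only the lead byte of each sequence and jumps over the rest without looking at them. careful_length() inspects every byte and returns None, standing in for undef, when the input is malformed. The helper names are hypothetical. The lead-byte table assumes the 1998-era UTF-8 definition, which allows sequences of up to six bytes, so '\xFC' counts as a legal start byte. The check is structural only and does not reject overlong forms.

```python
from typing import Optional


def seq_len(lead: int) -> int:
    """Sequence length implied by a lead byte (1998-era UTF-8, up to 6 bytes).
    Returns 0 for bytes that cannot start a sequence."""
    if lead < 0x80:
        return 1
    if lead < 0xC0:
        return 0          # 10xx xxxx: a continuation byte, never a lead byte
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    if lead < 0xF8:
        return 4
    if lead < 0xFC:
        return 5
    if lead < 0xFE:
        return 6
    return 0              # 0xFE and 0xFF never occur in UTF-8


def fast_length(data: bytes) -> int:
    """Look only at lead bytes; skip the rest unseen.
    A byte that is not a lead byte is counted as one unit (sketch assumption)."""
    i = count = 0
    while i < len(data):
        i += seq_len(data[i]) or 1
        count += 1
    return count


def careful_length(data: bytes) -> Optional[int]:
    """Check every byte; return None (undef) on structurally invalid UTF-8."""
    i = count = 0
    while i < len(data):
        n = seq_len(data[i])
        if n == 0 or i + n > len(data):
            return None   # bad lead byte, or sequence runs past the end
        for b in data[i + 1:i + n]:
            if b & 0xC0 != 0x80:   # each trailing byte must be 10xx xxxx
                return None
        i += n
        count += 1
    return count
```

On the two-byte string "\xFC\xDF", fast_length() returns 1. The '\xFC' byte announces a six-byte sequence, so the loop jumps past the end of the string. careful_length() returns None. The extra cost of the careful version is the inner loop over every trailing byte.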

Are the SVs marked as containing UTF8? If not, perhaps such a marker
would be useful: when the string is first loaded it could be examined
for validity and marked, and the other functions could then trust the
bit setting.
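Here is a rough Python model of the marker idea. It uses a hypothetical FlaggedString type in place of a real SV. The bytes are validated once in load(), and length() afterwards trusts the bit and uses the cheap count. For brevity, the one-time check uses Python's strict UTF-8 decoder. That decoder follows current UTF-8 rules rather than the six-byte form discussed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlaggedString:
    """Hypothetical stand-in for an SV: raw bytes plus a UTF-8 marker bit."""
    data: bytes
    utf8: bool  # set only if the bytes passed the one-time check

    @classmethod
    def load(cls, data: bytes) -> "FlaggedString":
        # Examine the bytes once, when the string first enters the program.
        try:
            data.decode("utf-8")  # strict decoder used as the validity test
            ok = True
        except UnicodeDecodeError:
            ok = False
        return cls(data, ok)

    def length(self) -> Optional[int]:
        # Later operations trust the bit instead of re-validating.
        if not self.utf8:
            return None  # stands in for undef
        # Cheap count: every byte that is not a 10xx xxxx continuation byte
        # starts a character. This is safe only because the bit vouches for the data.
        return sum(1 for b in self.data if b & 0xC0 != 0x80)
```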

Then the user could be given various levels of control: when and how
the bit is set, whether or not to trust the bit setting, and, of
course, some way to set the bit.
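Those control levels might look something like the following Python sketch. SetPolicy decides when the bit is set: either on load, or only through an explicit mark_utf8() call. trust_bit decides whether length() believes the bit or re-validates. mark_utf8() is the manual setter. All of these names are hypothetical and do not correspond to any existing Perl pragma or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SetPolicy(Enum):
    ON_LOAD = "on_load"    # validate and set the bit when the string is created
    EXPLICIT = "explicit"  # the bit is set only by an explicit mark_utf8() call


@dataclass(frozen=True)
class Utf8Policy:
    set_policy: SetPolicy  # when and how the bit is set
    trust_bit: bool        # True: believe the bit; False: re-validate on use


def is_valid_utf8(data: bytes) -> bool:
    # Uses Python's strict decoder (current UTF-8 rules) as the validity test.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


class Str:
    """Hypothetical string value carrying a UTF-8 marker bit."""

    def __init__(self, data: bytes, policy: Utf8Policy) -> None:
        self.data = data
        self.policy = policy
        self.utf8 = policy.set_policy is SetPolicy.ON_LOAD and is_valid_utf8(data)

    def mark_utf8(self) -> None:
        # Manual override: the caller asserts the bytes are UTF-8; nothing is checked.
        self.utf8 = True

    def length(self) -> Optional[int]:
        # None stands in for Perl's undef.
        if not self.utf8:
            return None
        if not self.policy.trust_bit and not is_valid_utf8(self.data):
            return None
        # Count every byte that is not a 10xx xxxx continuation byte.
        return sum(1 for b in self.data if b & 0xC0 != 0x80)
```

With trust_bit=True, marking "\xFC\xDF" by hand makes length() return 2 without complaint. With trust_bit=False, the same call returns None. This is the speed-versus-safety trade-off discussed above.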

<chaim>
-- 
Chaim Frenkel                                        Nonlinear Knowledge, Inc.
chaimf(_at_)pobox(_dot_)com     <<< New Email Address                  
+1-718-236-0183
