perl-unicode

Re: In-Band Information Considered Harmful

1998-10-23 14:45:26
Chaim Frenkel writes:
IZ> I have a Tk widget with tags, and want to search for bold letter X
IZ> which follows non-bold one with Perl regexp.  How do you propose to do
IZ> it with non-sequential data?

Why does data representation have to affect how the syntax for the search
is specified?

Suppose that "<b>" and "</b>" stand for many-bit chars marked as having
width 0.  Then to look for "<b>foo</b> bar" literally you just write

    m%<b>foo</b> bar%i ;

to look for "foo bar" you do

    use utf8_with_width;
    m%foo bar%i ;

and both cases match "<b>foo</b> bar".  

In other words: if you *want* to match 0-width data in the string,
you just ask to match it.  If you want to ignore the 0-width data,
then the REx engine will do it for you.
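
For illustration only, here is a rough sketch of how the "ignore the 0-width
data" case could be emulated in plain Perl without the proposed pragma.  It
assumes the markup really is spelled as literal "<...>" tags in the string;
loose_pattern() is a made-up helper, not an existing function, and
utf8_with_width above remains a hypothetical pragma.  The helper lets any
run of tags appear between two consecutive characters of the search text.

```perl
use strict;
use warnings;

# Hypothetical helper: turn literal text into a pattern that tolerates
# "<...>" tags between any two consecutive characters.
sub loose_pattern {
    my ($literal) = @_;
    my $skip = '(?:<[^>]*>)*';    # zero or more tags, standing in for 0-width data
    return join $skip, map { quotemeta($_) } split //, $literal;
}

my $text = "<b>foo</b> bar";
my $re   = loose_pattern("foo bar");

# Literal search: the markup is written out in the pattern.
print "literal match\n" if $text =~ m%<b>foo</b> bar%i;

# Markup-ignoring search: the same "foo bar" text, tags skipped.
print "loose match\n"   if $text =~ /$re/i;
```

Both searches run against the same string; only the pattern says whether the
markup matters.  An engine that knew about widths would do the skipping
itself, so the second pattern could stay a plain "foo bar".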

In yet other words: you do not need to introduce a new syntax for
matching meta-data.

Ilya