perl-unicode

Re: In-Band Information Considered Harmful

1998-10-24 18:59:26
At 04:16 PM 10/24/98 -0400, somebody wrote:

++ Asking the question whether the pattern:
++ 
++     print if /Perl is terrific/i;
++ 
++ matches which of the following lines:
++ 
++     perl<!-- capitalize? --> is terrific
++     perl is<? AUDIO heavenly chorus ?> terrific
++     Perl<fnote isbn="1-56592-149-6" /> is terrific
++     perl is<emph>not so</emph> terrific
++     perl <quot>is terrific</quot>, commented Bray
++     <!ENTITY adjective "terrific">perl is &adjective;!
++ 
++ The last one requires parsing of XML before it can be made to match.
++ Other than that, though, simply ignoring all of the metadata is
++ sufficient to match correctly on lines 1, 2, 3 and 5, and to properly
++ not match on line 4.

Sorry for coming late to this party.  I invented that set of examples.
My point was that it is in principle *impossible*, in lots of cases,
to decide what is "correct" without additional information.  

It makes me nervous for someone to postulate that there is such a
thing as a "correct" match to any of cases 1-5.  For things like comments,
PIs, and entity refs, perl could decree that the policy is thus-and-so
and be consistent (though not necessarily correct for lots of
application needs).  But when you have tags in the way (cases 3-5), the
situation is hairy indeed.
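
To make the tag problem concrete, here is a rough sketch, not a
proposal for what perl should do.  It runs the quoted pattern over the
six sample lines under two made-up policies: one deletes the tags but
keeps the element content, the other deletes whole elements along with
their content.  The sub names (strip_tags_only, strip_whole_elements,
matching_cases) are hypothetical, invented for this illustration, and
the regexes are deliberately crude; they are no substitute for a real
XML parser and make no attempt at entity expansion.

```perl
use strict;
use warnings;

# The six sample lines from the quoted message, cases 1-6.
my @lines = (
    'perl<!-- capitalize? --> is terrific',
    'perl is<? AUDIO heavenly chorus ?> terrific',
    'Perl<fnote isbn="1-56592-149-6" /> is terrific',
    'perl is<emph>not so</emph> terrific',
    'perl <quot>is terrific</quot>, commented Bray',
    '<!ENTITY adjective "terrific">perl is &adjective;!',
);

# Hypothetical policy A: delete markup, keep element content.
sub strip_tags_only {
    my ($text) = @_;
    $text =~ s/<!--.*?-->//gs;    # comments
    $text =~ s/<\?.*?\?>//gs;     # processing instructions
    $text =~ s/<[^>]*>//g;        # tags and declarations
    return $text;
}

# Hypothetical policy B: delete whole elements, content included.
sub strip_whole_elements {
    my ($text) = @_;
    $text =~ s/<!--.*?-->//gs;               # comments
    $text =~ s/<\?.*?\?>//gs;                # processing instructions
    $text =~ s{<(\w+)[^>]*>.*?</\1>}{}gs;    # <x>...</x> with its content
    $text =~ s/<[^>]*>//g;                   # empty tags and declarations
    return $text;
}

# Return the case numbers whose stripped text matches the pattern.
sub matching_cases {
    my ($policy) = @_;
    return grep { $policy->($lines[$_ - 1]) =~ /Perl is terrific/i }
           1 .. scalar @lines;
}

my @keep = matching_cases(\&strip_tags_only);
my @drop = matching_cases(\&strip_whole_elements);
print "tags removed, content kept:    @keep\n";   # 1 2 3 5
print "elements removed with content: @drop\n";   # 1 2 3 4
```

Both policies are perfectly consistent, and they disagree about cases
4 and 5.  Which answer is right depends on what emph and quot mean to
the application, which is exactly the information the pattern matcher
does not have.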

Having said that, I'll shut up and lurk for a while until I get more
context. -Tim

  • Re: In-Band Information Considered Harmful, Tim Bray <=