perl-unicode

Re: In-Band Information Considered Harmful

1998-10-23 18:09:28
Adam Turoff wrote:
|| John Macdonald wrote:
|| > As I said, Tim Bray did make the case a lot more clearly at the
|| > conference than I just did.  I'm afraid I didn't write down his
|| > slide, though.  Did anyone write it down?  Was it on the CD?  Maybe
|| > we should just ask Tim.
|| 
|| http://www.textuality.com/px

Thanks.  The example I was referring to is in

    http://www.textuality.com/talk/px/show/s-60.html
    ...
    http://www.textuality.com/talk/px/show/s-65.html

It asks which of the following lines the pattern

    print if /Perl is terrific/i;

should match:

    perl<!-- capitalize? --> is terrific
    perl is<? AUDIO heavenly chorus ?> terrific
    Perl<fnote isbn="1-56592-149-6" /> is terrific
    perl is<emph>not so</emph> terrific
    perl <quot>is terrific</quot>, commented Bray
    <!ENTITY adjective "terrific">perl is &adjective;!

The last one requires parsing the XML before it can be made to match.
Other than that, though, simply ignoring all of the metadata is
sufficient to match correctly on lines 1, 2, 3, and 5 and to
correctly reject line 4.
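
To make "ignoring all of the metadata" concrete, here is a rough
sketch of my own, not anything from Tim's slides.  The helper name
strip_markup is made up, and the regexes are a crude stand-in for a
real XML parser: they drop comments, processing instructions, and
tags, and they do not expand entities.

```perl
use strict;
use warnings;

# Hypothetical helper: delete markup, keep character data.
# Not an XML parser; entities such as &adjective; are left alone.
sub strip_markup {
    my ($text) = @_;
    $text =~ s/<!--.*?-->//gs;   # comments
    $text =~ s/<\?.*?\?>//gs;    # processing instructions
    $text =~ s/<[^>]*>//g;       # remaining tags and declarations
    return $text;
}

my @lines = (
    'perl<!-- capitalize? --> is terrific',
    'perl is<? AUDIO heavenly chorus ?> terrific',
    'Perl<fnote isbn="1-56592-149-6" /> is terrific',
    'perl is<emph>not so</emph> terrific',
    'perl <quot>is terrific</quot>, commented Bray',
    '<!ENTITY adjective "terrific">perl is &adjective;!',
);

# Collect the 1-based numbers of the lines that match once stripped.
my @matched = grep { strip_markup($lines[$_ - 1]) =~ /Perl is terrific/i }
              1 .. scalar @lines;

print "matched lines: @matched\n";   # prints "matched lines: 1 2 3 5"
```

Line 4 collapses to "perl isnot so terrific" and line 6 keeps the
unexpanded "&adjective;", so neither one matches.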

So, it looks like my memory is faulty - I had thought that there was
at least one example that had text between a pair of tags <xx> and </xx>
where the meaning of xx was such that the text should *not* be
considered for matching purposes.  Unless the third line could be
written as:

    Perl<footnote> Programming Perl, Wall et al., isbn="1-56592-149-6"
    </footnote> is terrific

I have to retract my objection.  (But if there *can* be such
non-text, it is a problem.)
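
A small sketch of why such non-text would be a problem, again only an
illustration: the @skip list naming "footnote" as an element whose
content is out of line is purely my assumption, and the regex does not
cope with nested elements of the same name.

```perl
use strict;
use warnings;

my $line = 'Perl<footnote> Programming Perl, Wall et al., isbn="1-56592-149-6"'
         . "\n" . '</footnote> is terrific';

# Tags only: the footnote text stays in the character stream.
(my $tags_only = $line) =~ s/<[^>]*>//g;

# Hypothetical list of elements whose content should not be matched.
my @skip    = ('footnote');
my $skip_re = join '|', map { quotemeta } @skip;

# Drop those elements together with their content, then any other tags.
(my $content_dropped = $line) =~ s/<($skip_re)\b[^>]*>.*?<\/\1>//gs;
$content_dropped =~ s/<[^>]*>//g;

printf "tags only:       %s\n",
    $tags_only       =~ /Perl is terrific/i ? 'match' : 'no match';
printf "content dropped: %s\n",
    $content_dropped =~ /Perl is terrific/i ? 'match' : 'no match';
```

Stripping tags alone leaves the citation sitting between "Perl" and
"is terrific", so the pattern fails; it only matches if the matcher
knows which elements carry non-text, and that knowledge is not in the
tags themselves.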

-- 
objects:                                    | John Macdonald
    Think of them as data with an attitude. |   jmm(_at_)elegant(_dot_)com