perl-unicode

Re: In-Band Information Considered Harmful

1998-10-24 20:09:08
I think your objection stands, since no one has figured out how to
concisely describe the difference between
        Perl <footnote>...</footnote> is terrific
and
        perl <footnote>...</footnote> is <emph>not</emph> terrific

I wrote this a while ago:


I think Tim Bray mentioned that the XML working group was trying to come
up with a standard that would attach some sort of semantics to a DTD.
No matter what, I think this semantic information would have to be
known on a per-scalar basis, not kept in the regexp.  When matching
against, say, /perl is terrific/, I can think of at least the following
meanings for each tag: ignore the tag, ignore the tag and its contents,
or require the tag.  For instance, if the string is "perl is
<emph>not</emph> terrific" you would want the <emph> tag to be ignored,
but not its contents, so that the match fails.  If the string is
"perl<note>well, Perl, actually</note> is terrific" you would want the
<note> tag and its contents to be ignored, so that the match succeeds.
I can't think of a good example for requiring the tag to be explicitly
matched, but it seems necessary for orthogonality?!

Ignoring the partial document problem for a minute, I could envision
this working via magic on the scalar.  Say that the standard which
indicates the semantic meaning of each tag is a Document Semantics
Definition (DSD):

my $dsd = DSD->new('foo.dsd');  # hypothetical DSD class; foo.dsd says <emph> tags are ignored
my $xml = "perl is <emph>terrific</emph>";
$dsd->apply($xml);              # attaches the magic to $xml
$xml =~ /perl is terrific/;     # Matches!
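
To make the first two tag meanings concrete, here is a rough sketch that
needs no magic at all: it flattens a copy of the string according to
per-tag rules and matches against the copy.  This is a stand-in for the
idea, not the magic-on-the-scalar approach itself.  The %rules hash, the
rule names and flatten() are all made up for illustration, and the
regexps assume well-formed, non-nested tags without attributes.  The
"require tag" case is left out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a DSD: maps a tag name to how matching treats it.
#   'ignore_tag'      - drop the markup, keep the contents
#   'ignore_contents' - drop the markup and everything inside it
my %rules = (
    emph => 'ignore_tag',
    note => 'ignore_contents',
);

# Return a flattened copy of $xml suitable for plain regexp matching.
# Assumes well-formed, non-nested, attribute-free tags; anything more
# realistic would want an XML parser rather than regexps.
sub flatten {
    my ($xml, $rules) = @_;
    for my $tag (keys %$rules) {
        if ($rules->{$tag} eq 'ignore_contents') {
            # Remove the element together with its contents.
            $xml =~ s{<\Q$tag\E>.*?</\Q$tag\E>}{}gs;
        }
        else {
            # Remove only the opening and closing tags.
            $xml =~ s{</?\Q$tag\E>}{}g;
        }
    }
    return $xml;
}

print flatten('perl is <emph>not</emph> terrific', \%rules), "\n";
print flatten('perl<note>well, Perl, actually</note> is terrific', \%rules), "\n";
```

With these rules /perl is terrific/ matches the flattened second string
but not the first, which is the distinction described above.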

Ducking the rocks,

Benjamin Holzman