perl-unicode

Re: In-Band Information Considered Harmful

1998-10-24 13:02:42
You, Chip Salzenberg, wrote:
++ 
++ I disagree.  Before you try to make this assertion again, please
++ explain how Perl would properly handle the 'll' case with code-based
++ metadata.  Be sure to allow for the various kinds of metadata nesting
++ behavior: <b> doesn't nest, <li> nests, and <p> marks a spot instead
++ of a region.  And Perl's RE and other character-processing engines
++ need to know this to handle them properly in the 'll' case.

That's rather trivial. Any SGML-based document will have a DTD that
gives you this information. Any XML-based document is "well formed":
all elements with content must have a closing tag, and content-free
elements have a different form (<IMG/>).
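
To make the well-formedness point concrete, here is a minimal sketch
in Python using the standard library's xml.etree.ElementTree. The
helper name is_well_formed and the sample strings are made up for
illustration; nothing here is Perl's or anyone's actual mechanism.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document):
    """Return True if the string parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Elements with content are closed; the content-free element uses the <IMG/> form.
print(is_well_formed("<doc><p>text</p><IMG/></doc>"))

# SGML/HTML-style <IMG> with no closing tag is not well-formed XML.
print(is_well_formed("<doc><p>text</p><IMG></doc>"))
```

The first call should succeed and the second should fail, because the
parser needs no DTD to see that <IMG> was never closed.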

Whether or not <b> is allowed to nest is irrelevant when it comes to
a data instance. What is relevant is whether a substring is in a
nested element, but that is document-specific information.

In the given example, the document will tell you whether 'll' is in
a nested <b> or not. If it isn't, would it be relevant whether it's
allowed?
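
As a rough sketch of "the document will tell you": walk the instance
and keep a stack of open elements. The Python below uses the standard
xml.parsers.expat module. The function name enclosing_elements and the
sample document are assumptions for illustration, and the sketch only
finds a substring that sits inside a single run of text, not one that
straddles a tag boundary.

```python
import xml.parsers.expat

def enclosing_elements(document, needle):
    """Return the open-element stack around the first text run containing needle."""
    stack = []   # names of currently open elements, outermost first
    found = []   # holds one snapshot of the stack, taken at the first match

    def start(name, attrs):
        stack.append(name)

    def end(name):
        stack.pop()

    def text(data):
        if needle in data and not found:
            found.append(list(stack))

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True   # deliver each text run as one chunk
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.Parse(document, True)
    return found[0] if found else None

path = enclosing_elements("<p>he<b>ll</b>o</p>", "ll")
print(path)                  # ['p', 'b']
print(path.count("b") > 1)   # False: 'll' is in a <b>, but not a nested one
```

No knowledge of whether <b> may nest is needed to answer the question;
the stack for this particular instance already says whether it does.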

BTW, in HTML, <p> doesn't mark a spot. It's the opening tag of the P
element; an element with content.


Abigail