perl-unicode

Re: In-Band Information Considered Harmful

1998-10-23 15:21:03
Chip Salzenberg writes:
I think instead we'd need new metadata escapes in the RE language.
Let's call them \m{X} to require metadata tag X, and \M{X} to forbid
tag X.  e.g.:

    /\m{italic}\m{bold}Yes!/

Note that those codes impose conditions on the following text; they do not
represent embedded codes (a la Ilya or WordPerfect).  Thus any string that
would match the previous example would also match:

    /\m{italic}s/

(partial metadata specification is OK), but would NOT match:

    /\M{bold}s/

(I suppose we could get really brave and allow the {} to be a regex.  Ouch!)
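
To make the intended semantics concrete, here is a minimal sketch in today's Perl that emulates them outside the regex engine. Everything in it is an assumption for illustration: the per-character tag sets, the helper names tagged() and match_meta(), and the reading that a \m{X} or \M{X} condition applies to every character of the literal that follows it.

```perl
use strict;
use warnings;

# Hypothetical model: every character carries its own set of metadata tags.
# tagged("Yes!", qw(italic bold)) returns a list of [char, \%tags] pairs.
sub tagged {
    my ($text, @tags) = @_;
    my @out;
    for my $ch (split //, $text) {
        my %set = map { $_ => 1 } @tags;
        push @out, [ $ch, \%set ];
    }
    return @out;
}

# Emulates /\m{...}\M{...}literal/: look for the literal anywhere in the
# tagged text such that each matched character has all required tags and
# none of the forbidden ones.  Returns the start offset, or -1.
sub match_meta {
    my ($chars, $literal, $require, $forbid) = @_;
    my @want = split //, $literal;
    START: for my $start (0 .. @$chars - @want) {
        for my $i (0 .. $#want) {
            my ($ch, $tags) = @{ $chars->[$start + $i] };
            next START if $ch ne $want[$i];
            next START if grep { !$tags->{$_} } @$require;
            next START if grep {  $tags->{$_} } @$forbid;
        }
        return $start;
    }
    return -1;
}

# "Say " is plain; "Yes!" is both italic and bold.
my @text = (tagged("Say "), tagged("Yes!", qw(italic bold)));

print match_meta(\@text, "Yes!", [qw(italic bold)], []), "\n";  # 4
print match_meta(\@text, "s",    ["italic"],        []), "\n";  # 6
print match_meta(\@text, "s",    [],          ["bold"]), "\n";  # -1
```

The second call succeeds although it names only one of the two tags (partial specification), and the third fails because the only "s" in the text is bold.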

The only thing I don't see as obvious in this scheme is how to access
the additional information associated with a tag when matching.
/\m{a}text/ for anchored /text/ is fine, but once you've found it, how
do you access the anchor HREF -- perhaps because you're only looking
for HREFs to perl.org?  It's possible that we won't be able to express
all that in the RE engine per se, and that we'll have to escape via
(?{}) and use the Perl language primitives.
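
As a rough illustration of what "escaping to Perl" could look like, the sketch below keeps the tag data next to the text and checks the HREF with ordinary Perl after the text has matched. The run-based representation, the sub name anchored_matches(), and the sample URLs are all hypothetical; nothing here is a proposed interface.

```perl
use strict;
use warnings;

# Hypothetical representation: the text is split into runs, each with a tag
# hash whose values hold the tag's extra data (here, an anchor's HREF).
my @runs = (
    { text => "Read the ",      tags => {} },
    { text => "Perl home page", tags => { a => { href => "http://www.perl.org/" } } },
    { text => " or the ",       tags => {} },
    { text => "Perl mirror",    tags => { a => { href => "http://example.com/perl/" } } },
);

# Rough equivalent of /\m{a}Perl/ followed by a test on the HREF in plain Perl.
sub anchored_matches {
    my ($runs, $text_re, $href_re) = @_;
    my @hits;
    for my $run (@$runs) {
        my $anchor = $run->{tags}{a} or next;      # \m{a}: tag must be present
        next unless $run->{text} =~ $text_re;      # the ordinary text match
        next unless $anchor->{href} =~ $href_re;   # the "escape to Perl" step
        push @hits, $anchor->{href};
    }
    return @hits;
}

my @hits = anchored_matches(\@runs, qr/Perl/, qr/perl\.org/);
print "$_\n" for @hits;
```

Both anchored runs contain "Perl", but only the first has an HREF that points at perl.org, so only that one is reported.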

I think we are already missing this earlier, at the \X level (I forget
what the property escape is called, and assume it is \X).  One should be
able to encode

        (?!\X{vowel})\X{upper}

as

        \X{^vowel,upper}
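
The equivalence can be sketched by translating the combined form into lookaheads. This is only an illustration under stated assumptions: in released Perl the property escape is \p{...}, "vowel" is not a standard property (it is approximated here by an ASCII character class), and the names %prop and compile_spec() are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical property table.  "upper" maps to the real \p{Lu}; "vowel"
# is an ASCII-only stand-in, since there is no such standard property.
my %prop = (
    vowel => qr/[AEIOUaeiou]/,
    upper => qr/\p{Lu}/,
);

# Turn a spec such as "^vowel,upper" into a regex matching one character
# that satisfies every term: "^name" becomes a negative lookahead, "name"
# a positive one, and the final "." consumes the character.
sub compile_spec {
    my ($spec) = @_;
    my $re = '';
    for my $term (split /,/, $spec) {
        my $negated = $term =~ s/^\^//;
        my $class = $prop{$term} or die "unknown property '$term'";
        $re .= $negated ? "(?!$class)" : "(?=$class)";
    }
    return qr/$re./s;
}

my $re = compile_spec('^vowel,upper');
print "matched '$1'\n" if "axBz" =~ /($re)/;   # matched 'B'
```

Here the comma is read as "and", which is what the hand-written lookahead version expresses.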

Probably | should be allowed too.  Then we can have parameterized
attributes:

        \X{^vowel,upper,collating=>O}

This is very close to

        \M{a,http=~/perl.org/}
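
To show that the two notations share one shape, here is a small sketch of a parser for the text between the braces. It is an assumption-laden illustration, not a proposed grammar: the sub name parse_attrs(), the field names, and the restriction that values and patterns contain no commas are all invented for the example.

```perl
use strict;
use warnings;

# Parse a hypothetical attribute list such as "^vowel,upper,collating=>O"
# or "a,http=~/perl.org/" into a list of term hashes.
# Assumption: values and patterns contain no commas.
sub parse_attrs {
    my ($spec) = @_;
    my @terms;
    for my $item (split /,/, $spec) {
        my %term = (negated => 0);
        $term{negated} = 1 if $item =~ s/^\^//;
        if ($item =~ m{\A(\w+)=~/(.*)/\z}) {
            @term{qw(name op value)} = ($1, 'match', $2);    # name=~/pattern/
        }
        elsif ($item =~ /\A(\w+)=>(.*)\z/) {
            @term{qw(name op value)} = ($1, 'equals', $2);   # name=>value
        }
        elsif ($item =~ /\A(\w+)\z/) {
            @term{qw(name op)} = ($1, 'present');            # bare name
        }
        else {
            die "cannot parse attribute term '$item'";
        }
        push @terms, \%term;
    }
    return @terms;
}

for my $spec ('^vowel,upper,collating=>O', 'a,http=~/perl.org/') {
    for my $t (parse_attrs($spec)) {
        printf "%s%s %s %s\n", ($t->{negated} ? 'not ' : ''),
            $t->{name}, $t->{op}, $t->{value} // '';
    }
}
```

Both specs reduce to the same kind of term list (a name, an optional negation, an optional test on the value), which is the sense in which the two forms are close.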

Ilya