On Tue, 2007-04-10 at 19:12 +0000, Aaron Stone wrote:
Something like this:
encoded-character = "${" encoded-char-scheme ":" encoded-char-seq "}"
encoded-char-scheme = "hex" / "unicode"
encoded-char-seq = *(LWSP WSP 1*HEXDIG) LWSP
If we allow ${hex:100} in the grammar, we need to say something in the
text about the valid range. I would prefer to stick to separate
productions for encoded-arb-octets and encoded-unicode-char to keep the
text simple and to minimise the change to the text.
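
To illustrate the range problem, here is a minimal Python sketch. The
1*2HEXDIG production is only an assumption about what a separate
encoded-arb-octets rule might use, and the names GENERIC_HEX, HEX_PAIR,
accepted_by_grammar and fits_in_octet are hypothetical:

```python
import re

# 1*HEXDIG, as in the generic encoded-char-seq production quoted above.
GENERIC_HEX = re.compile(r"[0-9A-Fa-f]+")

# 1*2HEXDIG: an assumed (hypothetical) production for a dedicated
# octet rule; it is not taken from the text under discussion.
HEX_PAIR = re.compile(r"[0-9A-Fa-f]{1,2}")


def accepted_by_grammar(pattern, token):
    """Return True if the whole token matches the given production."""
    return pattern.fullmatch(token) is not None


def fits_in_octet(token):
    """Return True if the hex token denotes a value in 0x00..0xFF."""
    return int(token, 16) <= 0xFF
```

With the generic production, "100" is grammatical but does not fit in an
octet, so the prose has to state the range. With a dedicated production
the grammar itself rules it out.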
Note that LWSP is optional by definition,
Ouch, good catch!
so we have to include SP or WSP
to force some kind of separator between 1*HEXDIG's. Note that this is not
valid according to the syntax above,
${unicode:
123
ABC
}
...because 123 and ABC do not have WSP between them. Use WSP / CR / LF? Is
there some variant of LWSP that mandates at least one character of
something be present?
LWSP requires WSP after CRLF, too, so it's simply not what we want. We
need to add another basic terminal, perhaps:
blank = WSP / CRLF
I suggest we stick to the poll question from Alexey, but with "1*blank"
replacing LWSP in his suggested new text.
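
As a rough check of the difference, here is a Python sketch that
transcribes both variants into regular expressions. Alexey's text is not
reproduced in this message, so the shape used for the 1*blank variant
(*blank 1*HEXDIG *(1*blank 1*HEXDIG) *blank) is an assumption, and the
names LWSP_SEQ, BLANK_SEQ and accepts are hypothetical:

```python
import re

WSP = r"[ \t]"                      # WSP = SP / HTAB
HEXDIG = r"[0-9A-Fa-f]"

# LWSP = *(WSP / CRLF WSP), as in the ABNF core rules.
LWSP = r"(?:[ \t]|\r\n[ \t])*"

# blank = WSP / CRLF, the terminal suggested above.
BLANK = r"(?:[ \t]|\r\n)"

# Quoted proposal: encoded-char-seq = *(LWSP WSP 1*HEXDIG) LWSP
LWSP_SEQ = re.compile(
    r"\$\{unicode:(?:" + LWSP + WSP + HEXDIG + r"+)*" + LWSP + r"\}"
)

# Assumed shape with 1*blank as the mandatory separator:
#   *blank 1*HEXDIG *(1*blank 1*HEXDIG) *blank
BLANK_SEQ = re.compile(
    r"\$\{unicode:" + BLANK + r"*" + HEXDIG + r"+"
    + r"(?:" + BLANK + r"+" + HEXDIG + r"+)*" + BLANK + r"*\}"
)


def accepts(pattern, text):
    """Return True if the whole text matches the given grammar."""
    return pattern.fullmatch(text) is not None


# The multi-line example from above, with CRLF line endings.
MULTILINE = "${unicode:\r\n123\r\nABC\r\n}"
```

Here accepts(LWSP_SEQ, MULTILINE) is false, since no WSP follows the
CRLFs, while accepts(BLANK_SEQ, MULTILINE) is true. Both accept
"${unicode: 123 ABC}".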
I think there are three options for values that are out of range:
1. Throw an error and reject the script.
2. Ignore the offending value.
3. Insert some placeholder like ' ' or '?'.
I don't think we need to revisit this question.
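
For reference only, the three options listed above could be sketched in
Python as follows. The sketch assumes "out of range" means a value that
is not a Unicode scalar value (above 10FFFF or in the surrogate range),
and reads "ignore" as "drop the value"; both are my interpretation, and
the names OutOfRange and decode_unicode are hypothetical:

```python
from enum import Enum


class OutOfRange(Enum):
    REJECT = 1       # option 1: throw an error and reject the script
    IGNORE = 2       # option 2: drop the offending value (assumed reading)
    PLACEHOLDER = 3  # option 3: insert a placeholder such as '?'


def is_scalar_value(cp):
    """Assumed validity test: a Unicode scalar value, no surrogates."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def decode_unicode(hex_values, policy, placeholder="?"):
    """Decode a list of hex strings under the chosen out-of-range policy."""
    out = []
    for token in hex_values:
        cp = int(token, 16)
        if is_scalar_value(cp):
            out.append(chr(cp))
        elif policy is OutOfRange.REJECT:
            raise ValueError("value out of range: " + token)
        elif policy is OutOfRange.PLACEHOLDER:
            out.append(placeholder)
        # OutOfRange.IGNORE: nothing is appended for this value
    return "".join(out)
```

For the input ["48", "110000", "69"], option 2 yields "Hi", option 3
yields "H?i", and option 1 raises an error.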
I concur that comments should not be allowed.
--
Kjetil T.