Re: Implementing encoded-character

2007-04-04 04:51:18

On Wed, 2007-04-04 at 11:23 +0200, Michael Haardt wrote:
yes.  whitespace is only allowed between hex-pairs.  btw, how do you
feel about allowing CRLF as well as SPC and TAB between hex-pairs?
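To make the whitespace question concrete, here's a minimal Python sketch of how a parser might split the body of `${hex:...}` into pairs, with CRLF as an opt-in blank. It assumes a hex-pair is one or two hex digits and that pairs must be separated by at least one blank. The function name `parse_hex_pairs` and the `allow_crlf` flag are hypothetical, not from the draft or any implementation.

```python
import re
from typing import Optional

# Hypothetical helper; assumes hex-pair = 1*2HEXDIG and blank = SP / HTAB,
# plus CRLF when allow_crlf is set.
HEXDIGITS = set("0123456789abcdefABCDEF")


def parse_hex_pairs(body: str, allow_crlf: bool = False) -> Optional[bytes]:
    """Decode the text between "${hex:" and "}" into octets.

    Returns None when the body does not match the assumed grammar, in
    which case the whole ${...} would be kept as literal text.
    """
    if allow_crlf:
        # Treat each CRLF like a space; a bare CR or LF stays invalid.
        body = body.replace("\r\n", " ")
    tokens = re.split(r"[ \t]+", body.strip(" \t"))
    for tok in tokens:
        # Empty body, more than two digits, or any non-hex character fails.
        if not 1 <= len(tok) <= 2 or not set(tok) <= HEXDIGITS:
            return None
    return bytes(int(tok, 16) for tok in tokens)
```

With the flag off, `parse_hex_pairs("40\r\n41")` returns `None` (so the text stays literal); with `allow_crlf=True` it decodes to `b"@A"`.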

Is CRLF allowed inside other ${} expressions (variables)?

variables doesn't allow any whitespace at all.

I don't understand this statement.

The grammar matches words inside the character sequence that makes up a
string.  No matter how much of a word is matched, if it is not complete,
it will be taken as the literal character sequence.  That means you
need infinite look-ahead.

Before looking at it, I expected that if ${hex: is found, it would be
an error if it were not followed by arguments and a closing brace.

well, you don't need to backtrack much:

  ${unicode:cafe ab ab ab ab ab ab
     ab ab add ${hex:40 41}}

you just go along, and as soon as you find a syntax "error", you bail
and copy what you've buffered so far verbatim (in this case,
"${unicode: ... add "), then restart the state machine.  worst case, the
buffering is the size of the script plus storage for the decoded Unicode
characters while parsing the script.
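The bail-and-copy approach can be sketched as a single left-to-right scan. This is a hypothetical Python illustration, not taken from any implementation. It assumes SPC/TAB are the only blanks, hex-pairs are one or two digits, unicode-hex is 1*6HEXDIG, and code points above 10FFFF or in the surrogate range are script errors. `expand`, `_decode` and `EncodingError` are made-up names.

```python
from typing import List, Optional

HEX = set("0123456789abcdefABCDEF")
BLANKS = set(" \t")  # assumption: CRLF not allowed as a blank here
PREFIXES = (("${hex:", "hex"), ("${unicode:", "unicode"))


class EncodingError(ValueError):
    """The text matched the grammar but encodes an invalid value."""


def _decode(kind: str, tokens: List[str]) -> Optional[str]:
    """Decode matched tokens, or return None if the grammar doesn't match."""
    if not tokens:
        return None
    if kind == "hex":
        if any(len(t) > 2 for t in tokens):
            return None
        # Assumption: octets are read as UTF-8 (bad UTF-8 raises here).
        return bytes(int(t, 16) for t in tokens).decode("utf-8")
    # kind == "unicode", assuming unicode-hex = 1*6HEXDIG.
    if any(len(t) > 6 for t in tokens):
        return None
    chars = []
    for t in tokens:
        cp = int(t, 16)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise EncodingError(f"invalid code point U+{cp:X}")
        chars.append(chr(cp))
    return "".join(chars)


def expand(s: str) -> str:
    out = []
    i = 0
    while i < len(s):
        for prefix, kind in PREFIXES:
            if s.startswith(prefix, i):
                break
        else:
            out.append(s[i])  # ordinary character, copied as-is
            i += 1
            continue
        # Buffer hex digits and blanks until anything else shows up.
        start = i + len(prefix)
        j = start
        while j < len(s) and (s[j] in HEX or s[j] in BLANKS):
            j += 1
        if j < len(s) and s[j] == "}":
            decoded = _decode(kind, s[start:j].split())
            if decoded is not None:
                out.append(decoded)
                i = j + 1
                continue
        # Syntax "error": copy what was buffered verbatim and restart the
        # scan at the character that stopped us.
        out.append(s[i:j])
        i = j
    return "".join(out)
```

On the example above (put on one line), `expand("${unicode:cafe ab add ${hex:40 41}}")` gives `"${unicode:cafe ab add @A}"`: the unicode part bails at the `$`, is copied verbatim, and the scan restarts on `${hex:40 41}`. The buffer `s[i:j]` is at most the rest of the script, matching the worst case described.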

So we have:

"${unicode:200000}" -> error
"${unicode:2000000}" -> "${unicode:2000000}"

I don't particularly like that, because most likely the second was
never meant that way.  Is there any way to change that at this point?

you want to change unicode-hex to 1*HEXDIG instead?  the wording should
already handle it, so it's just the ABNF which needs a tweak.  that's
fine with me.  I think it's Philip's call.
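To show what the ABNF tweak changes, here's a hypothetical classifier for the body of `${unicode:...}` that can model either the 1*6HEXDIG rule or the proposed 1*HEXDIG rule. It assumes SPC/TAB blanks and treats values above 10FFFF or in D800-DFFF as errors; `classify` and `max_digits` are invented for this sketch.

```python
from typing import Optional

HEX = set("0123456789abcdefABCDEF")


def classify(body: str, max_digits: Optional[int]) -> str:
    """Classify the text between "${unicode:" and "}".

    max_digits=6 models unicode-hex = 1*6HEXDIG; None models 1*HEXDIG.
    Returns "literal" (grammar doesn't match, text is kept as-is),
    "error" (matches but is not a valid code point) or "ok".
    """
    if any(c not in HEX and c not in " \t" for c in body):
        return "literal"
    tokens = body.split()
    if not tokens:
        return "literal"
    # Under the digit limit, an over-long token simply fails to match.
    if max_digits is not None and any(len(t) > max_digits for t in tokens):
        return "literal"
    for t in tokens:
        cp = int(t, 16)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return "error"
    return "ok"
```

With `max_digits=None`, "2000000" becomes an error instead of silently staying literal, and zero-padded values such as "00000000e9" are accepted rather than left as text.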

Kjetil T.