Re: Implementing encoded-character

2007-04-04 03:39:12

"${hex:40" -> "${hex:40"
"${hex: 40 }" -> "${hex: 40 }"

Yes, whitespace is only allowed between hex-pairs.  By the way, how do
you feel about allowing CRLF as well as SPC and TAB between hex-pairs?

Is CRLF allowed inside other ${} expressions (variables)?

"${unicode:40}" -> "${unicode:40}"

no, this is "@".

Good thing I asked.  I just reread RFC 2234 and found out that I
have to read 1*6HEXDIG as 1*6(HEXDIG), not as 1*(6HEXDIG).
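
To make the two readings concrete, here is a minimal sketch in Python that expresses each as a regular expression. The names ONE_TO_SIX, GROUPS_OF_SIX and matches are made up for this illustration and do not come from any draft.

```python
import re

# 1*6HEXDIG read as 1*6(HEXDIG): between one and six hex digits.
ONE_TO_SIX = re.compile(r"[0-9A-Fa-f]{1,6}")

# The misreading 1*(6HEXDIG): one or more groups of exactly six hex digits.
GROUPS_OF_SIX = re.compile(r"(?:[0-9A-Fa-f]{6})+")

def matches(pattern, text):
    """Return True if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None
```

Under the first reading "40" is a valid unicode-hex and "0020000" (seven digits) is not; under the misreading "40" would be rejected and a twelve-digit value would be accepted.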

There is no word of the encoded-character grammar inside the string,
so everything is taken literally.

I don't understand this statement.

The grammar matches words inside the character sequence that makes up a
string.  No matter how much of a word is matched, if it is not complete,
it will be taken as the literal character sequence.  That means you
need infinite look-ahead.
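
A rough sketch of that behaviour in Python, assuming the rule as stated in this thread (hex-pairs separated only by SPC or TAB, no leading or trailing whitespace). HEX_WORD and expand_hex are hypothetical names, and decoding the octets as Latin-1 is only a convenience for showing the result as text.

```python
import re

# Assumed rule from this thread: hex-pairs separated by SPC or TAB,
# with no whitespace directly after "${hex:" or before "}".
HEX_WORD = re.compile(
    r"\$\{hex:([0-9A-Fa-f]{2}(?:[ \t]+[0-9A-Fa-f]{2})*)\}",
    re.IGNORECASE,
)

def expand_hex(text):
    """Replace each complete ${hex:...} word; leave everything else as is."""
    def repl(match):
        octets = bytes(int(pair, 16) for pair in match.group(1).split())
        # Latin-1 maps each octet to one character (display only).
        return octets.decode("latin-1")
    return HEX_WORD.sub(repl, text)
```

With this sketch, "${hex:40" and "${hex: 40 }" come back unchanged, while "${hex:40${hex:40}}" becomes "${hex:40@}": the outer word never completes, so only the inner one is replaced.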

Before looking at it, I expected that if ${hex: is found, it would be
an error if it were not followed by arguments and a closing brace.

"${hex:40${hex:40}}" -> "${hex:40$}"

no, "${hex:40@}"

Oops, I meant to write @.  But you agree with my interpretation of how
things are processed.

"${unicode:020000}" -> error

Unicode range violation.

no, U+20000 is inside the Unicode range.  ${unicode:0020000} fails due
to not matching unicode-hex (too many digits), ${unicode:200000} fails
due to being outside the Unicode range.
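
The three outcomes can be sketched in Python as follows, assuming a single unicode-hex of one to six digits per word, as in the examples here. UNICODE_WORD, UnicodeRangeError and expand_unicode are hypothetical names; surrogate code points are not checked because the thread does not discuss them.

```python
import re

# Assumed grammar for this sketch: "${unicode:" 1*6HEXDIG "}".
UNICODE_WORD = re.compile(r"\$\{unicode:([0-9A-Fa-f]{1,6})\}", re.IGNORECASE)

class UnicodeRangeError(ValueError):
    """The word matched the grammar but names a value above U+10FFFF."""

def expand_unicode(text):
    """Decode matching words; text that does not match stays literal."""
    def repl(match):
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            raise UnicodeRangeError(match.group(0))
        return chr(code_point)
    return UNICODE_WORD.sub(repl, text)
```

Here "${unicode:020000}" decodes to U+20000, "${unicode:200000}" raises the range error, and a seven-digit value such as "${unicode:0020000}" does not match the pattern at all and so is returned as literal text.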

My mistake, again.  So we have:

"${unicode:200000}" -> error
"${unicode:2000000}" -> "${unicode:2000000}"

I don't particularly like that, because most likely the second was
never meant that way.  Is there any way to change that at this point?