
Re: Implementing encoded-character

2007-04-05 11:02:18

yes.  whitespace is only allowed between hex-pairs.  btw, how do you
feel about allowing CRLF as well as SPC and TAB between hex-pairs?
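A minimal sketch of the whitespace rule as described in this message, assuming that whitespace between hex pairs is optional, is never allowed before the first or after the last pair, and that CRLF is accepted only when the proposed change is switched on (`allow_crlf`). The function name `decode_hex_pairs` is hypothetical, and this is not the draft's ABNF, just an illustration of it.

```python
import re

# One hex pair: exactly two hex digits.
HEX_PAIR = r"[0-9A-Fa-f]{2}"


def decode_hex_pairs(body: str, allow_crlf: bool = False):
    """Decode the body of a hypothetical "${hex:...}" expression.

    Whitespace (SPC/TAB, plus CRLF if allow_crlf) may appear only between
    hex pairs. Returns None if the body does not match, in which case a
    caller would leave the whole expression as literal text.
    """
    ws = r"(?:[ \t]|\r\n)*" if allow_crlf else r"[ \t]*"
    pattern = rf"{HEX_PAIR}(?:{ws}{HEX_PAIR})*"
    if re.fullmatch(pattern, body) is None:
        return None
    # Drop the separators, then turn the remaining digits into octets.
    digits = re.sub(r"[ \t\r\n]", "", body)
    return bytes.fromhex(digits)
```

With `allow_crlf=True`, a run of a few hundred bytes could be folded across several lines (`"41 42\r\n43 44"`), which is the formatting argument made later in the thread; with the default, that same body fails to match.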

Is CRLF allowed inside other ${} expressions (variables)?

the variables extension doesn't allow any whitespace at all.

Hmm, right, variables contain no arguments and we don't have functions
yet.  Thinking about string expressions, I certainly would like to
have CRLF as white space, but I also would like embedded comments in
that case.

I would like to allow CRLF but comments IMO go way too far.

The argument for allowing CRLF is that it is really needed to allow reasonable
formatting of long runs of hex-encoded stuff. If, say, you want to write a few
hundred bytes worth of material, with the current proposal you either have to
put it all on one line, hope you can find a CRLF in there that you can leave
unencoded and thus create a line break (unlikely), or use a series of set
actions to build up the string piecemeal (ugly and requires variables support).

The same cannot be said of comments - you are free to put one in front of the
string or at the end and end up with something that's readable. Of course I
suppose you could argue that there are cases where it is clearer to have the
comment in the middle, but I have to say I find that to be a fairly contrived
argument.

Just looking at encoded-character, I see no need for CRLF
and even have an odd feeling about it, but considering it as a syntactic
prototype for string expressions, both CRLF and comments sound useful.

Before looking at it, I expected that if ${hex: is found, it would be
an error if it were not followed by arguments and a closing brace.

well, you don't need to backtrack much:

It's no problem really, just confusing.  If someone starts to write
${hex:, most likely he meant to encode data.  Only CS people think stuff
like "it's not a word of the grammar, thus of course it's a literal,
as specified". ;-)

"${unicode:200000}" -> error
"${unicode:2000000}" -> "${unicode:2000000}"

I don't particularly like that, because most likely the second was
never meant that way.  Is there any way to change that at this point?

you want to change unicode-hex to 1*HEXDIG instead?  the wording should
already handle it, so it's just the ABNF which needs a tweak.  that's
fine with me.  I think it's Philip's call.

Yes, that would be more logical.  I consider ${hex: and ${unicode: as
functions of constant arguments.  No matter which argument is passed
to them: Syntax errors (like a missing brace) should cause an error,
and semantic errors like range overflows should cause an error, too.
It's bizarre to see 0x200000 being an overflow, but 0x2000000 causing
everything to be taken literally.
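The discrepancy can be reproduced with a small sketch, assuming a simplified syntax with a single hex value per `${unicode:...}` (the real proposal allows several, separated by whitespace) and an error for values outside 0..D7FF and E000..10FFFF. `max_digits=6` models the draft's 1*6HEXDIG and `max_digits=None` the proposed 1*HEXDIG; `expand_unicode` and `EncodingError` are hypothetical names.

```python
import re


class EncodingError(ValueError):
    """Hypothetical error raised for a matched but invalid code point."""


def expand_unicode(text: str, max_digits=6) -> str:
    # 1*6HEXDIG when max_digits=6, 1*HEXDIG when max_digits=None.
    quant = "{1,%d}" % max_digits if max_digits else "+"
    pattern = re.compile(r"\$\{unicode:([0-9A-Fa-f]" + quant + r")\}")

    def replace(m):
        value = int(m.group(1), 16)
        # Only Unicode scalar values are accepted: no surrogates, nothing above U+10FFFF.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            raise EncodingError(f"invalid code point: {m.group(0)}")
        return chr(value)

    # Text the pattern does not match is left untouched (taken literally).
    return pattern.sub(replace, text)
```

Under `max_digits=6`, `"${unicode:200000}"` matches and raises, while `"${unicode:2000000}"` never matches and comes back unchanged; under `max_digits=None`, both raise, which is the behavior asked for above.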

I'm not wild about changing from 1*6HEXDIG to 1*HEXDIG but I can live with it
if need be.
That said, I completely disagree with your assessment of when it is appropriate
to generate an error. These "overlay" syntaxes always have a tension between
syntactic generality and potential collision with regular strings people might
want to use - the more general you make your syntax the more likely you are to
collide with some legitimate regular string. So, while I have no major issue
with allowing 1*HEXDIG, only to call overly long strings an error, I have a
major problem with allowing something like "${hex:" or even "${hex" or "${" to
match and then generate an error.
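The two error policies in this exchange can be contrasted in a sketch limited to `${hex:...}`, assuming optional SPC/TAB between hex pairs and decoding the octets as Latin-1 only so the demo output is printable. With `strict=False`, only a complete, well-formed expression is interpreted and any incomplete prefix stays literal (the position argued here); with `strict=True`, any `${hex:` that is not followed by valid pairs and a closing brace is an error (the position argued earlier in the thread). `expand_hex` and the `strict` flag are hypothetical.

```python
import re

# A complete expression: "${hex:", one or more hex pairs, then "}".
HEX_EXPR = re.compile(r"\$\{hex:([0-9A-Fa-f]{2}(?:[ \t]*[0-9A-Fa-f]{2})*)\}")
PREFIX = "${hex:"


def expand_hex(text: str, strict: bool = False) -> str:
    out = []
    pos = 0
    while True:
        start = text.find(PREFIX, pos)
        if start < 0:
            out.append(text[pos:])
            return "".join(out)
        m = HEX_EXPR.match(text, start)
        if m:
            # Well-formed: replace the expression with the decoded octets.
            digits = re.sub(r"[ \t]", "", m.group(1))
            out.append(text[pos:start])
            out.append(bytes.fromhex(digits).decode("latin-1"))
            pos = m.end()
        elif strict:
            raise ValueError(f"malformed encoded-character at offset {start}")
        else:
            # Lenient: keep the "${hex:" prefix as literal text and continue.
            out.append(text[pos:start + len(PREFIX)])
            pos = start + len(PREFIX)
```

For example, `expand_hex("cost ${hex:4")` returns the input unchanged, while `expand_hex("cost ${hex:4", strict=True)` raises. Note that `"${hex"` and `"${"` never reach either branch, since neither contains the full prefix.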