Re: Implementing encoded-character

On Wed, 2007-04-04 at 14:25 +0200, Michael Haardt wrote:

yes.  whitespace is only allowed between hex-pairs.  btw, how do you
feel about allowing CRLF as well as SPC and TAB between hex-pairs?

Is CRLF allowed inside other ${} expressions (variables)?

variables don't allow any whitespace at all.


Hmm, right, variables contain no arguments and we don't have functions
yet.  Thinking about string expressions, I certainly would like to
have CRLF as white space, but I also would like embedded comments in
that case.  Just looking at encoded-character, I see no need for CRLF
and even have an odd feeling about it, but considering it as a syntactic
prototype for string expressions, both CRLF and comments sound useful.

I kind of like the idea of things that look like variables but are
functions operating on the right side of the colon.

We had a bit of discussion in Prague about list expansions that access
external data sources. This would certainly be one way to handle it,
though we'd have to be careful about strict vs. lazy evaluation. Anyhow,
that should probably be the subject of a separate thread.

Before looking at it, I expected that if ${hex: is found, it would be
an error if it were not followed by arguments and a closing brace.


well, you don't need to backtrack much:


It's no problem really, just confusing.  If someone starts to write
${hex:, most likely he meant to encode data.  Only CS people think stuff
like "it's not a word of the grammar, thus of course being a literal
as specified". ;-)

I agree, it'd be confusing for that to happen.

[snip]

"${unicode:200000}" -> error
"${unicode:2000000}" -> "${unicode:2000000}"

Ugh, if it looks like encoded-char and walks like encoded-char...

My test implementation left-shifts the current value of the encoded
character, then adds the next hex digit. When it hits whitespace, it
checks if the value is within appropriate bounds; if so, stores the
character then loops, if not, stores '?' then loops. Would we really
rather be very strict about this? I'm in favor of some flexibility.
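
For readers following along, the lenient loop described above might look
roughly like the sketch below. It is not the actual test implementation:
the function names are hypothetical, and "appropriate bounds" is assumed
here to mean a code point no larger than 10FFFF that is not a surrogate.

```python
HEX_DIGITS = "0123456789abcdef"


def in_bounds(value: int) -> bool:
    # Assumption: valid means <= U+10FFFF and outside the surrogate range.
    return value <= 0x10FFFF and not (0xD800 <= value <= 0xDFFF)


def decode_lenient(body: str) -> str:
    """Decode the text between '${unicode:' and '}' leniently.

    Raises ValueError on any character that is neither a hex digit
    nor SPC/TAB.
    """
    out = []
    value = 0
    have_digits = False
    for ch in body + " ":  # the trailing blank flushes the last value
        if ch in " \t":
            if have_digits:
                # Bounds are only checked once whitespace is reached.
                out.append(chr(value) if in_bounds(value) else "?")
            value = 0
            have_digits = False
        else:
            # Shift the accumulated value left by one hex digit, then add.
            value = (value << 4) + HEX_DIGITS.index(ch.lower())
            have_digits = True
    return "".join(out)


print(decode_lenient("48 65 6C"))  # in-range values are decoded
print(decode_lenient("200000"))    # out-of-range value is stored as '?'
```

Note that an out-of-range value is silently turned into '?' rather than
reported, which is exactly the flexibility in question.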


You need to strictly implement the grammar in the specification, whatever
that ends up being. Any flexibility will allow someone to write one of
these things that works in your implementation but silently fails and causes
weird results elsewhere.

Past experience with RFC 2047 encoded-words has shown that allowing leeway in
these situations is a curse, not a blessing.
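
As an illustration only, a strict check could be sketched as below. The
grammar is an assumption, since the specification text was not settled in
this thread: items of one to six hex digits, separated by SPC or TAB, with
optional blanks at either end. The name decode_strict is hypothetical. The
two outcomes mirror the examples quoted earlier in this message: a body
that matches the grammar but is out of range is an error, while a body
that does not match the grammar is not an encoded-character at all.

```python
import re

# Assumed grammar, not taken from any specification text.
_UNICODE_BODY = re.compile(
    r"[ \t]*[0-9A-Fa-f]{1,6}(?:[ \t]+[0-9A-Fa-f]{1,6})*[ \t]*"
)


def decode_strict(body: str):
    """Decode the text between '${unicode:' and '}' strictly.

    Returns the decoded string, or None if the body does not match the
    assumed grammar (the caller would then keep the text as a literal).
    Raises ValueError if the body matches but a value is out of range.
    """
    if _UNICODE_BODY.fullmatch(body) is None:
        return None
    chars = []
    for item in body.split():
        value = int(item, 16)
        # Assumption: valid means <= U+10FFFF and not a surrogate.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            raise ValueError("code point out of range: " + item)
        chars.append(chr(value))
    return "".join(chars)


print(decode_strict("48 65 6C"))  # matches the grammar and is in range
print(decode_strict("2000000"))   # seven digits: no match, so None
```

Nothing is guessed or substituted here, so a script either decodes the
same way everywhere or is rejected.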

                                        ned