
Re: Implementing encoded-character

2007-04-09 22:16:27

On Thu, 2007-04-05 at 21:10 +0000, Aaron Stone wrote:
On Thu, Apr 5, 2007, Ned Freed <ned(_dot_)freed(_at_)mrochek(_dot_)com> said:

"${unicode:200000}" -> error
"${unicode:2000000}" -> "${unicode:2000000}"

Ugh, if it looks like encoded-char and walks like encoded-char...

My test implementation left-shifts the current value of the encoded
character, then adds the next hex digit. When it hits whitespace, it
checks if the value is within appropriate bounds; if so, stores the
character then loops, if not, stores '?' then loops. Would we really
rather be very strict about this? I'm in favor of some flexibility.
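For readers who want to see the lenient approach spelled out, here is a minimal sketch in Python. It is not the test implementation described above. The function names are hypothetical, and "appropriate bounds" is assumed to mean a Unicode scalar value (at most 10FFFF, surrogates excluded), which the message does not state.

```python
HEX_DIGITS = "0123456789abcdef"

def in_bounds(value: int) -> bool:
    # Assumed bounds: a Unicode scalar value (no surrogates, at most U+10FFFF).
    return value <= 0x10FFFF and not (0xD800 <= value <= 0xDFFF)

def decode_lenient(body: str) -> str:
    """Decode the text between "${unicode:" and "}" leniently.

    Each hex digit shifts the accumulated value left by four bits and is
    added in; whitespace ends the current value, which is stored as a
    character if it is in bounds and as '?' otherwise.
    """
    out = []
    value = 0
    have_digits = False
    for ch in body + " ":  # the trailing space flushes the last value
        if ch.isspace():
            if have_digits:
                out.append(chr(value) if in_bounds(value) else "?")
            value, have_digits = 0, False
            continue
        digit = HEX_DIGITS.find(ch.lower())
        if digit < 0:
            raise ValueError(f"not a hex digit: {ch!r}")
        value = (value << 4) + digit
        have_digits = True
    return "".join(out)

print(decode_lenient("30 31 32"))  # in-bounds values become characters
print(decode_lenient("200000"))    # out-of-bounds value becomes '?'
```

A strict implementation would reject the second input outright instead of substituting '?', which is the behaviour argued for in the reply that follows.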

You need to strictly implement the grammar in the specification, whatever
that ends up being. Any flexibility will allow someone to write one of
these things that works in your implementation but silently fails and causes
weird results elsewhere.

Past experience with RFC 2047 encoded-words has shown that allowing leeway
in these situations is a curse, not a blessing.

Indeed, point taken!

It's not strict yet (I'll cross that bridge when we agree on where it is ;-),
it just translates the hex values to utf-8. And now, counting from 0-9 in
Western Arabic, Eastern Arabic and Amharic (thanks!)...

Converting [${unicode:30 31 32 33 34 35 36 37 38 39}]
        to [0123456789] length 11

Converting [${unicode:06f0 06f1 06f2 06f3 06f4 06f5 06f6 06f7 06f8 06f9}]
        to [۰۱۲۳۴۵۶۷۸۹] length 21

Converting [${unicode:1369 136a 136b 136c 136d 136e 136f 1370 1371 1372}]
        to [፩፪፫፬፭፮፯፰፱፲] length 31

(Are there any number systems up in the four bytes per symbol ranges?)
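The lengths above can be cross-checked with a short Python sketch. It uses Python's built-in encoder, not the code from this message, and the helper name utf8_len is hypothetical. The three test strings come out at 10, 20 and 30 bytes, one less than the reported lengths; the extra unit is presumably a terminator or similar, though the message does not say. As one example of decimal digits outside the Basic Multilingual Plane, the Osmanya digits at U+104A0 to U+104A9 take four bytes each.

```python
import unicodedata

def utf8_len(hex_body: str) -> int:
    """UTF-8 byte count of the characters named by space-separated hex values."""
    text = "".join(chr(int(h, 16)) for h in hex_body.split())
    return len(text.encode("utf-8"))

western = "30 31 32 33 34 35 36 37 38 39"
eastern = "06f0 06f1 06f2 06f3 06f4 06f5 06f6 06f7 06f8 06f9"
ethiopic = "1369 136a 136b 136c 136d 136e 136f 1370 1371 1372"
# Osmanya digits zero through nine, chosen here as a four-byte example.
osmanya = " ".join(f"{cp:x}" for cp in range(0x104A0, 0x104AA))

for name, body in (("western", western), ("eastern", eastern),
                   ("ethiopic", ethiopic), ("osmanya", osmanya)):
    print(name, utf8_len(body))

# U+104A0 is a decimal digit with value 0.
print(unicodedata.decimal(chr(0x104A0)))
```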

If anybody would like to use my code, I'd be happy to make it available
without restriction. It's all of 100 lines, and most of the fun was
generating utf-8 by hand.
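For anyone curious what generating utf-8 by hand involves, a minimal sketch in Python follows. It is not the 100-line code offered above, and the function name utf8_encode is hypothetical. It follows the standard UTF-8 bit layout: one to four bytes depending on the size of the code point, with surrogates and values above 10FFFF rejected.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 without using str.encode."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# Compare against the built-in encoder for one value from each size class.
for cp in (0x30, 0x06F0, 0x1369, 0x104A0):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
```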