Practical problems with custom .ucm based encoding

Hello,

The cool Encode support in the upcoming 5.8 lets me properly solve a
very common task: turning UTF-8 data into HTML entities.

I generated a ucm file with entries like this:

    <U00A0> \x26\x6E\x62\x73\x70\x3B                 |0 # nbsp
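
For illustration, lines like this could be generated from a table of
entity names. This is only a minimal sketch: the table %entity_for and
the helper ucm_line are hypothetical names, not part of Encode, and the
table holds just a few sample entries.

```perl
use strict;
use warnings;

# Hypothetical excerpt of an entity table: code point => entity name.
my %entity_for = (
    0x00A0 => 'nbsp',
    0x00A9 => 'copy',
    0x00E9 => 'eacute',
);

# Build one .ucm mapping line: the Unicode code point, then the bytes of
# "&name;" written as \xHH escapes, then the |0 (round-trip) flag.
sub ucm_line {
    my ($code_point, $name) = @_;
    my $bytes = join '', map { sprintf('\x%02X', ord $_) } split //, "&$name;";
    return sprintf '<U%04X> %s |0 # %s', $code_point, $bytes, $name;
}

# Emit the lines in code point order.
print ucm_line($_, $entity_for{$_}), "\n"
    for sort { $a <=> $b } keys %entity_for;
```

The output uses a single space between columns instead of the padding
shown above.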

The resulting Encode::HTMLEntities encoding works perfectly. However, I
want it to do more.

Not every Unicode character has a corresponding named entity. Characters
without one can be written as numeric character references like &#8364;,
so I would like my encoding to use a simple function as a fallback. This
proves hard. With CHECK == Encode::FB_WARN it looks like the whole string
is left untouched, so my plan to just substr() off the first character,
handle it by hand and repeat is not going to work.
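
The result I am after can be sketched in plain Perl, without Encode at
all. This is an illustration of the desired output only, not of how an
Encode fallback would be implemented; %entity_for and to_entities are
hypothetical names and the table is a small sample.

```perl
use strict;
use warnings;

# Hypothetical excerpt of the named-entity table, keyed by code point.
my %entity_for = (0x00A0 => 'nbsp', 0x00E9 => 'eacute');

# Replace every character above 0x7F: a named entity when the table has
# one, otherwise a decimal numeric character reference.
sub to_entities {
    my ($text) = @_;
    $text =~ s{([^\x00-\x7F])}{
        my $cp = ord $1;
        exists $entity_for{$cp} ? "&$entity_for{$cp};" : sprintf('&#%d;', $cp);
    }ge;
    return $text;
}

print to_entities("caf\x{E9} \x{20AC}5"), "\n";   # prints "caf&eacute; &#8364;5"
```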

I'd be very happy with a CHECK mode which would allow me to handle a
single problematic character in Perl. Finding it myself in a longer
string is very hard in this case, because it can be any character above
\x{7f} that is not in my .ucm file.
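
For reference, the documentation of later Encode releases (2.12 and up)
describes CHECK as optionally being a code reference that receives the
ordinal of each unmapped character and returns the replacement octets.
Assuming such a version is installed, a per-character fallback might look
like the sketch below. It uses the stock "ascii" encoding rather than a
custom .ucm one, and %entity_for and encode_entities_via_check are
hypothetical names.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical excerpt of the named-entity table, keyed by code point.
my %entity_for = (0x00A0 => 'nbsp', 0x00E9 => 'eacute');

# Encode to ASCII; every character ASCII cannot represent is passed to
# the callback, which picks a named entity or a numeric reference.
# Assumes an Encode version that accepts a code reference as CHECK.
sub encode_entities_via_check {
    my ($text) = @_;
    return encode('ascii', $text, sub {
        my ($cp) = @_;
        return exists $entity_for{$cp}
            ? "&$entity_for{$cp};"
            : sprintf('&#%d;', $cp);
    });
}

print encode_entities_via_check("\x{A0}x\x{20AC}"), "\n";
```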

-- 
Bart.
