Re: [Encode] UCS/UTF mess and Surrogate Handlings



Dan Kogai wrote:

...
Okay, here is my strategy.

                decode("\x{8C00}-\0x{8FFFF}")   encode("\x{10000}-\x{10FFFF}")
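
For reference, the encode range \x{10000}-\x{10FFFF} quoted above covers the supplementary planes. UTF-16 cannot store these characters in a single 16-bit unit, so each one becomes a high surrogate (D800-DBFF) followed by a low surrogate (DC00-DFFF). Below is a minimal Python sketch of that arithmetic. It is not Encode's implementation, and the names to_surrogate_pair and from_surrogate_pair are hypothetical.

```python
# Minimal sketch of UTF-16 surrogate-pair arithmetic for U+10000..U+10FFFF.
# Function names are hypothetical; this is not Encode's implementation.
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    v = cp - 0x10000                  # 20-bit offset past the BMP
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a high/low surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    for cp in (0x10000, 0x1F600, 0x10FFFF):
        high, low = to_surrogate_pair(cp)
        # Cross-check against Python's own UTF-16 big-endian codec.
        assert chr(cp).encode("utf-16-be") == bytes(
            [high >> 8, high & 0xFF, low >> 8, low & 0xFF])
        assert from_surrogate_pair(high, low) == cp
        print(f"U+{cp:04X} -> {high:04X} {low:04X}")
```

Decoding has to do the reverse. A lone surrogate, or a high surrogate without a following low one, has no valid interpretation, and that is why surrogate handling needs an explicit policy.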


The Unicode Consortium does discuss this:

http://www.unicode.org/versions/corrigendum1.html

    Corrigendum #1: UTF-8 Shortest Form

    The conformance clause C12 in The Unicode Standard, Version 
    3.0 forbids the generation of "non-shortest form" UTF-8, and 
    forbids the interpretation of illegal sequences, but not the 
    interpretation of "non-shortest form". Where software does 
    interpret the non-shortest forms, security issues can arise. 
    For example:

    Process A performs security checks, but does not check for
    non-shortest forms. 
    
    Process B accepts the byte sequence from process A, and 
    transforms it into UTF-16 while interpreting non-shortest 
    forms. 
    
    The UTF-16 text may then contain characters that should 
    have been filtered out by process A. 
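
The Process A / Process B scenario quoted above can be sketched in a few lines of Python. The sketch assumes Process A is a byte-level filter that blocks the ASCII slash, and that Process B is a decoder that skips the shortest-form check. All function names are hypothetical illustrations, not part of Encode. The overlong pair C0 AF slips past the byte filter and then decodes to '/'. Python's strict built-in UTF-8 decoder rejects it.

```python
# Minimal sketch of the Corrigendum #1 scenario. All function names are
# hypothetical illustrations, not part of Encode or any other library.
from typing import Optional


def process_a_check(data: bytes) -> bool:
    """Hypothetical Process A: a byte-level filter that rejects input
    containing the ASCII slash (0x2F), e.g. to block '../' paths."""
    return b"/" not in data


def lenient_decode(data: bytes) -> str:
    """Hypothetical Process B: decodes 1- and 2-byte UTF-8 sequences but
    never checks for the shortest form (deliberately unsafe)."""
    out = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:
            out.append(chr(b0))
            i += 1
        elif 0xC0 <= b0 <= 0xDF and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF:
            # Missing check: a 2-byte form must decode to >= 0x80, so the
            # overlong pair C0 AF is wrongly accepted as U+002F '/'.
            out.append(chr(((b0 & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError(f"unsupported byte 0x{b0:02X} at offset {i}")
    return "".join(out)


def find_non_shortest(data: bytes) -> Optional[int]:
    """Return the offset of the first overlong (non-shortest) UTF-8 lead
    byte, or None. Lead bytes C0/C1 are always overlong; E0 followed by
    80-9F and F0 followed by 80-8F are overlong 3- and 4-byte forms."""
    for i, b in enumerate(data):
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b in (0xC0, 0xC1):
            return i
        if b == 0xE0 and nxt is not None and 0x80 <= nxt <= 0x9F:
            return i
        if b == 0xF0 and nxt is not None and 0x80 <= nxt <= 0x8F:
            return i
    return None


if __name__ == "__main__":
    payload = b"..\xc0\xaf..\xc0\xafetc"   # '/' hidden as overlong C0 AF
    print(process_a_check(payload))        # passes the byte-level check
    print(lenient_decode(payload))         # ...yet decodes to '../../etc'
    print(find_non_shortest(payload))      # offset of first overlong form
    try:
        payload.decode("utf-8")            # Python's strict decoder
    except UnicodeDecodeError as exc:
        print("rejected:", exc.reason)
```

The check only helps if it runs where bytes become characters. A strict decoder that refuses non-shortest forms, as Python's does here, closes the gap between the two processes.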

You might want to consider adding a security override for this.

Brian Stell
