Re: MHonArc and multi-byte characters in HTML

2001-10-06 15:45:16
On October 6, 2001 at 16:28, Greg Matheson wrote:

>> I'm unsure how to deal with the string clipping issue with respect to
>> resource variables, e.g. $SUBJECT:72$.  I see this as a fundamental issue
>> with Perl itself, since there is not yet a built-in string type that
>> abstracts this problem (like strings in Java) in a simple and efficient
>> manner.  An approach that would ignore the problem but make sure nothing
>> bad happens is to change all default resource settings so that they do
>> not use the clipping support in resource variables.  Any clipping would
>> then have to be specified explicitly, with awareness of the problems that
>> multi-byte character encodings may cause.  I believe I will make this
>> change to the default resource settings for v2.5.

> the only effect would be on half a character, which is minimal

Not necessarily.  The effect can span multiple characters, depending
on the encoding in use.  For example, with variable-width encodings that
use shift sequences, if the clip falls on a byte that denotes a shift,
all data that follows can be affected.  Remember, a resource variable is
just one part of the entire character stream, so the text after it can
be rendered incorrectly as well.

> and would even indicate the variable had been clipped, so I 
> think disabling clipping in the case of multi-byte character
> encodings would be a worse cure than the disease. 

Where did I state that anything would be disabled?  All I stated is
that the default resource values would be changed so that no resource
variable uses clipping.  Users will still be free to use it if they
know it is not an issue for their message data.  Note that I believe
MSGPGBEGIN is the only resource whose default value uses resource
variable clipping: $SUBJECT:72$.
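To make the $SUBJECT:72$ case concrete for UTF-8, here is a minimal Python sketch, assuming the clip counts bytes rather than characters.  The subject string and the helpers `clip_bytes` and `clip_utf8` are hypothetical, not MHonArc code.  Because UTF-8 continuation bytes all match the bit pattern 10xxxxxx, a cut can be backed off to a character boundary; that trick is specific to UTF-8 and does not help with stateful encodings.

```python
def clip_bytes(raw: bytes, limit: int) -> bytes:
    """Hypothetical byte-oriented clip: keeps `limit` bytes, ignoring characters."""
    return raw[:limit]


def clip_utf8(raw: bytes, limit: int) -> bytes:
    """Hypothetical UTF-8-aware clip: backs the cut off to a character boundary."""
    cut = min(limit, len(raw))
    # Step back while the byte at the cut is a continuation byte (0b10xxxxxx),
    # so the cut lands on a lead byte and no character is split.
    while 0 < cut < len(raw) and (raw[cut] & 0xC0) == 0x80:
        cut -= 1
    return raw[:cut]


# Hypothetical subject: "Re: " (4 bytes) + 30 three-byte characters = 94 bytes.
subject = "Re: " + "日本語" * 10
raw = subject.encode("utf-8")

naive = clip_bytes(raw, 72)  # 72 = 4 + 22*3 + 2, so the cut splits a character
safe = clip_utf8(raw, 72)    # backs off to 70 bytes: "Re: " + 22 whole characters

print(naive.decode("utf-8", errors="replace")[-3:])  # ends with U+FFFD
print(safe.decode("utf-8"))                           # decodes cleanly
```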

Plus, there is currently no easy way to tell which encoding is actually
in effect for a given "string", so disabling is not possible.  If there
were, the problem would not be a problem: if the encoding were known,
the clipping code could be smart enough to take the encoding into account.

