Hello,
I am a consultant for a company which uses RT for their internal support.
They asked me to fix a problem they were having where
attaching binary files to a ticket caused the file to become corrupt sometimes.
They tracked it down to the case where the mod_perl
session which serves the request to add the attachment to the ticket has
previously been used to perform some ticket-related operation. I
finally tracked down this problem to a bug in perl. Here is a detailed
description of the problem:
When you attach a file to a ticket using RT it saves the file you attach
into a file into /tmp. It then adds a MIME::Body::File
record to the MIME::Entity which represents the ticket. Later, it calls
make_singlepart() on the MIME::Entity, which converts the entity
into a string. During this process, it calls as_string() on the
MIME::Body::File. This causes the file to be read in and printed into a
string using the IO::Scalar object. IO::Scalar's print() function calls the
function join() on the data as it is read in, before that
data is appended onto the destination string.
The problem occurs inside join(). join() recycles string objects into
which it does the joining, which it later returns. It never
touches the UTF8 flag on these strings. So, on the initial run, it has no
strings to recycle (or few), and when they are created they are
set to ASCII. So all the results of join() are ASCII, which is what MIME and RT
wants, as ASCII is also what is used for processing
binary data. The problem is, on the second and subsequent executions of RT
within the perl system, the recycled strings often have the
UTF8 flag set. So, join ('', $string), where $string is ASCII, will often
return a UTF8 string. When this UTF8 string is later converted
into ASCII it is modified, and so the binary data is corrupted.
The solution is to apply the following patch to perl (tested with perl
5.8.2), which sets the UTF8 flag on the returned string to
something sensible.
diff -u perl-5.8.2/doop.c perl-5.8.2-patched/doop.c
--- perl-5.8.2/doop.c 2003-09-30 10:09:51.000000000 -0700
+++ perl-5.8.2-patched/doop.c 2004-01-05 23:23:13.000000000 -0800
@@ -647,6 +647,9 @@
register STRLEN len;
STRLEN delimlen;
STRLEN tmplen;
+ int utf8;
+
+ utf8 = (SvUTF8(del)!=0);
(void) SvPV(del, delimlen); /* stringify and get the delimlen */
/* SvCUR assumes it's SvPOK() and woe betide you if it's not. */
@@ -674,22 +677,37 @@
SvTAINTED_off(sv);
if (items-- > 0) {
- if (*mark)
+ if (*mark) {
+ utf8 += (SvUTF8(*mark)!=0);
sv_catsv(sv, *mark);
+ }
mark++;
}
if (delimlen) {
for (; items > 0; items--,mark++) {
sv_catsv(sv,del);
+ utf8 += (SvUTF8(*mark)!=0);
sv_catsv(sv,*mark);
}
}
else {
- for (; items > 0; items--,mark++)
+ for (; items > 0; items--,mark++) {
+ utf8 += (SvUTF8(*mark)!=0);
sv_catsv(sv,*mark);
+ }
}
SvSETMAGIC(sv);
+ if( utf8 )
+ {
+ if( utf8 != sp-oldmark+1 && ckWARN_d(WARN_UTF8) )
+ {
+ Perl_warner(aTHX_ packWARN(WARN_UTF8), "Joining UTF8 and ASCII
strings");
+ }
+ SvUTF8_on(sv);
+ } else {
+ SvUTF8_off(sv);
+ }
}
void
There may be other perl functions with similar problems; this is beyond
the scope of my job, however I hope that the maintainers of
perl will be proactive in attempting to find and fix any similar problems, as
the way they have added UTF8 support to perl doesn't make
it obvious when such bugs exist. I'd say that any built-in function that
returns a string should be checked for (a) setting the UTF8 flag
at all and (b) whether the value it sets it to is sensible. Also I think
warnings when mixed types of strings are passed into functions
are sensible as this can be dangerous, and as we don't know what character set
the ASCII strings are in, the routines themselves can't
really handle this case properly if any extended characters are present.
I hope this helps.
Nicholas