perl-i18n

Solution to corrupt attachments problem with RT3 and perl

2004-01-05 23:25:05

   Hello,
      I am a consultant for a company which uses RT for their internal support. 
They asked me to fix a problem they were having where
attaching binary files to a ticket caused the file to become corrupt sometimes. 
They tracked it down to the case where the mod_perl
session which serves the request to add the attachment to the ticket has 
previously been used to perform some ticket-related operation. I
finally tracked down this problem to a bug in perl. Here is a detailed 
description of the problem:

      When you attach a file to a ticket using RT it saves the file you attach 
into a file into /tmp. It then adds a MIME::Body::File
record to the MIME::Entity which represents the ticket. Later, it calls 
make_singlepart() on the MIME::Entity, which converts the entity
into a string. During this process, it calls as_string() on the 
MIME::Body::File. This causes the file to be read in and printed into a
string using the IO::Scalar object. IO::Scalar's print() function calls the 
function join() on the data as it is read in, before that
data is appended onto the destination string.

      The problem occurs inside join(). join() recycles string objects into 
which it does the joining, which it later returns. It never
touches the UTF8 flag on these strings. So, on the initial run, it has no 
strings to recycle (or few), and when they are created they are
set to ASCII. So all the results of join() are ASCII, which is what MIME and RT 
wants, as ASCII is also what is used for processing
binary data. The problem is, on the second and subsequent executions of RT 
within the perl system, the recycled strings often have the
UTF8 flag set. So, join ('', $string), where $string is ASCII, will often 
return a UTF8 string. When this UTF8 string is later converted
into ASCII it is modified, and so the binary data is corrupted.

      The solution is to apply the following patch to perl (tested with perl 
5.8.2), which sets the UTF8 flag on the returned string to
something sensible.

diff -u perl-5.8.2/doop.c perl-5.8.2-patched/doop.c
--- perl-5.8.2/doop.c   2003-09-30 10:09:51.000000000 -0700
+++ perl-5.8.2-patched/doop.c   2004-01-05 23:23:13.000000000 -0800
@@ -647,6 +647,9 @@
     register STRLEN len;
     STRLEN delimlen;
     STRLEN tmplen;
+    int utf8;
+
+    utf8 = (SvUTF8(del)!=0);
 
     (void) SvPV(del, delimlen); /* stringify and get the delimlen */
     /* SvCUR assumes it's SvPOK() and woe betide you if it's not. */
@@ -674,22 +677,37 @@
        SvTAINTED_off(sv);
 
     if (items-- > 0) {
-       if (*mark)
+       if (*mark) {
+           utf8 += (SvUTF8(*mark)!=0);
            sv_catsv(sv, *mark);
+       }
        mark++;
     }
 
     if (delimlen) {
        for (; items > 0; items--,mark++) {
            sv_catsv(sv,del);
+           utf8 += (SvUTF8(*mark)!=0);
            sv_catsv(sv,*mark);
        }
     }
     else {
-       for (; items > 0; items--,mark++)
+       for (; items > 0; items--,mark++) {
+           utf8 += (SvUTF8(*mark)!=0);
            sv_catsv(sv,*mark);
+       }
     }
     SvSETMAGIC(sv);
+    if( utf8 )
+    {
+        if( utf8 != sp-oldmark+1 && ckWARN_d(WARN_UTF8) )
+       {
+           Perl_warner(aTHX_ packWARN(WARN_UTF8), "Joining UTF8 and ASCII 
strings");
+       }
+        SvUTF8_on(sv);
+    } else {
+        SvUTF8_off(sv);
+    }
 }
 
 void

      There may be other perl functions with similar problems; this is beyond 
the scope of my job, however I hope that the maintainers of
perl will be proactive in attempting to find and fix any similar problems, as 
the way they have added UTF8 support to perl doesn't make
it obvious when such bugs exist. I'd say that any built-in function that 
returns a string should be checked for (a) setting the UTF8 flag
at all and (b) whether the value it sets it to is sensible. Also I think 
warnings when mixed types of strings are passed into functions
are sensible as this can be dangerous, and as we don't know what character set 
the ASCII strings are in, the routines themselves can't
really handle this case properly if any extended characters are present.

      I hope this helps.

            Nicholas


<Prev in Thread] Current Thread [Next in Thread>
  • Solution to corrupt attachments problem with RT3 and perl, Nicholas Adrian Vinen <=