mhonarc-users

Re: MHonArc and multi-byte characters in HTML

1998-04-28 04:43:21
Jason R Mastaler wrote:
>> > ESC$B<i2,ESC(B ESC$BCNI'ESC(B / MORIOKA Tomohiko ...
          ^
See the unescaped open bracket there?  I don't know enough about
encodings to say whether or not the bracket is specifically legal
there or not, but it doesn't look like legal HTML to me.  My
understanding is that the wilma_striphtml program requires legal HTML
for correct operation.

It is a complicated problem, and I cannot say what the right solution
should be.  However, I can show you a dirty workaround.
  The following patch sets the MSB of every byte of the Japanese
characters and strips the ESC$B and ESC(B escape sequences.  The result
is the EUC-JP character encoding.  Because every byte of a Japanese
character is then above 0x7F, none of them can look like '<', so the
Japanese characters will not affect wilma.

diff -urN MHonArc2.2.0/lib/mhtxtplain.pl MHonArc2.2.0-jp0/lib/mhtxtplain.pl
--- MHonArc2.2.0/lib/mhtxtplain.pl      Wed Mar  4 09:12:54 1998
+++ MHonArc2.2.0-euc-jp/lib/mhtxtplain.pl  Fri Mar 20 21:19:12 1998
@@ -174,7 +174,7 @@
 sub jp2022 {
     local(*body) = shift;
     local(@lines) = split(/\r?\n/,$body);
-    local($ret, $ascii_text);
+    local($ret, $ascii_text, $jp_text);
     local($_);

     $ret = "<PRE>\n";
@@ -205,7 +205,7 @@
        # Process Each Segment
        while(1) {
            if (s/^(\033\([BJ])//) { # Single Byte Segment
-               $ret .= $1;
+               # $ret .= $1;
                while(1) {
                    if (s/^([^\033]+)//) {      # ASCII plain text
                        $ascii_text = $1;
@@ -228,10 +228,12 @@
                    }
                }
            } elsif (s/^(\033\$[\@AB]|\033\$\([CD])//) { # Double Byte Segment
-               $ret .= $1;
+               # $ret .= $1;
                while (1) {
                    if (s/^([!-~][!-~]+)//) { # Double Char plain text
-                       $ret .= $1;
+                       $jp_text = $1;
+                       $jp_text =~ tr/\041-\176/\241-\376/;
+                       $ret .= $jp_text;
                    } elsif (s/(\033\.[A-F])//) { # G2 Designate Sequence
                        $ret .= $1;
                    } elsif (s/(\033N[ -^?])//) { # Single Shift Sequence
End of patch.
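
For anyone who wants to see the conversion in isolation, here is a
minimal standalone sketch of the same idea.  It is not part of MHonArc:
the function name iso2022jp_to_eucjp is hypothetical, and it only
handles the ESC$@ / ESC$B and ESC(B / ESC(J sequences, which is an
assumption made to keep it short.  The sample input is the "<i" pair
from the quoted header above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of MHonArc): convert one line of
# ISO-2022-JP text to EUC-JP by dropping the escape sequences and
# setting the high bit of every byte in a double-byte segment.
sub iso2022jp_to_eucjp {
    my ($in) = @_;
    my $out    = '';
    my $double = 0;    # true while inside a double-byte segment

    while (length $in) {
        if ($in =~ s/^\033\([BJ]//) {          # ESC ( B or ESC ( J
            $double = 0;                       # back to single-byte text
        } elsif ($in =~ s/^\033\$[\@B]//) {    # ESC $ @ or ESC $ B
            $double = 1;                       # double-byte text follows
        } elsif ($double && $in =~ s/^([!-~]{2})//) {
            # Same transliteration as the patch: 0x21-0x7E -> 0xA1-0xFE
            (my $pair = $1) =~ tr/\041-\176/\241-\376/;
            $out .= $pair;
        } else {
            $in =~ s/^(.)//s;                  # copy anything else as-is
            $out .= $1;
        }
    }
    return $out;
}

# "<i" is 0x3C 0x69 in ISO-2022-JP; with the MSB set it becomes 0xBC 0xE9.
my $euc = iso2022jp_to_eucjp("\033\$B<i\033(B / MORIOKA");
print join(' ', map { sprintf '%02X', ord($_) } split //, $euc), "\n";
# prints: BC E9 20 2F 20 4D 4F 52 49 4F 4B 41
```

The escape sequences disappear from the output and the '<' byte is
gone, which is the property the workaround depends on.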
-- 
Koichi Nakatani
Graphic Arts Center, Konica Corporation
