mhonarc-users

Re: Message body format

1997-04-11 19:09:18
I could use some advice from the Perl-savvy (which I ain't!) on this
list regarding the format of message bodies.  Our page format includes a
"left margin" stripe of color down the side of the page, so we don't
have the usual screen width for text display.  Thus the message text
bleeds over the right side of the page unless the lines happen to be
very short.  I downloaded a program called txt2html to use as a filter. 
I've done the following, hoping to force the message body to be
HTML-ized:

<MIMEFilters>
...
message/partial:m2h_text_plain'filter:txt2html.pl
text/*:m2h_text_plain'filter:txt2html.pl
...
text/plain:m2h_text_plain'filter:txt2html.pl
text/richtext:m2h_text_plain'filter:txt2html.pl
...
</MIMEFilters>

This will not work.  MHonArc will look for a routine called
"m2h_text_plain'filter" in txt2html.pl, and there is none.  In
order to use txt2html.pl, you will need to write a MHonArc filter
wrapper to interface with it.  How to write MHonArc filters and
hook them into the program is described in the MIMEFILTERS resource
page of the documentation.
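
To give a rough idea of what such a wrapper involves, here is a minimal
sketch.  It uses the same calling convention as the filter routine in
the library further down (header, fields, data, decode flag, and
arguments in; HTML out).  The package name m2h_txt2html, the file name,
and the convert routine are all hypothetical placeholders: I am not
assuming anything about which routines txt2html.pl actually provides,
so the placeholder just escapes the text and wraps it in PRE.

```perl
# Sketch of a MHonArc filter wrapper; save as e.g. mytxt2html.pl
# (hypothetical file name).
package m2h_txt2html;           # hypothetical package name

# MHonArc passes the message header, a glob of the parsed header
# fields, a glob of the body data, a decode flag, and the filter
# argument string.  The routine returns the HTML for the body.
sub filter {
    local($header, *fields, *data, $isdecode, $args) = @_;
    local($html) = &convert($data);
    ($html);
}

# Placeholder conversion (an assumption, not txt2html's interface):
# replace the body of this routine with a call into txt2html.pl once
# you know which routine it exposes.
sub convert {
    local($text) = shift;
    $text =~ s/&/&amp;/g;       # must run first so later entities survive
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    "<PRE>\n" . $text . "</PRE>\n";
}

1;
```

A wrapper like this would then be registered in MIMEFILTERS with a line
of the form text/plain:m2h_txt2html'filter:mytxt2html.pl, so that the
routine name matches what the file really defines.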

Since you stated you were not Perl savvy, you may not have the time
to learn Perl to do what you need.  Therefore, I have included below a
version of the mhtxtplain.pl library that may be able to do something
to suit your needs (this library, or a reasonable facsimile, will be
included in the next release of MHonArc).  Read the comments
in the code to see the new options available to the filter.  Use
the MIMEARGS resource to define the options you desire.
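
For the narrow text column described above, a resource-file fragment
along these lines might be a starting point.  Treat it as a sketch only:
it assumes MIMEARGS lines take the same colon-separated
content-type:value form used by MIMEFILTERS, and the width of 60 is an
arbitrary illustration, not a recommendation.  Check the MIMEARGS
resource page for the exact syntax in your release.

```
<MIMEFILTERS>
text/plain:m2h_text_plain'filter:mhtxtplain.pl
</MIMEFILTERS>
<MIMEARGS>
text/plain:maxwidth=60 quote
</MIMEARGS>
```

The option names (maxwidth, quote) are the ones documented in the
comments of the filter routine.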

##---------------------------------------------------------------------------##
##  File:
##      @(#) mhtxtplain.pl 1.7 97/04/11 19:57:26 @(#)
##  Author:
##      Earl Hood       ehood(_at_)medusa(_dot_)acs(_dot_)uci(_dot_)edu
##  Description:
##      Library defines routine to filter text/plain body parts to HTML
##      for MHonArc.
##      Filter routine can be registered with the following:
##              <MIMEFILTERS>
##              text/plain:m2h_text_plain'filter:mhtxtplain.pl
##              </MIMEFILTERS>
##---------------------------------------------------------------------------##
##    MHonArc -- Internet mail-to-HTML converter
##    Copyright (C) 1995-1997   Earl Hood, ehood(_at_)medusa(_dot_)acs(_dot_)uci(_dot_)edu
##
##    This program is free software; you can redistribute it and/or modify
##    it under the terms of the GNU General Public License as published by
##    the Free Software Foundation; either version 2 of the License, or
##    (at your option) any later version.
##
##    This program is distributed in the hope that it will be useful,
##    but WITHOUT ANY WARRANTY; without even the implied warranty of
##    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
##    GNU General Public License for more details.
##
##    You should have received a copy of the GNU General Public License
##    along with this program; if not, write to the Free Software
##    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
##---------------------------------------------------------------------------##


package m2h_text_plain;

$Url            = '(http://|https://|ftp://|afs://|wais://|telnet://' .
                   '|gopher://|news:|nntp:|mid:|cid:|mailto:|prospero:)';
$UrlExp         = $Url . q%[^\s\(\)\|<>"']*[^\.;,"'\|\[\]\(\)\s<>]%;
$HUrlExp        = $Url . q%[^\s\(\)\|<>"'\&]*[^\.;,"'\|\[\]\(\)\s<>\&]%;
$QuoteChars     = '>|[\|\]+:]';
$HQuoteChars    = '&gt;|[\|\]+:]';

##---------------------------------------------------------------------------##
##      Text/plain filter for mhonarc.  The following filter arguments
##      are recognized ($args):
##
##          nourl               -- Do not hyperlink URLs
##          quote               -- Italicize quoted message text
##          nonfixed            -- Use normal typeface
##          keepspace           -- Preserve whitespace if nonfixed
##          maxwidth=#          -- Set the maximum width of lines.  Lines
##                                 exceeding the maxwidth will be broken
##                                 up across multiple lines.
##          asis=set1:set2:...  -- Colon separated lists of charsets
##                                 to leave as-is.  Only HTML special
##                                 characters will be converted into
##                                 entities.
##
##      All arguments should be separated by at least one space
##
sub filter {
    local($header, *fields, *data, $isdecode, $args) = @_;
    local($ctype, $charset, $nourl, $doquote, $igncharset, $nonfixed,
          $keepspace, $maxwidth);
    local(%asis) = ();

    $nourl      = ($'NOURL || ($args =~ /nourl/i));
    $doquote    = ($args =~ /quote/i);
    $nonfixed   = ($args =~ /nonfixed/i);
    $keepspace  = ($args =~ /keepspace/i);
    if ($args =~ /maxwidth=(\d+)/) {
        $maxwidth = $1;
    } else {
        $maxwidth = 0;
    }

    ## Grab charset parameter (if defined)
    $ctype = $fields{'content-type'};
    ($charset) = $ctype =~ /charset=(\S+)/;
    $charset =~ s/['"]//g;  $charset =~ tr/A-Z/a-z/;

    ## Check if certain charsets should be left alone
    if ($args =~ /asis=(\S+)/i) {
        local(@a) = split(':', $1);
        foreach (@a) {
            tr/A-Z/a-z/;
            $asis{$_} = 1;
        }
    }

    ## Check MIMECharSetConverters if charset should be left alone
    if ($main'MIMECharSetConverters{$charset} eq "-decode-") {
        $asis{$charset} = 1;
    }

    ##  Check if max-width set
    if ($maxwidth) {
        $* = 1;
        $data =~ s/^(.*)$/&break_line($1, $maxwidth)/ge;
        $* = 0;
    }

    ## Convert data according to charset
    if (!$asis{$charset}) {
        ##      Japanese message
        if ($charset =~ /iso-2022-jp/i) {
            return (&jp2022(*data));

        ##      Latin 2-6, Greek, Hebrew, Arabic
        } elsif ($charset =~ /iso-8859-([2-9]|10)/i) {
            $data = &iso_8859'str2sgml($data, $charset);

        ##      ASCII, Latin 1, Other
        } else {
            &esc_chars_inplace(*data);
        }
    } else {
        &esc_chars_inplace(*data);
    }

    ##  Check for quoting
    if ($doquote) {
        $data =~ s@\n(${HQuoteChars})(.*)@\n$1<I>$2</I>@go;
    }

    ## Check if using nonfixed font
    if ($nonfixed) {
        $data =~ s/(\r?\n)/<br>$1/g;
        if ($keepspace) {
            $* = 1;
            $data =~ s/^(.*)$/&preserve_space($1)/ge;
            $* = 0;
        }
    } else {
        $data = "<PRE>\n" . $data . "</PRE>\n";
    }

    ## Convert URLs to hyperlinks
    $data =~ s@($HUrlExp)@<A HREF="$1">$1</A>@gio  unless $nourl;

    ($data);
}

##---------------------------------------------------------------------------##
##      Function to convert ISO-2022-JP data into HTML.  Function is based
##      on the following RFCs:
##
##      RFC-1468 I
##              J. Murai, M. Crispin, E. van der Poel, "Japanese Character
##              Encoding for Internet Messages", 06/04/1993. (Pages=6)
##
##      RFC-1554  I
##              M. Ohta, K. Handa, "ISO-2022-JP-2: Multilingual Extension of  
##              ISO-2022-JP", 12/23/1993. (Pages=6)
##
##  Author of function:
##      NIIBE Yutaka    gniibe(_at_)mri(_dot_)co(_dot_)jp
##      (adapted for mhtxtplain.pl by Earl Hood <ehood(_at_)medusa(_dot_)acs(_dot_)uci(_dot_)edu>)
##
sub jp2022 {
    local(*body) = shift;
    local(@lines) = split(/\r?\n/,$body);
    local($ret, $ascii_text);

    $ret = "<PRE>\n";
    for ($i = 0; $i <= $#lines; $i++) {
        $_ = $lines[$i];

        # Process preceding ASCII text
        while(1) {
            if (/^[^\033]+/) {  # ASCII plain text
                $ascii_text = $&;
                $_ = $';

                # Replace meta characters in ASCII plain text
                $ascii_text =~ s%\&%\&amp;%g;
                $ascii_text =~ s%<%\&lt;%g;
                $ascii_text =~ s%>%\&gt;%g;
                ## Convert URLs to hyperlinks
                $ascii_text =~ s%($HUrlExp)%<A HREF="$1">$1</A>%gio
                    unless $'NOURL;

                $ret .= $ascii_text;
            } elsif (/\033\.[A-F]/) { # G2 Designate Sequence
                $_ = $';
                $ret .= $&;
            } elsif (/\033N[ -]/) { # Single Shift Sequence
                $_ = $';
                $ret .= $&;
            } else {
                last;
            }
        }

        # Process Each Segment
        while(1) {
            if (/^\033\([BJ]/) { # Single Byte Segment
                $_ = $';
                $ret .= $&;
                while(1) {
                    if (/^[^\033]+/) {  # ASCII plain text
                        $ascii_text = $&;
                        $_ = $';

                        # Replace meta characters in ASCII plain text
                        $ascii_text =~ s%\&%\&amp;%g;
                        $ascii_text =~ s%<%\&lt;%g;
                        $ascii_text =~ s%>%\&gt;%g;
                        ## Convert URLs to hyperlinks
                        $ascii_text =~ s%($HUrlExp)%<A HREF="$1">$1</A>%gio
                            unless $'NOURL;

                        $ret .= $ascii_text;
                    } elsif (/\033\.[A-F]/) { # G2 Designate Sequence
                        $_ = $';
                        $ret .= $&;
                    } elsif (/\033N[ -]/) { # Single Shift Sequence
                        $_ = $';
                        $ret .= $&;
                    } else {
                        last;
                    }
                }
            } elsif (/^\033\$[\@AB]|\033\$\([CD]/) { # Double Byte Segment
                $_ = $';
                $ret .= $&;
                while(1) {
                    if (/^([!-~][!-~])+/) { # Double Char plain text
                        $_ = $';
                        $ret .= $&;
                    } elsif (/\033\.[A-F]/) { # G2 Designate Sequence
                        $_ = $';
                        $ret .= $&;
                    } elsif (/\033N[ -]/) { # Single Shift Sequence
                        $_ = $';
                        $ret .= $&;
                    } else {
                        last;
                    }
                }
            } else {
                # Something wrong in text
                $ret .= $_;
                last;
            }
        }

        $ret .= "\n";
    }

    $ret .= "</PRE>\n";

    ($ret);
}

##---------------------------------------------------------------------------##

sub esc_chars_inplace {
    local(*foo) = shift;
    $foo =~ s@\&@\&amp;@g;
    $foo =~ s@<@\&lt;@g;
    $foo =~ s@>@\&gt;@g;
    1;
}

##---------------------------------------------------------------------------##

sub preserve_space {
    local($str) = shift;

    1 while $str =~ s/\t+/'&nbsp;' x (length($&) * 8 - length($`) % 8)/e;
    # $str =~ s/ {2,}/'&nbsp;' x length($&)/ge;
    $str =~ s/ /\&nbsp;/g;
    $str;
}

##---------------------------------------------------------------------------##

sub break_line {
    local($str) = shift;
    local($width) = shift;
    local($q, $new) = ('', '');
    local($try, $trywidth);

    ## Translate tabs to spaces
    1 while $str =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

    ## Do nothing if str <= width
    return $str  if length($str) <= $width;

    ## See if str begins with a quote char
    if ($str =~ s/^($QuoteChars)//) {
        $q = $1;
        --$width;
    }

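    ## Create new string by breaking up str
    while (length($str)) {

        # remainder fits within width: emit it whole and finish, so
        # the last word is never split
        if (length($str) <= $width) {
            $new .= $q . $str;
            $str = '';
            next;
        }

        # handle case where no-whitespace line larger than width
        if (($str =~ /^\S+/) && (length($&) >= $width)) {
            $new .= $q . $&;
            $str = $';
            next;
        }

        $trywidth = $width;
        $try = substr($str, 0, $trywidth);

        # back up to the last whitespace if the cut falls inside a word
        if ($try =~ /\S+$/) {
            $trywidth -= length($&);
            $new .= $q . substr($str, 0, $trywidth);
        } else {
            $new .= $q . $try;
        }
        substr($str, 0, $trywidth) = '';

    } continue {
        $new .= "\n"  if length($str);
    }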
    $new;
}

##---------------------------------------------------------------------------##
1;
        --ewh