Re: FYI: XML-based mhonarc archive

2000-07-11 16:41:00
On Tue, 11 Jul 2000 12:49:05 -0700 
Earl Hood <ehood(_at_)hydra(_dot_)acs(_dot_)uci(_dot_)edu> wrote:

On July 11, 2000 at 08:53, J C Lawrence wrote:
I have the same problem with MHonArc throwing out PHP variable
assignment files: occasionally people send messages containing
control characters (^X, ^Z, and ^_ seem to be popular) and PHP
burps on them (message in syslog). Currently I'm using a
post-process sed script to whack the unwanted characters.
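A byte-level pass of that kind might look like the Python sketch below. The sed script itself isn't shown, so the set of bytes removed here (C0 control bytes other than TAB, LF and CR, plus DEL) is an assumption, and `strip_control_bytes` and `clean_file` are hypothetical names used only for illustration.

```python
import re

# Assumed set of unwanted bytes: C0 controls except TAB (0x09), LF (0x0A)
# and CR (0x0D), plus DEL (0x7F). The real sed script's set is not known.
CONTROL_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_bytes(data: bytes) -> bytes:
    """Remove the unwanted control bytes from a generated file's contents."""
    return CONTROL_BYTES.sub(b"", data)


def clean_file(path: str) -> None:
    """Hypothetical post-process step: rewrite `path` with control bytes removed."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(strip_control_bytes(data))


if __name__ == "__main__":
    print(strip_control_bytes(b"hello\x18 world\x1a\x1f\n"))  # b'hello world\n'
```

Note that ESC (0x1B) falls inside this range, so a blanket post-process pass also damages escape-based encodings such as ISO-2022-JP.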

I guess PHP is not liberal about what is allowed in strings.

Actually it's a here-file problem. I filed a bug and got it as far as a
"Well, yeah, we really should provide a couple of modes for here files
à la perl/bash." They were going to just treat it as a stated
limitation of PHP here files. It's really not hurting me at the
moment; the archived message in question displays less well than I'd
like (lines tend to get cropped), but it does display, and nothing

A more efficient solution is to modify the text filters to strip
out, or replace, non-printable characters.  

Aye, that does sound like the correct long-term approach.

Probably make it optional, since it will not be safe for cases
where control codes are part of the character set (like Japanese
messages). Hmmmm, it may have to be associated with character sets.
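One way to make such a filter optional and tie it to the message's character set is sketched below. MHonArc's text filters are actually Perl, so this Python version only illustrates the logic; `filter_body`, `CONTROL_CODE_CHARSETS` and the `strip_controls`/`replacement` parameters are hypothetical, and the charset list is an illustrative assumption. The motivating case is real: ISO-2022-JP switches between ASCII and JIS X 0208 with ESC sequences, so deleting 0x1B would corrupt Japanese messages.

```python
import re

# Illustrative assumption, not MHonArc configuration: charsets whose raw
# encoded form relies on control codes. The ISO-2022 family uses ESC
# sequences; the KR and CN variants also use SO/SI shifts.
CONTROL_CODE_CHARSETS = {"iso-2022-jp", "iso-2022-jp-2", "iso-2022-kr", "iso-2022-cn"}

# C0 controls other than TAB, LF and CR, plus DEL.
NON_PRINTABLE = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def filter_body(body: bytes, charset: str, strip_controls: bool = True,
                replacement: bytes = b"") -> bytes:
    """Strip (or replace) non-printable bytes in a raw, still-encoded body.

    The filter is optional (strip_controls=False disables it) and is skipped
    for charsets whose encoding depends on control codes, since removing
    them there would corrupt the text.
    """
    if not strip_controls or charset.lower() in CONTROL_CODE_CHARSETS:
        return body
    return NON_PRINTABLE.sub(replacement, body)
```

For example, `filter_body(b"abc\x18def", "us-ascii")` yields `b"abcdef"`, while an ISO-2022-JP body passes through untouched. Passing `replacement=b"?"` instead of deleting leaves a visible marker where the control character was, which covers the "strip out, or replace" choice.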

This exposes a more general problem. I have a fairly international
subscriber base, many of whom post in non-ISO-8859 code pages, and
statistically they are also the most likely to post messages containing
spurious control characters (ones that are not part of their base
character set, e.g. ^X). Lotsa code pages there.

For individual cases, you could add a line to the text filter(s)
to remove non-printable characters if it is safe to do so for
messages you archive.

I'll have to look into this (I'm not a Perl person at all).

J C Lawrence                                 Home: claw(_at_)kanga(_dot_)nu
---------(*)                               Other: coder(_at_)kanga(_dot_)nu
http://www.kanga/nu/~claw/        Keys etc: finger claw(_at_)kanga(_dot_)nu
--=| A man is as sane as he is dangerous to his environment |=--
