mhonarc-dev

[Bug #2971] spammode option interferes with iso-2022-jp

2003-04-05 16:57:46

=================== BUG #2971: LATEST MODIFICATIONS ==================
http://savannah.nongnu.org/bugs/?func=detailbug&bug_id=2971&group_id=1968

Changes by: Earl Hood <earl(_at_)earlhood(_dot_)com>
Date: Sat 04/05/2003 at 17:57 (US/Central)

            What     | Removed                   | Added
---------------------------------------------------------------------------
              Status | Open                      | Closed
       Fixed Release |                           | 2.6.3




=================== BUG #2971: FULL BUG SNAPSHOT ===================


Submitted by: kkawa                   Project: MHonArc                      
Submitted on: Thu 03/27/2003 at 22:33
Category:  Character Sets             Severity:  1 - Ordinary               
Bug Group:  Undesired Behavior        Resolution:  Fixed                    
Assigned to:  None                    Status:  Closed                       
Platform Version:  All                Perl Version:  5.6.0                  
Component Version:  2.6.2             Fixed Release:  2.6.3                 

Summary:  spammode option interferes with iso-2022-jp

Original Submission:  The iso-2022-jp encoding is the most commonly used 
encoding in Japan.
The problem is that the encoded text often contains the '@' mark,
which apparently triggers the spam mode filter and as a result,
surrounding characters will be incorrectly replaced by 'x'.


A "proper" fix would probably require all of MHonArc to
process text as Unicode rather than as raw input. But this
change might be too big.

The easier change is perhaps to make spammode smarter.
A typical iso-2022-jp sequence is something like "B@n8}9L2p",
so if you require a dot on the right-hand side of the address,
this problem can be avoided.


Follow-up Comments
*******************

-------------------------------------------------------
Date: Mon 03/31/2003 at 11:54       By: ehood
Modified ADDRESSMODIFYCODE value when SPAMMODE specified
to require a dot in the domain portion of the regex:
s|([\!\%\w\.\-+=/]+@)([\w\-]+\.[\w\.\-]+)|$1.('x' x length($2))|ge

This should hopefully be a decent work-around for iso-2022-jp
data.

Fix checked into CVS.  Please verify.
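
The effect of requiring a dot can be sketched outside MHonArc. The
snippet below is a Python transliteration of the Perl substitution
quoted above, not MHonArc code. The LOOSE pattern is an assumption
about what the pre-fix expression looked like (the report does not
quote it), and mask() is a hypothetical helper name.

```python
import re

# Transliteration of the substitution quoted in the comment above:
# the domain part must contain at least one dot.
DOTTED = re.compile(r"([!%\w.\-+=/]+@)([\w\-]+\.[\w.\-]+)", re.ASCII)

# ASSUMPTION: an earlier, looser pattern that accepts any run of
# domain characters. It is not quoted anywhere in this report.
LOOSE = re.compile(r"([!%\w.\-+=/]+@)([\w.\-]+)", re.ASCII)


def mask(pattern, text):
    """Replace the matched domain part with 'x' characters of equal length."""
    return pattern.sub(lambda m: m.group(1) + "x" * len(m.group(2)), text)


jp_fragment = "B@n8}9L2p"    # sample sequence from the original submission
address = "user@example.org"  # hypothetical real address

print(mask(LOOSE, jp_fragment))   # bytes after '@' are overwritten
print(mask(DOTTED, jp_fragment))  # left untouched: no dot follows '@'
print(mask(DOTTED, address))      # real address is still masked
```

A sequence that happens to contain a dot shortly after the '@' would
still be altered, which is why this is a work-around rather than a
complete fix.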

-------------------------------------------------------
Date: Fri 03/28/2003 at 23:26       By: ehood
[Limitation]
Currently, there are work-arounds to this.  SPAMMODE
is just a convenience for setting other resources, i.e.,
ADDRESSMODIFYCODE can be set specifically to work around
the iso-2022-jp encoding.

As for Unicode, the TEXTENCODE resource allows a user
to preconvert all data to UTF-8.  However, as the original
report suggests, Unicode is not used as an intermediate
format for processing, with the ability to then re-encode
to another format when writing pages.  I'd prefer to avoid
round-tripping at this time, but it is something to consider
for the future.

Technically, I would consider this bug a limitation of
SPAMMODE, hence I dropped the severity, since Japanese users
can explicitly set the related resources to work around the
problem.


CC list is empty


No files currently attached

