procmail

Re: furrin character sets

2003-02-22 13:41:02
On Sat, Feb 22, 2003 at 12:02:22AM -0800, Professional Software
Engineering wrote:

Pursuant to that ugly exchange here last week wherein someone dropped 
in and advertised some other list, I went and checked to see whether  
there was anything worth noting over there.  I can't say that it      
appears that there's much there (and certainly no existing rcfile     
base to refer to), though anyone interested should really check for   
themselves rather than taking my word for it.

I looked briefly and had a similar tentative conclusion.


'3D' appears prefixed to several of the character set declarations in other 
conditions as well - that's the MIME quoted-printable hex character code 
for '=' -- I've only seen that within HTML within a MIME quoted-printable 
block, where the MIME headers themselves should properly have a character 
encoding identifier, thus an HTML META tag within the MIME quoted-printable 
body isn't generally very significant (at least, not in my experience).

I've seen it whenever I try to read Tony's mail to this list using mailx.
I had to cancel a couple of half-formed answers because I found I'd
been flummoxed by the 3Ds.  That was one of several unrelated but
contemporaneous reasons why I have within the past two weeks switched to
mutt.  (Tony, I still wish you'd send plaintext to this list, though.)
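
For anyone else tripped up by the 3Ds: in a quoted-printable body, "=3D"
is just an encoded "=", so a MIME-aware reader turns it back before
display.  Here is a minimal sketch of that decoding in Python, using the
standard-library quopri module.  The sample META line is made up for
illustration and is not taken from any message on this list.

```python
import quopri


def decode_qp(raw: bytes) -> bytes:
    """Decode quoted-printable bytes; each '=3D' comes back as '='."""
    return quopri.decodestring(raw)


# Hypothetical HTML line as it might appear inside a quoted-printable part.
sample = b'<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dbig5">'

decoded = decode_qp(sample)
print(decoded.decode("ascii"))
```

A reader that shows the raw body without decoding it, which appears to be
what happened with mailx here, leaves the "=3D" sequences visible.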

This is free for use and discussion here on the official procmail
list.  Note that I define quite a few character set encodings (and
have attempted to be rather complete in doing so, perhaps retentively
so) [. . . .]  If anyone has suggestions on better divisions,
I'm all ears, but as-is, it works well enough for me while still
affording some meaningful groupings.

Interesting, I must say.  Way, way beyond what I've ever needed myself,
fwiw.  My foreign-language checks rely on the simple class range that
David Tamkin has reposted various times since quite a while ago on
this list.  Here's what I use *for headers only* (so it's
apples-to-oranges, I admit) to blow away foreign stuff except for
German, which I do wish to get:

 WS           = $TAB$SPACE          # whitespace (N.B.: order can matter)
 GERMAN       = äÄöÖüÜß             # some diacriticals I want to accept
 LOWBIT       = $WS-~               # range helps delimit Western text
 HIBIT        = [^$LOWBIT]          # complement delimits non-Western
 MOD_HIBIT    = [^$LOWBIT$GERMAN]   # range for modified non-Western text

 :0  # 021230 () look for non-Western chars in asserted From: or Subject:
  * $  $GO^0    SUBJECT ??  $MOD_HIBIT
  * $  $GO^0    FROM    ??  $MOD_HIBIT
  { RX = "${RX:+$RX, }UBE.SJ|FR.HI-BIT" }

Catches only about 3% of my spam based on headers-only checks; but
it's a consistent recipe that needs no upkeep.
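
To show what the character classes above amount to outside procmail, here
is a rough Python sketch of the same idea: anything other than tab, the
printable ASCII range (space through tilde), and the listed German
characters counts as non-Western.  The function name and the sample
strings are hypothetical, and the sketch assumes header values that have
already been decoded to text, whereas the recipe matches raw bytes.

```python
import re

GERMAN = "äÄöÖüÜß"   # diacriticals to accept, as in the recipe
LOWBIT = "\t -~"     # tab, then the range space..tilde (order matters)

# Complement classes, mirroring HIBIT and MOD_HIBIT in the recipe.
HIBIT = re.compile(f"[^{LOWBIT}]")
MOD_HIBIT = re.compile(f"[^{LOWBIT}{GERMAN}]")


def looks_non_western(header_value: str) -> bool:
    """True if the value holds a character outside ASCII text and GERMAN."""
    return MOD_HIBIT.search(header_value) is not None


# Hypothetical header values for illustration.
print(looks_non_western("Re: furrin character sets"))
print(looks_non_western("Grüße aus Köln"))
print(looks_non_western("特別なお知らせ"))
```

The German subject passes because its umlauts and eszett are listed in
GERMAN; under the plain HIBIT class it would be flagged like any other
high-bit text.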

-- 
dman

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail

