procmail

Re: Filter for Japanese double-byte characters?

1999-10-09 05:35:02
I'm embarrassed that it took so long to figure out how to enter [ESC]$B.
Using vi, I knew I had to use Ctrl+V, but I didn't realize I actually
had to press the Esc key :-) .

I thank era eriksson, Satoru Manita, and Philip Guenther for their
kindness and patience.

   * ^[\$B    (entered with vi by Ctrl+V,Esc,\,$,B)

does a fine job of finding all posts containing Japanese, with one
exception.  Posts from people using the AOL 4.0 mailer seem to have
Japanese coded in a completely different way.  No Esc at all. What I
see is mostly pairs of characters separated by "^" -- I'm guessing this
is Ctrl.  The first character of each pair is a capital letter (very
often "A") with some diacritic, e.g., circumflex, acute, grave. The
second character can be anything, it seems. I can't show you the
diacritics, but a typical line is

^At^AI^Ie^I^C^AA^N1/4^AI^AR^A}^A^Ah and so on
(the "1/4" is a single character)

I think I could find this stuff by searching on [Ctrl] plus, say,
capital A with a circumflex accent (I believe this is character code
194, right?), or capital A with an acute accent (code 193?). How can I
put these in a recipe?  Or could I use weighted scoring and look for
many [Ctrl]?  Is it possible to use vi to enter just a [Ctrl] in the
recipe, and then count them?
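
For reference, in ISO-8859-1 capital A with an acute accent is byte 193
(0xC1) and capital A with a circumflex is byte 194 (0xC2). The sketch
below is a rough way to check a saved message outside procmail, not a
recipe. It assumes the accented capitals really are those raw bytes (the
display may be misleading about the actual encoding), and the function
names are made up for illustration. Tab, LF and CR are left out of the
"control character" class because they occur in ordinary mail.

```python
import re

# ESC $ B: the escape sequence the working recipe searches for.
ESC_DOLLAR_B = re.compile(rb"\x1b\$B")

# A control byte (excluding tab, LF, CR) followed by 0xC1 or 0xC2.
# ASSUMPTION: the accented capitals seen on screen are these Latin-1 bytes.
CTRL_THEN_ACCENTED_A = re.compile(rb"[\x01-\x08\x0b\x0c\x0e-\x1f][\xc1\xc2]")


def has_jis_escape(body: bytes) -> bool:
    """Return True if the body contains ESC $ B anywhere."""
    return ESC_DOLLAR_B.search(body) is not None


def count_ctrl_pairs(body: bytes) -> int:
    """Count control-byte + accented-A pairs in the body."""
    return len(CTRL_THEN_ACCENTED_A.findall(body))
```

Running both functions over a few saved posts from each mailer would show
whether counting such pairs separates the AOL posts from ordinary mail
before committing the pattern to a recipe.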

   * -1^0
   * 1^1 [Ctrl] plus A with a circumflex accent

   * -5^0
   * 1^1 [Ctrl]

are the kind of things I'm thinking of.
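
As I read the procmailsc man page, a condition "w^x regexp" adds w for
the first match, w*x for the second, w*x*x for the third, and so on, and
the recipe succeeds only if the total is positive. Under that reading,
"-5^0" with no pattern is a one-time bias of -5, and "1^1 pattern" adds 1
per match. The Python below only works through that arithmetic. It is
not procmail code, the function name is hypothetical, and it assumes the
empty pattern counts exactly once.

```python
def procmail_score(terms):
    """Sum w * x**k over the matches of each condition.

    terms: list of (w, x, matches) tuples, where matches is how many
    times that condition's pattern was found in the message.
    """
    total = 0
    for w, x, matches in terms:
        for k in range(matches):
            total += w * x ** k  # 0 ** 0 is 1, so x = 0 counts one match
    return total


# "* -5^0" (bias, assumed to count once) plus "* 1^1 [Ctrl]"
five_hits = procmail_score([(-5, 0, 1), (1, 1, 5)])
six_hits = procmail_score([(-5, 0, 1), (1, 1, 6)])
```

Here five_hits is 0 and six_hits is 1, so the second pair of conditions
would fire only on messages with at least six matches, while the first
pair ("-1^0" and "1^1") would need at least two.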

With Philip Guenther's help I've relearned some basic weighted scoring,
even though I didn't need it to find [ESC]$B.  Sorry for the dumb
questions.

Dick Moores  rdm(_at_)netcom(_dot_)com