procmail

Re: Filter for Japanese double-byte characters?

1999-10-09 13:36:21
On Sat, 9 Oct 1999, era eriksson wrote:

On Sat, 9 Oct 1999 05:30:11 -0700 (PDT), Dick Moores <rdm(_at_)netcom(_dot_)com> wrote:
 > ^At^AI^Ie^I^C^AA^N1/4^AI^AR^A}^A^Ah and so on

In the Latin-1 character set, uppercase A with an acute accent is
character number 193, so you got that right, but it's not "control A
acute", it's just "A acute" and the character you are seeing in front
of it is probably a regular caret character (byte value 94).

Here's just a partial response.  The rest of your post will require much
study :-) .

Of course I had tried counting carets before on this stuff, and always
got the maximum, very large number.  Your suggestion to use the
parentheses around "\^" was very helpful.  I ran a post with many lines
of the J code in question through Procmail using 

   * -30^0
   * 1^1 (\^)

and found that there were only 2 carets in the whole thing.  Here's the
log entry:

   procmail: Score:     -30     -30 ""
   procmail: Score:       2     -28 "(\^)"
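
For readers following the arithmetic in that log, here is a rough Python sketch of the same bookkeeping. It is not how Procmail is implemented; `caret_score` and its parameters are hypothetical names chosen for illustration. It assumes that a weight of 1 with an exponent of 1 adds 1 per match, which is what the log above shows.

```python
import re


def caret_score(text: bytes, base: int = -30, weight: int = 1) -> int:
    """Start from `base` and add `weight` for every literal caret (byte 94)."""
    # rb"\^" matches only a real caret byte, like the "(\^)" condition above.
    matches = re.findall(rb"\^", text)
    return base + weight * len(matches)


# Hypothetical sample with exactly two real carets in it.
sample = b"one ^ here, one ^ there"
print(caret_score(sample))  # -30 + 2 = -28
```

A message with two real carets ends at -28, matching the second log line.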

So if all but two of those "^" things I see aren't carets, what are
they?  In particular, what is that "^" in front of that "A acute"
character?  (There are always many "A acute"s in this stuff.)  I guess
I'll have to dig into the programs you mentioned, viz(1), cat -A, and
od(1).
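
As a stand-in for what a byte dump such as od(1) reports, the following Python sketch lists the numeric value of every byte, so a real caret (94) can be told apart from whatever else a viewer happens to draw with a "^". The function name `dump_bytes` and the labels are made up for this example, and the sample bytes are invented, not taken from the message in question.

```python
def dump_bytes(data: bytes) -> list:
    """Return one line per byte: decimal value, hex value, rough class."""
    lines = []
    for b in data:
        if b == 94:
            label = "literal caret"
        elif b < 32 or b == 127:
            label = "control character"
        elif b >= 128:
            label = "high-bit byte"
        else:
            label = "printable ASCII"
        lines.append(f"{b:3d} 0x{b:02x} {label}")
    return lines


# Invented sample: a real caret, a control byte, and byte 193 (A acute in Latin-1).
for line in dump_bytes(b"^\x01\xc1"):
    print(line)
```

Running the same function over a saved copy of a real message (read in binary mode) would show which byte values actually sit where the "^" marks appear.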

Thanks,

Dick Moores  rdm(_at_)netcom(_dot_)com