On Sat, 9 Oct 1999 05:30:11 -0700 (PDT), Dick Moores <rdm(_at_)netcom(_dot_)com>
wrote:
> ^At^AI^Ie^I^C^AA^N1/4^AI^AR^A}^A^Ah and so on
> (the "1/4" is a single character)
> I think I could find this stuff by searching on [Ctrl] plus, say
> capital A with a circumflex accent (I believe this is character code
> 194, right?), or capital A with an acute accent (code 193?). How can I
If you can show those characters in their "raw" version, that would
help.
Ctrl is just a "classification" thing: the character codes zero through
31 decimal are "control" characters because that's what they were used
for in the original ASCII encoding (on a teletype, for example, ctrl-s
would temporarily stop the terminal from printing output and ctrl-q
would resume it; that sort of "control"). So when you type ctrl-s you
transmit a byte whose value is 19, when you type A you transmit a byte
whose value is 65, and so forth; the "controlness" of the first is
simply because its number is in the "control" range.
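A small Python sketch of that arithmetic, assuming the usual ASCII convention that Ctrl keeps only the low five bits of a key's code (so Ctrl-S and Ctrl-s both give 19). The helper names ctrl and caret_notation are made up for illustration; caret_notation mimics the ^S style many Unix tools use to display control codes.

```python
def ctrl(key: str) -> int:
    """Byte value sent by Ctrl+key, for a letter or one of @ [ \\ ] ^ _.

    Assumes the classic ASCII model: keep only the low 5 bits of the
    (uppercased) key's code, which lands in the 0-31 control range.
    """
    return ord(key.upper()) & 0x1F


def caret_notation(code: int) -> str:
    """Render a byte value the way many Unix tools show control codes."""
    if code < 32:
        return "^" + chr(code + 64)  # 1 -> ^A, 19 -> ^S
    if code == 127:
        return "^?"                  # DEL
    return chr(code)                 # ordinary printable ASCII


print(ord("A"))                   # -> 65
print(ctrl("s"))                  # -> 19 (Ctrl-S, stop output)
print(ctrl("q"))                  # -> 17 (Ctrl-Q, resume output)
print(caret_notation(ctrl("s")))  # -> ^S
```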
With that explanation, "control-accented character" doesn't really
make sense at all (although, in a wicked sort of way, it does make
sense for characters in the range 128-159 in various ISO-8859
character sets such as Latin-1, aka ISO-8859-1, where those positions
are reserved for a second set of control codes).
In the Latin-1 character set, uppercase A with an acute accent is
character number 193, so you got that right, but it's not "control A
acute", it's just "A acute", and the character you are seeing in front
of it is probably a regular caret character (byte value 94).
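To check these numbers, here is a Python sketch that decodes single bytes as Latin-1 and looks up their Unicode names (Latin-1 byte values coincide with the first 256 Unicode code points). latin1_info is a hypothetical helper; the last line checks that 128-159 are classified as control characters.

```python
import unicodedata


def latin1_info(code: int) -> tuple:
    """Decode one byte as Latin-1 and return (character, Unicode name)."""
    ch = bytes([code]).decode("latin-1")
    return ch, unicodedata.name(ch)


for code in (94, 193, 194):
    ch, name = latin1_info(code)
    print(code, ch, name)
# 94 is CIRCUMFLEX ACCENT (a plain caret), 193 is A with acute,
# 194 is A with circumflex.

# The "wicked" case: 128-159 exist in Latin-1, but only as control codes
# (Unicode general category "Cc").
print(all(unicodedata.category(chr(c)) == "Cc" for c in range(128, 160)))
```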
There are programs such as viz(1), cat -A, or od(1) which let you
view the exact byte values unambiguously. Getting an od dump of the
string above would probably help a little bit.
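As an illustration of why an unambiguous dump matters, here is a rough Python imitation of the caret and M- notation that cat -v (which cat -A includes) uses. This is a sketch of GNU cat's documented behavior as I understand it, not its actual code, and the name cat_v is hypothetical. Note that the single byte 1 and the two bytes 94 65 (a literal caret followed by A) both come out as "^A", which is why the quoted ^A sequences could be either.

```python
def cat_v(data: bytes) -> str:
    """Roughly mimic `cat -v`: caret notation for control codes and
    an "M-" prefix for bytes with the high bit set (assumed behavior)."""
    out = []
    for b in data:
        if b in (9, 10):            # plain cat -v leaves tab and newline alone
            out.append(chr(b))
            continue
        prefix = "M-" if b >= 128 else ""
        low = b & 0x7F              # strip the high bit
        if low < 32:
            out.append(prefix + "^" + chr(low + 64))
        elif low == 127:
            out.append(prefix + "^?")
        else:
            out.append(prefix + chr(low))
    return "".join(out)


print(cat_v(bytes([1])))        # control-A
print(cat_v(bytes([94, 65])))   # caret, then A: looks the same
print(cat_v(bytes([193, 129])))
```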
For example, here are the first few bytes of a binary file in od -ch
format:
$ od -ch /vmunix | head -2
0000000 203 001 \b \0 ¦ ö 215 7 200 × Q \0 \0 \0 \0 \0
0183 0008 f6a6 378d d780 0051 0000 0000
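The first line is a character rendering where non-printable bytes are
shown as backslash escapes (\b, \0) or as three-digit octal numbers,
while bytes that are printable in the current character set (like the
ö here) are shown as-is. The second row shows the same bytes as 16-bit
hexadecimal words; on this little-endian machine, 203 001 (octal)
becomes 0183. It is also possible to get decimal and a number of other
formats out of od; see the manual page for details.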
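A rough Python sketch of those two rows, assuming GNU-style od -c escapes and little-endian 16-bit words for the hex row; od_c and od_h are hypothetical helpers, not part of any library. Unlike the od that produced the dump, it treats only ASCII as printable, so a byte such as ö would come out as octal 366.

```python
# Backslash escapes used by (GNU) od -c for a few control characters.
OD_ESCAPES = {0: "\\0", 7: "\\a", 8: "\\b", 9: "\\t", 10: "\\n",
              11: "\\v", 12: "\\f", 13: "\\r"}


def od_c(data: bytes) -> list:
    """Character row: printable ASCII as-is, a few escapes, else octal."""
    cells = []
    for b in data:
        if b in OD_ESCAPES:
            cells.append(OD_ESCAPES[b])
        elif 32 <= b < 127:
            cells.append(chr(b))
        else:
            cells.append(format(b, "03o"))
    return cells


def od_h(data: bytes) -> list:
    """Hex row: 16-bit words, low byte first (an odd last byte is ignored)."""
    return [format(data[i] | (data[i + 1] << 8), "04x")
            for i in range(0, len(data) - 1, 2)]


sample = bytes([0o203, 0o001, 0o010, 0o000])  # first four bytes of the dump
print(" ".join(od_c(sample)))  # -> 203 001 \b \0
print(" ".join(od_h(sample)))  # -> 0183 0008
```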
> * -1^0
> * 1^1 [Ctrl] plus A with a circumflex accent
> * -5^0
> * 1^1 [Ctrl]
Because the caret, like the dollar sign, has a special meaning in
Procmail's regular expressions (the caret anchors a match to the start
of a line, the dollar sign to the end), you have to backslash-escape it
to match a literal caret. Things get more complicated still because
Procmail's parser is sort of broken when the first thing in a
condition is a backslash, so we avoid putting a backslash first in the
regular expression by wrapping the escaped caret in parentheses. They
have no other effect here: parentheses are generally used for
grouping, and it doesn't usually make sense to "group" a single
character:
* -1^0
* 1^1 (\^)A
* -5^0
* 1^1 (\^)
I have also replaced the circumflex A with a regular one here.
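For readers more at home with Python than Procmail, here is a hypothetical model of the scoring in the recipe above. It assumes that the -1^0 line simply contributes a fixed starting score, that the 1^1 line adds 1 for every match, and that the recipe fires only if the total is positive; that is my approximate reading of procmailsc(5), not Procmail's own code. Python's re also treats an unescaped ^ as an anchor, but it has no leading-backslash quirk, so the parentheses aren't needed there.

```python
import re


def weighted_score(text: str, start: int, pattern: str) -> int:
    """Hypothetical model: fixed starting weight plus 1 per regex match."""
    return start + len(re.findall(pattern, text))


line = "^At^AI^Ie"

# Escaped: \^ is a literal caret, so this counts the two-character "^A".
print(weighted_score(line, -1, r"\^A"))  # -> 1 (positive, recipe fires)

# Unescaped: ^ anchors to the start of the string, and the first
# character there is a caret, not an A, so nothing matches.
print(weighted_score(line, -1, r"^A"))   # -> -1

# The second recipe counts every literal caret against a start of -5.
print(weighted_score(line, -5, r"\^"))   # -> -2
```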
Hope this helps,
/* era */
--
Too much to say to fit into this .signature anyway: <http://www.iki.fi/era/>
Fight spam in Europe: <http://www.euro.cauce.org/> * Sign the EU petition