procmail

Re: hex value in condition

1997-08-07 23:57:00
On Fri, 08 Aug 1997 07:35:31 +0900, Mitsuru Furukawa <furu(_at_)009(_dot_)com>
wrote:
 era> HENSHIN=`echo "yta/rg==" | mmencode -u` # probably not more efficient?
 era> ASC=`echo "AC1/" | mmencode -u`         # might use Perl after all
<...>
 era> The mmencode trick is perhaps not more efficient than Perl, but I
 era> wanted to come up with an alternative. (I tried uudecode first but it
 era> became very unwieldy :-)
To be candid, it was a little bit beyond my comprehension ;-(
Especially the "yta/rg==" and "AC1/" parts were puzzling to me.

They are the mmencode equivalents of the strings we want to match. Of
course, they're not exactly human-readable, but comments could fix
that (since you probably wouldn't want to change them often). My idea
was simply to provide something somewhat less expensive than Perl to
accomplish the same thing. But if you have, for instance, printf(1) on
your systems, there's definitely no need to go via Perl:

    HENSHIN=`printf "\xca\xd6\xbf\xae"`
    ASC=`printf "\x00-\x7f"`
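One caveat, as a hedged aside: the `\x` hex escapes are an extension that not every printf(1) accepts (dash's builtin, for one, does not), whereas three-digit octal escapes are what POSIX specifies. A minimal sketch of the same HENSHIN assignment in octal, with od(1) used only to check the resulting bytes:

```shell
# Octal equivalents of 0xCA 0xD6 0xBF 0xAE; octal escapes are the
# portable form for printf(1) format strings.
HENSHIN=`printf '\312\326\277\256'`

# Dump the variable as hex bytes to confirm what it holds.
printf '%s' "$HENSHIN" | od -An -tx1
```

The exact spacing of the od output differs between implementations, but the four bytes shown should be ca d6 bf ae.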

Any 7-bit-to-8-bit encoding program is fine; you can probably find or
write a program that does hex to binary directly, to minimize
overhead. (printf has a slew of library routines you don't need for
this simple task.)
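As a rough illustration of the hex-to-binary idea, here is a sketch in plain shell. The name `hex2bin` is made up for this example, and since it calls printf twice per byte it is certainly not the low-overhead program suggested above; it only shows the conversion itself. It assumes an even-length string of hex digits as its single argument.

```shell
# hex2bin HEXSTRING: write the bytes named by HEXSTRING to stdout.
# Hypothetical helper for illustration; assumes valid hex digits.
hex2bin() {
    hex=$1
    # Refuse odd-length input, which cannot be split into whole bytes.
    [ $(( ${#hex} % 2 )) -eq 0 ] || return 1
    while [ -n "$hex" ]; do
        rest=${hex#??}          # everything after the first two digits
        byte=${hex%"$rest"}     # the first two digits
        # Convert the hex pair to octal, then emit it as an octal escape.
        printf "\\$(printf '%03o' "0x$byte")"
        hex=$rest
    done
}

HENSHIN=`hex2bin cad6bfae`
```

Note that a NUL byte (00) can be written to a pipe or file this way, but most shells cannot store it in a variable.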

But if you are concerned about mixing up 2-byte Japanese chars and
1-byte ASCII chars in the matching operation, that is not necessary.
EUC Japanese chars can co-exist with ASCII chars "safely",
and no byte of an EUC char will be misinterpreted as an ASCII char.

What I was trying to say was that if your character glyphs are two
bytes wide, and you're looking for the two-byte sequence CD, then ABCD
should match (that's the glyph "AB" followed by the glyph "CD"), while
ACDB ("AC" and "DB") should not. Therefore, you need to look for
byte-pairs which start on even-byte boundaries only. (I had guessed
you were using a four-byte encoding, but I suppose "32768 character
glyphs ought to be enough for anyone" :-)
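To make the even-boundary point concrete, here is a small sketch. It assumes a strictly fixed-width two-byte encoding, as in the ABCD example (not a mix of one-byte and two-byte characters), and the function name `aligned_match` is invented for the illustration. The input is dumped as hex with od(1), so every glyph becomes exactly four hex digits, and the pattern may only start after a whole number of four-digit groups.

```shell
# aligned_match HEX: succeed if stdin contains the two-byte sequence
# given as four lowercase hex digits, starting at an even byte offset.
# Hypothetical helper; assumes every glyph is exactly two bytes wide.
aligned_match() {
    od -An -v -tx1 | tr -d ' \t\n' | grep -Eq "^(....)*$1"
}

# "CD" is 0x43 0x44.  In ABCD it starts at byte offset 2 (even).
printf 'ABCD' | aligned_match 4344 && echo "ABCD: match"

# In ACDB the same two bytes start at offset 1 (odd), so no match.
printf 'ACDB' | aligned_match 4344 || echo "ACDB: no match"
```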

Actually, I want to use this $HENSHIN match to delete the
quoted portion of a reply message from cc:Mail, that is,
from
____________________________ HeNsHiN ________________________________
to the end of the mail.
Does
  sed -e '/^_* $HENSHIN _*$/,$d'
work?

You should double-quote the expression; otherwise $HENSHIN will be
passed to sed literally. (Of course, the other dollar signs then need
to be backslashed or kept inside single quotes.)
  Also, I would have my doubts about stock sed being able to cope with
8-bit characters. (If yours does, fine. Modern seds certainly should,
but don't be surprised if the sed you have isn't "modern".)
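A sketch of the quoting, wrapped in a function so it can be tried from the command line. The name `strip_quoted` is made up here, and the `LC_ALL=C` setting is my assumption for getting sed to treat 8-bit bytes as plain bytes; whether your sed copes still needs testing. The marker is interpolated into the regex unescaped, so it must not contain characters that are special to sed.

```shell
# strip_quoted MARKER: copy stdin to stdout, deleting everything from
# the "____ MARKER ____" separator line through the end of the input.
# Hypothetical helper; MARKER must not contain regex metacharacters.
strip_quoted() {
    # Double quotes let the shell expand $1; the two dollar signs meant
    # for sed (end of line, last line) are backslashed to protect them.
    LC_ALL=C sed -e "/^_* $1 _*\$/,\$d"
}

printf 'new text\n_____ HeNsHiN _____\nold quoted text\n' |
    strip_quoted HeNsHiN
```

With the real marker you would pass "$HENSHIN" (in double quotes) as the argument.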

So far, I have received only Japanese cc:Mail messages with such a portion.
To prepare for domestic cc:Mail messages with quotations,
could anyone tell me the corresponding word in domestic cc:Mail?
Is it "Reply"? "REPLY"? "Quote"? Or?

I tried an Alta Vista search but couldn't find anything quickly. If
you have other cc:Mail features you know are in messages it generated
(like X-Mailer: headers, funny Subject formatting, whatever), try
searching for that. (Looking for "Received: from cc:Mail" looked like
a good start but it needs to be narrowed down a lot. Perhaps throw in
"In-Reply-To:" and so forth.)

/* era */

-- 
Defin-i-t-e-ly. Sep-a-r-a-te. Gram-m-a-r.  <http://www.iki.fi/~era/>
 * Enjoy receiving spam? Register at <http://www.iki.fi/~era/spam.html>
