procmail

Re: Idiosyncratic Punctuation in Mail

1997-08-17 11:49:00
On Sun, 17 Aug 1997 11:19:11 -0400 (EDT), Paul O Bartlett
<pobart(_at_)access(_dot_)digex(_dot_)net> wrote:
    I receive, as undoubtedly others do, mailing list postings with
idiosyncratic punctuation, usually where context would tend to indicate
that some sort of quotation mark (single or double, forward or
reverse?) is supposed to be.  I see a lot of these as the hex values
0x5ED4, which come out in ISO-8859-1/Latin-1 (with which I usually read
mail for a reason) as a caret followed by a circumflexed uppercase O
(letter).
<...>
    Has anybody come up with a recipe to filter this garbage into
something standard, like ISO-8859-1?

The hard part is in coming up with something that will work for all
cases. If you know a particular correspondent or group of
correspondents are sending Latin-1 which is not really Latin-1, you
can filter their messages. If you receive mail from whatever source
that is incorrectly tagged (perhaps not in MIME at all, in spite of
the contents being 8-bit, or with an "unknown" charset), you have no
way to know in what character set it +really+ is. It could be any Mac
script, any DOS code page, or a gadzillion other formats. (In
practice, they're often set up to claim they're Latin-1, only making
matters worse.)

Of course, if you "know" what characters are supposed to be there,
fire away.

:0bfw
* ^From:.*(bozo1|bozo2|bozo3)
* B ?? [^ -~]
| sed -e 's/funnystring/correct string/g' # -e 's/more strings/.../g' ...

If the characters to replace are a context-free one-to-one mapping, tr
is quicker than sed and, in many incarnations, allows you to use
backslash-octal instead of "live" 8-bit characters.
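
As a rough sketch of the tr approach, assuming the offending bytes are
Mac Roman curly quotes (0xD2 through 0xD5, i.e. octal 322 through 325)
and that plain ASCII quotes are an acceptable substitute. The function
name fix_mac_quotes is made up for illustration; adjust the byte list
to whatever your correspondents actually send.

```shell
# Hypothetical helper: map Mac Roman curly quotes to plain ASCII quotes.
# Assumed mapping: \322 and \323 (double curly quotes) become ",
# \324 and \325 (single curly quotes) become '.
# LC_ALL=C asks tr to treat its input as single bytes.
fix_mac_quotes() {
    LC_ALL=C tr '\322\323\324\325' "\"\"''"
}

# Example: feed it a line containing the raw bytes.
printf 'He said \322hi\323 and \324bye\325\n' | fix_mac_quotes
```

In a recipe, the tr command would presumably go where the sed command
is in the action line. Being a one-to-one byte mapping, it cannot
handle two-byte sequences such as the caret-plus-D4 pair; that case
still needs sed.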

/* era */

I am unaware of any coding scheme which would override the caret
character. The D4 is a curly quote on the Mac, I think. (Windows curly
quotes come out in the unprintable 128-159 range of Latin-1, if memory
serves.)
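
If you are not sure which bytes a given message really contains, a
quick way to look is to throw away the 7-bit characters and dump
what is left in hex. This is only a sketch; show_high_bytes is a
made-up name, and it tells you the byte values, not which character
set the sender had in mind.

```shell
# Hypothetical helper: print the 8-bit bytes found on stdin, in hex.
# tr deletes every byte in the 7-bit range (octal 000-177);
# od then dumps whatever remains, one hex pair per byte.
show_high_bytes() {
    LC_ALL=C tr -d '\000-\177' | od -An -tx1
}

# Example: a caret followed by byte 0xD4, surrounded by ASCII.
printf 'a^\324b\n' | show_high_bytes
```

The caret is ordinary ASCII, so only the D4 shows up in the dump.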

-- 
Defin-i-t-e-ly. Sep-a-r-a-te. Gram-m-a-r.  <http://www.iki.fi/~era/>
 * Enjoy receiving spam? Register at <http://www.iki.fi/~era/spam.html>
