mharc-users

Weird header format - and how to get .catch messages newly scanned and archived?

2005-05-06 01:27:06
Hello,

I have archived a bunch of messages from a Yahoo Groups list with
the yahoo2maildir.pl script I found on the web. It grabs the messages
from the Yahoo archive web pages, including the headers, and stores them
in separate files, one message per file. Another script can generate
an mbox from the messages.

The problem, I think, is that Yahoo obfuscates parts of the headers.
The visible symptom is that a lot of the messages don't get archived
properly: they are sorted into the .catch folder, and there they don't
even have subjects. One example of such a message header:

---snip---

From - Thu Jun 24 16:21:47 2004
Return-Path: <2chucky(_at_)w(_dot_)(_dot_)(_dot_)>
X-Sender: 2chucky(_at_)w(_dot_)(_dot_)(_dot_)
X-Apparently-To: kirchenmusik(_at_)y(_dot_)(_dot_)(_dot_)
Received: (qmail 71160 invoked from network); 24 Jun 2004
23:21:47 -0000
Received: from unknown (66.218.66.217)
by m23.grp.scd.yahoo.com with QMQP; 24 Jun 2004 23:21:47 -0000
Received: from unknown (HELO smtp07.web.de) (217.72.192.225)
by mta2.grp.scd.yahoo.com with SMTP; 24 Jun 2004 23:21:47 -0000
Received: from [80.140.78.73] (helo=wilson)
by smtp07.web.de with asmtp (TLSv1:RC4-MD5:128)
(WEB.DE 4.101 #26)
id 1BddXN-0007OX-00
for kirchenmusik(_at_)y(_dot_)(_dot_)(_dot_); Fri, 25 Jun 2004 01:21:45 +0200
To: kirchenmusik(_at_)y(_dot_)(_dot_)(_dot_)
Date: Fri, 25 Jun 2004 01:21:14 +0200
User-Agent: KMail/1.6.1
References: <003401c45a39$c0f90f00$e75b06d5(_at_)server>
<200406250109(_dot_)40105(_dot_)thomasmohr(_at_)a(_dot_)(_dot_)(_dot_)>
In-Reply-To: 
<200406250109(_dot_)40105(_dot_)thomasmohr(_at_)a(_dot_)(_dot_)(_dot_)>
MIME-Version: 1.0
Content-Disposition: inline
X-UID: 1159
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Message-Id: <200406250121(_dot_)14894(_dot_)2chucky(_at_)w(_dot_)(_dot_)(_dot_)>
Sender: 2chucky(_at_)w(_dot_)(_dot_)(_dot_)
X-eGroups-Remote-IP: 217.72.192.225
From: =?iso-8859-1?q?J=F6rg_Gottschlich?= <2chucky(_at_)w(_dot_)(_dot_)(_dot_)>
Subject: Re: [kirchenmusik] Orgelvertretungen ohne Bezahlung
X-Yahoo-Group-Post: member; u=159471668

---snap---
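
To see how many of the extracted files are affected, the headers could be scanned for truncated addresses. This is only a minimal sketch: it assumes the raw files contain addresses cut off as "name@x..." (a literal "@" followed by a shortened domain and three dots), and the function name is made up for illustration.

```python
import re
from email import message_from_string

# Assumption: a truncated address looks like "local@x...", i.e. a local
# part, "@", a shortened domain, then three literal dots.
TRUNCATED_ADDRESS = re.compile(r"[\w.+-]+@[\w.-]*?\.\.\.")


def truncated_headers(raw_message):
    """Map header name -> list of truncated addresses found in that header."""
    msg = message_from_string(raw_message)
    found = {}
    for name, value in msg.items():
        hits = TRUNCATED_ADDRESS.findall(str(value))
        if hits:
            found.setdefault(name, []).extend(hits)
    return found
```

Running this over every file written by yahoo2maildir.pl would show whether the messages that end up in .catch are exactly the ones with a truncated To or X-Apparently-To header.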


Some older messages, however, do get archived properly, with subject
and in the correct folder, like this one:

---snip---

From - Tue May 29 07:47:56 2001
Return-Path: <ellen(_at_)s(_dot_)(_dot_)(_dot_)>
X-Sender: ellen(_at_)s(_dot_)(_dot_)(_dot_)
X-Apparently-To: kirchenmusik(_at_)yahoogroups(_dot_)com
Received: (EGP: mail-7_1_3); 29 May 2001 14:47:55 -0000
Received: (qmail 5083 invoked from network); 29 May 2001
14:47:52 -0000
Received: from unknown (10.1.10.26) by l10.egroups.com with QMQP; 29
May 2001 14:47:52 -0000
Received: from unknown (HELO smtp-outbound.bhp.t-online.de)
(195.145.119.39) by mta1 with SMTP; 29 May 2001 14:47:47 -0000
Received: from ylva.ada.t-online.de ([172.30.8.40]) by
smtp-outbound.bhp.t-online.de (Netscape Messaging Server 4.15) with
SMTP id GE3QFM00.QLN for <kirchenmusik(_at_)yahoogroups(_dot_)com>; Tue, 29 May
2001 16:47:46 +0200
To: <kirchenmusik(_at_)yahoogroups(_dot_)com>
Subject: AW: [kirchenmusik] Swingin' moutain ...
Date: Tue, 29 May 2001 16:46:02 +0200
Message-ID: <000001c0e84e$35bf5a80$cd3efea9(_at_)ellen>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.2627
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6600
In-Reply-To: 
<ICEILDLLOCEJIMALFFHMEEHICHAA(_dot_)Christoph(_dot_)Spengler(_at_)t(_dot_)(_dot_)(_dot_)>
Received: from ellen ([217.3.204.56:1043]) by ylva.ada.t-online.de
(SmtpProxy); Tue, 29 May 2001 16:47:46 +0200 (MET DST)
From: "Ellen Schwarz-Schertler" <ellen(_at_)s(_dot_)(_dot_)(_dot_)>

---snap---



Maybe my rule is wrong?

Name: kirchenmusik
Description: Kirchenmusikarchiv
Address: kirchenmusik(_at_)yahoogroups(_dot_)de
Address: kirchenmusik(_at_)(_dot_)*
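
The two Address patterns can be sanity-checked outside mharc. The sketch below assumes they are ordinary regular expressions applied to a recipient address, which I have not verified against mharc itself, so it does not reproduce mharc's actual matching. The helper name and the test addresses are only for illustration.

```python
import re

# Patterns copied from the rule above, with "@" and "." written out.
# Assumption: they are used as regular expressions on recipient addresses.
ADDRESS_PATTERNS = [r"kirchenmusik@yahoogroups.de", r"kirchenmusik@.*"]


def matching_patterns(address, patterns=ADDRESS_PATTERNS):
    """Return the patterns that match the given address, ignoring case."""
    return [p for p in patterns if re.search(p, address, re.IGNORECASE)]


# Compare a full address with a truncated one.
for addr in ("kirchenmusik@yahoogroups.com", "kirchenmusik@y..."):
    print(addr, "->", matching_patterns(addr))
```

If the second pattern matches both forms here, the rule by itself would not explain why only the newer messages land in .catch.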


Does it have to do with the mangled Message-ID?

yahoo2maildir.pl extracts each message into its own file, so I could
use sed to repair the corrupted parts, mboxify the messages, and copy
them to .newmail so that mharc would sort them correctly.
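
The repair-and-mboxify step could also be done in one pass with Python's standard email and mailbox modules instead of sed. This is a rough sketch under two assumptions: the stored files contain the list address truncated to "kirchenmusik@y...", and the full address is the one from the rule above. The function names are hypothetical.

```python
import mailbox
import re
from email import message_from_string

# Assumptions: the list address is stored truncated as "kirchenmusik@y...",
# and the full address is the one used in the rule above.
TRUNCATED_LIST = re.compile(r"kirchenmusik@y\.\.\.")
FULL_LIST = "kirchenmusik@yahoogroups.de"
RECIPIENT_HEADERS = ("To", "Cc", "X-Apparently-To")


def repair_recipients(raw_message):
    """Parse one message and restore the list address in recipient headers."""
    msg = message_from_string(raw_message)
    for name in RECIPIENT_HEADERS:
        values = msg.get_all(name)
        if not values:
            continue
        del msg[name]  # removes every occurrence of this header
        for value in values:
            msg[name] = TRUNCATED_LIST.sub(FULL_LIST, value)
    return msg


def append_to_mbox(mbox_path, raw_messages):
    """Append repaired messages to an mbox file; return how many were added."""
    box = mailbox.mbox(mbox_path)
    try:
        count = 0
        for raw in raw_messages:
            box.add(repair_recipients(raw))
            count += 1
        box.flush()
        return count
    finally:
        box.close()
```

The resulting mbox file could then be copied to .newmail. Only the recipient headers are touched; the truncated sender addresses and Message-IDs are left as they are.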

Any help would be greatly appreciated.

Uwe






