mhonarc-users

Can newsgroup articles file be processed by mhonarc?

2004-04-01 18:07:32
In my program I use the Net::NNTP module to fetch messages from a particular newsgroup and append them to a regular text file (it seems to me that they are articles in source form). As a matter of fact, I use:

...
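# Fetch the article; article() returns a reference to an array of lines
$art = $nntp->article($msg_num);

if (defined($art)) {
    # Append each line of the article to the output file
    foreach my $l (@$art) {
        print OUT $l;
    }
    # Blank line between consecutive articles
    print OUT "\n";
}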
...

I attach an example file. Can a file in that format be processed by mhonarc the way an mbox file is? I noticed that when I split the file into single-message files, they (thanks to the -add .msg switch) seem to be added (one by one) to the archive.
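
One possible way to make such a file look like an mbox, offered only as a sketch and not as something confirmed in this thread: write a "From " separator line before each article and quote any body line that starts with "From ". The helper name article_to_mbox, the placeholder sender news@localhost, and the use of the current GMT time as the timestamp are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Net::NNTP or MHonArc): turn one article,
# given as a reference to an array of lines, into an mbox-style entry.
sub article_to_mbox {
    my ($lines, $sender, $stamp) = @_;
    $sender = 'news@localhost'   unless defined $sender;  # placeholder sender
    $stamp  = scalar(gmtime())   unless defined $stamp;   # asctime-style date

    # The "From " line separates messages in an mbox file.
    my $out     = "From $sender $stamp\n";
    my $in_body = 0;

    for my $line (@$lines) {
        my $l = $line;                       # copy, so the caller's data is untouched
        # The first empty line ends the headers.
        $in_body = 1 if !$in_body && $l =~ /^\r?\n?\z/;
        # Quote body lines that would otherwise look like a separator.
        $l =~ s/^(>*From )/>$1/ if $in_body;
        $out .= $l;
    }

    # End the entry with a newline plus one blank line before the next message.
    $out .= "\n" unless $out =~ /\n\z/;
    $out .= "\n";
    return $out;
}
```

In the loop above, this would replace the inner foreach with something like print OUT article_to_mbox($art); whether mhonarc then treats the result exactly like a mail mbox is something to verify on a small sample first.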

Regards
Darek
Path: 
atlantis.news.tpi.pl!news.tpi.pl!newsfeed.tpinternet.pl!wsisiz.edu.pl!nntp.idg.pl!news.glorb.com!logbridge.uoregon.edu!artemis.acsu.buffalo.edu!newsstand.cit.cornell.edu!not-for-mail
From: "A. Sinan Unur" <1usa(_at_)llenroc(_dot_)ude>
Newsgroups: comp.lang.perl.misc
Subject: Re: Array from a string.
Date: 31 Mar 2004 16:40:31 GMT
Organization: Cornell University
Lines: 31
Sender: asu1(_at_)cornell(_dot_)invalid (on 128.253.251.224)
Message-ID: <Xns94BD76C503D92asu1cornelledu(_at_)132(_dot_)236(_dot_)56(_dot_)8>
References: <c4eks5$9u5$1(_at_)newshost(_dot_)mot(_dot_)com> 
<Xns94BD62A5695F1asu1cornelledu(_at_)132(_dot_)236(_dot_)56(_dot_)8> 
<c4ep8f$bin$1(_at_)newshost(_dot_)mot(_dot_)com>
NNTP-Posting-Host: 128.253.251.224
X-Trace: news01.cit.cornell.edu 1080751231 3478 128.253.251.224 (31 Mar 2004 
16:40:31 GMT)
X-Complaints-To: usenet(_at_)news01(_dot_)cit(_dot_)cornell(_dot_)edu
NNTP-Posting-Date: 31 Mar 2004 16:40:31 GMT
User-Agent: Xnews/5.04.25
X-Face: 
#0:Oa+WV[,\dU+SJ\X%#!MhGkG;vsj^Tzl1KJHck]V;S8u}yvJ<rd?.0]p2-6jgTf.>p~GpGgD.mLo)IY,&yDRM1dV3z'Y'8D=+Y7k[|[~mGbV(<(8Im%IhZkC9.A.&]TGcwX9GKGgA,lqReCST$aDsGKy#zU~laO|oJiD$e"6&_tzrxT}K,X_e,FC&}P8J"x~ii,lr6)L}=tZI#cNU,7u]J"TLISliDF2pmIKR`ulX=X-sB2aM?f4wIG5Z_nXceH~5}E*t+vx!unlkVJ7]57x`%S1\gR{.1_^Gu2L'am[/=c]'7Hj1l^Yx!nCe40dFkW
Xref: atlantis.news.tpi.pl comp.lang.perl.misc:189342

"Richard S Beckett" 
<spikeywan(_at_)bigfoot(_dot_)com(_dot_)delete(_dot_)this(_dot_)bit> wrote in 
news:c4ep8f$bin$1(_at_)newshost(_dot_)mot(_dot_)com:

Is there an easy way to do this?

Yes there is. It is called checking the FAQ list before posting:

How do you know I didn't?

perldoc -q inside

Now, there's a word I would _never_ have associated with this problem,
thanks.

There are many ways of looking for what you need in the FAQ list. What I 
gave you is a short-cut that one figures out after finding the entry for 
the first time.

The first time I found that entry was by reading through perlfaq4:

DESCRIPTION
    This section of the FAQ answers questions related to manipulating
    numbers, dates, strings, arrays, hashes, and miscellaneous data
    issues.

Hmmmm .. You would have found the answer had you looked at the table of 
contents and then read perlfaq4.

-- 
A. Sinan Unur
1usa(_at_)llenroc(_dot_)ude (reverse each component for email address)

Path: 
atlantis.news.tpi.pl!news.tpi.pl!newsfeed.tpinternet.pl!wsisiz.edu.pl!newsfeed.gazeta.pl!opal.futuro.pl!news.task.gda.pl!newsfeed00.sul.t-online.de!t-online.de!diablo.theplanet.net!nntp.theplanet.net!inewsm1.nntp.theplanet.net!zen.net.uk!hamilton.zen.co.uk!193.60.199.26.MISMATCH!feed4.jnfs.ja.net!feed3.jnfs.ja.net!feed2.jnfs.ja.net!jnfs.ja.net!news.bham.ac.uk!not-for-mail
From: Brian McCauley <nobull(_at_)mail(_dot_)com>
Newsgroups: comp.lang.perl.misc
Subject: Re: multiple lines / success or failure?!
Date: 31 Mar 2004 17:50:30 +0100
Organization: Just me, doing my own thing
Lines: 15
Message-ID: <u91xn96jvt(_dot_)fsf(_at_)wcl-l(_dot_)bham(_dot_)ac(_dot_)uk>
References: <agpk60pbcbfp55llfu2pjkntt10pctfpui(_at_)4ax(_dot_)com> 
<c4dvm6$qqq$1(_at_)nets3(_dot_)rz(_dot_)RWTH-Aachen(_dot_)DE> 
<mvsl605pc2lg3eog9kvrco0cpthbi73caa(_at_)4ax(_dot_)com>
NNTP-Posting-Host: wcl-l.bham.ac.uk
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sun3.bham.ac.uk 1080751639 15473 147.188.68.4 (31 Mar 2004 16:47:19 
GMT)
X-Complaints-To: usenet(_at_)sun3(_dot_)bham(_dot_)ac(_dot_)uk
NNTP-Posting-Date: Wed, 31 Mar 2004 16:47:19 +0000 (UTC)
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1
Xref: atlantis.news.tpi.pl comp.lang.perl.misc:189343

Geoff Cox <geoffacox(_at_)dontspamblueyonder(_dot_)co(_dot_)uk> writes:

I am getting a "can't find EOHTML string terminator anywhere before
EOF" message using above. Is there a typo?

Have you eleminiated the most likely cause that is explained when you
look that message up in the reference manual (perldiag)? 

-- 
     \\   ( )
  .  _\\__[oo
 .__/  \\ /\@
 .  l___\\
  # ll  l\\
 ###LL  LL\\

Path: 
atlantis.news.tpi.pl!news.tpi.pl!newsfeed.tpinternet.pl!newsfeed.news2me.com!canoe.uoregon.edu!hammer.uoregon.edu!logbridge.uoregon.edu!news.umass.edu!news-out.cwix.com!newsfeed.cwix.com!newsfeed2.sea.pnap.net!newsfeed.pnap.net!newsgate.mot.com!newshost.mot.com!not-for-mail
From: "Richard S Beckett" 
<spikeywan(_at_)bigfoot(_dot_)com(_dot_)delete(_dot_)this(_dot_)bit>
Newsgroups: comp.lang.perl.misc
Subject: Re: Array from a string.
Date: Wed, 31 Mar 2004 18:16:53 +0100
Organization: Motorola
Lines: 12
Message-ID: <c4euhp$dfo$1(_at_)newshost(_dot_)mot(_dot_)com>
References: <c4eks5$9u5$1(_at_)newshost(_dot_)mot(_dot_)com> 
<Xns94BD62A5695F1asu1cornelledu(_at_)132(_dot_)236(_dot_)56(_dot_)8> 
<c4ep8f$bin$1(_at_)newshost(_dot_)mot(_dot_)com> 
<Xns94BD76C503D92asu1cornelledu(_at_)132(_dot_)236(_dot_)56(_dot_)8>
NNTP-Posting-Host: zuk28-6171.ecid.cig.mot.com
X-Trace: newshost.mot.com 1080753529 13816 10.128.76.225 (31 Mar 2004 17:18:49 
GMT)
X-Complaints-To: motpost1(_at_)azmsg(_dot_)mot(_dot_)com
NNTP-Posting-Date: 31 Mar 2004 17:18:49 GMT
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2800.1106
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
Xref: atlantis.news.tpi.pl comp.lang.perl.misc:189344

Hmmmm .. You would have found the answer had you looked at the table of
contents and then read perlfaq4.


OK, it's a fair cop! :-) I'll try harder next time.

Thanks for the help.
--
R.
GPLRank +79.699



Path: 
atlantis.news.tpi.pl!news.tpi.pl!newsfeed.tpinternet.pl!skynet.be!news.csl-gmbh.net!newsfeed.r-kom.de!news-nue1.dfn.de!news-han1.dfn.de!news.rz.tu-clausthal.de!not-for-mail
From: "Jan Biel" <jan(_dot_)biel(_at_)tu-clausthal(_dot_)de>
Newsgroups: comp.lang.perl.misc
Subject: [NEWBIE] newline question
Date: Wed, 31 Mar 2004 19:25:38 +0200
Organization: Clausthal University of Technology
Lines: 58
Message-ID: <c4eupc$mqe$1(_at_)ariadne(_dot_)rz(_dot_)tu-clausthal(_dot_)de>
NNTP-Posting-Host: boneman.heim7.tu-clausthal.de
Mime-Version: 1.0
Content-Type: text/plain;
        charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
X-Trace: ariadne.rz.tu-clausthal.de 1080753772 23374 139.174.247.15 (31 Mar 
2004 17:22:52 GMT)
X-Complaints-To: usenet(_at_)ariadne(_dot_)rz(_dot_)tu-clausthal(_dot_)de
NNTP-Posting-Date: Wed, 31 Mar 2004 17:22:52 +0000 (UTC)
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2800.1158
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
Xref: atlantis.news.tpi.pl comp.lang.perl.misc:189345

Hello!

From some tutorials on the web I managed to create a perl script which finds
and replaces certain occurences in text files via regular expressions.

Then something happened which I cannot really explain, so I hope you can
clarify it for me.

The original perl script looks like this:

-------------------------------
$filein = 'a.txt';
$fileout = 'b.txt';

open(INFO, $filein);
open(INFO2, ">$fileout");

@lines = <INFO>;

grep(s/\n//g,@lines);
grep(s/ab/found/g,@lines);

print INFO2 @lines;

close(INFO);
close(INFO2);
--------------------------------

where a.txt is a file containing:

--------------------------------
a
b
c
--------------------------------

The resulting b.txt contains:

--------------------------------
abc
--------------------------------

So the second regular expression is ignored.

But if I write two perl scripts where each executes only one of the regular
expressions it works with the result:

--------------------------------
foundc
--------------------------------

as expected.

What is the mystery here?

I hope this wasn't too confusing :)
Janbiel


Path: 
atlantis.news.tpi.pl!news.tpi.pl!newsfeed.tpinternet.pl!wsisiz.edu.pl!nntp.idg.pl!news.zanker.org!feeder.enertel.nl!nntpfeed-01.ops.asmr-01.energis-idc.net!newsfeed.kabelfoon.nl!nanites.nntp.kabelfoon.nl!not-for-mail
Date: Wed, 31 Mar 2004 11:34:52 -0600
From: John Bokma <postmaster(_at_)castleamber(_dot_)com>
Organization: Castle Amber - freelance software development
User-Agent: Mozilla Thunderbird 0.5 (Windows/20040207)
X-Accept-Language: en-us, en
MIME-Version: 1.0
Newsgroups: comp.lang.perl.misc
Subject: Re: count files + dirs
References: <406ad87b(_at_)primark(_dot_)com> 
<20040331095631(_dot_)M19862(_at_)dishwasher(_dot_)cs(_dot_)rpi(_dot_)edu>
In-Reply-To: 
<20040331095631(_dot_)M19862(_at_)dishwasher(_dot_)cs(_dot_)rpi(_dot_)edu>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 21
Message-ID: <406b0193$0$24356$58c7af7e(_at_)news(_dot_)kabelfoon(_dot_)nl>
NNTP-Posting-Host: customer-XAL-18-140.megared.net.mx
X-Trace: 1080754579 nanites.nntp.kabelfoon.nl 24356 jbokma/200.66.18.140:64169
X-Complaints-To: abuse(_at_)kabelfoon(_dot_)nl
Xref: atlantis.news.tpi.pl comp.lang.perl.misc:189346

Paul Lalli wrote:

On Wed, 31 Mar 2004, Simon wrote:

$count=@files + 1;

Why are you doing this?  @files in scalar context gives the number of
elements in the array.  You should not be adding one to it.

[snip]

print "$count"-1;

what the heck is this??

Fix for the "Why are you doing this" :D

-- 
John                            personal page:  http://johnbokma.com/

Freelance Perl / Java developer available  -  http://castleamber.com/
