TECH: Re: Summary, web based mailinglist archives
From: Paul Bennett <paul.bennett@...>
Date: Monday, October 25, 1999, 12:05
Boudewijn writes:
>>>>>>
On Mon, 25 Oct 1999, Paul Bennett wrote:
> If you're looking for "quick & dirty" interim fixes, I'm going to start a holy
> war by suggesting that a set of DBMs (with indices) plonked into some Perl
> hashes should do the trick, and Perl's RE engine outperforms both grep and awk
> considerably. You could then squirt the text of the files into HTML as you go.
>
> My apologies to any Python fans (You Know Who You Are), it's just that I'm
> learning Perl at the moment and have never studied Python at length. I'd agree
> that a free-SQL backend might be better for a post-alpha project, however.
>
I've never tried Perl (beyond trying to read the documentation), but I
fancy that, say, 200 MB of text is a bit too much to regexp easily, even
for Perl - storing the header info and perhaps some keywords in a
database (it doesn't matter much whether you pick a real database or use
DBMs - both are equally easy), and indexing the text files themselves
with glimpse should be much more workable. A few man-days from specs
to prototype, I guess.
<<<<<<
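A minimal sketch of the split described above (header info and keywords in a DBM, with the full text left to a separate indexer such as glimpse, which is not shown), written in Python rather than Perl. The file names, the JSON encoding of the header summary, and the rule that "keywords" are simply the words of the Subject line are all hypothetical choices for illustration, not anyone's agreed spec.

```python
import dbm
import email
import json
import os
import re
import tempfile

# Hypothetical keyword rule: the distinct lower-case words of the Subject line.
WORD_RE = re.compile(r"[a-z0-9]+")


def open_archive(directory):
    """Open (creating if needed) the two hypothetical DBM files in directory."""
    headers_db = dbm.open(os.path.join(directory, "headers"), "c")
    keywords_db = dbm.open(os.path.join(directory, "keywords"), "c")
    return headers_db, keywords_db


def add_message(headers_db, keywords_db, raw_message):
    """Store a header summary and update the keyword -> message-ID index."""
    msg = email.message_from_string(raw_message)
    msg_id = msg.get("Message-ID", "").strip()
    summary = {
        "from": msg.get("From", ""),
        "date": msg.get("Date", ""),
        "subject": msg.get("Subject", ""),
    }
    headers_db[msg_id] = json.dumps(summary)
    for word in set(WORD_RE.findall(summary["subject"].lower())):
        # DBM values are flat byte strings, so IDs are stored newline-separated.
        ids = keywords_db[word].decode().split("\n") if word in keywords_db else []
        if msg_id not in ids:
            ids.append(msg_id)
        keywords_db[word] = "\n".join(ids)
    return msg_id


def lookup(headers_db, keywords_db, word):
    """Return the header summaries of messages whose subject contains word."""
    word = word.lower()
    if word not in keywords_db:
        return []
    ids = keywords_db[word].decode().split("\n")
    return [json.loads(headers_db[msg_id]) for msg_id in ids]


if __name__ == "__main__":
    headers_db, keywords_db = open_archive(tempfile.mkdtemp())
    add_message(headers_db, keywords_db,
                "Message-ID: <1@example.org>\nFrom: paul@example.org\n"
                "Subject: DBM indexes in Perl\n\nBody text.\n")
    print(lookup(headers_db, keywords_db, "perl"))
    headers_db.close()
    keywords_db.close()
```

Python's `dbm.open` picks whichever DBM backend the system has, so the same code runs whether that turns out to be gdbm, ndbm or the pure-Python fallback.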
Yes. I was never suggesting REing a "monolithic bag o' bits" (TM). What I feel
it needs is a fairly compute- and space-intensive phase when a new message (set)
is added. Indexes of indexes and all that funky stuff seems to be the order of
the day, as well as a cute little trick that I call (after the guy who explained
it to me) "Julian" Indexes (more on this is available for the terminally
curious, it's not super techie, but if you've never come across it, it blows
your mind at first). I'm a great fan of sacrificing entry speed (and space) to
increase retrieval speed. I'd happily have a file that took hours to add a new
entry and days to rebuild indexes, as long as the adding and reindexing was done
in the background. It's a subject I need to read up more on, if & when I get
the opportunity.
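One common reading of "indexes of indexes" is a sparse two-level index: the sorted term dictionary is cut into fixed-size blocks, and a small top-level list records the first term of each block. The Python sketch below is only that reading (the class name and block size are hypothetical, and it makes no attempt at the "Julian" index trick). It puts all the tokenising and sorting in the build step, so a lookup is just two binary searches, which fits the idea of trading entry speed for retrieval speed.

```python
import bisect
from collections import defaultdict


class TwoLevelIndex:
    """Hypothetical sparse two-level index over a dict of doc_id -> text.

    Building does the expensive work (tokenising and sorting); a lookup only
    binary-searches the small top level and then a single block.
    """

    def __init__(self, documents, block_size=4):
        postings = defaultdict(set)
        for doc_id, text in documents.items():
            for term in text.lower().split():
                postings[term].add(doc_id)
        entries = sorted((term, sorted(ids)) for term, ids in postings.items())
        # Lower level: fixed-size blocks of (term, doc_ids) pairs.
        self.blocks = [entries[i:i + block_size]
                       for i in range(0, len(entries), block_size)]
        # Top level: the first term of each block, i.e. an index of the index.
        self.first_terms = [block[0][0] for block in self.blocks]

    def lookup(self, term):
        """Return the sorted doc IDs containing term, or [] if absent."""
        term = term.lower()
        # Rightmost block whose first term is <= the search term.
        i = bisect.bisect_right(self.first_terms, term) - 1
        if i < 0:
            return []
        block = self.blocks[i]
        keys = [t for t, _ in block]
        j = bisect.bisect_left(keys, term)
        if j < len(block) and block[j][0] == term:
            return block[j][1]
        return []
```

In a real archive the blocks would presumably live on disk with only the top-level list held in memory, which is where spending extra time and space at build time would pay off on retrieval.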
You could probably use BSPs (binary space partitioning trees) to make/control the "top level" indexes, as well,
since we're talking about a fairly analog 2-dimensional domain, though my mind
boggles at exactly how this could be done.
Oh, and any of the O'Reilly books about Perl will make the learning process far
smoother. I just got the O'Reilly book on "Programming with Qt" (it's all in C++,
but that's no great headache) so I might be able to make more headway with
understanding Kura, as well...
<Beavis>O'Reilly kicks ass!</Beavis>
<Butthead>Huhuh, yeah, cool</Butthead>
(The only other reason I suggested Perl is that any fule with a sound knowledge
of grep, sed and awk can jump right in and boogie with minimal environmental
adjustment, <G> making a collaborative effort much easier.)
>>>>>>
> Just as soon as I can pick a Linux and jump in with both feet, I'll start
> experimenting, if you like...
>
I'd advise you to stay clear of SuSE - it tries to be clever, instead of
deferring to the sysadmin. When the next Slackware comes out, I'll be
returning to it.
<<<<<<
Slackware == Debian? I'd heard that Debian had one of the most inhuman(e)
installers in the Linux universe; maybe I've misread something.
Pb
*************************************************************
This email and any files transmitted with it are confidential
and intended solely for the use of the individual or entity
to whom they are addressed.
If you have received this email in error please notify the
sender. This footnote also confirms that this email message
has been scanned for the presence of computer viruses.
*************************************************************