Conlang: Re: CHAT: Summary, web based mailinglist archives (Boudewijn Rempt, Oct 25 '99, 11:37)

Re: CHAT: Summary, web based mailinglist archives

From:	Boudewijn Rempt <bsarempt@...>
Date:	Monday, October 25, 1999, 11:37

On Mon, 25 Oct 1999, Paul Bennett wrote:

> tal writes:
> >>>>>>
>
> Having author, date, threading-information and subject in a database,
> and grepping through the raw text would be a (quick and) workable
> solution. As for sizes, I've guesstimated that the list nets in at
> about 130 megs (unpacked) so far, growing with about 30 megs a year...
>
> <<<<<<
>
> If you're looking for "quick & dirty" interim fixes, I'm going to start a holy
> war by suggesting that a set of DBMs (with indices) plonked into some Perl
> hashes should do the trick, and Perl's RE engine outperforms both grep and awk
> considerably. You could then squirt the text of the files into html as you go.
>
> My apologies to any Python fans (You Know Who You Are), it's just that I'm
> learning Perl at the moment and have never studied Python at length. I'd agree
> that a free-SQL backend might be better for a post-alpha project, however.
>

I've never tried Perl (beyond trying to read the documentation), but I suspect that, say, 200 MB of text is a bit too much to search by regexp comfortably, even for Perl. Storing the header info and perhaps some keywords in a database (it doesn't matter much whether you pick a real database or use DBM files; both are equally easy) and indexing the text files themselves with glimpse should be much more workable. A few man-days from specs to prototype, I guess.
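To make the header-database half of that idea concrete, here is a minimal Python sketch. It assumes the archive can be fed in as raw RFC 822 message strings and uses an in-memory SQLite table. The table layout, the function names and the sample messages are all made up for illustration; a DBM file would do just as well, and the full-text side (glimpse) is not covered here.

```python
import sqlite3
from email import message_from_string
from email.utils import parsedate_to_datetime

# Two invented sample messages, standing in for files from the archive.
SAMPLE = [
    "From: Alice Example <alice@example.org>\n"
    "Date: Mon, 25 Oct 1999 10:00:00 +0000\n"
    "Subject: Archive ideas\n"
    "Message-ID: <1@example.org>\n"
    "\n"
    "Body one.\n",
    "From: Bob Example <bob@example.org>\n"
    "Date: Mon, 25 Oct 1999 11:37:00 +0000\n"
    "Subject: Re: Archive ideas\n"
    "Message-ID: <2@example.org>\n"
    "In-Reply-To: <1@example.org>\n"
    "\n"
    "Body two.\n",
]


def build_header_index(raw_messages):
    """Store author, date, subject and threading headers in SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE headers ("
        "msg_id TEXT PRIMARY KEY, author TEXT, sent TEXT, "
        "subject TEXT, in_reply_to TEXT)"
    )
    for raw in raw_messages:
        msg = message_from_string(raw)
        # ISO 8601 timestamps sort correctly as plain text.
        sent = parsedate_to_datetime(msg["Date"]).isoformat()
        db.execute(
            "INSERT INTO headers VALUES (?, ?, ?, ?, ?)",
            (msg["Message-ID"], msg["From"], sent,
             msg["Subject"], msg["In-Reply-To"]),
        )
    db.commit()
    return db


def find_by_author(db, name):
    """Return (msg_id, subject) pairs for authors matching name, oldest first."""
    rows = db.execute(
        "SELECT msg_id, subject FROM headers "
        "WHERE author LIKE ? ORDER BY sent",
        (f"%{name}%",),
    )
    return rows.fetchall()


if __name__ == "__main__":
    index = build_header_index(SAMPLE)
    print(find_by_author(index, "Bob"))
```

The in_reply_to column is what would let a web front end rebuild threads; messages without that header simply get NULL there.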

> Just as soon as I can pick a linux and jump in with both feet, I'll start
> experimenting, if you like...
>

I'd advise you to stay clear of Suse: it tries to be clever instead of deferring to the sysadmin. When the next Slackware comes out, I'll be returning to it.

Boudewijn Rempt | http://denden.conlang.org/~bsarempt