
TECH: Re: Summary, web based mailinglist archives

From: Paul Bennett <paul.bennett@...>
Date: Monday, October 25, 1999, 12:05
Boudewijn writes:
On Mon, 25 Oct 1999, Paul Bennett wrote:
> If you're looking for "quick & dirty" interim fixes, I'm going to start a holy
> war by suggesting that a set of DBMs (with indices) plonked into some Perl
> hashes should do the trick, and Perl's RE engine outperforms both grep and awk
> considerably. You could then squirt the text of the files into html as you
>
> My apologies to any Python fans (You Know Who You Are), it's just that I'm
> learning Perl at the moment and have never studied Python at length. I'd
> that a free-SQL backend might be better for a post-alpha project, however.
>
I've never tried Perl (beyond trying to read the documentation), but I fancy that, say, 200 MB of text is a bit too much to regexp easily, even for Perl. Storing the header info and perhaps some keywords in a database (it doesn't matter much whether you pick a real database or use dbm files, both are equally easy), and indexing the text files themselves with glimpse, should be much more workable. A few man-days from specs to prototype, I guess.

<<<<<<

Yes. I was never suggesting REing a "monolithic bag o' bits" (TM). What I feel it needs is a fairly compute- and space-intensive phase when a new message (set) is added. Indexes of indexes and all that funky stuff seem to be the order of the day, as well as a cute little trick that I call (after the guy who explained it to me) "Julian" Indexes (more on this is available for the terminally curious; it's not super techie, but if you've never come across it, it blows your mind at first).

I'm a great fan of sacrificing entry speed (and space) to increase retrieval speed. I'd happily have a file that took hours to add a new entry and days to rebuild indexes, as long as the adding and reindexing was done in the background. It's a subject I need to read up on more, if & when I get the opportunity.

You could probably use BSPs to make/control the "top level" indexes, as well, since we're talking about a fairly analog 2-dimensional domain, though my mind boggles at exactly how this could be done.

Oh, and any of the O'Reilly books about Perl will make the learning process far smoother. I just got the O'Reilly on "Programming with Qt" (it's all in C++, but that's no great headache), so I might be able to make more headway with understanding Kura, as well...
<Beavis>O'Reilly kicks ass!</Beavis>
<Butthead>Huhuh, yeah, cool</Butthead>

(The only other reason I suggested Perl is that any fule with a sound knowledge of grep, sed and awk can jump right in and boogie with minimal environmental adjustment, <G> making a collaborative effort much easier.)
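
As a rough illustration of the index-on-insert idea discussed above (header info in a dbm file, plus a keyword index that gets updated whenever a message is added, so a search never has to regexp the whole archive), here is a minimal Python sketch. It is not anyone's actual implementation: the function names, the tab-separated header format and the two sample messages are all made up for the example, and a plain dbm keyword index stands in for glimpse.

```python
import dbm
import os
import re
import tempfile

WORD_RE = re.compile(r"[a-z0-9]+")  # crude tokenizer, an assumption of this sketch


def add_message(headers, index, msg_id, sender, subject, body):
    """Pay the indexing cost at insert time (msg_id must contain no spaces)."""
    headers[msg_id] = f"{sender}\t{subject}"
    for word in set(WORD_RE.findall(body.lower())):
        # Each index entry is a space-separated list of message ids.
        ids = index.get(word, b"").decode().split()
        if msg_id not in ids:
            ids.append(msg_id)
            index[word] = " ".join(ids)


def search(headers, index, *words):
    """Return (msg_id, header) pairs for messages containing every word."""
    id_sets = [set(index.get(w.lower(), b"").decode().split()) for w in words]
    hits = set.intersection(*id_sets) if id_sets else set()
    return sorted((mid, headers[mid].decode()) for mid in hits)


def demo():
    """Build a throwaway two-message archive and run one query against it."""
    with tempfile.TemporaryDirectory() as tmp:
        with dbm.open(os.path.join(tmp, "headers"), "c") as headers, \
             dbm.open(os.path.join(tmp, "index"), "c") as index:
            add_message(headers, index, "msg1", "boudewijn", "glimpse",
                        "Index the text files with glimpse.")
            add_message(headers, index, "msg2", "paul", "perl indexes",
                        "Build every index when a message is added, in Perl.")
            return search(headers, index, "perl", "index")


if __name__ == "__main__":
    print(demo())
```

All the expensive work sits in add_message, which could just as well run in the background; search only reads the small index entries and the header records, never the message bodies.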
>>>>>>

> Just as soon as I can pick a linux and jump in with both feet, I'll start
> experimenting, if you like...
>
I'd advise you to stay clear of Suse - it tries to be clever, instead of deferring to the sysadmin. When the next Slackware comes out, I'll be returning to it.

<<<<<<

Slackware == Debian? I'd heard that Debian had one of the most inhuman(e) installers in the linux universe; maybe I've misread something.

Pb