Re: TECH: Re: Summary, web based mailinglist archives
From: taliesin the storyteller <taliesin@...>
Date: Monday, October 25, 1999, 13:40
* Paul.Bennett@xncorp.com (Paul.Bennett@xncorp.com) [991025 14:29]:
> Yes. I was never suggesting REing a "monolithic bag o' bits" (TM). What
> I feel it needs is a fairly compute- and space-intensive phase when a
> new message (set) is added. Indexes of indexes and all that funky stuff
> seems to be the order of the day, as well as a cute little trick that I
> call (after the guy who explained it to me) "Julian" Indexes (more on
> this is available for the terminally curious, it's not super techie, but
> if you've never come across it, it blows your mind at first).
Is your 'Julian Indexes' indexing on words, the words pointing to the
documents that contain the words? This is called an 'inverted file'
(or inverted index) in IR, and is precisely -not- what would suit
conlang-l. I've already built a simple IR system ('twas for class), and
used one month of conlang-mail (May 98?) as the main dataset. No go.
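
For anyone who hasn't met the structure described above (words pointing
to the documents that contain them), here is a minimal sketch in Python.
It is only an illustration under assumptions: the function names, the
tokenizer and the three sample messages are all made up for this
example, and it is not the system built for class.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and pull out runs of letters/apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())

def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query word (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample messages, invented for illustration only.
messages = {
    1: "Verb morphology in my new conlang",
    2: "Web based mailinglist archives",
    3: "Searching the conlang archives by word",
}
index = build_inverted_index(messages)
```

A query is then just a set intersection over the per-word document
sets, e.g. search(index, "conlang archives"). Note that this matches
exact word forms only, with no stemming or ranking.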
I could write about vector search, clustering techniques and stemming
algorithms too, if you like :)
<warning>I *am* a perfectionist</warning>
tal.
--
"Better living through conlanging"