Re: Programmers requested for dictionary
From: Boudewijn Rempt <bsarempt@...>
Date: Friday, October 27, 2000, 21:54
On Fri, 27 Oct 2000, Peter Clark wrote:
> - Entries would be able to display conjugations, mutations,
> endings, etc. I will lapse into Russian here, since I haven't developed
> Enamyn far enough. Let's say I remember that the genitive plural of
> "djengi" (money) is irregular, but forgot what it was exactly. I should
> be able to type in "money" and learn that it is "deneg."
I've just been reading up on this. It turns out that in a fairly
large corpus of English (about 500.000.000 words), there are about
300.000 wordforms, i.e. inflected forms. I wonder whether that number
is roughly constant across languages. For instance, Homeric Greek is
highly inflecting, but given
the size of the vocabulary (i.e. lexemes) _and_ the corpus (no need to
account for unattested forms), I wonder whether there would be more than
300.000 (or, to be on the safe side 500.000) inflected forms.
That of course means that it's easy to store every individual inflected
form. It's a poor database that can't handle half a million simple
records and relate them.
Coincidentally, that was also the approach I've already taken with
Kura ;-).
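
To make the idea concrete, here is a minimal sketch of storing every
inflected form as its own record and relating it to its lexeme. It is
not Kura's actual schema: the table names (lexeme, wordform), the
function lookup_forms and the feature tags like "gen.pl" are all made
up for illustration, and SQLite is used only because it ships with
Python.

```python
import sqlite3


def build_db():
    """Create an in-memory database with one lexeme and its stored forms."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE lexeme (
            id       INTEGER PRIMARY KEY,
            headword TEXT NOT NULL,
            gloss    TEXT NOT NULL
        );
        -- One row per attested inflected form, related to its lexeme.
        CREATE TABLE wordform (
            id        INTEGER PRIMARY KEY,
            lexeme_id INTEGER NOT NULL REFERENCES lexeme(id),
            form      TEXT NOT NULL,
            features  TEXT NOT NULL
        );
        CREATE INDEX wordform_by_lexeme ON wordform (lexeme_id);
        """
    )
    cur = conn.execute(
        "INSERT INTO lexeme (headword, gloss) VALUES (?, ?)",
        ("djengi", "money"),
    )
    lexeme_id = cur.lastrowid
    # Hypothetical feature tags; any consistent labelling would do.
    conn.executemany(
        "INSERT INTO wordform (lexeme_id, form, features) VALUES (?, ?, ?)",
        [
            (lexeme_id, "djengi", "nom.pl"),
            (lexeme_id, "deneg", "gen.pl"),
        ],
    )
    conn.commit()
    return conn


def lookup_forms(conn, gloss, features):
    """Return all stored forms for a gloss with the given feature tag."""
    rows = conn.execute(
        """
        SELECT w.form
        FROM wordform AS w
        JOIN lexeme AS l ON l.id = w.lexeme_id
        WHERE l.gloss = ? AND w.features = ?
        ORDER BY w.id
        """,
        (gloss, features),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_db()
print(lookup_forms(conn, "money", "gen.pl"))
```

Since irregular forms are simply stored rather than generated, nothing
special is needed for them. Typing in "money" and asking for the
genitive plural is a single join, and half a million rows in wordform
would still be a small table.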
Boudewijn Rempt | http://www.valdyas.org