Re: Programmers requested for dictionary

From: Boudewijn Rempt <bsarempt@...>
Date: Friday, October 27, 2000, 21:54
On Fri, 27 Oct 2000, Peter Clark wrote:

> - Entries would be able to display conjugations, mutations,
> endings, etc. I will lapse into Russian here, since I haven't developed
> Enamyn far enough. Let's say I remember that the genitive plural of
> "djengi" (money) is irregular, but forgot what it was exactly. I should
> be able to type in "money" and learn that it is "deneg."
I've just been reading up on this. It turns out that in a fairly large corpus of English (about 500,000,000 words), there are about 300,000 wordforms, i.e. inflected forms. I wonder whether that figure is roughly constant across languages. For instance, Homeric Greek is highly inflecting, but given the size of the vocabulary (i.e. lexemes) _and_ the corpus (no need to account for unattested forms), I wonder whether there would be more than 300,000 (or, to be on the safe side, 500,000) inflected forms.

That of course means that it's easy to store every individual inflected form. It's a poor database that can't handle half a million simple records and relate them.

Coincidentally, that is also the approach I've already taken with Kura ;-).

Boudewijn Rempt | http://www.valdyas.org
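To make the "store every inflected form" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not Kura's actual schema, which the post does not describe. The table names (lexeme, wordform), the column names, and the helper functions are all assumptions made up for illustration. Each lexeme gets one row, each attested inflected form gets its own row pointing back at it, and a lookup by gloss returns every stored form.

```python
import sqlite3


def build_lexicon(conn):
    """Create a hypothetical two-table layout and load one example lexeme."""
    conn.executescript("""
        CREATE TABLE lexeme (
            id    INTEGER PRIMARY KEY,
            lemma TEXT NOT NULL,   -- citation form
            gloss TEXT NOT NULL    -- English translation
        );
        CREATE TABLE wordform (
            id        INTEGER PRIMARY KEY,
            lexeme_id INTEGER NOT NULL REFERENCES lexeme(id),
            form      TEXT NOT NULL,   -- the inflected form, stored as-is
            features  TEXT NOT NULL    -- e.g. 'genitive plural'
        );
        CREATE INDEX wordform_by_lexeme ON wordform(lexeme_id);
    """)
    # Example data taken from the quoted message: "djengi" (money), "deneg".
    cur = conn.execute(
        "INSERT INTO lexeme (lemma, gloss) VALUES (?, ?)", ("djengi", "money")
    )
    lexeme_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO wordform (lexeme_id, form, features) VALUES (?, ?, ?)",
        [
            (lexeme_id, "djengi", "nominative plural"),
            (lexeme_id, "deneg", "genitive plural"),
        ],
    )


def forms_for_gloss(conn, gloss):
    """Return {features: form} for every stored form of the glossed lexeme."""
    rows = conn.execute(
        """
        SELECT w.features, w.form
        FROM wordform AS w
        JOIN lexeme AS l ON l.id = w.lexeme_id
        WHERE l.gloss = ?
        ORDER BY w.id
        """,
        (gloss,),
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
build_lexicon(conn)
print(forms_for_gloss(conn, "money")["genitive plural"])  # deneg
```

Irregular forms need no special handling in this layout, since nothing is generated by rule: "deneg" is just another row. The cost is one row per form, which at the half-million scale discussed above is small.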