
Re: Lexicon counting (was: Weekly Vocab #1.1.1...)

From:Iain E. Davis <feaelin@...>
Date:Tuesday, September 5, 2006, 0:31
I hope everyone forgives me for replying to both of you at once. ;)
----------------> Henrik Theiling wrote:
> I suppose everyone counts differently, so comparison is futile, but it's
> fun anyway.
Indeed. I look at the count for my list periodically, just to see how I'm doing. Lately progress has been painfully slow because I've been lax about working on Taraitola.
> I'm so happy the lexicon grows and it would be frustrating to not count
> it...
*grin*. I understand completely.
> I never had a name-prominent lexicon, however, so I think it
> was never very important to distinguish.
I was forced to distinguish clearly because of the way I generate my "dictionary" (see below).
> BTW, of course I do not count the inflected forms in the
...[snipped]...
> ten or so. It is always frustrating to enter a verb because of this --
> verbs have just so many inflected forms!
I, too, only include the infinitive. None of the inflected forms of verbs are included. There are some things I did include that one could argue are inflected, such as subjective/objective pronouns, but they are included for ease of lookup in the dictionary.
----------------------> Carsten Becker wrote:
> I'm counting my entries like this: I have a database that is
> Ayeri -> English at first hand (it's reversible, but then you
What software are you using for your database? I use Excel as a flat-file "database" and then use a macro to 'generate' a Word document in dictionary style, if I desire. That is rare, though: I prefer to use the spreadsheet itself for the advanced filtering, searching, sorting, etc.
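As a rough illustration of the flat-file approach described above (not the actual Excel macro, which is not shown in this message), here is a minimal Python sketch. The column names `headword` and `pos` and the sample words are invented placeholders, and the output is plain text rather than a Word document.

```python
import csv
import io

# Hypothetical flat-file export: one row per entry, no sub-entries.
# Column names and sample words are invented placeholders, not real Taraitola.
SAMPLE = """headword,pos,definition
belan,n,a placeholder noun
aro,v,a placeholder verb (infinitive only)
"""

def load_entries(text):
    """Read the flat file into a list of dicts, one dict per entry."""
    return list(csv.DictReader(io.StringIO(text)))

def render_dictionary(entries):
    """Sort entries by headword and format each as one dictionary-style line."""
    lines = []
    for entry in sorted(entries, key=lambda e: e["headword"].lower()):
        lines.append(f'{entry["headword"]} ({entry["pos"]}.) {entry["definition"]}')
    return "\n".join(lines)

entries = load_entries(SAMPLE)
print(render_dictionary(entries))
```

Keeping the data in one flat table is what makes the filtering and sorting cheap; the "dictionary" is then just one possible rendering of it.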
> haven't got the pronunciation and whatnot for the English
> words), so the main entry always an Ayeri word. The problem
> is that my database does not accept sub-entries, so every
I have the same issue, although I don't believe Taraitola has any constructions like tapiao, so it is less of a concern.
> tapiao - to put; to set
> tapiao dayrin - to save ("to put aside")
...[snipped]
Hmm. Since each of those has a distinct meaning, I'd argue that in terms of counting, you should count them all. :)
> Where my German-English dictionary would list all those
> entries just under "to put", my database makes a new record
> out of all of these (unfortunately).
It probably would. But your English->German dictionary wouldn't, so you have to make some sacrifices somewhere. :) I have something of the same problem on the _other_ end. There are words that have distinctions that English doesn't make, so the 'English word' column/field can potentially have apparent duplicates. It doesn't matter too much to me, since the more important field is the 'definition' field. The English word is merely for creating an 'index' of English->Taraitola words (no meaning or adornment, just a pointer to the Taraitola word).
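To make the index idea concrete, here is a small Python sketch under stated assumptions: the rows, the placeholder headwords (`word_a` and so on), and the "snow" distinction are all invented for illustration and are not taken from the real spreadsheet. It shows why duplicate values in the 'English word' field are harmless when that field only serves as a pointer.

```python
from collections import defaultdict

# Hypothetical rows: (headword, english_word, definition).
# Two headwords share one English word because English lacks the distinction.
rows = [
    ("word_a", "snow", "falling snow"),
    ("word_b", "snow", "snow lying on the ground"),
    ("word_c", "river", "a large river"),
]

def build_english_index(rows):
    """Map each English word to every headword that points back to it."""
    index = defaultdict(list)
    for headword, english, _definition in rows:
        index[english].append(headword)
    # Sort keys and values so the index prints in a stable, dictionary-like order.
    return {english: sorted(words) for english, words in sorted(index.items())}

index = build_english_index(rows)
for english, headwords in index.items():
    print(f"{english}: {', '.join(headwords)}")
```

The index carries no meanings of its own; a reader follows the pointer to the headword and finds the distinction in the 'definition' field.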
> As for names, I keep them in an extra list, so they are not counted.
Which is a reasonable separation. Arguably, when I generate the dictionary, mine _are_ separated...into Appendix C: Famous People and Places. In my data, though, the only real difference is that they're flagged as 'C' entries (for appendix C) instead of 'A' entries. We did similar things, just different approaches.
> Common expressions usually have their own entries as well. There are
> not many expressions listed in the dictionary, though, just a
I have very few expressions, and in fact they are all out of date, because I have not revisited them since I completely revised the phonology. They are stored completely separately, but they may pre-date the spreadsheet...
> handful. Furthermore, since Ayeri is an agglutinative
> language, it has lots of suffixes -- these are also counted
> as words, even the ones that only have a syntactical meaning.
We differ here...as I mentioned to Henrik, I don't list any suffixed forms. There are some exceptions where some affixes completely change the meaning, but for the most part, it is only the 'original' form. :)
> If you removed those from the list, you'd still have
> something around 1300 words, maybe a little more or less than that.
Wow. Our discussion prompted me to add a 'statistics' worksheet, just to see what I had. I won't bore you with the full details, but a brief look: 916 entries, 162 of which are "Names"/"Proper Nouns". I need to dig in and work my way through the Swadesh list and all the weekly vocabs I haven't done yet... :)
Feaelin

Replies

Henrik Theiling <theiling@...>
Edgard Bikelis <bikelis@...>