Re: Avoiding near-collisions in vocabulary coinage

From:	Veoler <veoler@...>
Date:	Tuesday, August 5, 2008, 10:05


This is how I do it:

Assume that we have 24 consonants and 6 vowels, and that the roots have the
shape CVC. Then there are 24 × 6 × 24 = 3456 possible roots. Only a third
(or less) of those will actually be used, so the lexicon gets at most 1152
roots.
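The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `root_space` and the `divisor` parameter (standing for the "one third" rule) are my own labels, not part of the original method.

```python
def root_space(n_consonants, n_vowels, divisor=3):
    """Count CVC roots and how many may be used under a 1/divisor rule."""
    possible = n_consonants * n_vowels * n_consonants  # C x V x C
    usable = possible // divisor                       # at most a third used
    return possible, usable

print(root_space(24, 6))  # (3456, 1152)
```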

Then I apply the same limit to every part of the root: each CV- may occur at
most 24/3 = 8 times (and likewise each -VC), each C-C at most 6/3 = 2 times,
and each -V- at most 24 × 24 / 3 = 192 times. So if I have "kap" and "kep",
I won't accept another "kVp" in the lexicon. This distributes both the
redundancy and the semantic load evenly. For this reason I make sure that
the number of possible roots is at least three times the number of roots I
expect ever to need (ca 5000, together with a powerful derivational system,
including some escape mechanisms).
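The per-part limits can be sketched as a small quota checker. This is a hypothetical illustration, not Veoler's actual tool: the class name `RootQuota`, the `divisor` parameter and the 24-symbol consonant inventory in the usage lines are assumptions. It assumes each phoneme is written as a single character and tracks the four frames CV-, C-C, -VC and -V-, rejecting a root as soon as any of its frames is full.

```python
import unicodedata
from collections import Counter

class RootQuota:
    """Accept CVC roots only while every partial frame is below its quota.

    A frame's quota is (number of fillers for its open slots) // divisor,
    i.e. the 'one third' rule when divisor=3.
    """

    def __init__(self, consonants, vowels, divisor=3):
        # Assumption: one character per phoneme, normalized to NFC.
        self.C = {unicodedata.normalize("NFC", c) for c in consonants}
        self.V = {unicodedata.normalize("NFC", v) for v in vowels}
        nc, nv = len(self.C), len(self.V)
        self.quota = {
            "CV-": nc // divisor,       # open slot: final consonant
            "C-C": nv // divisor,       # open slot: vowel
            "-VC": nc // divisor,       # open slot: initial consonant
            "-V-": nc * nc // divisor,  # open slots: both consonants
        }
        self.counts = Counter()
        self.roots = set()

    def _frames(self, root):
        c1, v, c2 = root
        return [("CV-", c1 + v), ("C-C", c1 + c2), ("-VC", v + c2), ("-V-", v)]

    def try_add(self, root):
        """Add root if none of its frames is full; return True if accepted."""
        root = unicodedata.normalize("NFC", root)
        if (len(root) != 3 or root[0] not in self.C
                or root[1] not in self.V or root[2] not in self.C):
            raise ValueError(f"not a CVC root over this inventory: {root!r}")
        if root in self.roots:
            return False
        frames = self._frames(root)
        if any(self.counts[f] >= self.quota[f[0]] for f in frames):
            return False
        self.counts.update(frames)
        self.roots.add(root)
        return True

# Hypothetical 24-consonant, 6-vowel inventory for the usage example.
q = RootQuota("pbtdkgfvszmnlrjwhcqxčšžŋ", "aeiouy")
print(q.quota)  # {'CV-': 8, 'C-C': 2, '-VC': 8, '-V-': 192}
print(q.try_add("kap"), q.try_add("kep"), q.try_add("kip"))  # True True False
```

"kip" is rejected because the k_p frame already holds its two allowed roots, exactly the "kap"/"kep" case described in the post.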

I also keep a thematic dictionary, to make sure that related concepts don't
get similar-sounding words. For example, the three words for "I", "you" and
"he/she/it" shouldn't have ANY phoneme in common in the same position, and
the phonemes in each position should differ in at least two features. (So in
Raikudu I have čutai "I/me", loče "you" and peta "he/she/it". Well, I'm not
too strict about it, and less so in Raikudu than in my present project.)
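The "differ in at least two features" rule can be sketched as a position-by-position comparison. The feature table below is a made-up toy (four slots per phoneme), not Raikudu's actual phonology, and the helper names are hypothetical. Under this table the sketch flags u/o in čutai/loče (one feature apart) and the shared t and a in čutai/peta, which fits the remark that the rule wasn't applied strictly.

```python
import unicodedata

# Toy feature table (assumption, for illustration only).
# Consonants: (class, voicing, place, manner)
# Vowels:     (class, height, backness, rounding)
FEATURES = {
    "p": ("C", "voiceless", "labial", "stop"),
    "t": ("C", "voiceless", "alveolar", "stop"),
    "k": ("C", "voiceless", "velar", "stop"),
    "\u010d": ("C", "voiceless", "postalveolar", "affricate"),  # č (NFC)
    "l": ("C", "voiced", "alveolar", "lateral"),
    "a": ("V", "low", "central", "unrounded"),
    "e": ("V", "mid", "front", "unrounded"),
    "i": ("V", "high", "front", "unrounded"),
    "o": ("V", "mid", "back", "rounded"),
    "u": ("V", "high", "back", "rounded"),
}

def phonemes(word):
    """Split a word into phonemes, assuming one NFC character per phoneme."""
    return list(unicodedata.normalize("NFC", word))

def feature_distance(x, y):
    """Number of feature slots in which two phonemes differ (0 if identical)."""
    return sum(a != b for a, b in zip(FEATURES[x], FEATURES[y]))

def weak_positions(w1, w2, min_diff=2):
    """Positions, aligned from the left, whose phonemes differ in fewer
    than min_diff features (the shorter word limits the comparison)."""
    return [i for i, (x, y) in enumerate(zip(phonemes(w1), phonemes(w2)))
            if feature_distance(x, y) < min_diff]

for a, b in [("čutai", "loče"), ("čutai", "peta"), ("loče", "peta")]:
    print(a, b, weak_positions(a, b))
```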

When I developed Raikudu I kept an Excel spreadsheet with a column for each
substring of the roots (the full root plus the root with one phoneme
removed), e.g. čutai, čtai, čuai, utai for one word, so that I could easily
spot any minimal pairs. I accepted a minimal pair only if a) the differing
phonemes differed in at least two features, and b) the concepts weren't too
similar.
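The spreadsheet trick amounts to a deletion-key index: every root is filed under each form it takes with one phoneme removed, and two roots of the same length that share such a key at the same position differ in exactly that phoneme. The sketch below assumes one character per phoneme; the function names are hypothetical, and it only finds candidates, so checks a) and b) still have to be made separately.

```python
import unicodedata
from collections import defaultdict
from itertools import combinations

def phonemes(word):
    """Assumption: one NFC character per phoneme."""
    return list(unicodedata.normalize("NFC", word))

def deletion_keys(word):
    """(position, word with that phoneme removed) for every position."""
    ph = phonemes(word)
    return [(i, "".join(ph[:i] + ph[i + 1:])) for i in range(len(ph))]

def minimal_pairs(lexicon):
    """Return (word1, word2, position, phoneme1, phoneme2) for every pair
    of roots that differ by a single substituted phoneme."""
    buckets = defaultdict(set)
    for w in lexicon:
        for key in deletion_keys(w):
            buckets[key].add(w)
    pairs = set()
    for (i, _), words in buckets.items():
        for a, b in combinations(sorted(words), 2):
            pairs.add((a, b, i, phonemes(a)[i], phonemes(b)[i]))
    return sorted(pairs)

for pair in minimal_pairs(["čutai", "četai", "loče", "peta", "pete", "kap"]):
    print(pair)
```

Keeping the position in the key restricts matches to substitutions. A plain column of strings, without positions, also groups pairs such as čutai/čuati (both give čuai), which differ by a swap rather than a single phoneme.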

--
Veoler
