Re: Iterative conlang design with corpus analysis, Or, Build one to throw away
From: Jim Henry <jimhenry1973@...>
Date: Tuesday, April 18, 2006, 20:08
On 4/18/06, Larry Sulky <larrysulky@...> wrote:
> On 4/18/06, Jim Henry <jimhenry1973@...> wrote:
> ---SNIP---
> Jim, I didn't see anything in your post about "permutation density" (a
> term I made up because I don't know what it's really called). By this
> I'm referring to how many possible roots of a given form are actually
> used. For example, if you give definitions to 70% of the possible CV
> words (assuming those are possible words), then the permutation
> density there is 70%. Would you want to set some limits on permutation
> density? ---larry
Yes, I will; that's an effect of the scripts I've been writing
to generate phonologically redundant vocabulary. Just
to take a simple example, if you have CVC roots, with
five consonants and five vowels, my script would generate
25 redundant roots out of the space of 125; so 20%
permutation density, if you used all the 25 it generates.
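Here is a minimal sketch of what "phonologically redundant" could mean. It assumes, as my own guess rather than anything stated above, that any two roots must differ in at least two phoneme positions, so one misheard sound can't turn one root into another. The consonant and vowel letters are hypothetical placeholders. Under that assumption the final consonant works like a check digit:

```python
from itertools import combinations

# Hypothetical inventories (placeholders): 5 consonants, 5 vowels.
CONSONANTS = ["p", "t", "k", "m", "s"]
VOWELS = ["a", "e", "i", "o", "u"]


def hamming(a, b):
    """Number of phoneme positions in which two equal-length roots differ."""
    return sum(x != y for x, y in zip(a, b))


def redundant_cvc_roots(cons, vows):
    """Build CVC roots in which any two differ in at least 2 positions.

    Assumed criterion: minimum distance 2. The final consonant's index is
    (initial index + vowel index) mod n, acting as a check digit.
    Assumes equal numbers of consonants and vowels.
    """
    n = len(cons)
    assert len(vows) == n, "this construction assumes equal inventory sizes"
    return [
        (cons[i], vows[j], cons[(i + j) % n])
        for i in range(n)
        for j in range(n)
    ]


roots = redundant_cvc_roots(CONSONANTS, VOWELS)
space = len(CONSONANTS) * len(VOWELS) * len(CONSONANTS)

# Check the assumed criterion: every pair differs in >= 2 positions.
assert all(hamming(a, b) >= 2 for a, b in combinations(roots, 2))

print(len(roots), space, len(roots) / space)  # 25 125 0.2
```

Under this assumption 25 is also the ceiling. Two roots sharing both their initial consonant and vowel would differ in at most one position, and there are only 5 × 5 = 25 such CV prefixes. That matches the 25-of-125 figure, though the actual scripts may well use a different rule.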
The current draft phonology format generates
446 redundant C(C)V(C) roots out of a space of 5514,
so about 8% permutation density. This will probably
be reduced a bit further as I refine it to eliminate unwanted
consonant clusters. Or it may increase if I add
a couple more vowels (going from 6 to 8) to
compensate.
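The density figures are just used roots divided by the size of the root space. A quick check of the two numbers quoted above (the function name is made up for illustration):

```python
def permutation_density(used, possible):
    """Fraction of the possible root forms that are actually assigned."""
    if possible <= 0:
        raise ValueError("the space of possible forms must be non-empty")
    return used / possible


# Figures from the post: 25 of 125 CVC roots, 446 of 5514 C(C)V(C) roots.
for used, possible in [(25, 125), (446, 5514)]:
    print(f"{used}/{possible} = {permutation_density(used, possible):.1%}")
# 25/125 = 20.0%
# 446/5514 = 8.1%
```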
Of the CV subset, my self-segregation scheme would
allow a maximum of 48 without the redundancy
criterion; with it, I can use only 6. Probably three pronouns
and three other high-frequency particles, maybe
prepositions.
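Suppose, hypothetically, that the 48 CV forms are 8 consonants times the 6 vowels. Then the drop to 6 follows from the same assumed at-least-two-differences rule. Two CV roots must share neither consonant nor vowel, so each vowel can appear in at most one root. The inventories below are placeholders:

```python
from itertools import combinations

# Hypothetical inventories: 8 consonants x 6 vowels = 48 CV forms.
CONSONANTS = ["p", "t", "k", "b", "d", "g", "m", "n"]
VOWELS = ["a", "e", "i", "o", "u", "y"]

all_cv = [(c, v) for c in CONSONANTS for v in VOWELS]

# Under the assumed rule, two CV roots must differ in both positions,
# so pair each vowel with a distinct consonant (zip stops at 6 pairs).
redundant_cv = list(zip(CONSONANTS, VOWELS))

assert all(a[0] != b[0] and a[1] != b[1] for a, b in combinations(redundant_cv, 2))
print(len(all_cv), len(redundant_cv))  # 48 6
```

Under the same assumption, going from 6 to 8 vowels would raise that ceiling to 8.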
I probably won't use up all of those 400-500 monosyllabic roots by
the time I do the first corpus analysis and relex.
For one thing, I'll be using some disyllabics from the
start for terms I expect to be relatively rare but still
need a root for, like zoological genera, etc. But I may
have used up all the CVC subset (only 48).
--
Jim Henry
http://www.pobox.com/~jimhenry/conlang.htm